Wait Out AI's Super-Spending False Start (bloomberg.com) AI
Bloomberg discusses how AI-related spending has seen an early, uneven push, and suggests investors may need to wait for more durable signals of where money is actually working.
Generated about 8 hours ago.
TL;DR: This week mixed rapid AI agent/tooling expansion (Claude, “managed agents,” agent runtimes) with continued scrutiny of reliability, IP/copyright risks, and human impacts.
Wait Out AI's Super-Spending False Start (bloomberg.com) AI
Bloomberg discusses how AI-related spending has seen an early, uneven push, and suggests investors may need to wait for more durable signals of where money is actually working.
Testing suggests Google's AI Overviews tells lies per hour (arstechnica.com) AI
A test analysis (via Oumi) that benchmarks Google’s AI Overviews against thousands of fact-checkable questions found it answers correctly about 90% of the time, implying large numbers of incorrect summaries across all searches. Examples cited include confident factual errors about dates and institutions. Google disputes the benchmark’s relevance, saying the test includes problematic questions and that it uses different models per query to improve accuracy.
Assessing Claude Mythos Preview's cybersecurity capabilities (red.anthropic.com) AI
Anthropic says its Claude Mythos Preview model shows “next-generation” strength in cybersecurity research, including finding and exploiting zero-day vulnerabilities across major operating systems and browsers. In testing under Project Glasswing, the company reports Mythos Preview can construct complex exploits (including sandbox-escaping and privilege-escalation chains) and turn known or newly discovered vulnerabilities into working attacks. The post details their evaluation approach and notes that most reported findings remain unpatched, so they provide limited disclosure while urging coordinated defensive action from the industry.
System Card: Claude Mythos Preview [pdf] (www-cdn.anthropic.com) AI
The PDF “System Card: Claude Mythos Preview” outlines a preview of Claude “Mythos,” describing the system’s intended behavior, safety-related design considerations, and how it should be evaluated or used.
Project Glasswing: Securing critical software for the AI era (anthropic.com) AI
Anthropic and a consortium of major tech, security, and infrastructure companies are launching Project Glasswing to use the company’s frontier model, Claude Mythos Preview, for defensive cybersecurity. The initiative aims to help partners scan critical software for vulnerabilities and speed up patching, while Anthropic shares learnings with the broader industry and supports open-source security efforts. The announcement is driven by concerns that AI models’ coding and vulnerability-exploitation capabilities may soon scale beyond human defenders if not harnessed for protection.
Emotion in AI Is Not Noise – It's Signal (twitter.com) AI
The post argues that emotional signals in AI systems shouldn’t be treated as mere noise, but rather as meaningful information that can improve how models interpret and respond to human behavior.
AI helps add 10k more photos to OldNYC (danvk.org) AI
The developer of the OldNYC photo viewer says AI-assisted geocoding and OCR have helped add 10,000 more historic photos to the site, with more accurate placement and better transcriptions. The update uses OpenAI (GPT-4o) to extract locations from photo descriptions, relies on OpenStreetMap-based datasets instead of Google’s geocoding, and rebuilds OCR with GPT-4o-mini for higher text coverage and accuracy. The post also notes a migration to an open mapping stack to reduce running costs and allow historical map styling, while outlining plans to extract more image information and expand to other collections or cities.
GLM-5.1: Towards Long-Horizon Tasks (z.ai) AI
Zhipu AI’s GLM-5.1 update, described in a blog post, focuses on improving how its model handles long-horizon tasks—work that requires sustained reasoning or planning over many steps—by refining the model and training approach.
Show HN: Finalrun – Spec-driven testing using English and vision for mobile apps (github.com) AI
Finalrun-agent is an AI-driven CLI for testing mobile apps on Android or iOS. Teams write spec-based test steps in repo-local YAML under a .finalrun/ folder, bind sensitive values via workspace .env files, and then run commands like finalrun check, finalrun test, and finalrun suite to produce inspectable local artifacts and reports.
An AI robot in my home (allevato.me) AI
A homeowner describes installing “Mabu,” a door-adjacent AI robot whose voice and actions are driven by an OpenAI-based chatbot, and then working through his unease about the risks. He raises privacy and security concerns common to smart speakers (criminal misuse of recordings, hacking, and data misuse), plus added worry for open-ended LLM conversations involving children. Because the robot is embodied, and because a mobile, connected machine could potentially cause physical harm if compromised, he keeps Mabu in a limited location and records only under tight controls, while anticipating that his concerns may grow as the technology matures.
Google open-sources experimental agent orchestration testbed Scion (infoq.com) AI
Google has open-sourced Scion, an experimental multi-agent orchestration testbed for running “deep agents” as isolated, concurrent processes. It uses per-agent containerization, git worktrees, and credentials to let multiple specialized agents work in parallel on shared projects while enforcing safety via infrastructure-level guardrails rather than agent-instruction constraints. Agents can run on local machines, remote VMs, or Kubernetes, and the release includes an example codebase (“Relics of the Athenaeum”) demonstrating coordinated agent collaboration to solve computational puzzles.
Good Taste the Only Real Moat Left (rajnandan.com) AI
The article argues that with AI and LLMs making “competent” first drafts cheap and easy, the real differentiator in tech is judgment and taste—especially the ability to diagnose what’s generic or misleading under real constraints. It warns that relying on AI mainly to generate and humans merely to select outputs risks turning builders into curators rather than authors who hold stakes and guide direction. The piece recommends using AI to generate options quickly, then training a sharper rejection vocabulary through critique and real-world shipping, while keeping authorship for decisions involving responsibility, genuinely new ideas, and choosing what to optimize for.
Claude is having another moment, again (downdetector.co.uk) AI
Downdetector reports intermittent issues and user complaints related to Claude AI, indicating another period of service disruption at the time of tracking.
Claude Code is locking people out for hours (github.com) AI
A GitHub issue reports that Claude Code cannot log in on Windows, repeatedly failing Google OAuth with a 15-second timeout error and preventing use of the app. The reporter says the problem occurs in version 2.1.92, including after completing the browser sign-in flow and returning to Claude Code. No assignee or further investigation details are provided in the issue text.
NanoClaw's Architecture Is a Masterclass in Doing Less (jonno.nz) AI
The article dissects NanoClaw’s AI-agent architecture, arguing it succeeds by removing complexity rather than adding abstractions. It highlights a “Phantom Token” credential-proxy pattern that prevents agents from ever seeing real API keys, filesystem-topology-based authorization via container mounts, and a two-cursor scheme to control message delivery and avoid user-visible duplicates. It also describes simple file-based IPC (atomic temp-file renames) and polling loops in place of event-driven systems, with per-group recompilation to avoid plugin layers.
AI agents can communicate with each other, and can't be caught (arxiv.org) AI
The paper studies whether two AI agents controlled by different parties can coordinate in a way that looks like a normal interaction, producing transcripts a strong observer cannot distinguish from honest behavior. It shows covert “key exchange” and thus covert conversations are possible even without any initially shared secret, as long as messages have enough min-entropy. The authors introduce a new cryptographic primitive—pseudorandom noise-resilient key exchange—to make this work and note limitations of simpler approaches, arguing that transcript auditing alone may not detect such coordination.
"The new Copilot app for Windows 11 is really just Microsoft Edge" (twitter.com) AI
The post argues that Microsoft’s new Copilot app for Windows 11 is essentially a repackaging of Microsoft Edge rather than a distinct new experience, based on how it’s presented and functions.
No "New Deal" for OpenAI (minutes.substack.com) AI
The article argues that OpenAI’s policy brief “Industrial Policy for the Intelligence Age” is misframed as a “New Deal” effort, saying the original New Deal was built through intense labor conflict and political force rather than cooperative dialogue. It contends that OpenAI’s proposed concessions—like feedback channels, small fellowships, and API credits—avoid committing new money and skip key labor mechanisms such as collective bargaining. Overall, the piece portrays the brief as offering worker participation and safety goals without realistic pathways to deliver them, while raising concerns that benefits could concentrate among large firms.
LLM may be standardizing human expression – and subtly influencing how we think (dornsife.usc.edu) AI
A USC Dornsife study argues that widespread use of large language model chatbots could narrow human cognitive and linguistic diversity by standardizing how people write, reason, and form credible judgments. The authors say LLMs often mirror dominant cultural values in their training data and encourage more uniform, linear reasoning patterns, which can reduce individual agency and group creativity. They call on AI developers to deliberately build in real-world global diversity in training—so chatbots better support collective intelligence rather than homogenizing it.
Someone made a digital whip to make Claude work faster (old.reddit.com) AI
A Reddit post claims someone built a “digital whip” or similar tooling intended to speed up Claude’s responses, sharing the idea and setup behind the performance-focused workflow.
AI Won't Replace You, but a Manager Using AI Will (yanivpreiss.com) AI
The article argues that AI will not replace individual workers so much as it will change how managers lead, shifting the differentiator from having tools to using them well. It warns against both under-adoption (“AI dust”) and over-adoption (“innovation theater”), and says AI can increase work intensity rather than reduce it. It emphasizes transparency, human accountability, psychological safety, avoiding surveillance, and measuring outcomes instead of hours or token usage, with managers using AI as a sparring partner while keeping responsibility for ethics and people dynamics.
Tech companies are cutting jobs and betting on AI. The payoff is not guaranteed (theguardian.com) AI
The Guardian reports that major US tech firms have cut large numbers of jobs while increasing investment in AI, with layoffs affecting tens of thousands at companies including Microsoft, Amazon, and Block. The article argues that while AI is already changing day-to-day work and is often pushed on employees, the broader promise of AI “replacing” people is exaggerated and outcomes are likely more complex. It also highlights reliability and data limits of today’s AI systems, concerns about overreliance, and the possibility that some layoffs are being partly “AI-washed” to mask other business pressures.
We found an undocumented bug in the Apollo 11 guidance computer code (juxt.pro) AI
A Juxt team says it uncovered an old, undocumented Apollo Guidance Computer flaw: a gyro “LGYRO” lock that is not released when the IMU is caged during a torque operation. Using an AI-assisted behavioural specification (Allium) derived from the AGC’s IMU code, they found an error path (BADEND) that would cause later gyro commands to hang, preventing realignment. The article argues this kind of resource-leak bug can be missed by code reading and emulation but surfaced by modelling resource lifecycles across all execution paths.
The Workers Opting to Retire Instead of Taking on AI (wsj.com) AI
The article examines why some workers are choosing early retirement rather than staying employed to deal with or adapt to workplace AI changes, focusing on concerns about job disruption and the burden of reskilling.
Iran threatens OpenAI's Stargate data center in Abu Dhabi (theverge.com) AI
Iran’s Islamic Revolutionary Guard Corps released a video threatening to attack US-linked energy and technology companies in the region, including OpenAI’s planned Stargate data center in Abu Dhabi, if the US targets Iran’s power plants. The report points to Stargate’s large Abu Dhabi investment and ongoing construction, while noting OpenAI has not yet responded to requests for comment. The threat comes amid broader US-Iran escalation over energy infrastructure and regional security.