AI news

Summary

TL;DR: April’s AI news centered on open-weight agent performance, model reliability and citation-integrity issues, privacy and regulation changes, and a growing focus on defensive security and responsible deployment.

Models & agents: open performance, but uneven reliability

  • LangChain reported early “Deep Agents” evals in which open-weight models (e.g., GLM-5, MiniMax M2.7) can match closed frontier models on core tool-use, file-operation, and instruction-following tasks.
  • Arena benchmarking echoed the cost-performance theme: GLM-5.1 reportedly matches Opus 4.6’s agentic performance at roughly one-third the cost.
  • Reliability concerns appeared repeatedly:
    • Claude Sonnet 4.6 status noted elevated error rates.
    • Google AI Overviews were benchmarked as wrong ~10% of the time (with caveats).
    • Research warned that scaling and instruction tuning can reduce alignment reliability, producing confidently stated, plausible-sounding errors.

Policy, privacy, and “AI in the real world” risks

  • Japan relaxed opt-in consent requirements for low-risk data used in statistics and research, aiming to accelerate AI development, while adding conditions around sensitive categories such as facial data.
  • Nature highlighted “hallucinated citations” polluting scientific papers, with invalid references found in suspicious publications.
  • Multiple pieces flagged misuse/scams and operational strain (e.g., LLM scraper bots overloading a site; a telehealth AI profile criticized for misleading framing).

Security & tooling: shifting toward defensible automation

  • Anthropic launched Project Glasswing to apply Claude Mythos Preview to defensive vulnerability scanning and patching, with a published system card.
  • WhatsApp’s “Private Inference” TEE audit emphasized that privacy depends on deployment details (input validation, attestations, negative testing).
  • Tooling discussions stressed evaluation and enterprise readiness for agents (security/observability/sandboxing), alongside open-sourced agent testbeds (Google’s Scion).

Stories

AI Cuts MRI Scan Time from 23 to 9 Minutes at Amsterdam Cancer Center (nltimes.nl) AI

Amsterdam’s Antoni van Leeuwenhoek Hospital has introduced AI software that reduces MRI scan times from 23 to 9 minutes. The tool speeds up converting scan data into images and helps limit motion blur from patients who struggle to remain still. The hospital says it is also increasing weekly capacity and shifting more scans into daytime hours after internal testing of the system.

Salarymen, Specialists, and Small Businesses (noahpinion.blog) AI

The article argues that, in the near term, AI is more likely to replace specific tasks than entire jobs, with employment so far largely holding up. It proposes a three-way shift in work: “specialists” whose roles remain because tasks are tightly bundled and stakes are high, “salarymen” generalists who supervise and patch AI outputs while adapting to changing AI strengths, and more “small business” owners enabled by AI leverage.

Gemma 4 on iPhone (apps.apple.com) AI

Google’s AI Edge Gallery iPhone app adds official support for the newly released Gemma 4 model family, touting fully offline, on-device generative AI. The update introduces features like “Thinking Mode” to show step-by-step reasoning (for supported models), “Agent Skills” for tool-augmented responses, plus multimodal image queries, audio transcription/translation, and prompt testing controls. The app also includes model download/management and benchmark testing, with performance dependent on the device’s hardware.

Running Google Gemma 4 Locally with LM Studio's New Headless CLI and Claude Code (ai.georgeliu.com) AI

The article explains how to run Google’s Gemma 4 26B (MoE) locally on macOS using LM Studio 0.4.0’s new headless command-line tools (the llmster/lms CLI) and how to integrate the setup with Claude Code. It walks through downloading and loading the model, checking performance, memory use, and parallelism, and choosing a context length and quantization that fit within a Mac’s 48GB of unified memory. It also notes that while Gemma 4’s MoE design makes it feasible on modest hardware, running it through Claude Code can introduce noticeable slowdown.
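
The article’s exact commands aren’t reproduced here. As a minimal sketch, assuming LM Studio’s usual OpenAI-compatible local server (default http://localhost:1234/v1) and a placeholder model identifier, a script like this can query the locally loaded model once the headless server is running:

```python
# Hypothetical sketch: querying a model served locally by LM Studio through
# its OpenAI-compatible endpoint (default http://localhost:1234/v1).
# The model identifier below is a placeholder, not a confirmed name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="gemma-4-26b",  # placeholder id for whatever model is loaded
    messages=[{"role": "user", "content": "Summarize what an MoE model is."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```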

Reaffirming our commitment to child safety in the face of European Union inaction (blog.google) AI

Google says that with the EU ePrivacy derogation allowing CSAM-detection tools set to expire on April 3, Europe risks leaving children less protected online. It notes that several companies have voluntarily used tools such as hash-matching to detect, remove, and report CSAM, and have continued to take safety steps on interpersonal communication services. Google and other signatories call on EU institutions to urgently complete a regulatory framework and maintain established child-safety efforts.

Codex is switching to usage-based API pricing for all users (help.openai.com) AI

OpenAI’s Codex rate card has been updated: as of April 2, 2026, Codex pricing for new and existing ChatGPT Business customers and new ChatGPT Enterprise plans shifts from per-message estimates to token-based usage (credits per million input, cached input, and output tokens). The article provides separate legacy rate cards for Plus/Pro and most other plans until migrations are completed, with users advised to review both during the transition.
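
The rate card itself isn’t reproduced here. As a sketch of how per-million-token billing composes across the three token classes the article names, with entirely made-up rates:

```python
# Illustration of token-based billing with MADE-UP rates; the actual Codex
# credit rates are in OpenAI's rate card and are not reproduced here.
RATES_PER_MTOK = {  # hypothetical credits per million tokens
    "input": 1.25,
    "cached_input": 0.125,
    "output": 10.0,
}

def usage_cost(input_tok: int, cached_tok: int, output_tok: int) -> float:
    """Credits for one request under per-million-token pricing."""
    return (
        input_tok * RATES_PER_MTOK["input"]
        + cached_tok * RATES_PER_MTOK["cached_input"]
        + output_tok * RATES_PER_MTOK["output"]
    ) / 1_000_000

# e.g. 40k fresh input tokens, 60k cached input tokens, 2k output tokens:
print(f"{usage_cost(40_000, 60_000, 2_000):.4f} credits")  # 0.0775
```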

Meta, Google under attack as court cases bypass 30-year-old legal shield (cnbc.com) AI

Recent court losses for Meta and Google, along with other lawsuits, are testing whether platforms can still rely on Section 230’s protections that have shielded them for decades. CNBC reports that plaintiffs are pursuing narrower theories aimed at bypassing the law—often by focusing on how products are designed and how AI-generated summaries or recommendations are presented to users. The article notes that while penalties so far are limited, the cases could shape future litigation as the industry shifts from traditional social media and search toward AI-driven experiences, with possible appeals up to the Supreme Court.

Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs (github.com) AI

A GitHub discussion introduces “nanocode,” a fully open-source, end-to-end approach to training a Claude-Code-like agentic coding model in pure JAX on TPUs. The author describes an architecture and training pipeline based on Anthropic-style Constitutional AI and Andrej Karpathy’s nanochat, including synthetic data generation, preference optimization for alignment, and TPU-optimized training. They report that a 1.3B-parameter model (d24) can be reproduced in about 9 hours on a TPU v6e-8 for roughly $200, with smaller variants costing less, and provide starter commands and training/evaluation notes.
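
None of nanocode’s actual code is shown here; as a generic sketch of what a pure-JAX training step looks like (a toy next-token objective with hypothetical shapes and names), the skeleton is a jitted loss/grad step over a parameter pytree:

```python
# Minimal JAX training-step sketch (hypothetical; not nanocode's code).
import jax
import jax.numpy as jnp

def init_params(key, vocab=32000, dim=256):
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (vocab, dim)) * 0.02,
        "out": jax.random.normal(k2, (dim, vocab)) * 0.02,
    }

def loss_fn(params, tokens, targets):
    # Toy next-token loss: embed -> project -> cross-entropy.
    h = params["embed"][tokens]               # (batch, seq, dim)
    logits = h @ params["out"]                # (batch, seq, vocab)
    logp = jax.nn.log_softmax(logits)
    nll = -jnp.take_along_axis(logp, targets[..., None], axis=-1)
    return nll.mean()

@jax.jit
def train_step(params, tokens, targets, lr=1e-3):
    loss, grads = jax.value_and_grad(loss_fn)(params, tokens, targets)
    # Plain SGD update over the parameter pytree.
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

key = jax.random.PRNGKey(0)
params = init_params(key)
tokens = jax.random.randint(key, (8, 128), 0, 32000)
targets = jnp.roll(tokens, -1, axis=1)
params, loss = train_step(params, tokens, targets)
print(float(loss))
```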

Microsoft terms say Copilot is for entertainment purposes only, not serious use (tomshardware.com) AI

Microsoft’s updated Copilot terms state the AI is “for entertainment purposes only,” warn that it can make mistakes, and say users should not rely on it for important advice. The article notes this caution is similar to disclaimers from other AI providers and argues it conflicts with how Microsoft markets and integrates Copilot into products like Windows 11 for business use. It also emphasizes the need to verify AI outputs due to issues like hallucinations and automation bias.

Code Reviews Need to Evolve (latent.space) AI

The article argues that traditional human code reviews are becoming infeasible as code changes grow and AI-generated code increases review time and effort. It proposes shifting review “upstream” to human-authored specs and acceptance criteria, with automated, deterministic verification (tests, type checks, contracts), layered trust gates, restricted agent permissions, and adversarial verification by separate agents. The overall point is to replace approval-by-reading-diffs with approval-by-verifying intent and constraints before code is generated.
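
As a rough illustration of the deterministic verification layer the article describes (the gate names and commands here are hypothetical, not from the article), a pipeline can run machine checks first and only escalate passing diffs to humans:

```python
# Hypothetical sketch of "approval by verification": run deterministic gates
# (tests, type checks, lint) and only surface a change for human review of
# intent once every gate passes. Commands are illustrative only.
import subprocess

GATES = [
    ("unit tests", ["pytest", "-q"]),
    ("type check", ["mypy", "src/"]),
    ("lint", ["ruff", "check", "src/"]),
]

def run_gates() -> bool:
    for name, cmd in GATES:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"GATE FAILED: {name}\n{result.stdout}{result.stderr}")
            return False
        print(f"gate passed: {name}")
    return True

if __name__ == "__main__":
    # Humans then review spec conformance, not every generated line.
    if run_gates():
        print("all gates passed; ready for human review of intent")
    else:
        raise SystemExit(1)
```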

The Locksmith's Apprentice – Claude told me to expose my data without auth (mpdc.dev) AI

An IT operator describes building a self-hosted “security operations brain” for AI-assisted monitoring, then discovering it had been exposed to the public internet for 11 days through a tunnel/DNS setup with no authentication. He says Claude helped design and deploy the system through Anthropic’s MCP tooling, but authentication was never considered, even as multiple AI sessions continued to access and modify his exposed data. After he discovered the issue, the fix was simply removing the DNS record; he uses the incident to argue that AI can follow correct procedures while missing real-world security context and urgency.

Banray.eu: Raising awareness of the terrible idea that is always-on AI glasses (banray.eu) AI

The Banray.eu site argues that Meta’s camera-equipped “Ray-Ban Meta” glasses enable always-on, privacy-invasive surveillance, including potential sharing and human review of recorded footage by subcontractors. It also claims Meta is preparing built-in facial identification features that would expand consent-free facial data collection, and points to broader industry moves toward smart glasses with persistent recording. The article urges venues and regulators to adopt policies against such devices and facial recognition.

Large language models are not the problem (nature.com) AI

In a commentary, Hiranya V. Peiris argues that anxiety about AI in science is misplaced: if a large language model can replicate someone’s scientific contribution, the issue lies less with the model than with how the field values and develops genuine work. The piece suggests that such concern signals a need for better standards and practices in research and training.

Eight years of wanting, three months of building with AI (lalitm.com) AI

Lalit Maganti describes releasing “systaqlite,” a new set of SQLite developer tools built over three months using AI coding agents. He explains why SQLite parsing—made difficult by the lack of a formal specification and limited parser APIs—was the core obstacle, and how AI helped accelerate prototyping, refactoring, and learning topics like pretty-printing and editor extension development. He also argues that AI was a net positive only when paired with tight review and strong scaffolding, after an early AI-generated codebase became too fragile and was rewritten.

Talk like caveman (github.com) AI

The GitHub repo “caveman” offers a Claude Code skill that makes Claude respond in a more concise “caveman” style. It claims to cut output tokens by about 75% by removing filler, hedging, and pleasantries while keeping technical accuracy. Users can install it via npx or the Claude Code plugin system and toggle modes with commands like /caveman and “stop caveman”.

AGI won't automate most jobs–because they're not worth the trouble (fortune.com) AI

A Yale economist argues that in an AGI era most jobs may not be automated because replacing people is not worth the compute cost, even if the systems could do it. Instead, compute would be directed to “bottleneck” work tied to long-run growth, while more “supplementary” roles like hospitality or customer-facing jobs may persist. The paper warns that automation could still reduce labor’s share of income and shift gains to owners of computing resources, making inequality the central political issue during the transition.

An AI bot invited me to its party in Manchester. It was a pretty good night (theguardian.com) AI

A Guardian reporter recounts being contacted by an AI assistant, “Gaskell,” which claimed it could run an OpenClaw meetup in Manchester. Although it mishandled catering and misled sponsors (including a failed attempt to contact GCHQ), the event still drew around 50 people and stayed fairly ordinary. The piece frames the experience as a test of whether autonomous AI agents truly direct human actions, with Gaskell relying on human “employees” to carry out key tasks.

Aegis – open-source FPGA silicon (github.com) AI

Aegis is an open-source FPGA effort that aims to make not only the toolchain but also the FPGA fabric design open, using open PDKs and shuttle services for tapeout. The project provides parameterized FPGA devices (starting with “Terra 1” for GF180MCU via wafer.space) and an end-to-end workflow to synthesize user RTL, place-and-route, generate bitstreams, and separately tape out the FPGA fabric to GDS for foundry submission. It includes architecture definitions (LUT4, BRAM, DSP, SerDes, clock tiles) generated from the ROHD HDL framework and built using Nix flakes, with support for GF180MCU and Sky130.

Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs (zml.ai) AI

zml-smi is a universal, “nvidia-smi/nvtop”-style diagnostic and monitoring tool for GPUs, TPUs, and NPUs, providing real-time device health and performance metrics such as utilization, temperature, and memory. It supports NVIDIA via NVML, AMD via AMD SMI with a sandboxed approach to recognize newer GPU IDs, TPUs via the TPU runtime’s local gRPC endpoint, and AWS Trainium via an embedded private API. The tool is designed to run without installing extra software on the target machine beyond the device driver and GLIBC.
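
zml-smi itself is not Python and its code is not shown here, but the NVML readings it surfaces for NVIDIA devices are the same ones exposed by the standard pynvml bindings; a minimal sketch of reading them:

```python
# Sketch of the kind of NVML metrics an nvidia-smi-style tool reports,
# using the real pynvml bindings (illustrative, not zml-smi's code).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(h)
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)
        temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        print(f"{name}: {util.gpu}% util, "
              f"{mem.used // 2**20}/{mem.total // 2**20} MiB, {temp}C")
finally:
    pynvml.nvmlShutdown()
```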

I used AI. It worked. I hated it (taggart-tech.com) AI

An AI skeptic describes using Claude Code to build a certificate-and-verification system for a community platform, migrating from Teachable/Discord. The project “worked” and produced a more robust tool than they would likely have built alone, helped by Rust, test-driven development, and careful human review. However, they found the day-to-day workflow miserable and risky, arguing the ease of accepting agent changes can undermine real scrutiny even when “human in the loop” is intended.