Apple WWDC 2026: The 7 biggest announcements
(theverge.com)
AI
Apple’s WWDC 2026 keynote highlighted an AI-upgraded Siri (“Siri AI”) with conversation and on-screen context features, alongside iOS 27, macOS 27, and Safari updates that expand Apple Intelligence across devices, plus improvements to Apple Home (including 4K support) and redesigned Screen Time and parental controls.
Apple reveals new AI architecture built around Google Gemini models
(macrumors.com)
AI
Apple announced an overhaul of Apple Intelligence, building a new architecture around foundation models co-developed with Google using Gemini-family technologies and running on-device plus via Private Cloud Compute. The update adds multimodal capabilities such as image understanding and generation, with device-specific higher-power versions that include speech generation and improved dictation, coordinated by a new “system orchestrator” for app- and task-aware responses while reiterating privacy protections.
We need to learn how to argue with AI
(ft.com)
AI
An opinion piece argues that people need to learn how to effectively argue with AI systems, suggesting communication skills and reasoning practices are important when interacting with AI.
How Confident Are AI Classifiers About Their Own Confidence?
(gmcirco.github.io)
AI
The post tests how reliable AI “confidence” scores are when an LLM classifies injury body parts from NEISS medical narratives, comparing LLM-emitted confidence values to token log-probabilities from the model output. Using a sample of 500 cases with a gpt-5-nano extraction pipeline, the author finds that LLM confidence is relatively close to observed accuracy at the highest confidence ranges but diverges outside the upper end, while token log-probabilities are generally more over-confident. The article also outlines methods to calibrate probabilities in multi-class settings, including “top-vs-all” calibration via isotonic regression.
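The calibration idea summarized above can be sketched in a few lines. This is a minimal illustration, not the post's code: it assumes "top-vs-all" means calibrating only the top predicted label's confidence against whether that label was correct, and it uses a plain pool-adjacent-violators fit in place of a library isotonic regressor. The function names (`logprob_to_confidence`, `fit_isotonic`, `calibrate`) are hypothetical.

```python
import math

def logprob_to_confidence(token_logprobs):
    """Joint probability of a multi-token label: exp of the summed log-probs."""
    return math.exp(sum(token_logprobs))

def fit_isotonic(confidences, correct):
    """Pool-adjacent-violators fit of a non-decreasing step function.

    confidences: raw confidence of the top predicted label, one per case.
    correct:     1 if that top label was right, else 0.
    Returns a list of (upper_confidence_bound, calibrated_probability) steps.
    """
    blocks = []  # each block is [sum_of_correct, count, max_confidence]
    for conf, hit in sorted(zip(confidences, correct)):
        blocks.append([float(hit), 1, conf])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = c
    return [(c, s / n) for s, n, c in blocks]

def calibrate(model, confidence):
    """Look up the calibrated probability for a raw confidence (no interpolation)."""
    for upper, prob in model:
        if confidence <= upper:
            return prob
    return model[-1][1]

# Toy data (made up for illustration, not taken from the post).
raw = [0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
hits = [0, 1, 0, 1, 1, 1]
model = fit_isotonic(raw, hits)
```

With this toy data a raw confidence of 0.75 maps to 0.5, because the cases at 0.7 and 0.8 are pooled into one step with one hit out of two.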
Two Leaps to 1000 Tokens/s on a 1T-Parameter Model
(tilert.ai)
AI
TileRT argues that reaching 1000+ tokens per second on a large (up to 1T-parameter) model requires a shift from kernel/operator-level tuning to a persistent, continuously running execution engine that removes microsecond “execution gaps,” plus hardware–model co-design to eliminate microsecond-scale overheads in components like RMSNorm, RoPE, KV-cache writes, and multi-token prediction.
Siri AI
(apple.com)
AI
Apple’s Apple Intelligence update introduces “Siri AI,” a more capable Siri assistant available in English later this year, with features like richer conversational answers, actions across apps, and a new dedicated Siri app. The update also expands “Visual Intelligence” for tasks such as visual search and photo/video-related actions, and adds AI photo editing tools like Spatial Reframing, Extend, and Clean Up, alongside broader communication and productivity improvements. Apple emphasizes on-device processing and “Private Cloud Compute” to support privacy.
"Chat is dead": OpenAI preps overhaul of ChatGPT
(arstechnica.com)
AI
OpenAI is preparing a major overhaul of ChatGPT, aiming to reposition the chatbot as part of a “superapp” centered on AI agents and higher-margin products like coding tools (including Codex) ahead of a planned IPO and as it competes more directly with Anthropic.
xAI is looking more like a datacentre REIT than a frontier lab
(martinalderson.com)
AI
The article argues that xAI’s partnerships with Anthropic and Google, which provide large amounts of datacentre capacity via SpaceX-linked channels, make it look more like a datacentre “REIT” with a frontier lab than a traditional frontier AI lab, citing massive monthly fee figures and cancellation clauses. It notes that the deals alleviate capacity crunches Anthropic had been facing, while raising red flags such as possible financial or strategic motives tied to competition and SpaceX’s upcoming IPO. It also suggests that xAI’s speed in datacentre buildout could be a key competitive advantage.
Terry Tao Became an Evangelist for AI in Math
(quantamagazine.org)
AI
Quanta Magazine profiles mathematician Terry Tao’s evolution from embracing large-scale public collaboration (including the Polymath Projects) to advocating computer-assisted proof checking, arguing that automated systems like Lean could eventually let proofs be assembled from smaller verified chunks rather than relying on human referees.
AI Is Slowing Down
(wheresyoured.at)
AI
Ed Zitron argues that the AI industry is “slowing down” because compute and data-center buildouts now require far more revenue growth than OpenAI, Anthropic, and others are projected to generate, potentially forcing large new funding rounds or unsustainable assumptions about demand through 2030.
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
(mimo.xiaomi.com)
AI
Xiaomi has released MiMo-V2.5-Pro-UltraSpeed, a 1-trillion-parameter model claiming up to ~1000 tokens/second decode speed via collaboration with TileRT, using FP4 quantization and DFlash speculative decoding on commodity GPUs. The associated API is offered at a limited-time promotional price from June 9–23, 2026 (application-based access), with trial chat access during the same window.
The EU Open Source Strategy
(digital-strategy.ec.europa.eu)
AI
The European Commission’s EU Open Source Strategy aims to strengthen Europe’s technological sovereignty by promoting European open alternatives to non-EU proprietary software and by supporting the development, scaling, deployment and long-term sustainability of open source across public and private sectors. It addresses challenges such as limited long-term funding, difficulty maintaining projects, fragmented visibility, and dependence on dominant non-EU providers, with actions spanning procurement and public administration adoption, open source business models, standards and international outreach, and maintenance/security measures for critical components.
LLMs and performative productivity
(joshcollinsworth.com)
AI
The article argues that while LLM agents can make developers feel faster and more capable—especially on boilerplate or low-complexity greenfield tasks—the author’s own experience and cited studies suggest these gains are often situational, may trade away skills and code quality, and can come at the cost of deeper understanding and long-term productivity.
Is This the Dawn of the Tokenpocalypse?
(techcrunch.com)
AI
TechCrunch’s Equity podcast discusses Microsoft’s shift to charging more for GitHub Copilot based on tokens, framing it as a potential “Tokenpocalypse” where rising AI costs force usage limits and change business models as AI labs prepare for IPOs and investors pressure profitability.
Playing with Vision Embeddings
(prestonbjensen.com)
AI
The post explores how DINOv3 vision transformer embeddings (single 384-number vectors) encode image information. It generates images via gradient optimization, then uses a sparse autoencoder to learn thousands of more interpretable “feature directions” that decompose or recombine embeddings. Examples include identifying features for scenes like the Golden Gate Bridge, demonstrating feature superposition, and showing how adding or interpolating features blends or juxtaposes visual concepts.
Tiny hackable CUDA language model implementation
(github.com)
AI
The GitHub repository “markusheimerl/gpt” describes a tiny, hackable CUDA-oriented generative transformer that models data as 8-bit byte tokens and predicts the next byte using causal self-attention, feed-forward layers, and cross-entropy loss. It outlines the model’s byte embedding, rotary positional encoding, use of AdamW and OpenBLAS for efficient matrix operations, and provides instructions and sample outputs from running an inference command.
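The byte-level setup described above can be illustrated without any of the repository's code. The sketch below is an assumption-laden toy in Python, not the project's CUDA implementation: it only shows how text becomes 8-bit tokens, how next-byte training pairs are formed, and how cross-entropy is computed over 256 logits. All function names are hypothetical.

```python
import math

def byte_tokens(text):
    """Encode text as a list of integers in 0..255 (its UTF-8 bytes)."""
    return list(text.encode("utf-8"))

def next_byte_pairs(tokens):
    """Pair each byte with the byte that follows it: (input, target)."""
    return list(zip(tokens[:-1], tokens[1:]))

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

tokens = byte_tokens("hi!")
pairs = next_byte_pairs(tokens)

# A model that knows nothing assigns equal logits to all 256 bytes,
# so its loss on any target is ln(256).
uniform_loss = cross_entropy([0.0] * 256, pairs[0][1])
```

The uniform-logit case gives a useful baseline: a trained next-byte model should reach a loss well below ln(256), roughly 5.55 nats.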
90210 – running the show without property tax
(github.com)
AI
A GitHub project called “90210” describes a production-grade local app that turns a screenplay into a finished short film with synchronized video, audio, dialogue, music, and subtitles, using services such as Google Veo, Gemini, and ElevenLabs. It outlines a FastAPI/Next.js architecture, setup steps, and an “oracle” system that uses multiple ML-based quality and story metrics to auto re-roll and adjust processing tiers, with a noted estimate of about $20 for a 2-minute movie.
KNN early termination in Manticore Search
(manticoresearch.com)
AI
Manticore Search’s blog explains how “early termination” for HNSW-based KNN vector search detects when the result set has converged (using a discovery-rate signal and adaptive quantile thresholds with a patience counter) and stops graph traversal before the exploration budget is exhausted. Benchmarks on a 1M-vector dataset report substantial reductions in distance computations at large k (e.g., ~65% of work at k=60, ~30% at k=1000, ~20% at k=10000) while keeping precision loss within ~2–4% and improving latency further under concurrent load. Early termination is enabled by default, disabled automatically for small k (<=10), and can be turned off in queries when maximum recall or deterministic benchmarking is needed.
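The convergence check described above can be sketched in simplified form. This is not Manticore's implementation: it scans a flat stream of candidate distances rather than traversing an HNSW graph, and it uses a fixed `min_rate` threshold where the blog describes adaptive quantile thresholds. The function name and all parameters (`window`, `min_rate`, `patience`) are hypothetical.

```python
import heapq

def search_with_early_termination(distances, k, window=32, min_rate=0.05, patience=3):
    """Scan candidate distances in visit order; stop once the top-k stops improving.

    Returns (sorted top-k distances, number of candidates evaluated).
    """
    heap = []          # max-heap via negated distances; holds the current top-k
    discoveries = 0    # improvements to the top-k within the current window
    quiet_windows = 0  # consecutive windows with a low discovery rate
    visited = 0
    for d in distances:
        visited += 1
        if len(heap) < k:
            heapq.heappush(heap, -d)
            discoveries += 1
        elif d < -heap[0]:
            heapq.heapreplace(heap, -d)
            discoveries += 1
        if visited % window == 0:
            rate = discoveries / window
            quiet_windows = quiet_windows + 1 if rate < min_rate else 0
            discoveries = 0
            if quiet_windows >= patience:
                break  # result set looks converged; skip the remaining budget
    return sorted(-x for x in heap), visited

# Ten close candidates followed by a long tail of distant ones.
stream = [float(i) for i in range(10)] + [100.0] * 1000
top, evaluated = search_with_early_termination(
    stream, k=5, window=10, min_rate=0.1, patience=2
)
```

In this toy run the scan stops after 30 of 1010 candidates, since two consecutive windows produce no improvement. The patience counter is what keeps a single quiet window from ending the search prematurely.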
Show HN: Nightwatch, The open-source, read-only AI SRE
(github.com)
AI
Nightwatch, an open-source project from ninoxAI, presents a “read-only AI SRE” layer that turns alert storms into grouped incidents, finds noisy checks, and uses a tool-calling AI agent to investigate root cause using evidence from live systems without executing changes. The system is designed to be monitoring-agnostic and local-first, sitting above tools like Prometheus, Checkmk, Icinga2, Zabbix, Grafana, Docker/Kubernetes, AWS, and GitHub, and it proposes ranked, human-gated fixes rather than auto-applying them.
VibeOS: First ever AI-native operating system
(vibeos.sh)
AI
VibeOS is presented as an “AI-native” operating system that uses an Anthropic Claude-powered agent (Claude Code) to control the computer from prompts, enabling instant creation of apps and tools like live-edit NextJS UI, MCP-based utilities, browser handoff to the AI agent, and AI-curated news feeds, with a Dockerized option aimed at privacy by not giving the agent hardware access by default.
What Are Tokens in LLMs?
(bearisland.dev)
AI
The article explains that in LLMs, text is converted into model-specific integer token IDs rather than raw characters or words, using tokenizers built from algorithms like (byte-level) BPE. It walks through how BPE incrementally builds a vocabulary by repeatedly merging frequent adjacent pairs, including an example showing how words like “cat” can become single tokens. It also clarifies the “strawberry” effect: models may split the same word differently because their vocabularies differ. Finally, it notes that byte-level tokenization avoids out-of-vocabulary characters by starting from UTF-8 bytes.
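The merge loop described above can be sketched as follows. This is a minimal character-level toy, not the article's code: a byte-level tokenizer would start from UTF-8 bytes instead of characters, and real tokenizers add pre-tokenization and special tokens. The corpus and the function names are made up for illustration.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; return (merge list, final segmented words)."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

merges, words = train_bpe("cat cat cat cats hat", num_merges=2)
```

On this toy corpus the first merge joins "a" and "t", and the second joins "c" and "at", so "cat" ends up as a single symbol while "hat" is still split into "h" and "at".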