Researchers propose PreRL for reinforcement learning in pre-train space.
8/10
The paper introduces PreRL, a method that applies reward-driven online updates directly to the marginal distribution P(y) in the pre-train space, addressing limitations of conventional reinforcement learning with verifiable rewards. The approach is validated both theoretically and empirically, showing strong alignment between the gradients of log P(y) and log P(y|x). The researchers also propose Dual Space RL (DSRL), which leverages Negative Sample Reinforcement (NSR) to prune incorrect reasoning spaces and stimulate reflective behaviors. In extensive experiments, DSRL outperforms strong baselines. The method has implications for enhancing the reasoning abilities of large language models.
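The claimed gradient alignment can be illustrated with a toy softmax model: compute the gradient of the conditional log-probability log P(y|x) and of a stand-in for the marginal log P(y), then measure their cosine similarity. Everything below (the single-layer model, the neutral feature vector standing in for the marginal context, the dimensions) is an illustrative assumption, not PreRL's actual setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy single-layer model: logits = W @ features. The "conditional" gradient
# uses prompt features x; the "marginal" one uses a neutral all-ones vector
# as a crude stand-in for P(y). All choices here are illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))   # 5 tokens, 3 features
x = rng.normal(size=3)        # prompt features
x_bar = np.ones(3)            # stand-in for the marginal context
y = 2                         # observed token

def grad_log_p(W, feats, y):
    """Gradient of log P(y | feats) w.r.t. W for a softmax classifier."""
    p = softmax(W @ feats)
    one_hot = np.zeros(len(p))
    one_hot[y] = 1.0
    return np.outer(one_hot - p, feats)

g_cond = grad_log_p(W, x, y).ravel()
g_marg = grad_log_p(W, x_bar, y).ravel()
cosine = g_cond @ g_marg / (np.linalg.norm(g_cond) * np.linalg.norm(g_marg))
print(f"gradient alignment (cosine): {cosine:.3f}")
```

A positive cosine on real models would correspond to the alignment the paper reports; the toy here only demonstrates the measurement, not the result.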
Researchers formalize 'vibe-testing' for LLM evaluation.
8/10
Evaluating Large Language Models (LLMs) is challenging due to the limitations of benchmark scores in capturing real-world usefulness. Users often rely on 'vibe-testing', an informal experience-based evaluation method. This work studies vibe-testing practices through surveys and model comparison reports, formalizing it as a two-part process where users personalize what they test and how they judge responses. A proof-of-concept evaluation pipeline is introduced, generating personalized prompts and comparing model outputs using user-aware subjective criteria. Experiments on coding benchmarks show that formalized vibe-testing can change which model is preferred, highlighting its potential for bridging benchmark scores and real-world experience.
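The two-part process (personalizing what is tested, then personalizing how responses are judged) can be sketched as a minimal pipeline. All function names, the keyword-matching judge, and the canned model outputs below are hypothetical stand-ins for the paper's LLM-based components.

```python
def personalize_prompts(base_tasks, user_profile):
    """Part 1: rewrite generic tasks with the user's role and stack."""
    return [f"As a {user_profile['role']}, {t} (use {user_profile['stack']})"
            for t in base_tasks]

def judge(response, criteria):
    """Part 2: score a response against user-chosen subjective criteria.
    A trivial keyword heuristic stands in for an LLM judge here."""
    return sum(1 for c in criteria if c in response.lower())

profile = {"role": "backend engineer", "stack": "Python"}
criteria = ["type hints", "docstring"]  # user-aware criteria

prompts = personalize_prompts(["write a retry decorator"], profile)
outputs = {  # canned model outputs for the sketch
    "model_a": "def retry(): ...  # no docs",
    "model_b": 'def retry() -> None:\n    """Docstring with type hints."""',
}
scores = {m: judge(o, criteria) for m, o in outputs.items()}
preferred = max(scores, key=scores.get)
print(prompts[0])
print(scores, "->", preferred)
```

The point of the sketch is that changing `criteria` or `profile` can flip `preferred`, mirroring the paper's finding that formalized vibe-testing can change which model wins.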
Researchers study rhetorical questions in LLM representations.
8/10
A study on large language models (LLMs) analyzed how rhetorical questions are represented internally. Using linear probes on two social-media datasets, the study found that rhetorical signals emerge early and are captured by last-token representations. Rhetorical questions were found to be linearly separable from information-seeking questions, with detectability reaching an AUROC of 0.7-0.8 under cross-dataset transfer. However, the study also showed that transferability does not imply a shared representation, with probes trained on different datasets producing different rankings. The findings suggest that rhetorical questions in LLM representations are encoded by multiple linear directions emphasizing different cues.
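A minimal version of the probing setup can be sketched with a mean-difference linear probe over synthetic "last-token representations", scoring separability with AUROC computed by pairwise comparison. The clusters, dimensions, and separation below are invented for illustration and do not reproduce the study's numbers.

```python
import numpy as np

# Synthetic stand-in for last-token hidden states: rhetorical (label 1)
# vs information-seeking (label 0) questions as two Gaussian clusters.
rng = np.random.default_rng(0)
d = 16
rhet = rng.normal(loc=0.4, size=(200, d))
info = rng.normal(loc=0.0, size=(200, d))
X = np.vstack([rhet, info])
y = np.array([1] * 200 + [0] * 200)

direction = rhet.mean(0) - info.mean(0)  # one linear probe direction
scores = X @ direction                   # projection = probe score

def auroc(labels, s):
    """AUROC = P(score_pos > score_neg), by exhaustive pairwise comparison."""
    pos, neg = s[labels == 1], s[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

roc = auroc(y, scores)
print(f"AUROC: {roc:.3f}")
```

Training `direction` on one dataset and evaluating `scores` on another would give the cross-dataset transfer number; comparing the per-example rankings of two such directions is how one would see that transferability need not imply a shared representation.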
Researchers propose $π$-Play, a self-evolution framework for multi-agent learning.
8/10
The $π$-Play framework addresses challenges in training deep search agents by leveraging self-play to generate high-quality privileged context for teacher models. This approach transforms conventional sparse-reward self-play into a dense-feedback self-evolution loop, eliminating the need for external data. The examiner generates tasks along with their question construction paths (QCPs), which serve as privileged context for the teacher model to supervise the student via self-distillation. Experiments demonstrate that $π$-Play surpasses fully supervised search agents and improves evolutionary efficiency. This method has potential applications in complex information-seeking tasks.
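The self-evolution loop described above can be sketched as follows. Every function is a placeholder for an LLM agent role (examiner, teacher, student), and the task strings and QCP entries are invented; real π-Play replaces each stub with model calls.

```python
def examiner_generate(i):
    """Examiner: return a task plus its question construction path (QCP)."""
    task = f"task {i}: trace the origin of fact {i}"
    qcp = ["searched source A", "cross-checked source B"]  # privileged context
    return task, qcp

def teacher_answer(task, qcp):
    """Teacher: the QCP is visible here, enabling a dense supervision target."""
    return f"answer via: {' -> '.join(qcp)}"

def distill(student, task, target):
    """Self-distillation step: store the teacher target for the student."""
    student[task] = target
    return student

student = {}
for i in range(3):                      # the self-evolution loop
    task, qcp = examiner_generate(i)
    target = teacher_answer(task, qcp)  # dense feedback, no external data
    student = distill(student, task, target)
print(len(student), "distilled tasks")
```

The structural point is that the examiner's QCP is generated for free as a byproduct of task construction, which is what turns sparse-reward self-play into a dense-feedback loop.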
Datasette 1.0a27 released with CSRF and RenameTableEvent updates
6/10
Datasette version 1.0a27 has been released with two major changes. Firstly, it no longer uses Django-style CSRF form tokens, instead utilizing modern browser headers for CSRF protection. Secondly, a new RenameTableEvent is fired when a table is renamed during a SQLite transaction, which is useful for plugins that attach data to table records by name. Other changes are also included in this alpha release. Datasette is a tool for exploring and publishing data, and these updates improve its security and functionality.
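Header-based CSRF checking of the kind the release notes describe can be sketched generically. This is an assumption-laden illustration using the `Sec-Fetch-Site` and `Origin` headers that modern browsers attach to form posts; it is not Datasette's actual implementation.

```python
# "none" covers user-initiated navigations (e.g. typing the URL);
# "same-origin" covers posts from the site's own pages.
TRUSTED = {"same-origin", "none"}

def is_safe_post(headers, own_origin="https://example.com"):
    """Reject cross-site form posts using browser-set fetch metadata."""
    sec_fetch_site = headers.get("sec-fetch-site")
    if sec_fetch_site is not None:
        return sec_fetch_site in TRUSTED
    # Fallback for clients without fetch metadata: compare the Origin header.
    return headers.get("origin") == own_origin

assert is_safe_post({"sec-fetch-site": "same-origin"})
assert not is_safe_post({"sec-fetch-site": "cross-site"})
assert is_safe_post({"origin": "https://example.com"})
```

Because the browser, not the page, sets these headers, no per-form token needs to be minted or stored, which is the advantage over Django-style CSRF tokens.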
Nathan Lambert has published an article titled 'My bets on open models, mid-2026' on the Interconnects website, laying out his expectations for the future of open models. The article focuses on the gap between open and closed models and the developments he anticipates, drawing on his reading of the current landscape and trends in the field.

Simon Willison created a custom preview UI for the datasette.io news section, which is built from a news.yaml file in the GitHub repository. The YAML file contains news entries with dates and descriptions of changes, such as updates to Datasette 1.0a27. The new UI aims to simplify editing and error checking. The tool was built using standard web technologies and Claude, an AI assistant. This development is related to the maintenance and improvement of the Datasette project, a tool for exploring and publishing data.
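A minimal sketch of rendering such entries into a preview list, assuming a date-plus-body shape for the parsed news.yaml records (the real file's schema and the real UI may differ):

```python
import html

# Hypothetical entries shaped like parsed news.yaml records.
entries = [
    {"date": "2026-06-12", "body": "Datasette 1.0a27 released."},
    {"date": "2026-06-10", "body": "Docs & plugin updates."},
]

def render(entries):
    """Render entries newest-first as an HTML list, escaping the body text."""
    items = []
    for e in sorted(entries, key=lambda e: e["date"], reverse=True):
        items.append(f"<li><strong>{e['date']}</strong> "
                     f"{html.escape(e['body'])}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

out = render(entries)
print(out)
```

Rendering the parsed records immediately is what gives the "preview while editing" loop: a malformed entry fails to parse or renders visibly wrong before it is committed.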
The datasette-export-database plugin has been updated to version 0.3a1. This release addresses a change in Datasette 1.0a27, which no longer sets the ds_csrftoken cookie. The plugin previously used this cookie as part of a custom signed URL. The update ensures compatibility with the latest Datasette version. The change is relevant for users of the Datasette platform who rely on the datasette-export-database plugin.
RedSun is a system exploit that bypasses user access controls on Windows 11, 10, and Server with the April 2026 Update, allowing unauthorized access. The exploit is available on GitHub and has been discussed on Hacker News. Its technical details and implications for Windows security are being examined, and it could prompt a patch from Microsoft.
Amazon's AI content moderation system has been cancelling webcomics over images its algorithms flag as potentially explicit, leading to account suspensions. This affects webcomic creators who rely on Amazon for hosting, and it highlights the difficulty of balancing automated moderation with artistic freedom, raising questions about the role of automation on creative platforms.
Gary Marcus has published a 22-point critique of the current AI market, cataloguing what he sees as its absurdities. The piece comments on the industry's growth and argues for more realistic expectations, touching on the limitations and potential misdirections in AI research and development, and has sparked discussion about the future of AI and its applications.
AI-assisted cognition may impact human development
6/10
A recent post on Heidenstedt.org discusses the potential risks of AI-assisted cognition for human development, exploring how reliance on AI tools could affect human cognitive abilities. The topic is relevant to AI researchers and architects because it raises questions about the long-term effects of AI integration on human intelligence and skills. The discussion has attracted significant attention, with 221 points and 173 comments.
A GitHub issue has been raised against Gastown, alleging that the platform uses users' Large Language Model (LLM) credits without their consent to improve its own performance. The issue has attracted significant attention, with 225 points and 111 comments. The accusation centers on unauthorized use of user credits for model fine-tuning or other purposes, a matter of transparency and ethics around data and resource usage in AI development; its outcome could influence how AI platforms manage user resources and data.
Agent is a native Mac OS X coding IDE and harness, released on GitHub by macOS26. It provides developers with a coding environment that includes code editing and project management, and is notable as a native alternative to existing IDEs on the platform.
A Tesla vehicle equipped with 'Full Self-Driving' (FSD) technology crashed through a railroad gate in Texas, seconds before a train was scheduled to pass. The incident is under investigation. The vehicle's owner reported the incident, which highlights the ongoing challenges in developing autonomous driving systems. The crash raises questions about the reliability of FSD in complex scenarios. Tesla's FSD technology is still in the development stage and has been the subject of regulatory scrutiny.