Researchers study knowledge distillation in LLM pretraining
8/10
A recent study on large language model pretraining examines the assumption that stronger teachers yield better students in knowledge distillation. The researchers varied architecture sizes and training token budgets to create different teacher-student relationships and found that even small and undertrained teachers can improve larger students with proper loss mixing. However, they also observed that a stronger teacher is not always better, as excessive parameters or training tokens can saturate or reverse distillation gains. The study further notes that distillation improves generalization more readily than in-domain fitting. These findings challenge the common belief that distillation pretraining always requires a strong teacher.
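As a rough illustration of what "loss mixing" can mean in this setting, the sketch below blends the usual cross-entropy on the true token with a KL term that pulls the student toward the teacher's distribution. The mixing weight `alpha`, the `temperature` parameter, and all function names are assumptions made for illustration; the study's exact objective is not given in this summary.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Negative log-probability the student assigns to the true token."""
    return -math.log(softmax(student_logits)[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixed_distillation_loss(student_logits, teacher_logits, target_index,
                            alpha=0.5, temperature=1.0):
    """Blend the data loss and the teacher-matching loss.

    alpha = 0 ignores the teacher; alpha = 1 trains on the teacher alone.
    (alpha and temperature are hypothetical knobs, not values from the study.)
    """
    data_loss = cross_entropy(student_logits, target_index)
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    teacher_loss = kl_divergence(teacher_probs, student_probs)
    return (1.0 - alpha) * data_loss + alpha * teacher_loss


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # toy logits over a 3-token vocabulary
    teacher = [1.0, 1.0, 0.0]
    for alpha in (0.0, 0.3, 1.0):
        print(alpha, mixed_distillation_loss(student, teacher, 0, alpha=alpha))
```

Under this formulation, a weak teacher can be given a small `alpha` so that its signal acts as a mild regularizer rather than the main training target, which is one plausible reading of why mixing matters.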
Researchers find hierarchical concept geometry in language models emerges from word co-occurrence
8/10
A new study proposes a distributional theory of how hypernymy is encoded geometrically in language representations. The theory rests on the assumption that words closer together on the WordNet hypernym graph co-occur more often. The researchers characterize the spectrum of the Gram matrix of the resulting word2vec embeddings and prove that its leading eigenvectors produce a hierarchical splitting geometry. The study confirms these predictions in word2vec embeddings and shows that the same signature extends to other models. The results indicate that hierarchical concept geometry in LLMs emerges from the spectral structure of pairwise word statistics.
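To make the idea of a "hierarchical splitting" spectrum concrete, here is a toy sketch, not the paper's construction. It assumes four words on a two-level tree (two branches with two leaves each) and a hypothetical similarity matrix in which self-similarity `a`, same-branch similarity `b`, and cross-branch similarity `c` satisfy a > b > c. The code checks by direct multiplication which split patterns are eigenvectors and what their eigenvalues are.

```python
def similarity_matrix(a, b, c):
    """Toy Gram-like matrix: leaves 0,1 share one parent, leaves 2,3 another."""
    return [
        [a, b, c, c],
        [b, a, c, c],
        [c, c, a, b],
        [c, c, b, a],
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def eigenvalue_if_eigenvector(matrix, vector, tol=1e-9):
    """Return lam if matrix @ vector == lam * vector (within tol), else None."""
    product = matvec(matrix, vector)
    # The Rayleigh quotient gives the only possible eigenvalue for this vector.
    lam = (sum(p * v for p, v in zip(product, vector))
           / sum(v * v for v in vector))
    if all(abs(p - lam * v) <= tol for p, v in zip(product, vector)):
        return lam
    return None


# Hypothetical similarity values chosen only so that a > b > c.
S = similarity_matrix(a=3.0, b=2.0, c=1.0)

split_patterns = {
    "all words together": [1, 1, 1, 1],
    "branch A vs branch B": [1, 1, -1, -1],
    "split inside branch A": [1, -1, 0, 0],
    "split inside branch B": [0, 0, 1, -1],
}

if __name__ == "__main__":
    for name, pattern in split_patterns.items():
        print(name, eigenvalue_if_eigenvector(S, pattern))
```

In this toy case the eigenvalues are ordered by depth in the tree: the uniform direction comes first, the coarse branch-versus-branch split second, and the fine within-branch splits last. That ordering is the kind of coarse-to-fine structure the summary describes, though the actual theory concerns word2vec statistics rather than a hand-built matrix.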
Memory accounts for nearly two-thirds of AI chip component costs
8/10
A recent analysis by Epoch.ai finds that memory now accounts for nearly two-thirds of AI chip component costs. The shift is attributed largely to the growing complexity and data requirements of modern AI models. As AI chips continue to evolve, the cost and efficiency of memory will play a crucial role in determining their overall performance. The findings draw on AI chip industry data and have significant implications for AI chip manufacturers and researchers.
DeepSeek is permanently discounting its flagship AI model by 75%. The decision may affect the accessibility and affordability of AI technology for various users. The move could be a strategic business decision to expand the customer base or to stay competitive in the market. The discount may also reflect changes in production costs or market demand. The flagship model's capabilities and applications are not specified in the report.
Researchers study fragility of LLM agents in code generation
8/10
A recent study on arXiv examines the concept of constraint decay in large language models (LLMs) used for back-end code generation. The researchers found that LLMs can struggle to maintain constraints and generate high-quality code, particularly in complex scenarios. This fragility has significant implications for the reliability and trustworthiness of AI-generated code. The study highlights the need for further research into improving the robustness of LLMs in code generation tasks.
Local LLMs improved with clarifying questions system
8/10
Researchers found that local Large Language Models (LLMs) perform better when taught to ask clarifying questions before providing answers. This approach allows the models to gather more context and provide more accurate responses. The system, which involves prompting the model to ask questions, has shown promising results in improving the overall performance of local LLMs. This development could lead to more efficient and effective use of LLMs in various applications. The study highlights the importance of contextual understanding in language models.
DeepSeek Reasonix is a native coding agent with high caching and low cost, developed by esengine. It is designed to improve coding efficiency and reduce costs. The agent is part of the DeepSeek project, which aims to create AI-powered coding tools. The release of DeepSeek Reasonix has garnered significant attention, with 499 points and 209 comments on the announcement. The technical details of the agent are available on the esengine GitHub page.
Apple has introduced the Perceptual Image Codec, a learned image compression technique. The codec is designed to improve image compression efficiency while maintaining perceptual quality. It is available on Apple's GitHub repository, ml-pico, along with a research paper detailing its development and technical aspects. This codec could potentially be used in various applications, including image and video processing. The release provides insights into Apple's approach to image compression using machine learning.
Datasette 1.0a30 released with customizable 'Jump to...' menu
6/10
Datasette 1.0a30 has been released with a new customizable 'Jump to...' menu. This feature allows users to quickly navigate to specific databases, tables, and debug options. The release also includes a new plugin hook, jump_items_sql(), which enables plugins to add their own items to the search set. The feature can be tried out on latest.datasette.io by hitting the / key. This update is part of the ongoing development of Datasette, an open-source tool for exploring and publishing data.
Datasette-agent 0.1a4 has been released, taking advantage of the new makeJumpSections() JavaScript plugin hook in Datasette 1.0a30. This update allows datasette-agent to present a 'Start a new agent chat' interface in the 'Jump to...' menu when the '/' key is pressed. The feature can be tested by signing into agent.datasette.io. The update integrates datasette-agent with Datasette's interface, enhancing user interaction. This integration is made possible by the recent addition of JavaScript plugin hooks in Datasette.
Datasette-fixtures 0.1a0 is a new plugin that utilizes the datasette.fixtures.populate_fixture_database API introduced in Datasette 1.0a30. This API allows for the creation of fixture database tables used by Datasette's own tests, which can be useful for plugin test suites. The datasette-fixtures plugin can be tried out using uvx without installing Datasette. It provides a helper for creating and populating fixture databases, which can be accessed via a JSON endpoint.
Network allow-lists are insufficient for stopping data exfiltration
6/10
A recent analysis highlights the limitations of network allow-lists in preventing data exfiltration. The study discusses how allow-lists can be bypassed, allowing unauthorized data transfer. This is significant because it underscores the need for more comprehensive data loss prevention (DLP) strategies. The findings are based on an examination of canister egress proxy DLP, emphasizing the importance of layered security measures.
A new theoretical framework models Large Language Model (LLM) training as information transmission over a noisy channel, based on the Shannon-Hartley theorem. This perspective, known as the Shannon Scaling Law, captures the interaction between the learning signal and intrinsic noise in LLMs. The law is validated through experiments on Pythia and OLMo2 models under various perturbations, outperforming classical scaling laws. It accurately predicts performance degradation and loss basins, even for unseen models. The framework provides a unified explanation for non-monotonic phenomena in LLM training, such as catastrophic overtraining and quantization-induced degradation.
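For reference, the classical Shannon-Hartley theorem that the framework builds on bounds the capacity of a noisy channel as shown below. How the framework maps bandwidth, signal, and noise onto training quantities is not spelled out in this summary, so only the standard form is given.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

Here $C$ is the channel capacity in bits per second, $B$ is the bandwidth in hertz, and $S/N$ is the linear (not decibel) signal-to-noise power ratio. Because capacity depends on the ratio of signal to noise, growth in noise can lower it even as signal increases, which is the kind of interaction the summary says the Shannon Scaling Law uses to explain non-monotonic training behavior.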
SkillOpt optimizes agent skills via controllable text-space optimization
9/10
Researchers introduced SkillOpt, a systematic, controllable text-space optimizer for agent skills, which trains skills as the external state of a frozen agent. SkillOpt uses a separate optimizer model to make edits to a single skill document based on scored rollouts and a held-out validation score. The approach is evaluated across six benchmarks, seven target models, and three execution harnesses, outperforming human, one-shot LLM, and other skill optimization methods. SkillOpt shows significant improvements in accuracy, with an average lift of +23.5 points in direct chat and +24.8 points inside the Codex agentic loop. The optimized skill artifacts also retain value when transferred across model scales and execution environments.
Companies are rebranding themselves as tech-focused due to the growing interest in AI. This phenomenon, known as 'AI washing', involves firms emphasizing their use of AI to attract investors and customers. The trend is driven by the perceived value of AI in improving business operations and decision-making. As a result, many companies are now highlighting their AI capabilities, even if they are not a core part of their business. This shift in branding strategy reflects the increasing importance of AI in the business world.