Autoregressive (AR) models have made significant advances in language generation and are increasingly explored for image synthesis. However, scaling AR models to high-resolution images remains a persistent challenge. Unlike text, where relatively few tokens are required, high-resolution images necessitate thousands of tokens, leading to quadratic growth in computational cost. As a result, most AR-based multimodal…

Music AI Sandbox was developed by Adam Roberts, Amy Stuart, Ari Troper, Beat Gfeller, Chris Deaner, Chris Reardon, Colin McArdell, DY Kim, Ethan Manilow, Felix Riedel, George Brower, Hema Manickavasagam, Jeff Chang, Jesse Engel, Michael Chang, Moon Park, Pawel Wluka, Reed Enger, Ross Cairns, Sage Stevens, Tom Jenkins, Tom Hume and Yotam Mann. Additional contributions…

Designing intelligent systems that function reliably in dynamic physical environments remains one of the more difficult frontiers in AI. While significant advances have been made in perception and planning within simulated or controlled contexts, the real world is noisy, unpredictable, and resistant to abstraction. Traditional AI systems often rely on high-level representations detached from their…
You might be wondering whether GenAI is just hype or external noise. I also thought this was hype, and I could sit this one out until the dust cleared. Oh, boy, was I wrong. GenAI has real-world applications. It also generates revenue for companies, so we expect companies to invest heavily in research. Every time…



In recent years, vision-language models (VLMs) have advanced significantly in bridging image, video, and textual modalities. Yet, a persistent limitation remains: the inability to effectively process long-context multimodal data such as high-resolution imagery or extended video sequences. Many existing VLMs are optimized for short-context scenarios and struggle with performance degradation, inefficient memory usage, or loss…

Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first…

Tactile sensing is a crucial modality for intelligent systems to perceive and interact with the physical world. The GelSight sensor and its variants have emerged as influential tactile technologies, providing detailed information about contact surfaces by transforming tactile data into visual images. However, vision-based tactile sensing lacks transferability between sensors due to design and manufacturing…