Skip to content Skip to sidebar Skip to footer

Meta CLIP 2: The First Contrastive Language-Image Pre-training (CLIP) Trained with Worldwide Image-Text Pairs from Scratch

Contrastive Language-Image Pre-training (CLIP) has become important for modern vision and multimodal models, enabling applications such as zero-shot image classification and serving as vision encoders in MLLMs. However, most CLIP variants, including Meta CLIP, are limited to English-only data curation, ignoring a significant amount of non-English content from the worldwide web. Scaling CLIP to include…

Read More

Deep Think is now rolling out

How Deep Think works: extending Gemini’s parallel “thinking time” Just as people tackle complex problems by taking the time to explore different angles, weigh potential solutions, and refine a final answer, Deep Think pushes the frontier of thinking capabilities by using parallel thinking techniques. This approach lets Gemini generate many ideas at once and consider…

Read More

Genie Envisioner: A Unified Video-Generative Platform for Scalable, Instruction-Driven Robotic Manipulation

Embodied AI agents that can perceive, think, and act in the real world mark a key step toward the future of robotics. A central challenge is building scalable, reliable robotic manipulation, the skill of deliberately interacting with and controlling objects through selective contact. While progress spans analytic methods, model-based approaches, and large-scale data-driven learning, most…

Read More

Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

In the domain of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Just released in August 2025 by Alibaba’s Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation to deliver advanced editing capabilities. This model excels in semantic editing (e.g., style transfer and novel view synthesis) and appearance editing…

Read More

Meet DeepFleet: Amazon’s New AI Models Suite that can Predict Future Traffic Patterns for Fleets of Mobile Robots

Amazon has reached a remarkable milestone by deploying its one-millionth robot across global fulfillment and sortation centers, solidifying its position as the world’s largest operator of industrial mobile robotics. This achievement coincides with the launch of DeepFleet, a groundbreaking suite of foundation models designed to enhance coordination among vast fleets of mobile robots. Trained on…

Read More