
AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual Processing

Understanding how the brain builds internal representations of the visual world is one of the most fascinating challenges in neuroscience. Over the past decade, deep learning has reshaped computer vision, producing neural networks that not only perform at human-level accuracy on recognition tasks but also seem to process information in ways that resemble our…

Read More

VLM2Vec-V2: A Unified Computer Vision Framework for Multimodal Embedding Learning Across Images, Videos, and Visual Documents

Embedding models act as bridges between different data modalities by encoding diverse multimodal information into a shared dense representation space. Such models have advanced considerably in recent years, driven by progress in large foundation models. However, existing multimodal embedding models are trained on datasets such as MMEB and M-BEIR, with most focusing only…
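For illustration, here is a minimal sketch of the shared-embedding idea described above: two projection heads map image and text features into one dense space where similarity becomes a plain dot product. The `DualEncoder` module, its dimensions, and the random inputs are hypothetical stand-ins, not VLM2Vec-V2's actual architecture or API.

```python
# Minimal sketch of a shared multimodal embedding space (illustrative only;
# not VLM2Vec-V2's actual architecture or API).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Projects image and text features into one shared embedding space."""
    def __init__(self, img_dim=768, txt_dim=512, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)  # image branch
        self.txt_proj = nn.Linear(txt_dim, embed_dim)  # text branch

    def forward(self, img_feats, txt_feats):
        # L2-normalize so cosine similarity reduces to a dot product
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        txt_emb = F.normalize(self.txt_proj(txt_feats), dim=-1)
        return img_emb, txt_emb

model = DualEncoder()
img_feats = torch.randn(4, 768)   # stand-in for vision-backbone features
txt_feats = torch.randn(4, 512)   # stand-in for text-encoder features
img_emb, txt_emb = model(img_feats, txt_feats)
similarity = img_emb @ txt_emb.T  # 4x4 image-text similarity matrix
print(similarity.shape)
```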

Read More

Meta CLIP 2: The First Contrastive Language-Image Pre-training (CLIP) Trained with Worldwide Image-Text Pairs from Scratch

Contrastive Language-Image Pre-training (CLIP) has become a cornerstone of modern vision and multimodal models, enabling applications such as zero-shot image classification and serving as the vision encoder in MLLMs. However, most CLIP variants, including Meta CLIP, are limited to English-only data curation, ignoring a significant amount of non-English content on the worldwide web. Scaling CLIP to include…
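As a concrete illustration of the zero-shot classification use case mentioned above, the sketch below runs a CLIP-style model through Hugging Face transformers. It uses the original OpenAI CLIP checkpoint as a stand-in; a Meta CLIP 2 checkpoint would presumably load the same way, but that is an assumption rather than something confirmed here.

```python
# Zero-shot image classification with a CLIP-style model via transformers.
from PIL import Image
import requests
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logits mean the caption matches the image better
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```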

Read More

Qwen Team Introduces Qwen-Image-Edit: The Image Editing Version of Qwen-Image with Advanced Capabilities for Semantic and Appearance Editing

In the domain of multimodal AI, instruction-based image editing models are transforming how users interact with visual content. Just released in August 2025 by Alibaba’s Qwen Team, Qwen-Image-Edit builds on the 20B-parameter Qwen-Image foundation to deliver advanced editing capabilities. This model excels in semantic editing (e.g., style transfer and novel view synthesis) and appearance editing…
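As a rough sketch of how instruction-based editing is typically invoked, the snippet below assumes a diffusers-style interface; the `QwenImageEditPipeline` class name, the `Qwen/Qwen-Image-Edit` checkpoint id, and the call arguments are assumptions based on the usual diffusers image-to-image pattern, so the official model card should be treated as the authoritative reference.

```python
# Hedged sketch of instruction-based image editing in a diffusers-style API.
# The pipeline class, checkpoint id, and arguments below are assumptions,
# not confirmed by this post; check the Qwen-Image-Edit model card for usage.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline  # assumed class name

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

source = Image.open("photo.png").convert("RGB")
edited = pipe(
    image=source,
    prompt="Change the car's color to red and keep everything else unchanged",
    num_inference_steps=50,
).images[0]
edited.save("photo_edited.png")
```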

Read More

VL-Cogito: Advancing Multimodal Reasoning with Progressive Curriculum Reinforcement Learning

Multimodal reasoning, where models integrate and interpret information from multiple sources such as text, images, and diagrams, is a frontier challenge in AI. VL-Cogito is a state-of-the-art Multimodal Large Language Model (MLLM) proposed by DAMO Academy (Alibaba Group) and partners, introducing a robust reinforcement learning pipeline that fundamentally upgrades the reasoning skills of large models…
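To make the progressive-curriculum idea concrete, here is a generic sketch of staging training data from easy to hard; the difficulty scores, stage thresholds, and loop are illustrative only and do not reproduce VL-Cogito's actual reinforcement learning pipeline.

```python
# Generic sketch of progressive curriculum ordering for RL-style fine-tuning.
# Difficulty scores and thresholds are made up for illustration.
import random

samples = [{"id": i, "difficulty": random.random()} for i in range(1000)]

def curriculum_stages(data, thresholds=(0.33, 0.66, 1.0)):
    """Yield progressively harder training pools: easy -> medium -> hard."""
    lower = -1.0
    for upper in thresholds:
        yield [s for s in data if lower < s["difficulty"] <= upper]
        lower = upper

for stage, pool in enumerate(curriculum_stages(samples)):
    # A real pipeline would run RL updates (e.g., PPO-style) on each pool
    # before advancing to the next, harder stage.
    print(f"stage {stage}: {len(pool)} samples")
```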

Read More

NASA Releases Galileo: The Open-Source Multimodal Model Advancing Earth Observation and Remote Sensing

Galileo is an open-source, highly multimodal foundation model developed to process, analyze, and understand diverse Earth observation (EO) data streams (including optical, radar, elevation, climate, and auxiliary maps) at scale. Galileo was developed with support from researchers at McGill University, NASA Harvest, Ai2, Carleton University, University of British Columbia, Vector Institute, and Arizona State University.…

Read More

Apple Researchers Introduce FastVLM: Achieving State-of-the-Art Resolution-Latency-Accuracy Trade-off in Vision Language Models

Vision Language Models (VLMs) combine text inputs with visual understanding, and image resolution is crucial to VLM performance when processing text- and chart-rich data. However, increasing image resolution creates significant challenges. First, pretrained vision encoders often struggle with high-resolution images because pretraining at such resolutions is inefficient. Second, running inference on high-resolution images increases computational cost and latency…
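A back-of-the-envelope calculation makes the resolution cost concrete: for a plain ViT-style encoder with 16x16 patches (an assumption used here for illustration, not FastVLM's hybrid encoder), the number of visual tokens grows quadratically with image side length, and self-attention cost grows roughly with the square of the token count.

```python
# Illustration of why high resolution is expensive for a standard ViT-style
# vision encoder (16x16 patches assumed); FastVLM's hybrid encoder is designed
# to reduce exactly this token blow-up. Numbers here are generic.
def vit_tokens(height, width, patch=16):
    """Number of visual tokens a plain ViT produces for one image."""
    return (height // patch) * (width // patch)

for side in (224, 448, 896, 1344):
    tokens = vit_tokens(side, side)
    # Self-attention cost scales roughly with tokens**2
    print(f"{side}x{side}: {tokens:5d} tokens, ~{tokens**2:,} attention pairs")
```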

Read More

RoboBrain 2.0: The Next-Generation Vision-Language Model Unifying Embodied AI for Advanced Robotics

Advancements in artificial intelligence are rapidly closing the gap between digital reasoning and real-world interaction. At the forefront of this progress is embodied AI—the field focused on enabling robots to perceive, reason, and act effectively in physical environments. As industries look to automate complex spatial and temporal tasks—from household assistance to logistics—having AI systems that…

Read More

This AI Paper from Alibaba Introduces Lumos-1: A Unified Autoregressive Video Generator Leveraging MM-RoPE and AR-DF for Efficient Spatiotemporal Modeling

Autoregressive video generation is a rapidly evolving research domain. It focuses on synthesizing videos frame by frame using learned patterns of both spatial arrangement and temporal dynamics. Unlike traditional video creation methods, which may rely on pre-built frames or handcrafted transitions, autoregressive models aim to generate content dynamically based on prior tokens. This approach is…
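The sketch below illustrates the frame-by-frame, token-by-token generation loop in its most generic form; the vocabulary size, frame counts, and the random stand-in for the transformer are made up and do not reflect Lumos-1's MM-RoPE or AR-DF components.

```python
# Toy sketch of autoregressive spatiotemporal token generation: each new token
# is sampled conditioned on all previously generated tokens. Generic idea only;
# Lumos-1's MM-RoPE and AR-DF details are not modeled here.
import torch

vocab_size, frames, tokens_per_frame = 1024, 4, 16

def next_token_logits(prefix):
    """Stand-in for a transformer forward pass over the token prefix."""
    return torch.randn(vocab_size)  # a real model would condition on `prefix`

video_tokens = []
for t in range(frames):
    for _ in range(tokens_per_frame):
        logits = next_token_logits(video_tokens)
        token = torch.distributions.Categorical(logits=logits).sample()
        video_tokens.append(int(token))
    print(f"frame {t}: {tokens_per_frame} tokens generated, total {len(video_tokens)}")
# `video_tokens` would then be decoded back to pixels by a video tokenizer/decoder.
```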

Read More