Newsletter image

Subscribe to the Newsletter

Join 10k+ people to get notified about new posts, news and tips.

Do not worry we don't spam!

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Search

GDPR Compliance

We use cookies to ensure you get the best experience on our website. By continuing to use our site, you accept our use of cookies, Privacy Policy, and Terms of Service.

Convergence AI - Vision-Language

Proxy Lite-3B

Explore Proxy Lite-3B, a 3B-parameter open-source Vision-Language Model for efficient web automation. Achieve a 72.4% success rate with ease!
2025-02-25
Updated 2025-03-13 09:23:36

Key Points

- Proxy Lite-3B is a 3-billion-parameter open-source Vision-Language Model (VLM) for web automation, released by Convergence AI on February 25, 2025, and available on Hugging Face.

- It seems likely that it performs well in UI navigation, achieving a 72.4% success rate on the WebVoyager benchmark, leading among open-weights models.

- Research suggests it’s efficient, using fewer resources, and is finetuned from Qwen/Qwen2.5-VL-3B-Instruct, making it accessible for developers.

Comprehensive Analysis of Proxy Lite-3B: A Deep Dive into Its Capabilities and Implications

What is Proxy Lite-3B?

Proxy Lite-3B is a compact AI model designed for web automation tasks, such as filling forms or clicking buttons on websites. It’s open-source, meaning anyone can use, modify, or distribute it, which is great for developers looking to build AI applications without heavy computational needs.

Why It Matters

This model stands out because it’s small yet powerful, with a 72.4% success rate on the WebVoyager benchmark, which tests real-world web interactions. It’s also efficient, using fewer resources than larger models, making it practical for many users. An unexpected detail is its framework for VLM-browser interaction, using tools like Playwright for precise web navigation, which could open new ways to automate daily tasks.

How to Use It

You can host Proxy Lite-3B locally using vLLM, with instructions available on its GitHub page (proxy-lite). It’s recommended for production use to avoid the slower demo endpoint on Hugging Face (proxy-lite-3b).

Introduction

On February 25, 2025, Convergence AI released Proxy Lite-3B, a 3-billion-parameter Vision-Language Model (VLM) tailored for web automation tasks, marking a significant milestone in open-source AI development. Hosted on Hugging Face (proxy-lite-3b), this model is designed to navigate and interact with web interfaces, offering a compact yet efficient solution for developers and AI enthusiasts. This article explores Proxy Lite-3B’s architecture, performance, applications, and potential, providing a thorough examination for those interested in its technical and practical implications.

Background and Release

Proxy Lite-3B was announced by Convergence AI on their website (proxy_lite), emphasizing its role as a mini, open-weights version of their Proxy assistant. The release aligns with the growing trend of democratizing AI through open-source models, enabling broader access to advanced web automation capabilities. Its availability on Hugging Face (proxy-lite-3b) ensures easy access for the global developer community, with a GitHub repository (proxy-lite) providing additional resources and setup instructions.

Model Architecture and Technical Details

Proxy Lite-3B is finetuned from Qwen/Qwen2.5-VL-3B-Instruct, a 3B parameter VLM known for processing both visual and textual inputs. This base model allows Proxy Lite-3B to interpret web page visuals and text, enabling tasks like clicking buttons or filling forms. With a model size of 3.75B parameters and using BF16 tensor type, it’s designed for efficiency, requiring less computational power compared to larger models. The model’s architecture includes a framework for VLM-browser interaction, leveraging the `Runner` class and `BrowserTool` class for precise web navigation, as detailed in its GitHub repository (proxy-lite). This framework uses Playwright for browser control, with actions defined by `mark_id`s for interacting with web elements.

Performance on WebVoyager Benchmark

The WebVoyager benchmark, introduced by He et al. in their paper (WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models), evaluates AI agents on real-world web tasks across 15 popular websites. Proxy Lite-3B achieved a 72.4% success rate, as noted on its Hugging Face page (proxy-lite-3b), positioning it as the leader among open-weights models. Detailed performance metrics, provided by Convergence AI (proxy_lite), show varying success rates across websites, as seen in the table below:

Website Name Success Rate (%) Finish Rate (%) Avg. Messages
Allrecipes 87.8 95.1 10.3
Amazon 70.0 95.0 7.1
Apple 82.1 89.7 10.7
ArXiv 60.5 79.1 16.0
BBC News 69.4 77.8 15.9
Booking 70.0 85.0 24.8
Cambridge Dictionary 86.0 97.7 5.7
Coursera 82.5 97.5 4.7
ESPN 53.8 97.5 14.9
GitHub 85.0 92.5 10.0
Google Flights 38.5 51.3 34.8
Google Map 78.9 94.7 9.6
Google Search 71.4 92.9 6.0
Huggingface 68.6 74.3 18.4
Wolfram Alpha 78.3 93.5 6.1

These results, with full trajectories available at (eval trajectories), underscore its capability to handle diverse web tasks, though performance varies by website complexity.

Efficiency and Resource Use

One of Proxy Lite-3B’s standout features is its efficiency. With only 3B parameters, it uses a fraction of the computational resources required by larger models, making it suitable for deployment on devices with limited hardware. The GitHub repository (proxy-lite) recommends hosting it locally using vLLM, with a command like `vllm serve --model convergence-ai/proxy-lite-3b --trust-remote-code --enable-auto-tool-choice --tool-call-parser hermes --port 8008`, ensuring optimal performance. This efficiency is particularly beneficial for developers looking to integrate AI into applications without significant infrastructure costs.

Applications and Use Cases

Proxy Lite-3B’s primary application is web automation, enabling tasks such as:

- Automating repetitive web tasks, like form filling or scheduling appointments.

- Web scraping and data collection, offering a more intelligent approach compared to traditional methods.

- Enhancing AI assistants to interact directly with web content, expanding their functionality beyond text-based interactions.

Its open-source nature allows developers to fine-tune it for specific needs, potentially extending its use to custom web automation workflows. For example, it can assist in automating customer support tasks on e-commerce websites or streamline data entry processes.

Comparison with Other Models

Compared to other open-source models, Proxy Lite-3B’s 72.4% success rate on WebVoyager positions it as a leader among its peers, especially given its small size. Proprietary models, like those from Anthropic, may achieve higher rates, but they often require significant resources. Browser Use, another open-source agent mentioned in recent discussions (Browser Use AI model), claims a 89.1% success rate, but it’s a broader agent framework, not directly comparable to Proxy Lite-3B’s VLM focus. This comparison highlights Proxy Lite-3B’s niche strength in efficient, open-source web automation.

Getting Started and Community Engagement

To get started with Proxy Lite-3B, users can clone the GitHub repository (proxy-lite) and follow the setup instructions. The repository includes a demo API endpoint (demo-api) for initial testing, though it’s noted as unsuitable for production due to potential slowness under load. For production use, hosting locally with vLLM is recommended, with detailed commands provided. The community can engage through Convergence AI’s resources, though specific forums or Discord channels weren’t detailed in the release, suggesting users check the GitHub for contribution guidelines.

Limitations and Challenges

While Proxy Lite-3B offers significant advantages, it faces challenges such as anti-bot measures on websites, mitigated by using `playwright_stealth` and network proxies, especially in headless mode. Some tasks, like those on Google Flights, show lower success rates (38.5%), indicating areas for improvement in handling complex dynamic content.

Conclusion

Proxy Lite-3B represents a pivotal advancement in open-source AI for web automation, offering a balance of performance, efficiency, and accessibility. Its 72.4% success rate on the WebVoyager benchmark, coupled with its compact size, makes it a valuable tool for developers and researchers. As the AI community continues to build upon this model, we can anticipate further innovations in web interaction and automation, potentially transforming how we engage with digital interfaces.

Key Citations

- proxy-lite-3b Convergence AI Hugging Face model page

- Announcing Proxy Lite open-weights version Convergence AI website

- proxy-lite GitHub repository for setup and details

- WebVoyager Building an End-to-End Web Agent with Large Multimodal Models research paper

- Browser Use open-source AI agent for web automation InfoWorld article

- eval trajectories for Proxy Lite performance on WebVoyager

- demo-api Hugging Face spaces for Proxy Lite testing

Prev Article
Baichuan-M1
Next Article
Wan 2.1

Related to this topic:

No related pages found.