77 comments
aubanel · 12 days ago
Hi all! Aymeric (m-ric) here, maintainer of smolagents and part of the team who built this. Happy to see so many interested people here!

A few points:

- Open Deep Research is not a production app, but it could easily be productionized (it would need to be faster and have a good UX).

- As the GAIA score of 55% (not 54%, that would be lame) says, it's not far from the Deep Research score of 67%. It's also not there yet: I think the main area for progress is web browsing. We're working on integrating vision models (for now we've used a text browser developed by the Microsoft autogen team, congrats to them) because it's probably the best way to really interact with webpages.

- Open Deep Research is built on smolagents, a library that we're building, whose core idea is having agents write their actions (tool calls) as code snippets instead of the impractical JSON blobs + parsing that everyone, including OpenAI and Anthropic, uses for their agentic/tool-calling APIs. Don't hesitate to go try out the lib and drop issues/PRs!

- smolagents does code execution, which means "danger for your machine" if run locally. We've guardrailed that a bit with our custom Python interpreter, but it will never be 100% safe, so we're enabling remote execution with E2B and soon Docker.
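
To make the code-vs-JSON point above concrete, here is a minimal sketch of the two action styles. It is not the smolagents API: the tools `search` and `word_count` and the helpers `run_json_action` and `run_code_action` are hypothetical names invented for illustration, and the bare `exec` call is not a sandbox.

```python
import json

# Hypothetical tools for illustration only; they are not part of smolagents.
def search(query: str) -> str:
    return f"results for {query}"

def word_count(text: str) -> int:
    return len(text.split())

TOOLS = {"search": search, "word_count": word_count}

def run_json_action(blob: str):
    """JSON style: one tool call per action, parsed and dispatched by name."""
    call = json.loads(blob)
    return TOOLS[call["name"]](**call["arguments"])

def run_code_action(snippet: str):
    """Code style: one snippet may compose several tools in a single action.

    NOTE: plain exec() is NOT a sandbox; this only shows the control flow.
    """
    namespace = {"__builtins__": {}, **TOOLS}
    exec(snippet, namespace)
    return namespace.get("result")

# JSON action: a single call; chaining needs another model round trip.
json_result = run_json_action('{"name": "search", "arguments": {"query": "GAIA"}}')

# Code action: the output of one tool feeds the next within the same step.
code_result = run_code_action('result = word_count(search("GAIA"))')
```

With the JSON style, feeding the search output into `word_count` would take a second action; with the code style, the model can express that composition in one snippet.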

transpute · 12 days ago
https://techcrunch.com/2025/02/04/hugging-face-researchers-a...

> On GAIA, a benchmark for general AI assistants, Open Deep Research achieves a score of 54%. That’s compared with OpenAI deep research’s score of 67.36%. [...] Worth noting is that there are a number of OpenAI deep research “reproductions” on the web, some of which rely on open models and tooling. The crucial component they — and Open Deep Research — lack is o3, the model underpinning deep research.

Blog post, https://huggingface.co/blog/open-deep-research

flaviuspopan · 12 days ago

tkellogg · 12 days ago
it's just an example, but it's great to see smolagents in practice. I wonder how well the import whitelist approach works for code interpreter security.
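
For readers unfamiliar with the idea, an import allowlist check could look roughly like the sketch below. This is an assumption about the general technique, not the actual smolagents interpreter: `ALLOWED_MODULES` and `disallowed_imports` are made-up names, and the check is purely static.

```python
import ast

# Assumed allowlist for illustration; a real deployment would choose its own.
ALLOWED_MODULES = {"math", "json", "re"}

def disallowed_imports(source: str, allowed=ALLOWED_MODULES) -> list[str]:
    """Return top-level module names imported by `source` that are not allowed."""
    blocked = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (module is None) are treated as blocked.
            names = [node.module or "<relative>"]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                blocked.append(root)
    return blocked

snippet = "import math\nimport os\nfrom subprocess import run\n"
blocked = disallowed_imports(snippet)

# A purely static check only sees import statements, so a dynamic import
# written as a function call is not reported.
missed = disallowed_imports('__import__("os")')
```

The last line hints at why an allowlist alone is probably not enough: anything that reaches a module without an `import` statement slips past a check like this, so it would need to be paired with restricted builtins or remote execution.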
