LLMs Fail More than Half Real-Life MCP Tests: Salesforce Research

A few weeks ago, Salesforce Research released a tool to test how well AI agents perform when using Model Context Protocol (MCP) servers. They’ve now published results from their MCP-Universe benchmark, which show that leading LLMs, including ChatGPT, Google Gemini, Grok and Claude, all fail more than half the time at real-life tasks such as location navigation, browser automation, and web search. Reality can be so annoying.

One of the trends we’re following is service vendors offering their own technology. The latest example is Shopify design agency Domaine, which just released an “AI Commerce Suite” that uses large language models to automate copy, imagery and video creation; real-time adaptive SEO and geo-targeting; and on-site search, recommendations, and dynamic content. That would have been impressive not so long ago; today, it’s almost commonplace.

CDPI Newsletter

The US Director of National Intelligence, Tulsi Gabbard, said the UK will no longer require Apple to provide backdoor access to encrypted data, as it had mandated earlier this year. In response to that mandate, Apple had withdrawn its encryption feature for new users in the UK and disabled it for existing users. Neither Apple nor the UK government has commented.

CDPI Privacy Newsletter

CDP Zeta Global has unveiled Athena, a “superintelligent” voice-activated interface that delivers instant answers, smarter decisions, and agentic actions throughout the Zeta platform. Athena is designed to activate a suite of agentic apps that adapt to each user’s specific goals and style. Early access to Athena will begin later this year, with the agentic apps to follow in the first half of next year.
