
A benchmark for conversational proactivity in LLMs — noticing and acting on what the user implied but never said. 198 curated dialogues, 624 trigger points, 16 models, and a leaderboard where Recovery proves dramatically hard.


Since the release of Higgs-Llama-v2, we have received a great deal of positive feedback from the community. While we are amazed by the creative ways the community has put our model to use, we also recognize the importance of an automated benchmark for effectively evaluating the roleplaying capabilities of large language models (LLMs).

At Boson AI, we are working on intelligent agents that can serve as human companions and helpers. Today we are excited to share Higgs-Llama-3-70B-v2, a new model that significantly improves upon its predecessor. It narrows the gap to the very best proprietary models on benchmarks relevant to dialog interaction and understanding.

Since founding Boson AI in 2023, we have dedicated ourselves to empowering enterprises with AI technologies, with a mission to transform how stories are told, knowledge is learned, and insights are gathered. We have helped customers build intelligent agents that interact with their users by playing various roles, including game characters, language tutors, insurance agents, and financial advisors.