AI Startup Arthur Launches Open-Source Tool to Evaluate Performance of Language Models

San Francisco-based AI startup Arthur has unveiled Arthur Bench, an open-source tool for evaluating the performance of large language models (LLMs) such as OpenAI’s GPT-3.5 Turbo and Meta’s Llama 2. The tool lets companies assess how different language models perform on their own use cases and provides metrics for comparing models on accuracy, readability, hedging, and other criteria.

Hedging matters in particular for companies that use LLMs routinely. It refers to instances where an LLM pads its answer with extraneous language summarizing its terms of service or programming constraints, which is often irrelevant to the response the user actually wants. Arthur Bench aims to highlight subtle behavioral differences like this one, which may matter more or less depending on the application.
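To make the idea of hedging concrete, here is a minimal Python sketch of a naive phrase-based check. It is an illustration only and does not represent how Arthur Bench scores hedging; the phrase list and the function names `hedging_sentences` and `hedging_rate` are assumptions made for this example.

```python
# Hypothetical phrase list for illustration; not taken from Arthur Bench.
HEDGE_PHRASES = (
    "as an ai language model",
    "i cannot provide",
    "i'm not able to",
    "my programming",
    "terms of service",
)


def hedging_sentences(response: str) -> list[str]:
    """Return the sentences in a response that contain a hedge phrase."""
    # Crude sentence split: treat '.', '!' and '?' as sentence boundaries.
    normalized = response.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    return [s for s in sentences if any(p in s.lower() for p in HEDGE_PHRASES)]


def hedging_rate(responses: list[str]) -> float:
    """Fraction of responses that contain at least one hedging sentence."""
    if not responses:
        return 0.0
    hedged = sum(1 for r in responses if hedging_sentences(r))
    return hedged / len(responses)
```

A keyword list like this will miss paraphrased hedges and can flag legitimate uses of the same words, so it is best read as a first-pass filter rather than a real metric.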

Arthur Bench ships with starter criteria for comparing LLM performance, and enterprises can add their own criteria to suit their needs. For example, a company can take its users’ last 100 questions and run them against every model; Arthur Bench then highlights the questions where the answers differed significantly so that they can be reviewed manually.
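The workflow of running the same questions against several models and surfacing the ones that diverge can be sketched in a few lines of Python. This is not Arthur Bench’s API: the function names, the `results` layout (question, then model name, then answer), and the 0.6 threshold are all assumptions for illustration, and the similarity measure is the standard library’s character-level `difflib.SequenceMatcher` rather than anything the tool is known to use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def min_pairwise_similarity(answers: dict[str, str]) -> float:
    """Lowest similarity ratio (0.0 to 1.0) between any two models' answers."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 1.0  # Zero or one answer: nothing to disagree with.
    return min(
        SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs
    )


def flag_for_review(
    results: dict[str, dict[str, str]], threshold: float = 0.6
) -> list[str]:
    """Return the questions whose answers diverge enough to merit manual review."""
    return [
        question
        for question, answers in results.items()
        if min_pairwise_similarity(answers) < threshold
    ]
```

Character-level similarity is a blunt instrument, since two answers can be worded differently and still agree. A real evaluation would more likely rely on semantic or task-specific scores, with a check like this only narrowing down what a person needs to read.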

The primary objective of Arthur Bench is to help businesses make informed decisions when adopting AI. The tool streamlines benchmarking and translates academic measures into terms that matter to a business. It grades the responses of the candidate LLMs side by side, combining statistical measures and scores with grading input from other LLMs.

Arthur Bench has already attracted users in several industries. Financial services firms are using the tool to generate investment theses and speed up analysis. Vehicle manufacturers are using it to build LLMs that answer customer queries accurately and promptly while reducing hallucinations. In addition, enterprise media and publishing platform Axios HQ is using the tool to establish an internal framework for evaluating LLMs and for describing their performance to its product team.

Arthur is making Bench available as an open-source tool, so anyone can use it and contribute to it for free. The startup says it believes an open-source approach builds the best products, while still leaving room for monetization through team dashboards. Arthur has also announced a hackathon with Amazon Web Services (AWS) and Cohere to encourage developers to build new metrics for Arthur Bench. AWS’s Bedrock environment helps users select and deploy various LLMs, and its alignment with Arthur Bench is expected to extend the tool’s capabilities and reach.

This marks a continuation of Arthur’s efforts in the AI space, following the launch of Arthur Shield earlier this year. Arthur Shield focuses on monitoring large language models for hallucinations and other potential issues. With the introduction of Arthur Bench, the startup aims to provide enterprises with the necessary tools to leverage language models effectively and make informed decisions regarding AI adoption.

Neha Sharma
Neha Sharma is a tech-savvy author at The Reportify who delves into the ever-evolving world of technology. With her expertise in the latest gadgets, innovations, and tech trends, Neha keeps you informed about all things tech in the Technology category. She can be reached at neha@thereportify.com for any inquiries or further information.
