YandexGPT Experimental reaches the top of the LLM Arena leaderboard
A model called YandexGPT Experimental has reached the top of the LLM Arena leaderboard, on par with GPT-4o, GPT-4 Turbo and Claude 3.5 Sonnet. The LLM Arena leaderboard evaluates how well models answer questions in Russian.
The LLM Arena platform was launched by independent developers from the Russian ML community. The service gives users free access to a range of large language models (LLMs). In return, users indicate which model, in their opinion, gave the best answer. From the collected user votes, the authors of the service build a leaderboard that allows the models to be compared with one another.
The service's logic and operating principle were adopted from LMSYS Chatbot Arena, one of the most reputable benchmarks internationally.
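To illustrate how pairwise user votes can be turned into a leaderboard, here is a minimal sketch using an Elo-style rating update, a method commonly associated with arena-style leaderboards. The article does not say which scoring method LLM Arena actually uses, so the function name `elo_leaderboard`, the parameters `k` and `base_rating`, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base_rating=1000.0):
    """Build a leaderboard from (winner, loser) votes with Elo-style updates.

    Assumption: this is an illustrative scoring scheme, not the method
    LLM Arena is confirmed to use.
    """
    # Every model starts at the same base rating.
    ratings = defaultdict(lambda: base_rating)
    for winner, loser in votes:
        # Probability that the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An unexpected win moves the ratings more than an expected one.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
leaderboard = elo_leaderboard(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

In this sketch the order of the votes affects the final numbers, which is one reason some leaderboards fit a statistical model to all votes at once instead of updating ratings one vote at a time.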
Unlike its international counterpart, LLM Arena focuses on the Russian language and includes Russian LLMs such as YandexGPT, GigaChat, Saiga and Vikhr. The authors of the service said they want to create an objective, open and up-to-date benchmark of LLMs in Russian.
In the future, the service plans to add a multimodal arena and to make the benchmark the reference one for the Russian market.
Several Russian-language LLM benchmarks already exist, such as rulm-sbs2, MERA and Arena-Hard-Auto. Unlike these benchmarks, LLM Arena does not evaluate models automatically with another, stronger model or rely on private closed tests. Instead, it uses live judgments from real users, which makes the benchmark more objective.