Live GitHub stats, community sentiment, and trend data for Chinese LLM Benchmark. TrendingBots tracks star velocity, fork activity, and what developers are saying, updated from real data sources.
GitHub data synced: Mar 22, 2026 • Sentiment updated: Mar 16, 2026
Community Buzz: Discussion centers on evaluating and comparing large language models, particularly under the topics agentic-ai, artificial-intelligence, llm-agent, and llm-evaluation. The project is likely to interest researchers and developers working in these areas.
ReLE differs from other evaluation frameworks in offering a comprehensive, scalable system for evaluating Chinese language models. Its multi-dimensional evaluation approach and its defect library of more than 2 million entries make it a useful resource for researchers and developers. The project focuses on diagnosing capability anisotropy with a structured benchmark, and its coverage of domains such as medical and financial makes it applicable to a wide range of uses.
Build a comprehensive Chinese language model benchmarking system: ReLE provides a scalable system and structured benchmark for diagnosing capability anisotropy in Chinese LLMs.
Build a large-scale language model evaluation platform: ReLE supports multi-dimensional evaluation, including training, medical, and financial domains.
Build a customized language model for specific industries: ReLE offers a wide range of models and a large defect library for improvement.
Build an automated testing framework for language models: ReLE provides a comprehensive testing framework with various evaluation metrics.
Build a research project on language model capabilities: ReLE provides a large-scale dataset and evaluation metrics for research purposes.
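ReLE evaluation: a capability evaluation of Chinese AI large language models (continuously updated). It currently covers 359 models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. In addition to a leaderboard, it provides a defect library of more than 2 million entries, so that the community can study, analyze, and improve large models.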
Official site: https://nonelinear.com
Category: development
Tags: agentic-ai, artificial-intelligence, llm-agent, llm-evaluation
This project is part of the growing trend of benchmarking and evaluating large language models for various applications.