- LiveCodeBench: Holistic and Contamination Free Evaluation of Large . . .
LiveCodeBench collects problems from periodic contests on the LeetCode, AtCoder, and Codeforces platforms and uses them to construct a holistic benchmark that continuously evaluates code LLMs across a variety of code-related scenarios.
- LiveCodeBench Pro
LiveCodeBench Pro is a benchmark composed of problems from Codeforces, ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination.
- Official repository for the paper LiveCodeBench: Holistic and . . .
LiveCodeBench provides holistic and contamination-free evaluation of the coding capabilities of LLMs. In particular, LiveCodeBench continuously collects new problems over time from contests across three competition platforms: LeetCode, AtCoder, and CodeForces.
- LiveCodeBench Benchmark Leaderboard | Artificial Analysis
Compare AI model performance on the LiveCodeBench Benchmark Leaderboard. LiveCodeBench is a contamination-free coding benchmark that continuously harvests fresh competitive programming problems from LeetCode, AtCoder, and CodeForces, evaluating code generation, self-repair, and execution.
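- LiveCodeBench: Holistic and Contamination-Free Evaluation of the Code Capabilities of Large Language Models - Zhihu (知乎)
LiveCodeBench contains problems tagged with their release dates, which allows evaluation over different time windows. Contamination can be detected and avoided by evaluating only on time windows after the model's cutoff date.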
- livecodebench (Live Code Bench) - Hugging Face
livecodebench/code_generation, livecodebench/test_generation, livecodebench/submissions, livecodebench/execution
- LiveCodeBench – UC Berkeley Sky Computing Lab
In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which continuously collects new problems over time from contests across three competition platforms, namely LeetCode, AtCoder, and CodeForces.
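- LiveCodeBench: A Comprehensive LLM Code Evaluation Benchmark | DataLearner Official Website (数据学习者官方网站)
LiveCodeBench was developed by researchers at the University of California, Berkeley, MIT, and Cornell University. It is an advanced benchmark suite designed to rigorously evaluate the code capabilities of large language models (LLMs) and to address the limitations of existing benchmarks.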
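The release-date windowing described in the Zhihu (知乎) entry above can be illustrated with a minimal sketch. This is not LiveCodeBench's own code or data schema: the `Problem` record, its field names, the `split_by_cutoff` helper, and the sample problems are all hypothetical, chosen only to show how a model's training cutoff date partitions a dated problem set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; these field names are illustrative assumptions,
    # not the benchmark's actual schema.
    title: str
    platform: str
    release_date: date


def split_by_cutoff(problems, cutoff):
    """Partition problems into (possibly seen, unseen) relative to a cutoff.

    Problems released on or before the cutoff may appear in training data;
    problems released strictly after it cannot have been seen by the model.
    """
    before = [p for p in problems if p.release_date <= cutoff]
    after = [p for p in problems if p.release_date > cutoff]
    return before, after


if __name__ == "__main__":
    # Made-up placeholder problems, not real contest entries.
    sample = [
        Problem("placeholder-a", "LeetCode", date(2023, 5, 1)),
        Problem("placeholder-b", "AtCoder", date(2023, 9, 1)),
        Problem("placeholder-c", "CodeForces", date(2024, 1, 15)),
    ]
    seen, unseen = split_by_cutoff(sample, cutoff=date(2023, 6, 30))
    print("possibly contaminated:", [p.title for p in seen])
    print("safe to evaluate on:", [p.title for p in unseen])
```

Under this sketch, scoring a model only on the second list avoids contamination, while a large gap between its scores on the two lists would be one signal that the earlier problems were seen during training.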