- [2404.10952] Can Language Models Solve Olympiad Programming?
These resources enable us to construct and test a range of LM inference methods for competitive programming for the first time. We find GPT-4 only achieves a 8.7% pass@1 accuracy with zero-shot chain-of-thought prompting, and our best inference method improves it to 20.2% using a combination of self-reflection and retrieval over episodic knowledge.
- LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in . . .
Recent reports claim that large language models (LLMs) now outperform elite humans in competitive programming. Drawing on knowledge from a group of medalists in international algorithmic contests, we revisit this claim, examining how LLMs differ from human experts and where limitations still remain.
- It is clear that the state-of-the-art large-scale language . . .
The coding capabilities of large-scale language models (LLMs) are so high that technology company leaders have said things like, '…' In LiveCodeBench Pro, a team of International Olympiad medalists …
- How Do Olympiad Medalists Judge LLMs in Competitive . . .
A new benchmark assembled by a team of International Olympiad medalists suggests the hype about large language models beating elite human coders is premature. LiveCodeBench Pro, unveiled in a 584-problem study [PDF] drawn from Codeforces, ICPC and IOI contests, shows the best frontier model clears j…
- USACO: Can Language Models Solve Olympiad Programming?
We introduce the USACO benchmark with 307 problems from the USA Computing Olympiad. Among other results, we find that inference-time methods double zero-shot model performance, and interactive human-in-the-loop tutoring can allow models to succeed on previously unsolved problems.
- Can Language Models Solve Olympiad Programming? - OpenReview
These resources enable us to construct and test a range of LM inference methods beyond zero-shot prompting for competitive programming
- June 16, 2025 - by Kim Seonghyeon - arXiv Daily
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? (Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie)