| Rank | Model | Organization | Solving Rate (±std) |
|---|---|---|---|
CL-bench is a benchmark for evaluating how well language models learn from context. Solving CL-bench tasks requires models to acquire new knowledge from the provided context rather than rely on their pre-trained knowledge.
Model solutions are evaluated with instance-level rubrics and an LM-as-a-judge. Each task contains, on average, 16.6 rubrics that assess solutions across multiple dimensions.
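As a rough illustration of how per-rubric LM judgments can be aggregated into a score, consider the minimal sketch below. The prompt template, the `judge` callable, and the PASS/FAIL scheme are assumptions for illustration only, not CL-bench's actual evaluation code; use the official scripts for real results.

```python
from typing import Callable, List


def build_judge_prompt(task: str, solution: str, rubric: str) -> str:
    """Format a single-rubric judging prompt (hypothetical template)."""
    return (
        f"Task:\n{task}\n\n"
        f"Candidate solution:\n{solution}\n\n"
        f"Rubric:\n{rubric}\n\n"
        "Does the solution satisfy the rubric? Answer PASS or FAIL."
    )


def score_solution(
    task: str,
    solution: str,
    rubrics: List[str],
    judge: Callable[[str], str],  # any wrapper that sends a prompt to a judge LM
) -> float:
    """Return the fraction of rubrics the judge marks as PASS."""
    passed = sum(
        judge(build_judge_prompt(task, solution, r)).strip().upper().startswith("PASS")
        for r in rubrics
    )
    return passed / len(rubrics)
```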
Want to add your model? Download the dataset from Hugging Face, run inference and evaluation using our scripts, then submit a PR with your results.
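As a starting point, loading the dataset with the `datasets` library might look like the sketch below; the repo id and split name are placeholders, so substitute the ones listed on the benchmark's Hugging Face page and prefer the official scripts for inference and evaluation.

```python
from datasets import load_dataset

# Placeholder repo id and split; replace with the benchmark's actual values.
ds = load_dataset("org/CL-bench", split="test")

# Inspect a few examples to see the available fields (e.g. context, task, rubrics).
for example in ds.select(range(3)):
    print(example.keys())
```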