[Paper Note]

RAG Evaluation

Stop grading only the final answer.

RAG Evaluation Grounding

Thesis

If retrieval fails, answer scoring mostly measures how gracefully the model guessed.

Notes

A useful RAG eval should score retrieval coverage, source relevance, citation accuracy, answer faithfulness, and latency separately. The pipeline needs blame assignment: when an answer is wrong, the eval should show which stage failed.
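
One way to make the separation concrete is to score each stage per query and then name the first stage that fails. The sketch below is a minimal illustration, not a reference implementation: the `EvalRecord` fields, the metric definitions (coverage as recall over annotated gold documents, relevance as precision of the retrieved set, citation accuracy as the share of cited ids that are gold), the check order, and the thresholds are all assumptions made for this example. Faithfulness is taken as an already-computed label, since judging it is a separate problem.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One evaluated query. All field names here are hypothetical."""
    retrieved_ids: list[str]  # document ids returned by the retriever
    gold_ids: set[str]        # ids annotated as containing the answer
    cited_ids: list[str]      # ids the generated answer cites
    faithful: bool            # answer supported by retrieved text (judged elsewhere)
    latency_ms: float         # end-to-end latency for the query


def retrieval_coverage(r: EvalRecord) -> float:
    """Share of gold documents that were retrieved (recall)."""
    if not r.gold_ids:
        return 1.0
    return len(r.gold_ids & set(r.retrieved_ids)) / len(r.gold_ids)


def source_relevance(r: EvalRecord) -> float:
    """Share of retrieved documents that are gold (precision)."""
    if not r.retrieved_ids:
        return 0.0
    return sum(d in r.gold_ids for d in r.retrieved_ids) / len(r.retrieved_ids)


def citation_accuracy(r: EvalRecord) -> float:
    """Share of cited documents that are gold."""
    if not r.cited_ids:
        return 0.0
    return sum(d in r.gold_ids for d in r.cited_ids) / len(r.cited_ids)


def assign_blame(r: EvalRecord,
                 min_coverage: float = 1.0,
                 max_latency_ms: float = 2000.0) -> str:
    """Return the first failing stage, checked in pipeline order.

    The thresholds are illustrative defaults, not recommended values.
    """
    if retrieval_coverage(r) < min_coverage:
        return "retrieval"
    if not r.faithful:
        return "generation"
    if citation_accuracy(r) < 1.0:
        return "citation"
    if r.latency_ms > max_latency_ms:
        return "latency"
    return "ok"


# Example: the answer looks fine, but one gold document was never retrieved.
record = EvalRecord(
    retrieved_ids=["d1", "d7", "d9"],
    gold_ids={"d1", "d2"},
    cited_ids=["d1"],
    faithful=True,
    latency_ms=850.0,
)
print(retrieval_coverage(record))  # 0.5
print(assign_blame(record))        # retrieval
```

Checking retrieval first reflects the thesis: a failure upstream makes the downstream scores hard to interpret, so the record is attributed to retrieval even though the answer was judged faithful.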

Working Claim

Evaluation is a debugging interface, not just a leaderboard.