CalibRank scores argument quality, not correctness. A well-structured argument for the wrong side can outscore a sloppy argument for the right one.
Our scoring model analyzes the structure of your argument — how well you reason, what evidence you provide, whether your perspective is original, and how clearly you communicate. It does not judge whether your position is “right” or “wrong.” Two people on opposite sides of a debate can both score 90+.
Is your reasoning internally consistent? Do your premises lead to your conclusion without logical fallacies or contradictions?
Build step-by-step reasoning. Avoid leaps of logic or circular arguments.
Did you support your claims with concrete examples, data, citations, or real-world references? Unsupported assertions score low.
Cite sources, reference studies, or provide specific examples. "Because I said so" scores poorly.
Does your argument bring a fresh perspective? Restating a common talking point scores lower than a novel angle on the same position.
Find an angle others haven't covered. Combine ideas from different domains. Challenge assumptions.
Is your argument easy to follow? Good structure, clear language, and concise delivery all contribute. Rambling or confusing writing scores low.
Lead with your strongest point. Use short sentences. One idea per paragraph.
Each dimension is scored 0–100 independently, then combined using these weights.
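As a rough illustration of the combination step, the sketch below computes a weighted average of four dimension scores. The dimension keys (`logic`, `evidence`, `originality`, `clarity`) and the equal weights are placeholders assumed for this example; they are not CalibRank's actual labels or weights.

```python
from typing import Mapping

# Hypothetical dimension names and weights, for illustration only.
# CalibRank's real weights are not reproduced here.
EXAMPLE_WEIGHTS = {
    "logic": 0.25,
    "evidence": 0.25,
    "originality": 0.25,
    "clarity": 0.25,
}


def combine_scores(scores: Mapping[str, float],
                   weights: Mapping[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Return the weighted average of per-dimension scores (each 0-100)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 0-100")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total lets the weights be given on any scale.
    return sum(scores[d] * weights[d] for d in weights) / total_weight


overall = combine_scores(
    {"logic": 90, "evidence": 80, "originality": 70, "clarity": 60}
)
print(overall)
```

Because each dimension is scored independently, a weak score in one dimension lowers the overall result only in proportion to that dimension's weight.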
"Should cities ban cars from downtown areas?"
Side A: Yes, ban cars
“Cities that have pedestrianized their centers — like Oslo, which banned cars from its core in 2019 — consistently report 10-15% increases in retail revenue within two years (Oslo Chamber of Commerce, 2021). The counterargument that businesses suffer is empirically false: foot traffic replaces car traffic at higher density. What most people miss is that car bans also function as an equity measure — lower-income residents who can't afford parking subsidize infrastructure they don't use. Redirecting road maintenance budgets toward transit creates a positive-sum outcome for 70%+ of urban residents.”
Each dimension comes with an explanation of why it scored that way. This is the same Argument DNA breakdown every scored argument receives.
Based on 266 scored arguments across all public debates.
Scoring 70+ puts you above the majority of arguments. 85+ is exceptional.
Arguments are scored by Google Gemma 27B, an open-weights language model. We chose an open model deliberately — its architecture and training data are publicly documented, not a black box.
The model receives your argument text with a structured prompt specifying exactly what to evaluate for each dimension. It returns four independent scores (0–100) plus highlighted excerpts showing which parts of your argument earned or lost points.
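To make the shape of that output concrete, here is a minimal sketch of how a reply with four scores and highlighted excerpts could be parsed and validated. The JSON layout, the field names (`scores`, `excerpts`, `text`, `dimension`, `effect`), and the dimension keys are all assumptions made for this example, not CalibRank's actual response format.

```python
import json
from dataclasses import dataclass

# Hypothetical dimension keys; the real labels may differ.
DIMENSIONS = ("logic", "evidence", "originality", "clarity")


@dataclass(frozen=True)
class Excerpt:
    text: str        # span quoted from the argument
    dimension: str   # which dimension the span affected
    effect: str      # "earned" or "lost"


@dataclass(frozen=True)
class ScoredArgument:
    scores: dict     # dimension -> score in 0-100
    excerpts: tuple  # Excerpt objects, in reply order


def parse_model_reply(raw: str) -> ScoredArgument:
    """Parse and validate a JSON reply in the assumed shape."""
    data = json.loads(raw)
    scores = {}
    for dim in DIMENSIONS:
        if dim not in data.get("scores", {}):
            raise ValueError(f"missing score for {dim}")
        value = data["scores"][dim]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            raise ValueError(f"invalid score for {dim}: {value!r}")
        scores[dim] = value
    excerpts = []
    for item in data.get("excerpts", []):
        if item["dimension"] not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {item['dimension']!r}")
        if item["effect"] not in ("earned", "lost"):
            raise ValueError(f"unknown effect: {item['effect']!r}")
        excerpts.append(Excerpt(item["text"], item["dimension"], item["effect"]))
    return ScoredArgument(scores=scores, excerpts=tuple(excerpts))


# Made-up reply, used only to exercise the parser.
reply = """
{
  "scores": {"logic": 82, "evidence": 88, "originality": 75, "clarity": 90},
  "excerpts": [
    {"text": "Oslo Chamber of Commerce, 2021", "dimension": "evidence", "effect": "earned"}
  ]
}
"""
result = parse_model_reply(reply)
print(result.scores["evidence"], result.excerpts[0].effect)
```

Validating the range and the set of dimensions up front means a malformed model reply is rejected rather than shown to the user as a score.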
We do not fine-tune the model on CalibRank data. Every argument is scored against the same rubric with the same prompt. No user’s history, tier, or identity is included in the scoring context.
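One way to picture this guarantee is a prompt builder whose only input is the argument text, so user history, tier, or identity cannot reach the model. The sketch below is an assumption-laden illustration: the function name `build_scoring_prompt`, the dimension keys, and the prompt wording are invented for the example, and the rubric questions paraphrase the dimension descriptions on this page rather than quoting the real prompt.

```python
# Hypothetical rubric, paraphrased from the dimension descriptions above.
RUBRIC = {
    "logic": "Is the reasoning internally consistent, with premises that lead to the conclusion?",
    "evidence": "Are claims supported with concrete examples, data, citations, or real-world references?",
    "originality": "Does the argument bring a fresh perspective rather than restate a common talking point?",
    "clarity": "Is the argument easy to follow, with good structure and concise language?",
}


def build_scoring_prompt(argument_text: str) -> str:
    """Build the scoring prompt from the argument text alone.

    The signature accepts no user fields, so nothing about the author
    can be included in the scoring context.
    """
    lines = [
        "Score the argument below from 0 to 100 on each dimension.",
        "Judge argument quality only, not whether the position is correct.",
        "",
    ]
    for name, question in RUBRIC.items():
        lines.append(f"- {name}: {question}")
    lines += ["", "Argument:", argument_text.strip()]
    return "\n".join(lines)


prompt_yes = build_scoring_prompt("Cities should ban cars from downtown areas.")
prompt_no = build_scoring_prompt("Cities should not ban cars from downtown areas.")
print(prompt_yes)
```

Everything before the `Argument:` line is identical for both prompts, which is the property the paragraph above describes: same rubric, same prompt, different argument text.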