Sachin Shah

February 6, 2021

Over the course of the 2019-2020 season, a pattern of negative side bias was statistically observed. More recently, online debate tournaments were also found to have a negative side bias [1]. The relevant question now is whether the 2021 January-February topic substantively compensates for this structural side bias.

**2021 January-February Data Set**

Affirmative and negative ballots were gathered from the 18 Tournament of Champions (TOC) bid-distributing tournaments on the 2021 January-February topic whose results were posted on tabroom.com as of this writing [2]. These qualifier tournaments range from octofinal to final bid level. The data set contains 4,579 rounds and represents a fairly diverse range of debating and judging styles.

**Standard Analysis**

When all posted ballots on the January-February topic are analyzed, the negative won 52.63% of rounds. The question is whether the difference between the observed win rate (52.63%) and the expected rate (50%) is statistically significant or due to chance. To answer it, a one-proportion z-test was used to calculate a p-value. The null hypothesis was p = 0.5 (where p is the proportion of negative wins), since, barring any bias, the affirmative and the negative would be expected to win equally often. The alternative hypothesis was p ≠ 0.5. The z-test rejects the null hypothesis (p-value < 0.0005, 99% confidence interval [50.73%, 54.53%]). In other words, if rounds were unbiased, there would be less than a 0.05% chance of observing a split at least this far from 50%, which suggests that a negative side bias exists.
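A minimal sketch of this test in Python. The raw ballot count is not published here, so the sketch assumes 2,410 negative wins out of 4,579 rounds (the only integer count that rounds to 52.63%). It also assumes a Wald (normal-approximation) confidence interval; the article does not state which interval method it used.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(wins: int, n: int, p0: float = 0.5, confidence: float = 0.99):
    """Two-sided one-proportion z-test plus a Wald confidence interval."""
    p_hat = wins / n
    std_normal = NormalDist()
    # The test statistic uses the standard error implied by the null hypothesis.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # The Wald interval uses the standard error at the observed proportion.
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z, p_value, (p_hat - margin, p_hat + margin)


# Assumed count: reconstructed from the reported 52.63%, not taken from raw ballots.
p_hat, z, p_value, (lo, hi) = one_proportion_z_test(2410, 4579)
print(f"negative win rate {p_hat:.2%}, z = {z:.2f}, p = {p_value:.5f}, 99% CI [{lo:.2%}, {hi:.2%}]")
```

With these inputs the sketch lands below the reported p-value threshold and reproduces the reported 99% interval to two decimal places. Changing the two arguments applies the same test to any other subset of ballots.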

Participants in elimination rounds tend to be the stronger debaters at a tournament, since they have won the majority of their prelims. Isolating elimination round data can therefore provide insight into the strength of the bias. If there is a structural skew in favor of the negative, we would expect the side bias to be larger in elimination rounds, where skill gaps between the two debaters tend to be smaller and leave the structural skew more room to decide the round. In fact, of the 460 elimination rounds, the negative won a statistically significant 55.43% (p-value < 0.02, 95% confidence interval [50.89%, 59.98%]). This indicates that this topic has not compensated for the side bias.
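The same calculation can be sketched for the elimination-round subset, this time with a 95% interval. As before, the count is an assumption: 255 of 460 is the only integer count that rounds to 55.43%, and the interval is assumed to be a Wald interval.

```python
import math
from statistics import NormalDist

# Assumed count: reconstructed from the reported 55.43%, not taken from raw ballots.
wins, n = 255, 460
p_hat = wins / n
std_normal = NormalDist()

# Two-sided z-test against the no-bias null hypothesis p = 0.5.
z = (p_hat - 0.5) / math.sqrt(0.25 / n)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% Wald interval around the observed proportion.
margin = std_normal.inv_cdf(0.975) * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin
print(f"negative win rate {p_hat:.2%}, p = {p_value:.4f}, 95% CI [{lo:.2%}, {hi:.2%}]")
```

Because the subset is roughly a tenth the size of the full data set, the interval is much wider, which is why a larger observed bias yields a weaker p-value here.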

**Adjusting for Skill Differentials**

As outlined in the 2019 January-February side bias article [3], the side bias can be further characterized by taking into account the difference in skill between the two debaters. The analysis above assumes that each debater has an equal chance of winning. The following analysis instead uses a more robust model that estimates each debater's probability of winning from their respective skill levels, since rounds in which the affirmative debater is stronger are more likely to end in affirmative wins. To measure debater skill, this study implemented an Elo rating system. To calculate Elo ratings for every debater, rounds were gathered from 47 TOC bid-distributing tournaments from 2020 to the present with results posted on tabroom.com. The rest of the analysis uses the 2021 January-February data set.
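The article does not publish its Elo configuration, so the following is only a minimal sketch of a standard Elo update. It assumes a hypothetical K-factor of 32, a hypothetical starting rating of 1500, the conventional 400-point scale, and side-neutral updates (the side a debater was on is ignored when rating). The debater names are placeholders.

```python
# Hypothetical parameters: the article does not report its K-factor or starting rating.
K_FACTOR = 32.0
INITIAL_RATING = 1500.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that debater A beats debater B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(ratings: dict, winner: str, loser: str, k: float = K_FACTOR) -> None:
    """Apply one round's result to the ratings dict in place (side is ignored)."""
    r_w = ratings.setdefault(winner, INITIAL_RATING)
    r_l = ratings.setdefault(loser, INITIAL_RATING)
    # The winner gains what the loser gives up, so total rating is conserved.
    delta = k * (1.0 - expected_score(r_w, r_l))
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta


# Rounds must be processed in chronological order; these names are made up.
rounds = [("Debater A", "Debater B"), ("Debater C", "Debater A"), ("Debater A", "Debater C")]
ratings = {}
for winner, loser in rounds:
    update_ratings(ratings, winner, loser)
print(ratings)
```

Keeping the ratings side-neutral matters for this analysis: if side were baked into the ratings, the side bias would be partly absorbed into skill estimates before it could be measured.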

With these Elo ratings, a variety of metrics can be used to quantify the side bias. The most straightforward method is a technique called logistic regression. In this analysis, a function of the form:

*f(x) = 1/(1+e^(-a(x-b)))*

is fit so that *f(x)* approximates the probability that the affirmative wins, given that the affirmative debater's Elo rating minus the negative debater's Elo rating is *x*.
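The article does not say which fitting routine it used. A minimal sketch, assuming a standard maximum-likelihood fit: the function is rewritten as sigmoid(b0 + b1·x), so that *a* = b1 and *b* = −b0/b1, and it is fit by Newton's method. Because the round-level data is not included here, the example runs on simulated rounds generated from the reported parameters, only to show that the procedure recovers them. It is not a reproduction of the study.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=20):
    """Maximum-likelihood fit of P(aff wins) = sigmoid(b0 + b1*x) by Newton's method.

    Returns (a, b) in the form f(x) = 1/(1+e^(-a(x-b))), i.e. a = b1, b = -b0/b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p  # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w  # entries of the 2x2 information matrix X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (X^T W X) * step = gradient for the 2x2 system.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, -b0 / b1


# Simulated rounds (NOT the article's data): Elo gaps and outcomes drawn from a known curve.
rng = random.Random(2021)
true_a, true_b = 0.0094, 13.67
xs = [rng.gauss(0.0, 200.0) for _ in range(20000)]
ys = [1 if rng.random() < sigmoid(true_a * (x - true_b)) else 0 for x in xs]
a_hat, b_hat = fit_logistic(xs, ys)
print(f"a = {a_hat:.4f}, b = {b_hat:.2f}")
```

The offset *b* is estimated far less precisely than the slope *a*, so a large data set is needed before an offset of roughly 14 Elo points can be distinguished from zero.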

The parameters of this function were chosen so that it best fits the data set described above. The best-fitting parameters were *a* = 0.0094 and *b* = 13.67 (p-value < 10^-10). Because the offset parameter *b* is 13.67, a round is an even matchup (each debater has a 50% chance of winning) when the affirmative's Elo rating is 13.67 points higher than the negative's. This offset indicates a negative side bias: the negative is more likely to win even when the affirmative is the slightly better debater. In fact, accounting for the side bias changed which debater was favored to win in 8% of 2-1 decisions for the negative. That is to say, before adjusting for the side bias (*b* = 0), *f(x)* was greater than 0.5, meaning the affirmative was predicted to win; after adjusting for the bias (*b* = 13.67), *f(x)* was less than 0.5, meaning the affirmative was predicted to lose. This happens exactly when the affirmative's Elo advantage *x* is between 0 and 13.67 points. It demonstrates a bias because the better debater by Elo rating might not be favored, depending on the side they are on.

Another way to quantify the side bias is to examine only rounds where the debater with the lower Elo rating won, indicating that an upset occurred. Absent a side bias, upsets should be split equally between affirmative and negative wins. Of the 1,510 upset rounds in the data set, the negative won a statistically significant 53.31% (p-value < 0.01). In other words, the negative overcomes an Elo disadvantage more often than the affirmative does. Negating is therefore easier, because negatives overcome skill disparities more often, meaning a side bias exists even after accounting for this important variable.
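The reported parameters can be plugged straight back into *f(x)* to see what the offset means in practice. This sketch uses only *a* = 0.0094 and *b* = 13.67 from above; the helper name `prediction_flips` is hypothetical, and the 8% figure itself needs the round-level data, so it is not reproduced here.

```python
import math

A, B = 0.0094, 13.67  # fitted parameters reported above


def f(x: float, a: float = A, b: float = B) -> float:
    """Probability the affirmative wins when aff Elo minus neg Elo equals x."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))


def prediction_flips(x: float, b: float = B) -> bool:
    """True when ignoring the offset (b = 0) favors the affirmative but the fitted curve does not."""
    return f(x, b=0.0) > 0.5 and f(x, b=b) < 0.5


print(f"evenly rated debaters: P(aff wins) = {f(0.0):.3f}")
print(f"aff ahead by 13.67:    P(aff wins) = {f(13.67):.3f}")
print([x for x in (-5, 0, 5, 10, 13, 20) if prediction_flips(x)])
```

Under the fitted curve, two evenly rated debaters give the affirmative only about a 46.8% chance of winning, and the predicted winner flips only for affirmative Elo advantages strictly between 0 and 13.67 points.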

**Relevancy**

This analysis is statistically rigorous and relevant in several respects: (A) the data comes from the current 2021 January-February topic, so it is relevant to rounds debated during these months [4]; (B) the data represents a diverse set of debating and judging styles from across the country; (C) the analysis accounts for disparities in debating skill; and (D) multiple tests validate the results.

**Side Bias Trends**

It is also interesting to look at the trend across multiple topics. Across the 243 bid-distributing tournaments from August 2015 to the present, the negative won 52.30% of rounds (p-value < 10^-34, 99% confidence interval [51.82%, 52.78%]). In elimination rounds, the negative won 55.85% (p-value < 10^-18, 99% confidence interval [54.16%, 57.54%]). Additionally, fitting the logistic regression to the entire data set gives an offset of 12.57, which means the side bias changed which debater was predicted to win in 9% of rounds won by the negative. Because this analysis now spans 18 topics, the results continue to suggest that the negative side bias might be structural rather than topic specific. Although debaters commonly make theoretical arguments that negating is harder (e.g., judge psychology, or the affirmative speaking first and last), these arguments are superseded by the empirical evidence. Even if these arguments correctly identify advantages for the affirmative, the data shows that after all advantages and disadvantages for both sides are accounted for, negating is still easier.

Given a structural advantage for the negative, the affirmative may be justified in receiving a substantive advantage to compensate for the structural skew. This could take various forms, such as granting the affirmative presumption ground, tiny plans, or framework choice. Whatever form is chosen should be tested to ensure the skew is not unintentionally reversed.

Therefore, this analysis confirms that affirming is harder on the 2021 January-February topic as well [5]. So, once again, don't lose the flip!

-----

[2] Arizona State University, Blake, College Prep, Columbia, Durham Academy, Emory, Golden Desert, Harvard-Westlake, Lewis Clark, Lexington, Myers Park, Newark, North Allegheny, Peninsula, Strake Jesuit, Sunvitational, University of Houston, Winston Churchill

[4] It is important to note that numbers presented in this article that use the 2021 January-February data set should only be used within the context of the 2021 January-February topic; debaters who attempt to extrapolate this data to future topics would be misrepresenting the intent of this article.

[5] The data set and analyses that use 2015-2020 tournaments could be extrapolated to future topics, as they suggest a trend. If the activity changed structurally, however, the data should not be extrapolated. For example, if the 1AR got an extra minute, this data would not be indicative of those rounds, because the underlying structure of a round would have changed.