April 29, 2006

Vishy's Useless Factoid of the Day: Ranking schemes

Could an overachieving third-grader be more devastated? I looked on in horror at my report card at the end of third grade. I had been ranked first in my class at the end of the first and second grades and, frankly, I was rather getting used to it. Now, I stared at the number 3 filling up the Rank line item. As if to add insult to injury, there were two students who had scored one point more than me, and they were both ranked first. I went up to my teacher, puffy-eyed, and asked her why I wasn't at least ranked second, since my score came right after those of my two classmates. In an explanation that felt utterly inadequate, she said, "Vishwanath, when there are two first-ranks, the next rank is third." A number of years later, I realized that my expectation of being ranked second rather than third was not entirely fanciful. I was merely expecting to be ranked under a different ranking scheme.

Consider the following: if Alice got 97 points in an exam, Bob 97, Charlie 94 and Dave 89, should Charlie be ranked 2nd or 3rd? The answer is 'it depends'. It depends on what you're trying to get out of the ranking: the top N candidates in a cohort or the top N scores. Typically, one ranks a group to confer some advantage on the top few ranks. From the test takers above, it is clear that the top 3 are Alice, Bob and Charlie. The last person in the top 3 should definitely be ranked 3rd. Thus, Charlie is ranked 3rd and rank 2 is skipped entirely.
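To make this first scheme concrete, here is a minimal Python sketch. The function name `competition_rank` and the dictionary layout are my own choices for illustration, not anything taken from a particular library. Tied scores share a rank, and the next distinct score is ranked by its position in the sorted list, so ranks get skipped after a tie.

```python
def competition_rank(scores):
    """Ties share a rank; the ranks after a tie are skipped (1, 1, 3, 4)."""
    # Sort test takers from highest to lowest score.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    previous_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        # A tie reuses the previous rank; otherwise the rank is the position.
        rank = previous_rank if score == previous_score else position
        ranks[name] = rank
        previous_score, previous_rank = score, rank
    return ranks


scores = {"Alice": 97, "Bob": 97, "Charlie": 94, "Dave": 89}
ranks = competition_rank(scores)
print(ranks)  # {'Alice': 1, 'Bob': 1, 'Charlie': 3, 'Dave': 4}

# Keeping everyone with rank <= 3 yields exactly three test takers.
print([name for name, rank in ranks.items() if rank <= 3])
```

Because a rank here is one plus the number of people who scored strictly higher, filtering on rank <= N never admits more than N people unless there is a tie at the cutoff itself.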

However, what if the ranker's objective is to get the top 3 scores in the test? In the above example, the top 3 scores (97, 94 and 89) correspond to four test takers: Alice, Bob, Charlie and Dave. In this case, it's fair to rank Alice and Bob first because they got the top score, Charlie second because he got the second-best score, and Dave third. The top N scores may not correspond to N test takers, but there are no skipped ranks.
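This second scheme ranks the distinct scores rather than the people. A minimal Python sketch, with `dense_rank` as a hypothetical helper name chosen for illustration, could look like this:

```python
def dense_rank(scores):
    """Rank by distinct score, so no rank is ever skipped (1, 1, 2, 3)."""
    # Each distinct score gets one rank, highest score first.
    distinct_scores = sorted(set(scores.values()), reverse=True)
    rank_of_score = {score: rank for rank, score in enumerate(distinct_scores, start=1)}
    # Every test taker inherits the rank of his or her score.
    return {name: rank_of_score[score] for name, score in scores.items()}


scores = {"Alice": 97, "Bob": 97, "Charlie": 94, "Dave": 89}
ranks = dense_rank(scores)
print(ranks)  # {'Alice': 1, 'Bob': 1, 'Charlie': 2, 'Dave': 3}

# Keeping everyone with rank <= 3 now yields four test takers.
print([name for name, rank in ranks.items() if rank <= 3])
```

Here a rank is one plus the number of distinct scores that are strictly higher, which is why a tie at the top does not push anyone further down.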

I was pleasantly surprised when I found that this distinction is codified in the Oracle relational database system. When returning a ranked set of rows, a query can use either the RANK or the DENSE_RANK function. With the former, Charlie above would be ranked 3rd; with the latter, he would be ranked 2nd. In third grade, I expected to be ranked via DENSE_RANK (solely on my mastery of the material) when the function actually being used was RANK (relative to my peers). I was never able to make up for this abysmal performance. Starting with fourth grade, I moved to a different school, which refused to rank students strictly by their total point score on a set of exams and used a GPA system instead.
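For anyone who wants to see the two functions side by side without an Oracle installation, here is a rough sketch that runs the query through Python's built-in sqlite3 module instead. This is a stand-in, not Oracle itself: it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and the table name `results`, the column aliases and the helper `rank_with_sql` are all made up for this example.

```python
import sqlite3


def rank_with_sql(rows):
    """Return (name, rank, dense_rank) tuples computed by SQL window functions."""
    # Assumption: the underlying SQLite library is 3.25+ (window function support).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE results (name TEXT, score INTEGER)")
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
        query = """
            SELECT name,
                   RANK()       OVER (ORDER BY score DESC) AS rnk,
                   DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
            FROM results
            ORDER BY score DESC, name
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = [("Alice", 97), ("Bob", 97), ("Charlie", 94), ("Dave", 89)]
for name, rnk, dense_rnk in rank_with_sql(rows):
    print(name, rnk, dense_rnk)
# Alice 1 1
# Bob 1 1
# Charlie 3 2
# Dave 4 3
```

Charlie's row is the only place in the top three where the two columns disagree, which is exactly the difference my third-grade self was upset about.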

Posted by Vishy at April 29, 2006 06:21 PM
