Short Summary: An overview of pointwise, pairwise, and listwise learning-to-rank models.
Problem: given a query \(q\) and a set of documents \(D = \{d_1, \dots, d_n\}\), rank the documents by relevance. Put simply, ranking is a sorting problem, where the challenge is to find the "key" by which to sort. That key is typically produced by a Machine Learning (ML) model.
Unfortunately, rank information is available only after sorting, and sorting is non-differentiable.
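To make the sorting view concrete, here is a minimal sketch in which a toy `score` function stands in for a learned model \(M(q, d)\); the term-overlap scoring is purely illustrative:

```python
# Toy stand-in for a learned relevance model M(q, d):
# count how many query terms appear in the document.
def score(query: str, doc: str) -> float:
    return float(sum(term in doc for term in query.split()))

def rank(query: str, docs: list[str]) -> list[str]:
    # Sorting by the model score is the non-differentiable step:
    # the final ranks only exist after this sort.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)

docs = ["cats and dogs", "quantum computing", "dogs playing chess"]
print(rank("dogs chess", docs))
# → ['dogs playing chess', 'cats and dogs', 'quantum computing']
```

The model only ever emits per-document scores; the discrete ranking appears downstream of the sort, which is why the loss cannot be written directly in terms of ranks.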
Pointwise Methods
- Considers a single candidate \(x_i = (q, d_i)\)
- Predicts a relevance score \(S_i = M(x_i)\) for the query–document pair
- The predicted relevance score is then compared to the ground truth
- Hence, we reformulate the problem as a regression problem
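The regression reformulation can be sketched as follows; the linear model and synthetic features are assumptions for illustration, standing in for any ML model \(M\) and any feature representation of \((q, d_i)\):

```python
import numpy as np

# Pointwise learning-to-rank as regression: each (query, document)
# pair is one feature vector x_i with a graded relevance label y_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # features of (q, d_i) pairs
w_true = np.array([1.0, -2.0, 0.5])               # synthetic "true" weights
y = X @ w_true + rng.normal(scale=0.1, size=100)  # noisy relevance labels

# Fit a linear scorer by gradient descent on the MSE loss.
w = np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= 0.1 * grad

# At inference time, documents are simply sorted by predicted score.
scores = X @ w
ranking = np.argsort(-scores)
```

Note that the MSE loss never sees the ranking itself: two models with very different MSE can induce the same document order, which is the main criticism of pointwise methods.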
Pairwise Methods
- Considers a pair of candidates \(x_i = (q, d_i)\) and \(x_j = (q, d_j)\)
- Predicts whether \(d_i\) should rank above \(d_j\), i.e., whether \(S_i = M(x_i) > S_j = M(x_j)\)
- The predicted value lies in the \((0, 1)\) interval and is compared to the ground-truth preference
- Rank information is still only available after sorting, which is not differentiable; fortunately, Gradient Descent does not really need a loss function, it only needs its gradient!
- Hence, we reformulate the problem as a binary classification problem
- Examples:
- LambdaRank defines the gradients of an implicit loss function, so that pairs whose swap would cause a large change in the evaluation metric (e.g., NDCG) receive much bigger gradients
- Having gradients is also enough to build a Gradient Boosting model; this is the idea behind LambdaMART, a boosted-tree version of LambdaRank
- Fun fact: both use "Lambda" as you don’t need the costs, only the gradients (\(\lambda\)) of the cost with respect to the model score
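A minimal sketch of these lambdas, assuming the RankNet-style pairwise probability \(P_{ij} = \sigma(s_i - s_j)\) and the \(|\Delta \text{NDCG}|\) scaling used by LambdaRank; the toy relevance list and scores are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# RankNet-style lambda: for a pair where d_i is preferred over d_j,
# the gradient of the implicit cross-entropy loss w.r.t. s_i depends
# only on the score difference (and lambda_ji = -lambda_ij).
def lambda_ij(s_i, s_j, sigma=1.0):
    return -sigma * sigmoid(-sigma * (s_i - s_j))

# LambdaRank additionally scales each lambda by |delta NDCG|: the
# change in NDCG if documents i and j were swapped in the ranking.
def dcg(rels):
    return sum(r / np.log2(pos + 2) for pos, r in enumerate(rels))

def delta_ndcg(rels, i, j):
    ideal = dcg(sorted(rels, reverse=True))
    swapped = list(rels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(dcg(swapped) - dcg(rels)) / ideal

rels = [3, 2, 0, 1]            # graded relevance in the current ranking
scores = [2.0, 1.5, 1.0, 0.5]  # current model scores
lam = lambda_ij(scores[0], scores[2]) * delta_ndcg(rels, 0, 2)
```

This is exactly why no explicit cost is ever written down: the \(\lambda\) values are handed to the optimizer (Gradient Descent in LambdaRank, Gradient Boosting in LambdaMART) as if they were gradients of some loss.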
Listwise Methods
- Considers the whole list of candidates \(x_1 = (q, d_1), \dots, x_n = (q, d_n)\)
- Predicts the entire order of documents
- Directly optimizes the evaluation metric (or a smoothed surrogate of it, since metrics like NDCG are not differentiable)
- Hence, we reformulate the problem as a metric maximization problem
- Examples:
- ListNet
- SetRank
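A minimal sketch of the listwise idea, assuming ListNet's top-one approximation: both the ground-truth relevances and the predicted scores are mapped to probability distributions over the list, which are then compared with cross-entropy:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # shift for numerical stability
    return z / z.sum()

# ListNet-style listwise loss (top-one approximation): compare the
# distribution induced by the labels with the one induced by the
# predicted scores, over the entire list at once.
def listnet_loss(scores, rels):
    p_true = softmax(np.asarray(rels, dtype=float))
    p_pred = softmax(np.asarray(scores, dtype=float))
    return -np.sum(p_true * np.log(p_pred))

rels = [3.0, 1.0, 0.0]
# Scores matching the label order yield a lower loss than reversed scores.
print(listnet_loss([3.0, 1.0, 0.0], rels) < listnet_loss([0.0, 1.0, 3.0], rels))
# → True
```

Unlike the pointwise and pairwise losses, this objective is a function of the whole list, so the gradient reflects every document's position at once.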