The score of query q for document d is defined as follows:
score(q,d) = sum( tf(t in d) * idf(t) * getBoost(t.field in d) * lengthNorm(t.field in d) ) *
coord(q,d) * queryNorm(q)
tf(t in d) - ZSearchSimilarity::tf($freq) - a score factor based on the frequency of a term or phrase in a document.
idf(t) - ZSearchSimilarity::idf($term, $reader) - a score factor for a simple term for the specified index.
getBoost(t.field in d) - the boost factor for the term's field.
lengthNorm(t.field in d) - the normalization value for a field given the total number of terms contained in that field. These values, together with field boosts, are stored in the index and multiplied into the scores for hits on each field by the search code.
Matches in longer fields are less precise, so implementations of this method usually return smaller values when the number of terms in the field is large, and larger values when it is small.
coord(q,d) - ZSearchSimilarity::coord($overlap, $maxOverlap) - a score factor based on the fraction of all query terms that a document contains.
The presence of a large portion of the query terms indicates a better match with the query, so implementations of this method usually return larger values when the ratio of $overlap to $maxOverlap is large and smaller values when it is small.
queryNorm(q) - the normalization value for a query given the sum of the squared weights of each of the query terms. This value is then multiplied into the weight of each query term.
This does not affect ranking, but rather just attempts to make scores from different queries comparable.
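As a rough illustration of how these factors combine, the sketch below evaluates the formula for a document with a single field. It reuses the formulas from the example class shown later in this section. The function names (sketchTf, sketchScore and so on), the sample numbers, and the choice to use each term's idf as its query weight inside queryNorm are assumptions made for illustration; none of this is part of the ZSearch API.

```php
<?php
// Hypothetical stand-alone helpers mirroring the example similarity formulas.
function sketchTf($freq) { return sqrt($freq); }
function sketchIdf($docFreq, $numDocs) { return log($numDocs / (float)($docFreq + 1)) + 1.0; }
function sketchLengthNorm($numTerms) { return 1.0 / sqrt($numTerms); }
function sketchCoord($overlap, $maxOverlap) { return $overlap / (float)$maxOverlap; }
function sketchQueryNorm($sumOfSquaredWeights) { return 1.0 / sqrt($sumOfSquaredWeights); }

/**
 * $terms: one entry per query term with
 *   'freq'    - occurrences of the term in the document field (0 if absent)
 *   'docFreq' - number of documents in the index that contain the term
 * $numTermsInField must be greater than zero.
 */
function sketchScore(array $terms, $numDocs, $fieldBoost, $numTermsInField)
{
    if (count($terms) === 0) {
        return 0.0;
    }
    $sum = 0.0;
    $sumOfSquaredWeights = 0.0;
    $overlap = 0; // number of query terms found in the document
    foreach ($terms as $term) {
        $idf = sketchIdf($term['docFreq'], $numDocs);
        $sumOfSquaredWeights += $idf * $idf; // assumption: term weight = idf
        if ($term['freq'] > 0) {
            $overlap++;
            $sum += sketchTf($term['freq']) * $idf * $fieldBoost
                  * sketchLengthNorm($numTermsInField);
        }
    }
    return $sum * sketchCoord($overlap, count($terms))
                * sketchQueryNorm($sumOfSquaredWeights);
}

// Two-term query against an index of 100 documents: the first term is rare
// and occurs 4 times in a 16-term field, the second is common and absent.
$query = [
    ['freq' => 4, 'docFreq' => 9],
    ['freq' => 0, 'docFreq' => 99],
];
$score = sketchScore($query, 100, 1.0, 16);
```

With these formulas, the same match in a longer field, or a document that contains fewer of the query terms, yields a lower score.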
The scoring algorithm can be customized by defining your own Similarity class. To do this,
extend the ZSearchSimilarity class as shown below, then call
ZSearchSimilarity::setDefault($similarity);
to set it as the default.
class MySimilarity extends ZSearchSimilarity {
    public function lengthNorm($fieldName, $numTerms) {
        return 1.0/sqrt($numTerms);
    }

    public function queryNorm($sumOfSquaredWeights) {
        return 1.0/sqrt($sumOfSquaredWeights);
    }

    public function tf($freq) {
        return sqrt($freq);
    }

    /**
     * It's not used now. Computes the amount of a sloppy phrase match,
     * based on an edit distance.
     */
    public function sloppyFreq($distance) {
        return 1.0;
    }

    public function idfFreq($docFreq, $numDocs) {
        return log($numDocs/(float)($docFreq+1)) + 1.0;
    }

    public function coord($overlap, $maxOverlap) {
        return $overlap/(float)$maxOverlap;
    }
}

$mySimilarity = new MySimilarity();
ZSearchSimilarity::setDefault($mySimilarity);
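Because lengthNorm() receives the field name, a custom class can treat fields differently. The following is a minimal sketch of that idea, assuming a field named 'title' whose length should not affect the score. SketchSimilarity is a hypothetical stand-in for ZSearchSimilarity so that the example runs without the library. Its getDefault() method is likewise an assumption, not a documented ZSearch call, and only the two methods needed here are declared.

```php
<?php
// Hypothetical stand-in for the library base class (assumption, not ZSearch).
abstract class SketchSimilarity
{
    private static $default = null;

    public static function setDefault(SketchSimilarity $similarity)
    {
        self::$default = $similarity;
    }

    // Assumed accessor, used here only to show the registered instance.
    public static function getDefault()
    {
        return self::$default;
    }

    abstract public function lengthNorm($fieldName, $numTerms);
    abstract public function coord($overlap, $maxOverlap);
}

class TitleAwareSimilarity extends SketchSimilarity
{
    public function lengthNorm($fieldName, $numTerms)
    {
        // 'title' is a hypothetical field name: ignore its length entirely.
        if ($fieldName === 'title') {
            return 1.0;
        }
        // Other fields keep the 1/sqrt(n) normalization; empty fields get 0.
        return $numTerms > 0 ? 1.0 / sqrt($numTerms) : 0.0;
    }

    public function coord($overlap, $maxOverlap)
    {
        return $overlap / (float)$maxOverlap;
    }
}

SketchSimilarity::setDefault(new TitleAwareSimilarity());
$similarity = SketchSimilarity::getDefault();
```

Returning a constant for one field trades length sensitivity for predictability, which may suit short fields where every term is equally significant.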