agarie/measurable

lib/measurable/kullback_leibler.rb

module Measurable
  module KullbackLeibler

    # call-seq:
    #     kullback_leibler(p, q) -> Float
    #
    # The Kullback-Leibler divergence between the distributions +p+ and +q+ is
    # a measure of their dissimilarity. It is not symmetric and does not obey
    # the triangle inequality, so it is not a metric.
    #
    # It is calculated as follows:
    #
    #   KL(p, q) = \sum_{i = 1}^{N} p[i] * log(p[i] / q[i])
    #
    # With distributions +p+ and +q+ represented as vectors of N elements
    # summing to 1.0.
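    #
    # As a rough worked example (the input values are made up for
    # illustration, and log is the natural logarithm, as in Math.log),
    # take p = [0.5, 0.5] and q = [0.25, 0.75]:
    #
    #   KL(p, q) = 0.5 * log(0.5 / 0.25) + 0.5 * log(0.5 / 0.75)
    #            ~ 0.3466 - 0.2027
    #            ~ 0.1438
    #
    # Swapping the arguments gives KL(q, p) ~ 0.1308, which illustrates the
    # asymmetry.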
    #
    # References:
    # - http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
    # - Christopher D. Manning and Hinrich Schütze. Foundations of Statistical
    #   Natural Language Processing.
    #
    # Arguments:
    # - +p+ -> A probability distribution represented by an n-element Array.
    # - +q+ -> A probability distribution represented by an n-element Array.
    # Returns:
    # - The divergence of +p+ from +q+ as a Float, in nats (Math.log is the
    #   natural logarithm).
    def kullback_leibler(p, q)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if p.size != q.size

      # Sum p[i] * log(p[i] / q[i]) over the paired elements of p and q.
      p.zip(q).reduce(0.0) do |acc, (p_i, q_i)|
        acc + p_i * Math.log(p_i / q_i)
      end
    end
  end

  extend Measurable::KullbackLeibler
end
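
The following is a minimal, self-contained sketch of how the computation above behaves on sample input. The helper name `kl_divergence` and the input distributions are hypothetical and not part of the gem. The sketch also skips terms where `p[i]` is zero, following the common convention that `0 * log(0) = 0`. That is an assumption added here, not something the method above does.

```ruby
# Hypothetical standalone helper mirroring the formula documented above.
# It is an illustration only, not the gem's implementation.
def kl_divergence(p, q)
  raise ArgumentError, "p and q must have the same size" if p.size != q.size

  p.zip(q).reduce(0.0) do |acc, (p_i, q_i)|
    # Assumption: a zero-probability entry in p contributes nothing,
    # following the usual 0 * log(0) = 0 convention.
    next acc if p_i.zero?

    acc + p_i * Math.log(p_i / q_i)
  end
end

# Made-up example distributions; each sums to 1.0.
p_dist = [0.5, 0.5]
q_dist = [0.25, 0.75]

forward  = kl_divergence(p_dist, q_dist) # divergence of p from q
backward = kl_divergence(q_dist, p_dist) # divergence of q from p

puts format("KL(p, q) = %.4f", forward)
puts format("KL(q, p) = %.4f", backward)
```

The two printed values differ, which is the asymmetry that keeps the divergence from being a metric. Passing arrays of different sizes raises `ArgumentError`, matching the size check in the method above.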