jpmckinney/tf-idf-similarity

# Changelog

## v0.3.0 (2024-02-26)

### Added

- Add support for [numo](https://rubygems.org/gems/numo) matrix library. @yagince @srapilly

### Changed

- Drop support for Ruby versions less than 2.4.

### Fixed

- Fix a bug in the `term_frequency` method of the `BM25Model` class that was caused by a typographical error (`documents.size` instead of `document.size`).
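The fix matters because BM25 normalizes a term's count by the length of the individual document relative to the corpus average. Using the number of documents there makes the weight ignore each document's length. Below is a standalone sketch of the standard BM25 term-frequency component. It is not the gem's code. It assumes the common defaults k1 = 1.2 and b = 0.75, and the helper name `bm25_term_frequency` is hypothetical.

```ruby
# Standalone sketch of BM25's term-frequency component (not the gem's code).
# Assumes the common default parameters k1 = 1.2 and b = 0.75.
K1 = 1.2
B = 0.75

# tf:           raw count of the term in this document
# doc_size:     number of terms in this document (what `document.size` gives)
# avg_doc_size: average number of terms per document in the corpus
def bm25_term_frequency(tf, doc_size, avg_doc_size)
  (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * doc_size.to_f / avg_doc_size))
end

sizes = [10, 20, 30]                 # lengths of three hypothetical documents
avg = sizes.sum.to_f / sizes.size    # 20.0

# Weight of a term that occurs twice in the first (shortest) document.
correct = bm25_term_frequency(2, sizes[0], avg)   # ≈ 1.6
# The bug effectively passed the number of documents (3) instead of the document's length.
buggy = bm25_term_frequency(2, sizes.size, avg)   # ≈ 1.81
```

With the typo, a short document and a long document containing the same term equally often would receive the same weight.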

## v0.2.0 (2019-12-19)

### Added

- Add a `tokenizer` option to the `Document` class. @satoryu

  The value is an object with a `tokenize` method that accepts a string and returns an array of `Token` instances. 

  For example, to use [natto](https://rubygems.org/gems/natto) instead of [unicode_utils](https://rubygems.org/gems/unicode_utils) for Japanese, install MeCab (`brew install mecab`), and then:

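  ```ruby
  require 'natto'
  require 'tf-idf-similarity'

  # Segments text into morphemes with MeCab (via natto).
  class Tokenizer
    def initialize
      @nm = Natto::MeCab.new
    end

    # Returns one TfIdfSimilarity::Token per morpheme, skipping the end-of-sentence node.
    def tokenize(text)
      @nm.enum_parse(text).reject(&:is_eos?).map do |node|
        TfIdfSimilarity::Token.new(node.surface)
      end
    end
  end

  tokenizer = Tokenizer.new
  document = TfIdfSimilarity::Document.new("こんにちは世界", tokenizer: tokenizer)
  ```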

- Add a `to_s` method to the `Token` class, to use less memory than chaining `lowercase_filter` with `classic_filter`. @satoryu

## v0.1.6 (2017-03-07)

### Changed

- Add support for recent RubyGems and Ruby versions (`require 'delegate'`). @diasks2
- Drop support for Ruby 1.9.3.

## v0.1.5 (2016-01-17)

### Changed

- Update the `classic_filter` method of the `Token` class to remove possessives when the apostrophe is a backtick (\`) or a right single quotation mark (’). @diasks2
- Drop support for Ruby 1.9.2.
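The possessive handling described above can be illustrated with a small standalone helper. It accepts any of the three apostrophe characters before a trailing "s". This is a minimal sketch: `strip_possessive` and the `POSSESSIVE` pattern are hypothetical and are not the gem's `classic_filter` implementation.

```ruby
# Hypothetical helper illustrating the v0.1.5 change; not the gem's classic_filter.
# Removes a trailing possessive "s" preceded by a straight apostrophe ('),
# a backtick (`), or a right single quotation mark (’).
POSSESSIVE = /['`’][sS]\z/

def strip_possessive(word)
  word.sub(POSSESSIVE, "")
end

%w[John's John`s John’s Johns].each do |word|
  puts "#{word} -> #{strip_possessive(word)}"
end
```

This prints `John` for the first three words and leaves `Johns` unchanged, because no apostrophe precedes its final "s".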

## v0.1.4 (2014-10-10)

### Added

- Add the `document_index` and `text_index` methods to the `Model` class and its subclasses.

### Changed

- Extract logic from the `BM25Model` and `TfIdfModel` classes to a new `Model` class.
- Drop support for Ruby 1.8.7.
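The v0.1.4 design, in which shared logic lives in a `Model` base class that its subclasses inherit, can be sketched in a few lines. This is a minimal, hypothetical illustration. The classes are wrapped in a `Sketch` module, `Doc` stands in for a document object, and the bodies of `document_index`, `text_index` and `term_frequency` are assumptions (simple array searches and an illustrative weighting), not the gem's code.

```ruby
module Sketch
  # Stand-in for a document object: just its text.
  Doc = Struct.new(:text)

  # Base class holding logic shared by every model.
  class Model
    def initialize(documents)
      @documents = documents
    end

    # Position of this exact document object in the corpus, or nil.
    def document_index(document)
      @documents.index { |d| d.equal?(document) }
    end

    # Position of the first document whose text matches, or nil.
    def text_index(text)
      @documents.index { |d| d.text == text }
    end

    # Each subclass supplies its own term weighting.
    def term_frequency(count, length)
      raise NotImplementedError, "#{self.class} must implement term_frequency"
    end
  end

  class TfIdfModel < Model
    # Illustrative weighting only: square-root-damped raw count.
    def term_frequency(count, _length)
      Math.sqrt(count)
    end
  end
end

docs = [Sketch::Doc.new("a b"), Sketch::Doc.new("b c")]
model = Sketch::TfIdfModel.new(docs)
model.document_index(docs[1])  # => 1
model.text_index("a b")        # => 0
```

The subclass defines only its weighting and inherits the index lookups, which is why methods added to the base class become available on every model.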

## v0.1.3 (2014-04-12)

### Changed

- Load only the required methods from the `unicode_utils` gem, to use less memory.

## v0.1.2 (2014-03-30)

### Fixed

- Install the `unicode_utils` gem only on Ruby versions greater than 1.8.

## v0.1.1 (2014-03-28)

### Changed

- Remove the `:function` option from the `TfIdfModel` class. Use the `BM25Model` class instead.

## v0.1.0 (2013-06-02)

Major refactor of v0.0.x.