# NLP Pure
[![Gem Version](https://badge.fury.io/rb/nlp-pure.svg)](https://badge.fury.io/rb/nlp-pure) [![Code Climate](https://codeclimate.com/github/parhamr/nlp-pure/badges/gpa.svg)](https://codeclimate.com/github/parhamr/nlp-pure)
[![Build Status](https://travis-ci.org/parhamr/nlp-pure.svg?branch=master)](https://travis-ci.org/parhamr/nlp-pure)
[![Coverage Status](https://coveralls.io/repos/github/parhamr/nlp-pure/badge.svg?branch=master)](https://coveralls.io/github/parhamr/nlp-pure?branch=master)
Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
NOTE: this is not affiliated with, endorsed by, or in any way connected with [Pure NLP](http://purenlp.com/), a trademark of John La Valle.
This project aims to provide functionality similar to [Treat](https://github.com/louismullie/treat), [open-nlp](https://github.com/louismullie/open-nlp), and [stanford-core-nlp](https://rubygems.org/gems/stanford-core-nlp) but with fewer dependencies. The code is tested against English-language text, but the algorithm implementations aim to be flexible enough for other languages.
## Table of Contents
* [Installation](#installation)
* [Usage](#usage)
  * [Word Segmentation](#word-segmentation)
  * [Sentence Segmentation](#sentence-segmentation)
* [Supported Ruby Versions](#supported-ruby-versions)
* [Versioning](#versioning)
* [Contributing](CONTRIBUTING.md)
* [License](LICENSE)
* [See Also](#see-also)
## Installation
Add this line to your application’s Gemfile:
```ruby
gem 'nlp-pure'
```
And then execute:
```
$ bundle
```
Or install it yourself as:
```
$ gem install nlp-pure
```
## Usage
Simply require a library file and start using its interfaces! To preserve modularity and a small installation footprint, classes and modules are not recursively loaded up front.
### Word Segmentation
```
$ bundle exec irb
irb(main):001:0> require 'nlp_pure/segmenting/default_word'
=> true
irb(main):002:0> NlpPure::Segmenting::DefaultWord.parse 'The quick brown fox jumps over the lazy dog.'
=> ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog."]
irb(main):003:0> NlpPure::Segmenting::DefaultWord.parse 'The New York-based company hired new staff.'
=> ["The", "New", "York", "based", "company", "hired", "new", "staff."]
irb(main):004:0> NlpPure::Segmenting::DefaultWord.parse 'The U.S.A. is a member of NATO.'
=> ["The", "U.S.A.", "is", "a", "member", "of", "NATO."]
irb(main):005:0> NlpPure::Segmenting::DefaultWord.parse "Mary had a little lamb,\nHis fleece was white as snow,\nAnd everywhere that Mary went,\nThe lamb was sure to go."
=> ["Mary", "had", "a", "little", "lamb,", "His", "fleece", "was", "white", "as", "snow,", "And", "everywhere", "that", "Mary", "went,", "The", "lamb", "was", "sure", "to", "go."]
```
### Sentence Segmentation
```
$ bundle exec irb
irb(main):001:0> require 'nlp_pure/segmenting/default_sentence'
=> true
irb(main):002:0> NlpPure::Segmenting::DefaultSentence.parse 'The U.S.A. is a member of NATO.'
=> ["The U.S.A. is a member of NATO."]
irb(main):003:0> NlpPure::Segmenting::DefaultSentence.parse 'Mary had a little lamb. The lamb’s fleece was white as snow. Everywhere that Mary went, the lamb was sure to go.'
=> ["Mary had a little lamb.", "The lamb’s fleece was white as snow.", "Everywhere that Mary went, the lamb was sure to go."]
irb(main):004:0> NlpPure::Segmenting::DefaultSentence.parse 'I am excited! Today is Friday.'
=> ["I am excited!", "Today is Friday."]
```
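The two segmenters can also be chained in a script: split text into sentences, then split each sentence into words. The following is only a sketch. `tokenize_by_sentence` is a hypothetical helper that is not part of nlp-pure, and it assumes nothing about the segmenters beyond the `parse` class method shown in the sessions above.

```ruby
# Hypothetical helper (not part of nlp-pure): returns one array of words
# per sentence. Both segmenters only need to respond to `parse(text)` and
# return an array of strings, as the classes shown above do.
def tokenize_by_sentence(text, sentence_segmenter:, word_segmenter:)
  sentence_segmenter.parse(text).map do |sentence|
    word_segmenter.parse(sentence)
  end
end

# With the gem installed, the segmenters from this README can be passed in:
#
#   require 'nlp_pure/segmenting/default_sentence'
#   require 'nlp_pure/segmenting/default_word'
#
#   tokenize_by_sentence(
#     'I am excited! Today is Friday.',
#     sentence_segmenter: NlpPure::Segmenting::DefaultSentence,
#     word_segmenter: NlpPure::Segmenting::DefaultWord
#   )
```

Passing the segmenters as arguments keeps the helper independent of which library files have been required, in keeping with the modular loading described under Usage.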
## Supported Ruby Versions
This library aims to support and is [tested against](https://travis-ci.org/parhamr/nlp-pure) the following Ruby
implementations:
* Ruby 2.2
* Ruby 2.3
* Ruby 2.4
* [JRuby](http://www.jruby.org/)
* [Rubinius](http://rubini.us/)
If something doesn't work on one of these interpreters, it's a bug.
This library may inadvertently work (or seem to work) on other Ruby
implementations; however, support is only provided for the versions listed
above.
## Versioning
This library aims to adhere to [Semantic Versioning 2.0.0](http://semver.org/). Violations
of this scheme should be reported as bugs. Specifically, if a minor or patch
version is released that breaks backward compatibility, that version should be
immediately yanked and/or a new version should be immediately released that
restores compatibility. Breaking changes to the public API will only be
introduced with new major versions. As a result of this policy, you can (and
should) specify a dependency on this gem using the [Pessimistic Version
Constraint](http://docs.rubygems.org/read/chapter/16#page74) with two digits of precision. For example:
```ruby
spec.add_dependency 'nlp-pure', '~> 0.1'
```
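To see which releases a constraint such as `~> 0.1` admits, RubyGems' own `Gem::Requirement` and `Gem::Version` classes can be used to check it. The version numbers below are illustrative only and do not necessarily correspond to published releases of this gem.

```ruby
require 'rubygems' # loaded by default in modern Ruby; explicit for clarity

# The same pessimistic constraint used in the gemspec line above.
requirement = Gem::Requirement.new('~> 0.1')

# Illustrative version numbers, not actual nlp-pure releases.
%w[0.1.0 0.1.5 0.9.3 1.0.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```

With two digits of precision, the constraint accepts any version from 0.1 up to, but not including, 1.0, so only the last version in the list is rejected.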
## See Also
[Search “nlp” at ruby-toolbox.com](https://www.ruby-toolbox.com/search?q=nlp)
* APIs
  * [alchemy_api](https://github.com/dbalatero/alchemy_api)
  * [napi-ruby](https://github.com/Maluuba/napi-ruby)
  * [poliqarpr](https://github.com/apohllo/poliqarpr)
  * [wlapi](https://github.com/arbox/wlapi)
* Bindings and Toolkits
  * [open-nlp](https://github.com/louismullie/open-nlp)
  * [stanford-core-nlp](https://github.com/louismullie/stanford-core-nlp)
  * [treat](https://github.com/louismullie/treat)
* Classification
  * [linnaeus](https://github.com/djcp/linnaeus)
  * [maxent_string_classifier](https://github.com/mccraigmccraig/maxent_string_classifier)
* N-Grams
  * [ruby-ngram](https://github.com/tkellen/ruby-ngram)
* Specific Languages
  * Polish
    * [nlp](https://github.com/knife/nlp)
* Stopwords
  * [clarifier](https://github.com/meducation/clarifier)
  * [stopwords](https://github.com/brez/stopwords)
  * [stopwords-filter](https://github.com/brenes/stopwords-filter)
* Tokenization
  * [rseg](https://rubygems.org/gems/rseg)
  * [Tokenizer](https://github.com/arbox/tokenizer)
* Word Counters
  * [words_counted](https://github.com/abitdodgy/words_counted)