chrislit/abydos

View on GitHub
README.rst

Summary

Maintainability
Test Coverage
Abydos
======

+------------------+------------------------------------------------------+
| CI & Test Status | |travis| |circle| |azure| |semaphore| |coveralls|    |
+------------------+------------------------------------------------------+
| Code Quality     | |codeclimate| |scrutinizer| |codacy| |codefactor|    |
+------------------+------------------------------------------------------+
| Dependencies     | |requires| |snyk| |pyup| |cii| |black|               |
+------------------+------------------------------------------------------+
| Local Analysis   | |pylint| |flake8| |pydocstyle| |sloccount| |mypy|    |
+------------------+------------------------------------------------------+
| Usage            | |docs| |mybinder| |license| |sourcerank| |zenodo|    |
+------------------+------------------------------------------------------+
| Contribution     | |openhub| |gh-commits| |gh-issues| |gh-stars|        |
+------------------+------------------------------------------------------+
| PyPI             | |pypi| |pypi-dl| |pypi-ver|                          |
+------------------+------------------------------------------------------+
| conda-forge      | |conda| |conda-dl| |conda-platforms|                 |
+------------------+------------------------------------------------------+

.. |travis| image:: https://travis-ci.org/chrislit/abydos.svg?branch=master
    :target: https://travis-ci.org/chrislit/abydos
    :alt: Travis-CI Build Status

.. |circle| image:: https://circleci.com/gh/chrislit/abydos/tree/master.svg?style=shield
    :target: https://circleci.com/gh/chrislit/abydos/tree/master
    :alt: Circle-CI Build Status

.. |azure| image:: https://dev.azure.com/chrislit/abydos/_apis/build/status/chrislit.abydos?branchName=master
    :target: https://dev.azure.com/chrislit/abydos/_build/latest?definitionId=1
    :alt: Azure Pipelines Build Status

.. |semaphore| image:: https://semaphoreci.com/api/v1/chrislit/abydos/branches/master/shields_badge.svg
    :target: https://semaphoreci.com/chrislit/abydos
    :alt: Semaphore Build Status

.. |coveralls| image:: https://coveralls.io/repos/github/chrislit/abydos/badge.svg?branch=master
    :target: https://coveralls.io/github/chrislit/abydos?branch=master
    :alt: Coverage Status

.. |codeclimate| image:: https://codeclimate.com/github/chrislit/abydos/badges/gpa.svg
    :target: https://codeclimate.com/github/chrislit/abydos
    :alt: Code Climate

.. |scrutinizer| image:: https://scrutinizer-ci.com/g/chrislit/abydos/badges/quality-score.png?b=master
    :target: https://scrutinizer-ci.com/g/chrislit/abydos/?branch=master
    :alt: Scrutinizer

.. |codacy| image:: https://api.codacy.com/project/badge/Grade/db79f2c31ea142fb9b5938abe87b0854
    :target: https://www.codacy.com/app/chrislit/abydos?utm_source=github.com&utm_medium=referral&utm_content=chrislit/abydos&utm_campaign=Badge_Grade
    :alt: Codacy

.. |codefactor| image:: https://www.codefactor.io/repository/github/chrislit/abydos/badge
    :target: https://www.codefactor.io/repository/github/chrislit/abydos
    :alt: CodeFactor

.. |requires| image:: https://requires.io/github/chrislit/abydos/requirements.svg?branch=master
    :target: https://requires.io/github/chrislit/abydos/requirements/?branch=master
    :alt: Requirements Status

.. |snyk| image:: https://snyk.io/test/github/chrislit/abydos/badge.svg?targetFile=requirements.txt
    :target: https://snyk.io/test/github/chrislit/abydos?targetFile=requirements.txt
    :alt: Known Vulnerabilities

.. |pyup| image:: https://pyup.io/repos/github/chrislit/abydos/shield.svg
    :target: https://pyup.io/repos/github/chrislit/abydos/
    :alt: Updates

.. |cii| image:: https://bestpractices.coreinfrastructure.org/projects/1598/badge
    :target: https://bestpractices.coreinfrastructure.org/projects/1598
    :alt: CII Best Practices

.. |black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black
    :alt: black

.. |pylint| image:: https://img.shields.io/badge/Pylint-9.13/10-yellowgreen.svg
    :target: #
    :alt: Pylint Score

.. |flake8| image:: https://img.shields.io/badge/flake8-0-brightgreen.svg
    :target: #
    :alt: flake8 Errors

.. |pydocstyle| image:: https://img.shields.io/badge/pydocstyle-0-brightgreen.svg
    :target: #
    :alt: pydocstyle Errors

.. |sloccount| image:: https://img.shields.io/badge/SLOCCount-40,079-blue.svg
    :target: #
    :alt: SLOCCount

.. |mypy| image:: https://img.shields.io/badge/mypy-1.87%25%20imprecise-1F5082.svg
    :target: #
    :alt: mypy Imprecision

.. |docs| image:: https://readthedocs.org/projects/abydos/badge/?version=latest
    :target: https://abydos.readthedocs.org/en/latest/
    :alt: Documentation Status

.. |mybinder| image:: https://img.shields.io/badge/launch-binder-579aca.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFkAAABZCAMAAABi1XidAAAB8lBMVEX///9XmsrmZYH1olJXmsr1olJXmsrmZYH1olJXmsr1olJXmsrmZYH1olL1olJXmsr1olJXmsrmZYH1olL1olJXmsrmZYH1olJXmsr1olL1olJXmsrmZYH1olL1olJXmsrmZYH1olL1olL0nFf1olJXmsrmZYH1olJXmsq8dZb1olJXmsrmZYH1olJXmspXmspXmsr1olL1olJXmsrmZYH1olJXmsr1olL1olJXmsrmZYH1olL1olLeaIVXmsrmZYH1olL1olL1olJXmsrmZYH1olLna31Xmsr1olJXmsr1olJXmsrmZYH1olLqoVr1olJXmsr1olJXmsrmZYH1olL1olKkfaPobXvviGabgadXmsqThKuofKHmZ4Dobnr1olJXmsr1olJXmspXmsr1olJXmsrfZ4TuhWn1olL1olJXmsqBi7X1olJXmspZmslbmMhbmsdemsVfl8ZgmsNim8Jpk8F0m7R4m7F5nLB6jbh7jbiDirOEibOGnKaMhq+PnaCVg6qWg6qegKaff6WhnpKofKGtnomxeZy3noG6dZi+n3vCcpPDcpPGn3bLb4/Mb47UbIrVa4rYoGjdaIbeaIXhoWHmZYHobXvpcHjqdHXreHLroVrsfG/uhGnuh2bwj2Hxk17yl1vzmljzm1j0nlX1olL3AJXWAAAAbXRSTlMAEBAQHx8gICAuLjAwMDw9PUBAQEpQUFBXV1hgYGBkcHBwcXl8gICAgoiIkJCQlJicnJ2goKCmqK+wsLC4usDAwMjP0NDQ1NbW3Nzg4ODi5+3v8PDw8/T09PX29vb39/f5+fr7+/z8/Pz9/v7+zczCxgAABC5JREFUeAHN1ul3k0UUBvCb1CTVpmpaitAGSLSpSuKCLWpbTKNJFGlcSMAFF63iUmRccNG6gLbuxkXU66JAUef/9LSpmXnyLr3T5AO/rzl5zj137p136BISy44fKJXuGN/d19PUfYeO67Znqtf2KH33Id1psXoFdW30sPZ1sMvs2D060AHqws4FHeJojLZqnw53cmfvg+XR8mC0OEjuxrXEkX5ydeVJLVIlV0e10PXk5k7dYeHu7Cj1j+49uKg7uLU61tGLw1lq27ugQYlclHC4bgv7VQ+TAyj5Zc/UjsPvs1sd5cWryWObtvWT2EPa4rtnWW3JkpjggEpbOsPr7F7EyNewtpBIslA7p43HCsnwooXTEc3UmPmCNn5lrqTJxy6nRmcavGZVt/3Da2pD5NHvsOHJCrdc1G2r3DITpU7yic7w/7Rxnjc0kt5GC4djiv2Sz3Fb2iEZg41/ddsFDoyuYrIkmFehz0HR2thPgQqMyQYb2OtB0WxsZ3BeG3+wpRb1vzl2UYBog8FfGhttFKjtAclnZYrRo9ryG9uG/FZQU4AEg8ZE9LjGMzTmqKXPLnlWVnIlQQTvxJf8ip7VgjZjyVPrjw1te5otM7RmP7xm+sK2Gv9I8Gi++BRbEkR9EBw8zRUcKxwp73xkaLiqQb+kGduJTNHG72zcW9LoJgqQxpP3/Tj//c3yB0tqzaml05/+orHLksVO+95kX7/7qgJvnjlrfr2Ggsyx0eoy9uPzN5SPd86aXggOsEKW2Prz7du3VID3/tzs/sSRs2w7ovVHKtjrX2pd7ZMlTxAYfBAL9jiDwfLkq55Tm7ifhMlTGPyCAs7RFRhn47JnlcB9RM5T97ASuZXIcVNuUDIndpDbdsfrqsOppeXl5Y+XVKdjFCTh+zGaVuj0d9zy05PPK3QzBamxdwtTCrzyg/2Rvf2EstUjordGwa/kx9mSJLr8mLLtCW8HHGJc2R5hS219IiF6PnTusOqcMl57gm0Z8kanKMAQg0qSyuZfn7zItsbGyO9QlnxY0eCuD1XL2ys/MsrQhltE7Ug0uFOzufJFE2PxBo/YAx8XPPdDwWN0MrDRYIZF0mSMKCNHgaIVFoBbNoLJ7tEQDKxGF0kcLQimojCZopv0OkNOyWCCg9XMVAi7ARJzQdM2QUh0gmBozjc3Skg6dSBRqDGYSUOu66Zg+I2fNZs/M3/f/Grl/XnyF1Gw3VKCez0PN5IUfFLqvgUN4C0qNqYs5YhPL+aVZYDE4IpUk57oSFnJm4FyCqqOE0jhY2SMyLFoo56zyo6becOS5UVDdj7Vih0zp+tcMhwRpBeLyqtIjlJKAIZSbI8SGSF3k0pA3mR5tHuwPFoa7N7reoq2bqCsAk1HqCu5uvI1n6JuRXI+S1Mco54YmYTwcn6Aeic+kssXi8XpXC4V3t7/ADuTNKaQJdScAAAAAElFTkSuQmCC
    :target: https://mybinder.org/v2/gh/chrislit/abydos/master?filepath=binder
    :alt: Binder

.. |license| image:: https://img.shields.io/badge/License-GPL%20v3+-blue.svg?logo=gnu
    :target: https://www.gnu.org/licenses/gpl-3.0
    :alt: License: GPL v3.0+

.. |sourcerank| image:: https://img.shields.io/librariesio/sourcerank/pypi/abydos.svg
    :target: https://libraries.io/pypi/abydos
    :alt: Libraries.io SourceRank

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3603514.svg
    :target: https://doi.org/10.5281/zenodo.3603514
    :alt: Zenodo

.. |openhub| image:: https://www.openhub.net/p/abydosnlp/widgets/project_thin_badge.gif
    :target: https://www.openhub.net/p/abydosnlp
    :alt: OpenHUB

.. |gh-commits| image:: https://img.shields.io/github/commit-activity/y/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/graphs/commit-activity
    :alt: GitHub Commits

.. |gh-issues| image:: https://img.shields.io/github/issues-closed/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/issues?q=
    :alt: GitHub Issues Closed

.. |gh-stars| image:: https://img.shields.io/github/stars/chrislit/abydos.svg?logo=github
    :target: https://github.com/chrislit/abydos/stargazers
    :alt: GitHub Stars

.. |pypi| image:: https://img.shields.io/pypi/v/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI

.. |pypi-dl| image:: https://img.shields.io/pypi/dm/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI downloads/month

.. |pypi-ver| image:: https://img.shields.io/pypi/pyversions/abydos.svg?logo=python&logoColor=white
    :target: https://pypi.python.org/pypi/abydos
    :alt: PyPI versions

.. |conda| image:: https://img.shields.io/conda/vn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge

.. |conda-dl| image::     https://img.shields.io/conda/dn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge downloads

.. |conda-platforms| image:: https://img.shields.io/conda/pn/conda-forge/abydos.svg?logo=conda-forge
    :target: https://anaconda.org/conda-forge/abydos
    :alt: conda-forge platforms

|

.. image:: https://raw.githubusercontent.com/chrislit/abydos/master/abydos-small.png
    :target: https://github.com/chrislit/abydos
    :alt: abydos
    :align: right

|
| `Abydos NLP/IR library <https://github.com/chrislit/abydos>`_
| Copyright 2014-2020 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics,
stemmers, and string fingerprinters including:

- Phonetic algorithms
    - Robert C. Russell's Index
    - American Soundex
    - Refined Soundex
    - Daitch-Mokotoff Soundex
    - Kölner Phonetik
    - NYSIIS
    - Match Rating Algorithm
    - Metaphone
    - Double Metaphone
    - Caverphone
    - Alpha Search Inquiry System
    - Fuzzy Soundex
    - Phonex
    - Phonem
    - Phonix
    - SfinxBis
    - phonet
    - Standardized Phonetic Frequency Code
    - Statistics Canada
    - Lein
    - Roger Root
    - Oxford Name Compression Algorithm (ONCA)
    - Eudex phonetic hash
    - Haase Phonetik
    - Reth-Schek Phonetik
    - FONEM
    - Parmar-Kumbharana
    - Davidson's Consonant Code
    - SoundD
    - PSHP Soundex/Viewex Coding
    - an early version of Henry Code
    - Norphone
    - Dolby Code
    - Phonetic Spanish
    - Spanish Metaphone
    - MetaSoundex
    - SoundexBR
    - NRL English-to-phoneme
    - Beider-Morse Phonetic Matching

- String distance metrics
    - Levenshtein distance
    - Optimal String Alignment distance
    - Levenshtein-Damerau distance
    - Hamming distance
    - Tversky index
    - Sørensen–Dice coefficient & distance
    - Jaccard similarity coefficient & distance
    - overlap similarity & distance
    - Tanimoto coefficient & distance
    - Minkowski distance & similarity
    - Manhattan distance & similarity
    - Euclidean distance & similarity
    - Chebyshev distance
    - cosine similarity & distance
    - Jaro distance
    - Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
    - Longest common substring
    - Ratcliff-Obershelp similarity & distance
    - Match Rating Algorithm similarity
    - Normalized Compression Distance (NCD) & similarity
    - Monge-Elkan similarity & distance
    - Matrix similarity
    - Needleman-Wunsch score
    - Smith-Waterman score
    - Gotoh score
    - Length similarity
    - Prefix, Suffix, and Identity similarity & distance
    - Modified Language-Independent Product Name Search (MLIPNS) similarity &
      distance
    - Bag distance
    - Editex distance
    - Eudex distances
    - Sift4 distance
    - Baystat distance & similarity
    - Typo distance
    - Indel distance
    - Synoname

- Stemmers
    - the Lovins stemmer
    - the Porter and Porter2 (Snowball English) stemmers
    - Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    - CLEF German, German plus, and Swedish stemmers
    - Caumann's German stemmer
    - UEA-Lite Stemmer
    - Paice-Husk Stemmer
    - Schinke Latin stemmer
    - S stemmer

- String Fingerprints
    - string fingerprint
    - q-gram fingerprint
    - phonetic fingerprint
    - Pollock & Zomora's skeleton key
    - Pollock & Zomora's omission key
    - Cisłak & Grabowski's occurrence fingerprint
    - Cisłak & Grabowski's occurrence halved fingerprint
    - Cisłak & Grabowski's count fingerprint
    - Cisłak & Grabowski's position fingerprint
    - Synoname Toolcode


-----

Installation
============

Required libraries:

- NumPy
- deprecation

Optional libraries (all available on PyPI, some available on conda or
conda-forge):

- `SyllabiPy <http://syllabipy.com/>`_
- `NLTK <https://www.nltk.org/>`_
- `PyLZSS <https://github.com/rumbah/pylzss>`_
- `paq <https://github.com/observerss/paq>`_


To install Abydos (master) from Github source::

   git clone https://github.com/chrislit/abydos.git --recursive
   cd abydos
   python setup install

If your default python command calls Python 2.7 but you want to install for
Python 3, you may instead need to call::

   python3 setup install


To install Abydos (latest release) from PyPI using pip::

   pip install abydos

To install from `conda-forge <https://anaconda.org/conda-forge/abydos>`_::

   conda install abydos

It should run on Python 3.5-3.8.

Testing & Contributing
======================

To run the whole test-suite just call tox::

    tox

The tox setup has the following environments: black, py37, doctest,
regression, fuzz, pylint, pydocstyle, flake8, doc8, docs, sloccount, badges, &
build. So if you only want to generate documentation (in HTML, EPUB, & PDF
formats), just call::

    tox -e docs

In order to only run & generate Flake8 reports, call::

    tox -e flake8

Contributions such as bug reports, PRs, suggestions, desired new features, etc.
are welcome through Github
`Issues <https://github.com/chrislit/abydos/issues>`_ &
`Pull requests <https://github.com/chrislit/abydos/pulls>`_.