# tldextract Changelog

After upgrading, update your cache file by deleting it or by running
`tldextract --update`.

## 5.1.2 (2024-03-18)

* Bugfixes
  * Remove `socket.inet_pton`, to fix platform-dependent IP parsing ([#318](https://github.com/john-kurkowski/tldextract/issues/318))
  * Use non-capturing groups for IPv4 address detection, for a slight speed boost ([#323](https://github.com/john-kurkowski/tldextract/issues/323))
* Misc.
  * Add CI for PyPy3.9 and PyPy3.10 ([#316](https://github.com/john-kurkowski/tldextract/issues/316))
  * Add script to automate package release process ([#325](https://github.com/john-kurkowski/tldextract/issues/325))
  * Update LICENSE copyright years

## 5.1.1 (2023-11-16)

* Bugfixes
  * Fix path join on Windows ([#314](https://github.com/john-kurkowski/tldextract/issues/314))
  * Support Python 3.12

## 5.1.0 (2023-11-05)

* Features
    * Allow passing in `requests.Session` ([#311](https://github.com/john-kurkowski/tldextract/issues/311))
    * Add "-j, --json" option to support output in JSON format ([#313](https://github.com/john-kurkowski/tldextract/issues/313))
* Docs
    * Improve clarity of absolute path ([#312](https://github.com/john-kurkowski/tldextract/issues/312))
* Misc.
    * Extract all testing deps from tox.ini to pyproject.toml extras ([#310](https://github.com/john-kurkowski/tldextract/issues/310))
    * Work around responses type union error, in tests

## 5.0.1 (2023-10-17)

* Bugfixes
    * Indicate MD5 not used in a security context (FIPS compliance) ([#309](https://github.com/john-kurkowski/tldextract/issues/309))
* Misc.
    * Increase typecheck aggression
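
For context on the FIPS note above, here is a minimal sketch of marking an MD5 digest as non-security use. It assumes Python 3.9 or newer, where `hashlib` accepts `usedforsecurity`; the `cache_key` helper is a hypothetical name, not part of this library.

```python
import hashlib


def cache_key(text: str) -> str:
    """Hash text into a short, stable identifier (not for security)."""
    # usedforsecurity=False tells hashlib the digest is only a fingerprint,
    # which lets FIPS-restricted builds permit MD5 for this call.
    return hashlib.md5(text.encode("utf-8"), usedforsecurity=False).hexdigest()
```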

## 5.0.0 (2023-10-11)

* Breaking Changes
    * Migrate `ExtractResult` from `namedtuple` to `dataclass` ([#306](https://github.com/john-kurkowski/tldextract/issues/306))
        * This means no more iterating/indexing/slicing/unpacking the result
          object returned by this library. It is no longer a tuple. You must
          directly reference the fields you're interested in.

          For example, the following will no longer work.
          ```python
          tldextract.extract("example.com")[1:3]
          # TypeError: 'ExtractResult' object is not subscriptable
          ```
          Instead, use the following.
          ```python
          ext = tldextract.extract("example.com")
          (ext.domain, ext.suffix)
          ```
* Bugfixes
    * Drop support for EOL Python 3.7
* Misc.
    * Switch from pycodestyle and Pylint to Ruff ([#304](https://github.com/john-kurkowski/tldextract/issues/304))
    * Consolidate config files
    * Type tests
    * Require docstrings in tests
    * Remove obsolete tests
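
For code that unpacked or sliced the old tuple, the 5.0.0 migration can be sketched as follows. `Result` is a hypothetical stand-in dataclass that only borrows the field names mentioned in this changelog, so the example runs without the library installed.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in with the field names this changelog mentions."""

    subdomain: str
    domain: str
    suffix: str
    is_private: bool = False


ext = Result(subdomain="www", domain="example", suffix="com")

# Tuple style (no longer possible): subdomain, domain, suffix = ext[:3]
# Dataclass style: reference each field by name.
subdomain, domain, suffix = ext.subdomain, ext.domain, ext.suffix

# If a plain tuple is still needed, the standard library can build one
# from any dataclass instance.
as_tuple = astuple(ext)
```

Naming the fields also keeps calling code stable if further fields are added later.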

## 4.0.0 (2023-10-11)

* **Breaking** bugfixes
    * Always include suffix if private suffix enabled and private suffix exists ([#300](https://github.com/john-kurkowski/tldextract/issues/300))
        * Add a 4th field `is_private: bool`, to the `ExtractResult`
          `namedtuple`, indicating whether the extraction came from the PSL's
          private domains or not.
        * **This could cause issues when iterating over the tuple and assuming
          only 3 fields.**
        * Previously, the docs promoted iteration to rejoin parts of the tuple.
          This is better achieved by individual access of fields of interest
          (e.g. `ExtractResult.subdomain`) or convenience properties (e.g.
          `ExtractResult.{fqdn,registered_domain}`).

This is the same content as version 3.6.0, originally released 2023-09-19,
which was yanked.
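
To show why the 4th field in 4.0.0 is a breaking change, here is a small sketch using hypothetical stand-in namedtuples (`OldResult`, `NewResult`) rather than the library's own type. Code that assumes exactly 3 fields fails, while access by field name works with both shapes.

```python
from collections import namedtuple

# Hypothetical stand-ins for the result shape before and after 4.0.0.
OldResult = namedtuple("OldResult", ["subdomain", "domain", "suffix"])
NewResult = namedtuple("NewResult", ["subdomain", "domain", "suffix", "is_private"])


def unpack_three(result):
    """Fragile: assumes the result has exactly 3 fields."""
    subdomain, domain, suffix = result
    return subdomain, domain, suffix


def join_parts(result) -> str:
    """Robust: names the fields of interest and skips empty ones."""
    parts = (result.subdomain, result.domain, result.suffix)
    return ".".join(part for part in parts if part)


old = OldResult("forums", "bbc", "co.uk")
new = NewResult("forums", "bbc", "co.uk", False)
```

Here `unpack_three(new)` raises `ValueError`, whereas `join_parts` returns the same string for `old` and `new`.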

## 3.5.0 (2023-09-06)

* Features
    * Support IPv6 addresses ([#298](https://github.com/john-kurkowski/tldextract/issues/298))
* Bugfixes
    * Accept only 4 decimal octet IPv4 addresses ([#292](https://github.com/john-kurkowski/tldextract/issues/292))
    * Support IPv4 addresses with unicode dots ([#292](https://github.com/john-kurkowski/tldextract/issues/292))
    * Reject IPv4 addresses followed by trailing whitespace and then non-whitespace characters ([#293](https://github.com/john-kurkowski/tldextract/issues/293))
* Misc.
    * Migrate setup.py to pyproject.toml ([#299](https://github.com/john-kurkowski/tldextract/issues/299))

## 3.4.4 (2023-05-19)

* Bugfixes
  * Honor private domains flag on `self`, not only when passed to `__call__` ([#289](https://github.com/john-kurkowski/tldextract/issues/289))

## 3.4.3 (2023-05-18)

* Bugfixes
  * Speed up 10-15% over all inputs
    * Refactor `suffix_index()` to use a trie ([#285](https://github.com/john-kurkowski/tldextract/issues/285))
* Docs
  * Adopt PEP257 doc style
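
The trie idea behind the 3.4.3 speedup can be sketched as below. This is a simplified illustration, not the library's `suffix_index()`: it keys the trie on reversed domain labels and ignores the PSL's wildcard and exception rules. `TrieNode`, `build_trie`, and `suffix_start` are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One label in a trie of suffixes, stored right to left."""

    children: dict[str, "TrieNode"] = field(default_factory=dict)
    is_suffix_end: bool = False


def build_trie(suffixes: list[str]) -> TrieNode:
    """Insert each suffix label by label, starting from the rightmost label."""
    root = TrieNode()
    for suffix in suffixes:
        node = root
        for label in reversed(suffix.split(".")):
            node = node.children.setdefault(label, TrieNode())
        node.is_suffix_end = True
    return root


def suffix_start(labels: list[str], root: TrieNode) -> int:
    """Return the index where the longest known suffix begins.

    Returns len(labels) if no suffix matches.
    """
    node = root
    start = len(labels)
    for i in range(len(labels) - 1, -1, -1):
        node = node.children.get(labels[i])
        if node is None:
            break
        if node.is_suffix_end:
            start = i
    return start
```

Each lookup walks at most one trie node per label, and stops as soon as a label is not part of any known suffix.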

## 3.4.2 (2023-05-16)

* Bugfixes
  * Speed up 10-40% on "average" inputs, and even more on pathological inputs, like long subdomains
    * Optimize `suffix_index()`: search from right to left ([#283](https://github.com/john-kurkowski/tldextract/issues/283))
    * Optimize netloc extraction: switch from regex to if/else ([#284](https://github.com/john-kurkowski/tldextract/issues/284))
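
As a rough picture of replacing a regex with plain string operations for netloc extraction, consider the sketch below. It is an assumption-laden simplification, not the library's code: `extract_netloc` is a hypothetical helper that assumes any `://` belongs to the scheme and does not handle unbracketed IPv6 hosts.

```python
def extract_netloc(url: str) -> str:
    """Pull the host out of a URL-like string using only string methods."""
    # Drop the scheme, if any.
    scheme_end = url.find("://")
    rest = url[scheme_end + 3:] if scheme_end != -1 else url
    # Cut at the first path, query, or fragment delimiter.
    for delimiter in "/?#":
        rest = rest.partition(delimiter)[0]
    # Drop userinfo, then a trailing numeric port.
    rest = rest.rpartition("@")[2]
    host, sep, port = rest.rpartition(":")
    if sep and port.isdigit():
        rest = host
    return rest
```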

## 3.4.1 (2023-04-26)

* Bugfixes
  * Fix Pyright not finding tldextract public interface ([#279](https://github.com/john-kurkowski/tldextract/issues/279))
  * Fix various Pyright checks
  * Use SPDX license identifier ([#280](https://github.com/john-kurkowski/tldextract/issues/280))
  * Support Python 3.11
* Docs
  * Add FAQ about private domains
* Misc.
  * Update bundled snapshot
  * Fix lint in newer pylint

## 3.4.0 (2022-10-04)

* Features
  * Add method `extract_urllib` to extract from a `urllib.parse.{ParseResult,SplitResult}` ([#274](https://github.com/john-kurkowski/tldextract/issues/274))
* Bugfixes
  * Fix internal type-var error, in newer versions of mypy ([#275](https://github.com/john-kurkowski/tldextract/issues/275))

## 3.3.1 (2022-07-08)

* Bugfixes
  * Fix documented types, in README and in exception message ([#265](https://github.com/john-kurkowski/tldextract/issues/265))
* Misc.
  * Format source code

## 3.3.0 (2022-05-04)

* Features
  * Add CLI flag `--suffix_list_url` to set the suffix list URL(s) or source file(s) ([#197](https://github.com/john-kurkowski/tldextract/issues/197))
  * Add CLI flag `--no_fallback_to_snapshot` to not fall back to the snapshot ([#260](https://github.com/john-kurkowski/tldextract/issues/260))
  * Add alias `--include_psl_private_domains` for CLI flag `--private_domains`
* Bugfixes
  * Handle more internationalized domain name dots ([#253](https://github.com/john-kurkowski/tldextract/issues/253))
* Misc.
  * Update bundled snapshot
  * Add basic CLI test coverage

## 3.2.1 (2022-04-11)

* Bugfixes
  * Fix incorrect namespace used for caching function returns ([#258](https://github.com/john-kurkowski/tldextract/issues/258))
  * Remove redundant encode ([`6e2c0e0`](https://github.com/john-kurkowski/tldextract/commit/6e2c0e0))
  * Remove redundant lowercase ([`226bfc2`](https://github.com/john-kurkowski/tldextract/commit/226bfc2))
  * Remove unused `try`/`except` path ([#255](https://github.com/john-kurkowski/tldextract/issues/255))
  * Add types to the private API (disallow untyped calls and defs) ([#256](https://github.com/john-kurkowski/tldextract/issues/256))
  * Rely on `python_requires` instead of runtime check ([#247](https://github.com/john-kurkowski/tldextract/issues/247))
* Docs
  * Fix docs with updated types
  * Fix link in Travis CI badge ([#248](https://github.com/john-kurkowski/tldextract/issues/248))
  * Rewrite documentation intro
  * Remove unnecessary subheading
  * Unify case

## 3.2.0 (2022-02-20)

* Features
    * Add types to the public API ([#244](https://github.com/john-kurkowski/tldextract/issues/244))
* Bugfixes
    * Add support for Python 3.10 ([#246](https://github.com/john-kurkowski/tldextract/issues/246))
    * Drop support for EOL Python 3.6 ([#246](https://github.com/john-kurkowski/tldextract/issues/246))
    * Remove py2 tag from wheel ([#245](https://github.com/john-kurkowski/tldextract/issues/245))
    * Remove extra backtick in README ([#240](https://github.com/john-kurkowski/tldextract/issues/240))

## 3.1.2 (2021-09-01)

* Misc.
    * Only run pylint in Tox environments, i.e. CI, not by default in tests ([#230](https://github.com/john-kurkowski/tldextract/issues/230))

## 3.1.1 (2021-08-27)

* Bugfixes
    * Support Python 3.9
    * Drop support for EOL Python 3.5

## 3.1.0 (2020-11-22)

* Features
    * Prefer to cache in XDG cache directory in user folder, vs. in Python install folder ([#213](https://github.com/john-kurkowski/tldextract/issues/213))
* Bugfixes
    * Fix `AttributeError` on `--update` ([#215](https://github.com/john-kurkowski/tldextract/issues/215))
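
For readers unfamiliar with the XDG convention, here is a minimal sketch of resolving a per-user cache directory. It follows the general XDG Base Directory rule (`$XDG_CACHE_HOME`, else `~/.cache`); the `user_cache_dir` helper and the directory name passed to it are assumptions, and the library's exact path may differ.

```python
import os
from pathlib import Path


def user_cache_dir(app_name: str) -> Path:
    """Resolve a per-user cache directory following the XDG convention."""
    xdg_cache_home = os.environ.get("XDG_CACHE_HOME")
    # Fall back to ~/.cache when the variable is unset or empty.
    base = Path(xdg_cache_home) if xdg_cache_home else Path.home() / ".cache"
    return base / app_name
```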

## 3.0.2 (2020-10-24)

* Bugfixes
    * Catch permission error when making cache dir, as well as cache file ([#211](https://github.com/john-kurkowski/tldextract/issues/211))

## 3.0.1 (2020-10-21)

* Bugfixes
    * Fix `tlds` property `AttributeError` ([#210](https://github.com/john-kurkowski/tldextract/issues/210))
    * Allow `include_psl_private_domains` in global `extract` too ([#210](https://github.com/john-kurkowski/tldextract/issues/210))

## 3.0.0 (2020-10-20)

No changes since 3.0.0.rc1.

## 3.0.0.rc1 (2020-10-12)

This release fixes the long-standing bug that public and private suffixes were
generated separately and could not be switched at runtime
([#66](https://github.com/john-kurkowski/tldextract/issues/66)).

* Breaking Changes
    * Rename `cache_file` to `cache_dir` as it is no longer a single file but a directory ([#207](https://github.com/john-kurkowski/tldextract/issues/207))
    * Rename CLI arg also, from `--cache_file` to `--cache_dir`
    * Remove Python 2.7 support
* Features
    * Can pass `include_psl_private_domains` on call, not only on construction
    * Use file locking to support multiprocessing and multithreading environments
* Bugfixes
    * Select public or private suffixes at runtime ([#66](https://github.com/john-kurkowski/tldextract/issues/66))
* Removals
    * Do not `debug` log the diff during update

## 2.2.3 (2020-08-05)

* Bugfixes
    * Fix concurrent access to cache file when using tldextract in multiple threads ([#146](https://github.com/john-kurkowski/tldextract/pull/146))
    * Relocate version number, to avoid costly imports ([#187](https://github.com/john-kurkowski/tldextract/pull/187))
    * Catch `IndexError` caused by upstream punycode bug ([#200](https://github.com/john-kurkowski/tldextract/pull/200))
    * Drop support for EOL Python 3.4 ([#186](https://github.com/john-kurkowski/tldextract/pull/186))
    * Explain warning better

## 2.2.2 (2019-10-15)

* Bugfixes
    * Catch file not found
    * Use pkgutil instead of pkg_resources ([#163](https://github.com/john-kurkowski/tldextract/pull/163))
    * Performance: avoid recomputes, a regex, and a partition
* Misc.
    * Update LICENSE from GitHub template
    * Fix warning about literal comparison
    * Modernize testing ([#177](https://github.com/john-kurkowski/tldextract/issues/177))
        * Use the latest pylint that works in Python 2
        * Appease pylint with the new rules
        * Support Python 3.8-dev

## 2.2.1 (2019-03-05)

* Bugfixes
    * Ignore case on punycode prefix check ([#133](https://github.com/john-kurkowski/tldextract/issues/133))
    * Drop support for EOL Python 2.6 ([#152](https://github.com/john-kurkowski/tldextract/issues/152))
    * Improve sundry doc and README bits
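
A case-insensitive Punycode prefix check can be sketched with the standard library's `idna` codec. `decode_label` is a hypothetical helper for illustration only; it does not show where or whether the library itself decodes labels.

```python
PUNYCODE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Decode one DNS label if it carries the Punycode prefix, in any case."""
    lowered = label.lower()
    if lowered.startswith(PUNYCODE_PREFIX):
        try:
            return lowered.encode("ascii").decode("idna")
        except UnicodeError:
            # Looks like Punycode but is not valid; leave it untouched.
            return label
    return label
```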

## 2.2.0 (2017-10-26)

* Features
    * Add `cache_fetch_timeout` kwarg and `TLDEXTRACT_CACHE_TIMEOUT` env var ([#139](https://github.com/john-kurkowski/tldextract/issues/139))
* Bugfixes
    * Work around `pkg_resources` missing, again ([#137](https://github.com/john-kurkowski/tldextract/issues/137))
    * Always close sessions ([#140](https://github.com/john-kurkowski/tldextract/issues/140))

## 2.1.0 (2017-05-24)

* Features
    * Add `fqdn` convenience property ([#129](https://github.com/john-kurkowski/tldextract/issues/129))
    * Add `ipv4` convenience property ([#126](https://github.com/john-kurkowski/tldextract/issues/126))

## 2.0.3 (2017-05-20)

* Bugfixes
    * Switch to explicit Python version check ([#124](https://github.com/john-kurkowski/tldextract/issues/124))
* Misc.
    * Document public vs. private domains
    * Document support for Python 3.6

## 2.0.2 (2016-10-16)

* Misc.
    * Release as a universal wheel ([#110](https://github.com/john-kurkowski/tldextract/issues/110))
    * Consolidate test suite running with tox ([#104](https://github.com/john-kurkowski/tldextract/issues/104))

## 2.0.1 (2016-04-25)

* Bugfixes
    * Relax required `requests` version: >= 2.1 ([#98](https://github.com/john-kurkowski/tldextract/issues/98))
* Misc.
    * Include tests in release source tarball ([#97](https://github.com/john-kurkowski/tldextract/issues/97))

## 2.0.0 (2016-04-21)

No changes since 2.0rc1.

## 2.0rc1 (2016-04-04)

This release focuses on shedding confusing code branches & deprecated cruft.

* Breaking Changes
    * Renamed/changed the type of `TLDExtract` constructor param
      `suffix_list_url`
        * It used to take a `str` or iterable. Its replacement,
          `suffix_list_urls` only takes an iterable. This better communicates
          that it tries a _sequence_ of URLs, in order. To only try 1 URL, pass
          an iterable with exactly 1 URL `str`.
    * Serialize the local cache of the remote PSL as JSON (no more `pickle`) - [#81](https://github.com/john-kurkowski/tldextract/issues/81)
        * This should be a transparent upgrade for most users.
        * However, if you're configured to _only_ read from your local cache
          file, no other sources or fallbacks, the new version will be unable
          to read the old cache format, and an error will be raised.
    * Remove deprecated code
        * `TLDExtract`'s `fetch` param. To disable live HTTP requests for the
          latest PSL, instead pass `suffix_list_urls=None`.
        * `ExtractResult.tld` property. Use `ExtractResult.suffix` instead.
    * Moved code
        * Split `tldextract.tldextract` into a few files.
            * The official public interface of this package comes via `import
              tldextract`. But if you were relying on direct import from
              `tldextract.tldextract` anyway, those imports may have moved.
            * You can run the package `python -m tldextract` for the same
              effect as the included `tldextract` console script. This used to
              be `python -m tldextract.tldextract`.
* Misc.
    * Use `requests` instead of `urllib` - [#89](https://github.com/john-kurkowski/tldextract/issues/89)
        * As a side-effect, this fixes [#93](https://github.com/john-kurkowski/tldextract/pull/93).

## 1.7.5 (2016-02-07)

* Bugfixes
    * Support possible gzipped PSL response - [#88](https://github.com/john-kurkowski/tldextract/pull/88)

## 1.7.4 (2015-12-26)

* Bugfixes
    * Fix potential for `UnicodeEncodeError` with info log - [#85](https://github.com/john-kurkowski/tldextract/pull/85)

## 1.7.3 (2015-12-12)

* Bugfixes
    * Support IDNA2008 - [#82](https://github.com/john-kurkowski/tldextract/pull/82)
* Misc.
    * Ease running scripts during local development

## 1.7.2 (2015-11-28)

* Bugfixes
    * Fix domain parsing failure with trailing spaces - [#75](https://github.com/john-kurkowski/tldextract/pull/75)
    * Update to latest, direct PSL links - [#77](https://github.com/john-kurkowski/tldextract/pull/77)
* Misc.
    * Update bundled PSL snapshot
    * Require requirements.txt for local development
    * Enforce linting via the test suite - [#79](https://github.com/john-kurkowski/tldextract/pull/79)
    * Switch to py.test runner - [#80](https://github.com/john-kurkowski/tldextract/pull/80)
    * No longer distribute tests. No mention of `test_suite` in setup.py. CI is
      handled centrally now, on this project's GitHub.

## 1.7.1 (2015-08-22)

Fix publishing mistake with 1.7.0.

## 1.7.0 (2015-08-22)

* Features
    * Can include PSL's private domains on CLI with `--private_domains` boolean flag
* Bugfixes
    * Improved support for multiple Punycode (or Punycode-looking) parts of a URL
        * Mixed valid and invalid parts
        * Mixed encodings
    * Fix `ExtractResult._asdict` on Python 3.4. This should also save space,
      as `__dict__` is not created for each `ExtractResult` instance.

## 1.6 (2015-03-22)

* Features
    * Pass `extra_suffixes` directly to constructor
* Bugfixes
    * Leave Punycode URLs alone, rather than returning them decoded
    * Don't raise on inputs that look like Punycode to tldextract, but aren't
    * Print unified diff to debug log, rather than inconsistent stderr

## 1.5.1 (2014-10-13)

* Bugfixes
    * Missing setuptools dependency
    * Avoid u'' literal for Python 3.0 - 3.2 compatibility. Tests will still fail though.

## 1.5 (2014-09-08)

* Bugfixes
    * Exclude PSL's private domains by default - [#19](https://github.com/john-kurkowski/tldextract/pull/19)
        * This is a **BREAKING** bugfix if you relied on the PSL's private
          domains
        * Revert to old behavior by setting `include_psl_private_domains=True`
    * Fix `UnicodeError` for inputs that looked like an IP

## 1.4 (2014-06-01)

* Features
    * Support punycode inputs
* Bugfixes
    * Fix minor Python 3 unicode errors

## 1.3.1 (2013-12-16)

* Bugfixes
    * Match PSL's GitHub mirror rename, from mozilla-central to gecko-dev
    * Try Mozilla's PSL SPOT first, then the mirror

## 1.3 (2013-12-08)

* Features
    * Specify your own PSL URL/file with `suffix_list_url` kwarg
    * `fallback_to_snapshot` kwarg - defaults to True
* Deprecations
    * `fetch` kwarg

## 1.2 (2013-07-07)

* Features
    * Better CLI
    * Cache env var support
    * Python 3.3 support
    * New aliases `suffix` and `registered_domain`
* Bugfixes
    * Fix DNS root label

## 1.1 (2012-03-22)

* Bugfixes
    * Reliable logger name
    * Forgotten `import sys`