# MetaInspector Changelog

## [Changes in 5.15.0](https://github.com/jaimeiniesta/metainspector/compare/v5.14.0...v5.15.0)

* Added mechanism to use all available options in the `FollowRedirects` Faraday middleware,
https://github.com/jaimeiniesta/metainspector/pull/355 thanks to @bruno-b-martins and @miguelrod

## [Changes in 5.14.0](https://github.com/jaimeiniesta/metainspector/compare/v5.13.0...v5.14.0)

* Several dependency updates, including Addressable 2.8.1, which fixes an `invalid_byte_sequence` exception.

## [Changes in 5.13.0](https://github.com/jaimeiniesta/metainspector/compare/v5.12.1...v5.13.0)

* Remove support for #feed that was deprecated in 5.9
* Add support for Ruby 3.1

## [Changes in 5.12.1](https://github.com/jaimeiniesta/metainspector/compare/v5.12.0...v5.12.1)

* Update dependencies: rubocop, nokogiri

## [Changes in 5.12.0](https://github.com/jaimeiniesta/metainspector/compare/v5.11.2...v5.12.0)

* Support Ruby 3.0

## [Changes in 5.11.2](https://github.com/jaimeiniesta/metainspector/compare/v5.11.1...v5.11.2)

* Relax dependencies to allow minor releases.

## [Changes in 5.11.1](https://github.com/jaimeiniesta/metainspector/compare/v5.11.0...v5.11.1)

* Upgrade to Nokogiri 1.11.0.

## [Changes in 5.11.0](https://github.com/jaimeiniesta/metainspector/compare/v5.10.1...v5.11.0)

* Upgrade to Faraday 1.1.

## [Changes in 5.10.1](https://github.com/jaimeiniesta/metainspector/compare/v5.10.0...v5.10.1)

* Fix for empty base_href. Makes relative links work when base_href is not nil but empty ("").
* Drop support for Ruby 2.4, add support for Ruby 2.7.
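
As a rough illustration of the empty base_href fix above, here is a minimal sketch that treats `nil` and `""` the same way when resolving relative links. The `absolutify` helper and its arguments are hypothetical and are not MetaInspector's actual code; only the standard `uri` library is used.

```ruby
require 'uri'

# Hypothetical helper (not part of MetaInspector): resolve a link against
# the <base href> when one is usable, otherwise against the page URL.
def absolutify(link, page_url, base_href)
  # Both nil and "" (or whitespace only) mean "no usable base".
  no_base = base_href.nil? || base_href.strip.empty?
  base = no_base ? page_url : URI.join(page_url, base_href).to_s
  URI.join(base, link).to_s
end

puts absolutify('/about', 'http://example.com/blog/', '')
puts absolutify('/about', 'http://example.com/blog/', nil)
puts absolutify('post-1', 'http://example.com/blog/', '/archive/')
```

With an empty base, the link is resolved against the page URL exactly as it would be with no base at all.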

## [Changes in 5.10](https://github.com/jaimeiniesta/metainspector/compare/v5.9.0...v5.10.0)

* Upgrade to Faraday 1.0.

## [Changes in 5.9](https://github.com/jaimeiniesta/metainspector/compare/v5.8.0...v5.9.0)

* Added #feeds method to retrieve all feeds of a page.
* Adds deprecation warning on #feed method.

## [Changes in 5.8](https://github.com/jaimeiniesta/metainspector/compare/v5.7.0...v5.8.0)

* Added h1..h6 support.

## [Changes in 5.7](https://github.com/jaimeiniesta/metainspector/compare/v5.6.0...v5.7.0)

* Avoids normalizing image URLs. https://github.com/jaimeiniesta/metainspector/pull/241
* Adds `NonHtmlErrorException` instead of `ParserError` https://github.com/jaimeiniesta/metainspector/pull/248

## [Changes in 5.6](https://github.com/jaimeiniesta/metainspector/compare/v5.5.0...v5.6.0)

* New feature: `:encoding` option to force the encoding of a parsed document.
* Improvement: make `best_title` and `best_author` work by order of preference, rather than length.
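
The difference between picking by length and picking by order of preference can be sketched as follows. The candidate list and both helper methods are made up for illustration; they are not the library's implementation, and the real candidate sources may differ.

```ruby
# Hypothetical helpers, for illustration only.

# A candidate counts only if it is a non-blank string.
def usable?(text)
  !text.nil? && !text.strip.empty?
end

# Older behaviour: the longest usable candidate wins.
def longest_candidate(candidates)
  candidates.select { |c| usable?(c) }.max_by(&:length)
end

# Behaviour described for 5.6: the first usable candidate wins,
# assuming the list is ordered from most to least preferred.
def preferred_candidate(candidates)
  candidates.find { |c| usable?(c) }
end

titles = ['Short OG title', nil, '  ', 'A much longer title taken from a heading']
puts longest_candidate(titles)
puts preferred_candidate(titles)
```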

## [Changes in 5.5](https://github.com/jaimeiniesta/metainspector/compare/v5.4.0...v5.5.0)

* New feature: adds `author`, `best_author`.
* Bugfix: adds presence validation for empty string on meta tag image values.
* Improves spider and links checker examples.
* Uses WebMock instead of FakeWeb in tests.

## [Changes in 5.4](https://github.com/jaimeiniesta/metainspector/compare/v5.3.0...v5.4.0)

* Supports Gzipped responses.
* Adds method `best_description` and makes `description` return just the meta description.
* Removes support for Ruby 2.0.0 and adds support for 2.4.0.

## [Changes in 5.3](https://github.com/jaimeiniesta/metainspector/compare/v5.2.0...v5.3.0)

* Returns secondary description if meta description is empty.
* Adds a custom timeout on top of the ones for Faraday, and sets defaults for timeouts.
* Eliminates possible NULL char in HTML which breaks nokogiri.
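
Stripping NULL characters before parsing could look roughly like this. The `strip_null_chars` helper is a hypothetical name used for illustration, not a MetaInspector method.

```ruby
# Hypothetical helper: remove NULL characters from an HTML string
# before it is handed to the parser.
def strip_null_chars(html)
  html.delete("\u0000")
end

dirty = "<title>Hello\u0000 world</title>"
puts strip_null_chars(dirty)
```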

## [Changes in 5.2](https://github.com/jaimeiniesta/metainspector/compare/v5.1.0...v5.2.0)

* Removes the deprecated `html_content_only` option, and replaces it by `allow_non_html_content`, by default `false`.

## [Changes in 5.1](https://github.com/jaimeiniesta/metainspector/compare/v5.0.0...v5.1.0)

* Deprecates the `html_content_only` option, and turns it on by default.

## [Changes in 5.0](https://github.com/jaimeiniesta/metainspector/compare/v4.7.1...v5.0.0)

* Removes the ExceptionLog. All exceptions are now encapsulated in our own exception classes and
always raised.

## [Changes in 4.7](https://github.com/jaimeiniesta/metainspector/compare/v4.6.0...v4.7.1)

* MetaInspector can be configured to use [Faraday::HttpCache](https://github.com/plataformatec/faraday-http-cache) to cache page responses. For that you should pass the `faraday_http_cache` option with at least the `:store` key, for example:

```ruby
cache = ActiveSupport::Cache.lookup_store(:file_store, '/tmp/cache')
page = MetaInspector.new('http://example.com', faraday_http_cache: { store: cache })
```

* Bugfixes:
  * Parsing of the document is done as soon as it is initialized (just like we do with the request), so
that parsing errors will be caught earlier.
  * Rescues from Faraday::SSLError.

## [Changes in 4.6](https://github.com/jaimeiniesta/metainspector/compare/v4.5.0...v4.6.0)

* Faraday can be passed options via `:faraday_options`. This is useful when we need to
customize the way we request the page, for example to disable SSL verification:

```ruby
MetaInspector.new('https://example.com')
# Faraday::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed

MetaInspector.new('https://example.com', faraday_options: { ssl: { verify: false } })
# Now we can access the page
```

## [Changes in 4.5](https://github.com/jaimeiniesta/metainspector/compare/v4.4.0...v4.5.0)

* The Document API now includes access to head/link elements
    * `page.head_links` returns an array of hashes of all head/links.
    * `page.stylesheets` returns head/links where rel='stylesheet'
    * `page.canonicals` returns head/links where rel='canonical'

* The URL API can remove common tracking parameters from the querystring
    * `url.tracked?` will tell you if the url contains known tracking parameters
    * `url.untracked_url` will return the url with known tracking parameters removed
    * `url.untrack!` will remove the tracking parameters from the url

* The images API has been extended:
    * `page.images.with_size` returns a sorted array (by descending area) of [image_url, width, height]
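
To give an idea of what removing tracking parameters from a querystring involves, here is a standalone sketch using only the standard `uri` library. The `TRACKING_PARAMS` list and the free-standing `tracked?` and `untracked_url` functions are assumptions for illustration; the list MetaInspector actually uses, and how its URL object implements these methods, may differ.

```ruby
require 'uri'

# Assumed list of tracking parameters (illustrative only).
TRACKING_PARAMS = %w[utm_source utm_medium utm_term utm_content utm_campaign].freeze

# True if the URL's querystring contains any known tracking parameter.
def tracked?(url)
  query = URI.parse(url).query
  return false if query.nil?

  URI.decode_www_form(query).any? { |key, _value| TRACKING_PARAMS.include?(key) }
end

# Returns the URL with known tracking parameters removed.
def untracked_url(url)
  uri = URI.parse(url)
  return url if uri.query.nil?

  kept = URI.decode_www_form(uri.query).reject { |key, _value| TRACKING_PARAMS.include?(key) }
  # Drop the "?" entirely when nothing is left in the querystring.
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts tracked?('http://example.com/page?id=7&utm_source=news')
puts untracked_url('http://example.com/page?id=7&utm_source=news&utm_medium=email')
```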

## [Changes in 4.4](https://github.com/jaimeiniesta/metainspector/compare/v4.3.0...v4.4.0)

* The default headers now include `'Accept-Encoding' => 'identity'` to minimize trouble with servers that respond with malformed compressed responses, [as explained here](https://github.com/lostisland/faraday/issues/337).

## [Changes in 4.3](https://github.com/jaimeiniesta/metainspector/compare/v4.2.0...v4.3.0)

* The Document API has been extended with one new method `page.best_title` that returns the longest text available from a selection of candidates.
* `to_hash` now includes `scheme`, `host`, `root_url`, `best_title` and `description`.

## [Changes in 4.2](https://github.com/jaimeiniesta/metainspector/compare/v4.1.0...v4.2.0)

* The images API has been extended, with two new methods:

  * `page.images.owner_suggested` returns the OG or Twitter image, or `nil` if neither are present.
  * `page.images.largest` returns the largest image found in the page. This uses the HTML height and width attributes as well as the [fastimage](https://github.com/sdsykes/fastimage) gem to return the largest image on the page that has a ratio squarer than 1:10 or 10:1. This usually provides a good alternative to the OG or Twitter images if they are not supplied.

* The criteria for `page.images.best` have changed slightly: we now return the largest image instead of the first image if no owner-suggested image is found.
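
The selection rule for the largest image (biggest area among images whose aspect ratio is squarer than 1:10 or 10:1) can be sketched like this. The `largest_image` helper and the `[url, width, height]` input format are assumptions for illustration; the sketch skips the fastimage lookup and works on sizes that are already known.

```ruby
# Hypothetical helper: pick the URL of the largest image by area,
# ignoring images that are too elongated (ratio not strictly between
# 1:10 and 10:1) or that have no usable dimensions.
def largest_image(images)
  candidates = images.select do |_url, width, height|
    next false unless width.positive? && height.positive?

    ratio = width.to_f / height
    ratio > 0.1 && ratio < 10.0
  end

  best = candidates.max_by { |_url, width, height| width * height }
  best && best.first
end

images = [
  ['http://example.com/banner.png', 6000, 100], # 60:1, filtered out
  ['http://example.com/photo.jpg', 800, 600],
  ['http://example.com/thumb.jpg', 100, 100]
]
puts largest_image(images)
```

The banner has the largest area but is rejected by the ratio filter, so the photo is chosen.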

## [Changes in 4.1](https://github.com/jaimeiniesta/metainspector/compare/v4.0.0...v4.1.0)

* Introduces the `:normalize_url` option, which allows disabling URL normalization.

## [Changes in 4.0](https://github.com/jaimeiniesta/metainspector/compare/v3.0.0...v4.0.0)

* The links API has been changed. Instead of `page.links`, `page.internal_links` and `page.external_links`, we now have:

```ruby
page.links.raw      # Returns all links found, unprocessed
page.links.all      # Returns all links found, unrelativized and absolutified
page.links.http     # Returns all HTTP links found
page.links.non_http # Returns all non-HTTP links found
page.links.internal # Returns all internal HTTP links found
page.links.external # Returns all external HTTP links found
```
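
How these groups relate to each other can be sketched with the standard `uri` library. The `classify_links` helper below is hypothetical and is not how MetaInspector is implemented; it assumes that "HTTP links" covers both `http` and `https`, and that "internal" means the same host as the page.

```ruby
require 'uri'

# Hypothetical helper: split raw hrefs into groups similar to the ones above.
def classify_links(raw, page_url)
  page_host = URI.parse(page_url).host

  # Make every link absolute, resolving relative ones against the page URL.
  all = raw.map { |link| URI.join(page_url, link).to_s }

  http, non_http = all.partition { |link| link.start_with?('http://', 'https://') }
  internal, external = http.partition { |link| URI.parse(link).host == page_host }

  { raw: raw, all: all, http: http, non_http: non_http,
    internal: internal, external: external }
end

raw = ['/about', 'post-1', 'http://other.org/', 'mailto:hi@example.com']
links = classify_links(raw, 'http://example.com/blog/')
puts links[:internal].inspect
puts links[:external].inspect
puts links[:non_http].inspect
```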

* The images API has been changed, now instead of `page.image` we have `page.images.best`, and instead of `page.favicon` we have `page.images.favicon`.

* Now `page.images.best` will return the first image in `page.images` if no OG or Twitter image is found, instead of returning `nil`.

* You can now specify 2 different timeouts, `connection_timeout` and `read_timeout`, instead of the previous single `timeout`.

## [Changes in 3.0](https://github.com/jaimeiniesta/metainspector/compare/v2.0.0...v3.0.0)

* The redirect API has been changed, now the `:allow_redirections` option will expect only a boolean, which by default is `true`. That is, no more specifying `:safe`, `:unsafe` or `:all`.
* We've dropped support for Ruby < 2.

Also, we've introduced a new feature:

* Persist cookies across redirects. Now MetaInspector will include the received cookies when following redirects. This fixes some cases where a redirect would fail, sometimes caught in a redirection loop.