# About jkphl/rdfa-lite-microdata

[![Build Status][travis-image]][travis-url] [![Coverage Status][coveralls-image]][coveralls-url] [![Scrutinizer Code Quality][scrutinizer-image]][scrutinizer-url]  [![Code climate][codeclimate-image]][codeclimate-url]  [![Documentation Status][readthedocs-image]][readthedocs-url]  [![Clear architecture][clear-architecture-image]][clear-architecture-url]

> RDFa Lite 1.1 and HTML Microdata parser for web documents (HTML, SVG, XML)

*rdfa-lite-microdata* is used for extracting [RDFa Lite 1.1](https://www.w3.org/TR/rdfa-lite/ "RDFa Lite 1.1 - Second Edition") and [HTML Microdata](https://www.w3.org/TR/microdata/) information out of web documents (HTML / SVG / XML). The embedded structures may use arbitrary vocabularies (e.g. [schema.org](https://schema.org/)) and are returned as a Plain Old PHP Object (POPO) which is compliant with the JSON serialization [described for HTML Microdata](https://www.w3.org/TR/microdata/#json).

## Usage

### RDFa Lite 1.1

To extract [RDFa Lite 1.1](https://www.w3.org/TR/rdfa-lite/ "RDFa Lite 1.1 - Second Edition") data out of a web document, instantiate an `RdfaLite` parser and call the appropriate parse method:

```php
$rdfaParser = new \Jkphl\RdfaLiteMicrodata\Ports\Parser\RdfaLite();

// Parse an HTML file
$rdfaItems = $rdfaParser->parseHtmlFile('/path/to/file.html');

// Parse an HTML string
$rdfaItems = $rdfaParser->parseHtml('<html><head>...</head><body vocab="http://schema.org/">...</body>');

// Parse a DOM document (here: created from an HTML string)
$rdfaDom = new \DOMDocument();
$rdfaDom->loadHTML('<html><head>...</head><body vocab="http://schema.org/">...</body>');
$rdfaItems = $rdfaParser->parseDom($rdfaDom);

// Parse an XML file (e.g. SVG)
$rdfaItems = $rdfaParser->parseXmlFile('/path/to/file.svg');

// Parse an XML string (e.g. SVG)
$rdfaItems = $rdfaParser->parseXml('<svg viewBox="0 0 100 100" vocab="http://schema.org/">...</svg>');

echo json_encode($rdfaItems, JSON_PRETTY_PRINT);
```

The resulting JSON serialization will look something like this:

```json
{
    "items": [
        {
            "type": [
                "http://schema.org/Movie"
            ],
            "id": "http://www.imdb.com/title/tt0499549/",
            "properties": {
                "http://schema.org/name": [
                    "Avatar"
                ],
                "http://schema.org/director": [
                    {
                        "type": [
                            "http://schema.org/Person"
                        ],
                        "id": null,
                        "properties": {
                            "http://schema.org/name": [
                                "James Cameron"
                            ],
                            "http://schema.org/birthDate": [
                                "August 16, 1954"
                            ]
                        }
                    }
                ],
                "http://schema.org/genre": [
                    "Science fiction"
                ],
                "http://schema.org/trailer": [
                    "../movies/avatar-theatrical-trailer.html"
                ]
            }
        }
    ]
}
```
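The result serializes to this structure, so one way to read values out of it is to round-trip it through `json_encode()` / `json_decode()` and walk the resulting arrays. The sketch below assumes the default (non-IRI) mode shown above. The function name `collectPropertyValues()` is hypothetical and not part of the library.

```php
<?php
/**
 * Hypothetical helper: collect every value of one property from a decoded
 * (non-IRI mode) result, descending into nested items such as the director.
 *
 * @param array  $items    The decoded "items" list
 * @param string $property Full property IRI, e.g. "http://schema.org/name"
 *
 * @return array Matching values in serialization order (depth-first)
 */
function collectPropertyValues(array $items, $property)
{
    $found = array();
    foreach ($items as $item) {
        if (!isset($item['properties'])) {
            continue;
        }
        foreach ($item['properties'] as $name => $values) {
            foreach ($values as $value) {
                if ($name === $property) {
                    $found[] = $value;
                }
                // Nested items are arrays carrying their own "properties"
                if (is_array($value) && isset($value['properties'])) {
                    $found = array_merge($found, collectPropertyValues(array($value), $property));
                }
            }
        }
    }

    return $found;
}

// Usage with a parse result from above (requires the library):
// $data = json_decode(json_encode($rdfaItems), true);
// print_r(collectPropertyValues($data['items'], 'http://schema.org/name'));
```

For the movie example above, collecting `http://schema.org/name` yields `Avatar` followed by `James Cameron`, because the nested director item is searched as well.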

Item types and property names can also be treated as references consisting of a profile IRI and a separate name. To enable this IRI mode, pass `true` to the parser constructor:

```php
$rdfaParser = new \Jkphl\RdfaLiteMicrodata\Ports\Parser\RdfaLite(true);
$rdfaItems = $rdfaParser->parseHtmlFile('/path/to/file.html');
```

With IRI mode enabled, the JSON serialization is more verbose:

```json
{
    "items": [
        {
            "type": [
                {
                    "profile": "http://schema.org/",
                    "name": "Movie"
                }
            ],
            "id": "http://www.imdb.com/title/tt0499549/",
            "properties": {
                "http://schema.org/name": {
                    "profile": "http://schema.org/",
                    "name": "name",
                    "values": [
                        "Avatar"
                    ]
                },
                "http://schema.org/director": {
                    "profile": "http://schema.org/",
                    "name": "director",
                    "values": [
                        {
                            "type": [
                                {
                                    "profile": "http://schema.org/",
                                    "name": "Person"
                                }
                            ],
                            "id": null,
                            "properties": {
                                "http://schema.org/name": {
                                    "profile": "http://schema.org/",
                                    "name": "name",
                                    "values": [
                                        "James Cameron"
                                    ]
                                },
                                "http://schema.org/birthDate": {
                                    "profile": "http://schema.org/",
                                    "name": "birthDate",
                                    "values": [
                                        "August 16, 1954"
                                    ]
                                }
                            }
                        }
                    ]
                },
                "http://schema.org/genre": {
                    "profile": "http://schema.org/",
                    "name": "genre",
                    "values": [
                        "Science fiction"
                    ]
                },
                "http://schema.org/trailer": {
                    "profile": "http://schema.org/",
                    "name": "trailer",
                    "values": [
                        "../movies/avatar-theatrical-trailer.html"
                    ]
                }
            }
        }
    ]
}
```
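Comparing the two serializations shows that, in this example, joining an IRI-mode `profile` and `name` yields the full IRI used in default mode. Under that assumption, a decoded IRI-mode item can be converted back to the default shape. The sketch below is illustrative only: `flattenIriItem()` is a hypothetical helper, and property keys are reused as-is rather than rebuilt.

```php
<?php
/**
 * Hypothetical helper: convert one decoded IRI-mode item into the default
 * (non-IRI) shape. Assumes, as in the example, that profile . name equals
 * the full type IRI.
 */
function flattenIriItem(array $item)
{
    $types = array();
    foreach ($item['type'] as $type) {
        // {"profile": "http://schema.org/", "name": "Movie"} becomes "http://schema.org/Movie"
        $types[] = $type['profile'] . $type['name'];
    }

    $properties = array();
    foreach ($item['properties'] as $iri => $property) {
        $values = array();
        foreach ($property['values'] as $value) {
            // Nested items are flattened recursively; literal values pass through
            $values[] = (is_array($value) && isset($value['type']))
                ? flattenIriItem($value)
                : $value;
        }
        // Property keys are already full IRIs, so they are reused unchanged
        $properties[$iri] = $values;
    }

    return array(
        'type' => $types,
        'id' => isset($item['id']) ? $item['id'] : null,
        'properties' => $properties,
    );
}

// Usage with an IRI-mode parse result (requires the library):
// $data = json_decode(json_encode($rdfaItems), true);
// $flatItems = array_map('flattenIriItem', $data['items']);
```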

### HTML Microdata

The [Microdata](https://www.w3.org/TR/microdata/) format isn't specified for non-HTML host formats, so the `Microdata` parser only supports HTML documents:

```php
$microdataParser = new \Jkphl\RdfaLiteMicrodata\Ports\Parser\Microdata();

// Parse an HTML file
$microdataItems = $microdataParser->parseHtmlFile('/path/to/file.html');

// Parse an HTML string
$microdataItems = $microdataParser->parseHtml('<html><head>...</head><body itemscope itemtype="http://schema.org/Movie">...</body>');

// Parse a DOM document created from an HTML string
$microdataDom = new \DOMDocument();
$microdataDom->loadHTML('<html><head>...</head><body itemscope itemtype="http://schema.org/Movie">...</body>');
$microdataItems = $microdataParser->parseDom($microdataDom);

// Parse an HTML file with types / property names treated as IRIs
$microdataParserIri = new \Jkphl\RdfaLiteMicrodata\Ports\Parser\Microdata(true);
$microdataItems = $microdataParserIri->parseHtmlFile('/path/to/file.html');
```
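Since only the `RdfaLite` parser accepts XML input, code that processes files of mixed types has to pick a suitable parse method. The sketch below encodes that rule. It assumes the file extension reliably tells HTML apart from XML-based formats such as SVG; the helper name `parseMethodFor()` and its `'rdfa'` / `'microdata'` switch are hypothetical, while the returned names are the parse methods used in the examples above.

```php
<?php
/**
 * Hypothetical helper: pick a parse method name for a file, given the
 * syntax to extract ('rdfa' or 'microdata'). Assumes the file extension
 * distinguishes HTML from XML-based formats such as SVG.
 */
function parseMethodFor($path, $syntax)
{
    if ($syntax !== 'rdfa' && $syntax !== 'microdata') {
        throw new InvalidArgumentException('Unknown syntax: ' . $syntax);
    }

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    if (in_array($extension, array('html', 'htm'), true)) {
        // Both parsers accept HTML files
        return 'parseHtmlFile';
    }

    if (in_array($extension, array('svg', 'xml'), true)) {
        // Only the RdfaLite parser handles XML; Microdata is HTML-only
        if ($syntax === 'rdfa') {
            return 'parseXmlFile';
        }
        throw new InvalidArgumentException('Microdata can only be extracted from HTML documents');
    }

    throw new InvalidArgumentException('Unsupported file extension: ' . $extension);
}

// Usage (requires the library):
// $parser = new \Jkphl\RdfaLiteMicrodata\Ports\Parser\RdfaLite();
// $method = parseMethodFor('/path/to/file.svg', 'rdfa');
// $items = $parser->$method('/path/to/file.svg');
```

Throwing for Microdata on XML input makes the HTML-only restriction explicit instead of silently returning an empty result.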

## Installation

This library requires PHP 5.5 or later; as a matter of principle, I recommend using the latest available PHP version. It has no userland dependencies. It's installable and autoloadable via [Composer](https://getcomposer.org/) as [jkphl/rdfa-lite-microdata](https://packagist.org/packages/jkphl/rdfa-lite-microdata).

```bash
composer require jkphl/rdfa-lite-microdata
```

Alternatively, [download a release](https://github.com/jkphl/rdfa-lite-microdata/releases) or clone [the repository](https://github.com/jkphl/rdfa-lite-microdata), then require or include its [`autoload.php`](https://github.com/jkphl/rdfa-lite-microdata/blob/master/autoload.php) file.


## Dependencies

![Composer dependency graph](https://rawgit.com/jkphl/rdfa-lite-microdata/master/doc/dependencies.svg)


## License

Copyright © 2017 [Joschi Kuphal][author-url] / joschi@tollwerk.de. Licensed under the terms of the [MIT license](../LICENSE).


[codeclimate-image]: https://lima.codeclimate.com/github/jkphl/rdfa-lite-microdata/badges/gpa.svg
[codeclimate-url]: https://lima.codeclimate.com/github/jkphl/rdfa-lite-microdata
[readthedocs-url]: http://jkphlrdfa-lite-microdata.readthedocs.io/en/latest/
[coveralls-url]: https://coveralls.io/github/jkphl/rdfa-lite-microdata?branch=master
[clear-architecture-url]: https://github.com/jkphl/clear-architecture
[travis-url]: https://travis-ci.org/jkphl/rdfa-lite-microdata
[scrutinizer-url]: https://scrutinizer-ci.com/g/jkphl/rdfa-lite-microdata/?branch=master
[clear-architecture-image]: https://img.shields.io/badge/Clear%20Architecture-%E2%9C%94-brightgreen.svg
[travis-image]: https://secure.travis-ci.org/jkphl/rdfa-lite-microdata.svg
[scrutinizer-image]: https://scrutinizer-ci.com/g/jkphl/rdfa-lite-microdata/badges/quality-score.png?b=master
[readthedocs-image]: https://readthedocs.org/projects/jkphlrdfa-lite-microdata/badge/?version=latest
[coveralls-image]: https://coveralls.io/repos/github/jkphl/rdfa-lite-microdata/badge.svg?branch=master


[author-url]: https://jkphl.is