relaton/relaton-bib

View on GitHub
docs/hash.adoc

Summary

Maintainability
Test Coverage
= Relaton YAML

== YAML

The following structure is in place for encoding bibitem as YAML objects, and is also used
to represent bibliographic entries in Metanorma AsciiDoc. The structure has not yet been
generalised to `bibdata/ext`, the flavour-specific extensions of relaton.

If an element in Relaton XML has attributes, the content of the element is represented in YAML
with a `content` key:

[source,xml]
----
<title type="main">Geographic information</title>
----

[source,yaml]
----
title:
  type: main
  content: Geographic information
----

Any elements with a cardinality of many can be represented as arrays, but
they can also be populated by a hash or single element. For example,
a Relaton title can have multiple titles, and multiple scripts; so
the following are equivalent:

[source,yaml]
----
title:
  - type: main
    content: Geographic information
title:
  type: main
  content: Geographic information
script:
  - Latn
script: Latn
----

In YAML, "on" is a reserved word, and thus cannot be used as a key. "value" is used as a synonym for
"on" in dates.

The structure below is given in YAML format:

[source,yaml]
----
# bibliographic item anchor, used to crossreference within document
id: ISO/TC211
# date record was created
fetched: '2019-06-30'
# titles are an array, with a mandatory type and content, and optional format, language and script
title:
  - type: main
    content: Geographic information
  - type: subtitle
    content: Geographic information subtitle
    language: en
    script: Latn
    format: text/plain
# type of document
type: standard
# document identifiers are an array, with a mandatory type and id component
docid:
  type: ISO
  id: TC211
# document number
docnumber: '211'
# edition
edition: '1'
# language is an array
language:
  - en
  - fr
# script is an array
script:
  Latn
# version contains revision date and draft (as array)
version:
  revision_date: '2019-04-01'
  draft: draft
# note is an array of type and content
biblionote:
  type: bibnote
  content: >
      Mark set a major league
      home run record in 1998.
# document status has stage, and optional substage and iteration
docstatus:
  stage:
    value: '30'
    abbreviation: CD
  substage:
    value: '00'
  iteration: final
# date is an array, with mandatory type, and either an "on" value or a "from" and optional "to" value
date:
  - type: issued
    value: '2014'
  - type: published
    from: '2014-04'
    to: '2014-05'
  - type: accessed
    value: '2015-05-20'
# abstract is an array, with content, and optional language, script, format
abstract:
  - content: >
      ISO 19115-1:2014 defines the schema required for ...
  - content: >
      L'ISO 19115-1:2014 définit le schéma requis pour ...
    language: fr
    script: Latn
    format: text/plain
# contributors are an array of entity/role pairs, where entity is either person or organization.
# The role is an array of type and description; it can be a an array of just string, which are treated
# as the type.
# Organisations have attributes name, url, abbreviation, subdivision, contacts, identifiers, logo.
# Persons have attributes name, affiliation, contacts
# Person names have attributes abbreviation, surname, completename, initials, forename, additions, prefixes.
# Initials, forename, additions, prefixes are arrays.
# Name field values are either strings, or hashes, with content and language and script attributes.
# The language and script attribute can also be given on the name.
# Contacts are an array, containing either addresses, or other fields.
# Addresses are identified as hashes containing a city attribute; they can also contain a street
# (which is an array), a postcode, a state, and a country. The other contact fields
# are phones, emails, uris; they can contain a type.
# Affiliations are an array, and they contains an organization, and an optional description.
# The affiliation description can be a single string, or a hash of content, language, script, and format.
contributor:
  - organization:
      name: International Organization for Standardization
      url: www.iso.org
      abbreviation: ISO
      subdivision: division
      logo:
        image:
          id: logo1
          src: logo1.png
          mimetype: image/png
          filename: logo1.png
          height: "100%"
          width: "200"
          alt: Logo 1
          title: "Logo #1"
          longdesc: Logo number 1
    role:
      type: publisher
      description: Publisher role
  - person:
      name:
        completename:
          content: A. Bierman
          language: en
      affiliation:
        - organization:
            name: IETF
            abbreviation: IETF
            identifier:
              - type: uri
                id: www.ietf.org
          description: Affiliation description
      contact:
        - address:
            street:
              - 8 Street St
            city: City
            postcode: '123456'
            country: Country
            state: State
        - phone: '223322'
          type: mobile
    role: author
  - organization:
      name: IETF
      abbreviation: IETF
      identifier:
        - type: uri
          id: www.ietf.org
    role:
      publisher
  - person:
      name:
        abbreviation: AB
        language: en
        initial:
          - A.
        surname: Bierman
      affiliation:
        -  organization:
             name: IETF
             abbreviation: IETF
           description:
             content: Affiliation description
             language: en
             script: Latn
      identifier:
        - type: uri
          id: www.person.com
    role:
      author
# copyright consists of an owner (a hash containing the fields of an organisation),
# a "from" date, and an optional "to" date
copyright:
   owner:
     name: International Organization for Standardization
     abbreviation: ISO
     url: www.iso.org
   from: '2014'
   to: '2020'
# link is an array of URIs, with a type and content
link:
  - type: src
    content: https://www.iso.org/standard/53798.html
  - type: obp
    content: https://www.iso.org/obp/ui/#!iso:std:53798:en
  - type: rss
    content: https://www.iso.org/contents/data/standard/05/37/53798.detail.rss
# relations are an array of type, bibitem, locality, source_locality, and description.
# bibitem contains any of the attributes of a bibliographic item.
# locality is an array of locality_stack which is an array of hash of type,
#   reference_from, and optionally reference_to
# source_locality is an array of source_locality_stack which is similar to locality_stack
# description is optional and contains content and optional format, language, ans script.
relation:
  - type: updates
    bibitem:
      formattedref: ISO 19115:2003
    locality:
      locality_stack:
        type: page
        reference_from: '7'
        reference_to: '10'
    source_locality:
      source_locality_stack:
        - type: volume
          reference_from: '1'
        - type: chapter
          reference_from: '2'
  - type: updates
    bibitem:
      type: standard
      formattedref:
        content: ISO 19115:2003/Cor 1:2006
        format: text/plain
    description:
      content: supersedes
      format: text/plain
  - type: partOf
    bibitem:
      title:
        type: main
        content: Book title
        format: text/plain
# series are an array, containing a title, a type, a formattedref, a place,
# an organization (string), an abbreviation, a from, a to, a number, and a partnumber.
# The title is mandatory, and all other fields are optional.
# The series title, like the titles of bibliographic items, contains a type,
# content, and optional language, script, and format attributes.
# The abbreviation and formattedref are either a string,
# or a hash containing content, language, and script.
series:
  - type: main
    title:
      type: original
      content: ISO/IEC FDIS 10118-3
      language: en
      script: Latn
      format: text/plain
    place: Serie's place
    organization: Serie's organization
    abbreviation:
      content: ABVR
      language: en
      script: Latn
    from: '2009-02-01'
    to: '2010-12-20'
    number: serie1234
    partnumber: part5678
  - title:
      - content: Series
        language: en
        script: Latn
      - content: Séries
        language: fr
        script: Latn
        format: text/plain
# medium contains a form, a size, and a scale
medium:
  form: medium form
  size: medium size
  scale: medium scale
# place is an array of strings or hashes. Can have name or city, region and country.
# Name or city is mandatory, region and country are optional.
# String and hash with name are equivalent.
place:
  - bib place
  - city: Geneva
    region:
      - name: Region
    country:
      - iso: CH
        name: Switzelznd
        recommended: true
# extent is an array, localities are an array of locality_stack
extent:
  locality:
    type: section
    reference_from: '7'
    reference_to: '10'
# accesslocation is an array of strings
accesslocation:
  - accesslocation1
  - accesslocation2
# classification is an array of type and value
classification:
  type: type
  value: value
# validity contains a begins date, an ends date, and a revision date
validity:
  begins: '2010-10-10 12:21'
  ends: '2011-02-03 18:30'
  revision: '2011-03-04 09:00'
# keyword is an array of strings or hashes of content, language, script, and format
keyword:
  - Keyword
  - Key Word
# license is a string
license: License
----

== Metanorma structure (AsciiBib): nested definition list

The Metanorma AsciiDoc representation of the Relaton hash structure
in a bibliography
is as a definition list of element name and element contents,
with nested definition lists for nested structures. If a nested
definition is given for an element, the element itself has a blank definition.

As with the YAML representation, if an element in Relaton XML has attributes,
the content of the element is represented in YAML with a `content` key:

[source,xml]
----
<title type="main">Geographic information</title>
----

[source,asciidoc]
----
title::
  type::: main
  content::: Geographic information
----

Likewise, as with the YAML representation,
Repeating elements in a hash can be realised as ordered or unordered lists.

[source,asciidoc]
----
language::
  . en
  . fr
----

Metanorma AsciiDoc also supports representing repeating elements
by repeating the key for that entry. This will almost always be more
straightforward to use in AsciiDoc:

[source,asciidoc]
----
language:: en
language:: fr
----

Each Relaton entry in a bibliography is represented in Metanorma AsciiDoc
through a subclause with option attribute `[%bibitem]`. Any title given to the
subclause is treated as the title for the bibliographic entry, with language `en`,
script `Latn`, format `text/plain`, and type `main`.

So the following is a very simple reference in Metanorma AsciiDoc:

[source,asciidoc]
----
[%bibitem]
=== Rubber latex -- Sampling
id:: iso123
docid::
  type::: ISO
  id::: ISO 123
docid::
  type::: ABC
  id::: 32784
type:: standard
----

If there is no such title
for the entry, the subclause title should be left as `{blank}`, and the desired
title should be given in the hash body:

[source,asciidoc]
----
[%bibitem]
=== {blank}
id:: iso123
title::
language::: fr
script::: Latn
format::: text/plain
type::: alt
content::: Latex de caoutchouc -- Échantillonnage
----

Note the use of `content` as a key to represent the contents of the `title` tag.

The anchor crossreference for the bibliographic entry may be encoded as either the
`id` entry in the definition list, or as the normal AsciiDoc anchor on the
subclause, which takes priority:

[source,asciidoc]
----
[[iso123]]
[%bibitem]
=== Rubber latex -- Sampling
docid::
  type::: ISO
  id::: ISO 123
type:: standard
----

Repeating elements in a hash can be realised as ordered or unordered lists.

[source,asciidoc]
----
[[iso123]]
[%bibitem]
=== Rubber latex -- Sampling
docid::
  type::: ISO
  id::: ISO 123
language::
  . en
  . fr
----

Metanorma AsciiDoc also supports representing repeating elements
by repeating the key for that entry. This will almost always be more
straightforward to use in AsciiDoc:

[source,asciidoc]
----
[[iso123]]
[%bibitem]
=== Rubber latex -- Sampling
docid::
  type::: ISO
  id::: ISO 123
language:: en
language:: fr
----

AsciiDoc does not recognise definition lists more than four levels
deep. If deeper nesting is needed, you will need to attach a new definition
list with a list continuation, with the definition list depth reset back to one:

[source,asciidoc]
----
[[iso123]]
[%bibitem]
=== Rubber latex -- Sampling
docid::
  type::: ISO
  id::: ISO 123
type:: standard
contributor::
  role::: author
  person:::
    name::::
+
--
completename::
  language::: en
  content::: Fred
--
----

(This is very awkward, and <<JSONPath>> provides a workaround.)

The most heavily nested parts of a Relaton entry are the contributors,
series, and relations.
Each of these can be marked up as subclauses within the entry, with the clause
titles "contributor", "series", and "relation". Each subclause contains
a new definition list, with its definition list reset to zero depth;
the subclauses can be repeated for multiple instances of the same subentity.

AsciiBib citations can be combined with other AsciiDoc citations in the
same Metanorma document; but any AsciiDoc citations need be precede
AsciiBib citations. Each AsciiBib citations constitutes a subclause of its own,
and Metanorma will (unsuccessfully) attempt to incorporate any trailing material
in the subclause, including  AsciiDoc citations, into the current AsciiBib
citation.

The following is Metanorma AsciiDoc markup corresponding to the YAML
given above:


[source,asciidoc]
----
[[ISO/TC211]]
[%bibitem]
=== {blank}
fetched:: 2019-06-30
title::
  type::: main
  content::: Geographic information
title::
  type::: subtitle
  content::: Geographic information subtitle
  language::: en
  script::: Latn
  format::: text/plain
type:: standard
docid::
  type::: ISO
  id::: TC211
docnumber:: 211
edition:: 1
language::
  . en
  . fr
script:: Latn
version::
  revision_date::: 2019-04-01
  draft::: draft
biblionote::
  type::: bibnote
  content:::
+
--
Mark set a major league
home run record in 1998.
--
docstatus::
  stage::: stage
  substage::: substage
  iteration::: iteration
date::
  type::: issued
  value::: 2014
date::
  type::: published
  from::: 2014-04
  to::: 2014-05
date::
  type::: accessed
  value::: 2015-05-20
abstract::
  content:::
+
--
ISO 19115-1:2014 defines the schema required for ...
--
abstract::
  content::: L'ISO 19115-1:2014 définit le schéma requis pour ...
  language::: fr
  script::: Latn
  format::: text/plain
copyright::
   owner:::
     name:::: International Organization for Standardization
     abbreviation:::: ISO
     url:::: www.iso.org
   from::: 2014
   to::: 2020
link::
  type::: src
  content::: https://www.iso.org/standard/53798.html
link::
  type::: obp
  content::: https://www.iso.org/obp/ui/#!iso:std:53798:en
link::
  type::: rss
  content::: https://www.iso.org/contents/data/standard/05/37/53798.detail.rss
medium::
  form::: medium form
  size::: medium size
  scale::: medium scale
place:: bib place
extent::
  locality:::
    type::: section
    reference_from::: 7
accesslocation::
  . accesslocation1
  . accesslocation2
classification::
  type::: type
  value::: value
validity::
  begins::: 2010-10-10 12:21
  ends::: 2011-02-03 18:30


==== Contributor
organization::
  name::: International Organization for Standardization
  url::: www.iso.org
  abbreviation::: ISO
  subdivision::: division
role::
  type::: publisher
  description::: Publisher role

==== Contributor
person::
  name:::
    completename::::
+
--
content:: A. Bierman
language:: en
--
  affiliation:::
    organization::::
+
--
name:: IETF
abbreviation:: IETF
identifier::
type::: uri
id::: www.ietf.org
--
    description:::: Affiliation description
  address:::
    street::::
      . 8 Street St
    city:::: City
    postcode:::: 123456
    country:::: Country
    state:::: State
  contact:::
    mobile:::: 223322
    phone:::: mobile
role:: author

==== Contributor
organization::
  name::: IETF
  abbreviation::: IETF
  identifier:::
    type:::: uri
    id:::: www.ietf.org
role:: publisher

==== Contributor
person::
  name:::
    language:::: en
    initial:::: A.
    surname:::: Bierman
  affiliation:::
+
--
organization::
  name::: IETF
  abbreviation::: IETF
description::
  content::: Affiliation description
  language::: en
  script::: Latn
--
  identifier:::
    type:::: uri
    id:::: www.person.com
role:: author

==== Relation
type:: updates
bibitem::
  formattedref::: ISO 19115:2003
  bib_locality:::
    type:::: page
    reference_from:::: 7
    reference_to:::: 10

==== Relation
type:: updates
bibitem::
  type::: standard
  formattedref::: ISO 19115:2003/Cor 1:2006

==== Series
type:: main
title::
  type::: original
  content::: ISO/IEC FDIS 10118-3
  language::: en
  script::: Latn
  format::: text/plain
place:: Serie's place
organization:: Serie's organization
abbreviation::
  content::: ABVR
  language::: en
  script::: Latn
from:: 2009-02-01
to:: 2010-12-20
number:: serie1234
partnumber:: part5678

==== Series
type:: alt
formattedref::
  content::: serieref
  language::: en
  script::: Latn
----

[[JSONPath]]
== JSON Path style definition lists

The foregoing structure requires frequent breakouts into open blocks, to deal
with the limitation on AsciiDoc nested definition lists. An alternative is to
represent the nested structure of Relaton records in a simple, one-level definition list,
and to use the key for each key-value pair to represent the hierarchical nesting of entries,
as a dot-delimited path of keys. For example,

[source,asciidoc]
----
[%bibitem]
=== Rubber latex -- Sampling
id:: iso123
docid::
  type::: ISO
  id::: ISO 123
----

can instead be represented as:

[source,asciidoc]
----
[%bibitem]
=== Rubber latex -- Sampling
id:: iso123
docid.type:: ISO
docid.id:: ISO 123
----

Whenever part of the key is repeated between entries, the entries are assumed to attach to the same parent. If an array of hashes is needed, a blank entry is required for the key of each repeating element: For example,

[source,asciidoc]
----
[%bibitem]
=== Rubber latex -- Sampling
id:: iso123
docid::
  type::: ISO
  id::: ISO 123
docid::
  type::: ABC
  id::: 32784
----

can instead be represented as:

[source,asciidoc]
----
[%bibitem]
=== Rubber latex -- Sampling
id:: iso123
docid::
docid.type:: ISO
docid.id:: ISO 123
docid::
docid.type:: ABC
docid.id:: 32784
----

Embedded elements can also repeat:

[source,asciidoc]
----
[%bibitem]
...
==== Contributor
person::
  contact:::
    phone:::: 223322
    type:::: mobile
  contact:::
    phone:::: 332233
    type:::: work
----

can instead be represented as:

[source,asciidoc]
----
[%bibitem]
...
contributor.person.contact::
contributor.person.contact.phone:: 223322
contributor.person.contact.type:: mobile
contributor.person.contact::
contributor.person.contact.phone:: 332233
contributor.person.contact.type:: work
----

The following is Metanorma AsciiDoc markup corresponding to the YAML
given above, using JSON Path style definition lists instead of nested definition lists:

[source,asciidoc]
----
[[ISO/TC211]]
[%bibitem]
=== {blank}
fetched:: 2019-06-30
title::
title.type:: main
title.content:: Geographic information
title::
title.type:: subtitle
title.content:: Geographic information subtitle
title.language:: en
title.script:: Latn
title.format:: text/plain
type:: standard
docid::
docid.type:: ISO
docid.id:: TC211
docnumber:: 211
edition:: 1
language:: en
language:: fr
script:: Latn
version.revision_date:: 2019-04-01
version.draft:: draft
biblionote.type:: bibnote
biblionote.content::
+
--
Mark set a major league
home run record in 1998.
--
docstatus.stage:: stage
docstatus.substage:: substage
docstatus.iteration:: iteration
date::
date.type:: issued
date.value:: 2014
date::
date.type:: published
date.from:: 2014-04
date.to:: 2014-05
date::
date.type:: accessed
date.value:: 2015-05-20
abstract::
abstract.content::
+
--
ISO 19115-1:2014 defines the schema required for ...
--
abstract::
abstract.content:: L'ISO 19115-1:2014 définit le schéma requis pour ...
abstract.language:: fr
abstract.script:: Latn
abstract.format:: text/plain
copyright.owner.name:: International Organization for Standardization
copyright.owner.abbreviation:: ISO
copyright.owner.url:: www.iso.org
copyright.from:: 2014
copyright.to:: 2020
link::
link.type:: src
link.content:: https://www.iso.org/standard/53798.html
link::
link.type:: obp
link.content:: https://www.iso.org/obp/ui/#!iso:std:53798:en
link::
link.type:: rss
link.content:: https://www.iso.org/contents/data/standard/05/37/53798.detail.rss
medium::
medium.form:: medium form
medium.size:: medium size
medium.scale:: medium scale
place:: bib place
extent.type:: section
extent.reference_from:: 7
accesslocation:: accesslocation1
accesslocation:: accesslocation2
classification.type:: type
classification.value:: value
validity.begins:: 2010-10-10 12:21
validity.ends:: 2011-02-03 18:30
contributor::
contributor.organization.name:: International Organization for Standardization
contributor.organization.url:: www.iso.org
contributor.organization.abbreviation:: ISO
contributor.organization.subdivision:: division
contributor.role.type:: publisher
contributor.role.description:: Publisher role
contributor::
contributor.person.name.completename.content:: A. Bierman
contributor.person.name.completename.language:: en
contributor.person.affiliation.organization.name:: IETF
contributor.person.affiliation.organization.abbreviation:: IETF
contributor.person.affiliation.organization.identifier.type:: uri
contributor.person.affiliation.organization.identifier.id:: www.ietf.org
contributor.person.affiliation.description:: Affiliation description
contributor.person.address.street:: 8 Street St
contributor.person.address.city:: City
contributor.person.address.postcode:: 123456
contributor.person.address.country:: Country
contributor.person.address.state:: State
contributor.person.contact.phone:: 223322
contributor.person.contact.type:: mobile
contributor.role:: author
contributor::
contributor.organization.name:: IETF
contributor.organization.abbreviation:: IETF
contributor.organization.identifier.type:: uri
contributor.organization.identifier.id:: www.ietf.org
contributor.role:: publisher
contributor::
contributor.person.name.language:: en
contributor.person.name.initial:: A.
contributor.person.name.surname:: Bierman
contributor.person.affiliation.organization.name:: IETF
contributor.person.affiliation.organization.abbreviation:: IETF
contributor.person.affiliation.description.content:: Affiliation description
contributor.person.affiliation.description.language:: en
contributor.person.affiliation.description.script:: Latn
contributor.person.identifier.type:: uri
contributor.person.identifier.id:: www.person.com
contributor.role:: author
relation::
relation.type:: updates
relation.bibitem.formattedref:: ISO 19115:2003
relation.bibitem.bib_locality.type:: page
relation.bibitem.bib_locality.reference_from:: 7
relation.bibitem.bib_locality.reference_to:: 10
relation::
relation.type:: updates
relation.bibitem.type:: standard
relation.bibitem.formattedref:: ISO 19115:2003/Cor 1:2006
series::
series.type:: main
series.title.type:: original
series.title.content:: ISO/IEC FDIS 10118-3
series.title.language:: en
series.title.script:: Latn
series.title.format:: text/plain
series.place:: Serie's place
series.organization:: Serie's organization
series.abbreviation.content:: ABVR
series.abbreviation.language:: en
series.abbreviation.script:: Latn
series.from:: 2009-02-01
series.to:: 2010-12-20
series.number:: serie1234
series.partnumber:: part5678
series::
series.type:: alt
series.formattedref.content:: serieref
series.formattedref.language:: en
series.formattedref.script:: Latn
----