NaturalIntelligence/fast-xml-parser

View on GitHub
docs/v4/2.XMLparseOptions.md

Summary

Maintainability
Test Coverage
We can customize parsing by providing different options to the parser;

```js
const {XMLParser} = require('fast-xml-parser');

const options = {
    ignoreAttributes : false
};

const parser = new XMLParser(options);
let jsonObj = parser.parse(xmlDataStr);
```

Let's understand each option in detail with necessary examples

## allowBooleanAttributes

To allow attributes without value. 

By default boolean attributes are ignored
```js
    const xmlDataStr = `<root a="nice" checked><a>wow</a></root>`;

    const options = {
        ignoreAttributes: false,
        attributeNamePrefix : "@_"
    };
    const parser = new XMLParser(options);
    const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_a": "nice",
        "a": "wow"
    }
}
```
Once you allow boolean attributes, they are parsed and set to `true`.
```js
const xmlDataStr = `<root a="nice" checked><a>wow</a></root>`;

const options = {
    ignoreAttributes: false,
    attributeNamePrefix : "@_",
    allowBooleanAttributes: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_a": "nice",
        "@_checked": true,
        "a": "wow"
    }
}
```

## alwaysCreateTextNode

FXP creates text node always when `preserveOrder: true`. Otherwise, it creates a property with the tag name and assign the value directly.

You can force FXP to render a tag with textnode using this option.

Eg
```js
const xmlDataStr = `
    <root a="nice" checked>
        <a>wow</a>
        <a>
          wow again
          <c> unlimited </c>
        </a>
        <b>wow phir se</b>
    </root>`;  

const options = {
    ignoreAttributes: false,
    // alwaysCreateTextNode: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output when `alwaysCreateTextNode: false`
```json
{
    "root": {
        "a": [
            "wow",
            {
                "c": "unlimited",
                "#text": "wow again"
            }
        ],
        "b": "wow phir se",
        "@_a": "nice"
    }
}
```
Output when `alwaysCreateTextNode: true`
```json
{
    "root": {
        "a": [
            {
                "#text": "wow"
            },
            {
                "c": {
                    "#text": "unlimited"
                },
                "#text": "wow again"
            }
        ],
        "b": {
            "#text": "wow phir se"
        },
        "@_a": "nice"
    }
}
```

## attributesGroupName

To group all the attributes of a tag under given property name.

```js
const xmlDataStr = `<root a="nice" b="very nice" ><a>wow</a></root>`;

const options = {
    ignoreAttributes: false,
    attributeNamePrefix : "@_",
    attributesGroupName : "@_"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_": {
            "@_a": "nice",
            "@_b": "very nice"
        },
        "a": "wow"
    }
}
```

## attributeNamePrefix

To recognize attributes in the JS object separately. You can prepend some string with each attribute name.

```js
const xmlDataStr = `<root a="nice" ><a>wow</a></root>`;

const options = {
    ignoreAttributes: false,
    attributeNamePrefix : "@_"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_a": "nice",
        "a": "wow"
    }
}
```



## attributeValueProcessor
Similar to `tagValueProcessor` but applied to attributes.

Eg
```js
:
const options = {
    ignoreAttributes: false,
    attributeValueProcessor: (name, val, jPath) => {
        :
    }
};
:
```

## cdataPropName
If `cdataPropName` is not set to some property name, then CDATA values are merged to tag's text value.

Eg
```js
const xmlDataStr = `
    <a>
        <name>name:<![CDATA[<some>Jack</some>]]><![CDATA[Jack]]></name>
    </a>`;  

const parser = new XMLParser();
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "a": {
        "name": "name:<some>Jack</some>Jack"
    }
}
```
When `cdataPropName` is set to some property then text value and CDATA are parsed to different properties.

Example 2
```js
const options = {
    cdataPropName:     "__cdata"
}
```
Output
```json
{
    "a": {
        "name": {
            "__cdata": [
                "<some>Jack</some>",
                "Jack"
            ],
            "#text": "name:"
        }
    }
}
```
## commentPropName
If `commentPropName` is set to some property name, then comments are parsed from XML.

Eg
```js
const xmlDataStr = `
    <!--Students grades are uploaded by months-->
    <class_list>
       <student>
         <!--Student details-->
         <!--A second comment-->
         <name>Tanmay</name>
         <grade>A</grade>
       </student>
    </class_list>`;  

const options = {
    commentPropName: "#comment"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "#comment": "Students grades are uploaded by months",
    "class_list": {
        "student": {
            "#comment": [
                "Student details",
                "A second comment"
            ],
            "name": "Tanmay",
            "grade": "A"
        }
    }
}
```

## htmlEntities
FXP by default parse XMl entities if `processEntities: true`. You can set `htmlEntities` to parse HTML entities. Check [entities](./5.Entities.md) section for more information.

## ignoreAttributes

By default `ignoreAttributes` is set to `true`. It means, attributes are ignored by the parser. If you set any configuration related to attributes without setting `ignoreAttributes: false`, it is useless.

Eg
```js
const xmlDataStr = `<root a="nice" ><a>wow</a></root>`;

const options = {
    // ignoreAttributes: false,
    attributeNamePrefix : "@_"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "a": "wow"
    }
}
```

## ignoreDeclaration

As many users want to ignore XML declaration tag to be ignored from parsing output, they can use `ignoreDeclaration: true`.

Eg
```js
const xmlDataStr = `<?xml version="1.0"?>
      <?elementnames <fred>, <bert>, <harry> ?>
      <h1></h1>`;

const options = {
    ignoreDeclaration: true,
    attributeNamePrefix : "@_"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "?elementnames": "",
    "h1": ""
}
```

## ignorePiTags


As many users want to ignore PI tags to be ignored from parsing output, they can use `ignorePiTags: true`.

Eg
```js
const xmlDataStr = `<?xml version="1.0"?>
      <?elementnames <fred>, <bert>, <harry> ?>
      <h1></h1>`;

const options = {
    ignoreDeclaration: true,
    ignorePiTags: true,
    attributeNamePrefix : "@_"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "h1": ""
}
```
## isArray

Whether a single tag should be parsed as an array or an object, it can't be decided by FXP. Hence `isArray` method can help users to take the decision if a tag should be parsed as an array.

Eg
```js
const xmlDataStr = `
    <root a="nice" checked>
        <a>wow</a>
        <a>
          wow again
          <c> unlimited </c>
        </a>
        <b>wow phir se</b>
    </root>`;

const alwaysArray = [
    "root.a.c",
    "root.b"
];
      

const options = {
    ignoreAttributes: false,
    //name: is either tagname, or attribute name
    //jPath: upto the tag name
    isArray: (name, jpath, isLeafNode, isAttribute) => { 
        if( alwaysArray.indexOf(jpath) !== -1) return true;
    }
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_a": "nice",
        "a": [
            "wow",
            {
                "#text": "wow again",
                "c": [
                    "unlimited"
                ]
            }
        ],
        "b": [
            "wow phir se"
        ]
    }
}
```


## numberParseOptions
FXP uses [strnum](https://github.com/NaturalIntelligence/strnum) library to parse string into numbers. This property allows you to set configuration for strnum package.

Eg
```js
const xmlDataStr = `
    <root>
        <a>-0x2f</a>
        <a>006</a>
        <a>6.00</a>
        <a>-01.0E2</a>
        <a>+1212121212</a>
    </root>`;  

const options = {
    numberParseOptions: {
        leadingZeros: true,
        hex: true,
        skipLike: /\+[0-9]{10}/,
        // eNotation: false
    }
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "a": [ -47, 6, 6, -100, "+1212121212" ],
    }
}
```


## parseAttributeValue
`parseAttributeValue: true` option can be set to parse attributes value. This option uses [strnum](https://github.com/NaturalIntelligence/strnum) package to parse numeric values. For more controlled parsing check `numberParseOptions` option.

Eg
```js
const xmlDataStr = `
    <root a="nice" checked enabled="true" int="32" int="34">
        <a>wow</a>
    </root>`;  

const options = {
    ignoreAttributes: false,
    // parseAttributeValue: true,
    allowBooleanAttributes: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "a": "wow",
        "@_a": "nice",
        "@_checked": true,
        "@_enabled": "true",
        "@_int": "34"
    }
}
```
Boolean attributes are always parsed to `true`. XML Parser doesn't give error if validation is kept off so it'll overwrite the value of repeated attributes.

When `parseAttributeValue: true`
```json
{
    "root": {
        "a": "wow",
        "@_a": "nice",
        "@_checked": true,
        "@_enabled": true,
        "@_int": 34
    }
}
```

When validation options are set or `true`.
```js
//..
const output = parser.parse(xmlDataStr, {
    allowBooleanAttributes: true
});
//..
```
Output
```bash
Error: Attribute 'int' is repeated.:1:48
```

## parseTagValue
`parseTagValue: true` option can be set to parse tags value. This option uses [strnum](https://github.com/NaturalIntelligence/strnum) package to parse numeric values. For more controlled parsing check `numberParseOptions` option.

Eg
```js
const xmlDataStr = `
    <root>
        35<nested>34</nested>
    </root>`;  

const options = {
    parseTagValue: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": 35
}
```

**Example 2**
Input
```xml
<root>
    35<nested>34</nested>
</root>
```
Output
```json
{
    "root": {
        "nested": 34,
        "#text": 35
    }
}
```

**Example 3**
Input
```xml
<root>
    35<nested>34</nested>46
</root>
```
Output
```json
{
    "root": {
        "nested": 34,
        "#text": "3546"
    }
}
```
## preserveOrder
When `preserveOrder: true` , you get some long ugly result. That's because this option is used to keep the order of tags in result js Object. It also helps XML builder to build the similar kind of XML from the js object without losing information.

```js
const XMLdata = `
    <!--Students grades are uploaded by months-->
    <class_list standard="3">
       <student>
         <!--Student details-->
         <!--A second comment-->
         <name>Tanmay</name>
         <grade>A</grade>
       </student>
    </class_list>`;

    const options = {
        commentPropName: "#comment",
        preserveOrder: true
    };
    const parser = new XMLParser(options);
    let result = parser.parse(XMLdata);
```
```json
[
    {
        "#comment": [
            { "#text": "Students grades are uploaded by months" }
        ]
    },
    {
        "class_list": [
            {
                "student": [
                    {
                        "#comment": [
                            { "#text": "Student details" }
                        ]
                    },
                    {
                        "#comment": [
                            { "#text": "A second comment" }
                        ]
                    },
                    {
                        "name": [
                            { "#text": "Tanmay" }
                        ]
                    },
                    {
                        "grade": [
                            { "#text": "A" }
                        ]
                    }
                ]
            }
        ],
        ":@": {
            "standard" : "3"
        }
    }
]
```

## processEntities
Set it to `true` (default) to process default and DOCTYPE entities. Check [Entities](./5.Entities.md) section for more detail. If you don't have entities in your XML document then it is recommended to disable it `processEntities: false` for better performance.
## removeNSPrefix

Remove namespace string from tag and attribute names.

Default is `removeNSPrefix: false`
```js
const xmlDataStr = `<root some:a="nice" ><any:a>wow</any:a></root>`;

const options = {
    ignoreAttributes: false,
    attributeNamePrefix : "@_"
    //removeNSPrefix: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_some:a": "nice",
        "any:a": "wow"
    }
}
```

Setting `removeNSPrefix: true`
```js
const xmlDataStr = `<root some:a="nice" ><any:a>wow</any:a></root>`;

const options = {
    ignoreAttributes: false,
    attributeNamePrefix : "@_",
    removeNSPrefix: true
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "@_a": "nice",
        "a": "wow"
    }
}
```

## stopNodes

At particular point, if you don't want to parse a tag and it's nested tags then you can set their path in `stopNodes`.

Eg
```js
const xmlDataStr = `
    <root a="nice" checked>
        <a>wow</a>
        <a>
          wow again
          <c> unlimited </c>
        </a>
        <b>wow phir se</b>
    </root>`;  

const options = {
    stopNodes: ["root.a"]
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "a": [
            "wow",
            "\n            wow again\n            <c> unlimited </c>\n          "
        ],
        "b": "wow phir se",
    }
}
```

You can also mention a tag which should not be processed irrespective of their path. Eg. `<pre>` or `<script>` in an HTML document.

```js
const options = {
    stopNodes: ["*.pre", "*.script"]
};
```

Note that a stop node should not have same closing node in contents. Eg

```xml
<stop>
invalid </stop>
</stop>
```
nested stop notes are also not allowed
```xml
<stop>
    <stop> invalid </stop>
</stop>
```

## tagValueProcessor
With `tagValueProcessor` you can control how and which tag value should be parsed.

1. If `tagValueProcessor` returns `undefined` or `null` then original value would be set without parsing.
2. If it returns different value or value with different data type then new value would be set without parsing.
3. Otherwise original value would be set after parsing (if `parseTagValue: true`)
4. if tag value is empty then `tagValueProcessor` will not be called.

Eg
```js
const xmlDataStr = `
    <root a="nice" checked>
        <a>wow</a>
        <a>
          wow again
          <c> unlimited </c>
        </a>
        <b>wow phir se</b>
    </root>`;  

const options = {
    ignoreAttributes: false,
    tagValueProcessor: (tagName, tagValue, jPath, hasAttributes, isLeafNode) => {
        if(isLeafNode) return tagValue;
        return "";
    }
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "a": [
            {
                "#text": "wow",
                "@_a": "2"
            },
            {
                "c": {
                    "#text": "unlimited",
                    "@_a": "2"
                },
                "@_a": "2"
            }
        ],
        "b": {
            "#text": "wow phir se",
            "@_a": "2"
        },
        "@_a": "nice"
    }
}
```


## textNodeName
Text value of a tag is parsed to `#text` property by default. You can always change this.

Eg
```js
const xmlDataStr = `
    <a>
            text<b>alpha</b>
    </a>`;  

const options = {
    ignoreAttributes: false,
    textNodeName: "$text"
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "a": {
        "b": "alpha",
        "$text": "text"
    }
}
```

## transformTagName

`transformTagName` property function let you change the name of a tag as per your need. Eg 'red-code' can be transformed to 'redCode' or 'redCode' can be transformed to 'redcode'.

```js
transformTagName: (tagName) => tagName.toLowerCase()
```

## transformAttributeName

`transformAttributeName` property function let you modify the name of attribute

```js
transformAttributeName: (attributeName) => attributeName.toLowerCase()
```

## trimValues
Remove surrounding whitespace from tag or attribute value.

Eg
```js
const xmlDataStr = `
    <root attri="  ibu  te   ">
        35       <nested>   34</nested>
    </root>`;  

const options = {
    ignoreAttributes: false,
    parseTagValue: true, //default
    trimValues: false
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "root": {
        "nested": 34,
        "#text": "\n      35       \n    ",
        "@_attri": "  ibu  te   "
    }
}
```
If the tag value consists of whitespace only and `trimValues: false` then value will not be parsed even if `parseTagValue:true`. Similarly, if `trimValues: true` and `parseTagValue:false` then surrounding whitespace will be removed.

```js
const options = {
    ignoreAttributes: false,
    parseTagValue: false,
    trimValues: true //default
};
```
Output
```json
{
    "root": {
        "nested": "34",
        "#text": "35",
        "@_attri": "ibu  te"
    }
}
```

## unpairedTags
Unpaired Tags are the tags which don't have matching closing tag. Eg `<br>` in HTML. You can parse unpaired tags by providing their list to the parser, validator and builder.

Eg
```js
const xmlDataStr = `
    <rootNode>
        <tag>value</tag>
        <empty />
        <unpaired>
        <unpaired />
        <unpaired>
    </rootNode>`;  

const options = {
    unpairedTags: ["unpaired"]
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```
Output
```json
{
    "rootNode": {
        "tag": "value",
        "empty": "",
        "unpaired": [ "", "", ""]
    }
}
```
**Note**: Unpaired tag can't be used as closing tag e.g. `</unpaired>` is not allowed.
## updateTag

This property allow you to change tag name when you send different name, skip a tag from the parsing result when you return false, or to change attributes.
```js
const xmlDataStr = `
    <rootNode>
        <a at="val">value</a>
        <b></b>
    </rootNode>`;  

const options = {
    attributeNamePrefix: "",
    ignoreAttributes:    false,
    updateTag(tagName, jPath, attrs){
        attrs["At"] = "Home";
        delete attrs["at"];
        
        if(tagName === "a") return "A";
        else if(tagName === "b") return false;
    }
};
const parser = new XMLParser(options);
const output = parser.parse(xmlDataStr);
```

[> Next: XmlBuilder](./3.XMLBuilder.md)