XML-FIXTAGS

NAME

xml-fixtags - convert HTML into XML on the standard output.

SYNOPSIS

xml-fixtags [OPTION]... [FILE]

DESCRIPTION

xml-fixtags aggressively converts a single HTML or XML file, read from the standard input or from FILE, into a well-formed XML file that xml-coreutils(7) can process without errors. The result is written to the standard output.

The output produced by xml-fixtags is almost certainly not what you want. For ordinary conversions you should nearly always use a more sophisticated tool such as tidy(1) or xmllint(1).

xml-fixtags is useful for processing documents that are not well formed to begin with, where it does not matter whether the corrections closely resemble what the original author intended, or when no alternative is installed on the system. This makes the xml-coreutils(7) commands more robust in a transparent way, without duplicating the repair heuristics in each command.

xml-fixtags uses a very simple algorithm that tries to localise the effect of well-formedness errors in the input, with minimal disruption to the rest of the input. If the input is already well-formed XML, no modifications are performed.

The output of xml-fixtags is not guaranteed to be valid, and does not follow any rules specific to particular types of XML or HTML document. It is merely guaranteed to be well formed.

TRANSFORMATIONS

This section describes the main transformations that are performed by xml-fixtags.

If the file does not start with '<', an extra root tag is added automatically (the same effect as --root-wrap). As soon as a zero-depth closing tag is encountered, the output ends.
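The interaction between these two rules can be illustrated with a small Python sketch. This is not the xml-fixtags source: the function name cut_after_root, the wrapper tag name "root", and the simplified tag pattern are all assumptions made for illustration, and the sketch assumes the tags in the input are already properly nested.

```python
import re

# Simplified tag pattern (an assumption): ignores comments, CDATA and
# processing instructions, which a real parser would have to handle.
TAG = re.compile(r"<(/?)[A-Za-z][^<>]*?(/?)>")

def cut_after_root(text, wrap=False, root="root"):
    """Return text truncated after the first closing tag at depth zero.

    A wrapper element (hypothetical name `root`) is added when requested,
    or when the text does not start with '<'.
    """
    if wrap or not text.startswith("<"):
        text = "<%s>%s</%s>" % (root, text, root)
    depth = 0
    for m in TAG.finditer(text):
        closing, selfclosing = m.group(1), m.group(2)
        if selfclosing:
            continue              # <br/> opens and closes at once
        if closing:
            depth -= 1
            if depth == 0:        # zero-depth closing tag: output ends here
                return text[:m.end()]
        else:
            depth += 1
    return text
```

In this sketch, cut_after_root("<a>1</a><b>2</b>") returns "<a>1</a>", because the closing tag of the first element is at depth zero. Passing wrap=True keeps both elements, since every tag is then one level deeper.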

If a closing tag is found that is not properly nested, all the still-open children of that tag are closed immediately as well. If a closing tag is found that was never opened, it is opened and closed immediately. For the purposes of these rules, tag names are matched case-insensitively.
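The closing-tag rules can be sketched with a stack of open tag names. The Python below is a minimal illustration and not the algorithm actually used by xml-fixtags. The function name repair_nesting and the simplified tag pattern are assumptions, as are two details the text does not specify: generated closing tags reuse the spelling of the opening tag, and any tags still open at the end of the input are closed.

```python
import re

# Simplified tag pattern (an assumption): comments, CDATA and processing
# instructions are not treated specially.
TAG = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^<>]*?(/?)>")

def repair_nesting(text):
    """Balance closing tags using a stack of currently open tag names."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])      # copy text between tags
        pos = m.end()
        closing, name, selfclosing = m.groups()
        if selfclosing:
            out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name.lower() in (n.lower() for n in stack):
            # Close every still-open child, then the tag itself.
            while stack[-1].lower() != name.lower():
                out.append("</%s>" % stack.pop())
            out.append("</%s>" % stack.pop())
        else:
            # Never opened: open and close it immediately.
            out.append("<%s></%s>" % (name, name))
    out.append(text[pos:])
    while stack:                              # assumption: close leftovers
        out.append("</%s>" % stack.pop())
    return "".join(out)
```

With this sketch, repair_nesting("<a><b><c>x</b>y</a>") gives "<a><b><c>x</c></b>y</a>", and repair_nesting("<a>x</i>y</a>") gives "<a>x<i></i>y</a>".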

If an entity reference "&name;" is found whose entity has not been declared, it is replaced with the text "&amp;name;".
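The entity rule amounts to a conditional text substitution, which can be sketched in Python as follows. The function name escape_unknown_entities and the declared parameter are hypothetical. The sketch also assumes that the five entities predefined by XML count as known, which the text above does not state explicitly, and it leaves numeric character references such as "&#160;" alone.

```python
import re

# The five entities every XML parser knows without a declaration.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Named entity references only; "&#160;" and "&#xA0;" do not match.
ENTITY = re.compile(r"&([A-Za-z_:][\w:.-]*);")

def escape_unknown_entities(text, declared=()):
    """Rewrite "&name;" as "&amp;name;" unless name is predefined or declared."""
    known = PREDEFINED | set(declared)

    def fix(m):
        name = m.group(1)
        return m.group(0) if name in known else "&amp;" + name + ";"

    return ENTITY.sub(fix, text)
```

For example, escape_unknown_entities("a &nbsp; b &lt; c") returns "a &amp;nbsp; b &lt; c", while passing declared=("nbsp",) leaves the string unchanged.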

If the --html switch is used, the input is assumed to be HTML, and the rules for opening and closing tags also depend on the type of tag. The html, head, and body tags are inserted if they are missing, but full DTD compliance is not attempted.

OPTIONS

--root-wrap

Adds a standard root wrapper around the document, thereby incrementing the depth of every tag. This can be used to prevent early truncation of the document when a zero-depth closing tag would otherwise be found.

--html

Assume that the input document is HTML. This switches on some extra heuristics. It does not imply valid XHTML on output.

--xml

Assume that the input document is XML. This is the default.

EXIT STATUS

xml-fixtags returns 0 on success, or 1 otherwise.
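A calling program can use the exit status to decide whether the output is usable. The Python sketch below shows one way to do this, assuming xml-fixtags is installed and on the PATH when the usage line is run. The helper name run_filter is hypothetical, and it works for any command that reads the standard input and writes the standard output.

```python
import subprocess

def run_filter(command, data):
    """Send data (bytes) to command on stdin and return its stdout.

    Raises RuntimeError when the command exits with a nonzero status.
    """
    result = subprocess.run(command, input=data, stdout=subprocess.PIPE)
    if result.returncode != 0:
        raise RuntimeError(
            "%s exited with status %d" % (command[0], result.returncode))
    return result.stdout

# Possible usage (requires xml-fixtags to be installed):
#   xml = run_filter(["xml-fixtags", "--html"], open("page.html", "rb").read())
```

Treating every nonzero status as a failure matches the rule above, which documents only 0 and 1.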

BUGS

xml-fixtags is still primitive, and can fail to fix an input document.

AUTHORS

Laird A. Breyer is the original author of this software. The source code (GPLv3 or later) for the latest version is available at the following locations:
http://www.lbreyer.com/gpl.html
http://xml-coreutils.sourceforge.net

SEE ALSO

xml-coreutils(7)