Opening and viewing existing XML files
In what follows, we work with a simple XML file called food.xml. What can
we do with it? The simplest thing is to send it to the terminal
for display:
% xml-cat food.xml
<?xml version="1.0"?>
<root>
<product price="3">Chicken</product>
<product price="11.50">Lobster</product>
<product price=".20">Apple</product>
<product price="1.09">Milk (2 litres)</product>
</root>
Of course, this is trivial, and we don't need a special command just
for this: the ordinary cat(1) command would do. However, there are two
important differences.
The first difference is that xml-cat(1) also checks food.xml for
well-formedness. Whereas cat(1) prints whatever it finds in the file
as-is, xml-cat(1) prints an error and refuses to continue as soon as it
finds that its input is not well-formed XML. Since food.xml is in fact
well-formed, we see no error message here.
Thus xml-cat(1) gives us an implicit guarantee: whatever it allows to
be printed will be suitable for another XML processor to consume. The
guarantee is weak, however. It covers only well-formedness, not
validity. All the xml-coreutils(7) commands process well-formed XML
documents and always ignore validity, because they are likely to be
used on XML fragments, which don't usually carry their own validation
specs.
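The distinction between the two guarantees can be sketched in Python. The standard library's ElementTree is built on the non-validating expat parser. It therefore rejects a document that is not well-formed, but it accepts a document that is well-formed yet invalid with respect to its own DTD. This is only an illustration of the concepts under that assumption, not how the xml-coreutils(7) are implemented, and the three sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed: a hypothetical, shortened variant of food.xml.
GOOD = '<products><product price="3">Chicken</product></products>'

# Not well-formed: the end tag does not match its start tag.
MALFORMED = '<products><product price="3">Chicken</produce></products>'

# Well-formed but *invalid*: the internal DTD says <root> must contain
# one empty <a> element, yet the document contains a <b> element instead.
INVALID = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ELEMENT root (a)>
  <!ELEMENT a EMPTY>
]>
<root><b/></root>"""


def well_formedness_error(text):
    """Return None if text is well-formed XML, else the parser's error message.

    expat stops at the first well-formedness error, but it never checks
    the document against its DTD, so invalid documents still pass.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    for name, doc in [("GOOD", GOOD), ("MALFORMED", MALFORMED), ("INVALID", INVALID)]:
        error = well_formedness_error(doc)
        print(name, "accepted" if error is None else "rejected: " + error)
```

GOOD and INVALID are both accepted. MALFORMED is rejected with a "mismatched tag" error. A validating parser would also reject INVALID, which is exactly the check the xml-coreutils(7) skip.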
The second difference between cat(1) and xml-cat(1) is surprising at
first: the existing top-level element (called <products>) in food.xml
is discarded and replaced with a generic <root> tag. Why does this
happen?
Just like cat(1), the main task of xml-cat(1) is concatenation,
i.e. taking two or more XML files as input and producing a single XML
file which contains them all as output. But a well-formed XML file
must contain exactly one top-level element, so xml-cat(1) does the
simplest thing it can to satisfy this constraint (as well as a few
others we won't mention here): it removes the top-level tag from each
input file, and wraps the output in a single <root> tag. You'll see
this in action below. The generic root tag is also a handy reminder
that the output is no longer associated with a DTD.
Although xml-cat(1) is handy for inspecting small XML files, larger
files call for a specialized viewer. The xml-coreutils(7) include one,
called xml-less(1): a terminal-based interactive viewer inspired by
less(1), but with some extra advantages. Because it understands the
structure of XML files, it can do things that less(1) cannot, such as
folding (press the TAB key), word wrapping (press the W key), and
showing or hiding attributes (press the A key). You can try it out as
follows:
% xml-less food.xml
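Folding and attribute hiding are possible because the viewer works on the element tree rather than on lines of text. The toy Python function below illustrates that idea on a hypothetical excerpt of food.xml. It is non-interactive, prints an indented outline rather than XML, and makes no claim about how xml-less(1) works internally. The parameter names `show_attrs` and `fold_depth` are invented for this sketch.

```python
import xml.etree.ElementTree as ET


def outline(elem, show_attrs=True, fold_depth=None, depth=0):
    """Return an indented outline of elem as a list of lines.

    show_attrs=False hides attributes. An element at depth >= fold_depth
    that has children is collapsed into a single line, like a folded subtree.
    """
    indent = "  " * depth
    attrs = ""
    if show_attrs:
        attrs = "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    if fold_depth is not None and depth >= fold_depth and len(elem):
        return [f"{indent}<{elem.tag}{attrs}> [{len(elem)} children folded]"]
    lines = [f"{indent}<{elem.tag}{attrs}>{(elem.text or '').strip()}"]
    for child in elem:
        lines.extend(outline(child, show_attrs, fold_depth, depth + 1))
    return lines


if __name__ == "__main__":
    food = ET.fromstring(
        '<products><product price="3">Chicken</product>'
        '<product price="11.50">Lobster</product></products>'
    )
    print("\n".join(outline(food)))                    # everything expanded
    print("\n".join(outline(food, show_attrs=False)))  # attributes hidden
    print("\n".join(outline(food, fold_depth=0)))      # top level folded
```

Both operations are cheap once the document is a tree. A purely line-oriented pager like less(1) has no notion of where an element or attribute begins and ends.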
One more command should be discussed straight away: xml-fixtags(1).
This command takes an XML file which is not necessarily well-formed,
and repairs it so that it becomes well-formed XML. It can be used to
fix small problems, and can even convert an HTML file into XML. Be
warned, however, that the repairs are "dumb", and may well not be what
you expect.
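Apart from xml-fixtags(1), all the other xml-coreutils(7) commands
expect their input XML files to be well-formed, and signal an error
otherwise. This follows standard XML practice, and also prevents
duplication of functionality. For example, the first command below
repairs food.xml and pipes the result into the viewer, and the second
converts the HTML version of this tutorial to XML and passes it to
xml-fmt(1):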
% xml-fixtags food.xml | xml-less
% xml-fixtags --html xml_coreutils_tutorial.html | xml-fmt
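To see why such repairs are "dumb", consider the hypothetical tag balancer below, written with Python's standard html.parser module. It is not xml-fixtags(1)'s algorithm, only one simple strategy chosen for illustration:

* stray end tags are dropped;
* an end tag closes every element opened after its matching start tag;
* anything still open at the end of the input is closed there;
* text is re-escaped;
* comments and declarations are discarded;
* the result is wrapped in <root>, borrowing xml-cat(1)'s convention.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


def format_attrs(attrs):
    """Serialize (name, value) pairs as XML attributes; a bare attribute gets its name as value."""
    return "".join(
        f" {name}={quoteattr(name if value is None else value)}"
        for name, value in attrs
    )


class NaiveTagFixer(HTMLParser):
    """Hypothetical 'dumb' tag balancer (not xml-fixtags(1)'s algorithm)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities in text arrive decoded
        self.out = []    # pieces of the repaired document
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{format_attrs(attrs)}>")
        self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{format_attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: simply drop it
        # Close everything opened since the matching start tag, then the tag itself.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and > for XML


def fix_tags(markup):
    """Return markup with balanced tags, wrapped in a single <root> element."""
    fixer = NaiveTagFixer()
    fixer.feed(markup)
    fixer.close()  # flush any buffered trailing text
    while fixer.stack:  # close elements still open at end of input
        fixer.out.append(f"</{fixer.stack.pop()}>")
    return "<root>" + "".join(fixer.out) + "</root>"


if __name__ == "__main__":
    for broken in ("<p>Hello <b>world</p>", "<ul><li>One<li>Two</ul></div>"):
        fixed = fix_tags(broken)
        ET.fromstring(fixed)  # sanity check: raises ET.ParseError if still broken
        print(fixed)
```

The first input becomes `<root><p>Hello <b>world</b></p></root>`, which is probably what was meant. The second becomes `<root><ul><li>One<li>Two</li></li></ul></root>`. A human reads the two list items as siblings, but the mechanical repair nests one inside the other.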