Tuesday, 18 December 2012

Basics - XML

Here's the next entry in my Java basics series. Earlier we looked at how to read a text file. Now we'll look at some simple XML processing. XML is pretty ubiquitous; at some point you're going to run into it. This post will show how to read XML and parse it into a Document. We'll extract data from the Document and then create a new one.

There are many XML libraries around that help with XML processing, such as JDOM or dom4j, but the standard JDK comes with everything you need. Let's get started!

Getting a Document

The first step in processing XML is to create a Document from the raw XML. The Document is the central class for XML processing. We use a DocumentBuilder to parse XML into a Document. The builder has several overloaded parse methods allowing you to create a Document from a File, InputStream, InputSource or a URI. Here we are using a file.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
File file = new File("employees.xml");
Document doc = builder.parse(file);
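
If the XML is already in memory rather than in a file, the InputSource overload mentioned above can wrap a StringReader. Here is a minimal sketch of that, assuming a made-up sample string (it is not the real employees.xml) and a hypothetical helper class called ParseFromString:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseFromString {

    // Parses an XML string into a Document using the InputSource overload of parse().
    // Checked exceptions are wrapped so callers don't have to declare them.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data, for illustration only.
        String xml = "<employees><employee id=\"1\"><name>Fred</name></employee></employees>";
        Document doc = parse(xml);
        // Print the name of the root element.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The rest of the processing is the same whichever parse method you pick; only the source of the XML changes.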

In this example we're going to work with a basic XML file, employees.xml:

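<employees>
  <employee id="1">
    <name>Fred</name>
    <age>20</age>
    <department>Sales</department>
  </employee>
  <employee id="2">
    <name>Bob</name>
    <age>30</age>
    <department>Sales</department>
  </employee>
  <employee id="3">
    ...
  </employee>
</employees>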

Working with the Document

Now that we have our Document, we can extract data from it using XPath. XPath is a powerful query language for selecting nodes from an XML Document. Let's find the name of the employee with an ID of 1.

XPathFactory xpFactory = XPathFactory.newInstance();
XPath xpath = xpFactory.newXPath();
String qry = "/employees/employee[@id = '1']/name";
String name = (String)xpath.evaluate(qry, doc, XPathConstants.STRING);
System.out.println(name);

When this code is run, it will output Fred, which is the name of the employee with an id attribute of '1'. Now let's get all the employee nodes in the Sales department.

qry = "/employees/employee[department = 'Sales']";
NodeList employees = (NodeList)xpath.evaluate(qry, doc, XPathConstants.NODESET);
System.out.println("Employees in Sales department;");
for (int i=0; i<employees.getLength(); i++) {
  Element employee = (Element)employees.item(i);
  name = employee.getElementsByTagName("name").item(0).getTextContent();
  String age = employee.getElementsByTagName("age").item(0).getTextContent();
  String id = employee.getAttribute("id");
  System.out.format("%2$s - %1$s age %3$s\n", name, id, age);
}

In the above example we get a NodeList and loop over it to create a report of the employees in the Sales department:

Employees in Sales department;
1 - Fred age 20
2 - Bob age 30
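
As a variation on the loop above, XPath can also be evaluated relative to each employee node instead of calling getElementsByTagName, and it can return numbers as well as strings and node sets. The sketch below is only an illustration: the class name RelativeXPath is made up, the XML is an assumed stand-in for employees.xml, and the third employee (Ann, Marketing) is invented.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RelativeXPath {

    // Assumed sample data; the Marketing employee is invented for illustration.
    static final String XML =
        "<employees>"
      + "<employee id=\"1\"><name>Fred</name><age>20</age><department>Sales</department></employee>"
      + "<employee id=\"2\"><name>Bob</name><age>30</age><department>Sales</department></employee>"
      + "<employee id=\"3\"><name>Ann</name><age>40</age><department>Marketing</department></employee>"
      + "</employees>";

    // Builds a short report of the Sales employees found in the given XML.
    static String report(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // XPathConstants.NUMBER makes evaluate() return a Double.
            Double count = (Double) xpath.evaluate(
                "count(/employees/employee[department = 'Sales'])", doc, XPathConstants.NUMBER);
            StringBuilder sb = new StringBuilder();
            sb.append(String.format("%d employees in Sales\n", count.intValue()));

            NodeList employees = (NodeList) xpath.evaluate(
                "/employees/employee[department = 'Sales']", doc, XPathConstants.NODESET);
            for (int i = 0; i < employees.getLength(); i++) {
                Node employee = employees.item(i);
                // The two-argument evaluate() returns a String; these expressions
                // are relative to the current employee node.
                String id = xpath.evaluate("@id", employee);
                String name = xpath.evaluate("name", employee);
                String age = xpath.evaluate("age", employee);
                sb.append(String.format("%s - %s age %s\n", id, name, age));
            }
            return sb.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("Could not build report", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(report(XML));
    }
}
```

Whether you prefer relative XPath or getElementsByTagName is mostly a matter of taste; XPath keeps all the selection logic in one query language.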

Creating a new Document

Now let's look at creating a new Document from scratch. We use a DocumentBuilder to create a new empty Document, then we create some elements and build up the DOM tree.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();

Element order = doc.createElement("order");
order.setAttribute("number", "1");

Element total = doc.createElement("total");
order.appendChild(total);

Element status = doc.createElement("status");
order.appendChild(status);

Now we have a small order Document! To see what it looks like, we can print it to the console using a Transformer.

TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(order);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

which gives us the output:

<?xml version="1.0" encoding="UTF-8"?>
<order number="1">
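
The same Transformer can write to a String instead of the console, which is handy if you want to log the XML or send it somewhere. Below is a rough sketch that pulls the steps together. The class name OrderToString and the values "9.99" and "NEW" are assumptions made up for illustration. It omits the XML declaration and the indenting so the result is a single compact line.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class OrderToString {

    // Builds a small order Document and serializes it to a String.
    static String buildOrderXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            Element order = doc.createElement("order");
            order.setAttribute("number", "1");
            doc.appendChild(order); // make order the root element

            Element total = doc.createElement("total");
            total.setTextContent("9.99"); // assumed value, for illustration
            order.appendChild(total);

            Element status = doc.createElement("status");
            status.setTextContent("NEW"); // assumed value, for illustration
            order.appendChild(status);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header so only the order element is written.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // StreamResult accepts any Writer; a StringWriter collects the output in memory.
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build order XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildOrderXml());
    }
}
```

Setting OutputKeys.INDENT to "yes" as in the earlier snippet gives more readable output, though the exact indentation can differ between JDK versions.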

