Test Data with XML

What is XML?

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.


What is HTML?

Hypertext Markup Language (HTML) is a standardized system for tagging text files to achieve font, color, graphic, and hyperlink effects on World Wide Web pages.


What is the difference between XML and HTML?

XML and HTML were designed with different goals:

  • XML was designed to describe data, with focus on what data is
  • HTML was designed to display data, with focus on how data looks

How are XML and HTML relevant here in Automation?

In almost every UI automation scenario (and sometimes in back-end scenarios too), we look at the HTML DOM to identify elements, and the DOM looks very much like XML. In the Basic Tutorial we already went through many of the concepts for identifying elements and interacting with them through the Selenium/Watir APIs (text_field, radio, and in fact the locator of every web element in the DOM), using strategies such as XPath and CSS. This section uses the same concepts but goes a couple of steps deeper, because we can do a little more with XML/HTML using python xmltree (the ElementTree module, xml.etree.ElementTree), an extremely lightweight parser.

The difference between an XML parser and Selenium is that an XML parser only parses XML/HTML documents, whereas Selenium, in addition to locating elements in the HTML DOM, can perform actions on them, for example click and set. This section is dedicated to XML parsing using python xmltree.
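To make the parser-versus-Selenium distinction concrete, here is a minimal sketch that locates a field in a small page using only the parser. The markup, ids, and field names are invented for illustration, and the snippet is deliberately well-formed: ElementTree raises a ParseError on the loosely written HTML that browsers tolerate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed login page; every id and name here is made up.
PAGE = """
<html>
  <body>
    <form id="login">
      <input type="text" name="username" />
      <input type="password" name="password" />
      <input type="submit" value="Sign in" />
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Locate the username field with an XPath-style expression,
# much like a locator strategy in Selenium.
field = root.find(".//input[@name='username']")
print(field.tag, field.attrib)

# The parser only reads the document: an Element has no click() method.
print(hasattr(field, "click"))
```

The element is found the same way a locator would find it, but there is nothing to click or type into, because no browser is involved.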


  • python xmltree
  • Accessing the capabilities
  • Introspection on important lines of code
  • Entire behave feature
  • Expected output when run
  • Closing Thoughts

1) Python xmltree

The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.

Each element has a number of properties associated with it:

  • a tag which is a string identifying what kind of data this element represents (the element type, in other words).
  • a number of attributes, stored in a Python dictionary.
  • a text string.
  • an optional tail string.
  • a number of child elements, stored in a Python sequence.

To create an element instance, use the Element constructor or the SubElement() factory function.

The ElementTree class can be used to wrap an element structure, and convert it from and to XML.
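The properties listed above can be seen on a hand-built element. This is a minimal sketch; the tag names and values (user, role, alice) are sample data chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Build <user id="42">alice<role>admin</role> trailing</user> by hand.
user = ET.Element("user", {"id": "42"})   # Element constructor
user.text = "alice"                        # text before the first child
role = ET.SubElement(user, "role")         # SubElement() factory function
role.text = "admin"
role.tail = " trailing"                    # text after role's closing tag

print(user.tag)                 # the tag string
print(user.attrib)              # attributes, stored in a dictionary
print(user.text)                # the text string
print(role.tail)                # the optional tail string
print(len(user), user[0].tag)   # children behave like a sequence

# Wrap the structure in an ElementTree and convert it to XML text.
tree = ET.ElementTree(user)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```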

2) Accessing the capabilities

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ElementTree (commonly imported as ET) has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading from and writing to files) are usually done at the ElementTree level. Interactions with a single XML element and its sub-elements are done at the Element level.
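The split between the two levels might look like the sketch below. An in-memory buffer stands in for a real file so that the example is self-contained; the suite and case names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Document level: ElementTree reads a whole document from a file-like object.
source = io.BytesIO(b"<suite><case name='login'/><case name='logout'/></suite>")
tree = ET.parse(source)

# Element level: work with single nodes and their sub-elements.
root = tree.getroot()
root.set("owner", "qa")
root[0].set("status", "passed")

# Document level again: ElementTree writes the whole document back out.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
document = buffer.getvalue().decode("utf-8")
print(document)
```

With a real file, the same calls take a path instead of a buffer, for example ET.parse("data.xml") and tree.write("data.xml").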

ElementTree 1.3 ships in the standard library with Python 2.7 and later, so no separate installation is needed.



Important Functionalities.

Below are extracts of code that are worth focusing on before we jump into the behave scenario and step definitions. We can parse XML in much the same way as we parse HTML, because well-formed HTML (XHTML) has the same nested structure of tags and attributes as XML, with additional markup for presentation.

1) Parse the XML file and get the tree and root
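A minimal sketch of this step follows. The file name users.xml and its contents are hypothetical test data; the file is written to a temporary directory first so the example can run on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical test data; in a real project this file would already exist.
SAMPLE = """<?xml version="1.0"?>
<users>
  <user id="1" role="admin"><name>Alice</name></user>
  <user id="2" role="guest"><name>Bob</name></user>
</users>
"""

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "users.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    tree = ET.parse(path)    # ElementTree: the whole document
    root = tree.getroot()    # Element: the top-level node

print(type(tree).__name__, root.tag)
```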

2) Get tag names
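One way to collect tag names is to walk the tree with iter(). The sketch below parses an inline string instead of a file to stay short; the data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, parsed from a string for brevity.
root = ET.fromstring(
    "<users>"
    "<user id='1'><name>Alice</name><email>alice@example.com</email></user>"
    "<user id='2'><name>Bob</name></user>"
    "</users>"
)

# The tag of a single element.
print(root.tag)

# iter() visits the element itself and every descendant in document order.
all_tags = [element.tag for element in root.iter()]
print(all_tags)

# De-duplicate to see which kinds of element the document contains.
unique_tags = sorted(set(all_tags))
print(unique_tags)
```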

3) Retrieve children and print tag and attributes
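Iterating over an element yields its direct children, which is enough for this step. A minimal sketch with invented sample data:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
root = ET.fromstring(
    "<users>"
    "<user id='1' role='admin'/>"
    "<user id='2' role='guest'/>"
    "</users>"
)

# Iterating an Element yields only its direct children, not deeper descendants.
children = [(child.tag, child.attrib) for child in root]
for tag, attributes in children:
    print(tag, attributes)

# get() reads one attribute and accepts a default for missing ones.
first_role = root[0].get("role")
missing = root[0].get("email", "n/a")
print(first_role, missing)
```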

4) Print text element for tag
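The text of an element is available through its text attribute, and findtext() offers a shortcut with a default for missing children. A sketch, again on made-up data:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the second user has no email element.
root = ET.fromstring(
    "<users>"
    "<user id='1'><name>Alice</name><email>alice@example.com</email></user>"
    "<user id='2'><name>Bob</name></user>"
    "</users>"
)

# iter("name") visits every <name> element at any depth.
names = [name.text for name in root.iter("name")]
print(names)

# findtext() returns the default instead of failing when the child is absent.
emails = [user.findtext("email", default="") for user in root.findall("user")]
print(emails)
```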

5) Find elements using XPath finders
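ElementTree supports a limited subset of XPath through find(), findall(), and findtext(). The sketch below shows a few common expressions; the data and the role values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
root = ET.fromstring(
    "<users>"
    "<user id='1' role='admin'><name>Alice</name></user>"
    "<user id='2' role='guest'><name>Bob</name></user>"
    "<user id='3' role='guest'><name>Carol</name></user>"
    "</users>"
)

# find() returns the first match (or None).
first_user = root.find("user")

# findall() with an attribute predicate returns every match as a list.
guests = root.findall("./user[@role='guest']")
guest_ids = [user.get("id") for user in guests]

# A position predicate is 1-based: user[2] is the second <user> child.
second_name = root.findtext("./user[2]/name")

# ".//" searches all descendants, not just direct children.
all_names = [name.text for name in root.findall(".//name")]

print(first_user.get("id"), guest_ids, second_name, all_names)
```

Expressions outside the supported subset, such as XPath functions, raise an error or are not understood; a third-party library such as lxml is the usual choice when full XPath is needed.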

6) Build an XML document
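Building a document is the reverse of parsing: create a root Element, attach children with SubElement(), and serialize. A minimal sketch that turns an in-memory list of records (invented here) into XML test data:

```python
import xml.etree.ElementTree as ET

# Hypothetical records to turn into XML test data.
RECORDS = [("1", "Alice"), ("2", "Bob")]

users = ET.Element("users")
for user_id, name in RECORDS:
    # Keyword arguments to SubElement() become attributes.
    user = ET.SubElement(users, "user", id=user_id)
    ET.SubElement(user, "name").text = name

# Serialize to a string; use ET.ElementTree(users).write(path) for a file.
document = ET.tostring(users, encoding="unicode")
print(document)
```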


Entire behave Scenario


Step Definitions:

Expected Output:


Closing Thoughts:

This completes the test data formats we planned to cover as part of Test Data. As a refresher, we covered the Excel, JSON, YAML, and XML data formats. Any of them can be used in your automation solution. If you are working on a maintenance project, you may already be locked into a certain format, which is why we covered all of them. If you are working on a new project, it is advisable to go with JSON or XML, as those formats are easily exchanged between systems. If non-programmers deal with your test data, I would suggest maintaining the data in spreadsheets (OpenOffice, Office, and the like) and having a parser sit between the spreadsheet and your program to convert the spreadsheet data to JSON/XML. That way, your programmers are relatively happy, since their programs only have to deal with lightweight data-interchange formats, and your business users can continue using what they have used for years.

For an automation solution, however, I would lean towards JSON/XML.

Please pass your feedback and valuable comments on this website by writing to us. We would appreciate critical feedback very much.