Test Data with XML

What is XML?

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.


What is HTML?

Hypertext Markup Language (HTML) is a standardized system for tagging text files to achieve font, color, graphic, and hyperlink effects on World Wide Web pages.


What is the difference between XML and HTML?

XML and HTML were designed with different goals:

  • XML was designed to describe data, with a focus on what the data is
  • HTML was designed to display data, with a focus on how the data looks

How are XML and HTML relevant here in Automation?

In almost every UI automation scenario (and sometimes on the back end too), we inspect the HTML DOM to identify elements, and the DOM looks very much like XML. The Basic Tutorial already covered identifying elements and interacting with them through the Selenium/Watir APIs (text_field, radio and, in fact, a locator for every web element in the DOM), using strategies such as XPath and CSS. This section uses the same concepts but goes a couple of steps deeper, because Nokogiri, a lightweight parser, lets us do a little more with XML/HTML.

The difference between Nokogiri and Selenium is that Nokogiri ONLY parses XML/HTML documents, whereas Selenium, in addition to locating elements in the HTML DOM, can perform actions such as click and set because it drives the browser. This section is dedicated to XML parsing using Nokogiri.


  • Nokogiri and open-uri
  • Accessing the capabilities
  • Introspection on important lines of code
  • Entire cucumber feature
  • Expected output when run
  • Closing Thoughts

1) Nokogiri and open-uri

Nokogiri is an HTML, XML, SAX, and Reader parser. Among its many features is the ability to search documents via XPath or CSS3 selectors. open-uri is a library that ships with Ruby; we use it here to fetch the HTML document in a headless way (i.e. without opening a browser).

I found Nokogiri extremely useful, especially for retrieving various properties of an HTML document, and it also has a full API for dealing with XML documents. If you work in a CI environment (Hudson, Jenkins, Bamboo etc.), Nokogiri is also an excellent way to parse XML from scripts that run very fast. It can also be used to parse configuration files written in XML, and so on.

2) Accessing the capabilities

Nokogiri is a gem, so we have to install it and then require it in the context.

a) Add the gem to the Gemfile and run bundle install/update from RubyMine, OR run bundle update from the command line at the root of the project. These are the same steps we followed earlier, for example for rubyXL.



b) Require the module in env.rb so that Cucumber can find it when loading the context.


Nokogiri and open-uri APIs

Below are extracts of the lines of code that are worth focusing on before we jump into the Cucumber scenario and step definitions. We can parse XML in much the same way as we parse HTML below, because both are tag-based markup and Nokogiri exposes the same search methods for each.

1) Nokogiri parsing an HTML document (using Selenium)

2) Nokogiri parsing an HTML document (using open-uri)

3) Nokogiri extracting an HTML link using XPath

4) Nokogiri extracting an HTML link using CSS

5) Nokogiri search

6) Nokogiri building XML

7) Nokogiri building an XML document

8) open-uri ignoring the HTTPS (SSL) verification
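One way to do this is open-uri's :ssl_verify_mode option, sketched below. The helper and constant names are hypothetical, and nothing is fetched until the helper is called with a real URL. Turning verification off removes a security check, so this is only suitable for test environments.

```ruby
require 'open-uri'
require 'openssl'

# Options that tell open-uri to skip SSL peer verification.
INSECURE_OPTIONS = { ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE }.freeze

# Hypothetical helper: opens the URL without verifying the server certificate.
def open_without_ssl_verification(url)
  URI.open(url, **INSECURE_OPTIONS)
end

# Usage (placeholder URL, requires network access):
#   page = open_without_ssl_verification('https://example.com')
```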

9) Code to fix the UTF-8 error when parsing HTML documents

Entire Cucumber Scenario

The Cucumber feature below contains nine scenarios. The scenarios that fail are marked in red. They are expected to fail because open-uri verifies the SSL peer certificate by default. The subsequent scenarios fix this failure with the code specified in 8) above.

  • Print all links on Google search using CSS and Selenium
  • Print all links on Google search using CSS and open-uri
  • Print all links on Google search using XPath and Selenium
  • Print all links on Google search using XPath and open-uri
  • Print all links on Google search using mix, match and open-uri
  • Print all links on Google search behind firewall
  • Nokogiri XML builder example
  • Nokogiri XML builder example that uses underscore for special tags
  • Nokogiri XML builder example to construct tag attributes

Step Definitions:

Expected Output:

  1. First scenario should print all links searched by CSS and Selenium
  2. Second scenario should print all links searched by CSS and open-uri (this one fails, though)
  3. Third scenario prints all links searched by XPath and Selenium
  4. Fourth scenario prints all links searched by XPath and open-uri
  5. Fifth scenario explains search() in Nokogiri
  6. Sixth scenario fixes the open-uri issue that occurred in the second scenario
  7. Seventh scenario constructs an XML document using Nokogiri
  8. Eighth scenario shows in depth how Nokogiri resolves name conflicts between Ruby methods and tag names
  9. Ninth scenario constructs an XML document and also adds attributes to XML nodes

Closing Thoughts:

This completes the test data formats we planned to cover as part of Test Data. As a refresher, we covered the Excel, JSON, YAML and XML data formats. Any of them can be used in your automation solution. If you are working on a maintenance project, you might already be locked into a certain format, which is why we covered them all. If you are working on a new project, it is advisable to go with JSON or XML, as those formats are easily exchanged between systems. If non-programmers deal with your test data, I would suggest maintaining the data in spreadsheets (OpenOffice, Microsoft Office et al.) and placing a parser between the spreadsheet and your program that converts the data to JSON/XML. That way, your programmers are relatively happy because their programs only have to deal with lightweight data interchange formats, and your business users can continue using the tools they have used for years.

As part of an Automation Solution, I would however be inclined towards JSON/XML.

Please send us your feedback and comments on this website by writing to us. Critical feedback is especially welcome.