Data Driven with Excel and/or YAML


Following on from our discussion of frameworks in the previous section, we read about the data-driven framework. In fact, we also mentioned that data-driven is a capability of a framework, rather than an entire solution on its own.

In this section, we are going to take a concrete example of an e-commerce website and see how we can implement this capability. The following are the pre-requisite reads for this section:

Use Case:

As part of my framework, I need the following features:

  • I want to be able to specify my data format – Excel, YAML, XML, JSON
  • I want to be able to specify the file that contains my data
  • I want to be able to “change” the data file at execution time without too much code change
  • If I don’t specify my data file, there should be a default data file.

As we can see above, my test automation suite/test cases should be able to use the data in an efficient and less painful way.

We have already seen how to parse different data formats; here we will see how to externalize data files and how to specify test data input with minimal code changes.

Object Repository:

We hear this term a lot in the automation space, and QTP has made it very popular. An object repository, as the name suggests, is a repository that contains the objects we want to interact with. With the advent of Selenium and more mature tools that are open source and configurable/flexible, we can create an object repository with one single line in some cases. Anyways, for this framework, I would like to externalize the locator (HTML locator) information instead of hard-coding it in the step definitions. Since I already know an extremely easy format in Ruby that can maintain key/value pairs, i.e. YAML, I will go ahead and create a yml file that contains my locator information.

We have already seen how to parse a YAML file into a hash data structure in Test Data with YML with a single line, and I am going to do the same here.

First, I would create a yml file in the ‘lib/config’ folder and name it ‘object_repository.yml’. We will see how to populate its contents in the next section based on the workflow; for now, the way we would load it up and set it in context is through env.rb as follows:
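A minimal sketch of what the env.rb hook could look like. The file path matches the one named above; the constant name OBJECT_REPO (and the sample locator keys) are my own illustrative choices, not from the original post. The snippet writes a tiny sample file first so it is self-contained; in the real project the yml file already exists.

```ruby
require 'yaml'
require 'fileutils'

# Create a sample object_repository.yml for the demo only -- in the real
# project this file already lives in lib/config.
FileUtils.mkdir_p('lib/config')
File.write('lib/config/object_repository.yml', <<~YAML)
  login_page:
    username_field: "id=username"
    password_field: "id=password"
YAML

# env.rb equivalent: one line to load the repository into a hash.
OBJECT_REPO = YAML.load_file('lib/config/object_repository.yml')

OBJECT_REPO['login_page']['username_field']  # => "id=username"
```

Any step definition can now look up a locator by key instead of hard-coding it.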

Data Helper:

A data helper module lets me do the following:

  • Return a data structure (hash) that contains my data in a file
  • Has a default data directory
  • Has a default data file
  • I can override my default directory and default file if needed

Below is a walkthrough of a data helper that works for Excel, YAML, JSON and XML. In this section we will talk about Excel; in subsequent sections, we will cover the other formats.


Class methods

We define :data_excel (the colon means this is a Ruby symbol; for now, just think of it as representing a variable), which is the data structure that gets returned to the calling code. The data structure is a hash. The attr_accessor means it can be ‘get’ and ‘set’; we don’t have to write the getter and setter methods as in Java. The rest of the variables represent the other data formats, which we will talk about in subsequent sections.

Since we wrapped the attr_accessor methods under a class, we can access those methods without an instance of the class in which this code was defined [think of this as the Java “static” keyword for now]. Also, when we “include” this module, the class methods will not get mixed in [that’s how Ruby behaves]: include only mixes in instance methods, not class methods. [If we needed something to happen when a module gets included into a class, we would define self.included. Anyways, there are so many patterns in Ruby; there is so much to learn.] For now, what we need out of this data helper is to return a hash representing the data, so let’s just focus on that.
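A sketch of the class-level accessors described above. Only :data_excel is named in the text; the other symbol names (:data_yml, :data_json, :data_xml) are my guesses for the remaining formats.

```ruby
# DataHelper (sketch) -- wrapping attr_accessor in `class << self` defines the
# getters/setters on the module itself (like Java "static"), so no instance is
# needed, and `include DataHelper` will not mix these methods in.
module DataHelper
  class << self
    attr_accessor :data_excel, :data_yml, :data_json, :data_xml
  end
end

# Setter and getter work directly on the module, no instance required:
DataHelper.data_excel = { 'name' => 'John' }
DataHelper.data_excel  # => {"name"=>"John"}
```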

Set Default Data Directory

We set a default data directory with this method, just in case we want to go with defaults when nothing has been set. The ||= operator (“or-equals”) is just the Ruby way of saying: if the variable @data_directory is already set, use it; otherwise set it to ‘lib/config/data’.
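The method could look like the sketch below; the method name data_default_directory is taken from the text, while the exact module layout is assumed.

```ruby
module DataHelper
  # ||= : if @data_directory is already set, keep it; otherwise use the default.
  def self.data_default_directory
    @data_directory ||= 'lib/config/data'
  end
end

DataHelper.data_default_directory  # => "lib/config/data"
```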

Set Data Directory

If we explicitly need to set the data directory, we call this method. As you can see, once this method has been called and @data_directory is set, data_default_directory will use the value set by the user.
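A sketch of the override in action. The method name set_data_directory comes from the text below; the directory I pass in is just an example.

```ruby
module DataHelper
  # Explicit override: remember the user's choice in @data_directory.
  def self.set_data_directory(dir)
    @data_directory = dir
  end

  # Because of ||=, the user's value (if any) wins over the default.
  def self.data_default_directory
    @data_directory ||= 'lib/config/data'
  end
end

DataHelper.set_data_directory('spec/fixtures/data')
DataHelper.data_default_directory  # => "spec/fixtures/data"
```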

Specify Data file:

This method returns the Excel file that this module is supposed to use. We read the file name from the environment variable ‘DATA_EXCEL_FILE’. If the environment variable is NOT set, we use the ‘default.xlsx’ file that is present in the default data directory as set above. Remember the use case we talked about above: we let the user specify a data file, but if none is specified, we have default values.
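A sketch of that lookup. The environment variable name and ‘default.xlsx’ are from the text; the method name data_excel_file is my assumption.

```ruby
module DataHelper
  def self.data_default_directory
    @data_directory ||= 'lib/config/data'
  end

  # Use DATA_EXCEL_FILE if the user set it; otherwise fall back to
  # default.xlsx inside the default data directory.
  def self.data_excel_file
    ENV['DATA_EXCEL_FILE'] || File.join(data_default_directory, 'default.xlsx')
  end
end

DataHelper.data_excel_file                     # no env var: lib/config/data/default.xlsx
ENV['DATA_EXCEL_FILE'] = 'overrides/smoke.xlsx'
DataHelper.data_excel_file                     # => "overrides/smoke.xlsx"
```

This is exactly the use-case requirement: changing the data file at execution time is just `DATA_EXCEL_FILE=... cucumber`, with zero code change.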

Parse data file and return data:

This method contains the core logic to parse the Excel file set up by the previous methods. If you read the post Test Data with excel, the section where we get a table hash by specifying the column headers, you will already understand most of the code here. The logic is below: we pass the data sheet name and the row number of the data set we want. For example, if we need the Excel row data present on row number “2”, we pass 2 [yes, this is the same number that shows up in Excel, so there is no need to add or subtract values for index differences].

As usual, we get a workbook object, then the worksheet object. We assume that the header row (top row) in the Excel workbook contains the column names and is NOT a data set. So we read the header row and convert it into a string array. This string array is passed to the method “get_table”, which ultimately returns an array of hashes, each hash having column names as keys and the actual cell values as values.
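The real helper gets its rows from a workbook/worksheet object via an Excel library (as in the Test Data with excel post). To show just the header-row-to-hash transformation without depending on that gem, here is a dependency-free sketch of what get_table does, with plain arrays standing in for worksheet rows; the real helper also maps the on-sheet row number you pass in to the right entry internally.

```ruby
# Core of get_table (sketch): given the header row and the data rows, build an
# array of hashes keyed by column name -- one hash per data row.
def get_table(header_row, data_rows)
  data_rows.map { |row| header_row.zip(row).to_h }
end

headers = ['username', 'password', 'expected_message']
rows    = [['john', 'secret1', 'Welcome John'],
           ['jane', 'secret2', 'Welcome Jane']]

table = get_table(headers, rows)
table[0]['username']          # => "john"
table[1]['expected_message']  # => "Welcome Jane"
```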

Similarly, we load the YAML file and retrieve the hash (isn’t that so easy compared to Excel? ;))
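For comparison, the YAML side really is a one-liner. The fixture contents here are my own example; the snippet writes the file first so it is self-contained.

```ruby
require 'yaml'

# Write a small YAML fixture for the demo (in the real project this would be
# default.yml inside the data directory).
File.write('default.yml', "username: john\npassword: secret1\n")

# The entire "parse" step: one line, straight to a hash.
data_yml = YAML.load_file('default.yml')
data_yml['username']  # => "john"
```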

Where are my data files?

As we defined above, our default data directory is ‘lib/config/data’; however, if you wish to override it, just call the method set_data_directory and pass in the directory you wish to use. I have set up my data files as per the structure below.


We will populate the data files in next pages of this series when we talk about the concrete workflow we would like to automate using data driven approach.


Closing Thoughts:

So in this post we set the stage, and hence we talked about:


  1. Object repository (nothing but returning a hash from a YAML file that contains HTML locator information), a way for us to externalize HTML locator information
  2. Data requirements through a use-case model
  3. A data helper that externalizes data, parses it, and is configurable at run time
  4. Location of data files (configurable)