What is Selenium GRID?

While doing cross-browser test automation, we are often overwhelmed by the number of browser combinations on which the application needs to be validated. Sure, we can install the browsers on our own box, but we are limited because we can generally have only *one* version of a given browser installed at a time, i.e. it is difficult to install and maintain different versions of Chrome (for example) on the same machine AND have the script talk to multiple Chrome versions.

Also, we know that one of the selling points of Selenium is that the same script can execute on any version of a supported browser, right?

So in short, Selenium GRID is a network of machines that have various versions of browsers installed, and the network is intelligent enough to delegate script execution to the Node that has the browser configuration requested by the script.

There are two major parts to using a Selenium GRID:

  • Set up a GRID (plan and have a strategy for which browsers and what setup are required)
  • Use the RemoteWebDriver API to communicate with the GRID

What are the benefits?

  • Instead of having the browser pop up on your box every time you execute, delegate the run to a set of machines; they execute the script and return the results to you
  • Be able to test on different browsers, versions and platforms (Windows and *nix)
  • Like the use case for Big Data, cheap hardware [old machines like those 1 GB Centrino boxes, for example] can be pooled into a Selenium GRID. This works because all we need is a working browser on the machine; we are not interested in the overall performance of the OS
  • Manage the browser infrastructure in a much more flexible and efficient way when updates to browsers [or new browsers] arrive in the market
  • Since you already follow the JSON over wire protocol (aka the WebDriver protocol), expanding to mobile automation tools like Appium, which speak the same protocol, feels like a breeze
  • Many more… (as you start using it)

Quick Explanation

Selenium GRID creates a network of HTTP servers in which there are two roles:

  1. Hub
  2. Node

The Hub is the orchestrator. It takes an incoming payload that contains a request for a certain browser configuration. If the grid has a node that matches the browser configuration requested in the payload, the hub establishes a session and the script commands execute on that browser from there on. If there is no match, the hub returns an error.

A Node registers with a hub, announcing the list of browsers it has. There are a number of configuration options, which we will cover in detail later. A node can run multiple browser instances.
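To make the two roles concrete, here is a minimal toy model of the matching described above. It is only a sketch and not how Selenium GRID is implemented internally: the `Node` and `Hub` classes, the node names and the browser versions are all hypothetical, and a real hub does much more (session tracking, timeouts, command forwarding).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a machine that registers the browsers it offers."""
    name: str
    # Each entry describes one browser, e.g. {"browserName": "chrome", ...}
    browsers: list = field(default_factory=list)


class Hub:
    """Toy orchestrator: remembers registered nodes and routes requests."""

    def __init__(self):
        self.nodes = []

    def register(self, node):
        self.nodes.append(node)

    def find_node(self, requested):
        """Return the first node offering a browser that satisfies every requested key."""
        for node in self.nodes:
            for offered in node.browsers:
                if all(offered.get(key) == value for key, value in requested.items()):
                    return node
        return None

    def new_session(self, requested):
        """Establish a (pretend) session, or raise an error if nothing matches."""
        node = self.find_node(requested)
        if node is None:
            raise LookupError(f"no node matches {requested}")
        return f"session on {node.name}"


# Illustrative setup: names and versions are made up.
hub = Hub()
hub.register(Node("node-a", [{"browserName": "chrome", "browserVersion": "120", "platformName": "linux"}]))
hub.register(Node("node-b", [{"browserName": "firefox", "browserVersion": "121", "platformName": "windows"}]))
print(hub.new_session({"browserName": "firefox"}))
```

A request that only names the browser matches any node offering it; adding a version or platform to the request narrows the match, and a request no node can satisfy raises the error.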

Architecture and WorkFlow

Local/WebDriver Execution

Whenever we execute a Selenium script on a local box, the general flow is as described below. This flow represents local execution, in other words plain WebDriver execution.

  • Our code (Ruby, Java, C#, whatever) communicates with Selenium WebDriver, that is, the browser-specific driver that acts as an HTTP server (a proxy between our code and the browser). Ex: chromedriver.exe, IEDriverServer.exe, SafariDriver etc.
  • A session is created, and from then on the communication is HTTP requests and responses. A request asks the browser to click, set text etc. A response carries the result of each command: the text that was read, the outcome of the execution and so on
  • The difference between Selenium RC [Selenium 1.0 and earlier] and WebDriver is that WebDriver makes native browser calls, whereas RC injects JavaScript into the browser, which then gets executed there.
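The request/response flow above can be sketched as a list of HTTP calls. The block below only builds the (method, path, body) triples for a simple open-find-click-read flow and prints them; it does not talk to a real driver. The session id, element id, URL and selector are placeholders: in a real run the driver returns the session id when the session is created and the element id in the response to the find-element call.

```python
import json


def webdriver_commands(session_id, url, css_selector, element_id):
    """Return (method, path, body) triples for a simple browser interaction."""
    base = f"/session/{session_id}"
    return [
        # Load the application under test in the browser.
        ("POST", f"{base}/url", {"url": url}),
        # Locate an element; the response would carry its id.
        ("POST", f"{base}/element", {"using": "css selector", "value": css_selector}),
        # Click the element that was found.
        ("POST", f"{base}/element/{element_id}/click", {}),
        # Read the element's visible text back.
        ("GET", f"{base}/element/{element_id}/text", None),
        # End the session, which closes the browser.
        ("DELETE", base, None),
    ]


# Placeholder values, for illustration only.
for method, path, body in webdriver_commands("SESSION_ID", "https://example.com", "#submit", "ELEMENT_ID"):
    payload = "" if body is None else json.dumps(body)
    print(method, path, payload)
```

Every Selenium language binding ends up producing calls of this shape, which is why the same script logic can be written in Ruby, Java, C# or Python.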

[Figure: Selenium WebDriver architecture]

A little more detail

A slightly more detailed diagram explains the HTTP calls that happen between the code, WebDriver and the browser. As you can see, once a session is established, the AUT (application under test), i.e. the URL of the application, is loaded into the browser. After the session is created, communication goes back and forth along the same path, as shown below.

[Figure: selenium-rc_architecture]

RemoteWebDriver / GRID Execution

As we mentioned at the beginning, one task is to set up a GRID and another is to use the RemoteWebDriver API to consume the capabilities offered by the grid.

The architecture diagram below assumes that you already have a GRID set up with the browser combinations shown on each Node. Without going into too many details, here are the basic steps to be able to work with a GRID:

  1. Decide which browser versions are required for your cross-browser testing. Have a matrix with the parameters [browser, version, platform]. We will talk more about this as we move through this tutorial
  2. Install the browsers on different machines [GRID nodes]
  3. Have the WebDrivers (chromedriver.exe, IEDriverServer.exe…) available in the environment PATH
  4. Start the Selenium GRID HUB on a machine
  5. Start all GRID nodes [Step 2] and register them with the HUB
  6. At this point, we should have a Selenium GRID up and running
  7. In the automation code, use the RemoteWebDriver API and set the DesiredCapabilities object values [you can use any supported language: Ruby, Java, C#…]. The RemoteWebDriver API takes a URL that points to the GRID HUB
  8. That is it! Once you execute, the execution starts on the node that matches the requested browser
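Steps 7 and 8 can be sketched in Python as follows. This is a minimal sketch under several assumptions: the `selenium` package is installed, a hub is already running, and the host name `grid-hub.example.internal` is a made-up placeholder. Port 4444 is the hub's usual default, and the `/wd/hub` path is the classic hub endpoint; your setup may differ. Note that newer releases of the Python bindings take a browser options object where older ones took a DesiredCapabilities object.

```python
def hub_url(host, port=4444):
    """Build the URL that RemoteWebDriver uses to reach the GRID HUB."""
    return f"http://{host}:{port}/wd/hub"


def open_remote_chrome(host, port=4444):
    """Ask the hub for a Chrome session and return the remote driver."""
    # Imported here so this file still loads where selenium is not installed.
    from selenium import webdriver

    # The options object carries the requested browser configuration.
    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=hub_url(host, port), options=options)


# Usage (needs a running hub, so it is left commented out):
#
# driver = open_remote_chrome("grid-hub.example.internal")
# try:
#     driver.get("https://example.com")
#     print(driver.title)
# finally:
#     driver.quit()
```

Apart from how the driver is created, the rest of the script is the same as for local execution; the hub decides which node actually runs the browser.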

We will do this step by step as we move through this tutorial, but first let’s look at the overall architecture diagram to see how the workflow happens. Some jargon first.

  • JSON over wire: Each JSON message carries all the information the server [Selenium GRID or a WebDriver component] needs, so client and server exchange plain JSON messages over HTTP instead of using RPC. Also referred to as the WebDriver protocol
  • Desired Capabilities: Object in the automation code that holds the requested browser configuration
  • RemoteWebDriver: Object in the automation code that knows how to communicate remotely with WebDriver / Selenium GRID
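As an illustration of how these terms fit together, the sketch below turns rows of a [browser, version, platform] matrix into the JSON body of a new-session request. It assumes the W3C WebDriver shape (`capabilities` / `alwaysMatch` with `browserName`, `browserVersion`, `platformName`); the older JSON wire protocol carried similar information under a `desiredCapabilities` key. The matrix entries and the `new_session_payload` helper are hypothetical.

```python
import json


def new_session_payload(browser, version=None, platform=None):
    """Build the JSON body a client would POST to /session."""
    always_match = {"browserName": browser}
    # Version and platform are optional; leave them out to accept any.
    if version:
        always_match["browserVersion"] = version
    if platform:
        always_match["platformName"] = platform
    return {"capabilities": {"alwaysMatch": always_match}}


# Made-up test matrix: (browser, version, platform)
MATRIX = [
    ("chrome", "120", "linux"),
    ("firefox", "121", "windows"),
]

for browser, version, platform in MATRIX:
    print(json.dumps(new_session_payload(browser, version, platform)))
```

The language bindings build this payload for you from the capabilities or options object, so in day-to-day scripts you rarely write it by hand.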

[Figure: Overall GRID architecture]

Summary

We tried to convey as much information as possible while keeping the concepts simple. However, please bear in mind that the value of Selenium GRID is fully realized on a network, meaning the value add comes when we set up a swarm of machines that have different kinds of browsers. Of course we can execute our scripts by starting a GRID on a local machine, but in that case we could just as well execute using WebDriver directly. Using a RemoteWebDriver (aka GRID) would be overkill.

Also bear in mind that, since Selenium GRID runs over a network, we might have to work with network teams in corporate environments to check whether specific ports are blocked, and NOT assume that because it works locally it will work on the network.
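A quick way to check for a blocked port before involving the network team is a plain TCP connection attempt from the machine that runs the scripts. The sketch below uses only the Python standard library; the host name is a placeholder and 4444 is assumed to be the hub port. A successful connection only shows that the port is reachable, not that the hub is healthy.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage (placeholder host name):
#
# if not port_is_reachable("grid-hub.example.internal", 4444):
#     print("Hub port is not reachable from this machine")
```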

One last thought. Setting up a Selenium GRID and managing it [creating logs, health monitors etc.] is a significant, time-consuming task. Hence, if you are in a corporate environment and do NOT want to manage it on-premise, you have other options like Sauce Labs, which of course come at a cost. So please do a cost-benefit analysis before jumping into implementation right away.

My advice would be to start with Selenium GRID: have a couple of machines spun up into a GRID, run with it for a while and, based on how your cross-browser version requirements grow, decide whether you would like to use cloud services like Sauce Labs. Many start-ups and small/medium businesses set up their own GRID in-house, because it is easy to set up and you can tear it down and change the configuration [there are many configuration options] as you mature with it.