Localization Testing: Selenium, NLP & Machine Learning

For automated localization testing, we generally read the expected strings from a resource file and compare them with the text extracted from the website using Selenium or a similar tool.
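For reference, the traditional comparison could be sketched roughly as follows. This is only an illustration: it assumes the resource file is a Java .properties file and that the page texts have already been extracted (for example with Selenium). The names ResourceStringCheck, parse, and missingOnPage are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating the resource-file comparison approach.
class ResourceStringCheck {

    // Parses .properties content (key=translated string) into a Properties object.
    static Properties parse(String propertiesContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the resource keys whose translated value does not appear
    // among the texts extracted from the page.
    static Set<String> missingOnPage(Properties expected, Collection<String> pageTexts) {
        Set<String> normalized = new HashSet<>();
        for (String text : pageTexts) {
            normalized.add(text.trim());
        }
        Set<String> missing = new TreeSet<>();
        for (String key : expected.stringPropertyNames()) {
            if (!normalized.contains(expected.getProperty(key).trim())) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Every key returned by missingOnPage is a string that was expected on the page but not found, which is exactly the kind of per-string bookkeeping that becomes costly as pages and languages multiply.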

The problems with this approach are:

  • Writing XPath expressions or other locators to read individual strings from the web page is cumbersome.
  • Every time the content of the page changes, the locators need to be updated.
  • Loading the resource files for each language is time-consuming.

As a solution, we can use machine learning and natural language processing, harnessing a powerful language detection library: Lingua.

Quick Info on Lingua

  • This library tries to solve language detection of very short words and phrases using machine learning and natural language processing
  • Makes use of both statistical and rule-based approaches
  • Detects more than 70 languages
  • Works within every Java 6+ application and on Android
  • No additional training of language models necessary
  • Offline usage without having to connect to an external service or API

The solution is:

Keep all the target URLs in a properties file and read them one by one.

// Needs java.io.File, java.io.FileReader, java.util.Map and java.util.Properties.
// driver and lang are defined elsewhere in the test.
Properties p = new Properties();
try (FileReader reader = new FileReader(new File(System.getProperty("user.dir"), "links.properties"))) {
    p.load(reader); // without this call the Properties object stays empty
}
int page = 1;
for (Map.Entry<Object, Object> entry : p.entrySet()) {
    System.out.println("Page No: " + page++);
    L10nSteps localizationSteps = new L10nSteps();
    // Each property key is a page path appended to the localized base URL
    localizationSteps.webPageCheck(driver, "https://www.att.com/" + lang + entry.getKey(), lang);
}
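The URL assembly can also be isolated and tested on its own. The sketch below is an illustration rather than the original project's code: PageUrlBuilder and buildUrls are hypothetical names, and it assumes each property key is a path beginning with "/" while the base URL ends with "/".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: turns the keys of links.properties into full page URLs.
class PageUrlBuilder {

    // baseUrl is expected to end with "/", lang is a locale segment such as "pt-br",
    // and each property key is a path starting with "/".
    static List<String> buildUrls(Properties links, String baseUrl, String lang) {
        List<String> urls = new ArrayList<>();
        // Properties is hash-based, so sort the keys to get a stable page order.
        for (String path : new TreeSet<>(links.stringPropertyNames())) {
            urls.add(baseUrl + lang + path);
        }
        return urls;
    }
}
```

Sorting the keys is a design choice, not something the loop above requires: it keeps the "Page No" numbering the same from one run to the next, which makes the reports easier to compare.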

Extract all the text from the page using a generic XPath expression.

List<WebElement> allTextElement = driver
        .findElements(By.xpath("//*[string-length(normalize-space(text())) > 0]"));
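To see which elements this expression selects, it can be tried outside the browser. The sketch below uses the JDK's built-in javax.xml.xpath on a small XML snippet instead of Selenium, which is an assumption made only to keep the example self-contained. XPathTextDemo and matchingTags are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: evaluates the generic text XPath against an XML string.
class XPathTextDemo {
    static final String TEXT_XPATH = "//*[string-length(normalize-space(text())) > 0]";

    // Returns the tag names of all elements matched by TEXT_XPATH.
    static List<String> matchingTags(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(TEXT_XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> tags = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                tags.add(nodes.item(i).getNodeName());
            }
            return tags;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }
}
```

One thing to keep in mind: in XPath 1.0, normalize-space(text()) only looks at the first text node child of each element. A container whose first text child is just whitespace (typical for an indented div) is therefore skipped, while its text-bearing children are matched individually.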

Use Lingua to check the translated text.

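// Needs the Lingua classes Language, LanguageDetector and LanguageDetectorBuilder
// from com.github.pemistahl.lingua.api.
// Assumption: the builder call is cut off in the source; fromAllLanguages() is one valid way to complete it.
static final LanguageDetector detector = LanguageDetectorBuilder.fromAllLanguages().build();

// Returns true when the text is detected as English, i.e. it is probably untranslated.
public boolean localCheck(String tempStr) {
    Language detectedLanguage = detector.detectLanguageOf(tempStr);
    return detectedLanguage == Language.ENGLISH;
}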
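Applied to a whole page, the check becomes a filter over the extracted texts. In the minimal sketch below, the detector is passed in as a Predicate so the logic can run without Lingua on the classpath; in practice the predicate would delegate to something like localCheck. UntranslatedTextFilter and its methods are hypothetical names, and skipping strings without letters is an added assumption, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: collects page texts that still look like English.
class UntranslatedTextFilter {

    // Returns true when the text has at least one letter; pure numbers,
    // prices and punctuation carry no language signal worth checking.
    static boolean hasLetters(String text) {
        return text.codePoints().anyMatch(Character::isLetter);
    }

    // looksEnglish stands in for a detector call such as localCheck.
    static List<String> findUntranslated(List<String> pageTexts, Predicate<String> looksEnglish) {
        List<String> untranslated = new ArrayList<>();
        for (String raw : pageTexts) {
            String text = raw.trim();
            if (hasLetters(text) && looksEnglish.test(text)) {
                untranslated.add(text);
            }
        }
        return untranslated;
    }
}
```

With a detector instance available, a call might look like findUntranslated(texts, this::localCheck). Very short strings can still be misclassified by any detector, so the resulting list is a set of candidates for review rather than a list of confirmed defects.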

On each run, a JSON file is created listing the untranslated text found on the page. Using the XPath, one can quickly navigate to the element and analyze whether the untranslated text is expected. If it is expected, mark its valid flag as false, create a folder named input_pt-br (input_<lang>, in the project root), and keep the file there. From the second run onward, those expected texts are ignored.
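The XPath recorded for each finding is the generic expression wrapped in parentheses and followed by a position, for example (//*[...])[437]. The sketch below shows, under assumptions, how such an expression can be built and resolved. It uses the JDK's javax.xml.xpath on an XML string rather than a live browser, and IndexedXPath, forPosition, and textAt are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: builds and resolves position-based XPath expressions.
class IndexedXPath {
    static final String TEXT_XPATH = "//*[string-length(normalize-space(text())) > 0]";

    // Builds "(<generic expression>)[position]"; XPath positions are 1-based.
    static String forPosition(int position) {
        if (position < 1) {
            throw new IllegalArgumentException("XPath positions start at 1");
        }
        return "(" + TEXT_XPATH + ")[" + position + "]";
    }

    // Returns the text of the element at the given position in document order.
    static String textAt(String xml, int position) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(forPosition(position), new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

The parentheses matter: without them, the [437] predicate would apply to each element's position among its siblings rather than to the whole result set. Pasting the parenthesized expression into the browser's developer tools jumps straight to the reported element.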

{
  "https://www.lumen.com/pt-br/about/4th-industrial-revolution.html": [
    {
      "valid": "Y",
      "xpath": "(//*[string-length(normalize-space(text())) > 0])[437]",
      "index": 437,
      "value": "Video Player is loading."
    }
  ]
}
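Ignoring reviewed entries on later runs comes down to filtering new findings against the accepted ones. The following is a minimal sketch under assumptions: Finding and ExpectedTextFilter are hypothetical names, reading the reviewed JSON file from the input_<lang> folder is left out, and matching on the text value instead of the index is a guess at a reasonable rule, not the original implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of one report entry: position on the page plus its text.
class Finding {
    final int index;
    final String value;

    Finding(int index, String value) {
        this.index = index;
        this.value = value;
    }
}

// Hypothetical helper: removes findings that were already reviewed as expected.
class ExpectedTextFilter {

    // Keeps only findings whose text is not in the reviewed "expected" set.
    // Matching on the text rather than the index is an assumption: it keeps
    // the ignore list usable when elements shift position on the page.
    static List<Finding> withoutExpected(List<Finding> findings, Set<String> expectedTexts) {
        List<Finding> remaining = new ArrayList<>();
        for (Finding finding : findings) {
            if (!expectedTexts.contains(finding.value)) {
                remaining.add(finding);
            }
        }
        return remaining;
    }
}
```

Only the findings that survive this filter would be written to the next report, so each run surfaces new untranslated text instead of repeating strings that were already accepted.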

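Adding Lingua with Maven

Lingua is published on Maven Central under the group ID com.github.pemistahl and the artifact ID lingua; add it as a dependency in pom.xml.
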
You can refer to the solutions here. As always, if you get stuck, ask for help.

About the author: Sumit has worked in QA for 15 years and is passionate about test automation, service virtualization, web services, DevOps, and ETL/BI testing. He currently works for a leading telecom company, CenturyLink, as an offshore Automation COE lead. Please connect with him on LinkedIn.




Test Automation Enthusiastic.
