Case Studies


Dow Jones Enriches Factiva Global News Service With Knowledge Graph

Written by Joaquin Melara (Lead Semantics, partner of LangOptima)

Commentary from Ezra Duong-Van (Dow Jones, Data and Analytics Solution Specialist)

KNOWLEDGE IS THE RESULT OF UNDERSTANDING
Dow Jones Factiva is a powerful business intelligence platform that delivers actionable insights to support strategic decision-making and to help users proactively identify and respond to opportunities and risks. The intelligence (facts) in the news, however, is still gleaned by human analysts who must read and understand the content, which remains a time-consuming and arduous task.

While Factiva includes powerful search techniques, such as ‘codes’ for subject, event, industry, region, language, and company, to narrow the set of qualifying articles for a given purpose, human readers must still make sense of the final, albeit reduced, set of articles, and that remains a bottleneck.

The collaborative effort between Dow Jones Factiva and Lead Semantics squarely targeted this ‘automated understanding of text’, which produces a knowledge graph extracted from the source news. The knowledge graph holds the extracted facts for querying, so that people no longer need to spend time reading and comprehending the news for targeted purposes.

Ezra Duong-Van (Dow Jones) on challenges with unstructured data:
“When you deal with unstructured data, you might look at aspects of the document and not the full text to see if you get a better understanding that way. There’s all these little tricks to look at unstructured data to figure out how to get an accurate answer out.

“We have a lot of customers who are trying to do this on their own. And they are really struggling with working with unstructured text, how news is reported, and how it affects the business world. That’s still a challenge today. And we’re not familiar with anyone that has the perfect, out of the box solution. There may be one in years to come, but there is none today.

“One of the biggest appeals of Lead Semantics is that they enable customers to take a large volume of articles and make sense of them. They offer a very good solution today that does this.”


LEAN APPROACH TO DEVELOPING SUCCESSFUL PILOTS
The ESG domain was chosen as the first focus area for the joint effort, in recognition of its growing importance and of the challenge traditional news monitoring platforms face in providing insights that integrate ESG factors.

As part of the effort, Lead Semantics’ TextDistil, a modern NLP and semantic technology platform, was trained to leverage Factiva codes and to ingest the results of Factiva Search, producing a knowledge graph focused on ESG and investment-related facts from the news articles.

The resulting graph of facts supports two high-value use cases: 1) instantly verifiable (augmented) semantic search, and 2) populating an enterprise knowledge graph with facts extracted from news articles for ongoing curation and high-value knowledge analytics.

Integrating TextDistil extends Dow Jones Factiva with knowledge-graph-based analytics and search, enabling deeper and more comprehensive coverage and analysis of the ESG impact of companies and industries.

Ezra Duong-Van (Dow Jones) on challenges with pilot programs using news data:
“Our biggest challenge when encountering customers who are looking to put together projects and use our news to do it is that their success rate is not very high. They do not have the funding, time, and in-house expertise to develop a proprietary solution which has already been solved by TextDistil.

“Having Lead Semantics as a partner when we encounter our customers who are going down this journey for a particular domain such as ESG would allow us to have a conversation around their needs. For example, they want to understand the ESG state of thousands of companies such as which are doing better over time, which are doing worse over time, and why.

“I know if they were trying to attempt to do that in-house, they would probably do a pilot. They probably would not get meaningful stuff out of it, then they would probably shelve the product. Introducing Lead Semantics into this process would greatly increase the probability of the pilot program being successful.”

“The solution empowers Factiva users to explore voluminous news articles and data through an easy-to-use TextDistil Search engine which programmatically builds relationships between thousands of concepts mentioned in huge volumes of articles in a matter of minutes and hours rather than days and weeks.”



SETTING OUR SIGHTS ON MODERNIZING NEWS SEARCH
Dow Jones and Lead Semantics embarked on a collaborative journey to develop a potent ESG knowledge graph integration with Factiva. This involved utilizing Lead Semantics’ flagship product, TextDistil.

TextDistil is a modern natural language processing (NLP) and semantic graph technology platform capable of extracting ESG-related data from unstructured sources such as news articles and corporate reports. The extracted data is structured into a knowledge graph, in which relationships between concepts such as companies, events, and ESG attributes are computed algorithmically.
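
To make the idea of a queryable graph of facts more concrete, here is a minimal Python sketch. It is not TextDistil's data model or API, which this case study does not describe: the `Fact` and `FactGraph` names, the predicates, and the sample data are all invented for illustration. Each fact carries the ID of the article it came from, which is one plausible way to keep a search result verifiable against its source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One extracted statement plus the article it came from (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    article_id: str  # provenance, so a reader can check the fact against the source


class FactGraph:
    """A tiny in-memory graph indexed by subject and by object."""

    def __init__(self) -> None:
        self._by_subject = defaultdict(list)
        self._by_object = defaultdict(list)

    def add(self, fact: Fact) -> None:
        self._by_subject[fact.subject].append(fact)
        self._by_object[fact.obj].append(fact)

    def about(self, subject: str, predicate: Optional[str] = None) -> List[Fact]:
        """Facts whose subject matches, optionally filtered by predicate."""
        facts = self._by_subject.get(subject, [])
        return [f for f in facts if predicate is None or f.predicate == predicate]

    def linked_to(self, obj: str) -> List[Fact]:
        """Facts that point at the given object (reverse lookup)."""
        return list(self._by_object.get(obj, []))


# Entirely made-up example data; no real company or article is implied.
graph = FactGraph()
graph.add(Fact("ExampleCorp", "involved_in", "chemical spill", "article-001"))
graph.add(Fact("ExampleCorp", "operates_in", "Mining", "article-002"))
graph.add(Fact("OtherCo", "involved_in", "chemical spill", "article-003"))

events = graph.about("ExampleCorp", "involved_in")
companies = sorted(f.subject for f in graph.linked_to("chemical spill"))
```

Here `events` holds the single event fact recorded for the made-up company, and `companies` lists every subject linked to the same event. A production graph would more likely live in a graph database or triple store; the in-memory dictionaries are only meant to show the shape of the data.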


ENHANCING FACTIVA WITH AN ESG KNOWLEDGE GRAPH
The integration introduced enhanced search functionalities, allowing users to query for ESG signals in Factiva news using ESG-related keywords, company names, and other specific criteria. The knowledge graph enabled the platform to display interconnected concepts for rich insights, showcasing how news events and ESG considerations intersect.

Ezra Duong-Van (Dow Jones) on Lead Semantics’s flagship product, TextDistil:
“Companies’ ESG scores and trends are now visualized in an interconnected browsable way, empowering users to make informed decisions based on timely comprehensive insights.”

“It’s not necessarily just the performance gains, but it’s something that’s repeatable. To have a human do this, they’re a very expensive human. They’re very highly trained. They’re a subject matter expert. They’re very good at doing research. But once I set up a system like this, it can constantly run, and so we can constantly monitor all of these things that are risks or opportunities to the business.

“It’s a much more scalable model. And the fact that a human can only read so much, you know, rather than reading 50 articles, we can cover 500 articles or 5000 articles or so on. So it’s being able to expand your ability to not only find a needle in a much bigger haystack, but it’s also to programmatically keep that running on a regular basis at a much lower cost.”


NEWSWORTHY OUTCOMES
The collaboration resulted in a transformative enhancement of Factiva, aligning it with the evolving landscape of ESG-driven decision-making. The solution empowered users to:

* Monitor and assess companies’ ESG performance in (near) real-time.
* Identify potential risks and opportunities tied to ESG events.
* Understand the correlation between news events and ESG impacts.
* Analyze industry trends and benchmarks in the context of ESG.
* Make data-driven decisions that integrate ESG considerations.

The integration of Lead Semantics’ ESG knowledge graph enriched Factiva’s capabilities, making it a crucial tool for professionals seeking both comprehensive news coverage and insights into ESG factors. It also bolsters Dow Jones’s leadership in business intelligence by responding to market demands and contributing to more informed, sustainable decision-making across industries.

The collaboration between Dow Jones and Lead Semantics stands as a testament to the power of innovative partnerships in addressing emerging market challenges. As a result, both companies are able to provide a valuable solution to users and contribute to advancing the understanding and integration of ESG considerations in the business landscape.

Ezra Duong-Van (Dow Jones) on TextDistil’s versatility:
“Lead Semantics’s flagship product, TextDistil, can be used for a lot of different things… It is not designed for one industry or one topic. By taking the news, we can distil it down to an industry or a topic and then Lead Semantics can then very quickly go through all of those articles and understand what the trends are that we’re looking for. Anything that’s captured in the news is a good candidate for this, and the use cases are all over the map.”


Watch the full webinar here: https://www.youtube.com/watch?v=iFppR7AMmaY

Workflow orchestrator

How a workflow orchestrator transformed manual translation and audio-to-text workflows by providing a single portal and a set of microservices that automatically classify documents and audio files into specific automated workflows, whilst semi-automatically updating Kanban-style process management.

I. Problem

Thousands of translation and audio-to-text submissions ran through the localization department each year using manual processes, which required a significant administrative investment. In addition, other departments, such as Editorial and Marketing, needed increasing quantities of transcription and translation services.

II. Solution

After a thorough review of available technology, the non-profit decided to design its own component-based language factory: a set of microservices in a recursive microservice architecture, tied together with the Blackbird.io workflow orchestrator as its backbone.

A folder portal (or “hot folder”) was created on Dropbox in which translation or audio-to-text submissions can be placed. Each submission is automatically classified by file type and file name and assigned to the appropriate semi-automated workflow. The file names are created with a simple-to-use file name builder in a Google Sheet and are automatically interpreted through regular expression (regex) classifications within Blackbird.io.
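
As an illustration of how a regex can turn a structured file name into a routing decision, here is a small Python sketch. The case study does not publish the actual naming convention or the Blackbird.io configuration, so the pattern, the `TR`/`AT` request codes, the domain codes, and the `classify` function are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical naming convention, invented for this sketch:
#   <request>_<source>[-<target>]_<domain>_<title>.<extension>
# e.g. "TR_en-fr_MKT_spring-newsletter.docx" or "AT_en_EDT_interview.mp3"
FILENAME_PATTERN = re.compile(
    r"^(?P<request>TR|AT)_"            # TR = translation, AT = audio-to-text
    r"(?P<source>[a-z]{2})"            # two-letter source language
    r"(?:-(?P<target>[a-z]{2}))?_"     # optional two-letter target language
    r"(?P<domain>[A-Z]{3})_"           # three-letter domain code
    r"(?P<title>[\w-]+)"               # free-form title
    r"\.(?P<ext>[A-Za-z0-9]+)$"        # file extension
)

AUDIO_EXTENSIONS = {"mp3", "wav", "m4a"}


def classify(filename: str) -> Optional[dict]:
    """Return routing details for a submission, or None if the name does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    is_audio = info["ext"].lower() in AUDIO_EXTENSIONS
    if info["request"] == "AT" and is_audio:
        info["workflow"] = "audio-to-text"
    elif info["request"] == "TR" and not is_audio and info["target"]:
        info["workflow"] = "translation"
    else:
        # File name and file type disagree; leave it for a person to sort out.
        info["workflow"] = "manual-review"
    return info


doc = classify("TR_en-fr_MKT_spring-newsletter.docx")
audio = classify("AT_en_EDT_interview.mp3")
```

Under these assumptions, `doc["workflow"]` is `"translation"` and `audio["workflow"]` is `"audio-to-text"`, while a name that does not follow the convention returns `None` so it can be rejected or flagged. Generating names with a builder, as described above, is what makes a strict pattern like this practical.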

As each file submission travels through the workflow, the steps are semi-automatically updated on Slack channels and Trello (Kanban style) through Blackbird.io. Trello has its own automations set up to assign individuals to cards, and remove them, at various steps. These product-based cards are templated in Trello, and the workflow automatically copies the correct template based on the file name of the original submission.

Aside from translation submissions, Blackbird also enabled a microservice to be built for audio-to-text using Transkriptor and OpenAI’s API with its Whisper feature. Through prompting and classifications, this microservice can transcribe audio and add paragraphs, timestamps, and speaker diarization.

Aside from an MT-only microservice, another TMS microservice classifies files for the TMS into four different domains in order to populate four different translation memories and glossaries.

Other microservices can automatically convert files to the correct format for each tool and can also convert output files to the desired final file formats.

Once a file submission has triggered a workflow, the workflow is set up to add useful information to the files, such as word counts, and to archive file versions to an archive folder automatically.

III. Experiences, Benefits, and Metrics

- Manual administrative tasks have been reduced by around 40 hours a week.
- Various digital tools have been, or are in the process of being, sunsetted and replaced by API-enabled equivalents.
- People outside of the localization department can submit translation and audio-to-text requests with ease.
- LangOps staff can easily analyse and adjust each step of the process.
- Third-party components of the workflow can easily be switched out or adjusted.
- Language assets can be used, independently of the TMS, for further LangOps workflows and model training.

Localization Support

Scaling High-Value Localization Support for a Global Automotive Client

Case Study | Client Solutions & Support
Lead: Mihai Petrescu (Solutions Architect)
Industry: Automotive
Focus: Turnaround time (TaT), workflow scalability, quality assurance, and client relationship management

Executive summary

A global automotive client experienced rapid localization volume growth and increasing operational complexity. LangOptima’s Client Solutions & Support function implemented a 3–6 month stabilization and scaling program that combined workflow redesign, automation, and quality systems. The result was a more predictable delivery engine, improved throughput on large-volume jobs, reduced operational bottlenecks (particularly around tooling and segmentation), and a stronger, more transparent operating cadence with the client.

The Challenge

The client’s localization program was growing quickly, with high-value potential but increasing execution risk. Delivery constraints were driven by:
- Turnaround time pressure against existing service commitments
- Workflow bottlenecks across assignment, file handoff, and pivot-language handling
- Tooling limitations (notably problematic segmentation in Lokalise for certain file types, and limited automation/propagation behavior)
- Quality risk at scale, especially during volume spikes
- Account risk without proactive communication, structured reporting, and escalation paths

The engagement needed to scale output without proportional increases in manual project management effort.

Objectives

LangOptima aligned the program around four measurable objectives:
- Stabilize turnaround-time performance and reduce SLA risk for high-volume delivery
- Increase throughput for large files and peak periods using parallelization and better resourcing
- Maintain or improve quality with scalable QA controls and consistent linguistic resources
- Strengthen the client operating rhythm via transparent reporting, business reviews, and escalation mechanisms

Solution overview

LangOptima delivered a combined operational + technical approach:
- Turnaround time optimization through baseline analysis, triage, escalation, and automated task orchestration
- Workflow re-architecture to bypass segmentation constraints and improve linguist productivity via an offline XLIFF pipeline
- Automation and resource strategy using assignment rules, reminders, and performance-based vendor pools
- Scalable QA using automated pre-QA screening and hybrid human validation
- Account development and stakeholder coordination with consistent KPIs, client advisories, and partner escalation

What LangOptima implemented

1) Turnaround time optimization (TaT)
LangOptima began with a performance baseline: actual delivery times were benchmarked against SLA commitments and client expectations, segmented by order type and language pair. This analysis identified the highest-risk lanes and clarified where delays accumulated (e.g., sourcing, handoffs, pivot processing, or tooling friction).
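
A baseline of this kind can be sketched in a few lines of Python. The records, lane names, and metrics below are invented for illustration and are not the client's data; the sketch only shows one way to group orders by order type and language pair and rank lanes by how often they miss the SLA.

```python
from collections import defaultdict

# Made-up records: (order_type, language_pair, actual_hours, sla_hours)
orders = [
    ("marketing", "it-de", 30.0, 24.0),
    ("marketing", "it-de", 20.0, 24.0),
    ("technical", "it-fr", 50.0, 48.0),
    ("technical", "it-fr", 60.0, 48.0),
    ("marketing", "it-es", 18.0, 24.0),
]


def baseline(records):
    """Group orders into lanes and report SLA miss rate and mean overrun."""
    lanes = defaultdict(list)
    for order_type, pair, actual, sla in records:
        lanes[(order_type, pair)].append(actual - sla)  # positive means late

    report = {}
    for lane, deltas in lanes.items():
        late = [d for d in deltas if d > 0]
        report[lane] = {
            "orders": len(deltas),
            "miss_rate": len(late) / len(deltas),
            "mean_overrun_hours": sum(late) / len(late) if late else 0.0,
        }
    return report


report = baseline(orders)
# Rank lanes so that the highest-risk ones come first.
ranked = sorted(report, key=lambda lane: report[lane]["miss_rate"], reverse=True)
```

With this sample data the technical Italian-to-French lane ranks first because both of its orders ran past the SLA. A real analysis would probably also break delay down by workflow stage to show where the time accumulates.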

Key interventions included:

- Parallelization for large files
Large-volume orders were split into parallel workstreams executed by multiple linguists. Consistency was maintained through robust guidelines and a designated lead linguist/reviewer responsible for harmonization before delivery.

- Direct source-to-target resourcing
To reduce delays caused by pivoting through English, the team prioritized sourcing and onboarding linguists who could translate directly from Italian into the required target languages—especially for the most frequently requested pairs.

- Dynamic linguist assignment via XTRF
Assignment rules were configured to match jobs to linguists based on availability, reliability, and performance signals, increasing speed-to-start and reducing manual coordination.

- Triage, escalation, and backup coverage
Urgent or high-value requests were routed through a triage model with defined escalation paths and pre-identified backup linguists to absorb demand spikes.

- Automated notifications and proactive late-risk alerts
Deadline reminders and late-risk signals were automated to enable intervention before delays impacted the client.

- Performance management of delayed resources
Recurrent lateness was addressed through targeted support/training or by adjusting the preferred pool to prioritize consistently reliable linguists.

Client communication was treated as part of delivery. For exceptionally large or complex projects, LangOptima aligned on realistic turnaround times and offered structured alternatives such as phased delivery and interim status updates.
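
The late-risk alerts listed among the interventions above could work along the following lines. This Python sketch is an assumption about the logic, not a description of how XTRF or LangOptima's automation actually computes risk: the `Job` fields, the throughput figure, and the four-hour buffer are hypothetical, and the projection ignores working hours for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    job_id: str
    words_remaining: int
    words_per_hour: float  # assumed linguist throughput
    deadline: datetime


def late_risk(job: Job, now: datetime, buffer_hours: float = 4.0) -> bool:
    """True when the projected finish falls inside the buffer before the deadline."""
    hours_needed = job.words_remaining / job.words_per_hour
    projected_finish = now + timedelta(hours=hours_needed)
    return projected_finish > job.deadline - timedelta(hours=buffer_hours)


now = datetime(2024, 1, 10, 9, 0)
jobs = [
    # Needs 6 hours of work with 9 hours until the deadline.
    Job("J-1", 3000, 500.0, datetime(2024, 1, 10, 18, 0)),
    # Needs 2 hours of work with 24 hours until the deadline.
    Job("J-2", 1000, 500.0, datetime(2024, 1, 11, 9, 0)),
]
at_risk = [job.job_id for job in jobs if late_risk(job, now)]
```

Only the first job is flagged: it would finish before its deadline, but with less than the four-hour buffer to spare, which is the point at which a project manager can still intervene.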

2) Workflow improvements: overcoming segmentation constraints
A persistent productivity drag came from Lokalise limitations for specific content types (e.g., poor segmentation patterns). To resolve this without compromising traceability or client requirements, LangOptima introduced an enhanced offline translation workflow:

- Offline XLIFF export from Lokalise
When segmentation issues were detected, content was exported as XLIFF from Lokalise (either full language exports or filtered keys), ensuring the exact scope was captured for more controlled processing.

- Segmentation optimization in Phrase
XLIFF files were imported into Phrase, where segmentation rules were tuned (including SRX-based approaches when needed) to produce linguist-friendly sentence and phrase boundaries. This improved translation memory leverage and reduced friction during post-editing.

- XTRF integration for project automation
The Phrase–XTRF integration was used to automate project creation, synchronize key details (client, deadlines, workflow steps, language pairs), and maintain financial/resource tracking inside XTRF—reducing manual PM overhead while improving visibility and reporting.

- Post-translation sync and reimport
After translation and review, finalized XLIFF files were exported and reimported into Lokalise (or delivered directly to the client where appropriate), preserving an end-to-end workflow that remained compatible with the client environment.
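
One practical safeguard in a round trip like this is to confirm, before reimport, that the finalized file still covers exactly the exported scope. The Python sketch below shows such a check under the assumption that the files are XLIFF 1.2; other XLIFF versions use a different namespace and element layout. The function names and sample strings are made up, and nothing here reflects the actual scripts used in the engagement.

```python
import xml.etree.ElementTree as ET

# Assumes XLIFF 1.2; other versions use a different namespace and layout.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def targets_by_id(xliff_text: str) -> dict:
    """Map each trans-unit id to its target text ('' when missing or empty)."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        text = "".join(target.itertext()).strip() if target is not None else ""
        result[unit.get("id")] = text
    return result


def reimport_problems(exported: str, finalized: str) -> dict:
    """Compare the exported scope with the finalized file before reimport."""
    before = targets_by_id(exported)
    after = targets_by_id(finalized)
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "untranslated": sorted(k for k in after if k in before and not after[k]),
    }


def sample(units: str) -> str:
    """Wrap trans-unit markup in a minimal XLIFF 1.2 document (test data only)."""
    return (
        '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
        '<file original="app" source-language="it" target-language="de" '
        'datatype="plaintext">'
        f"<body>{units}</body></file></xliff>"
    )


exported = sample(
    '<trans-unit id="welcome"><source>Benvenuto</source></trans-unit>'
    '<trans-unit id="start"><source>Avvia il motore</source></trans-unit>'
)
finalized = sample(
    '<trans-unit id="welcome"><source>Benvenuto</source>'
    "<target>Willkommen</target></trans-unit>"
    '<trans-unit id="start"><source>Avvia il motore</source>'
    "<target></target></trans-unit>"
)
problems = reimport_problems(exported, finalized)
```

For the sample above the check reports the `start` unit as untranslated and finds no missing or unexpected keys. Running a comparison like this before reimport catches dropped or empty segments while they are still cheap to fix.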

3) Quality at scale: automated screening + human validation
To maintain consistency while volumes increased, LangOptima made automated quality evaluation a standard part of the pipeline:

- Automated pre-QA screening (Auto LQA)
Jobs were screened prior to human post-editing to flag issues in accuracy, fluency, style, and terminology. Linguists then focused attention on high-severity segments rather than performing uniform manual checks across the entire file.

- Hybrid validation workflow
Automated findings were reviewed directly in the CAT workflow: linguists validated, edited, or dismissed flagged items. For large-volume work, Auto LQA was applied broadly, with human validation required only for segments below a defined quality threshold (e.g., under a target score).

- Post-delivery sampling and reporting
A percentage of delivered content was sampled for automated scoring to produce shareable scorecards. These were used to identify trends early and support transparent conversations with the client about quality and continuous improvement.
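
The threshold-based validation and post-delivery sampling described above can be sketched as follows. The 0-100 score scale, the threshold of 80, the 10% default sample, and both function names are assumptions made for illustration; the case study does not state the actual scoring scale or cut-off.

```python
import random


def route_segments(scores: dict, threshold: float = 80.0):
    """Split segment ids into those needing human validation and those auto-accepted."""
    needs_review = sorted(seg for seg, score in scores.items() if score < threshold)
    auto_accepted = sorted(seg for seg, score in scores.items() if score >= threshold)
    return needs_review, auto_accepted


def sample_for_scorecard(segment_ids, fraction: float = 0.1, seed: int = 0):
    """Pick a reproducible random sample of delivered segments (at least one)."""
    ids = sorted(segment_ids)
    size = max(1, round(len(ids) * fraction)) if ids else 0
    return random.Random(seed).sample(ids, size)


# Made-up scores on a hypothetical 0-100 scale.
scores = {"seg-1": 95.0, "seg-2": 62.5, "seg-3": 80.0, "seg-4": 71.0}
needs_review, auto_accepted = route_segments(scores)
sampled = sample_for_scorecard(scores, fraction=0.5)
```

In this example the two segments scoring below 80 go to a linguist and the other two are accepted automatically. Fixing the random seed keeps the scorecard sample reproducible, which helps when the same sample has to be discussed with the client later.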

4) Account development and stakeholder coordination
Operational improvements were paired with stronger stakeholder management:

- Structured client operating cadence
Regular business reviews were introduced to share performance data, TaT trends, and workflow insights—creating opportunities to propose efficiency improvements and value-added initiatives.

- Consultative guidance on content preparation
The team advised the client on source-file optimization and launch best practices to reduce downstream localization friction.

- Partner escalation and feedback loops
With technology partners (e.g., Lokalise), LangOptima ran consistent escalation routines for technical issues and feature requests, backed by clear business impact narratives.

- Internal alignment mechanisms
Daily check-ins and cross-functional coordination (Client Solutions + Language Talent teams) ensured priorities, risks, and capacity decisions stayed synchronized.

Risk management
The program included explicit mitigations for common scale risks:

- Volume spikes: expanded linguist pool, parallel workflows, rapid escalation coverage
- Quality drift: automated controls, sampling, performance tracking, and current linguistic assets (style guides, glossaries, preferences)
- Tooling bottlenecks: standardized offline processes and partner engagement
- Burnout/attrition: workload monitoring, rotation, and recognition practices
- Client trust: proactive transparency and clear escalation paths

Outcomes
Within the 3–6 month window, the client’s localization delivery model became more predictable and scalable through automation, workflow redesign, and quality instrumentation.

- Improved TaT consistency for high-risk order types and top language lanes
- Faster job start times and fewer manual handoffs due to automated assignment and alerts
- Higher linguist productivity on previously problematic file types via segmentation optimization
- Stronger quality governance through automated screening, threshold-based validation, and scorecard reporting
- Increased client confidence due to regular reviews, clearer reporting, and responsive escalation paths

Tools and systems
- Lokalise (source environment, exports/imports)
- Phrase (CAT + segmentation rules)
- XTRF (project automation, assignment, tracking)
- Auto LQA (pre-QA screening, sampling, scorecards)
