Dow Jones

Document Extraction To Assess Qualitative Impact Of Key Events

BFSI
case study

Business Impacts

  • 30+ years of unstructured news data synthesized
  • Reduced manual effort
  • 1.3 billion news articles processed

Customer Key Facts

  • Location : North America
  • Industry : News & Publishing

Problem Context

Dow Jones has a 30+ year archive of premium news articles that grows by an estimated 1 million articles each day. The organization wanted to provide scalable, flexible access to its 1.3-billion-document premium news repository, which is among the world’s largest, through its new cloud-based content processing and storage platform, Dow Jones DNA.

Challenges

 

  • Terabyte-scale, unstructured data corpus
  • Complex network effects are difficult to identify and visualize

Technologies Used

  • Google Cloud Platform
  • Google BigQuery
  • Python
  • Neo4j
  • spaCy

Enabling Research and Generating Insights From a Large Corpus of Premium News Content at a Greater Scale with Knowledge Graphs

Recognizing the need to showcase the depth and breadth of the DNA dataset, Dow Jones wanted a solution that could process large volumes of historical and streaming business news documents and surface hidden insights by transforming text into named entities (such as people, locations, monetary amounts, and events) and the relationships among them. Dow Jones found that these articles could serve as data points that inform evolving industry demand in portfolio management, sales, business development, risk target identification, and aggregation of deal opportunities, among others.
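As a rough illustration of how extracted entities can become relationships, the sketch below counts how often two entities appear in the same article and treats each count as a weighted edge. It is a minimal example under stated assumptions, not the method used in this engagement: the input is assumed to come from an upstream entity-recognition step (for example, spaCy), and the article data, entity names, and the `cooccurrence_edges` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: entities already extracted per article by an NER step.
# All names and labels below are invented for illustration.
articles = [
    [("Acme Corp", "ORG"), ("Tokyo", "GPE"), ("Jane Doe", "PERSON")],
    [("Acme Corp", "ORG"), ("Tokyo", "GPE")],
    [("Jane Doe", "PERSON"), ("Globex", "ORG")],
]


def cooccurrence_edges(docs):
    """Count how often each pair of distinct entities shares an article."""
    edges = Counter()
    for entities in docs:
        # Deduplicate within an article and sort so each pair has one key.
        names = sorted({name for name, _label in entities})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(articles)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b} (shared articles: {weight})")
```

Co-occurrence is only one simple way to propose candidate relationships; a production pipeline would likely also need entity disambiguation and typed relations.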

Solution

Quantiphi processed this terabyte-scale, unstructured data corpus and developed a Knowledge Graph framework to help data scientists and developers discover insights related to the network effects and business impacts of rare global events, such as a major natural disaster. Customers can also visualize other key events, hidden relationships, or unseen opportunities that could impact their business. The tool leverages Google Cloud Platform, the Dow Jones DNA (Data, News & Analytics) service, TensorFlow, and a graph database platform to perform text mining, machine learning, data integration, and enterprise advanced analytics.
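To make the idea of tracing network effects concrete, the sketch below walks a small in-memory graph outward from an event node and reports which entities are reachable within a given number of hops. It is a hedged illustration only: the graph contents, the `entities_within` helper, and the use of a plain dictionary in place of a graph database are all assumptions, not details of the delivered framework.

```python
from collections import deque

# Hypothetical knowledge graph as an adjacency map; all names are invented.
graph = {
    "Earthquake": ["Port City"],
    "Port City": ["Earthquake", "Shipping Co", "Chip Maker"],
    "Shipping Co": ["Port City", "Retailer"],
    "Chip Maker": ["Port City"],
    "Retailer": ["Shipping Co"],
}


def entities_within(graph, start, max_hops):
    """Breadth-first search returning {entity: hop distance} up to max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # Report only the entities reached, not the event itself.
    return distances


reach = entities_within(graph, "Earthquake", 2)
for entity, hops in sorted(reach.items(), key=lambda item: item[1]):
    print(f"{entity}: {hops} hop(s) from the event")
```

The hop limit keeps the result focused on entities plausibly exposed to the event; in a graph database the same question would typically be expressed as a bounded-length path query.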

Results

  • Time saved in manual text processing
  • Estimated impact of rare events
  • Intuitive visuals from text corpus
  • Defined complex network effects to uncover hidden relationships and insights
