Business Impacts
- 30+ years of unstructured news data synthesized
- Reduced manual effort
- 1.3 billion news articles processed
Customer Key Facts
- Location: North America
- Industry: News & Publishing
Problem Context
Dow Jones has a 30+ year archive of premium news articles that continues to grow by an estimated 1 million incoming articles each day. The organization wanted to provide scalable, flexible access to its repository of 1.3 billion premium news documents, which is among the world's largest, through its new cloud-based content processing and storage platform, Dow Jones DNA.
Challenges
- Terabyte-scale, unstructured data corpus
- Complex network effects are difficult to identify and visualize
Technologies Used
Google Cloud Platform
Google BigQuery
Python
Neo4j
spaCy
Enabling Research and Generating Insights From a Large Corpus of Premium News Content at a Greater Scale with Knowledge Graphs
Recognizing the need to showcase the depth and breadth of the DNA dataset, Dow Jones wanted a solution that could process large volumes of historical and streaming business news documents and find hidden insights by transforming text into named entities (e.g., people, locations, money, and events) and the relationships among them. Dow Jones found that these articles could serve as data points that inform evolving industry demand in portfolio management, sales, business development, risk target identification, and aggregation of deal opportunities, among other areas.
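To make the idea of turning entities into relationships concrete, here is a minimal Python sketch. It assumes each article has already been reduced to a list of (entity, label) pairs, for example by a named-entity recognizer such as spaCy, and links entities that appear in the same article. The sample data and the `cooccurrence_edges` helper are invented for illustration and are not taken from the Dow Jones or Quantiphi implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one list of (entity text, entity label) pairs per article.
# In practice these would come from a named-entity recognition step.
articles = [
    [("Acme Corp", "ORG"), ("Jane Doe", "PERSON"), ("Houston", "GPE")],
    [("Acme Corp", "ORG"), ("Houston", "GPE")],
    [("Jane Doe", "PERSON"), ("Globex", "ORG")],
]


def cooccurrence_edges(articles):
    """Count how often each pair of entities appears in the same article."""
    edges = Counter()
    for entities in articles:
        # Deduplicate within an article and sort so each pair has one canonical order.
        names = sorted({name for name, _label in entities})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(articles)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Co-occurrence is only one simple way to approximate a relationship; the weighted pairs it produces could then be loaded into a graph database as edges between entity nodes.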
Solution
Quantiphi processed Dow Jones's terabyte-scale, unstructured data corpus and developed a Knowledge Graph framework to help data scientists and developers discover insights related to the network effects and business impacts of rare global events, such as a major natural disaster. Customers can also visualize other key events, hidden relationships, or unseen opportunities that could impact their business. The tool leverages Google Cloud Platform, the Dow Jones DNA (Data, News & Analytics) service, TensorFlow, and a graph database platform to perform text mining, machine learning, data integration, and enterprise advanced analytics.
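As a rough illustration of how network effects of a rare event might be explored in a knowledge graph, the Python sketch below runs a breadth-first search from an event node and reports every entity reachable within a fixed number of hops. The graph, the entity names, and the `entities_within` helper are all hypothetical; the actual solution runs on a graph database platform, whose query language would normally replace this hand-written traversal.

```python
from collections import deque

# Hypothetical knowledge graph as an adjacency list; all names are invented.
graph = {
    "Hurricane X": ["Port of Houston", "Acme Refining"],
    "Port of Houston": ["Hurricane X", "Globex Shipping"],
    "Acme Refining": ["Hurricane X", "Globex Shipping"],
    "Globex Shipping": ["Port of Houston", "Acme Refining", "Initech Retail"],
    "Initech Retail": ["Globex Shipping"],
}


def entities_within(graph, start, max_hops):
    """Breadth-first search returning {entity: hop distance} for nodes within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # report only the affected entities, not the event itself
    return distances


impact = entities_within(graph, "Hurricane X", 2)
for entity, hops in sorted(impact.items(), key=lambda item: item[1]):
    print(f"{entity}: {hops} hop(s) from the event")
```

Hop distance is a crude proxy for exposure; a fuller analysis would likely weight edges by relationship type or strength.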
Results
- Time saved in manual text processing
- Estimated impact of rare events
- Intuitive visuals from text corpus
- Defined complex network effects to uncover hidden relationships and insights