GCP Series: Cloud Bigtable and Use Case Study

4 years ago   •   7 min read

By CloudNerve.com

What is Cloud Bigtable?

A fully managed, scalable NoSQL database service for large analytical and operational workloads with up to 99.999% availability.

  • Consistent sub-10ms latency—handle millions of requests per second

  • Ideal for use cases such as personalization, ad tech, fintech, digital media, and IoT

  • Seamlessly scale to match your storage needs; no downtime during reconfiguration

  • Designed with a storage engine for machine learning applications leading to better predictions

  • Easily connect to Google Cloud services such as BigQuery or the Apache ecosystem

Benefits

Fast and performant

Use Cloud Bigtable as the storage engine that grows with you from your first gigabyte to petabyte-scale for low-latency applications as well as high-throughput data processing and analytics.

Seamless scaling and replication

Start with a single node per cluster, and seamlessly scale to hundreds of nodes dynamically supporting peak demand. Replication also adds high availability and workload isolation for live serving apps.

Simple and integrated

Fully managed service that integrates easily with big data tools like Hadoop, Dataflow, and Dataproc. Plus, support for the open source HBase API standard makes it easy for development teams to get started.

Key features

High throughput at low latency

Bigtable is ideal for storing very large amounts of data in a key-value store and supports high read and write throughput at low latency for fast access to large amounts of data. Throughput scales linearly—you can increase QPS (queries per second) by adding Bigtable nodes. Bigtable is built with proven infrastructure that powers Google products used by billions such as Search and Maps.
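To make the key-value model a little more concrete, here is a minimal in-memory sketch in Python. It is not the Bigtable client API: the class `SortedKeyValueStore`, its method names, and the `sensor-a#...` row keys are all hypothetical and exist only for illustration. The sketch assumes rows are kept in sorted key order, which is what lets a store answer both a point lookup by key and a scan over every key sharing a prefix.

```python
import bisect


class SortedKeyValueStore:
    """Toy in-memory stand-in (hypothetical) for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []   # row keys, kept in lexicographic order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Point lookup by exact row key; returns None if the key is absent.
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._rows[key]


# Hypothetical row keys of the form "<device>#<date>".
store = SortedKeyValueStore()
store.put("sensor-a#2024-01-02", 21.5)
store.put("sensor-b#2024-01-01", 19.0)
store.put("sensor-a#2024-01-01", 20.0)

sensor_a_rows = list(store.scan_prefix("sensor-a#"))
```

Putting the device identifier first in the key is a design choice made for this example: it places all rows for one device next to each other, so the prefix scan touches only those rows.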

Cluster resizing without downtime

Scale seamlessly from thousands to millions of reads/writes per second. Bigtable throughput can be dynamically adjusted by adding or removing cluster nodes without restarting, meaning you can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the cluster’s size again—all without any downtime.
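If throughput really does scale linearly with node count, a rough sizing estimate is simple arithmetic. The Python sketch below shows that calculation only; the per-node figure `ASSUMED_QPS_PER_NODE` and both workload numbers are hypothetical placeholders, not published Bigtable limits, and real capacity depends on storage type, row size, and access pattern.

```python
import math


def nodes_needed(target_qps, qps_per_node):
    """Smallest node count whose combined throughput meets target_qps,
    assuming throughput scales linearly with the number of nodes."""
    if target_qps < 0 or qps_per_node <= 0:
        raise ValueError("target_qps must be >= 0 and qps_per_node must be > 0")
    # A cluster always has at least one node.
    return max(1, math.ceil(target_qps / qps_per_node))


# Hypothetical per-node throughput, chosen only to make the arithmetic easy.
ASSUMED_QPS_PER_NODE = 10_000

baseline_nodes = nodes_needed(50_000, ASSUMED_QPS_PER_NODE)    # normal load
peak_nodes = nodes_needed(1_200_000, ASSUMED_QPS_PER_NODE)     # short-lived peak

print(f"baseline: {baseline_nodes} nodes, peak: {peak_nodes} nodes")
```

Under these assumptions the cluster would be resized from 5 to 120 nodes for the peak and back to 5 afterwards.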

Flexible, automated replication to optimize any workload

Write data once and automatically replicate where needed with eventual consistency—giving you control for high availability and isolation of read and write workloads. No manual steps needed to ensure consistency, repair data, or synchronize writes and deletes. Benefit from a high availability SLA of 99.999% for instances with multi-cluster routing across 3 or more regions (99.9% for single-cluster instances).
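To put the two availability figures in perspective, the short Python calculation below converts each percentage into the downtime it would allow over a full year. This is plain arithmetic for intuition, annualized over a 365-day year as a simplifying assumption; the actual SLA defines its own measurement period and exclusions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def max_downtime_minutes_per_year(availability_percent):
    """Downtime budget, in minutes per year, implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


multi_cluster = max_downtime_minutes_per_year(99.999)   # multi-cluster routing
single_cluster = max_downtime_minutes_per_year(99.9)    # single-cluster instance

print(f"99.999%: about {multi_cluster:.1f} minutes per year")
print(f"99.9%:   about {single_cluster / 60:.1f} hours per year")
```

The gap is a factor of 100: roughly five minutes a year versus close to nine hours.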

Bigtable Use Case:

Dow Jones DNA partnered with Quantiphi and Google to develop a Knowledge Graph for fast, robust analysis of the network effect of key events documented in over 30 years of news content.

Google Cloud Results

  • Synthesized 30+ years of unstructured news data to assess qualitative business impact of key events
  • Defined complex network effects to uncover hidden relationships and insights
  • Exemplifies ease of using Google Cloud Platform with prototype Knowledge Graph, delivered in 10 weeks

Unlocking value in 1.3 billion news articles

Dow Jones has produced business and news content for more than 130 years and is one of the world’s largest news gathering organizations. Its publications and products include its flagship, The Wall Street Journal, the largest newspaper by paid circulation in the United States, as well as Factiva, Barron’s, and MarketWatch. As Dow Jones looked to support the digital transformation of its enterprise customers, the organization wanted to continue to provide scalable, flexible access to its premium news archive of 1.3 billion documents, which is among the world’s largest, via its new DNA platform.

Through DNA, Dow Jones is both a Google customer and a Google Cloud Technology Partner. To help DNA customers explore new possibilities being unlocked by cloud computing and machine learning, Dow Jones expanded the partnership to include Quantiphi, a Google Cloud Services partner focused on helping organizations translate the promise of big data and machine learning technologies into quantifiable business impact. The three-way partnership represents a powerful force with Dow Jones’ rich content archive, Quantiphi’s strong foundation in Google Cloud Platform (GCP), and Google as a preeminent cloud provider with a rich heritage in data science.

Visualizing complex relationships

Recognizing the need to showcase the depth and breadth of the DNA dataset, the team developed a Knowledge Graph prototype to help data scientists and developers discover insights related to network effects and business impacts of global events, such as a major natural disaster. Customers can also visualize other key events, hidden relationships, or unseen opportunities that could impact their business. The tool leverages GCP, the Dow Jones DNA – Data, News & Analytics service, TensorFlow, and a graph database platform to perform text mining, machine learning, data integration, and visualization of findings.

The Knowledge Graph example on the Dow Jones website demonstrates the impact of several 2017 hurricanes on insurance and other industries, exhibiting how Dow Jones DNA content about global events can be structured and depicted in a network diagram visualization for advanced analytics. The prototype reveals factual and inferred relationships between entities, and follows these associations to uncover critical insights. With this mapping, a full picture of the hurricane event ecosystem can be queried at scale.

Dow Jones DNA Diagram 1

 

The service can be customized for customers who want to key in on other types of global events or want a more comprehensive understanding of the network effects that can result, potentially uncovering impacts that were not apparent before. Visualizing the impacts as a network diagram can interconnect effects that might have multiple degrees of separation.
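The idea of effects with multiple degrees of separation can be illustrated with a small graph traversal. The Python sketch below is not the Dow Jones and Quantiphi implementation, which the case study does not describe at this level; it is a generic breadth-first search, and every entity name and edge in `example_graph` is hypothetical. It shows how following relationships outward from one event reveals how many hops away each affected entity sits.

```python
from collections import deque


def degrees_of_separation(graph, source):
    """Breadth-first search: map each reachable entity to its hop count from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:  # first visit is the shortest path
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical "affects" relationships, invented purely for illustration.
example_graph = {
    "Hurricane": ["Insurer", "Port"],
    "Insurer": ["Reinsurer"],
    "Port": ["Shipping Company"],
    "Shipping Company": ["Retailer"],
}

impact = degrees_of_separation(example_graph, "Hurricane")
for entity, hops in impact.items():
    print(f"{entity}: {hops} degree(s) from the event")
```

In this toy graph the retailer has no direct link to the hurricane, yet the traversal still surfaces it three hops away, which is the kind of indirect impact a network view is meant to expose.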

The Dow Jones and Quantiphi teams developed their Knowledge Graph concept in four weeks, then produced a fully working prototype in another six weeks. The team credits the short turnaround to the power of Google Cloud Platform and the synergies of the partnership between Dow Jones, Quantiphi, and Google.

“Google Cloud Platform is hugely complementary to our DNA service because it really brings our datasets to life. Quantiphi delivers the services and expertise to tie everything together and bring the Knowledge Graph project to fruition and demonstrate the art of the possible,” says Niranjan Thomas, General Manager, Platform & Technology Partnerships for Dow Jones.

“New technologies can often be difficult and expensive at the enterprise level, but Google Cloud Platform helps our DNA clients remove a lot of that friction. It democratizes advanced analytics by offering specialized capabilities that previously weren’t widely available, without requiring a lot of effort.”

Dow Jones DNA Diagram 2

Fast and scalable analytics

Naturally, DNA Snapshots are very large, and any solution had to be quick and highly scalable. “A combination of Cloud Bigtable and BigQuery delivers the fast, powerful capabilities needed to support the Knowledge Graphs,” says Asif Hasan, Co-founder & President at Quantiphi. “With the help of Cloud Bigtable, we can easily store a huge corpus of data that needs to be processed, and BigQuery allows data manipulations in split seconds, helping to curate the data very easily. In the future, we anticipate usage of real-time querying in the Knowledge Graph and catering to manual queries to answer different questions from the premium news database, which can be a game changer.”

Dow Jones appreciates the broad set of services GCP offers around machine learning, allowing it to run TensorFlow on Compute Engine, run containers with Google Kubernetes Engine, gain high performance object storage with Cloud Storage, and create an analytics pipeline with tools such as Cloud Dataproc and Cloud Dataflow. For an upcoming Knowledge Graph the team is developing, Dialogflow enables a natural language conversational interface for delivering efficient and accurate responses to users interacting with the system.

Customizing Knowledge Graphs

The Dow Jones and Quantiphi teams are enthusiastic about helping clients build their own Knowledge Graphs, analyzing events tailored to their specific industry and use case. For financial services firms, for example, the tool can help with signal identification for investment management and event risk modeling, by showing the financial impacts across different companies and industries when certain events occur. In healthcare, a Knowledge Graph can provide intelligence for prioritizing research and development for new pharmaceutical and life sciences products, by analyzing published medical study results along with business results from publicly traded health sector companies. For consulting firms and other companies, the tool can provide competitor and market intelligence, by mining published information related to products or industries of interest.

The future promises integration of live data feeds from social media, weather information, census data, and more.

“In the past, preparing certain types of analyses could require months of sifting through news articles, and you still might not glean important relationships between events. With GCP, it’s easier to synthesize large amounts of unstructured data and define complex network effects. And with Dow Jones DNA fueling our Knowledge Graph, you can quickly leverage decades of knowledge in a way that makes connections more accurate. These business insights can unlock new revenue opportunities and reduce risks and costs for our customers,” says Hasan.
