Enabling your business with the power of connected data

Architecting your Graph Analytics Platform

Maruthi Prithivirajan
6 min read · Jan 19, 2022
Photo by Sajad Nori on Unsplash

Graph analytics is increasingly an indispensable part of enterprises’ journeys to become insights-driven and to advance their AI enablement initiatives. The current challenges in these initiatives are operationalization and post-production maintenance. Enterprises often lack both the skillsets needed to solve these problems and the necessary tools in their current enterprise architectures. This blog post covers some high-level architecture designs showing how you can start with a minimal set-up and grow into a scaled set-up as your enterprise’s needs and adoption of Graph Analytics develop.

I’ve got you covered here. It doesn’t have to be “go big or go home…”. You’ve got options.

I have chosen Neo4j and GCP (Google Cloud Platform) to illustrate the Graph Analytics Platform architecture, as my most recent projects have been built on them. The same architecture can be designed and deployed with AWS (Amazon Web Services), Microsoft Azure, other major cloud vendors, or on-prem infrastructure. You might have noticed that there is no mention of an orchestrator or scheduler in this architecture. I deliberately decided to skip that part (I didn’t want to venture down that rabbit hole), as there are many applications and frameworks available and everyone has their preference.

Minimal

The Minimal architecture is designed for enterprises that are new to Graph Analytics and want to take a leap into the next phase of their analytics journey. In this architecture, the Graph Database plays a supporting role: enriching data and generating insights. This set-up covers the whole nine yards of Graph Analytics, from getting the enterprise’s analytics team trained, to creating familiarity among business users, to (hopefully) enjoying the ROI.

Graph Analytics Platform Architecture — Minimal

Scaled

The Scaled architecture is designed either to be built incrementally from the Minimal architecture, or for enterprises that are already familiar with Graph Analytics and are looking to push the boundaries of their AI/ML capabilities. In this architecture, the Graph Database goes beyond a supporting role and powers operational applications, e.g. real-time e-commerce card fraud detection, border control (immigration clearance), and so on.

Graph Analytics Platform Architecture — Scaled

Data life-cycle

The data life-cycle is a key aspect that drives the actions performed by the different parts of any modern Data Analytics Platform. I have mapped out the data life-cycle below the way I see it (a very opinionated view).

Data life-cycle

The following architecture diagrams cover the different parts of the Graph Analytics Platform. I have grouped the architecture into different sections based on the functions they serve.

Minimal — Grouping

Graph Analytics Platform Architecture (Grouped) — Minimal

Scaled — Grouping

Graph Analytics Platform Architecture (Grouped) — Scaled

Collect

This section is where data from across all the applications in the enterprise is made available. Here, we have placed three different data sources, each hosting a specific type of data.

BigQuery (BQ): Serves as the Data Warehouse or Data Mart, hosting structured and semi-structured data.

Google Cloud Storage (GCS): Blob storage that acts as a data lake, hosting semi-structured and unstructured data, e.g. JSON, TXT, and other file formats.

Apps (Applications): Any of the operational applications you would like to connect with your Graph Analytics Platform, so you can feed insights back and improve the application experience for its users.

Process

This section is where the data, either processed or raw, is acquired from its source, and where all the pre-processing work is performed before the data is sent to the Graph Data Platform (Neo4j). Here, I have chosen two services to support batch-based and stream-based data acquisition and processing.

Kafka (Confluent/Apache): Can be configured to acquire data from various sources through the readily available source connectors from Confluent. Using the Neo4j Kafka Connect plugin, you can configure a sink connector that writes data from any Kafka topic you can subscribe to into Neo4j.
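As a rough sketch, a sink connector for the plugin might be configured like this. The connector name, topic, endpoint, credentials, and Cypher template are all hypothetical placeholders, not values from a real deployment:

```python
import json

# Hypothetical sink connector configuration for the Neo4j Kafka Connect
# plugin: every record arriving on the "orders" topic is written to Neo4j
# by the Cypher template below. All names here are illustrative.
sink_config = {
    "name": "neo4j-orders-sink",
    "config": {
        "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
        "topics": "orders",
        "neo4j.server.uri": "bolt://neo4j:7687",
        "neo4j.authentication.basic.username": "neo4j",
        "neo4j.authentication.basic.password": "<password>",
        # One Cypher template per topic: merge the order node and link
        # it to its customer node.
        "neo4j.topic.cypher.orders": (
            "MERGE (o:Order {id: event.orderId}) "
            "MERGE (c:Customer {id: event.customerId}) "
            "MERGE (c)-[:PLACED]->(o)"
        ),
    },
}

# The JSON payload you would POST to the Kafka Connect REST API.
payload = json.dumps(sink_config, indent=2)
```

The Cypher template approach is what makes this low-effort: the mapping from event fields to graph structure lives in the connector config, not in application code.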

Cloud Dataproc: Spark has become a standard tool in most data professionals’ toolboxes, and we can use it to process semi-structured and unstructured data with the rich set of functions it provides out of the box. The Neo4j Connector for Apache Spark is a native Spark connector, making Neo4j feel right at home for data professionals.
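A minimal sketch of how a Dataproc job might hand a DataFrame to Neo4j through this connector. The helper only assembles the connector options; the commented-out write call shows where they would be used. The endpoint, credentials, and label are illustrative placeholders:

```python
# Assemble options for the Neo4j Connector for Apache Spark.
# Values here are placeholders, not a real deployment.
def neo4j_write_options(url, user, password, labels):
    """Options for df.write.format("org.neo4j.spark.DataSource")."""
    return {
        "url": url,
        "authentication.basic.username": user,
        "authentication.basic.password": password,
        "labels": labels,  # e.g. ":Customer" writes rows as Customer nodes
    }

opts = neo4j_write_options("bolt://neo4j:7687", "neo4j", "<password>", ":Customer")

# In a real Dataproc job (requires the connector JAR on the classpath):
# (df.write.format("org.neo4j.spark.DataSource")
#    .mode("Overwrite")
#    .options(**opts)
#    .save())
```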

Enrich

Here we will enrich the data that has been ingested into the Neo4j Graph Data Platform using Neo4j Cypher (known patterns) and Neo4j Graph Data Science (unknown patterns).

Neo4j Cypher: We can use Cypher to identify known patterns within the Graph and enrich the data conditionally, depending on whether a pattern exists. This works very well for local Graph topology and is an excellent way to bring human business expertise into the enrichment.
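A hedged example of “known pattern” enrichment: if two accounts use the same device (a pattern a fraud analyst already knows to look for), tag both as high-risk. The labels and property names are illustrative, not from a real schema:

```python
# Known-pattern enrichment in Cypher, held as a query string. The
# Account/Device model and the "risk" property are hypothetical.
FLAG_SHARED_DEVICE = """
MATCH (a:Account)-[:USES]->(d:Device)<-[:USES]-(b:Account)
WHERE a <> b
SET a.risk = 'high', b.risk = 'high'
"""

# With the official Neo4j Python driver this would run as:
# with driver.session() as session:
#     session.run(FLAG_SHARED_DEVICE)
```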

Neo4j Graph Data Science (GDS): We can use GDS to identify unknown patterns in the global Graph topology in an exploratory fashion and bring the identified patterns into the data enrichment with ease. This can be achieved using production-ready, low-code unsupervised Graph learning algorithms, e.g. centrality algorithms to identify influencers in a social app, or bottlenecks and single points of failure in your supply chain network. GDS operates on in-memory Graphs known as Graph projections, which come in handy for experimental activities. With the Graph Catalog feature, these in-memory Graphs can be stored and reused for continuous exploration and experimentation by everyone on the team.
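As a sketch, the two GDS steps above, projecting an in-memory Graph and running a centrality algorithm over it, look like the following Cypher procedure calls (procedure names vary slightly across GDS versions; the graph, label, and relationship names are illustrative):

```python
# Project a named in-memory graph from Person nodes and FOLLOWS
# relationships. All names here are placeholders.
PROJECT = """
CALL gds.graph.project('social', 'Person', 'FOLLOWS')
"""

# Stream PageRank over the projection to surface influencers.
PAGERANK = """
CALL gds.pageRank.stream('social')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10
"""
```

Because the projection lives in the Graph Catalog under the name `social`, the same in-memory graph can be reused across many algorithm runs without re-projecting.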

Generate

This section is where insights are generated, on top of the enriched data, for the various use cases we set out to address on the Graph Analytics Platform.

Neo4j Graph Data Science (GDS): Using the low-code approach offered by GDS, you can train models to perform predictions and classifications and to generate Graph Embeddings, all of which can be set up as part of your analytics pipelines. The Model Catalog feature allows you to persist and manage trained models, which can later be used to score new data points.
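One way the embedding part of this step could look: compute FastRP node embeddings over a projection and write them back as a node property that downstream feature pipelines can export. The graph name, dimension, and property name are illustrative assumptions:

```python
# Generate node embeddings with GDS FastRP and persist them as a node
# property. The 'social' projection and parameter values are placeholders.
FASTRP_WRITE = """
CALL gds.fastRP.write('social', {
  embeddingDimension: 128,
  writeProperty: 'embedding'
})
YIELD nodePropertiesWritten
"""
```

Once the `embedding` property is written back to the database, it can be exported like any other column, which is exactly what the Vertex AI integration below relies on.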

Vertex AI: Provides a suite of tools to help you perform all your Machine Learning activities. In this architecture, the integration focus is on making the enriched data from Neo4j GDS available in the Vertex AI Feature Store, where it can be consumed by the rest of the Vertex AI services.

Cloud Dataproc, Google Cloud Storage, and BigQuery support the transport of enriched data from Neo4j GDS to Vertex AI.

Kafka supports the insights data sync between Neo4j GDS (OLAP) and Neo4j AuraDB/Neo4j Causal Cluster (OLTP), riding on top of the Neo4j Kafka Connect plugin’s source and sink connectors.

Serve

This section is all about serving the insights generated and the models built on the enriched data to downstream applications.

Microservices Layer: This will be the primary access point for all consumers of the Graph Analytics Platform. The idea is to have an abstraction layer through which all applications in the enterprise’s ecosystem access the data in a standardized way. My choice was Google Kubernetes Engine (GKE), which can be replaced by Google Cloud Endpoints or Apigee, depending on your needs.
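To make the abstraction idea concrete, here is a minimal sketch of what such a layer standardizes: downstream apps never see Cypher or the database driver, only a stable response shape. The envelope fields are hypothetical, not an established contract:

```python
# Sketch of the microservices abstraction: wrap raw query records in a
# standard envelope so every consumer sees the same shape. The "data"
# and "count" fields are illustrative placeholders.
def to_api_response(records):
    """Wrap raw query records in the platform's standard envelope."""
    return {
        "data": [dict(r) for r in records],
        "count": len(records),
    }

# A handler running on GKE would call the Neo4j driver, then shape the
# result before returning it:
resp = to_api_response([{"name": "Alice", "risk": "high"}])
```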

Vertex AI can be used to serve models directly to applications and to perform batch scoring.

Neo4j AuraDB or Neo4j Causal Cluster: Serves as the OLTP Graph Database for operational and mission-critical applications that need real-time Graph data support with very high uptime and concurrency requirements.

Conclusion

Having gone through the architecture and seen how it fits into each section of the data life-cycle, you will have realized that it does not deviate much from existing analytics platform architectures. The whole idea is to help businesses introduce Graph platforms into their existing enterprise architecture without disrupting existing processes and workflows. The need to adopt Graph Analytics is growing as industries move past the Big Data wave and enter the age of AI/ML. As we continue to make AI an extension that enhances everything around us, we need to look beyond the point of data collection and set our sights on where information is generated.
