James Kobielus
Contributor

Graph analysis: Not the dots, but the connections

analysis
Mar 04, 2016 | 6 mins
Analytics, Databases, NoSQL Databases

When relationships between entities are more important than the entities themselves, you have a business problem made for graph analysis


Graph analysis is not a new technology, but many analytics professionals remain unfamiliar with it.

One reason for this is that most people learn to associate data modeling with only one structure, the relational model, which usually skews their thinking toward tables, joins, primary keys, and every other aspect of the approach that IBM’s E.F. Codd invented more than four decades ago.

When all you know is one database modeling approach, you tend to view all data-centric business problems through that lens, sometimes ignoring other ways of structuring the data that may conform more harmoniously to the shape of the business problem. As IBM Cloud Data Services CTO Adam Kocoloski said recently, “Sometimes people don’t know they’re dealing with a graph-shaped problem.”

How do you know a graph-shaped business problem when you see one? Kocoloski pointed to targeted recommendations, social network analysis, and counterfraud as examples of inherently graph-shaped business problems. But he didn’t state exactly why.

How graph modeling works

I think I have a clue why, but first let’s briefly review the fundamental concepts of graph modeling in order to zero in on the core criterion.

Essentially, a graph-shaped business problem is one in which you’re more concerned with relationships among entities than with the entities in isolation.

At heart, you use graphs to model problems as “nodes” and “edges,” where the former roughly corresponds to entities and the latter to relationships. Graphs assume that the relationships among entities conform to stars, snowflakes, and even more complex nested chains of connections among such graphs. By contrast, relational data structures are inherently hierarchical (though they may be applied to graphlike structures through tortuous primary and secondary key relationships).
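To make the node-and-edge idea concrete, here is a minimal sketch in Python. The people and the "follows" relationship are hypothetical, chosen only for illustration; the point is that the edges form a cycle (alice to bob to carol and back to alice), a shape with no natural top or bottom.

```python
from collections import defaultdict

# Hypothetical entities (nodes) and relationships (edges), for illustration only.
edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "alice"),
    ("dave", "follows", "alice"),
]


def build_adjacency(edge_list):
    """Map each source node to the (relationship, target) pairs leaving it."""
    adjacency = defaultdict(list)
    for source, relation, target in edge_list:
        adjacency[source].append((relation, target))
    return dict(adjacency)


def neighbors(adjacency, node):
    """Return the nodes directly reachable from `node` (empty if none)."""
    return [target for _, target in adjacency.get(node, [])]


graph = build_adjacency(edges)
print(neighbors(graph, "alice"))
```

Answering "who is connected to whom" is a single lookup here, whereas the same question against tables would typically require joining a relationship table back onto itself.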

Influence analysis, for example, is a classic graph-shaped problem; it’s also the core of the use cases that Kocoloski cited. Most organizations need to know how they can most effectively influence customers to buy their products, recommend them to others, and so on using recommendation engines and social network analysis apps.

Likewise, counterfraud demands an approach that can easily identify adverse influence patterns and, hopefully, prevent them from rearing their ugly heads. Usually, these sorts of influence patterns are anything but hierarchical, so they’re well-suited to graphs.

Graph analysis has long been used to model customer-engagement scenarios. For example, graph analysis can be used to identify which customers are more influential in some channels than others; which have the greatest impact on awareness, sentiment, and propensities of their peers; which promise the biggest bang for your marketing bucks through their influence on their peers; and so on.
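One common way to score influence in a graph is a PageRank-style calculation, sketched below in plain Python. This is an illustrative choice of algorithm, not necessarily what any particular product uses, and the customer names, the `damping` value, and the iteration count are all assumptions. An edge `(a, b)` here means that customer `a` pays attention to customer `b`, so score flows toward the people others listen to.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Score nodes by influence using simple power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Start with every node equally influential.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # A node with no outgoing edges spreads its score evenly.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "pays attention to" relationships among customers.
attention = [
    ("ann", "eva"),
    ("ben", "eva"),
    ("cat", "eva"),
    ("eva", "ann"),
    ("dan", "ann"),
]

scores = pagerank(attention)
for customer, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{customer}: {score:.3f}")
```

In this toy data set, eva comes out on top because three customers point to her, and ann ranks close behind largely because eva, herself influential, points back to ann. That second effect is what a simple count of followers would miss.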

When you start to think in terms of influence analysis as the sweet spot for graphs, lots of other graph-shaped business problems suggest themselves. For example, you might graph nonhierarchical influence patterns among employees — leveraging data from internal collaboration, messaging, knowledge management, and other sources — to address business questions such as which types of individuals, subject-matter experts, or relationships have the greatest influence on team productivity.

Likewise, you might use graph modeling to drill into the nonhierarchical influence patterns in your partner ecosystem. Which actual or potential partners have the right degree of influence for realizing desired business outcomes? Who within one’s own organization or among partners has the influence needed to establish, strengthen, and sustain the teaming arrangement at the heart of the alliance? And so on.

Graphing the Internet of things

The Internet of things (IoT) is yet another application domain that is chock-full of graph-shaped problems. That’s because the “things” themselves (such as sensor-equipped endpoints for consumer, industrial, and other uses) tend to be deployed in nonhierarchical grids of great complexity. The influence relationships evident in data from a collection of endpoints, such as might be examined in root-cause analyses, tend to be of much greater interest than the status of individual endpoints in isolation.

Now that the IoT is coming to most industries and to practically every aspect of our lives, extreme-scale graph analysis will almost certainly become the core approach for tracking shifting patterns of influence and controlling its disparate endpoints. And considering that IoT is the future of the cloud, permeating every endpoint device in the form of “fog computing,” these graphs will be modeled and executed in massive parallel runtime environments in this new, distributed fabric.

Connected cars, for example, will generate ample IoT sensor data, as will geospatial applications on smartphones and other dynamic IoT edge devices. In order to stay contextually and predictively oriented in complex environments, these applications will depend increasingly on graph analyses, as well as on machine learning algorithms, that can find dynamic influence patterns within complex sensor-data sets. For example, how does traffic congestion among vehicles on one highway influence congestion on adjacent roads?
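As a rough sketch of the congestion question, the Python below treats roads as nodes and physical connections as edges, then uses a breadth-first traversal to find which roads lie within a given number of hops of a congested highway. The road names and the hop limit are hypothetical, and hop distance is only a crude stand-in for influence; a real system would weight edges with live sensor data.

```python
from collections import deque

# Hypothetical road network: each road lists the roads it connects to.
road_links = {
    "highway_1": ["exit_ramp_a", "exit_ramp_b"],
    "exit_ramp_a": ["highway_1", "main_street"],
    "exit_ramp_b": ["highway_1", "river_road"],
    "main_street": ["exit_ramp_a", "river_road"],
    "river_road": ["exit_ramp_b", "main_street", "bridge"],
    "bridge": ["river_road"],
}


def roads_within_hops(links, start, max_hops):
    """Return {road: hop_count} for roads reachable within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        road = queue.popleft()
        if distance[road] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in links.get(road, []):
            if neighbor not in distance:
                distance[neighbor] = distance[road] + 1
                queue.append(neighbor)
    del distance[start]  # The congested road itself is not "adjacent".
    return distance


print(roads_within_hops(road_links, "highway_1", 2))
```

With a limit of two hops, the traversal reaches both exit ramps, main_street, and river_road, but not the bridge, which is three hops away.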

Recommendation engines enter this connected-car scenario as the graph-powered “next best action” engines for ensuring continual traffic optimization throughout an entire system, while also ensuring that individual vehicles reach their destinations safely and within acceptable timeframes.

Reading between the lines

Even an item as goofy as an artificial intelligence script-reading program might benefit from graph modeling. An app called ScriptHop uses algorithms to analyze a high volume of screenplays and generate assessments that might be relevant to entertainment-industry stakeholders. ScriptHop can determine which scripts include characters with particular attributes (such as belonging to a minority group), which are likely to give these characters sufficient screen time, and which might be excessively costly and time-consuming to produce (based on likely casts, sets, and locations).

How could this be conceived as a graph-shaped business problem? Easy: A motion-picture project can be modeled as a graph among interconnected stakeholders, including producers, directors, casting agents, cinematographers, actors, and so on. Every script can conceivably be modeled as a graph with a particular set of connections and dependencies among the likely stakeholders who would be included in the project, if it were actually produced.

Tweaking the graph of any particular script-based project would impact some stakeholders positively and others negatively. Why not develop a graph-based app that gives each stakeholder a targeted assessment of the potential influence of each script revision on their budget, time, and other resource requirements?

Graph modeling is clearly a creative exercise that seeks to reflect the nonhierarchical nature of the problem at hand. In order to use graph tools effectively in business applications, analysts should start to retool their thinking around this powerful approach.