by Serdar Yegulalp

Senior Writer

Google’s Cloud Spanner melds transactional consistency, NoSQL scale

news analysis

May 04, 2017

The research behind the horizontally scalable, SQL-compatible database has spawned imitators, but Google's private network is the real secret sauce

Earlier this year, Google offered a peek at Cloud Spanner, an automanaged database service that melds features from both conventional relational systems and NoSQL technologies.

Today, Google announced that Cloud Spanner will be available to the general public later this month. It will compete not only with rival cloud databases, but also with up-and-coming open source projects that tackle scale and reliability by using Google’s own ideas.

The best of both worlds

Google presents Cloud Spanner as a happy medium between two common database needs that often prove incompatible. A database can be highly scalable and distributed (the NoSQL approach), or it can be transactionally consistent (the conventional database approach). Cloud Spanner aims to be both.

As laid out in a 2012 research paper, one key to accomplishing this is a time-synchronization mechanism for operations that must stay consistent across nodes, such as the globally consistent reads people expect from a transactional database.

This sync mechanism accounts for potential differences between the timestamps reported by different machines in the cluster and can “wait out” those differences when they are too large. The system also tries to keep that uncertainty to a minimum by drawing on multiple time sources to improve clock accuracy. As a result, it’s easier for operations spread across multiple nodes (for example, MapReduce jobs) to agree on when something happened and to deliver consistent results.
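The “wait out” step is what the 2012 paper calls commit wait. The sketch below simulates it under stated assumptions: a hypothetical `SimulatedTrueTime` clock with a fixed uncertainty bound `epsilon`, times in integer milliseconds, and a loop that advances the clock instead of sleeping. Following the paper’s TrueTime description, `now()` returns an interval guaranteed to contain the true time, a commit takes the interval’s latest bound as its timestamp, and the commit then waits until even the earliest possible current time is past that timestamp. It is an illustration, not Google’s implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int  # ms; the true time is guaranteed to be >= this
    latest: int    # ms; the true time is guaranteed to be <= this


class SimulatedTrueTime:
    """Hypothetical clock with a fixed uncertainty bound (epsilon, in ms)."""

    def __init__(self, epsilon: int, start: int = 0) -> None:
        self.epsilon = epsilon
        self._true_ms = start  # the real time; a real node never sees this directly

    def now(self) -> TTInterval:
        # Report an interval that brackets the true time.
        return TTInterval(self._true_ms - self.epsilon, self._true_ms + self.epsilon)

    def after(self, t: int) -> bool:
        # True only once t has definitely passed, whatever the clock error.
        return self.now().earliest > t

    def tick(self, ms: int = 1) -> None:
        # Stand-in for sleeping: advance the simulated real time.
        self._true_ms += ms


def commit_with_wait(clock: SimulatedTrueTime) -> tuple[int, int]:
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    commit_ts = clock.now().latest
    waited = 0
    while not clock.after(commit_ts):
        clock.tick()
        waited += 1
    return commit_ts, waited


if __name__ == "__main__":
    ts, waited = commit_with_wait(SimulatedTrueTime(epsilon=7, start=100_000))
    print(ts, waited)  # 100007 15
```

The wait comes out to roughly twice the uncertainty bound (15 ms for an `epsilon` of 7 ms here, the extra tick coming from the strict comparison), which is why the system works to keep clock uncertainty small.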

In a white paper published earlier this year, Google discussed another key element: how Cloud Spanner takes advantage of Google’s own network. Of the three properties most desired from a distributed system (consistency, availability, and tolerance of network partitions between nodes, the trio described by the CAP theorem), Cloud Spanner tries to deliver all three by making slight, often undetectable sacrifices to availability, aided by the fact that the service runs on Google’s own highly redundant network.
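One way to see why a network partition costs availability rather than consistency is majority-quorum replication. Spanner replicates data with Paxos; the sketch below does not implement Paxos, it only models the rule that a write needs acknowledgments from a strict majority of replicas, and the function names are hypothetical. When a partition splits the replicas, at most one side can hold a majority, so the other side has to refuse writes rather than diverge.

```python
def has_quorum(reachable: int, replicas: int) -> bool:
    """A write can commit only if a strict majority of replicas acknowledge it."""
    return reachable > replicas // 2


def partition_outcome(replicas: int, side_a: int) -> tuple[bool, bool]:
    """Split the replicas into two sides; report which side can still accept writes."""
    if not 0 <= side_a <= replicas:
        raise ValueError("side_a must be between 0 and replicas")
    side_b = replicas - side_a
    return has_quorum(side_a, replicas), has_quorum(side_b, replicas)


if __name__ == "__main__":
    print(partition_outcome(5, 3))  # (True, False): the minority side refuses writes
    print(partition_outcome(4, 2))  # (False, False): an even split stalls writes on both sides
```

Neither side ever accepts conflicting writes, so consistency holds; the price is that some clients temporarily cannot write, which is the availability sacrifice that a highly redundant network is meant to keep small.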

A little more scale, a little less SQL

The actual database Google has created from this technology strongly resembles other cloud-hosted transactional databases, but with some potentially irksome differences.

First, Cloud Spanner is advertised as supporting ANSI SQL:2011 queries. The documentation shows this is true for SELECT queries, which support the familiar SQL syntax, including JOIN and GROUP BY. But INSERT and UPDATE commands are not available; according to a blog post by Quizlet, which used Cloud Spanner during its beta, you need to use “RPCs for mutating rows given their primary key” instead. Cloud Spanner’s language and interface support eases some of this: it provides libraries for Go, Java/JDBC, Node.js, and Python, as well as support for REST calls.
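To make the mutation model concrete, here is a simplified, in-memory sketch of writing rows by primary key instead of through INSERT or UPDATE statements. It is not the Cloud Spanner client library: `KeyedTable` and `Mutation` are hypothetical names, though the operation kinds mirror the insert, update, insert-or-update, and delete mutations that Cloud Spanner’s API exposes. A batch is applied to a staged copy and swapped in only if every mutation succeeds, a stand-in for transactional all-or-nothing behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class Mutation:
    op: str                    # "insert", "update", "insert_or_update", or "delete"
    key: Any                   # primary key of the target row
    row: Optional[Row] = None  # column values; ignored for "delete"


class KeyedTable:
    """Hypothetical table that only accepts writes as keyed mutations."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Row] = {}

    def apply(self, mutations: List[Mutation]) -> None:
        # Work on a copy so a failing mutation leaves the table untouched.
        staged = {k: dict(v) for k, v in self.rows.items()}
        for m in mutations:
            exists = m.key in staged
            if m.op == "insert":
                if exists:
                    raise KeyError(f"row {m.key!r} already exists")
                staged[m.key] = dict(m.row or {})
            elif m.op == "update":
                if not exists:
                    raise KeyError(f"row {m.key!r} not found")
                staged[m.key].update(m.row or {})
            elif m.op == "insert_or_update":
                staged.setdefault(m.key, {}).update(m.row or {})
            elif m.op == "delete":
                staged.pop(m.key, None)  # deleting a missing key is not an error here
            else:
                raise ValueError(f"unknown mutation type {m.op!r}")
        self.rows = staged  # commit the whole batch at once


if __name__ == "__main__":
    table = KeyedTable()
    table.apply([Mutation("insert", 1, {"name": "Ada"})])
    table.apply([Mutation("update", 1, {"name": "Ada L."}),
                 Mutation("insert_or_update", 2, {"name": "Grace"})])
    print(table.rows)  # {1: {'name': 'Ada L.'}, 2: {'name': 'Grace'}}
```

Every write names its target row by key up front, which is the trade-off the Quizlet post describes: less expressive than a SQL UPDATE with a WHERE clause, but simple to route to the node that owns the row.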

Cloud Spanner’s other touted advantage is scale and availability. The database autoscales based on demand, with pricing based on the number of nodes in use, storage needed on those nodes, and outbound bandwidth consumed. Right now the size of a database influences the number of nodes required to deploy it; every 2TB of database storage requires at least one node to support it.
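The storage rule above translates into a simple floor on node count. The sketch below computes that floor and combines the three billing dimensions the article lists (nodes, storage, and outbound bandwidth) into a monthly estimate. The rates are caller-supplied placeholders, not Google’s prices, and the 730-hour month and the one-node minimum are assumptions.

```python
import math

TB_PER_NODE = 2.0  # from the article: every 2TB of storage needs at least one node


def min_nodes(storage_tb: float) -> int:
    """Smallest node count that satisfies the storage rule (assumes at least one node)."""
    if storage_tb < 0:
        raise ValueError("storage_tb must be non-negative")
    return max(1, math.ceil(storage_tb / TB_PER_NODE))


def monthly_cost(nodes: int, storage_tb: float, egress_gb: float,
                 node_hour_rate: float, tb_month_rate: float,
                 egress_gb_rate: float, hours: float = 730.0) -> float:
    """Estimate a monthly bill from hypothetical, caller-supplied rates."""
    if nodes < min_nodes(storage_tb):
        raise ValueError("too few nodes for this much storage")
    return (nodes * hours * node_hour_rate
            + storage_tb * tb_month_rate
            + egress_gb * egress_gb_rate)


if __name__ == "__main__":
    print(min_nodes(5.0))  # 3: 5TB needs ceil(5 / 2) nodes
```

Since the article also ties node count to demand, `min_nodes` gives only a lower bound; a heavily used database may need more nodes than its storage alone requires.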

Imitation and flattery

Cloud Spanner’s promises echo features found in other database products, although Google clearly hopes to compete broadly by combining those features more effectively in one place.

Take autoscaling, for instance. Ex-Microsoftie Bob Muglia served up Snowflake as a cloud data-warehouse system that didn’t need to be tweaked or tuned. There, Google can almost certainly compete on price, since it owns its infrastructure, whereas Snowflake runs on Amazon’s.

Speaking of Amazon, it offers a few products that could compete. Aurora, for instance, is Amazon’s hosted version of MySQL, and it beats Google’s MySQL offering for high-end work. It also has the advantage of being familiar and widely supported; there’s barely a database developer who hasn’t touched MySQL at some point. But again, Google’s hope is that Cloud Spanner will compete by offering better scale across the board, including for write operations and not only reads.

Then there’s CockroachDB, which is approaching its first full 1.0 version. This open source database project is an implementation of the ideas in Google’s Spanner paper, in much the same way Google’s paper on MapReduce inspired Hadoop.

Where Google wants to stand out, though, is in execution. That explains why the white paper argues it isn’t only the time-synchronization functions that make Cloud Spanner special, but also Google’s tight control over the networking between nodes. Another cloud might be able to implement that through a CockroachDB-based service, but Google is counting on first-mover advantage, and all the major back-end resources it can draw on, to make an impression.

Serdar Yegulalp is a senior writer at InfoWorld, covering software development and operations tools, machine learning, containerization, and reviews of products in those categories. Before joining InfoWorld, Serdar wrote for the original Windows Magazine, InformationWeek, the briefly resurrected Byte, and a slew of other publications. When he's not covering IT, he's writing SF and fantasy published under his own personal imprint, Infinimata Press.