by Marc Prioleau

The world needs more (and better) open map data

feature
Mar 25, 2024 | 8 mins
Analytics | Data Management | Open Source

The best map data and the most advanced mapping features have been proprietary. The Overture Maps Foundation aims to change that.

Credit: Artsiom P / Shutterstock

The trend of open data is taking hold around the world and it will unleash untold innovation.

McKinsey notes that open data, which is data that can be freely used and freely redistributed, could contribute $3 trillion annually to the global economy and has the potential to “unleash innovation and transform every sector of the economy.” The World Bank also recognizes “the huge potential of open data,” noting that private company use of open government data has “only begun to be exploited.”

The European Union, which often leads in terms of embracing digital initiatives, has its Open Data Directive, which recently added new “high-value data sets” that public sector bodies have to make available for free use. These data sets provide geospatial, environmental, meteorological, statistical, mobility, and company data, with the goal of advancing society’s digital transformation and improving mobility, medical care, energy conservation, and sustainability, for example.

As Commissioner Thierry Breton noted, “Start-ups and SMEs will be able to use this data to develop new products and innovative solutions that improve the lives of citizens in the EU and around the world.”

Overture Maps Foundation

At the Overture Maps Foundation, we’re amassing what we believe will be the world’s largest collection of enterprise-grade open map data that anyone, anywhere, can use to build interoperable mapping services. With the right data, services like navigation, local search, logistics management, and location-based augmented reality can quickly be realized.

However, as the requirements for accuracy, timeliness, and advanced features continue to expand, the need for an open base layer of data also grows. By building on common, open base layers of data, value-added data can be more easily combined and provide better interoperability across platforms. The task of building and maintaining these data-rich, advanced mapping features—required to map a continuously changing world—is too big, too complex, and too costly for any one entity to tackle.

The best map data has, up until now, been proprietary. The costs to build and sustain that data have only grown as the demand for better data has grown. Given the large number of sources and the many ways to gather and present the data, interoperability has been difficult, slow, and limited. By building a wide network of users who consume map services and provide feedback on the accuracy of the data, the industry can build the best map data available. Open use through open licensing is central to expanding that reach.

The basic idea of open map data is the same as for open source software: collaboratively building an asset that anyone can use, change, and then redistribute. The open model, pioneered about three decades ago to develop the Linux operating system, gave rise to an open source software industry that is now worth trillions, a recent Harvard Business School report notes. If not for open source software, companies “would need to spend 3.5 times more on software than they currently do,” the report states.

Some of the biggest technology and location companies in the world, such as AWS, Meta, TomTom, and Microsoft, are strong open source advocates because they recognize the value of open source for their companies and the economy. So it should come as no surprise that these same companies came together in late 2022 to launch Overture, one of the largest open data projects hosted at the Linux Foundation.

Overture’s mission is to build open map data that will support a wide range of mapping applications and geospatial analysis. Map data is a digital representation of the physical world: a massive, complex, and interrelated data set. To build it, Overture aggregates, deduplicates, improves, standardizes, and maintains data from many different signals, producing comprehensive foundational data sets that are valuable to map makers.
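To make that pipeline concrete, here is a minimal sketch of the deduplicate-and-merge step, assuming a simplified record format and a crude name-and-proximity matching rule. It illustrates the idea only; it is not Overture’s actual schema or algorithm.

```python
# A minimal sketch of the kind of conflation an open map data pipeline performs.
# The record format and matching rule are simplified assumptions, not
# Overture's actual schema or algorithms.
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    lat: float
    lon: float
    source: str
    confidence: float  # how much we trust this source's observation

def same_entity(a: Place, b: Place, tol: float = 0.0005) -> bool:
    """Treat two records as duplicates if names match and the points lie
    within roughly 50 meters of each other (a crude proximity rule)."""
    return (a.name.casefold() == b.name.casefold()
            and abs(a.lat - b.lat) < tol
            and abs(a.lon - b.lon) < tol)

def conflate(records: list[Place]) -> list[Place]:
    """Merge duplicate observations from different signals, keeping the
    highest-confidence record for each real-world entity."""
    merged: list[Place] = []
    for rec in sorted(records, key=lambda r: -r.confidence):
        if not any(same_entity(rec, kept) for kept in merged):
            merged.append(rec)
    return merged

observations = [
    Place("Blue Bottle Coffee", 37.7764, -122.4233, "crowdsourced", 0.7),
    Place("blue bottle coffee", 37.7765, -122.4232, "government", 0.9),
    Place("Ferry Building", 37.7955, -122.3937, "ai-derived", 0.6),
]
print(conflate(observations))  # two places survive; the duplicate is dropped
```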

For example, Overture’s Buildings theme combines crowdsourced data from OpenStreetMap, government data from Esri’s Community Maps Program, and AI-generated building footprints from Microsoft and Google. The combined data set includes 2.3 billion buildings, making it the largest open buildings data set in the world. Addresses, road networks, and places of interest are other themes that will rely heavily on the aggregation of open data sources.
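As a taste of how that data can be consumed, the sketch below counts buildings in a small bounding box using DuckDB, which can read Overture’s GeoParquet files directly from S3. The release path, version string, and bbox column layout are assumptions based on past releases; check Overture’s documentation for the current release layout before running it.

```python
# Hedged sketch: count Overture buildings in a bounding box with DuckDB.
# The S3 path, release version, and bbox field names are illustrative
# assumptions; consult Overture's docs for the current release layout.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("SET s3_region = 'us-west-2';")

# Hypothetical release version; substitute the latest one from Overture's docs.
release = "s3://overturemaps-us-west-2/release/2024-03-12-alpha.0"

count = con.execute(f"""
    SELECT count(*)
    FROM read_parquet('{release}/theme=buildings/type=building/*', hive_partitioning=1)
    WHERE bbox.xmin > -122.36 AND bbox.xmax < -122.32  -- central Seattle
      AND bbox.ymin > 47.60 AND bbox.ymax < 47.62
""").fetchone()[0]
print(f"buildings in bounding box: {count}")
```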

Open data vs. open source

While there are many similarities between open data projects and open source projects, there are also notable differences, and those differences may affect who decides to work on each.

In my nine months at Overture, I’ve come to understand how and why open source code and open data differ. I would point to six key differences that have important implications for open data projects:

  1. Data generation. Software code is generated from a human’s brain or from a growing number of AI-based coding assistants. By contrast, data is created through measurement or observation, which requires systems or projects to do that detection. In the case of map data, detection can mean a newly built road, a change in a place of business, or the demolition of an existing building. The community in an open data project needs to gain access to, or develop, these systems for measuring and generating open data. This is often an ongoing need, because the things being measured evolve and change over time.
  2. Accuracy. Open data reflects facts. Therefore, it has to be as accurate as possible. Map data is a digital representation of the physical world. That representation needs to be as true to reality as possible. In open source software development, it’s common to write code that gets refined later to suit specific use cases. This leads to faster sharing: the code gets out in the world faster and gets improved faster. With open data, rigor around accuracy will need to be much higher from the start.
  3. Timeliness. Some types of data measure things that constantly change, like air quality, road conditions, or business reviews. They need to be refreshed every month, week, or even daily. Because of this, open data projects run like a production line. We currently produce a new data release every month and we will push to increase that frequency. With open source software, someone might put code out that doesn’t get used, looked at, or revised for days, months, even years. Developers working on open data will face different time requirements and expectations around how they work.
  4. Cost and size. Open source code typically involves manageable data size. In the simplest use cases, the code can run locally on a laptop. The cost to store and serve code is typically not a factor. Data is different. Big data, such as map data for the world, runs into the terabytes and petabytes. Even in the most basic use cases, that data has to be stored and served to users and maintainers, which can require substantial up-front investments.
  5. Licensing. Data has to come from somewhere. It might be “owned” by someone, such as a private company, or it might combine any number of existing open data repositories. Because the data is derived from different sources, data schemas need to be merged into one coherent system. At times, some data simply cannot be merged with other data without explicit permission from the data owner. Open source code comes with licenses, too, but when we’re writing code, the license of the existing code base is known, and the author is free to build on it.
  6. Privacy protection. Data can include personal information like addresses and phone numbers that belong to real people, and it can include images of the world that contain identifiable elements. Care needs to be taken to ensure that all personally identifiable information is removed, as sketched in the example after this list. This kind of clean-up does not typically occur when writing or reusing open source code.
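To make that last point concrete, here is a minimal sketch of the kind of PII scrub an open data pipeline might run before a release. The field names and regular expressions are simplified assumptions for illustration, not Overture’s actual cleaning rules.

```python
# A minimal, illustrative PII scrub for place records before publication.
# Real pipelines use far more robust detection; these field names and
# regexes are simplified assumptions, not Overture's actual rules.
import re

PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(record: dict) -> dict:
    """Drop fields that are personal by definition and redact phone
    numbers or email addresses embedded in free-text fields."""
    cleaned = {k: v for k, v in record.items()
               if k not in {"owner_name", "contact_phone", "contact_email"}}
    for key, value in cleaned.items():
        if isinstance(value, str):
            value = PHONE.sub("[REDACTED]", value)
            value = EMAIL.sub("[REDACTED]", value)
            cleaned[key] = value
    return cleaned

raw = {
    "name": "Corner Bakery",
    "description": "Call 415-555-0142 or mail owner@example.com to reserve.",
    "contact_phone": "415-555-0142",
}
print(scrub(raw))  # personal fields dropped, embedded contacts redacted
```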

The road ahead

It took decades for open source code to go mainstream. Open data can draw on the lessons of the open source movement, but it will have to develop its own best practices as it matures. As that happens, it will lead to new products and services across industries, governments, and entire economies.

Open data will benefit companies that deliver proprietary services on top of the open data, just as open source has benefited companies that build on open source code. Consumers, governments, and companies alike will see more and better goods and services as a result.

Marc Prioleau is executive director of Overture Maps Foundation.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.