Databricks’ Unity Catalog, now open sourced under the Apache 2.0 license, competes with Snowflake’s Polaris Catalog, which will soon be open source.

Just days after rival data lakehouse provider Snowflake said that it would open up the source code to its Polaris Catalog, Databricks is open sourcing its Unity Catalog offering.

Databricks’ Unity Catalog, which was made generally available in June 2022 and later updated with Okera’s capabilities, was previously a closed-source unified governance offering that provided centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.

When Snowflake released Polaris Catalog at its annual conference earlier this month, it said it would open source it within three months. Polaris Catalog offers similar capabilities to Unity Catalog, but is built atop the popular open source Apache Iceberg data table format.

“It is difficult to look at the Unity Catalog announcement without thinking about the consistent contest that exists between Databricks and Snowflake for enterprise attention,” said Hyoun Park, chief analyst at Amalgam Insights.

“By open sourcing Unity before Polaris, Databricks wants to position as being the first to open source its data catalog,” Park added.

Now Databricks says it has open-sourced Unity Catalog under the Apache 2.0 license and opened up all its APIs as well. The Apache 2.0 license, introduced by the Apache Software Foundation in 2004, is a permissive software license that allows users to use, modify, and distribute code free of charge.

Now that it is open source, the catalog will provide users with a universal interface that supports data in any format and any compute environment, including the ability to read tables with Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm, the company said. The open-source version also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards, it added.
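To make the Iceberg REST Catalog support more concrete, here is a minimal sketch of how a client might build request URLs for a catalog that follows that specification. The route shapes and the 0x1F namespace separator follow the Iceberg REST Catalog OpenAPI specification as we understand it. The base URL and the helper function names are hypothetical and are not taken from Databricks’ announcement; a real deployment documents its own endpoint, prefix, and authentication.

```python
from urllib.parse import quote

# Hypothetical base URL for illustration only; not a real Unity Catalog endpoint.
BASE_URL = "https://catalog.example.com/iceberg"


def encode_namespace(levels):
    """Join namespace levels with the 0x1F unit separator, then URL-encode."""
    return quote("\x1f".join(levels), safe="")


def table_list_url(base_url, levels, prefix=None):
    """Build the URL a client would GET to list tables in a namespace."""
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # Optional catalog prefix, typically advertised by the server's config route.
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encode_namespace(levels), "tables"]
    return "/".join(parts)


def load_table_url(base_url, levels, table, prefix=None):
    """Build the URL a client would GET to load one table's metadata."""
    return table_list_url(base_url, levels, prefix) + "/" + quote(table, safe="")


if __name__ == "__main__":
    print(table_list_url(BASE_URL, ["sales"]))
    print(load_table_url(BASE_URL, ["sales", "daily"], "orders", prefix="main"))
```

The sketch only constructs URLs and sends no requests, so it leaves out authentication and response handling, which vary by deployment.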
Additionally, Unity Catalog will continue to provide unified governance across AI assets, such as machine learning (ML) models and generative AI tools.

The move to open up Unity Catalog’s APIs, according to IDC’s research vice president Stewart Bond, provides open access to intelligence about data held within the Databricks environment.

“This is significant as it provides opportunities for an enterprise to include intelligence about data on Databricks to be integrated into and shared with catalogs that maintain intelligence about data stored elsewhere,” Bond said. He added that the move supports the unification of data intelligence, so that data consumers, engineers, and executives do not need multiple tools to discover, manage, and govern all data in a given enterprise.

This approach of supporting data unification, according to Steven Dickens, The Futurum Group’s practice lead for hybrid cloud, eliminates vendor lock-in, allowing businesses to choose the best tools and platforms for their needs while ensuring consistent governance and security across their data estate.

A race to be seen as more open source

The open sourcing of Unity Catalog, coming on the heels of Snowflake’s decision to open source Polaris Catalog within three months, is viewed by analysts as a race to be seen as more open source and to grab data catalog users.

Futurum’s Dickens said Databricks’ move to open source Unity Catalog represents a significant challenge for rivals such as Snowflake, Teradata, and Dremio.

“The emphasis on interoperability and open-source commitment ensures that Databricks can cater to a wider range of customer needs, reducing the friction associated with data format compatibility,” he said.

“Teradata and Dremio, while strong in their respective niches, have not demonstrated the same level of integration and comprehensive tooling for data and AI governance,” Dickens added.
However, IDC’s Bond pointed out that the success of the now open sourced Unity Catalog will depend on how much metadata about data stored in competitive platforms is being made available to external processes. “Unity is still a very technical catalog. Making it open source may accelerate innovations in business-level user experiences and make Unity more competitive,” Bond said.