Key to the success of any large organization is effective governance of a vast, distributed landscape of data stores. AI can help.

More than any other factor, the hyperabundance of accessible data has powered today’s surge in AI adoption and generative AI capability. Collecting, cleaning, organizing, and securing that data for AI and machine learning has become a project in itself, a governance endeavor in which AI tools themselves play an important role. The result can be an enormous improvement in data governance that benefits the entire enterprise.

The database remains the foundational repository for data, but the ecosystem of AI-powered data governance tools is fragmented and includes products from startups that may lack staying power or deep database expertise. Over time, a growing number of governance capabilities are likely to be integrated with database software offerings and cloud database services.

Using AI to automate data governance has immediate payoffs. The better an enterprise governs its data, the better its MLOps (machine learning operations) personnel can use that data to build AI-powered applications. More broadly, adding AI to data governance has a positive impact on any organization’s data analytics, regulatory compliance, and data quality efforts.

Here’s how AI is modernizing the processes around governance, and how AI-enhanced tools can help ensure success for both AI/ML applications and data wrangling in general.

Data cataloging

Do you know where your data is? For governance to work, organizations need a complete inventory of all salient data stores and an understanding of what they contain. The task of identifying, accessing, and categorizing enterprise data keeps getting more arduous, thanks to the unruly proliferation of cloud data stores, not to mention the semi-structured logs used to identify operational trends and anomalies. Data cataloging software puts all those repositories on the map.
AI can assist with every phase of cataloging an organization’s data, starting with automated discovery of every data store relevant to the enterprise. The scope of cataloging tools varies, but some use AI to organize access control policies, enable natural language search across an organization’s data fabric, or both. AI-powered cataloging vastly reduces the manual labor associated with classifying data assets and reveals data lineages showing where data originated and how it has changed.

Metadata management

Effective management of metadata (that is, the information that describes your company data) is fundamental to successful governance. AI cataloging tools can identify metadata to properly categorize data assets, but metadata stewardship is also vital to a healthy data estate. Thus a broad swath of offerings, from data integration software to data observability platforms, now provides metadata management capabilities.

AI-infused metadata management tools alleviate the tedium of manual data classification and help reconcile differences in metadata descriptions. In the past, enterprises have behaved as though metadata were relatively static, but today AI tools can continually monitor and collect dynamic metadata on data storage, usage, and flow. Among other benefits, deep metadata around data assets can be used for AI recommendations of optimal storage platforms, or even to suggest potential data integration pipelines.

Data quality

The greatest impact AI has had on data governance has been in data quality, which has six dimensions: accuracy, completeness, consistency, uniqueness, timeliness, and validity. Obviously, data that lacks those qualities can be calamitous for operations. What’s more, data scientists and analysts routinely find themselves up to their necks in cleaning data before they’re able to use it. AI/ML tools can automatically infer missing values, normalize data formats, flag data anomalies, and more.
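To make those three tasks concrete, here is a minimal Python sketch that uses plain statistics as a stand-in for the learned models in commercial tools. It is an illustration under stated assumptions, not how any particular product works: the function names and the list of date formats are hypothetical, the median fills gaps, and a median-based (modified z-score) test flags outliers.

```python
from datetime import datetime
from statistics import median

# Hypothetical date formats found in the source systems (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def impute_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def flag_anomalies(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds threshold.

    The score is based on the median and the median absolute deviation (MAD),
    so a single extreme value does not distort the baseline.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

For example, `flag_anomalies([10, 12, 11, 13, 12, 500])` singles out only the last value, and `normalize_date("03/15/2024")` yields `2024-03-15`. A production tool would learn formats and thresholds from the data rather than hard-code them.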
Humans still need to make judgment calls (are two customers with identical names the same person or different people?), but the overall time savings can be enormous. As AI tools learn from patterns in large quantities of data, their recommendations, correlations, and corrections steadily improve. The resulting baseline can be used to monitor the quality of data in real time.

Data modeling

Structuring a database, or an entire data architecture, starts with collecting and analyzing data requirements and developing the logical and physical models to accommodate them. Several product offerings use AI to enable data architects and engineers to generate visual representations of data models easily.

Today, in many enterprises, data modeling is being turned on its head to serve AI/ML applications. A number of AI data tools offer automated feature engineering, where key data characteristics are derived from data sets in preparation for AI training. In conjunction with AutoML (automated machine learning), this activity in turn supports a different type of model selection: choosing the right ML model to power an application or fuel predictive analytics. Should there be too little data to properly train a model, AI-powered data simulation tools can plumb existing data stores and generate synthetic data that closely resembles the real thing.

Data policy and life cycle management

Every organization needs to establish policies around the handling of its data, informed by federal, state, industry, and international regulations as well as internal business rules. In larger enterprises, a data governance committee sets those policies and specifies how they should be followed in a living document that evolves as regulations and procedures change. The natural language capabilities of generative AI can produce first drafts of that documentation and make subsequent changes much less onerous.
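Once a policy is agreed, it has to be expressed as rules that software can apply. The following Python sketch shows one way a retention rule might be encoded and checked. Everything in it is an assumption made for illustration: the class names, category labels, and retention periods are hypothetical, and the check is a fixed rule rather than an AI model, which in practice would more plausibly help propose or tune such rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    category: str   # hypothetical data classification label
    keep_days: int  # how long records in this category are retained
    action: str     # what to do afterward: "archive" or "delete"


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    last_used: date


def due_actions(records, rules, today):
    """Return (record_id, action) pairs for records past their retention period."""
    by_category = {rule.category: rule for rule in rules}
    actions = []
    for rec in records:
        rule = by_category.get(rec.category)
        if rule is None:
            # No policy covers this category: hand it to a person.
            actions.append((rec.record_id, "review"))
        elif today - rec.last_used > timedelta(days=rule.keep_days):
            actions.append((rec.record_id, rule.action))
    return actions
```

Records with no matching rule are routed to a person rather than silently kept or deleted, which keeps the judgment calls with the governance team.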
By analyzing data usage patterns, regulatory requirements, and internal workflows, AI can help organizations define and enforce data retention policies and automatically identify data that has reached the end of its useful life. AI can even initiate the archiving or deletion process. Along with reducing risk and ensuring compliance, automated data archiving helps free up storage space and reduce storage costs.

Data availability

AI-powered disaster recovery systems can help organizations develop sound recovery strategies by predicting potential failure scenarios and establishing preventive measures to minimize downtime and data loss. Backup systems infused with AI can ensure the integrity of backups and, when disaster strikes, automatically initiate recovery procedures to restore lost or corrupted data.

Storage management systems infused with AI can replicate and distribute data across multiple storage locations to ensure high availability and low latency. At the same time, AI-driven predictive analytics can ingest data from sensors, equipment logs, and historical maintenance records to forecast potential failures or downtime. Nothing beats predictive maintenance for forestalling the loss of data availability in the first place.

Humans still needed

Quite a bit of data governance is low-hanging fruit for AI. Many governance activities, from data discovery to data cleanup to policy management, are full of repetitive manual tasks that AI can handle easily and complete with greater accuracy than humans can. That’s a big win, particularly as MLOps seeks clean, organized data stores upon which AI applications can be built and trained.

Remember, though, that AI is not intelligent in any meaningful sense of the word. Even resolving minor data discrepancies may require context born of broad experience that only humans can acquire and digest. No one would, say, delegate the creation of an enterprise data architecture to a machine.
Yes, AI is already eliminating a big chunk of manual labor from data governance. But it’s not going to do the thinking for you.

Jozef de Vries is chief product engineering officer at EDB.

Generative AI Insights provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.