A cloudops technology stack is easier to define than to design. Here are 6 capabilities to look for.

We’re still defining exactly what cloudops is, as well as clarifying what technology is needed to solve its core problems. As with all cloud computing situations, it’s helpful to break down the core components of a working cloudops solution, such as AIops. It’s also helpful to define what the technology needs to do and the value it brings to the table. To this end, I picked out six capabilities a cloudops tool should offer:

Observe and gather data from any number of systems, as needed to find patterns to further analyze and act on. This has a few components, including the ability to leverage connectors and/or agents to communicate with the system under management, as well as to get the data back to some type of centralized cloudops system in a reliable way.

Correlate massive amounts of system data (noise) in meaningful ways. This includes determining patterns, such as where the data is coming from, and grouping the data before it can be analyzed in some deeper way.

Analyze the patterns to determine problems and root causes. This is really where the AIops or general cloudops tool makes its money. It should be able to find patterns in the gathered and correlated data that indicate current issues, such as a failed networking device, or, more importantly, predict issues that are likely to occur. Proactive cloudops can help avoid a major problem, for example by identifying a cloud storage system that is kicking off I/O errors, which could indicate that a failure is imminent.

Share the observability findings with ops team users, as well as with automated processes that can respond and fix the issues. It’s one thing to indicate that something is wrong; it’s another to make sure the processes and people who can fix the thing are notified.
This is an area where things are improving fast, including automated ticketing systems and self-healing processes.

Respond to the problem by launching an automated fix or a collaboration that leads to a fix. This means the mechanisms are in place to fix the problem. Automation is taking over here, either as part of the cloudops tool or as another orchestration layer that defines how common issues are fixed without humans having to get involved.

Inform reports and dashboards so cloudops users can see both strategic and tactical data on the effectiveness of the systems over time. Dashboards show the health of the systems now and how things are trending, which helps predict future health. Although cloudops teams are often hesitant to make these available teamwide, my advice is to make sure that anyone associated with cloudops or development can see these metrics in real time and thus make good decisions to improve things.

Again, there is no magic that solves cloudops problems. Much of what I’m recommending may not be doable for some enterprises without more than a single AIops or other cloudops technology in place. It depends on the types of systems and clouds you’re running and the number and types of applications and data stores. However, addressing these six concepts is a good start that will likely get you where you need to go.
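To make the correlate, analyze, and respond capabilities described in this article a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Event` fields, the `threshold` and `window` defaults, the system names, and action names such as `failover_volume` and `open_ticket` are hypothetical and not taken from any specific AIops or cloudops product. A real tool would likely use far richer correlation and prediction than a simple count inside a time window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical identifier of the system under management
    kind: str         # hypothetical event type, e.g. "io_error"
    timestamp: float  # seconds since some reference point


def correlate(events):
    """Group raw events (noise) by where they came from and what they are."""
    groups = defaultdict(list)
    for event in events:
        groups[(event.source, event.kind)].append(event)
    return dict(groups)


def analyze(groups, threshold=3, window=60.0):
    """Flag groups with at least `threshold` events inside any `window`-second span."""
    findings = []
    for (source, kind), group in groups.items():
        times = sorted(event.timestamp for event in group)
        start = 0
        for end in range(len(times)):
            # Slide the window start forward until the span fits in `window`.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                findings.append((source, kind))
                break
    return findings


def respond(findings, playbook):
    """Pick an automated action per finding, or fall back to notifying people."""
    return [(source, playbook.get(kind, "open_ticket")) for source, kind in findings]


# Usage: three I/O errors within a minute on one storage system trigger a response.
events = [
    Event("storage-1", "io_error", 0.0),
    Event("storage-1", "io_error", 20.0),
    Event("storage-1", "io_error", 45.0),
    Event("router-7", "link_flap", 10.0),
    Event("storage-1", "io_error", 500.0),
]
findings = analyze(correlate(events))
actions = respond(findings, {"io_error": "failover_volume"})
print(actions)  # [('storage-1', 'failover_volume')]
```

The fallback to `open_ticket` reflects the point made above about sharing findings: when no automated fix is defined for an issue, the people who can fix it still need to be notified.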