The Most Important AI-Powered Tools Categories and Use-Cases for DevOps and Cloud Engineers

AI is the hottest topic right now, and AI tools are being developed for all areas, including different IT fields. Today I want to discuss AI tools for DevOps and Cloud use cases. To make it more interesting and valuable for you, I will focus on categories of AI tools, like monitoring, security improvement, etc., instead of just comparing random or the most popular AI tools. I will also share my honest experience of using these tools in actual DevOps projects and labs, and explain why these AI tools can't replace human expertise yet but instead augment humans by increasing productivity and saving valuable time.

DevOps is all about automation and efficiency, right? Whether we want to release a new feature fast, secure our infrastructure, or monitor our platform and prevent issues in time, as part of DevOps principles, all of this should be and can be automated. Now imagine a scenario where AI takes over the tedious, repetitive tasks, helps automate things, and helps engineers make data-based decisions. That means super-fast code validations, proactive support when issues happen on your Cloud infrastructure or Kubernetes cluster, automatic detection of security threats, and super-quick responses to incidents, all driven by the power of AI. If this sounds too good to be true, it's because it is. Most AI tools right now aren't mature enough to be used in 100% full automation. At the moment, you need to use them like any other tool: they can't do everything automatically for you, and many of them still need a lot of human review and validation. But that doesn't mean they're useless. There are still some really good use cases for AI tools right now.

The obvious one we all know is ChatGPT, which I am not going to talk about because it's much more general purpose; it's not specific to DevOps and Cloud Engineers. Even though you may be getting by with ChatGPT for now, these kinds of tools have many limitations when it comes to DevOps and Cloud specifically.

When we think about DevOps, with all its automation, and Cloud platforms with their many services, it's always about integrating various tools to build automated processes: automating monitoring in your cluster and automatically notifying engineering teams of any abnormal behavior, detecting security issues, or automating the provisioning of infrastructure. With that in mind, let's go through the most important categories of AI tools.

1. AI Code Assistants

The main use case for such tools is writing infrastructure as code or configuration as code. Popular examples include GitHub Copilot and Amazon CodeWhisperer, but there are lots of very similar tools that basically do the same thing. They are your code assistant within the code editor or IDE. They help you with code suggestions and code completions: they predict what you want to code based on the current context and suggest that code for you. Or you can ask specifically for a code piece or an entire code block by simply typing the logic in English, and the assistant will give you the respective code for that functionality. You can even ask it to refactor a code block or clean up the code, and find code duplications or dirty code.

Another interesting use case is asking for an explanation of application code or for improvement suggestions. For example, if you're a junior engineer trying to figure out what the Terraform code in your new project does, you can use an AI assistant to understand the code. You can use it to learn and become better at the infrastructure as code tool that your new team is working with.

The limitation of such tools, based on my experience using various AI code assistants in DevOps projects, is that the code they generate is usually not entirely usable: you still need to validate or fix the output before you can use it, and sometimes you have to double-check it against the official documentation. Code assistants are most useful when you can use them directly in your code editor while writing the code, so you don't have to switch between your code editor and internet browser. Examples of code editors and IDEs with AI integration include Visual Studio Code, IntelliJ, and Cursor.
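As a simple illustration, here is the kind of Terraform an assistant might generate from the English prompt "create an S3 bucket with versioning enabled". The resource and bucket names are placeholders for this example, and, as noted above, you should still validate the output against the provider documentation:

```hcl
# Hypothetical assistant output for: "create an S3 bucket with versioning enabled".
# Bucket and resource names are placeholders; validate against the AWS provider docs.
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts-bucket"
}

resource "aws_s3_bucket_versioning" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id

  versioning_configuration {
    status = "Enabled"
  }
}
```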

2. AI-Powered Monitoring and Observability

Monitoring in DevOps and Cloud is a complex task that absolutely needs to be automated. When you have very complex systems with thousands of servers and tens of thousands of components on those servers, you can't do observability manually. You must automate and alert proactively when the services and applications in your system behave abnormally; otherwise, you would be sitting in front of a dashboard or manually checking that all your services are running properly. But configuring automated monitoring and alerting is a challenging task, and this is where AI-powered monitoring tools become useful. One popular tool in this category is Datadog Watchdog, which has a built-in intelligence layer that continuously analyzes billions of data points, from your infrastructure to your applications to the underlying platform, to find issues proactively and help you resolve them quickly.

Datadog also has a root cause analysis feature that digs into an issue to identify what caused it. By analyzing data about how your services are connected to each other, how they correlate, and how they behave, it can pinpoint exactly where the issue is coming from. This saves you the time it would take to manually troubleshoot and identify the root cause of the issue.

Another very important feature of AI-powered monitoring tools is predictive analytics. This basically means that by analyzing previous data on all the abnormal behavior that has happened and identifying trends, the tool can tell you about potential issues that may happen in the future. That is the real power of AI in monitoring tools: it can analyze large sets of historical data, analyze the data for correlations between systems, services, and applications, find issues, and predict potential issues before they happen.
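To illustrate the basic idea in a very simplified form (this is not how Watchdog works internally), here is a small sketch that flags metric samples deviating strongly from their recent history:

```python
# Minimal sketch of threshold-based anomaly detection on a metric series.
# Real AI monitoring tools use far more sophisticated models than this.
from statistics import mean, stdev

def detect_anomalies(samples: list[float], window: int = 20, sigmas: float = 3.0) -> list[int]:
    """Return indices of samples deviating more than `sigmas` standard
    deviations from the trailing window's mean."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sd = mean(history), stdev(history)
        if sd > 0 and abs(samples[i] - mu) > sigmas * sd:
            anomalies.append(i)
    return anomalies

# Example: steady latency around 100 ms with one spike at the end.
latency_ms = [100.0 + (i % 5) for i in range(30)] + [400.0]
print(detect_anomalies(latency_ms))  # -> [30]
```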

3. AI-Assisted CI/CD Pipelines

This is where a lot of DevOps automation and optimization work is done. One of the CI/CD tools that really focuses on saving time for developers and optimizing the release workflow is TeamCity Pipelines from JetBrains. TeamCity Pipelines focuses on developer productivity by providing self-tuning pipelines with built-in pipeline optimization options. That is, while you're building and configuring the pipeline in the tool, the platform itself guides you with built-in, intelligent suggestions for optimizing your pipeline. The biggest advantage is that you don't have to switch back and forth between documentation and the tool to configure various details of the pipeline, like adding caching or running jobs in parallel to speed up pipeline execution. And if you prefer a scripted configuration over using the UI, you can save the pipeline, and the tool will automatically store the pipeline's YAML configuration in your Git repository.
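To make those two optimizations concrete, here is a generic pipeline definition with dependency caching and two jobs that run in parallel after the build. The schema below is purely illustrative and is not TeamCity Pipelines' actual YAML format; consult the product documentation for the real syntax:

```yaml
# Illustrative pipeline only; this is NOT TeamCity Pipelines' actual schema.
# It shows the two optimizations discussed above: dependency caching and
# running independent jobs in parallel.
jobs:
  build:
    steps:
      - restore-cache: node_modules   # reuse dependencies from previous runs
      - run: npm ci
      - run: npm run build
      - save-cache: node_modules
  unit-tests:
    needs: [build]    # unit-tests and lint both depend only on build,
    steps:            # so they can run in parallel
      - run: npm test
  lint:
    needs: [build]
    steps:
      - run: npm run lint
```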

4. AI-Powered Security Tools

Of special interest are the tools that can prevent security issues before they happen by doing security checks and discovering vulnerabilities based on statistical data or abnormal behavior detected in the systems. Some security AI tools even allow operations teams to configure auto-fixes when security issues or misconfigurations are detected; that is, the tool itself will find and fix the security issue before it can get exploited.

One popular tool in this category is Snyk. Like Datadog, it uses machine learning and advanced analytics to give you proactive monitoring of large, complex systems, especially in containerized environments. Imagine you have thousands of containers on different systems, running different services and applications. How hard would it be to make sure that each container and each service within it follows security best practices, is not misconfigured, and does not have any security vulnerabilities that can be exploited? It would be hard to keep track and have an overview of the security status of all of this, and that's where AI-powered tools that can analyze large sets of data fast are useful. That's what Snyk does: it automatically goes through the entire environment, detects and analyzes, and highlights any potential security threats in your systems. Snyk also has a visualization tool and can identify the root cause of issues and suggest fixes. This is very helpful and saves engineers a ton of time in troubleshooting.
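For a sense of how this looks in practice, here are a few common Snyk CLI invocations; this is a sketch, and the exact subcommands and flags may vary by CLI version, so check snyk --help:

```bash
# Common Snyk CLI checks (sketch; subcommands may vary by CLI version).
snyk test                       # scan the current project's open-source dependencies
snyk container test nginx:1.25  # scan a container image for known vulnerabilities
snyk iac test ./terraform       # scan infrastructure-as-code files for misconfigurations
snyk monitor                    # snapshot the project for continuous monitoring in the Snyk UI
```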

5. Cloud Cost Optimization

Optimizing infrastructure costs is a challenging task, especially as your applications and deployments scale, when you have lots of services and applications running in a cloud environment or across multiple clouds. AI-powered tools like CloudHealth, Usage AI, or Cast AI give you an overview of how efficiently you are using your infrastructure resources, as well as recommendations for optimizing usage and saving costs: for example, what type or size of instances you should be running, and which of your running instances are underutilized or not used at all. These tools use predictive analytics to tell you how much infrastructure, or how many instances, you will actually need; this, of course, works only for workloads that are reasonably predictable. Based on that, they can also recommend scaling your environment according to usage patterns, helping you avoid unnecessary costs for resources you don't need or won't need as much in the future. It's important for these tools to support multi-cloud, because many complex applications today run on multi-cloud environments, and having an overview of your workloads across cloud platforms is an important use case for AI in DevOps and Cloud.
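As a minimal sketch of the kind of utilization analysis these tools automate at scale, the snippet below flags instances whose average CPU usage falls under a threshold. The instance names and threshold are made up for illustration:

```python
# Minimal sketch of an underutilization check; real cost-optimization tools
# analyze far more signals (memory, network, usage trends) across clouds.
# Instance names and the threshold below are hypothetical.
avg_cpu_percent = {
    "web-1": 62.0,
    "web-2": 4.5,    # candidate for a smaller instance type
    "batch-1": 1.2,  # effectively idle
}

UNDERUTILIZED_THRESHOLD = 10.0  # average CPU percent over the billing period

for instance, cpu in avg_cpu_percent.items():
    if cpu < UNDERUTILIZED_THRESHOLD:
        print(f"{instance}: avg CPU {cpu}% -- consider downsizing or shutting it down")
```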

These are the top 5 applications of AI technologies in Cloud and DevOps. And as I mentioned at the beginning, AI tools right now are not mature enough to fulfill the ambitious promise of 100% full automation and replacing human expertise, even though they have tremendous capabilities and can process and analyze massive sets of data and do predictive analytics. Right now, they are still just another set of tools that you, as a DevOps or Cloud Engineer, need to have in your tool set. These AI tools augment humans: they can massively increase our productivity and save us a lot of time so we can focus on more creative and strategic tasks. By automating routine and repetitive work, we can dedicate our efforts to areas that require human insight, innovation, and decision-making. This shift allows us to pursue more fulfilling and impactful work, ultimately leading to personal and professional growth.

I hope this overview of AI technologies for DevOps and Cloud use cases proved helpful. I would love to hear what other AI tools you are using in your DevOps or Cloud projects and what your experiences with them have been, so please share them in the comments section of this article or connect with me on LinkedIn. Cheers!

About the Author

Joshua Makuru Nomwesigwa is a seasoned Telecommunications Engineer with vast experience in IP technologies; he eats, drinks, and dreams IP packets. He is a passionate evangelist of the fourth industrial revolution (4IR), a.k.a. Industry 4.0, and all the technologies it brings: 5G, Cloud Computing, Big Data, Artificial Intelligence (AI), Machine Learning (ML), Internet of Things (IoT), Quantum Computing, etc. Basically, anything techie, because a normal life is boring.
