Pairing Reinforcement Learning and Online Training in API Security

Author

Impart Security

Published on

April 16, 2024


Many cybersecurity companies are jumping on the machine learning bandwagon, but not all of them use the technology in their solutions the way you might expect. In fact, most machine learning implementations in our industry are nothing more than simple business rules, or in some cases, a team of security researchers masquerading as an algorithm!

We believe it is important to be transparent about our approach to machine learning in API security. This blog post provides a brief overview of the machine learning systems we've developed to keep you and your APIs safe.

The Foundation - Reinforcement Learning

Reinforcement learning is a type of machine learning in which an agent learns to make decisions by interacting with its environment and receiving rewards or penalties for its actions. The agent learns to optimize its behavior to maximize its cumulative reward over time. This approach is often used for sequential decision-making and control problems, such as playing games or controlling robots.
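The "cumulative reward over time" that the agent maximizes is conventionally written as the discounted return, G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ..., where a discount factor γ between 0 and 1 weights near-term rewards more heavily than distant ones. A minimal Python sketch of that sum follows; the function name and reward values are illustrative only, not part of Impart's system.

```python
def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Return the discounted cumulative reward: sum of gamma**k * rewards[k]."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be between 0 and 1")
    total = 0.0
    # Walk backwards so each step applies one discount: G_t = r_{t+1} + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Illustrative rewards for three consecutive decisions
print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # prints 1.5
```

A smaller γ makes the agent more short-sighted; a γ close to 1 makes it care almost as much about later outcomes as about immediate ones.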

What Comprises a Reinforcement Learning System

A reinforcement learning system typically has a few key components:

An agent: The agent is the entity that interacts with the environment and makes decisions based on its observations and previous experiences. At Impart, our agents are multi-mode inspectors that interact with different types of customer environments, ranging from the customer's service mesh to the customer's CI/CD pipeline. These agents are smart: they operate autonomously and make decisions on their own, yet they can also communicate with the rest of our systems.
An environment: The environment is the world in which the agent operates. It provides the agent with observations and rewards or penalties based on its actions. At Impart, this is where the rubber meets the road. Using our inspectors, we interact with the customer's API requests and responses, as well as with code-level artifacts such as YAML files that contain the customer's API specifications. We do this in a way that respects the customer's data privacy: all sensitive data stays within the customer's domain and premises, while metrics and metadata are sent to our other systems for further analysis.
A policy: The policy is the strategy the agent uses to choose its actions based on its observations. At Impart, a single global policy spans all of a customer's inspectors; it classifies events observed in the various environments and takes immediate, real-time action based on those events. The policy is dynamic and is updated on a configurable cadence so that it stays relevant.
A reward function: The reward function evaluates the agent's actions and provides feedback. It assigns a numerical value (reward) to each action the agent takes, indicating how well that action served the goal of the system. At Impart, user milestones are a key input to this reward function: events such as writing rules, blocking specific requests, or deploying new inspectors. Building the reward from these milestones keeps it aligned with the real-world behavior of our users. For example, if a security incident is declared or resolved based on an event that we detected, we know that our detection was accurate and/or actionable, and the reward function is updated accordingly.
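To make the four components concrete, here is a minimal Python sketch of the agent, environment, policy, and reward loop. It is a deliberately simplified, hypothetical model, not Impart's implementation: the agent is an epsilon-greedy contextual bandit (a one-step special case of reinforcement learning with no state transitions), the request categories are invented, and the reward function scores each decision against a known label, standing in for the user-milestone signals described above.

```python
import random
from collections import defaultdict

ACTIONS = ("allow", "block")
# Hypothetical request categories an inspector might report as observations
CONTEXTS = ("benign_read", "sql_injection_pattern")


class BanditAgent:
    """Epsilon-greedy contextual bandit: a one-step special case of RL."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Policy state: running mean reward per (context, action) pair
        self.values: dict[tuple[str, str], float] = defaultdict(float)
        self.counts: dict[tuple[str, str], int] = defaultdict(int)

    def act(self, context: str) -> str:
        """Policy: explore with probability epsilon, else take the best-valued action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda action: self.values[(context, action)])

    def learn(self, context: str, action: str, reward: float) -> None:
        """Incremental mean update: V <- V + (r - V) / n."""
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def reward_fn(context: str, action: str) -> float:
    """Hypothetical reward: +1 for a correct allow/block decision, -1 otherwise.

    A real system would derive rewards from delayed user feedback (rules written,
    incidents declared) rather than from a known ground-truth label.
    """
    malicious = context == "sql_injection_pattern"
    return 1.0 if (action == "block") == malicious else -1.0


def train(steps: int = 500, seed: int = 7) -> BanditAgent:
    env_rng = random.Random(seed)  # hypothetical environment: a random request stream
    agent = BanditAgent(epsilon=0.1, seed=seed)
    for _ in range(steps):
        context = env_rng.choice(CONTEXTS)    # environment emits an observation
        action = agent.act(context)           # policy chooses an action
        reward = reward_fn(context, action)   # reward function scores the action
        agent.learn(context, action, reward)  # agent updates its value estimates
    return agent


agent = train()
agent.epsilon = 0.0  # act greedily once training is done
print(agent.act("sql_injection_pattern"), agent.act("benign_read"))  # prints: block allow
```

The incremental-mean update means the policy is never rebuilt from scratch: each new reward nudges a single value estimate, while the epsilon term keeps a small amount of exploration so the agent can notice when the best action changes.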

Why a Reinforcement Learning System is Ideal for API Security

Reinforcement learning is a good match for API security because several of its properties fit the specific requirements of the domain:

Adaptability: Reinforcement learning is an adaptive approach that lets the model learn and improve over time. This matters in API security because the threat landscape is constantly evolving and new threats emerge all the time. With reinforcement learning, the model can adapt to these new threats and keep improving its performance.
Classification: Reinforcement learning is commonly applied to decision-making and control problems, and API security is built on exactly this kind of decision: whether a request or a user is legitimate. Reinforcement learning can make these decisions by learning from past experience and adapting to new situations.
Real-time: Reinforcement learning can operate in real time, which is important for API security, where threats must be detected and stopped as early as possible, before they cause harm. A reinforcement learning agent makes its decisions as requests arrive, so detection and prevention can happen in real time.

Pairing Reinforcement Learning with Online Training

Most WAFs, API gateways, and API security tools rely on offline training, based on logs or traces that are gathered from network elements asynchronously rather than in real time. In practice, this is usually a huge log dump that a team of security researchers loads into a data lake and queries manually to come up with novel insights and business rules. With offline training, a model is only as good as the data it was trained on, and it can quickly become outdated. This approach is akin to searching for a needle in a haystack during a hurricane.

With online learning, the model can be updated frequently, allowing it to adapt to new threats and improve its performance over time. That's why we decided to implement an Online Training system at Impart Security.
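As a rough illustration of the difference between offline and online training, the sketch below is an online logistic regression classifier updated by one stochastic gradient descent step per labeled event, so each new event adjusts the existing weights instead of triggering a full retrain over a stored dataset. Logistic regression is an illustrative choice here, not a description of Impart's model, and the two features (a scaled request rate and the fraction of 4xx responses) and their labels are hypothetical.

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression trained one event at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Numerically stable logistic function (avoids overflow in math.exp)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict_proba(self, x: list[float]) -> float:
        """Estimated probability that the event is malicious."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return self._sigmoid(z)

    def partial_fit(self, x: list[float], y: int) -> None:
        """One SGD step on log loss for a single labeled event (y is 0 or 1)."""
        error = self.predict_proba(x) - y  # derivative of log loss w.r.t. the logit
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


# Hypothetical features: [scaled request rate, fraction of 4xx responses]
attack, normal = [0.9, 0.8], [0.1, 0.05]
model = OnlineLogisticRegression(n_features=2)
for _ in range(200):  # events arrive one by one; the model updates in place
    model.partial_fit(attack, 1)
    model.partial_fit(normal, 0)

print(round(model.predict_proba(attack), 2), round(model.predict_proba(normal), 2))
```

Because each update touches only the current event, memory and compute per event stay constant no matter how much traffic has already been seen.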

Online machine learning training, also known as online learning, is a type of machine learning in which a model is trained incrementally on data as it is collected and processed in real time, as opposed to offline training, which uses a fixed dataset. This approach has many benefits for application and API security use cases:

Adaptability to changing patterns: One of the biggest benefits of online learning is its ability to adapt to changing patterns. Because the model is updated continuously rather than frozen at training time, it keeps pace with new threats instead of becoming outdated. For example, if a new type of malware is discovered, an online model can be updated to detect it in real time, while an offline model would need to be retrained on new data before it could detect the malware.
Ability to learn from large amounts of data: Online learning processes data as a stream, so the amount it can learn from is not capped by the size of a fixed training set. In application and API security, accurate detection and prevention of cyber attacks is crucial, and an offline model is limited to the data it was trained on, which may not be enough to achieve high accuracy. An online model can keep learning from a much larger body of data, which can improve its accuracy in detecting and preventing cyber attacks. For example, an online model can learn from historical data as well as from live, real-time data.
Real-time threat detection: Online learning also enables real-time threat detection. With offline training, a model can only detect threats it was trained on, and only when it is run against the data. With online learning, the model can detect threats in real time, as the data is collected and processed. This is especially important in application and API security, because it allows a quick response that stops potential threats before they cause harm. For example, an online model can detect a DDoS attack in real time and automatically block the IP addresses of the attacking systems.
Integration with other security tools: Online learning can also be used alongside other security measures, such as intrusion detection systems (IDS) and security information and event management (SIEM) systems. An offline model may miss certain types of threats because it would need a specific dataset to train on. An online model can learn from a variety of sources, including IDS and SIEM systems, which can improve its accuracy in detecting and preventing cyber attacks. For example, an online model can integrate with an IDS to detect and block malicious network traffic.
Cost-effective: Online learning is cost-effective because the model is updated incrementally instead of being retrained from scratch each time, which reduces the computation and time spent on retraining. Online learning can also use less data to reach the same level of performance as offline training, which can save on data storage and processing costs.
Scalability: Online learning is also highly scalable, because it processes large volumes of data as they arrive. This is especially important for high-traffic applications such as web applications and APIs. With offline training, the model would need to be retrained regularly to keep up with the volume of data, which is resource-intensive and time-consuming.
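The DDoS example above can be sketched as a streaming detector that learns what a normal per-IP request count looks like and blocks an address as soon as it exceeds that baseline, without waiting for a batch job. This is a simple statistical baseline (an exponentially weighted mean and variance), swapped in for illustration rather than taken from Impart's system; the window length, smoothing factor, threshold multiplier, floor, and IP addresses are all assumptions. It only catches floods concentrated on individual addresses; an attack spread thinly across many sources would need aggregate signals as well.

```python
import math
from collections import Counter
from typing import Optional


class StreamingRateDetector:
    """Blocks IPs whose per-window request count far exceeds a learned baseline.

    The baseline is an exponentially weighted mean and variance of per-IP counts,
    updated online at the end of each window so it tracks drift in normal traffic.
    """

    def __init__(self, window_s: float = 1.0, alpha: float = 0.1,
                 k: float = 4.0, min_threshold: int = 20) -> None:
        self.window_s = window_s            # hypothetical window length in seconds
        self.alpha = alpha                  # EWMA smoothing factor
        self.k = k                          # standard deviations above mean to flag
        self.min_threshold = min_threshold  # floor so a cold baseline does not over-block
        self.mean = 0.0
        self.var = 0.0
        self.window_start: Optional[float] = None
        self.counts = Counter()             # requests per IP in the current window
        self.blocked = set()                # IPs blocked so far

    def _threshold(self) -> float:
        return max(self.min_threshold, self.mean + self.k * math.sqrt(self.var))

    def _close_window(self) -> None:
        # Fold each non-blocked IP's count into the EWMA baseline
        for ip, count in self.counts.items():
            if ip in self.blocked:
                continue
            delta = count - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.counts.clear()

    def observe(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if it should be blocked."""
        if self.window_start is None:
            self.window_start = timestamp
        elif timestamp - self.window_start >= self.window_s:
            self._close_window()
            self.window_start = timestamp
        if ip in self.blocked:
            return True
        self.counts[ip] += 1
        if self.counts[ip] > self._threshold():
            self.blocked.add(ip)  # block immediately, without waiting for the window to end
            return True
        return False


detector = StreamingRateDetector()
for window in range(20):
    # Hypothetical traffic: 50 clients sending 5 requests each per window
    events = [f"10.0.0.{n}" for n in range(50) for _ in range(5)]
    if window == 19:
        events += ["203.0.113.7"] * 200  # hypothetical flood from a single address
    for i, ip in enumerate(events):
        detector.observe(ip, window + i / 1000)

print(sorted(detector.blocked))  # prints ['203.0.113.7']
```

Blocked addresses are excluded from the baseline update so that an ongoing flood does not gradually teach the detector that flood-level traffic is normal.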

In conclusion, combining reinforcement learning with online machine learning training offers many benefits for application and API security. Want to learn more? Join our beta program to try it for yourself!
