Machine learning is a subset of artificial intelligence focused on building systems that can learn from historical data, identify patterns, and make logical decisions with little to no human intervention. It is a data analysis method that automates the building of analytical models through using data that encompasses diverse forms of digital information including numbers, words, clicks and images.
Machine learning applications learn from the input data and continuously improve the accuracy of outputs using automated optimization methods. The quality of a machine learning model is dependent on two major aspects:
The better the machine learning model, the more accurately it can find features and patterns in data. That, in turn, implies the more precise its decisions and predictions will be.
Unprecedented protection combining machine learning and endpoint security along with world-class threat hunting as a service.
Learn moreWhy use machine learning? Machine learning is growing in importance due to increasingly enormous volumes and variety of data, the access and affordability of computational power, and the availability of high speed Internet. These digital transformation factors make it possible for one to rapidly and automatically develop models that can quickly and accurately analyze extraordinarily large and complex data sets.
There are a multitude of use cases that machine learning can be applied to in order to cut costs, mitigate risks, and improve overall quality of life including recommending products/services, detecting cybersecurity breaches, and enabling self-driving cars. With greater access to data and computation power, machine learning is becoming more ubiquitous every day and will soon be integrated into many facets of human life.
There are four key steps you would follow when creating a machine learning model.
Training data is information that is representative of the data the machine learning application will ingest to tune model parameters. Training data is sometimes labeled, meaning it has been tagged to call out classifications or expected values the machine learning mode is required to predict. Other training data may be unlabeled so the model will have to extract features and assign clusters autonomously.
For labeled, data should be divided into a training subset and a testing subset. The former is used to train the model and the latter to evaluate the effectiveness of the model and find ways to improve it.
The type of machine learning algorithm you choose will primarily depend on a few aspects:
For prediction or classification use cases, you would usually use regression algorithms such as ordinary least square regression or logistic regression. With unlabeled data, you are likely to rely on clustering algorithms such as k-means or nearest neighbor. Some algorithms like neural networks can be configured to work with both clustering and prediction use cases.
Training the algorithm is the process of tuning model variables and parameters to more accurately predict the appropriate results. Training the machine learning algorithm is usually iterative and uses a variety of optimization methods depending upon the chosen model. These optimization methods do not require human intervention which is part of the power of machine learning. The machine learns from the data you give it with little to no specific direction from the user.
The last step is to feed new data to the model as a means of improving its effectiveness and accuracy over time. Where the new information will come from depends on the nature of the problem to be solved. For instance, a machine learning model for self-driving cars will ingest real-world information on road conditions, objects and traffic laws.
What is supervised machine learning
Supervised machine learning algorithms use labeled data as training data where the appropriate outputs to input data are known. The machine learning algorithm ingests a set of inputs and corresponding correct outputs. The algorithm compares its own predicted outputs with the correct outputs to calculate model accuracy and then optimizes model parameters to improve accuracy.
Supervised machine learning relies on patterns to predict values on unlabeled data. It is most often used in automation, over large amounts of data records or in cases where there are too many data inputs for humans to process effectively. For example, the algorithm can pick up credit card transactions that are likely to be fraudulent or identify the insurance customer who will most probably file a claim.
What is unsupervised machine learning
Unsupervised machine learning is best applied to data that do not have structured or objective answer. There is no pre-determination of the correct output for a given input. Instead, the algorithm must understand the input and form the appropriate decision. The aim is to examine the information and identify structure within it.
Unsupervised machine learning works well on transactional information. For example, the algorithm can identify customer segments who possess similar attributes. Customers within these segments can then be targeted by similar marketing campaigns. Popular techniques used in unsupervised learning include nearest-neighbor mapping, self-organizing maps, singular value decomposition and k-means clustering. The algorithms are subsequently used to segment topics, identify outliers and recommend items.
Aspect |
Supervised learning |
Unsupervised learning |
Process |
Input and output variables are provided to train model. |
Only input data is provided to train model. No output data is used. |
Input data |
Uses labeled data. |
Uses unlabeled data. |
Algorithms supported |
Supports regression algorithms, instance-based algorithms, classification algorithms, neural networks and decision trees. |
Supports clustering algorithms, association algorithms and neural networks. |
Complexity |
Simpler. |
More complex. |
Subjectivity |
Objective. |
Subjective. |
Number of classes |
Number of classes is known. |
Number of classes is unknown. |
Primary drawback |
Classifying massive data with supervised learning is difficult. |
Choosing number of clusters can be subjective. |
Primary goal |
Train the model to predict output when presented with new inputs. |
Find useful insights and hidden patterns. |
Whereas machine learning functionality has been around for decades, it is the more recent ability to apply and automatically compute complex mathematical calculations involving big data that has given it unprecedented sophistication. The realm of machine learning application today is vast ranging from enterprise AIOps to online retail. Some real world examples of machine learning capabilities today include the following:
As data volumes grow, computing power increases, Internet bandwidth expands and data scientists enhance their expertise, machine learning will only continue to drive greater and deeper efficiency at work and at home.
With the ever increasing cyber threats that businesses face today, machine learning is needed to secure valuable data and keep hackers out of internal networks. Our premier UEBA SecOps software, ArcSight Intelligence, uses machine learning to detect anomalies that may indicate malicious actions. It has a proven track record of detecting insider threats, zero-day attacks, and even aggressive red team attacks.
Proactively detect insider risks, novel attacks, and advanced persistent threats
Accelerate threat detection and response with real-time detection and native SOAR
AI-powered test automation
Interset augments human intelligence with machine intelligence to strengthen your cyber resilience
Simplify log management and compliance while accelerating forensic investigation