What is Machine Learning?

Overview

Machine learning is a subset of artificial intelligence focused on building systems that can learn from historical data, identify patterns, and make logical decisions with little to no human intervention. It is a data analysis method that automates the building of analytical models, using data that spans diverse forms of digital information, including numbers, words, clicks and images.

Machine learning applications learn from the input data and continuously improve the accuracy of outputs using automated optimization methods. The quality of a machine learning model is dependent on two major aspects:

  1. The quality of the input data. A common phrase around developing machine learning algorithms is “garbage in, garbage out”. The saying means that if you put in low-quality or messy data, the output of your model will be largely inaccurate.
  2. The model choice itself. In machine learning there is a plethora of algorithms that a data scientist can choose from, each with its own specific uses. It is vital to choose the correct algorithm for each use case. Neural networks are an algorithm type surrounded by significant hype because of the high accuracy and versatility they can deliver. However, with small amounts of data, a simpler model will often perform better.

The better the machine learning model, the more accurately it can find features and patterns in data and, in turn, the more precise its decisions and predictions will be.

Why is machine learning important?

Machine learning is growing in importance due to increasingly enormous volumes and variety of data, the accessibility and affordability of computational power, and the availability of high-speed Internet. These digital transformation factors make it possible to rapidly and automatically develop models that can quickly and accurately analyze extraordinarily large and complex data sets.

Machine learning can be applied to a multitude of use cases to cut costs, mitigate risks, and improve overall quality of life, including recommending products and services, detecting cybersecurity breaches, and enabling self-driving cars. With greater access to data and computational power, machine learning is becoming more ubiquitous every day and will soon be integrated into many facets of human life.


How does machine learning work?

There are four key steps you would follow when creating a machine learning model.

  1. Choose and prepare a training data set

    Training data is information that is representative of the data the machine learning application will ingest to tune model parameters. Training data is sometimes labeled, meaning it has been tagged to call out the classifications or expected values the machine learning model is required to predict. Other training data may be unlabeled, so the model will have to extract features and assign clusters autonomously.

    Labeled data should be divided into a training subset and a testing subset. The former is used to train the model and the latter to evaluate the effectiveness of the model and find ways to improve it.

  2. Select an algorithm to apply to the training data set

    The type of machine learning algorithm you choose will primarily depend on a few aspects:

    • Whether the use case is predicting a value or classifying, which uses labeled training data, or clustering or dimensionality reduction, which uses unlabeled training data
    • How much data is in the training set
    • The nature of the problem the model seeks to solve

    For prediction or classification use cases, you would usually use regression algorithms such as ordinary least squares regression or logistic regression. With unlabeled data, you are likely to rely on clustering algorithms such as k-means or nearest neighbor. Some algorithms, like neural networks, can be configured to work with both clustering and prediction use cases.

  3. Train the algorithm to build the model

    Training the algorithm is the process of tuning model variables and parameters to more accurately predict the appropriate results. Training is usually iterative and uses a variety of optimization methods, depending on the chosen model. These optimization methods do not require human intervention, which is part of the power of machine learning: the machine learns from the data you give it with little to no specific direction from the user.

  4. Use and improve the model

    The last step is to feed new data to the model as a means of improving its effectiveness and accuracy over time. Where the new information will come from depends on the nature of the problem to be solved. For instance, a machine learning model for self-driving cars will ingest real-world information on road conditions, objects and traffic laws.
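The four steps above can be sketched end to end in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production recipe: the data set is synthetic (a hypothetical relationship of roughly y = 3x + 2 with a little noise), the model is a straight line, and the names `train`, `test`, `mean_squared_error` and `learning_rate` are chosen here for illustration only.

```python
import random

# Step 1: build a small labeled data set (hypothetical: y is roughly 3x + 2)
# and divide it into a training subset and a testing subset.
rng = random.Random(0)
data = [(x / 10, 3 * (x / 10) + 2 + rng.uniform(-0.1, 0.1)) for x in range(50)]
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


def mean_squared_error(w, b, rows):
    """Average squared gap between predictions and the known outputs."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


# Step 2: select an algorithm. Here: a straight-line model y = w*x + b.
w, b = 0.0, 0.0

# Step 3: train iteratively. Gradient descent nudges w and b to lower the
# training error; nobody adjusts the parameters by hand.
learning_rate = 0.01
for _ in range(5000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 4: use the model on data it has not seen and measure how well it does.
test_error = mean_squared_error(w, b, test)
print(f"w={w:.2f}, b={b:.2f}, test MSE={test_error:.4f}")
```

Holding back the testing subset is what makes the final number meaningful: an error measured on the training rows alone would not show whether the model generalizes to new inputs.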


Machine learning methods

What is supervised machine learning?

Supervised machine learning algorithms use labeled data as training data where the appropriate outputs to input data are known. The machine learning algorithm ingests a set of inputs and corresponding correct outputs. The algorithm compares its own predicted outputs with the correct outputs to calculate model accuracy and then optimizes model parameters to improve accuracy.

Supervised machine learning relies on learned patterns to predict values for unlabeled data. It is most often used for automation, with large volumes of data records, or in cases where there are too many data inputs for humans to process effectively. For example, the algorithm can flag credit card transactions that are likely to be fraudulent or identify the insurance customers most likely to file a claim.
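As a rough sketch of that compare-and-adjust loop, the following plain-Python example trains a small logistic regression classifier on labeled transactions. Everything in it is an assumption made for illustration: the ten transactions, the two features (amount and a foreign-merchant flag), the learning rate, and the function name `predict_probability`. Real fraud models use far more data and features.

```python
import math

# Hypothetical labeled transactions:
# (amount in $1000s, foreign-merchant flag) -> 1 if fraudulent, else 0.
labeled = [
    ((0.02, 0), 0), ((0.05, 0), 0), ((0.10, 0), 0), ((0.08, 1), 0),
    ((0.30, 0), 0), ((2.50, 1), 1), ((3.10, 1), 1), ((1.90, 1), 1),
    ((2.20, 0), 1), ((4.00, 1), 1),
]


def predict_probability(weights, bias, features):
    """Logistic model: squashes a weighted sum into a 0..1 fraud score."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.1
for _ in range(3000):
    for features, label in labeled:
        # Compare the predicted output with the known correct output ...
        error = predict_probability(weights, bias, features) - label
        # ... and adjust the parameters to shrink that error.
        weights = [w - learning_rate * error * f for w, f in zip(weights, features)]
        bias -= learning_rate * error

# Fraction of training rows the model now classifies correctly.
accuracy = sum(
    (predict_probability(weights, bias, f) >= 0.5) == bool(y) for f, y in labeled
) / len(labeled)

# Score two new, unlabeled transactions.
small_domestic = predict_probability(weights, bias, (0.04, 0))
large_foreign = predict_probability(weights, bias, (3.50, 1))
print(accuracy, round(small_domestic, 3), round(large_foreign, 3))
```

The labels do the supervising here: without the known 0 or 1 for each row, there would be no error to compute and nothing to optimize against.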

What is unsupervised machine learning?

Unsupervised machine learning is best applied to data that does not have a structured or objective answer. There is no predetermined correct output for a given input. Instead, the algorithm must interpret the input and form an appropriate decision on its own. The aim is to examine the information and identify structure within it.

Unsupervised machine learning works well on transactional information. For example, the algorithm can identify customer segments that share similar attributes. Customers within these segments can then be targeted by similar marketing campaigns. Popular techniques used in unsupervised learning include nearest-neighbor mapping, self-organizing maps, singular value decomposition and k-means clustering. These algorithms are also used to segment topics, identify outliers and recommend items.
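To make the customer segmentation example concrete, here is a minimal k-means sketch in plain Python. The customer records, the two features, and the choice of three clusters are all hypothetical, and the farthest-point initialization is just one simple deterministic heuristic picked for this illustration; libraries typically offer more robust options.

```python
import math

# Hypothetical unlabeled customers: (visits per month, items per order).
# No segment labels are supplied; the algorithm has to find the groups.
customers = [
    (1, 2), (2, 1), (2, 3), (3, 2),
    (9, 2), (10, 1), (10, 3), (11, 2),
    (5, 10), (6, 9), (6, 11), (7, 10),
]


def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that lies farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = k_means(customers, k=3)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Note that `k=3` is supplied by the person running the code, not discovered by the algorithm, which is why choosing the number of clusters can be subjective.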


What is the difference between supervised and unsupervised machine learning?

| Aspect | Supervised learning | Unsupervised learning |
| --- | --- | --- |
| Process | Input and output variables are provided to train model. | Only input data is provided to train model. No output data is used. |
| Input data | Uses labeled data. | Uses unlabeled data. |
| Algorithms supported | Supports regression algorithms, instance-based algorithms, classification algorithms, neural networks and decision trees. | Supports clustering algorithms, association algorithms and neural networks. |
| Complexity | Simpler. | More complex. |
| Subjectivity | Objective. | Subjective. |
| Number of classes | Number of classes is known. | Number of classes is unknown. |
| Primary drawback | Classifying massive data with supervised learning is difficult. | Choosing number of clusters can be subjective. |
| Primary goal | Train the model to predict output when presented with new inputs. | Find useful insights and hidden patterns. |


What can machine learning do: Machine learning in the real world

Although machine learning has been around for decades, it is the more recent ability to automatically apply complex mathematical calculations to big data that has given it unprecedented sophistication. The range of machine learning applications today is vast, from enterprise AIOps to online retail. Some real-world examples of machine learning capabilities today include the following:

  • Cybersecurity using behavioral analytics to determine suspicious or anomalous events that may indicate insider threats, APTs, or zero-day attacks.
  • Self-driving car projects, such as Waymo (a subsidiary of Alphabet Inc.) and Tesla’s Autopilot, which is a step below actual self-driving cars.
  • Digital assistants like Siri, Alexa and Google Assistant that search the web for information in response to our voice commands.
  • User-tailored recommendations that are driven by machine learning algorithms on websites and apps like Netflix, Amazon and YouTube.
  • Fraud detection and cyber resilience solutions that aggregate data from multiple systems, unearth clients exhibiting high-risk behavior and identify patterns of suspicious activity. These solutions can use supervised and unsupervised machine learning to classify transactions for financial organizations as fraudulent or legitimate. This is why a consumer can get texts from their credit card company verifying if an unusual purchase using the consumer’s financial credentials is legitimate. Machine learning has gotten so advanced in the area of fraud that many credit card companies advertise no-fault to consumers if fraudulent transactions are not caught by the financial organization’s algorithms.
  • Image recognition has had significant advancements and can be reliably used for facial recognition, reading handwriting on deposited checks, traffic monitoring and counting the number of people in a room.
  • Spam filters that detect and block unwanted mail from inboxes.
  • Utilities that analyze sensor data to find ways of improving efficiency and cutting costs.
  • Wearable medical devices that capture in real time valuable data for use in assessing patient health continuously.
  • Taxi apps evaluating traffic conditions in real time and recommending the most efficient route.
  • Sentiment analysis determines the tone of a line of text. Good applications of sentiment analysis are Twitter, customer reviews, and survey responses:
    • Twitter: One way to evaluate brands is to detect the tone of tweets directed toward a person or company. Companies such as Crimson Hexagon and Nuvi provide this in real time.
    • Customer reviews: You can detect the tone of customer reviews to evaluate how your company is doing. This is especially useful if there is no rating system paired with free-text customer reviews.
    • Surveys: Using sentiment analysis on free-text survey responses can give you an at-a-glance evaluation of how your survey respondents feel. Qualtrics has implemented this in its surveys.
  • Market segmentation analysis uses unsupervised machine learning to cluster customers according to buying habits to determine different types or personas of customers. This allows you to better know your most valuable or underserved customers.
  • Document search: It is easy to press Ctrl+F to search a document for exact words and phrases, but searching is difficult if you do not know the exact wording you are looking for. Machine learning techniques such as fuzzy methods and topic modelling make this process much easier by letting you search documents without knowing the exact phrasing.
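The fuzzy search idea in the last item above can be illustrated with Python's standard `difflib` module. This is only a sketch of approximate matching: it uses plain string similarity rather than a trained model, and the sample document, the helper name `fuzzy_find` and the `cutoff` value are assumptions made for the example.

```python
import difflib

# Hypothetical document, split into lowercase words for matching.
document = (
    "Supervised learning uses labeled data while unsupervised learning "
    "discovers clusters in unlabeled data"
)
vocabulary = sorted(set(document.lower().split()))


def fuzzy_find(query, words, cutoff=0.7):
    """Return words that approximately match the query, best match first."""
    return difflib.get_close_matches(query.lower(), words, n=3, cutoff=cutoff)


# An exact search fails on a misspelled query ...
print("unsupervized" in document)
# ... while approximate matching still finds the closest words.
print(fuzzy_find("unsupervized", vocabulary))
print(fuzzy_find("clustrs", vocabulary))
```

Raising `cutoff` toward 1.0 makes the search stricter, and lowering it returns looser matches at the cost of more false hits.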

Machine learning’s role will only continue to grow

As data volumes grow, computing power increases, Internet bandwidth expands and data scientists enhance their expertise, machine learning will only continue to drive greater and deeper efficiency at work and at home.

With the ever-increasing cyber threats that businesses face today, machine learning is needed to secure valuable data and keep hackers out of internal networks. Our premier UEBA SecOps software, ArcSight Intelligence, uses machine learning to detect anomalies that may indicate malicious actions. It has a proven track record of detecting insider threats, zero-day attacks, and even aggressive red team attacks.
