What is Machine Learning?

Overview

Machine learning is a subset of artificial intelligence focused on building systems that can learn from historical data, identify patterns, and make logical decisions with little to no human intervention. It is a data analysis method that automates the building of analytical models using data in diverse digital forms, including numbers, words, clicks, and images.

Machine learning applications learn from the input data and continuously improve the accuracy of outputs using automated optimization methods. The quality of a machine learning model is dependent on two major aspects:

- The quality of the input data. A common phrase in machine learning development is “garbage in, garbage out”: if you put in low-quality or messy data, the output of your model will be largely inaccurate.
- The choice of model. A data scientist can choose from a plethora of machine learning algorithms, each with its own specific uses, so it is vital to choose the correct algorithm for each use case. Neural networks are an algorithm type that attracts significant hype because of the high accuracy and versatility they can deliver. However, with small amounts of data, a simpler model will often perform better.

The better the machine learning model, the more accurately it can find features and patterns in data and, in turn, the more precise its decisions and predictions will be.

Machine learning

Why is machine learning important?

Why use machine learning? Machine learning is growing in importance because of the increasingly enormous volume and variety of data, the accessibility and affordability of computational power, and the availability of high-speed Internet. These digital transformation factors make it possible to rapidly and automatically develop models that can quickly and accurately analyze extraordinarily large and complex data sets.

Machine learning can be applied to a multitude of use cases to cut costs, mitigate risks, and improve overall quality of life, including recommending products and services, detecting cybersecurity breaches, and enabling self-driving cars. With greater access to data and computational power, machine learning is becoming more ubiquitous every day and will soon be integrated into many facets of human life.

How does machine learning work?

There are four key steps you would follow when creating a machine learning model.

Choose and prepare a training data set
Training data is information representative of the data the machine learning application will ingest, and it is used to tune model parameters. Training data is sometimes labeled, meaning it has been tagged to call out the classifications or expected values the machine learning model is required to predict. Other training data may be unlabeled, so the model will have to extract features and assign clusters autonomously.

Labeled data should be divided into a training subset and a testing subset. The former is used to train the model and the latter to evaluate the effectiveness of the model and find ways to improve it.
Select an algorithm to apply to the training data set
The type of machine learning algorithm you choose will primarily depend on a few aspects:
- Whether the use case is value prediction or classification, which uses labeled training data, or clustering or dimensionality reduction, which uses unlabeled training data
- How much data is in the training set
- The nature of the problem the model seeks to solve
For prediction or classification use cases, you would usually use regression algorithms such as ordinary least squares regression (to predict a value) or logistic regression (to classify). With unlabeled data, you are likely to rely on clustering algorithms such as k-means or on nearest-neighbor methods. Some algorithms, like neural networks, can be configured to work with both clustering and prediction use cases.
Train the algorithm to build the model
Training the algorithm is the process of tuning model variables and parameters to more accurately predict the appropriate results. Training is usually iterative and uses a variety of optimization methods, depending on the chosen model. These optimization methods do not require human intervention, which is part of the power of machine learning: the machine learns from the data you give it with little to no specific direction from the user.
Use and improve the model
The last step is to feed new data to the model as a means of improving its effectiveness and accuracy over time. Where the new information will come from depends on the nature of the problem to be solved. For instance, a machine learning model for self-driving cars will ingest real-world information on road conditions, objects and traffic laws.
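The four steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production workflow: it uses only the Python standard library, the data set is synthetic (hypothetical points that roughly follow y = 3x + 2), and the helper names `train_test_split`, `fit_ols`, and `mean_squared_error` are chosen for this example. The algorithm is ordinary least squares regression with a single input.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows, then split them into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_ols(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, rows):
    """Average squared difference between predictions and known outputs."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)


# Step 1: prepare labeled data (synthetic here) and split it.
noise = random.Random(42)
data = [(float(x), 3.0 * x + 2.0 + noise.gauss(0, 0.5)) for x in range(40)]
train_rows, test_rows = train_test_split(data)

# Steps 2 and 3: choose an algorithm (OLS) and train it on the training subset.
model = fit_ols(train_rows)

# Evaluate on the held-out testing subset, which the model has never seen.
test_error = mean_squared_error(model, test_rows)

# Step 4: use the trained model on a new input.
prediction = model[0] * 50.0 + model[1]
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test MSE={test_error:.3f}")
```

A low error on the testing subset suggests the model generalizes beyond the rows it was trained on. A large gap between training and testing error would be a signal to revisit the data or the choice of algorithm.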

Machine learning methods

What is supervised machine learning?

Supervised machine learning algorithms use labeled data as training data where the appropriate outputs to input data are known. The machine learning algorithm ingests a set of inputs and corresponding correct outputs. The algorithm compares its own predicted outputs with the correct outputs to calculate model accuracy and then optimizes model parameters to improve accuracy.

Supervised machine learning uses the patterns it has learned to predict values for new, unlabeled data. It is most often used for automation, for large volumes of data records, or in cases where there are too many data inputs for humans to process effectively. For example, the algorithm can pick out credit card transactions that are likely to be fraudulent or identify the insurance customers most likely to file a claim.
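The compare-and-adjust loop of supervised learning can be illustrated with a small sketch. It assumes a single input feature and uses logistic regression trained by batch gradient descent, written with the standard library only. The transaction amounts, the labels, and the function names `train_logistic` and `predict` are hypothetical, and real fraud models use far more features and data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, learning_rate=0.1, epochs=5000):
    """Fit a weight and bias for one feature with batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(samples, labels):
            # Compare the predicted output with the known correct output.
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        # Adjust the parameters in the direction that reduces the error.
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict(model, x):
    """Return 1 (flag as fraudulent) when the estimated probability is at least 0.5."""
    weight, bias = model
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0


# Hypothetical labeled data: amount in hundreds of dollars, 1 = fraudulent.
amounts = [0.2, 0.5, 0.8, 1.0, 1.5, 6.0, 7.5, 8.0, 9.0, 9.5]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = train_logistic(amounts, labels)
accuracy = sum(predict(model, x) == y for x, y in zip(amounts, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Once trained, `predict(model, amount)` can be called on transactions that carry no label, which is the prediction step described above.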

What is unsupervised machine learning?

Unsupervised machine learning is best applied to data that does not have a structured or objective answer. There is no predetermined correct output for a given input. Instead, the algorithm must interpret the input and form the appropriate decision on its own. The aim is to examine the information and identify structure within it.

Unsupervised machine learning works well on transactional information. For example, the algorithm can identify customer segments that share similar attributes. Customers within these segments can then be targeted by similar marketing campaigns. Popular techniques used in unsupervised learning include nearest-neighbor mapping, self-organizing maps, singular value decomposition and k-means clustering. These algorithms are also used to segment topics, identify outliers and recommend items.
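Customer segmentation with k-means clustering can be sketched as follows. This is a minimal version of Lloyd's algorithm for two-dimensional points, using the standard library only. The customer figures (annual spend in thousands, store visits per month), the function name `k_means`, and the choice of two segments are hypothetical. No labels are supplied; the algorithm groups customers purely by similarity.

```python
import random


def k_means(points, k, iterations=20, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels


# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [
    (1.0, 2.0), (1.5, 1.0), (2.0, 2.5), (1.2, 1.8),   # occasional shoppers
    (8.0, 9.0), (9.0, 8.5), (8.5, 10.0), (9.5, 9.0),  # frequent shoppers
]
centroids, labels = k_means(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

The segment numbers themselves are arbitrary. What matters is which customers end up grouped together, and choosing `k` remains a judgment call, as the comparison table below notes.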

What is the difference between supervised and unsupervised machine learning?

Aspect	Supervised learning	Unsupervised learning
Process	Input and output variables are provided to train model.	Only input data is provided to train model. No output data is used.
Input data	Uses labeled data.	Uses unlabeled data.
Algorithms supported	Supports regression algorithms, instance-based algorithms, classification algorithms, neural networks and decision trees.	Supports clustering algorithms, association algorithms and neural networks.
Complexity	Simpler.	More complex.
Subjectivity	Objective.	Subjective.
Number of classes	Number of classes is known.	Number of classes is unknown.
Primary drawback	Classifying massive data with supervised learning is difficult.	Choosing number of clusters can be subjective.
Primary goal	Train the model to predict output when presented with new inputs.	Find useful insights and hidden patterns.

What can machine learning do: Machine learning in the real world

Although machine learning has been around for decades, it is the more recent ability to automatically apply complex mathematical calculations to big data that has given it unprecedented sophistication. The realm of machine learning applications today is vast, ranging from enterprise AIOps to online retail. Some real-world examples of machine learning capabilities today include the following:

- Cybersecurity tools that use behavioral analytics to detect suspicious or anomalous events that may indicate insider threats, APTs, or zero-day attacks.
- Self-driving car projects, such as Waymo (a subsidiary of Alphabet Inc.), and driver-assistance systems such as Tesla’s Autopilot, which is a step below fully self-driving cars.
- Digital assistants like Siri, Alexa and Google Assistant that search the web for information in response to our voice commands.
- User-tailored recommendations that are driven by machine learning algorithms on websites and apps like Netflix, Amazon and YouTube.
- Fraud detection and cyber resilience solutions that aggregate data from multiple systems, unearth clients exhibiting high-risk behavior and identify patterns of suspicious activity. These solutions can use supervised and unsupervised machine learning to classify transactions for financial organizations as fraudulent or legitimate. This is why a consumer can get a text from their credit card company asking whether an unusual purchase made with the consumer’s financial credentials is legitimate. Machine learning has become so advanced in the area of fraud that many credit card companies advertise that consumers will not be held at fault if fraudulent transactions are not caught by the financial organization’s algorithms.
- Image recognition, which has advanced significantly and can be reliably used for facial recognition, reading handwriting on deposited checks, traffic monitoring and counting the number of people in a room.
- Spam filters that detect and block unwanted mail from inboxes.
- Utilities that analyze sensor data to find ways of improving efficiency and cutting costs.
- Wearable medical devices that capture valuable data in real time for continuous assessment of patient health.
- Taxi apps that evaluate traffic conditions in real time and recommend the most efficient route.
- Sentiment analysis, which determines the tone of a line of text. Good sources for sentiment analysis include Twitter, customer reviews, and survey responses:
  - Twitter: One way to evaluate brands is to detect the tone of tweets directed toward a person or company. Companies such as Crimson Hexagon and Nuvi provide this in real time.
  - Customer reviews: You can detect the tone of customer reviews to evaluate how your company is doing. This is especially useful if there is no rating system paired with free-text customer reviews.
  - Surveys: Using sentiment analysis on free-text survey responses can give you an at-a-glance evaluation of how your survey respondents feel. Qualtrics has implemented this in its surveys.
- Market segmentation analysis, which uses unsupervised machine learning to cluster customers according to buying habits and so determine different types or personas of customers. This allows you to better know your most valuable or underserved customers.
- Document search: It is easy to press Ctrl+F to search a document for exact words and phrases, but searching is difficult if you do not know the exact wording you are looking for. Techniques such as fuzzy methods and topic modelling can make this process much easier by allowing you to search documents without knowing the exact phrasing.
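The contrast between exact and fuzzy search in the last example can be shown with a short sketch. It uses `difflib.get_close_matches` from the Python standard library, which ranks candidates by string similarity. This is a simple similarity heuristic rather than a trained model, and the sample sentence, the function name `fuzzy_find`, and the 0.75 cutoff are assumptions made for illustration.

```python
import difflib


def fuzzy_find(query, document, cutoff=0.75):
    """Return words in the document that approximately match the query."""
    words = sorted({w.strip(".,;:!?").lower() for w in document.split()})
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)


text = "Topic modelling and clustering help analysts organize documents."

# An exact search fails because the document uses the British spelling.
exact_hit = "modeling" in text.lower()

# A fuzzy search still finds the closely spelled word.
matches = fuzzy_find("modeling", text)
print(exact_hit, matches)
```

Lowering `cutoff` returns looser matches at the cost of more false positives, so the value would need tuning for a real document collection.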

Machine learning’s role will only continue to grow

As data volumes grow, computing power increases, Internet bandwidth expands and data scientists enhance their expertise, machine learning will only continue to drive greater and deeper efficiency at work and at home.

With the ever-increasing cyber threats that businesses face today, machine learning is needed to secure valuable data and keep hackers out of internal networks. Our premier UEBA SecOps software, ArcSight Intelligence, uses machine learning to detect anomalies that may indicate malicious actions. It has a proven track record of detecting insider threats, zero-day attacks, and even aggressive red team attacks.
