Insider Threat detection using Isolation Forest

Scherman, Maja LU and Bülow, Joakim (2018) EITM01 20162
Department of Electrical and Information Technology
Abstract
Companies need real-time information about insider threats, but privacy and
integrity concerns limit what surveillance individuals consider acceptable.
This creates a problem, since performing online surveillance would infringe
on the employees' privacy and integrity.
We therefore present a model that uses Isolation Forest to address this problem.
We focus on analyzing non-intrusive features in a real-time, event-based
approach. We encode our features as periodic features, which we have
statistically proven to be more effective than ordinal features when used with
Isolation Forest.
Our results show that by analyzing employees' login and logout times, we
can detect 76% of all insider threats while falsely classifying only 7% of all
normal instances. The recall rate, which shows how complete the results are, is
76%.
Popular Abstract
Digitalization has brought great opportunities for economic growth, and for many years
there has been a global trend for companies to store more and more of their assets and
products in digital form. But digitalization has also brought new types of risks and
vulnerabilities, and to stay secure companies need to invest in countermeasures. Companies
tend to put great resources into securing their digital perimeters and their exposure to
the Internet in order to prevent cyber-crime and digital theft. What is commonly ignored
is the possibility of threats from within the company perimeter, so-called insider threats.
What if we could detect and prevent insider threats before they ever occurred? For example,
if an employee feels let down by the company and decides to sell information to the highest
bidder, this might be preceded by certain actions that could be detected. In this thesis we
have implemented a model for detecting insider threats, adapted for companies that want to
reduce the risk of such threats.
Detection of insider threats can be done with the help of machine learning. Machine
learning is when a computer learns from input data without being explicitly programmed.
In our case we use a machine learning algorithm called Isolation Forest, which is
specialized in detecting anomalies. Machine learning typically needs a large amount of
data to perform well. This leads to a common problem among researchers of insider
threats: real data is often not made public. Most companies treat data on internal
attacks (insider threats) with high confidentiality and do not make it publicly available.
This led a team of researchers to develop a synthetic data set adapted for researchers who
study insider threats. The data set consists of a large amount of normal employee behavior
as well as a small share of suspicious events that indicate insider threats. Because
suspicious events make up only 0.023% of the data, the set poses problems that several
machine learning algorithms have a hard time handling, but anomaly detection algorithms
such as Isolation Forest can deal with it quite well.
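To illustrate why an isolation-based detector copes with such rare events, the following is a minimal pure-Python sketch of the general Isolation Forest idea: points that can be separated from the rest with few random splits receive a high anomaly score. It is not the thesis implementation. The function names, the parameter values (100 trees, 64 samples per tree), and the synthetic login/logout hours are assumptions made purely for illustration, and raw hours are used as features only to keep the example short.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    low = min(p[dim] for p in points)
    high = max(p[dim] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Splits needed to reach a leaf, plus a correction for leaves left unsplit."""
    if "size" in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    forest = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return forest, sample_size


def anomaly_score(forest, sample_size, point):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(tree, point) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


# Synthetic, made-up events: (login hour, logout hour).
data_rng = random.Random(1)
normal_events = [
    (data_rng.gauss(8.5, 0.5), data_rng.gauss(17.0, 0.5)) for _ in range(200)
]
suspicious_event = (2.0, 3.5)  # hypothetical night-time session

forest, sample_size = fit_forest(normal_events + [suspicious_event])
suspicious_score = anomaly_score(forest, sample_size, suspicious_event)
normal_scores = sorted(anomaly_score(forest, sample_size, e) for e in normal_events)
median_normal_score = normal_scores[len(normal_scores) // 2]

print(f"suspicious event score: {suspicious_score:.2f}")
print(f"median normal score:    {median_normal_score:.2f}")
```

No labels are used when the trees are built, which is why the rarity of suspicious events is less of an obstacle here than for a classifier that has to learn from labeled examples of both classes.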
Inspired by the current discourse on legislation related to privacy and integrity for
citizens of the EU (General Data Protection Regulation, EU ePrivacy Regulation), we have
decided to restrict our data usage to monitoring data that is less of a privacy concern for a
company’s employees, specifically login and logout times at office computers.
Preprocessing of the input data is an important aspect of machine learning. To get
the best possible results from the machine learning algorithm, one needs to adapt the
raw data to the algorithm. Different preprocessing methods give large differences in
the accuracy of the machine learning model. By evaluating different ways of
preprocessing our data set, we could conclude that periodic features were better than
ordinal features.
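The difference between the two kinds of features can be sketched as follows. This summary does not state the exact encoding used in the thesis, so the example assumes a common choice: an ordinal feature keeps the hour of day as a single number, while a periodic feature maps it onto a circle with sine and cosine. The function names are hypothetical.

```python
import math

HOURS_PER_DAY = 24.0


def ordinal_feature(hour):
    """Hour of day as a plain number; 23:30 and 00:30 end up far apart."""
    return (hour,)


def periodic_feature(hour):
    """Hour of day as a point on the unit circle; midnight wraps around."""
    angle = 2.0 * math.pi * (hour % HOURS_PER_DAY) / HOURS_PER_DAY
    return (math.sin(angle), math.cos(angle))


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.dist(a, b)


late_login, early_login, noon_login = 23.5, 0.5, 12.0

ordinal_gap = distance(ordinal_feature(late_login), ordinal_feature(early_login))
periodic_gap = distance(periodic_feature(late_login), periodic_feature(early_login))
periodic_gap_to_noon = distance(periodic_feature(late_login), periodic_feature(noon_login))

print(f"ordinal gap 23:30 vs 00:30:  {ordinal_gap:.2f}")
print(f"periodic gap 23:30 vs 00:30: {periodic_gap:.2f}")
print(f"periodic gap 23:30 vs 12:00: {periodic_gap_to_noon:.2f}")
```

With the ordinal feature, two logins one hour apart around midnight look as different as two logins almost a full day apart, whereas the periodic feature keeps them close together.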
In this thesis we have concluded that by designing a model for detecting insider threats
using arrival and departure times, we were able to detect 76% of all insider threats while
falsely classifying only 7% of all normal events as threats. Although the false positives can
be expensive to handle in terms of manpower and further analysis, the detection rate of
76% could potentially save a company from an otherwise very expensive data breach.
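The two reported rates relate to classification counts as in the sketch below. The counts are hypothetical, chosen only so that they reproduce 76% and 7%, and are not taken from the thesis. The last function applies Bayes' rule under the additional assumption that the 0.023% share of suspicious events mentioned above also holds for the evaluated events, which this summary does not confirm.

```python
def recall(true_positives, false_negatives):
    """Share of actual threats that were flagged (the detection rate)."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Share of normal events that were wrongly flagged as threats."""
    return false_positives / (false_positives + true_negatives)


def precision_from_rates(recall_rate, fp_rate, threat_share):
    """Share of flagged events that are real threats, given the base rate."""
    flagged_threats = recall_rate * threat_share
    flagged_normals = fp_rate * (1.0 - threat_share)
    return flagged_threats / (flagged_threats + flagged_normals)


# Hypothetical counts, picked to match the reported percentages.
detection_rate = recall(true_positives=76, false_negatives=24)
false_alarm_rate = false_positive_rate(false_positives=700, true_negatives=9300)

# Assumption: 0.023% of evaluated events are suspicious.
alert_precision = precision_from_rates(detection_rate, false_alarm_rate, 0.00023)

print(f"detection rate (recall): {detection_rate:.2%}")
print(f"false positive rate:     {false_alarm_rate:.2%}")
print(f"precision of alerts:     {alert_precision:.2%}")
```

Under that assumption most alerts would be false alarms, which is consistent with the remark that false positives can be expensive to handle in terms of manpower and further analysis.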
author
Scherman, Maja LU and Bülow, Joakim
alternative title
Detektering av Interna Hot med användning av Isolation Forest
course
EITM01 20162
year
2018
type
H2 - Master's Degree (Two Years)
keywords
MSc, Insider Threats, Isolation Forest, Machine Learning, Security
report number
LU/LTH-EIT 2018-657
language
English
id
8952203
date added to LUP
2018-06-21 17:31:43
date last changed
2018-06-21 17:31:43
@misc{8952203,
  abstract     = {Companies need real-time information about insider threats, but privacy and
integrity concerns limit what surveillance individuals consider acceptable.
This creates a problem, since performing online surveillance would infringe
on the employees' privacy and integrity.
We therefore present a model that uses Isolation Forest to address this problem.
We focus on analyzing non-intrusive features in a real-time, event-based
approach. We encode our features as periodic features, which we have
statistically proven to be more effective than ordinal features when used with
Isolation Forest.
Our results show that by analyzing employees' login and logout times, we
can detect 76% of all insider threats while falsely classifying only 7% of all
normal instances. The recall rate, which shows how complete the results are, is
76%.},
  author       = {Scherman, Maja and Bülow, Joakim},
  keyword      = {MSc,Insider Threats,Isolation Forest,Machine Learning,Security},
  language     = {eng},
  note         = {Student Paper},
  title        = {Insider Threat detection using Isolation Forest},
  year         = {2018},
}