
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Dynamic Feature Grouping in Anomaly Detection

Åkerman, Jonatan LU and Walter, Karl LU (2024) EITM01 20241
Department of Electrical and Information Technology
Abstract
Anomaly detection is important in many areas, among them web security. When performing anomaly detection in web security, the collected web traffic data can contain hundreds of features, although not all of them may be necessary. This thesis aims to develop a strategy to dynamically create significant parameter groups from web traffic data containing over 100 features, maximising information retention for effective detection of fraudulent activity. This was achieved by employing a combination of machine learning models and systematically evaluating the predictive ability of different feature groups.

Using simple digital signal processing tools, activity level spikes were detected and used as a proxy for scripted attack behaviour in web traffic. By combining multiple feature importance methods in an ensemble, a ranking of the most important features for classifying data points as in-spike or not was constructed. Using this ranking, multiple feature selection algorithms were able to find groups of features that on their own could help simple machine learning models determine whether a data point was benign or not.
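The spike-labelling step described above can be illustrated with a minimal sketch. The abstract does not say which signal processing tools were used, so the trailing-window z-score below, the function name `detect_spikes`, and the `window` and `threshold` values are all assumptions made for illustration, not the thesis's method.

```python
from statistics import mean, pstdev


def detect_spikes(counts, window=5, threshold=3.0):
    """Flag time bins whose activity count lies more than `threshold`
    standard deviations above the mean of the preceding `window` bins.
    Hypothetical helper; window and threshold are illustrative values."""
    flags = []
    for i, value in enumerate(counts):
        history = counts[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge this bin
            continue
        baseline = mean(history)
        spread = pstdev(history) or 1.0  # fall back to 1.0 on a flat history
        flags.append((value - baseline) / spread > threshold)
    return flags


# Made-up requests-per-minute series with one obvious burst.
counts = [10, 12, 11, 9, 10, 11, 95, 10, 12, 11]
flags = detect_spikes(counts)

# Data points falling in flagged bins would be labelled "in-spike".
in_spike_bins = [i for i, flagged in enumerate(flags) if flagged]
```

The resulting in-spike labels could then serve as the classification target for the feature importance methods, which is how the abstract describes using spikes as a proxy.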

Using the pipeline constructed in this work, a simple logistic regression model trained only on the feature groups delivered by the pipeline could classify data points as part of spikes or not as well as, or sometimes better than, a logistic regression model trained on the complete data set. This shows high information retention in the feature groups, and suggests that these groups could help the existing systems used by the web security firm Castle.
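The comparison described above, a model trained on a feature group versus one trained on all features, can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the label depends on the first two of six features by construction, and the small gradient-descent logistic regression and the helper names (`train_logreg`, `accuracy`, `select`) are hypothetical stand-ins for whatever tooling the thesis used.

```python
import math
import random


def train_logreg(rows, labels, epochs=200, lr=0.1):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    weights, bias = model
    hits = 0
    for x, y in zip(rows, labels):
        z = bias + sum(w * v for w, v in zip(weights, x))
        hits += int((z > 0) == bool(y))
    return hits / len(rows)


def select(rows, columns):
    """Keep only the given feature columns."""
    return [[row[c] for c in columns] for row in rows]


# Synthetic data: 6 features, label depends only on features 0 and 1.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(400)]
labels = [1 if row[0] + row[1] > 0 else 0 for row in rows]
train_x, test_x = rows[:200], rows[200:]
train_y, test_y = labels[:200], labels[200:]

group = [0, 1]  # hypothetical feature group delivered by a selection step
full_model = train_logreg(train_x, train_y)
group_model = train_logreg(select(train_x, group), train_y)

full_acc = accuracy(full_model, test_x, test_y)
group_acc = accuracy(group_model, select(test_x, group), test_y)
```

If `group_acc` is close to `full_acc` on held-out data, the group retains most of the information relevant to the label, which is the sense of "information retention" used in the abstract.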
Popular Abstract
In a digital landscape where AI tools improve daily, the web security measures used to authenticate users of web sites are under threat. At the same time, the amount of data in web traffic is larger than ever, with hundreds of features describing the users and their behaviour. This thesis aims to develop a strategy to find groups of features that alone can indicate unwanted behaviour, and thus increase the capability and interpretability of web security decision making.

In the digital age, web security is paramount. The simple act of logging into a web site often requires a password, multi-factor authentication and even a small puzzle to prove the validity of the user. There are many reasons for maintaining strong web security: everything from credit card fraud to account takeovers needs to be prevented. One of the more common ways to commit such crimes is to automate malicious web site usage, for example by trying thousands of different codes for a stolen credit card. Existing solutions aim to address these problems, but with the rise of highly competent AI used with malicious intent, they may not hold up to such threats in the future.

It is crucial to distinguish real users from automated attackers that aim to exploit vulnerabilities in web sites and their users. To combat malicious activity, an approach relying on more than user authentication is needed, but analysing web traffic exhaustively is not tractable due to the large number of features that describe every single data point in the traffic. What if there were a way to analyse only a select few of the data features and gain insight into what reveals an automated user?

This thesis introduces a novel strategy for dynamically finding groups of features that can reveal patterns indicative of automated attacks. It was found that this strategy could reduce the number of features to a select few that retained as much information as the complete web traffic data. This was achieved by using machine learning tools to rank the importance of the features and build groups of them to find the ones that best represented the contents of the web traffic. The results can be used in existing rule-based web security services to find patterns that reveal automated activity more easily. Small groups of features allow qualified parties to find patterns in the data and make decisions accordingly.
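As a rough illustration of ranking features and building a group from the ranking, the sketch below averages each feature's rank position across several importance methods. The aggregation rule (mean rank), the method names, the feature names and the scores are all invented for this example; the thesis may combine its importance methods differently.

```python
def rank_features(scores):
    """Order feature names from most to least important for one method."""
    return sorted(scores, key=scores.get, reverse=True)


def aggregate_rankings(methods):
    """Average each feature's rank position across methods (lower is better).
    Assumes every method scores the same set of features."""
    totals = {}
    for scores in methods.values():
        for position, name in enumerate(rank_features(scores), start=1):
            totals[name] = totals.get(name, 0) + position
    n = len(methods)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda name: (totals[name] / n, name))


# Hypothetical importance scores from three different methods.
methods = {
    "method_a": {"ip_entropy": 0.9, "request_rate": 0.7, "user_agent": 0.2, "locale": 0.1},
    "method_b": {"ip_entropy": 0.5, "request_rate": 0.8, "user_agent": 0.3, "locale": 0.05},
    "method_c": {"ip_entropy": 0.6, "request_rate": 0.4, "user_agent": 0.1, "locale": 0.2},
}

ranking = aggregate_rankings(methods)
top_group = ranking[:2]  # a candidate feature group built from the top ranks
```

Averaging rank positions rather than raw scores avoids having to put the different methods' scores on a common scale.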
author: Åkerman, Jonatan LU and Walter, Karl LU
course: EITM01 20241
year: 2024
type: H2 - Master's Degree (Two Years)
keywords: Feature grouping, Anomaly detection, Machine learning, Web security
report number: LU/LTH-EIT 2024-974
language: English
id: 9161033
date added to LUP: 2024-06-17 10:09:06
date last changed: 2024-06-17 10:09:06
@misc{9161033,
  abstract     = {{Anomaly detection is important in many different areas, among them web security. When performing anomaly detection in web security there can be hundreds of features in the collected web traffic data to consider, even though all of them might not be necessary. This thesis aims to develop a strategy to dynamically create significant parameter groups from web traffic data containing over 100 features, maximising information retention for effective detection of fraudulent activity. This was achieved by employing a combination of machine learning models and systematically evaluating the predictive ability of different feature groups.

Using simple digital signal processing tools, activity level spikes were detected and used as a proxy for indicating scripted attack behaviour in web traffic. With the use of multiple feature importance methods in an ensemble context, a ranking of the most important features for classifying data points as in-spike or not was constructed. Using this ranking, multiple feature selection algorithms were able to find groups of features that on their own could help simple machine learning models determine whether a data point was benign or not.

Using the pipeline constructed in this work, a simple logistic regression model trained on only the feature groups delivered by the pipeline could classify data points as part of spikes or not, as good or sometimes better than a logistic regression model trained on the complete data set. This shows a high information retention in the feature groups, and a possibility of these groups being helpful in aiding the existing systems used by the web security firm Castle.}},
  author       = {{Åkerman, Jonatan and Walter, Karl}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Dynamic Feature Grouping in Anomaly Detection}},
  year         = {{2024}},
}