
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Evaluation of LLMs for Hardware Test Generation

Lidbäck, Albin LU (2024) EITM01 20242
Department of Electrical and Information Technology
Abstract
Testing electrical units and systems is an important part of the production process.
Developing and generating hardware tests is a labor-intensive process that requires
expertise and a deep understanding of the system's behavior. This thesis explores
the application of Large Language Models (LLMs) to generating hardware test steps.

The research employs a comparative analysis of several LLMs, including GPT-3.5-turbo,
GPT-4-turbo, Meta-Llama-3.1-8B-Instruct, and Mixtral-8x7B-Instruct-v0.1, assessing
their performance in terms of accuracy and usability. The LLMs are evaluated in three
electrical cases of increasing PCB complexity. In addition, the thesis assesses the
importance of prompt engineering and how the structure of the data impacts the
generated test steps.

The results indicate that the LLMs perform differently across the cases: accuracy
decreases drastically as the complexity of the test cases increases. The results also
indicate that the structure of the prompts and data strongly influences the quality
of the generated test steps.

This thesis contributes to the field of hardware test generation by providing an
initial study of how Artificial Intelligence (AI) and LLMs may be used to automate
and ease the development of hardware tests.
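The abstract's point that the structure of the input data shapes the generated test steps can be illustrated with a minimal sketch. All names, the data schema, and the prompt wording below are hypothetical and not taken from the thesis; the idea is simply that PCB data (components and nets) is rendered into a structured prompt, which a model such as GPT-4-turbo could then be asked to complete with test steps.

```python
# Hypothetical sketch: render structured PCB data into an LLM prompt for
# hardware test-step generation. The field names and prompt wording are
# illustrative assumptions, not the thesis's actual format.

def build_test_prompt(board_name, components, nets):
    """Turn a component list and net map into a structured prompt that
    asks an LLM for numbered hardware test steps."""
    lines = [f"Board: {board_name}", "", "Components:"]
    for ref, kind, value in components:
        lines.append(f"- {ref}: {kind}, {value}")
    lines.append("")
    lines.append("Nets:")
    for net, pins in nets.items():
        lines.append(f"- {net}: connects {', '.join(pins)}")
    lines.append("")
    lines.append(
        "Generate numbered hardware test steps that verify each net and "
        "component, stating the expected measurement for each step."
    )
    return "\n".join(lines)

prompt = build_test_prompt(
    "demo-board",
    [("R1", "resistor", "10 kOhm"), ("C1", "capacitor", "100 nF")],
    {"VCC": ["R1.1", "C1.1"], "GND": ["C1.2"]},
)
print(prompt)
```

The prompt string would then be sent as the user message of a chat-completion request; keeping the data in a consistent, labeled layout like this (rather than pasting raw netlist text) is one way the data-structure effect described in the abstract could be probed.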
author: Lidbäck, Albin LU
supervisor:
organization:
course: EITM01 20242
year: 2024
type: H2 - Master's Degree (Two Years)
subject:
report number: LU/LTH-EIT 2024-1033
language: English
id: 9178235
date added to LUP: 2024-11-28 09:28:32
date last changed: 2024-11-28 09:28:32
@misc{9178235,
  abstract     = {{Testing electrical units and systems is an important part of the production process.
Developing and generating hardware tests is a labor-intensive process that requires
expertise and a deep understanding of the system's behavior. This thesis explores
the application of Large Language Models (LLMs) to generating hardware test steps.

The research employs a comparative analysis of several LLMs, including GPT-3.5-turbo,
GPT-4-turbo, Meta-Llama-3.1-8B-Instruct, and Mixtral-8x7B-Instruct-v0.1, assessing
their performance in terms of accuracy and usability. The LLMs are evaluated in three
electrical cases of increasing PCB complexity. In addition, the thesis assesses the
importance of prompt engineering and how the structure of the data impacts the
generated test steps.

The results indicate that the LLMs perform differently across the cases: accuracy
decreases drastically as the complexity of the test cases increases. The results also
indicate that the structure of the prompts and data strongly influences the quality
of the generated test steps.

This thesis contributes to the field of hardware test generation by providing an
initial study of how Artificial Intelligence (AI) and LLMs may be used to automate
and ease the development of hardware tests.}},
  author       = {{Lidbäck, Albin}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Evaluation of LLMs for Hardware Test Generation}},
  year         = {{2024}},
}