Masking Out Transient Network Issues in Sound Playback

Midlöv, Alexander

Midlöv, Alexander ^LU (2024) EITM01 20241
Department of Electrical and Information Technology

Abstract: Significant advancements in audio transport, encoding, and processing power in
embedded systems constitute the foundation of today’s speakers. Transitioning
the digitalized audio data transmission to Audio over IP (AoIP) has opened the
potential to transmit audio with the same network flexibility as other services
using packet-based networks, such as voice over IP or video over IP.
This thesis primarily focuses on evaluating different masking techniques for
transient network errors in embedded environments, such as an IP speaker, using a simulated audio pipeline. The key indicators of success are the algorithms’
latency, delay, perceived audio quality, and complexity. Both sender-based and
receiver-based, as well as combinations of these methods, have been evaluated. Two
techniques were sender-based (Redundant Transmission (RT) and Parity Packet
Forward Error Correction (PPFEC)). Five were receiver-based (Silence Insertion
(SI), frequency-dependent White Noise Insertion (WNI), Packet Repetition (PR),
Waveform Substitution based on Pattern-matching (WSP), and Waveform Similarity Overlap and Add (WSOLA)). The sender-receiver-based techniques were
the 10 resulting combinations of sender-based and receiver-based techniques.
Considering all the key indicators, the best-performing algorithms for
receiver-based Packet Loss Concealment (PLC), sender-based Forward Error
Correction (FEC), and the sender-receiver-based combination were PR, PPFEC,
and PPFEC with PR, respectively. PR had a mean opinion score (MOS) of 57.08
across all evaluated tracks at a drop percentage of 10%, with a segmented
cross-correlation score of roughly 0.5, whereas SI only had a segmented
similarity score in the range of 0.1 to 0.3 for the evaluated excerpts. Further, the
execution time ratios of SI and PR were almost equal. Introducing FEC into
the masking techniques further improved the results, since FEC perfectly
reconstructs the lost segment if enough redundant data was transmitted correctly.
However, implementing FEC using RT or PPFEC results in increased end-to-end
latency due to the additional redundant information transmitted, with RT
causing a greater increase than PPFEC.
The conclusion was that when latency does not have to be ultra-low, the
combination of PPFEC and PR masks occasional transient errors well in
low-resource embedded AoIP systems. When ultra-low latency is required,
using only PR is suggested, since PR has low complexity, introduces no
additional latency, and does not further strain the network.
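As a rough illustration of how a parity-based FEC scheme can be combined with packet repetition, the following Python sketch recovers a single lost packet per group by XOR parity and otherwise repeats the previous packet. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names (xor_parity, conceal), the single XOR parity packet per group, the equal packet sizes, and the silence used when no earlier packet exists are all hypothetical choices made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_parity(packets: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized packets (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*packets))


def conceal(received: List[Optional[bytes]],
            parity: Optional[bytes],
            frame_len: int) -> List[bytes]:
    """Fill the gaps (None entries) in one group of packets.

    A single loss is rebuilt exactly from the parity packet. In every other
    case each missing packet is replaced by a copy of the packet before it.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    out = list(received)
    if len(missing) == 1 and parity is not None:
        present = [p for p in received if p is not None]
        # XOR of the surviving packets and the parity yields the lost packet.
        out[missing[0]] = xor_parity(present + [parity])
        return out
    # Hypothetical simplification: start from silence because this sketch
    # does not keep the last packet of the previous group.
    previous = bytes(frame_len)
    for i, p in enumerate(out):
        if p is None:
            out[i] = previous
        previous = out[i]
    return out


group = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(group)
restored = conceal([group[0], None, group[2]], parity, frame_len=2)
```

With one loss in the group, restored equals the original group exactly. With two losses, the parity packet is not enough, and the sketch falls back to repeating the last available packet.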
Popular Abstract: In recent years, a lot has happened in the audio system industry. The forefront
of technology is being pushed forward daily. With the vastly increased processing
power in embedded systems (such as speakers or dongles), the developments in
bandwidth, and the evolution of the Internet of Things (IoT), a revolution
has been sparked in audio data transmission: the transition from traditional
transmission technology, whether analog or earlier digital transmission, to Audio
over IP (AoIP).
AoIP has become a fast and resilient alternative for audio transmission.
Whether for live music performances with ultra-low latency or for enhancing
safety in cities, AoIP has found its way into several sectors of society. The
technology is primarily used in professional industries, providing solutions for
various applications. Some use cases include security-related fast signaling of
environmental disasters, warehouse announcements and background music,
hospital announcements, producer equipment, live performance wiring, and
audio tied to surveillance in subways.
The sectors using AoIP do so because they want better transmission, faster
relaying of information, and the benefits of operating on a packet-based
network. However, as with other packet-based services, AoIP is sensitive to
latency, disruptions, and transient network errors. Further, it is crucial to achieve
and maintain good perceived audio quality. Robust systems are in high demand,
which is why many in the industry have yet to transition from traditional
transmission technologies to more modern ones. In the neighboring field of Voice over
IP (VoIP), techniques for masking transient network errors have been researched
for quite a while. Only some of these techniques have been evaluated on music and
other non-speech signals, and even fewer have been assessed in AoIP systems in
embedded environments.
This thesis investigates and evaluates some PLC and FEC techniques when
implemented on an IP speaker.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9165562

author: Midlöv, Alexander ^LU
supervisor: Azra Abtahi Fahliani ^LU
organization: Department of Electrical and Information Technology
alternative title: Maskering av Tillfälliga Nätverksstörningar vid Ljuduppspelning
course: EITM01 20241
year: 2024
type: H2 - Master's Degree (Two Years)
subject: Technology and Engineering
report number: LU/LTH-EIT 2024-996
language: English
id: 9165562
date added to LUP: 2024-06-24 10:14:51
date last changed: 2024-06-24 10:14:51

@misc{9165562,
  author       = {{Midlöv, Alexander}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Masking Out Transient Network Issues in Sound Playback}},
  year         = {{2024}},
}
