Interpolation of Perceived Gender in Speech Signals

Hagelborn, Alexander; Hulme Geber, Jack

Hagelborn, Alexander and Hulme Geber, Jack (2020) In Master's Theses in Mathematical Sciences FMSM01 20201
Mathematical Statistics

Abstract: For individuals with gender dysphoria, voice therapy can be an important tool for changing characteristics of their voice so that it aligns better with their gender identity. This is often done by practising with a speech therapist and can be a long and difficult process. A useful tool in this setting would be software that takes the patient's own voice and generates a voice that lies slightly closer to their desired voice. The patient could then mimic the generated voice in order to train their own.

The purpose of this thesis is to explore how voices can be digitally modified to change how their gender is perceived. The aim is to find a method of interpolation by which a voice can gradually be modified to sound like a target voice, with every intermediate point on the path sounding natural. Two methods were evaluated, but only one produced results good enough to be evaluated in a participant survey.

Survey participants listened to voices that were mixes of female and male voices. They rated, on a scale, how they perceived the gender of each voice and whether it sounded natural. The results show that the modified voices are perceived as less natural. On average, participants agreed that the perceived gender was changed; however, the individual participant results show that there is a need for improvement.
Popular Abstract: What if you could talk into a microphone and another person's voice would come out of the speaker? What would a voice in between yours and mine sound like? And what does a voice that is 50% male and 50% female sound like? These questions arose in our thesis, where we explored whether it is possible to gradually change the perceived gender of a voice.

Gender dysphoria is a condition of psychological distress caused by a mismatch between a person's gender and their biological sex. The voice is an important communicator of gender, so people with gender dysphoria often consult a speech therapist for voice therapy. The motivation for our project was to create a tool that could assist in this setting. The idea was that a person could record their voice and have it played back shifted slightly towards their goal voice. The person could then train to sound like this voice.

Voice training takes a lot of practice and effort. Such a tool could help, since imitating a voice can make learning easier. In addition, training the voice in smaller steps can prevent straining it.

As you might know, a voice is composed of a fundamental tone and integer multiples of this tone, called harmonics. The male voice is not simply a lower tone than the female voice. Besides the tone being lower, the relative amplitudes of the harmonics in a male voice also differ from those in a female voice. Knowing this, we focused on changing both the tone and the relation between the harmonics of the voices. However, a tone played on a guitar or piano can also be described as a tone with harmonics in a certain relation. So the question is: how do we change this relation while making all intermediate signals sound human? We tried two different approaches to answer this question.
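To make the harmonic picture concrete, here is a rough Python sketch. It builds a tone from a fundamental frequency and a list of relative harmonic amplitudes. It then blends two such descriptions by interpolating the pitch on a logarithmic scale and the amplitudes harmonic by harmonic. The functions `harmonic_tone` and `blend`, the example pitches of 120 Hz and 210 Hz, and the amplitude lists are all hypothetical illustrations, not values from the thesis. This naive blend is not either of the authors' two approaches.

```python
import math


def harmonic_tone(f0, amplitudes, duration=0.5, sample_rate=16000):
    """Sum of sinusoids at k * f0 (k = 1, 2, ...) with relative amplitudes.

    amplitudes[k - 1] is the relative amplitude of the k-th harmonic.
    Harmonics at or above the Nyquist frequency are dropped to avoid aliasing.
    """
    n_samples = int(duration * sample_rate)
    nyquist = sample_rate / 2
    signal = [0.0] * n_samples
    for k, amp in enumerate(amplitudes, start=1):
        freq = k * f0
        if freq >= nyquist:
            break  # every higher harmonic is above Nyquist as well
        step = 2 * math.pi * freq / sample_rate
        for n in range(n_samples):
            signal[n] += amp * math.sin(step * n)
    return signal


def blend(f0_a, amps_a, f0_b, amps_b, t):
    """Hypothetical naive mix of two harmonic descriptions (t=0: A, t=1: B).

    Pitch is interpolated geometrically (linearly on a log-frequency scale);
    relative amplitudes are interpolated linearly, harmonic by harmonic.
    """
    f0 = f0_a ** (1 - t) * f0_b ** t
    amps = [(1 - t) * a + t * b for a, b in zip(amps_a, amps_b)]
    return f0, amps


# Illustrative values only (not measured data).
low = (120.0, [1.0, 0.6, 0.4, 0.3, 0.2])
high = (210.0, [1.0, 0.3, 0.1, 0.05, 0.02])
f0_mid, amps_mid = blend(*low, *high, 0.5)
tone_mid = harmonic_tone(f0_mid, amps_mid)
```

Harmonic k sits at k times f0, and f0 differs between the two voices. Blending amplitudes by harmonic index therefore does not keep any resonance at a fixed frequency. This is one way in which naive parameter mixing may drift away from a human-sounding voice.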

In our first approach, we tried to teach a neural network to extract what makes a voice unique, known as an embedding, and then to recreate the voice from this knowledge alone. The idea was then to have the network create new voices by mixing the embeddings of different speakers.

In our second approach, we modeled each person's voice production organs using only a recording of their voice. New voices were then generated by creating voice production organs that are a mix of those of two people.
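Voice production is commonly modelled from a recording with a source-filter model, in which an excitation from the vocal folds drives a vocal tract filter. The abstract does not specify which model was used. The Python sketch below assumes linear prediction (LPC) for the vocal tract and a simple impulse train at pitch f0 for the source. The filter is estimated with the Levinson-Durbin recursion. Two speakers' filters are then mixed by interpolating their reflection coefficients rather than the predictor coefficients directly. Reflection coefficients inside (-1, 1) guarantee a stable all-pole filter, and any convex combination of them stays inside that interval. All function names and coefficient values are illustrative assumptions.

```python
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_reflection(r, order):
    """Levinson-Durbin recursion; returns reflection coefficients k_1..k_order.

    Convention: x[n] is predicted as sum_{i=1}^{p} a_i * x[n - i].
    """
    a, err, ks = [], r[0], []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks


def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients -> predictor coefficients a_1..a_p."""
    a = []
    for i, k in enumerate(ks, start=1):
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
    return a


def mix_vocal_tracts(ks_a, ks_b, t):
    """Hypothetical mix of two vocal tract filters (t=0: A, t=1: B).

    Interpolating reflection coefficients preserves |k| < 1, so the
    resulting all-pole filter stays stable for every t in [0, 1].
    """
    ks = [(1 - t) * ka + t * kb for ka, kb in zip(ks_a, ks_b)]
    return reflection_to_lpc(ks)


def synthesize(a, f0, duration=0.5, sample_rate=16000):
    """Drive the all-pole filter 1/A(z) with an impulse train at pitch f0 (Hz)."""
    n_samples = int(duration * sample_rate)
    period = sample_rate / f0
    y = [0.0] * n_samples
    next_pulse = 0.0
    for n in range(n_samples):
        excitation = 0.0
        if n >= next_pulse:
            excitation = 1.0
            next_pulse += period
        y[n] = excitation + sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n >= i)
    return y


# Made-up reflection coefficients standing in for two analysed speakers.
ks_low = [0.9, -0.5, 0.2, -0.1]
ks_high = [0.7, -0.2, 0.4, 0.1]
# Five points along the path; pitch is blended geometrically alongside the filter.
voices = [
    synthesize(mix_vocal_tracts(ks_low, ks_high, t), 120.0 ** (1 - t) * 210.0 ** t)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]
```

With real recordings, each speaker's coefficients would come from short frames, e.g. `levinson_reflection(autocorr(frame, order), order)`, instead of the made-up lists above. Interpolating the predictor coefficients a_i directly would be simpler, but it does not in general guarantee a stable filter at the intermediate points.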

Of the two approaches, the second was the more successful. Using a survey, we found that listeners perceived the voices we created as a mix of male and female characteristics. Future research could focus on improving the naturalness of the voices.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9032262

author

Hagelborn, Alexander and Hulme Geber, Jack

supervisor

Filip Elvander

organization

Mathematical Statistics

course

FMSM01 20201

year

2020

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

keywords

speech modelling, speech morph, interpolation, gender dysphoria

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMS-3402-2020

ISSN

1404-6342

other publication id

2020:E81

language

English

id

9032262

date added to LUP

2021-05-12 09:59:31

date last changed

2024-10-07 15:39:10

@misc{9032262,
  abstract     = {{For individuals with gender dysphoria, voice therapy can be an important tool to change characteristics about their voice to align better with their gender identity. This is often done by practising with a speech therapist and can be a long and difficult process. A useful tool in this setting would be software that can generate a voice, based on the patients voice, which lies slightly closer to their desired voice. The patient could then mimic the generated voice in order to train their voice. 

The purpose of this thesis is to explore how voices can be digitally modified in order to change how their gender is perceived. The aim is to find a method of interpolation where a voice could gradually be modified to sound like a target voice, and where all intermediate points on the path sound natural. Two methods were evaluated, but only one produced adequate results that were evaluated with a participant survey.

Survey participants listened to voices that are a mix of female and male voices, and rated on a scale how they perceived the gender and if the voices sounded natural. The results show that there is a decrease in how natural the modified voices sound. On average there is a consensus that the perceived gender is changed, however the individual participant results showed that there is a need for improvement.}},
  author       = {{Hagelborn, Alexander and Hulme Geber, Jack}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Interpolation of Perceived Gender in Speech Signals}},
  year         = {{2020}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
