A vocal pitch modulator that uses Machine Learning for realistic voice change. This is a project for NUS's CS4347 (Sound and Music Computing).
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Make sure you have a suitable Python environment to run the project.
Install the necessary modules listed in requirements.txt, e.g. with `pip install -r requirements.txt`.
For an Anaconda environment:
`conda install --file requirements.txt`
The goal of vocal pitch modulation in this project is to maintain a “realistic” sound when the pitch is changed. With conventional modulation techniques, increasing the pitch of an audio file by more than around 3 semitones tends to make people sound like chipmunks, while lowering the pitch by more than around 3 semitones makes them sound demonic/dopey. However, there are people who naturally speak at lower and higher pitches without sounding this way, so the problem is not the pitch itself; rather, the spectral characteristics must be adjusted in a suitable manner to preserve realism. As such, in this project, we employ machine learning to adjust pitch without losing realism.
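For reference, the kind of conventional shift described above can be reproduced with off-the-shelf tools. The snippet below is a minimal sketch using librosa's standard `pitch_shift`; the file names and shift amounts are illustrative and this is not this repository's own code.

```python
# Minimal sketch of a conventional (naive) pitch shift using librosa.
# File names and shift amounts are illustrative; this is not the project's own code.
import librosa
import soundfile as sf

y, sr = librosa.load("voice_sample.wav", sr=None, mono=True)  # hypothetical input file

# Shifting by more than ~3 semitones with a plain phase-vocoder approach
# is what produces the "chipmunk" / "demonic" artefacts described above.
up = librosa.effects.pitch_shift(y, sr=sr, n_steps=5)     # +5 semitones: chipmunk-like
down = librosa.effects.pitch_shift(y, sr=sr, n_steps=-5)  # -5 semitones: dopey/demonic

sf.write("shift_up.wav", up, sr)
sf.write("shift_down.wav", down, sr)
```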
The relevant python files to refer to are as follows:
- ANN.py: Contains the PyTorch model code (`TimbreEncoder`, `TimbreVAE`, `BaseFFN`) and training functions
- DemoPitchShifter.py: Contains the wrapper functions used in the demo notebook
- PitchShifterModule.py: Contains the final pitch shift function that our project aimed to create. This function applies the naive pitch shift, followed by timbre conformation using an ANN that we have trained. The model data itself can be found here.
- Utils.py: Contains utility and graphing functions for convenience that are used throughout the notebooks.
- VPM.py: Contains the main functions, e.g. naive pitch shifting, conversion of audio to FFTs, FFTs to mels, etc.
Relevant Jupyter Notebooks are as follows:
- Data Processing for Training Walkthrough: This details our data processing workflow
- Demonstration Notebook App: This is where you can try out the actual pitch shifter.
- Pitch Up Recreation Attempts: This notebook details our experiments in finding a usable pitch shift method.
- SpecificModelCreation: This is the main ANN training notebook that was used to create all the models used in our final product.
- Timbre Encoder: This was the training notebook for the Timbre Encoder portion, which was used for Architecture 2.
As for directories:
- Archive: Simply an archive of old work
- Data: Inside this folder, you will find the list of files along with the relevant labels in dataset_files.csv. You will also find the Jupyter Notebooks that were used to generate the dataset, and the raw file list. We are not including the raw files in this repository, so these are provided for reference rather than for use.
- Data/dataset: The dataset we train our Artificial Neural Networks with can be found in this folder.
- Documentation: Figures detailing the implementation of the Vocal Pitch Modulator
- many_expts: The audio files produced by the different experiments can be found in this folder.
- model_data: Contains the model data for our final model.
- output_wav: The directory that the demo notebook will output pitch shifted sounds to
The following is the proposed modulation pipeline:
Image 1: Overall Vocal Pitch Modulation System Design
Please refer to the Vocal Pitch Modulation Audio and Waveform presentation page to listen to our reconstructed results for each method and experiment we tried.
As seen in Image 1, the pre-processing stage takes the input wav file, applies the STFT, and produces either the STFT, mel-spectrum, or MFCC for further processing. This stage also includes the data and pitch pairing for ANN training, which is used in stage 3.
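A rough sketch of this kind of pre-processing with standard librosa calls might look like the following; the parameter values (n_fft, hop_length, n_mels, n_mfcc) are illustrative and are not necessarily the project's actual settings.

```python
# Minimal sketch of the pre-processing described above, using standard librosa calls.
# Parameter values are illustrative, not the project's actual settings.
import numpy as np
import librosa

y, sr = librosa.load("voice_sample.wav", sr=None, mono=True)  # hypothetical input file

stft = librosa.stft(y, n_fft=1024, hop_length=256)                  # complex STFT
mel = librosa.feature.melspectrogram(S=np.abs(stft) ** 2, sr=sr,
                                     n_fft=1024, n_mels=128)        # mel-spectrum
mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=20)  # MFCCs
```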
As seen in Image 1, the pitch shift stage takes in the mel-spectrum and, using a trained ANN, outputs a pitch-shifted mel-spectrum that is fed into the timbre training in stage 3.
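One simple way to realise a mapping between mel-spectrum frames is a frame-wise feed-forward network. The PyTorch sketch below is purely illustrative; the layer sizes, loss, and optimiser are assumptions and this is not the repository's `BaseFFN` (see ANN.py and the SpecificModelCreation notebook for the real models).

```python
# Rough PyTorch sketch of a frame-wise feed-forward mapping on mel frames.
# Layer sizes, loss, and optimiser are illustrative, not the repository's BaseFFN.
import torch
import torch.nn as nn

class MelFrameFFN(nn.Module):
    def __init__(self, n_mels=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_mels),
        )

    def forward(self, x):          # x: (batch, n_mels) input mel frame
        return self.net(x)         # predicted target mel frame

model = MelFrameFFN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One hypothetical training step on paired (input, target) mel frames:
x = torch.randn(32, 128)           # e.g. input mel frames
y = torch.randn(32, 128)           # e.g. reference mel frames at the target pitch
loss = loss_fn(model(x), y)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```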
Architecture 2 has an additional Timbre Encoder ANN, and the pipeline is as follows:
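As a rough, illustrative-only sketch of the timbre-encoder idea: an encoder of this kind compresses a mel frame into a compact timbre embedding that can be fed to the mapping network alongside the pitch-shifted spectrum. The code below is an assumption for illustration; the actual `TimbreEncoder` and `TimbreVAE` in ANN.py may differ substantially.

```python
# Illustrative-only sketch of a timbre-encoder-style network in PyTorch.
# The real TimbreEncoder / TimbreVAE in ANN.py may use a different architecture.
import torch
import torch.nn as nn

class SimpleTimbreEncoder(nn.Module):
    def __init__(self, n_mels=128, embed_dim=16):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(n_mels, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),          # compact timbre embedding
        )

    def forward(self, mel_frame):              # (batch, n_mels)
        return self.encode(mel_frame)          # (batch, embed_dim)

encoder = SimpleTimbreEncoder()
mel_frame = torch.randn(8, 128)
timbre_embedding = encoder(mel_frame)
# The embedding could then be concatenated with a pitch-shifted mel frame
# before the timbre-conformation network:
combined = torch.cat([torch.randn(8, 128), timbre_embedding], dim=1)
```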
As seen in Image 1, the post-processing stage reconstructs the audio from the STFT to give our output result.
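A minimal librosa-based sketch of this kind of reconstruction, reusing `mel` and `sr` from the pre-processing sketch above, is shown below. It uses Griffin-Lim phase estimation; the parameters are illustrative and this is not the repository's own post-processing code.

```python
# Minimal sketch of reconstructing audio from a processed mel-spectrum with librosa.
# Griffin-Lim phase estimation is used; parameters are illustrative only.
import librosa
import soundfile as sf

audio = librosa.feature.inverse.mel_to_audio(mel, sr=sr, n_fft=1024, hop_length=256)
sf.write("output_wav/reconstructed.wav", audio, sr)
```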
For a walkthrough of the typical data processing that we conducted, refer to the Data Processing for Training Walkthrough.
The following image is the vowel diagram we followed for dataset collection.
The following are additional aids which illustrate the organization of our data.
Thank you to Prof Wang Ye and WeiWei of CS4347 (Sound and Music Computing).
Big thanks to Vocal Pitch Modulation Team 13:
- Louiz-Kim
- Rachel Tan
- Zachary Feng
- Shaun Goh
The work of Yin-Jyun Luo and team greatly inspired us and helped us get started.