1. Overview of VoMPE
VoMPE (Voice over MIDI Polyphonic Expression) analyzes a voice input and outputs polyphonic MIDI notes based on its formant frequencies, which characterize the voice.
Voice production can be modeled as the vibration of the vocal cords shaped by the resonance of the vocal tract. In this model, the voice spectrum is the product of the vibration spectrum of the vocal cords and the frequency characteristic of the vocal tract. The frequency characteristic of the vocal tract is called the spectral envelope, and the resonant frequencies of the vocal tract, i.e., the peak frequencies of the spectral envelope, are called formant frequencies. Spectral envelopes and formant frequencies are widely used in speech analysis, synthesis, and coding.
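Written as equations, with notation introduced here for illustration only (G(f) for the vibration spectrum of the vocal cords, H(f) for the frequency characteristic of the vocal tract, S(f) for the voice spectrum), the model above reads:

```latex
S(f) = G(f)\,H(f), \qquad
\log\lvert S(f)\rvert = \log\lvert G(f)\rvert + \log\lvert H(f)\rvert
```

Here |H(f)| is the spectral envelope, and the formant frequencies F_1, F_2, ... are the frequencies at which |H(f)| has local maxima.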
VoMPE estimates the spectral envelope of the voice input by linear prediction (linear prediction coefficients, LPC) and then finds the formant frequencies as the peaks of that envelope. The spectral envelope is divided into bands, and each band corresponds to one MIDI channel. For each band, VoMPE outputs a MIDI note that reflects the formant frequency with the maximum amplitude in that band.
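The LPC envelope and peak search described above can be sketched as follows. This is a minimal, generic example, not VoMPE's code: it assumes the textbook autocorrelation method with a Hamming window and the Levinson-Durbin recursion, and the LPC order, grid size, and function names (`lpc_envelope_db`, `formant_peaks`) are hypothetical. VoMPE's actual order, windowing, and pre-processing are not documented here. The envelope is evaluated up to 8 kHz, matching the maximum bandwidth stated in this manual.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) with A(z) = a[0] + a[1] z^-1 + ... + a[order] z^-order,
    a[0] = 1, and err = final prediction-error power.
    """
    if r[0] <= 0.0:  # silent frame: flat envelope
        return [1.0] + [0.0] * order, 0.0
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope_db(frame, fs, order=12, n_points=512, f_max=8000.0):
    """All-pole spectral envelope sqrt(err) / |A(e^jw)| in dB on a frequency grid."""
    w = hamming(len(frame))
    x = [s * wi for s, wi in zip(frame, w)]
    a, err = levinson_durbin(autocorrelation(x, order), order)
    gain = math.sqrt(max(err, 1e-12))
    f_max = min(f_max, fs / 2)
    freqs, env = [], []
    for m in range(n_points):
        f = f_max * m / (n_points - 1)
        z = cmath.exp(-2j * math.pi * f / fs)
        A = sum(ak * z ** k for k, ak in enumerate(a))
        freqs.append(f)
        env.append(20 * math.log10(gain / max(abs(A), 1e-12)))
    return freqs, env


def formant_peaks(freqs, env):
    """Formant candidates: local maxima of the envelope as (freq_hz, amp_db)."""
    return [(freqs[i], env[i]) for i in range(1, len(env) - 1)
            if env[i - 1] < env[i] >= env[i + 1]]
```

For a synthetic frame produced by a single two-pole resonance near 1 kHz (an impulse train filtered by that resonator), an order-2 analysis yields one peak close to 1 kHz. A real voice frame needs a higher order so that several formants can appear as separate peaks.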
VoMPE is provided as a VST 3 plug-in for digital audio workstations (DAWs) and supports sampling rates of 16 kHz, 44.1 kHz, and 48 kHz. The supported operating system is 64-bit Windows 11.
The VoMPE binary distribution is licensed free of charge under the Creative Commons Attribution 4.0 license (CC BY 4.0).
The VoMPE source code distribution is licensed free of charge under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0).
2. Operation of VoMPE
The user interface of VoMPE is shown in Fig. 1.

In the left graph, the curve shows the spectral envelope, the blue dots show the formant frequencies, and the blue line shows the MIDI note on/off threshold. The right graph shows the spectral envelope over time as a color map, with white dots marking the formant frequencies.
The spectral envelope, whose maximum bandwidth is 8 kHz, is divided into bands according to the formant bandwidth and the number of formant bands. Each band corresponds to one MIDI channel. The MIDI note on/off threshold is the formant amplitude at which a MIDI note is switched on or off.
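The band division and threshold logic above might look like the following sketch. The band layout is an assumption: contiguous bands of equal width (the formant bandwidth) starting at 0 Hz, capped at 8 kHz; the manual does not state where the bands start. The band index is used directly as a hypothetical channel index, and the function names are illustrative only.

```python
MAX_ENVELOPE_HZ = 8000.0  # maximum envelope bandwidth stated in the manual


def band_edges(formant_bandwidth_hz, num_bands):
    """Hypothetical layout: equal-width contiguous bands from 0 Hz, capped at 8 kHz."""
    edges = []
    for i in range(num_bands):
        lo = i * formant_bandwidth_hz
        if lo >= MAX_ENVELOPE_HZ:
            break
        edges.append((lo, min(lo + formant_bandwidth_hz, MAX_ENVELOPE_HZ)))
    return edges


def strongest_per_band(formants, edges):
    """For each band, the (freq_hz, amp_db) formant with the highest amplitude, or None."""
    out = []
    for lo, hi in edges:
        in_band = [fa for fa in formants if lo <= fa[0] < hi]
        out.append(max(in_band, key=lambda fa: fa[1]) if in_band else None)
    return out


def update_notes(strongest, threshold_db, active):
    """Compare each band's strongest formant with the on/off threshold.

    Returns (new_active, events); the band index doubles as the channel index
    (an assumption for illustration).
    """
    events, new_active = [], []
    for ch, (cand, was_on) in enumerate(zip(strongest, active)):
        is_on = cand is not None and cand[1] >= threshold_db
        if is_on and not was_on:
            events.append(("note_on", ch, cand[0]))
        elif was_on and not is_on:
            events.append(("note_off", ch))
        new_active.append(is_on)
    return new_active, events
```

For example, with 1000 Hz bands and a threshold of -20 dB, formants at 700 Hz (-5 dB) and 2500 Hz (-8 dB) switch on notes in bands 0 and 2, while a -40 dB formant at 1500 Hz stays below the threshold. The sketch does not handle a formant that moves to a different note while its band is sounding.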
The formant and MIDI frequency ratio sets the frequency ratio between each formant and the MIDI note generated from it. The audio output delay shows the audio output latency of VoMPE. When MIDI pitch bend is set to On, a pitch bend message is sent before each MIDI note is played, so the note reproduces the formant frequency more accurately than a note quantized to the nearest semitone. The pause button freezes the updates of both the left and right graphs.
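The conversion from a formant frequency to a MIDI note plus pitch bend can be sketched as below. Several details are assumptions, not documented VoMPE behavior: the output frequency is taken as formant frequency times the ratio, A4 = 440 Hz is MIDI note 69, and the pitch bend range is the MPE default of ±48 semitones for member channels. The 14-bit pitch bend value (center 8192) and the message layout (status, LSB, MSB) follow the MIDI 1.0 pitch bend message. Function names are hypothetical.

```python
import math

A4_HZ = 440.0
A4_NOTE = 69
# Assumption: MPE default pitch bend range for member channels (+/-48 semitones).
BEND_RANGE_SEMITONES = 48.0
BEND_CENTER = 8192  # 14-bit pitch bend value meaning "no bend"


def formant_to_note(formant_hz, ratio, use_pitch_bend=True,
                    bend_range=BEND_RANGE_SEMITONES):
    """Map a formant to (MIDI note number, 14-bit pitch bend value).

    Assumption: output frequency = formant_hz * ratio.
    """
    target_hz = formant_hz * ratio
    exact = A4_NOTE + 12.0 * math.log2(target_hz / A4_HZ)  # fractional note number
    note = max(0, min(127, round(exact)))
    if not use_pitch_bend:
        return note, BEND_CENTER  # quantized to the nearest semitone
    bend = BEND_CENTER + round((exact - note) / bend_range * 8192)
    return note, max(0, min(16383, bend))


def note_on_with_bend(channel, note, bend, velocity=100):
    """Raw MIDI bytes on one channel (0-15): pitch bend first, then note on."""
    return bytes([
        0xE0 | channel, bend & 0x7F, (bend >> 7) & 0x7F,  # pitch bend: LSB, MSB
        0x90 | channel, note, velocity,                   # note on
    ])
```

With a ratio of 0.5, an 880 Hz formant maps to A4 (note 69) with no bend; a target a quarter semitone above A4 maps to note 69 with a small upward bend. Sending the bend before the note on, as the manual describes, means the note starts at the corrected pitch.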
Right-clicking on the user interface displays the available UI zoom factors.
Examples of MPE synthesizer configurations
When MIDI pitch bend is set to On, the synthesizer must be set to MPE mode, because each band sends its own pitch bend on its own MIDI channel. MPE configurations for Surge XT and Vital are described below for reference.
Activate MPE mode on the synthesizer.
Surge XT: Enable MPE in the Status area and select Poly as the Play Mode. See Fig. 2.
Vital: Turn on MPE ENABLED. See Fig. 3.
Note: Vital does not save its MPE state correctly in the DAW project file, so each time the project is opened, turn MPE ENABLED off and then on again.
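Set the number of voices to an appropriate value, for example at least the number of formant bands, since VoMPE can play one note per band at the same time. Note that some presets set the default number of voices to 1.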


VST is a registered trademark of Steinberg Media Technologies GmbH.
