Home
>
Software Development
>
Creating Voice-Jaw Data with a Processing Script – InApps 2022

March 21, 2022 by Phu Nguyen

Creating Voice-Jaw Data with a Processing Script – InApps 2022

Main Contents:

Creating Voice-Jaw Data with a Processing Script – InApps is an article under the topic Software Development Many of you are most interested in today !! Today, let’s InApps.net learn Creating Voice-Jaw Data with a Processing Script – InApps in today’s post !

We’ll Need Some Phrases

For now, the tech talk “act” will be “scripted” between Hedley and myself. It will give the illusion of a conversation. Of course, that’s assuming I can remember MY lines. It should be a great effect and lay the foundation for getting a much more sophisticated artificial intelligence (AI) conversational process up and running. Everything is going to AI, you know.

For a scripted act, we’ll certainly need a few canned audio responses. Naturally, Hedley should sound like a robot. A logical choice for this job is to use eSpeak, which is easily integrated into Linux scripts. It will even send the audio to a .wav file by using the “-w” option. We covered this text-to-speech program a few weeks ago.

Here’s an example.

robnotebook% espeak “My name is Hedley, please help me welcome Doctor Torq to the stage” -v en-us -p 30 -s 200 -k 20 -w intro-drtorq2.wav

robnotebook% espeak “My name is Hedley, please help me welcome Doctor Torq to the stage” –v en–us –p 30 –s 200 –k 20 –w intro–drtorq2.wav

The “-v” option sets the voice to US-English, a pitch (“-p” option) to 30, speed (“-s” option) to 200 and capitalization emphasis (“-k”) to 20.

You should definitely play around with the voices and other settings to find a combination that gives your robot a distinctive and interesting sound.

Down the line, I’ll need to be able to take a text response or some kind of data from an application and convert it into sounds that will come out of the speaker inside Hedley’s mouth. At the same time, the audio will have to be analyzed and the resulting data sent to Hedley’s Arduino-jaw servo subsystem.

I generated several .wav files with various responses for testing purposes. With the .wav files ready, lets now turn our attention to the Processing-based audio analysis program.

Turning a .wav File into Data

The Processing programming language, with its rich set of libraries, makes it easy to play a .wav audio file and analyze the waveforms, in near real-time. This is important because any lag between the sound and the jaw movement ruins the “talking skull” effect. There are a bunch of different functions in Processing for all kinds of sophisticated audio and visual effects.

Also, Processing closely mirrors the code layout for the Arduino. Why not use essentially the same code structure for the Arduino and your visual-audio code on a Linux notebook or Raspberry Pi?

I chose an “amplitude modulation” function to capture the prominent points of the audio wave profile. It gives reasonably realistic jaw movement. You’ll definitely need to tweak settings both on the Processing data analysis side and the Arduino-jaw servo programs for the best effect, with your project.

If you want to get really tricky, you might explore the Fast Fourier Transform (FFT) function, to grab specific parts of the audio waveform. It’s there if you want to give it a try. FFT analyzes the audio according to frequency, so you can precisely tailor your output data for specific sounds. The Wee Little Talker board does a similar thing, although much of the work is done in hardware. Hedley said he was happy speaking with the amplitude function for now.

Here’s the Processing code.

import processing.sound.*; import processing.serial.*; Serial myPort; SoundFile sample1; SoundFile sample2; Amplitude rms; float scale=8; float smooth_factor=.2; float sum; int valpos; int sendpos; int i = 0; int execloop = 1; public void setup() { size(640,360); myPort = new Serial(this, “/dev/ttyUSB0”, 115200); } public void draw() { background(125,255,125); noStroke(); fill(255,0,150); if (execloop == 1) { sample1 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq1.wav”); sample1.rate(.45); sample1.play(); rms = new Amplitude(this); rms.input(sample1); execloop = 0; } else if (execloop == 2) { sample2 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq2.wav”); sample2.rate(.45); sample2.play(); rms = new Amplitude(this); rms.input(sample2); execloop = 0; } sum += (rms.analyze() – sum) * smooth_factor; myPort.write(str(int(map((sum * 700), 30, 85, 8, 3)))); myPort.write(‘n’); }

import processing.sound.*;

import processing.serial.*;

Serial myPort;

SoundFile sample1;

SoundFile sample2;

Amplitude rms;

float scale=8;

float smooth_factor=.2;

float sum;

int valpos;

int sendpos;

int i = 0;

int execloop = 1;

public void setup() {

size(640,360);

myPort = new Serial(this, “/dev/ttyUSB0”, 115200);

}

public void draw() {

background(125,255,125);

noStroke();

fill(255,0,150);

if (execloop == 1) {

sample1 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq1.wav”);

sample1.rate(.45);

sample1.play();

rms = new Amplitude(this);

rms.input(sample1);

execloop = 0;

}

else if (execloop == 2) {

sample2 = new SoundFile(this, “/home/rob/hedleytheskull/intro-torq2.wav”);

sample2.rate(.45);

sample2.play();

rms = new Amplitude(this);

rms.input(sample2);

execloop = 0;

}

sum += (rms.analyze() – sum) * smooth_factor;

myPort.write(str(int(map((sum * 700), 30, 85, 8, 3))));

myPort.write(‘n’);

}

At the top are the usual library and variable initializations. Then the serial line is started, so we can send the resultant analysis data out to move the jaw. The real magic happens down in the draw loop where the audio files are played and then examined with the Amplitude function. The output data is smoothed out a bit and then proportionally mapped into the zero through nine range expected by the Arduino-jaw servo subsystem. Each data point terminates with a newline.

Notice that I used two different .wav files. I simply chose one or the other using the “execloop” variable, at runtime. My plan is to put the audio response file names in a text file and then step through the list as Hedley and I talk. I’ll add a new push button to my wired slide clicker and use that as the trigger when I want to jump to the next phrase. A fake antique microphone will go on top of the clicker but won’t be connected to anything at this point. With a little practice, it should look like we are talking with each other.

I’ve tested Hedley on several friends and family members and they thought his “talking skull” effect looked pretty cool and realistic.

Going Further

There is much to do as we get closer to the upcoming Embedded System Conference (ESC) show at the end of October. Hedley is getting anxious to be back up on stage.

The .wav file list function is next and then I’ll program the extra button to advance through the phrases. Hedley and I will also have to put together several conversations we’ll use for our act. And, of course, there will be lots of rehearsal… so I remember my lines. Our shtick will need to be woven in with the slides.

By-the-way, we’ll be running the slides using LibreOffice from Hedley’s onboard Raspberry Pi 3. Maybe I should start calling him my “skulltop computer.” Talk about your case mod!

Feature image via Pixabay.

InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Torq.

Source: InApps.net

Rate this post

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise in software development to emerging tech trends like AI and IoT—my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.