Let's take a step back
Last time I said I had to try Autocorrelation in order to guess a note from an audio input. To explain this principle, I need to come back on a few things. I forgot to mention that we can represent frequencies as sine waves. For instance, here is how we could represent a frequency of 440Hz :
But most of the time, a sound is a sum of frequencies, for instance if we combine A4 (440Hz) and C#5 (554.37Hz), it looks like this :
But in real life, what we get is the sum of those frequencies, which looks like this :
As notes are louder or quieter, their frequency amplitudes differ. For instance, if A has 4 times more amplitude than C#, it results in a sine wave looking like this :
As you can see, a frequency is a pattern replicated within each period. Previously we relied on getByteFrequencyData
from the analyser to get frequencies in bins of 1.46484375Hz
. But this led us to nothing. What we will do now, is access the sound information at a given time and verify if it repeats within a period. With this period we will then be able to calculate our main frequency : frequency = 1/period
.
Remember when I said that frequencies are sine waves ? This is where is comes handy. By offsetting it gradually we can verify if it repeats.
Here is a proof of concept :
Use the range input below to change the period of the red sine wave.
P = 0.00002s
F = 1/P = 50000Hz
You will probably find that the frequency is around 450Hz, it's due to rendering but anyway, it's close to our 440Hz, so, you get the idea !
Let's build the tuner
First, I prefer to extract the logic to create our analyser inside a hook and use it on demand :
export const FFT_SIZE = 32768
export function useAnalyser(stream: MediaStream, fftSize: number = FFT_SIZE): [] | [AnalyserNode, MediaStreamAudioSourceNode] {
if (!stream ) {
return []
}
const ctx = new AudioContext()
const analyser = ctx.createAnalyser()
analyser.fftSize = fftSize
analyser.minDecibels = -100;
analyser.maxDecibels = -10;
analyser.smoothingTimeConstant = 0.85;
const sourceNode = ctx.createMediaStreamSource(stream)
sourceNode.connect(analyser)
return [analyser, sourceNode]
}
Then invoke the hook and create a function to start using our tuner extract timebased data from our analyser. It will use a setInterval that we can stop when we later want the tuner to stop.
const [analyser] = useAnalyser(audioStream, 2048)
function start() {
if (analyser) {
const interval = setInterval(() => {
const buffer = new Float32Array(analyser.fftSize)
analyser.getFloatTimeDomainData(buffer)
// This will be where we run our autoCorrelate function to guess the frequency
}, 500) as unknown as number
setGuessingInterval(interval)
}
}
We then use this wonderful function made for us by a PhD student, rewritten for reading purposes :
// Thanks to
// https://github.com/cwilso/PitchDetect/pull/23
// https://github.com/dalatant
// https://alexanderell.is/posts/tuner/
// https://en.wikipedia.org/wiki/Root_mean_square
function getRootMeanSquare(buffer: Float32Array) {
let sum = 0
for (let i = 0; i < buffer.length; i++) {
sum += buffer[i] * buffer[i]
}
return Math.sqrt(sum / buffer.length)
}
function buildBufferWithThreshold(buffer: Float32Array) {
// Since the main sound is a sum of frequencies it can be seen as a composed sine wave
// We can predict that it will repeat, so we can narrow the buffer down, to several occurences with a threshold
let r1 = 0;
let r2 = buffer.length - 1;
let thres = 0.2;
// Split buffer un half and check where we lose frequency amplitude so we can work on less data
for (let i = 0; i < buffer.length / 2; i++) {
if (Math.abs(buffer[i]) < thres) {
r1 = i;
break;
}
}
for (let i = 1; i < buffer.length / 2; i++) {
if (Math.abs(buffer[buffer.length - i]) < thres) {
r2 = buffer.length - i;
break;
}
}
return buffer.slice(r1, r2)
}
// Implements the ACF2+ algorithm
export function autoCorrelate(buffer: Float32Array, sampleRate: number) {
// Calculate the root mean square to see if we have enough signal to perform operations
let rootMeanSquare = getRootMeanSquare(buffer)
if (rootMeanSquare < 0.01) {
return -1;
}
const newBuffer = buildBufferWithThreshold(buffer)
// Build correlation array containing ordinates values multiplied by a moving offset.
// The index that will have the greater value will be the peak of our sine wave
let correlation = new Array(newBuffer.length).fill(0);
for (let i = 0; i < newBuffer.length; i++) {
for (let j = 0; j < newBuffer.length-i; j++) {
correlation[i] += newBuffer[j] * newBuffer[j+i];
}
}
let d = 0;
// Find the first bottom within our new range
while (correlation[d] > correlation[d + 1]) {
d++;
}
let maxVal = -1
let maxPos = -1;
// From that index find the first peak within the correlation array, our desired abscissa
for (var i = d; i < newBuffer.length; i++) {
if (correlation[i] > maxVal) {
maxVal = correlation[i];
maxPos = i;
}
}
var T0 = maxPos;
// From the original author:
// interpolation is parabolic interpolation. It helps with precision. We suppose that a parabola pass through the
// three points that comprise the peak. 'a' and 'b' are the unknowns from the linear equation system and b/(2a) is
// the "error" in the abscissa. Well x1,x2,x3 should be y1,y2,y3 because they are the ordinates.
// ----
// Note: x1, x2 and x3 were the previous names of y1, y2 and y3
// ----
// From me:
// What I can comprehend on this, is that we have a parabolic equation, but are unsure which of those three points
// is the closest to our desired frequency. This bits sorts it out by comparing which is closer to the parabolic
// peak.
let y1 = correlation[T0 - 1]
let y2 = correlation[T0]
let y3 = correlation[T0 + 1];
let a = (y1 + y3 - 2 * y2) / 2;
let b = (y3 - y1) / 2;
if (a) {
T0 = T0 - b / (2 * a);
}
return sampleRate / T0;
}
It may seem complicated but it only is what I was writing about before. Calculate the autocorrelation, finding it's first peak, and boom, we have our period. We then just add it to our start
function :
function start() {
if (analyser) {
const interval = setInterval(() => {
const buffer = new Float32Array(analyser.fftSize)
analyser.getFloatTimeDomainData(buffer)
// Guess the frequency
setGuessedFrequency(autoCorrelate(buffer, analyser.context.sampleRate))
}, 500) as unknown as number
setGuessingInterval(interval)
}
}
Let's just add a function to stop the tuner, and we'll be ready to wrap it up !
function stop() {
clearInterval(guessingInterval)
}
That's it, we have a tuner, that every 500ms will take the sound input and guess its main pitch.
The process explained with charts
At a given time, we extract our timebased values, notice how there is 2048 values, it's our FFT_SIZE :
We filter values that are above a specific threshold. Well, it doesn't do that much in our case, but anyway :
We calculate correlation for each abscissa :
And we find our period :
We have a period of 401, but our sample rate is 44100 samples per second. What remains is dividing our sample rate by our period, to obtain our desired frequency : 44100 / 401 = 109.975062344Hz
. Which is really close to the frequency of an A string on a guitar : 110.0Hz
! This is the string I plucked for the test. It means that it works, and I am in tune, just barely barely flat !
What if we want to use it to tune a guitar ?
Well, that was the purpose of this whole experiment. Let's use MTTS to generate an array containing tuning notes for our guitar. It is a library that I created to manipulate music theory conceps in typescript. We can use it to easily create notes, scales, chords, etc. :
const notes = [
Note.fromSPN('E2'),
Note.fromSPN('A2'),
Note.fromSPN('D3'),
Note.fromSPN('G3'),
Note.fromSPN('B3'),
Note.fromSPN('E4'),
]
We now have an array containing notes to test our guesssed frequency against. Let's create a function that recursively take a frequency and an array of notes and returns the note with the closest frequency.
function findClosestNote(searchedFrequency: number, notesInRange: Note[]): Note {
if (notesInRange.length <= 3) {
let noteFound = notesInRange[0]
let closest = Math.abs(searchedFrequency - noteFound.frequency)
for (let i = 1; i < notesInRange.length; i++) {
const frequencyDifference = Math.abs(searchedFrequency - notesInRange[i].frequency)
if (closest > frequencyDifference) {
noteFound = notesInRange[i]
closest = frequencyDifference
}
}
return noteFound
}
const secondHalf = notesInRange.slice(notesInRange.length / 2, notesInRange.length)
if (searchedFrequency < secondHalf[0].frequency) {
return findClosestNote(searchedFrequency, notesInRange.slice(0, (notesInRange.length / 2) + 1))
}
return findClosestNote(searchedFrequency, notesInRange.slice((notesInRange.length / 2) - 1, notesInRange.length))
}
And use it on our frequency inside our component :
useEffect(() => {
if (guessedFrequency > 0) {
setGuessedNote(findClosestNote(guessedFrequency, notes))
} else {
setGuessedNote(undefined)
}
return undefined
}, [setGuessedNote, guessedFrequency])
Just to wrap it up, we add wheter our frequency is flat or sharp to our closest note. To avoid testing against a fixed frequency, that is impossible to have on a guitar, we add a margin of 0.3Hz
.
const AVERAGE_FLUCTUATION = 0.3
const inTune = Math.abs(guessedFrequency - guessedNote?.frequency) < AVERAGE_FLUCTUATION
const sharp = Math.abs(guessedFrequency - guessedNote?.frequency) > AVERAGE_FLUCTUATION && (guessedFrequency - guessedNote?.frequency) > 0
const flat = Math.abs(guessedFrequency - guessedNote?.frequency) > AVERAGE_FLUCTUATION && (guessedFrequency - guessedNote?.frequency) < 0
And here we have our tuner :
The full code of this project can be found here : https://github.com/FaXaq/website/tree/master/app/blog/tuner-pt2