Let's take a step back

Last time I said I had to try Autocorrelation in order to guess a note from an audio input. To explain this principle, I need to come back on a few things. I forgot to mention that we can represent frequencies as sine waves. For instance, here is how we could represent a frequency of 440Hz :

But most of the time, a sound is a sum of frequencies, for instance if we combine A4 (440Hz) and C#5 (554.37Hz), it looks like this :

But in real life, what we get is the sum of those frequencies, which looks like this :

As notes are louder or quieter, their frequency amplitudes differ. For instance, if A has 4 times more amplitude than C#, it results in a sine wave looking like this :

As you can see, a frequency is a pattern replicated within each period. Previously we relied on getByteFrequencyData from the analyser to get frequencies in bins of 1.46484375Hz. But this led us to nothing. What we will do now, is access the sound information at a given time and verify if it repeats within a period. With this period we will then be able to calculate our main frequency : frequency = 1/period.

Remember when I said that frequencies are sine waves ? This is where is comes handy. By offsetting it gradually we can verify if it repeats.

Here is a proof of concept :

Use the range input below to change the period of the red sine wave.

P = 0.00002s

F = 1/P = 50000Hz

You will probably find that the frequency is around 450Hz, it's due to rendering but anyway, it's close to our 440Hz, so, you get the idea !

Let's build the tuner

First, I prefer to extract the logic to create our analyser inside a hook and use it on demand :

export const FFT_SIZE = 32768

export function useAnalyser(stream: MediaStream, fftSize: number = FFT_SIZE): [] | [AnalyserNode, MediaStreamAudioSourceNode] {
  if (!stream ) {
    return []
  }
  const ctx = new AudioContext()
  const analyser = ctx.createAnalyser()
  analyser.fftSize = fftSize
  analyser.minDecibels = -100;
  analyser.maxDecibels = -10;
  analyser.smoothingTimeConstant = 0.85;
  const sourceNode = ctx.createMediaStreamSource(stream)
  sourceNode.connect(analyser)
  return [analyser, sourceNode]
}

Then invoke the hook and create a function to start using our tuner extract timebased data from our analyser. It will use a setInterval that we can stop when we later want the tuner to stop.

const [analyser] = useAnalyser(audioStream, 2048)

function start() {
  if (analyser) {
    const interval = setInterval(() => {
      const buffer = new Float32Array(analyser.fftSize)
      analyser.getFloatTimeDomainData(buffer)
      // This will be where we run our autoCorrelate function to guess the frequency
    }, 500) as unknown as number
    setGuessingInterval(interval)
  }
}

We then use this wonderful function made for us by a PhD student, rewritten for reading purposes :

// Thanks to
// https://github.com/cwilso/PitchDetect/pull/23
// https://github.com/dalatant
// https://alexanderell.is/posts/tuner/

// https://en.wikipedia.org/wiki/Root_mean_square
function getRootMeanSquare(buffer: Float32Array) {
  let sum = 0
  for (let i = 0; i < buffer.length; i++) {
    sum += buffer[i] * buffer[i]
  }
  return Math.sqrt(sum / buffer.length)
}

function buildBufferWithThreshold(buffer: Float32Array) {
  // Since the main sound is a sum of frequencies it can be seen as a composed sine wave
  // We can predict that it will repeat, so we can narrow the buffer down, to several occurences with a threshold
  let r1 = 0;
  let r2 = buffer.length - 1;
  let thres = 0.2;

  // Split buffer un half and check where we lose frequency amplitude so we can work on less data 
  for (let i = 0; i < buffer.length / 2; i++) {
    if (Math.abs(buffer[i]) < thres) {
      r1 = i;
      break;
    }
  }

  for (let i = 1; i < buffer.length / 2; i++) {
    if (Math.abs(buffer[buffer.length - i]) < thres) {
      r2 = buffer.length - i;
      break;
    }
  }

  return buffer.slice(r1, r2)
}

// Implements the ACF2+ algorithm
export function autoCorrelate(buffer: Float32Array, sampleRate: number) {
  // Calculate the root mean square to see if we have enough signal to perform operations
  let rootMeanSquare = getRootMeanSquare(buffer)

  if (rootMeanSquare < 0.01) {
    return -1;
  }

  const newBuffer = buildBufferWithThreshold(buffer)

  // Build correlation array containing ordinates values multiplied by a moving offset.
  // The index that will have the greater value will be the peak of our sine wave
  let correlation = new Array(newBuffer.length).fill(0);
  for (let i = 0; i < newBuffer.length; i++) {
    for (let j = 0; j < newBuffer.length-i; j++) {
      correlation[i] += newBuffer[j] * newBuffer[j+i];
    }
  }

  let d = 0;

  // Find the first bottom within our new range
  while (correlation[d] > correlation[d + 1]) {
    d++;
  }

  let maxVal = -1
  let maxPos = -1;

  // From that index find the first peak within the correlation array, our desired abscissa
  for (var i = d; i < newBuffer.length; i++) {
    if (correlation[i] > maxVal) {
      maxVal = correlation[i];
      maxPos = i;
    }
  }
  var T0 = maxPos;

  // From the original author:
  // interpolation is parabolic interpolation. It helps with precision. We suppose that a parabola pass through the
  // three points that comprise the peak. 'a' and 'b' are the unknowns from the linear equation system and b/(2a) is
  // the "error" in the abscissa. Well x1,x2,x3 should be y1,y2,y3 because they are the ordinates.
  // ----
  // Note: x1, x2 and x3 were the previous names of y1, y2 and y3
  // ----
  // From me:
  // What I can comprehend on this, is that we have a parabolic equation, but are unsure which of those three points
  // is the closest to our desired frequency. This bits sorts it out by comparing which is closer to the parabolic 
  // peak.
  let y1 = correlation[T0 - 1]
  let y2 = correlation[T0]
  let y3 = correlation[T0 + 1];

  let a = (y1 + y3 - 2 * y2) / 2;
  let b = (y3 - y1) / 2;
  if (a) {
    T0 = T0 - b / (2 * a);
  }

  return sampleRate / T0;
}

It may seem complicated but it only is what I was writing about before. Calculate the autocorrelation, finding it's first peak, and boom, we have our period. We then just add it to our start function :

function start() {
  if (analyser) {
    const interval = setInterval(() => {
      const buffer = new Float32Array(analyser.fftSize)
      analyser.getFloatTimeDomainData(buffer)
      // Guess the frequency
      setGuessedFrequency(autoCorrelate(buffer, analyser.context.sampleRate))
    }, 500) as unknown as number
    setGuessingInterval(interval)
  }
}

Let's just add a function to stop the tuner, and we'll be ready to wrap it up !

function stop() {
  clearInterval(guessingInterval)
}

That's it, we have a tuner, that every 500ms will take the sound input and guess its main pitch.

The process explained with charts

At a given time, we extract our timebased values, notice how there is 2048 values, it's our FFT_SIZE :

We filter values that are above a specific threshold. Well, it doesn't do that much in our case, but anyway :

We calculate correlation for each abscissa :

And we find our period :

We have a period of 401, but our sample rate is 44100 samples per second. What remains is dividing our sample rate by our period, to obtain our desired frequency : 44100 / 401 = 109.975062344Hz. Which is really close to the frequency of an A string on a guitar : 110.0Hz ! This is the string I plucked for the test. It means that it works, and I am in tune, just barely barely flat !

What if we want to use it to tune a guitar ?

Well, that was the purpose of this whole experiment. Let's use MTTS to generate an array containing tuning notes for our guitar. It is a library that I created to manipulate music theory conceps in typescript. We can use it to easily create notes, scales, chords, etc. :

const notes = [
  Note.fromSPN('E2'),
  Note.fromSPN('A2'),
  Note.fromSPN('D3'),
  Note.fromSPN('G3'),
  Note.fromSPN('B3'),
  Note.fromSPN('E4'),
]

We now have an array containing notes to test our guesssed frequency against. Let's create a function that recursively take a frequency and an array of notes and returns the note with the closest frequency.

function findClosestNote(searchedFrequency: number, notesInRange: Note[]): Note {
  if (notesInRange.length <= 3) {
    let noteFound = notesInRange[0]
    let closest = Math.abs(searchedFrequency - noteFound.frequency)
    for (let i = 1; i < notesInRange.length; i++) {
      const frequencyDifference = Math.abs(searchedFrequency - notesInRange[i].frequency)
      if (closest > frequencyDifference) {
        noteFound = notesInRange[i]
        closest = frequencyDifference
      }
    }
    return noteFound
  }

  const secondHalf = notesInRange.slice(notesInRange.length / 2, notesInRange.length)

  if (searchedFrequency < secondHalf[0].frequency) {
    return findClosestNote(searchedFrequency, notesInRange.slice(0, (notesInRange.length / 2) + 1))
  }

  return  findClosestNote(searchedFrequency, notesInRange.slice((notesInRange.length / 2) - 1, notesInRange.length))
}

And use it on our frequency inside our component :

useEffect(() => {
  if (guessedFrequency > 0) {
    setGuessedNote(findClosestNote(guessedFrequency, notes))
  } else {
    setGuessedNote(undefined)
  }

  return undefined
}, [setGuessedNote, guessedFrequency])

Just to wrap it up, we add wheter our frequency is flat or sharp to our closest note. To avoid testing against a fixed frequency, that is impossible to have on a guitar, we add a margin of 0.3Hz.

const AVERAGE_FLUCTUATION = 0.3
const inTune = Math.abs(guessedFrequency - guessedNote?.frequency) < AVERAGE_FLUCTUATION
const sharp = Math.abs(guessedFrequency - guessedNote?.frequency) > AVERAGE_FLUCTUATION && (guessedFrequency - guessedNote?.frequency) > 0
const flat = Math.abs(guessedFrequency - guessedNote?.frequency) > AVERAGE_FLUCTUATION && (guessedFrequency - guessedNote?.frequency) < 0

And here we have our tuner :

The full code of this project can be found here : https://github.com/FaXaq/website/tree/master/app/blog/tuner-pt2