How to build a (guitar) tuner in JavaScript? - Part 1

What is a tuner?

A tuner is a device that helps tune string-based instruments. It takes a sound as input, outputs a note name if it's close enough, and indicates whether it is sharp, flat, or in tune.

Get audio data from the microphone

First, we need some audio stream from the microphone to analyse. Request access to it, and store the audio stream in a variable.

const audioStreamRequest = navigator.mediaDevices.getUserMedia({ audio: true })

audioStreamRequest
  .then((audioStream) => {
    /* ... Following source code will be there except for function & constants declarations ... */
  })
  .catch(err => {
    console.error("Error when requesting mic audio stream", err)
  })

Inserting our audio into a context

Second, we'll need to inject this audio stream into a context we can analyse. We can use the Web Audio API. It provides an audio context, audio nodes and much more that we can use to manipulate and analyse sound with JavaScript in the browser.

const ctx = new AudioContext()

We'll also define the audio stream from the mic as a "source node" of our audio chain.

const sourceNode = ctx.createMediaStreamSource(audioStream)

Analyse the sound

How do I want to do it?

I have a very crude idea in mind.

Take the notes we want to detect as references. For a guitar, those are: E2, A2, D3, G3, B3, E4. Select the one with the highest magnitude, i.e. the loudest (hopefully the one currently playing), and see if we can tell whether it's in tune.
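Sketched as JavaScript, the idea looks roughly like this (every name here is a placeholder I made up, not final code):

// The crude plan: among our reference notes, pick the loudest one,
// then check whether it is in tune.
const REFERENCE_NOTES = ['E2', 'A2', 'D3', 'G3', 'B3', 'E4']

function guessPlayedNote(magnitudeByNote) {
  // Select the reference note with the highest magnitude in the signal,
  // hopefully the string currently being played.
  return REFERENCE_NOTES.reduce((best, note) =>
    magnitudeByNote[note] > magnitudeByNote[best] ? note : best
  )
}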

Spoiler: This is probably not how to do it

Let's do it

Extracting the data

Now that we have an audio context and audio nodes we can play with, we can analyse the audio stream from the mic. The Web Audio API provides an Analyser Node especially for this purpose.

The AnalyserNode interface represents a node able to provide real-time frequency and time-domain analysis information. It is an Audio Node that passes the audio stream unchanged from the input to the output, but allows you to take the generated data, process it, and create audio visualizations.

It seems that an Analyser Node can output frequencies with their magnitudes? Let's dig into it.

In the documentation there are two methods that we may have a use for:

getFloatFrequencyData: Copies the current frequency data into a Float32Array array passed into it.

getByteFrequencyData: Copies the current frequency data into a Uint8Array (unsigned byte array) passed into it.
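To see the difference, here is a minimal sketch of both calls, assuming an analyserNode like the one we create just below:

// Both methods fill a typed array you allocate yourself.
const floatData = new Float32Array(analyserNode.frequencyBinCount)
analyserNode.getFloatFrequencyData(floatData) // magnitudes in dB (negative floats)

const byteData = new Uint8Array(analyserNode.frequencyBinCount)
analyserNode.getByteFrequencyData(byteData) // magnitudes scaled to 0-255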

We'll extract data from our mic input and see what it looks like.

Create an Analyser Node and connect it to our source node. Since we do not plan to do anything else with our audio stream, we can omit connecting the Analyser Node's output.

const analyserNode = ctx.createAnalyser()
sourceNode.connect(analyserNode)

Our audio node chain looks like this:

---------------         ----------------
| Source Node | ------> | AnalyserNode |
---------------         ----------------

We'll use getByteFrequencyData and try requesting data from the Analyser Node every 300ms.

function collectAnalyserData(analyser) {
  // Uint8Array should be the same length as the frequencyBinCount
  const dataArray = new Uint8Array(analyser.frequencyBinCount)
  analyser.getByteFrequencyData(dataArray)
  console.log(dataArray)

  setTimeout(() => collectAnalyserData(analyser), 300)
}

/* calling the function within the then callback */
collectAnalyserData(analyserNode)

Every 300ms it yields a Uint8Array of 1024 numbers. Those numbers are bytes, so they range between 0 and 255. What do we do with that?

(Screenshot: the Uint8Array from the console)

Understanding the data

I found out that what getByteFrequencyData does under the hood is a Fast Fourier Transform (FFT).

An FFT is an algorithm used to convert a signal from the time domain to the frequency domain.

Let's skip forward a few days of trying to understand what the FFT is. I found two videos on YouTube that sum it up pretty well. Watch the first one, then the second one.

Since we are now in the frequency domain, we have 1024 frequency bins. But what are those frequency bins? And why 1024?

We have 1024 frequency bins because the default fftSize is 2048 and we cannot calculate magnitudes above the Nyquist limit (go watch video two if you haven't), so we only keep fftSize / 2 bins. It means that the FFT will take our audio stream and output 1024 numbers, each representing the magnitude of one frequency bin. A frequency bin is an interval between two frequencies, and its range is calculated as follows: sampleRate / fftSize.

See here for a Stack Overflow explanation.

If I check the sampleRate of my current audio context, I get 48000. Let's use it to calculate our frequency bin range.

const frequencyBinRange = ctx.sampleRate / analyserNode.fftSize

With my values I end up with a frequencyBinRange of 23.4375Hz. That's a huge range. Let's try to reduce it a bit. Can we increase the fftSize? Yes! As long as it's a power of 2 between 2⁵ and 2¹⁵. Let's set it to the maximum, 32768, and see where we end up.

analyserNode.fftSize = 32768

I end up with a much smaller range of 1.46484375Hz. That seems good enough.
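To see the trade-off at a glance, here is a quick sketch, assuming my 48000Hz sample rate, that prints the bin range for every allowed fftSize:

// Frequency bin range for every valid fftSize (powers of 2 from 2^5 to 2^15)
for (let exp = 5; exp <= 15; exp++) {
  const fftSize = 2 ** exp
  console.log(fftSize, 48000 / fftSize) // 2048 -> 23.4375 ... 32768 -> 1.46484375
}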

Guessing the current note

Here is an array containing the guitar string frequencies that will be our reference. I declare it at the start of the file:

const GUITAR_STRING_FREQUENCIES = [
  { name: 'E2', frequency: 82.4068892282175 },
  { name: 'A2', frequency: 110 },
  { name: 'D3', frequency: 146.8323839587038 },
  { name: 'G3', frequency: 195.99771799087463 },
  { name: 'B3', frequency: 246.94165062806206 },
  { name: 'E4', frequency: 329.6275569128699 }
]

Now, if I want the magnitude of each guitar string frequency in my signal, I have to check the magnitude of the corresponding frequency bin. First we'll try with the E2 string. Let's declare a function that takes as parameters a frequency, the frequency bin range, and the FFT results, and returns the magnitude.

To get the index of the frequency bin that corresponds to a given frequency, we simply take the closest lower integer (the floor) of that frequency divided by the frequency bin range.

function getFrequencyMagnitudeFromFFT(frequency, frequencyBinRange, FFTValues) {
  const frequencyBinIndex = Math.floor(frequency / frequencyBinRange)
  return FFTValues[frequencyBinIndex]
}

Let's call it from within our collectAnalyserData function.

function collectAnalyserData(analyser) {
  // Uint8Array should be the same length as the frequencyBinCount
  const dataArray = new Uint8Array(analyser.frequencyBinCount)
  analyser.getByteFrequencyData(dataArray)

  const frequencyBinRange = analyser.context.sampleRate / analyser.fftSize

  const E2Magnitude = getFrequencyMagnitudeFromFFT(GUITAR_STRING_FREQUENCIES[0].frequency, frequencyBinRange, dataArray)
  console.log('E2 magnitude:', E2Magnitude)

  setTimeout(() => collectAnalyserData(analyser), 300)
}

Well, it's working! Let's now map our guitar strings through getFrequencyMagnitudeFromFFT to see each string's magnitude, then reduce the result to find the string with the highest magnitude, hopefully the one currently playing.

function collectAnalyserData(analyser) {
  // Uint8Array should be the same length as the frequencyBinCount
  const dataArray = new Uint8Array(analyser.frequencyBinCount)
  analyser.getByteFrequencyData(dataArray)

  const frequencyBinRange = analyser.context.sampleRate / analyser.fftSize

  const guitarStringWithMagnitudes = GUITAR_STRING_FREQUENCIES.map(gsf => {
    return {
      ...gsf,
      magnitude: getFrequencyMagnitudeFromFFT(gsf.frequency, frequencyBinRange, dataArray)
    }
  })

  // Keep the string with the highest magnitude, hopefully the one playing
  const guessedString = guitarStringWithMagnitudes.reduce((p, c) => p.magnitude < c.magnitude ? c : p)
  console.log('Guessed string:', guessedString)

  setTimeout(() => collectAnalyserData(analyser), 300)
}

I have my guitar right next to me, and when I play the A string, for instance, it prints the guessed string in the console:

(Screenshot: the guessed A string)

SUCCESS!

Is the string in tune?

Here we have a problem. Since the frequency bin range is ~1.46Hz, we can never be sure that our string is in tune within this range. Unless... we try to analyse its surrounding frequency bins?

If a string is in tune, its frequency bin's magnitude should be higher than its neighbours', right?

function isStringInTune(guitarString, frequencyBinRange, FFTValues) {
  // Compare the string's bin magnitude with its two neighbouring bins
  const prevMagnitude = getFrequencyMagnitudeFromFFT(guitarString.frequency - frequencyBinRange, frequencyBinRange, FFTValues)
  const nextMagnitude = getFrequencyMagnitudeFromFFT(guitarString.frequency + frequencyBinRange, frequencyBinRange, FFTValues)

  return prevMagnitude < guitarString.magnitude && nextMagnitude < guitarString.magnitude
}
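The snippets above never actually call isStringInTune, so here is a sketch of how it could be wired into collectAnalyserData, right after guessedString is computed (the log format is mine):

/* inside collectAnalyserData, after guessedString */
const inTune = isStringInTune(guessedString, frequencyBinRange, dataArray)
console.log(guessedString.name, inTune ? 'in tune' : 'out of tune')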

But that still leaves a range of ~1.46Hz within which the string can be out of tune.

I have an idea: see whether the distance between our guitar string's frequency and the starting frequency of its frequency bin has any relation to its neighbouring bins' magnitudes.

Let me try to explain my reasoning. In each data extraction we have 7 data points: the guitar string's frequency, plus the starting frequency and the magnitude of its own frequency bin and of the two bins surrounding it.

Note: Each time I talk about a frequency bin's frequency, I'm referring to its starting frequency.

For the E2 string, its frequency is 82.4068892282175Hz. It means that its distance from its frequency bin is 82.4068892282175 - Math.floor(82.4068892282175 / frequencyBinRange) * frequencyBinRange, which is equal to 0.3756392282175. This is less than half of our frequency bin range, so it is closer to the previous frequency bin than to the next one.
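As a small helper, that computation looks like this (getDistanceFromBinStart is a name I'm introducing just for illustration):

// Distance between a frequency and the starting frequency of its bin
function getDistanceFromBinStart(frequency, frequencyBinRange) {
  return frequency - Math.floor(frequency / frequencyBinRange) * frequencyBinRange
}

getDistanceFromBinStart(82.4068892282175, 1.46484375) // ~0.3756, in the lower half of the bin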

Let's see whether, when the E string is in tune, the previous frequency bin's magnitude is higher than the next frequency bin's magnitude.

SUCCESS!

Now, after tuning it slightly sharp.

And after tuning it slightly flat.

SUCCESS!

OK, so my intuition worked as expected. See the data for the G3 string: its distance from its frequency bin is 1.17349924087463, which means it's closer to the next frequency bin than to the previous one. So, when in tune, the next frequency bin's magnitude should be higher than the previous one's.
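Putting that intuition into code with the getDistanceFromBinStart helper sketched above, the neighbour that should be louder when a string is in tune depends on which half of its bin the string's frequency falls in (again, a sketch of my reasoning, not a proven method):

// When in tune, the louder neighbouring bin should be the one
// the string's frequency is closest to.
function expectedLouderNeighbour(frequency, frequencyBinRange) {
  const distance = getDistanceFromBinStart(frequency, frequencyBinRange)
  return distance < frequencyBinRange / 2 ? 'previous' : 'next'
}

expectedLouderNeighbour(82.4068892282175, 1.46484375)   // 'previous' (E2)
expectedLouderNeighbour(195.99771799087463, 1.46484375) // 'next' (G3)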

After seeing those graphs I don't know how to proceed to find the correlation. I tried looking at the coefficients of the lines' equations but... nothing. If anyone has an idea, don't hesitate to fork the GitHub repo and make a PR.

You can try it here:

Conclusion

Although it's a bit of a failure and probably not the right way to do it, I learned a lot by starting from the mic input and doing it all by hand with the Web Audio API and the FFT. After reading a few articles, I found out that I need to look into something called autocorrelation. That will be for part two.

Thanks for reading.

Links

  1. Github repo

  2. Web Audio API