Playing a piano with your eyes via gaze estimation

This article guides you through using Pipeless to play a virtual piano of 8 notes with your eyes, just by looking at the notes.

A step-by-step execution guide is available here.

The code is available under the examples folder at the Pipeless GitHub repository.

Splitting the steps into Pipeless hooks

Before diving into the logic, we have to think about how to compose this use case with Pipeless.

Remember that with Pipeless we just need to implement some hook functions, and it takes care of everything else, invoking our functions at the proper time with each frame.

In this case, we will implement a custom processing hook. Our processing hook will take the frame and produce an output consisting of two points (P1, P2) that represent the direction of the person's gaze: P1 is the center of the left eye and P2 is a point along the gaze direction, so the line joining them indicates where the person is looking.

We will also implement a post-processing hook, which takes the output of the processing hook, calculates which sound we have to play based on the gaze direction, and draws our piano over the frame.

Finally, we will initialize a context with the MediaPipe face mesh module. This allows us to initialize the library just once and read that instance from every invocation of our hooks.
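
To make that structure concrete, the stage could be laid out roughly as in the sketch below. The hook signatures and the frame fields ('original', 'modified', 'user_data') follow the Pipeless Python hook conventions as I recall them (check the Pipeless docs for the exact API), and estimate_gaze and draw_piano_and_play_note are hypothetical placeholders for the logic described in the next sections, so take this as an orientation sketch rather than the exact code of the example:

# init.py: runs once when the stage loads; the returned dict is passed to every hook as `context`
import mediapipe as mp

def init():
    return {
        'face_mesh': mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True),
    }

# process.py: runs for every frame; computes P1 and P2 and stores them in the frame user data
def hook(frame, context):
    p1, p2 = estimate_gaze(frame['original'], context['face_mesh'])  # hypothetical helper
    frame['user_data'] = {'p1': p1, 'p2': p2}

# post-process.py: picks the note from the gaze direction, plays it and draws the piano over the frame
def hook(frame, context):
    gaze = frame['user_data']
    frame['modified'] = draw_piano_and_play_note(frame['modified'], gaze)  # hypothetical helper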

Now, we are ready to go deeper into the actual code logic.

Detecting the eyes and gaze direction

To detect the eyes we use the Face Mesh module of Google's MediaPipe library. This module creates a mesh of points over the faces it recognizes. Our work here is to process the points of that mesh to estimate the point the person is looking at.
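
As a quick illustration, getting the mesh landmarks for a frame with MediaPipe looks roughly like this (a minimal sketch; in the example the FaceMesh instance lives in the Pipeless context, as explained above, rather than at module level):

import cv2
import mediapipe as mp

# Face Mesh model; refine_landmarks=True also provides iris landmarks
face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)

def get_face_landmarks(frame):
    # MediaPipe expects RGB images, while video frames usually come as BGR
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    height, width = frame.shape[:2]
    # Landmarks are normalized to [0, 1]; scale them to pixel coordinates
    return [(lm.x * width, lm.y * height) for lm in results.multi_face_landmarks[0].landmark]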

To get the gaze direction, we are interested in a point P2 that, together with the left eye center (P1), gives us the gaze direction (the line that joins those points).

To do that, we have to project the 2D point of the pupil into the 3D model, using an estimation of the camera matrix and the center of the eyeball in the 3D model. For this task, we use the OpenCV estimateAffine3D function. This function gives us the transformation that we need to apply to the pupil to later obtain the gaze point in the 3D world. After projecting that point back to 2D we have P2.
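
For illustration, the core of that projection looks roughly like the sketch below. The generic 3D face model points, the eyeball center and the camera matrix approximation are common assumptions for this kind of head pose and gaze estimation, not the exact values used in the example:

import cv2
import numpy as np

def estimate_gaze_point(frame, image_points, pupil_2d):
    # `image_points` is a (6, 2) float array of 2D landmarks (nose tip, chin, eye corners,
    # mouth corners) taken from the face mesh; `pupil_2d` is the 2D position of the left pupil.
    # Generic 3D face model points corresponding to the 2D landmarks above (assumed values)
    model_points = np.array([
        (0.0, 0.0, 0.0),        # nose tip
        (0.0, -63.6, -12.5),    # chin
        (-43.3, 32.7, -26.0),   # left eye, left corner
        (43.3, 32.7, -26.0),    # right eye, right corner
        (-28.9, -28.9, -24.1),  # left mouth corner
        (28.9, -28.9, -24.1),   # right mouth corner
    ])
    # Assumed center of the left eyeball in the 3D model
    eye_ball_center_left = np.array([[-29.05], [32.7], [-39.5]])

    # Rough camera matrix estimated from the frame size, assuming no lens distortion
    height, width = frame.shape[:2]
    focal_length = width
    camera_matrix = np.array([
        [focal_length, 0, width / 2],
        [0, focal_length, height / 2],
        [0, 0, 1],
    ], dtype="double")
    dist_coeffs = np.zeros((4, 1))

    # Head pose: rotation and translation of the 3D model with respect to the camera
    _, rotation_vector, translation_vector = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

    # Affine transformation from image space (z=0) to the 3D model space
    image_points_3d = np.hstack((image_points, np.zeros((image_points.shape[0], 1))))
    _, transformation, _ = cv2.estimateAffine3D(image_points_3d, model_points)
    if transformation is None:
        return None

    # Lift the 2D pupil into the 3D model and extend the eyeball-center -> pupil vector
    pupil_world = transformation @ np.array([[pupil_2d[0]], [pupil_2d[1]], [0], [1]])
    gaze_3d = eye_ball_center_left + (pupil_world - eye_ball_center_left) * 10

    # Project the 3D gaze point back onto the image plane: this is P2
    p2, _ = cv2.projectPoints(gaze_3d.reshape(1, 3), rotation_vector, translation_vector,
                              camera_matrix, dist_coeffs)
    return tuple(p2[0][0])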

To keep the article clean, I will not paste the whole code here. The function containing that code can be found here.

Setting up the piano

For the piano, we will create 8 sections, one for each musical note. These sections are arranged radially, with the left eye at the center. In other words, we will create a circle around the left eye divided into 8 slices, where each slice corresponds to a different sound.
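
For instance, the sections can be drawn over the frame with a few OpenCV calls like in the sketch below (an illustrative version, not the exact drawing code of the example):

import math
import cv2

def draw_piano(frame, eye_center, radius=150):
    # Draw a circle around the left eye and split it into 8 slices, one per note
    cx, cy = int(eye_center[0]), int(eye_center[1])
    cv2.circle(frame, (cx, cy), radius, (255, 255, 255), 2)
    for i in range(8):
        # Section boundaries every 45 degrees, offset by 22.5 degrees to match the angle ranges below
        boundary = i * math.pi / 4 + math.pi / 8
        x = int(cx + radius * math.cos(boundary))
        y = int(cy + radius * math.sin(boundary))
        cv2.line(frame, (cx, cy), (x, y), (255, 255, 255), 2)
    return frame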

Then, to know which note we must play, we calculate the angle of the gaze direction. As mentioned above, we use the center of the left eye as the origin (x=0, y=0) of our coordinates.

The calculation is fairly simple. From the previous step we got the eye center (P1) and the estimated gaze direction point (P2). Using basic trigonometry we calculate the angle as atan2(P2y - P1y, P2x - P1x). This returns a value between -pi and pi. When the angle is negative, we add 2*pi to make it positive, so it ends up in the range [0, 2*pi). Then, we just need to compare the angle with the sections we defined and play the sound for that section:

import math

# Frequency (Hz) of each note used to generate the tone
notes = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]

# The angle is in [0, 2*pi); the first section wraps around 0, hence the `or`
if angle > 15*math.pi/8 or angle < math.pi/8:
    play_sound(notes[6], note_duration)
elif math.pi/8 < angle < 3*math.pi/8:
    play_sound(notes[5], note_duration)
elif 3*math.pi/8 < angle < 5*math.pi/8:
    play_sound(notes[4], note_duration)
elif 5*math.pi/8 < angle < 7*math.pi/8:
    play_sound(notes[3], note_duration)
elif 7*math.pi/8 < angle < 9*math.pi/8:
    play_sound(notes[2], note_duration)
elif 9*math.pi/8 < angle < 11*math.pi/8:
    play_sound(notes[1], note_duration)
elif 11*math.pi/8 < angle < 13*math.pi/8:
    play_sound(notes[0], note_duration)
elif 13*math.pi/8 < angle < 15*math.pi/8:
    play_sound(notes[7], note_duration)

Generating the sounds

To generate our sounds we use the Python simpleaudio package. The following is the whole function we use to play a sound. Note that you will hear some clicks between notes because we are not applying any kind of zero-crossing technique to smooth the tone changes.

import numpy as np
import simpleaudio as sa

def play_sound(note, duration):
    # Generate a 16-bit sine wave at the note frequency (Hz) for the given duration (s)
    sample_rate = 44100  # Hz
    samples = (32767 * 0.5 * np.sin(2.0 * np.pi * np.arange(sample_rate * duration) * note / sample_rate)).astype(np.int16)
    # Mono audio, 2 bytes per sample
    wave_obj = sa.WaveObject(samples, 1, 2, sample_rate)
    play_obj = wave_obj.play()
    play_obj.wait_done()

Run the example

To fetch the whole code and run the example, simply execute the following commands:

  • Install Pipeless:
curl https://raw.githubusercontent.com/pipeless-ai/pipeless/main/install.sh | bash
  • Create a project:
pipeless init my-project
cd my-project
  • Download the code:
wget -O - https://github.com/pipeless-ai/pipeless/archive/main.tar.gz | tar -xz --strip=2 "pipeless-main/examples/gaze-piano"
  • Start Pipeless:
pipeless start --stages-dir .
  • Provide a stream from your webcam:
pipeless add stream --input-uri "v4l2" --output-uri "screen" --frame-path "gaze-piano"

You can provide streams from any source including file://, https://, rtsp://, rtmp://, etc.

Conclusions

As you can see, just by writing a couple of hook functions for Pipeless, we created a computer vision application with real utility. You can apply what we did here to many other use cases involving gaze detection. Pipeless, on its side, lets you easily deploy the application to any device or the cloud and makes managing streams really simple.

If you like what we are doing at Pipeless consider supporting us by starring our GitHub repository!

You can also join our community, don't be shy!