In this blog post, I’ll talk about my experience with MediaPipe in my React application. Looking for more landmark stability, I turned to MediaPipe Pose Landmarker, discovering its advantages and overcoming its challenges along the way.

Choosing MediaPipe Pose Landmarker

PoseNet’s somewhat jittery landmarks led me to explore MediaPipe Pose Landmarker in React. I actually ended up removing the smoothing filter I had implemented for PoseNet, because MediaPipe offers options for automatically smoothing detections, including the parameters minPoseDetectionConfidence, minPosePresenceConfidence, and minTrackingConfidence. Finding the right combination of these values gave me the stability I was looking for.
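For reference, these options plug straight into the landmarker configuration shown later in this post. The values below are just the library defaults, not the combination I eventually settled on:

poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
    baseOptions: { /* see the full setup below */ },
    runningMode: "VIDEO",
    // Defaults (0.5) shown for illustration; tune these for your own footage.
    minPoseDetectionConfidence: 0.5,
    minPosePresenceConfidence: 0.5,
    minTrackingConfidence: 0.5
});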

To set up the MediaPipe project correctly, I relied on various learning resources. At the core was the official documentation from Google: the Pose landmark detection guide for web and the MediaPipe Pose Landmarker task for web CodePen example. Additionally, a YouTube video called Google Mediapipe Face Tracking in React with Ready Player Me Avatars! played a key role in resolving some random installation hiccups.

Below I’ve outlined the configuration that worked for me to install MediaPipe in my React project successfully.

npm install @mediapipe/tasks-vision

...

const vision = await FilesetResolver.forVisionTasks("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm");

poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
    baseOptions: {
        modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/latest/pose_landmarker_full.task"
    },
    runningMode: "VIDEO"
});

...

const trackPose = async () => {
    let startTimeMs = performance.now();

    // Only run detection when the video has advanced to a new frame.
    if (lastVideoTime !== video.currentTime) {
        lastVideoTime = video.currentTime;
        poseLandmarker.detectForVideo(video, startTimeMs, (result) => {
            ctx.save();
            ctx.clearRect(0, 0, canvasRef.current.width, canvasRef.current.height);

            // Draw every detected pose: the landmarks first, then the skeleton connections.
            for (const landmark of result.landmarks) {
                drawingUtils.drawLandmarks(landmark, {
                    // Scale the landmark radius with depth so closer points appear larger.
                    radius: (data) => DrawingUtils.lerp(data.from.z, -0.15, 0.1, 5, 1)
                });
                drawingUtils.drawConnectors(landmark, PoseLandmarker.POSE_CONNECTIONS);
            }
            ctx.restore();
        });
    }

    requestAnimationFrame(trackPose);
};
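One detail the elided setup above hides: the loop needs to be kicked off once after the landmarker has finished loading. A minimal sketch of that wiring, assuming the setup code lives in an async function (setupPoseLandmarker is a hypothetical name for that wrapper):

// Hypothetical wiring: wait for the landmarker to load, then start the loop.
const init = async () => {
    await setupPoseLandmarker(); // hypothetical wrapper around the setup above
    trackPose();
};
init();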

If you’d like to swap the MediaPipe model for another variant, below are the model paths you can use in modelAssetPath:

Lite: https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task

Full: https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/latest/pose_landmarker_full.task

Heavy: https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_heavy/float16/1/pose_landmarker_heavy.task

For a more visual representation, I’ve included a demo showcasing the performance of each model.

Dynamically Creating Exercises

The heart of my project lies in the creation of an exercise tracking component for the web app. I spent a lot of time thinking about the best approach. In the beginning, inspired by this post, I considered letting the user choose body landmarks and define their positions and angles relative to a reference such as the Y axis of the body, and generate an exercise that way. But that wouldn’t fit my research goal of offering an easier way to create exercise routines; quite the opposite.

The next idea was to use pose classification, similar to Google’s Teachable Machine. It would have made generating an exercise or sports routine easier, but it would then have been difficult to determine whether an exercise is done correctly, whether the user is becoming more flexible, and so on.

The final idea was a combination of both: let the user record their exercise, generate a pose and landmarks based on the recording, and save it as an exercise for which the app calculates 13 angles (to begin with). These angles are then used to check the live motion tracking of people who try the exercise. The approach of calculating angles for live motion tracking was inspired by a YouTube video on AI Pose Estimation with Python and MediaPipe.

Since I am not naturally mathematically gifted, I started looking for JavaScript methods to calculate the angle between three points, and I found one in this StackOverflow thread.
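Adapted to my use case, the approach looks roughly like this (a sketch; the vertex of the angle is the middle point b):

// Angle at point b (in degrees), formed by the segments b->a and b->c.
// Works on landmark objects with normalized x/y coordinates.
function calculateAngle(a, b, c) {
    const radians = Math.atan2(c.y - b.y, c.x - b.x) - Math.atan2(a.y - b.y, a.x - b.x);
    let angle = Math.abs(radians * (180.0 / Math.PI));
    if (angle > 180.0) {
        angle = 360.0 - angle; // keep the inner angle
    }
    return angle;
}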

So, in order to analyze the pose, the code calculates the angles of the following 13 groups of landmark indices (the middle index is the vertex of each angle):

[0, 11, 12] (nose, left shoulder, right shoulder)
[13, 11, 23] (left elbow, left shoulder, left hip)
[15, 13, 11] (left wrist, left elbow, left shoulder)
[14, 12, 24] (right elbow, right shoulder, right hip)
[12, 14, 16] (right shoulder, right elbow, right wrist)
[11, 23, 25] (left shoulder, left hip, left knee)
[12, 24, 26] (right shoulder, right hip, right knee)
[23, 25, 27] (left hip, left knee, left ankle)
[24, 26, 28] (right hip, right knee, right ankle)
[25, 23, 24] (left knee, left hip, right hip)
[23, 24, 26] (left hip, right hip, right knee)
[11, 23, 25] (left shoulder, left hip, left knee)
[12, 24, 26] (right shoulder, right hip, right knee)
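With the calculateAngle helper from earlier, computing all of these for one detected pose could look like this sketch (ANGLE_TRIPLETS and computePoseAngles are names I’m making up here for illustration):

// Hypothetical helper built on calculateAngle from earlier.
const ANGLE_TRIPLETS = [
    [0, 11, 12], [13, 11, 23], [15, 13, 11],
    // ...the remaining triplets from the list above
];

// Map each [a, b, c] triplet to the angle at its middle landmark b.
const computePoseAngles = (landmarks) =>
    ANGLE_TRIPLETS.map(([a, b, c]) =>
        calculateAngle(landmarks[a], landmarks[b], landmarks[c]));

// Usage, inside the detectForVideo callback:
// const angles = computePoseAngles(result.landmarks[0]);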

To check whether the user is facing the camera or standing sideways, we can calculate the angle between the two shoulders and the nose; based on that, we can offer feedback accordingly and normalize the values of the other angles.
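A minimal sketch of that check, reusing calculateAngle (the threshold is a placeholder I haven’t tuned):

// Angle at the nose (landmark 0) between the two shoulders (11 and 12).
// Facing the camera, the shoulders sit wide apart so the angle is large;
// standing sideways, they overlap in 2D and the angle shrinks.
const isFacingCamera = (landmarks) => {
    const noseAngle = calculateAngle(landmarks[11], landmarks[0], landmarks[12]);
    return noseAngle > 45; // placeholder threshold, needs tuning per setup
};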

Project Structure and Components

Below is an overview of the current project structure and the components it uses.

- src:
    - assets:
        - css:
            - reset.css
            - style.css

    - components:
        - Camera.js - A component that holds the camera feed together with the model setup.
        - Counter.js - A countdown component that gives the user some time before the exercise recording starts.
        - PoseEditor.js - The component that generates the pose after recording and calculates the angles.
        - SnapshotGallery.js - A gallery that shows the "recorded" images (captured every 250 ms) that the user generates.

    - pages:
        - CreateExercise.js
        - Dashboard.js

My routing is done in the following way, using react-router-dom:

// Imports added for completeness; the Layout path is assumed, adjust it to your setup.
import React from 'react';
import ReactDOM from 'react-dom/client';
import { BrowserRouter, Routes, Route } from 'react-router-dom';
import Layout from './components/Layout';
import Dashboard from './pages/Dashboard';
import CreateExercise from './pages/CreateExercise';

const root = ReactDOM.createRoot(document.getElementById('root'));

root.render(
  <BrowserRouter>
    <Routes>
      <Route path="/" element={<Layout />}>
        <Route index element={<Dashboard />} />
        <Route path="createexercise" element={<CreateExercise />} />
      </Route>
    </Routes>
  </BrowserRouter>
);

Sample Code

This code is a work in progress and is just meant to give an idea of how it’s supposed to work; some features aren’t fully fledged yet.

CreateExercise.js

import React, { useEffect, useRef, useState } from 'react';
import Webcam from 'react-webcam';
import { FilesetResolver, PoseLandmarker } from '@mediapipe/tasks-vision';
import Counter from '../components/Counter';
import SnapshotGallery from '../components/SnapshotGallery';
import PoseEditor from '../components/PoseEditor';

function CreateExercise() {
    const [poseSnapshots, setPoseSnapshots] = useState([]);

    // WIP note: these plain variables reset on every re-render; a more
    // polished version would move them into useRef/useState.
    let recordingStarted = false;
    const snapshotInterval = useRef();
    let maxSnapshots = 10;
    let videoWidth, videoHeight;

    const handleRecordingStart = () => {
        recordingStarted = true;
        console.log('Recording started');
        // Take a pose snapshot every 250 ms while recording.
        snapshotInterval.current = setInterval(() => {
            snapshotPose();
        }, 250);
    };

    const handleRecordingStop = () => {
        recordingStarted = false;
        window.clearInterval(snapshotInterval.current);
        console.log('Recording stopped');
    };

    const webcamRef = useRef(null);

    let poseLandmarker;
    let lastVideoTime = -1;
    let video = null;

    const setupPrediction = async () => {
        const vision = await FilesetResolver.forVisionTasks("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm");

        poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
            baseOptions: {
                modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task"
            },
            runningMode: "VIDEO"
        });

        // Make sure the webcam element exists and has loaded enough data
        // (readyState 4 = HAVE_ENOUGH_DATA).
        if (
            typeof webcamRef.current !== "undefined" &&
            webcamRef.current !== null &&
            webcamRef.current.video.readyState === 4
        ) {
            video = webcamRef.current.video;
            videoWidth = video.width;
            videoHeight = video.height;

            // Keep the feed square, sized to the live feed wrapper's
            // smaller dimension depending on orientation.
            if (window.innerHeight >= window.innerWidth || window.matchMedia("(orientation: portrait)").matches) {
                videoWidth = document.querySelector('.livefeed-wrapper').getBoundingClientRect().width;
                videoHeight = videoWidth;
            } else {
                videoHeight = document.querySelector('.livefeed-wrapper').getBoundingClientRect().height;
                videoWidth = videoHeight;
            }

            video.width = videoWidth;
            video.height = videoHeight;
            webcamRef.current.video.width = videoWidth;
            webcamRef.current.video.height = videoHeight;
        } else {
            console.log("Some camera value is undefined");
        }
    };

    useEffect(() => {
        setupPrediction();
    }, [webcamRef]);

    // Stop recording automatically once enough snapshots have been collected.
    useEffect(() => {
        if (poseSnapshots.length >= maxSnapshots) {
            handleRecordingStop();
        }
    }, [poseSnapshots]);

    const snapshotPose = () => {
        let startTimeMs = performance.now();

        if (video) {
            // Skip frames we've already processed.
            if (lastVideoTime !== video.currentTime) {
                lastVideoTime = video.currentTime;

                poseLandmarker.detectForVideo(video, startTimeMs, (result) => {
                    // Store the webcam frame together with the detected landmarks.
                    const snapshot = {
                        screenshot: webcamRef.current.getScreenshot(),
                        landmarks: result.landmarks,
                        timestamp: Date.now(),
                    };

                    setPoseSnapshots(prevSnapshots => [...prevSnapshots, snapshot]);
                });
            }
        }
    };

    return (
        <div>
            <header>Create Exercise</header>

            <div className='livefeed-wrapper'>
                <Webcam ref={webcamRef} className="webcam" />
            </div>

            <Counter onTimerFinish={handleRecordingStart} />

            {(poseSnapshots.length >= maxSnapshots) && (
                <PoseEditor poseCoordinates={poseSnapshots} />
            )}

            <SnapshotGallery poses={poseSnapshots} />
        </div>
    );
}

export default CreateExercise;

SnapshotGallery.js

import React from 'react';

function SnapshotGallery({ poses }) {
    // The poses prop is rendered directly; mirroring it into local state
    // (as an earlier version did) isn't needed.
    return (
        poses.length > 0 && (
            <>
                <div className='snapshot-gallery__title'>Movements</div>
                <div className='snapshot-gallery'>
                    {poses.map((image, index) => (
                        <div key={index} className={`snapshot-gallery__image ${image.selected ? 'selected' : ''}`}>
                            <img src={image.screenshot} alt={`Snapshot ${index + 1}`} />
                        </div>
                    ))}
                </div>
            </>
        )
    );
}

export default SnapshotGallery;

Counter.js

import React, { useEffect, useState } from 'react';

function Counter({ onTimerFinish }) {
    const [timer, setTimer] = useState(5);

    useEffect(() => {
        // Count down once per second and fire the callback when it hits zero.
        const interval = setInterval(() => {
            setTimer((prevTimer) => {
                if (prevTimer > 0) {
                    return prevTimer - 1;
                } else {
                    clearInterval(interval);
                    if (onTimerFinish) {
                        onTimerFinish();
                    }
                    return 0;
                }
            });
        }, 1000);

        return () => clearInterval(interval);
    }, []);

    return (
        <div className='timer-wrapper'>
            {timer > 0 && (
                <div>
                    <div>Recording will start in... </div>
                    <div>{timer}</div>
                </div>
            )}
            {timer <= 0 && (
                <div>Recording in progress!</div>
            )}
        </div>
    );
}

export default Counter;

If you like this project, stay tuned for the next post or follow the progress on my GitHub repo.