Looking for efficient and smooth motion tracking, last week I decided to shift from cross-platform app development to a browser-based application. The reason behind this transition was the laggy motion-tracking performance of the Expo app on my Android phone. Despite several attempts to improve its performance, the results weren't satisfactory.

The Struggle with Laggy Motion Tracking

After spending several days experimenting with different approaches to improve motion tracking in my cross-platform Expo app, I kept running into lag and technical issues. The breaking point came when I saw my Android phone running the official TensorFlow.js React Native pose detection example at a maximum of 8 frames per second (fps). Determined to find a smoother solution, I decided to explore alternatives.

Choosing React PoseNet for Browser-Based Motion Tracking

After looking for a better motion tracking experience, I switched to a React PoseNet web application. This approach uses the react-webcam package for webcam input, and the @tensorflow/tfjs and @tensorflow-models/posenet packages for the motion tracking model.
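Assuming an existing React project, these packages can be installed with npm (package names taken from the imports in the implementation below):

```shell
# Webcam component, TensorFlow.js runtime, and the PoseNet model
npm install react-webcam @tensorflow/tfjs @tensorflow-models/posenet
```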

Smoothing Out Motion Tracking Output

After implementing a basic PoseNet example, my next task was to address the jitter in the motion tracking output. To achieve this, I experimented with different smoothing techniques, including the Kalman filter and a moving average filter. After careful evaluation, I chose the moving average filter because it is lightweight yet effective at producing smoother results.

// Number of recent estimates to average over
let previousEstTreshold = 5; // starting value

// One history array per keypoint (PoseNet tracks 17 keypoints)
let previousEstX = [];
let previousEstY = [];

for (let i = 0; i < 17; i++) {
   previousEstX.push([]);
   previousEstY.push([]);
}

// Incremental (running) mean of an array
const calculateAverageOfArray = (arr) => {
   return arr.reduce((sum, element, index) => sum + (element - sum) / (index + 1), 0);
}

const calculateKeypointAverage = (keypoint, index) => {
    // Drop the oldest sample once the window is full
    if (previousEstX[index].length === previousEstTreshold) {
        previousEstX[index].shift();
    }
    if (previousEstY[index].length === previousEstTreshold) {
        previousEstY[index].shift();
    }

    // Store the newest observation
    previousEstX[index].push(keypoint.position.x);
    previousEstY[index].push(keypoint.position.y);

    // Average the window to get the smoothed position
    const tempX = calculateAverageOfArray(previousEstX[index]);
    const tempY = calculateAverageOfArray(previousEstY[index]);

    return {
        position: {
            x: tempX,
            y: tempY
        },
        score: keypoint.score
    };
}

const updateTracker = async () => {
    if (net) {
        // net, video, ctx, canvas, and the estimation parameters are set up elsewhere
        const poses = await net.estimateSinglePose(video, imageScaleFactor, flipHorizontal, outputStride);
        ctx.clearRect(0, 0, canvas.width, canvas.height);

        poses.keypoints.forEach((keypoint, index) => {
             const keypointObj = calculateKeypointAverage(keypoint, index);
             drawKeypoint(keypointObj);
        });
    } else {
        console.log("Lost connection to motion tracker");
        setupTracker();
    }

    requestAnimationFrame(updateTracker);
};

Above you can see two arrays that hold the X and Y position values over a number of iterations, represented by the previousEstTreshold variable. The function calculateAverageOfArray() computes the average of the last N observations, and that value is drawn on the canvas at the keypoint position. The function calculateKeypointAverage() stores the most recent values in an array and averages them into a single value, which ultimately makes the model's output smoother.
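As a quick sanity check, the incremental reduce used in calculateAverageOfArray() produces the same result as the ordinary sum-and-divide mean (the sample values below are made up to stand in for recent X positions of one keypoint):

```javascript
// Incremental (running) mean, as used in calculateAverageOfArray()
const calculateAverageOfArray = (arr) => {
    return arr.reduce((sum, element, index) => sum + (element - sum) / (index + 1), 0);
};

// Plain mean for comparison
const plainMean = (arr) => arr.reduce((a, b) => a + b, 0) / arr.length;

const samples = [120.4, 121.1, 119.8, 122.6, 120.9]; // e.g. recent X positions
console.log(calculateAverageOfArray(samples)); // matches plainMean(samples)
```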

While the moving average filter reduced the jitter, further optimization is still needed. To address the slight delay the filter introduces, I plan to use a dynamic threshold based on keypoint velocity, adapting the filtering process on the fly. Instead of a fixed previousEstTreshold = 5, this value will be adjusted dynamically.
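One possible shape for that idea (a sketch, not the final implementation; dynamicThreshold and all of its bounds are placeholder names and values to be tuned) is to shrink the averaging window when a keypoint moves quickly, trading smoothness for responsiveness, and grow it when the keypoint is nearly still:

```javascript
// Sketch: pick a window size from the keypoint's speed between two frames.
// Fast movement -> small window (less lag); slow movement -> large window (more smoothing).
// minWindow, maxWindow, and maxSpeed are placeholder values.
const dynamicThreshold = (prevPos, currPos, minWindow = 2, maxWindow = 10, maxSpeed = 50) => {
    const dx = currPos.x - prevPos.x;
    const dy = currPos.y - prevPos.y;
    const speed = Math.sqrt(dx * dx + dy * dy); // pixels per frame

    // Map speed linearly onto [maxWindow, minWindow], clamped at maxSpeed
    const t = Math.min(speed / maxSpeed, 1);
    return Math.round(maxWindow - t * (maxWindow - minWindow));
};

console.log(dynamicThreshold({ x: 100, y: 100 }, { x: 100, y: 100 })); // still  -> 10
console.log(dynamicThreshold({ x: 100, y: 100 }, { x: 160, y: 100 })); // moving -> 2
```

The returned value would replace previousEstTreshold on each frame, so the history arrays get trimmed more aggressively during fast movement.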

Code Implementation

The entire React code implementation is as follows:

import React, { useEffect, useRef } from 'react';
import './App.css';
import Webcam from 'react-webcam';
import '@tensorflow/tfjs';
import * as posenet from '@tensorflow-models/posenet';


function App() {
  const webcamRef = useRef(null);
  const canvasRef = useRef(null);

  const keypointMinConfidence = 0.5; // minimum score for a keypoint to be drawn

  let previousEstTreshold = 5; // starting value

  let previousEstX = [];
  let previousEstY = [];

  for (let i = 0; i < 17; i++) {
      previousEstX.push([]);
      previousEstY.push([]);
  }

  let model = null;

  const loadPosenet = async () => {
    model = await posenet.load({
      architecture: 'ResNet50',
      outputStride: 32,
      inputResolution: { width: 257, height: 200 },
      quantBytes: 2
    });
    console.log("Model loaded!");
    setupCamera();
  };

  const setupCamera = () => {
    if (
      typeof webcamRef.current !== "undefined" &&
      webcamRef.current !== null &&
      webcamRef.current.video.readyState === 4
    ) {
      const videoWidth = webcamRef.current.video.videoWidth;
      const videoHeight = webcamRef.current.video.videoHeight;

      webcamRef.current.video.width = videoWidth;
      webcamRef.current.video.height = videoHeight;

      // Match the canvas to the video so keypoints line up
      canvasRef.current.width = videoWidth;
      canvasRef.current.height = videoHeight;

      updateTracker();
    } else {
      // Webcam not ready yet -- try again shortly
      setTimeout(setupCamera, 100);
    }
  };

  const drawKeypoint = (keypoint) => {
    const ctx = canvasRef.current.getContext("2d");

    if (keypoint.score >= keypointMinConfidence) {
        ctx.beginPath();
        ctx.arc(keypoint.position.x, keypoint.position.y, 5, 0, 2 * Math.PI);
        ctx.fillStyle = 'red';
        ctx.fill();
        ctx.closePath();
    }
  };

  const calculateKeypointAverage = (keypoint, index) => {
    // Drop the oldest sample once the window is full
    if (previousEstX[index].length === previousEstTreshold) {
        previousEstX[index].shift();
    }
    if (previousEstY[index].length === previousEstTreshold) {
        previousEstY[index].shift();
    }

    // Store the newest observation
    previousEstX[index].push(keypoint.position.x);
    previousEstY[index].push(keypoint.position.y);

    // Average the window to get the smoothed position
    const tempX = calculateAverageOfArray(previousEstX[index]);
    const tempY = calculateAverageOfArray(previousEstY[index]);

    return {
        position: {
            x: tempX,
            y: tempY
        },
        score: keypoint.score
    };
  };

  // Incremental (running) mean of an array
  const calculateAverageOfArray = (arr) => {
    return arr.reduce((sum, element, index) => sum + (element - sum) / (index + 1), 0);
  };

  const updateTracker = async () => {
    const poses = await model.estimateSinglePose(webcamRef.current.video);

    const ctx = canvasRef.current.getContext("2d");
    ctx.clearRect(0, 0, canvasRef.current.width, canvasRef.current.height);

    poses.keypoints.forEach((keypoint, index) => {
      const keypointObj = calculateKeypointAverage(keypoint, index);
      drawKeypoint(keypointObj);
    });

    requestAnimationFrame(updateTracker);
  };

  // Load the model once, after the component has mounted
  useEffect(() => {
    loadPosenet();
  }, []);

  return (
    <div className="App">
      <Webcam ref={webcamRef} />
      <canvas ref={canvasRef} />
    </div>
  );
}

export default App;

For a more in-depth look at the project and to explore the complete codebase, visit the dedicated GitHub repository at fitness-tracker-web. I separated the two different project approaches into different GitHub repositories for clarity.

Conclusion

The transition from cross-platform app development to a browser-based React PoseNet application has proven to be the right choice for this project. I am looking forward to experimenting with more filters and building out the app's interface. Visit the GitHub repository to see the implementation, and stay tuned for further updates as I continue to refine this project.