AI4All Day 9: Starting the Capstone Project, AI to Track and Interpret Human Movement, AI in Autonomous Vehicles

Nidhi Parthasarathy
Aug 20, 2022

Nidhi Parthasarathy, Friday, July 8th 2022

Starting the Project

Today, we were introduced to the dataset and started working on our capstone projects. The dataset we were going to study was the “chest” dataset in the MedMNIST database.

Screenshot of MedMNIST database [source]

It was a multi-label chest X-ray dataset for identifying diseases: each image can carry several of 14 disease labels at once (e.g., cardiomegaly, pneumonia), rather than a single class. This was really fun to start playing with!
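For anyone curious, here is a minimal sketch of how the dataset can be loaded, assuming the `medmnist` Python package (`pip install medmnist`); the exact attribute names may differ slightly between package versions.

```python
# Minimal sketch of loading ChestMNIST -- assumes the `medmnist` package.
from medmnist import ChestMNIST, INFO

train = ChestMNIST(split="train", download=True)  # 28x28 grayscale chest X-rays
print(train.imgs.shape)             # (N, 28, 28) image array
print(train.labels.shape)           # (N, 14): one binary column per disease label
print(INFO["chestmnist"]["label"])  # names of the 14 disease labels
```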

We were going to train a linear model first and then a CNN (convolutional neural network). To use a linear model, we had to flatten the 2-D images into vectors. However, this process took a lot of time. In the meantime, we continued to learn more about how the model worked. (After the class, when we checked, we realized that this step was expected to take 9 hours, so we decided to take a different approach.)
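As a rough illustration of the idea (not our exact capstone notebook), the flatten-then-linear-model approach might look like the PyTorch sketch below; `BCEWithLogitsLoss` handles the multi-label targets, and the layer sizes and training loop are just illustrative assumptions.

```python
# Hedged sketch: flatten ChestMNIST images and fit a linear model for multi-label prediction.
import torch
import torch.nn as nn
from medmnist import ChestMNIST

train = ChestMNIST(split="train", download=True)
X = torch.tensor(train.imgs, dtype=torch.float32).view(-1, 28 * 28) / 255.0  # flatten to 784-d vectors
y = torch.tensor(train.labels, dtype=torch.float32)                          # (N, 14) multi-hot labels

model = nn.Linear(28 * 28, 14)          # one logit per disease label
loss_fn = nn.BCEWithLogitsLoss()        # multi-label objective
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):                  # a few full-batch steps, just to illustrate
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

A CNN version would keep the 2-D structure of the images instead of flattening them, which is what we planned to try next.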

Project screenshot from analysis code lab

Interpreting Human Movement and Behaviors in Health Applications with AI

In the afternoon session, we had a talk from Serena Yeung of Stanford on her ongoing research on using AI to interpret human movement and behaviors in health applications.

She explained how, even before COVID, 78% of physicians and 62% of nurses experienced symptoms of burnout, leading to a 40% increase in patient deaths. These rates have gotten even worse post-COVID. She discussed how AI could help address this problem by providing constant awareness and assistance.

She went into detail on the AI/human partnership, discussing the spectrum of self-driving automation levels from 0 to 5: no automation (manual), driver assistance, partial automation, conditional automation, high automation, and full automation.

She talked about developing AI for automatic interpretation of human behavior using either human mesh estimation or skeletal pose and landmark estimation. She walked through the basics of human mesh recovery, from the image encoder, to the regression step, to the recovered camera, shape, and pose parameters. For estimating face pose, she needed to define 98 points on the face and train a neural network seven thousand times to make it work.
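To make the landmark-estimation idea concrete, here is a hypothetical sketch of a network that regresses 98 (x, y) facial keypoints from an image; the architecture and input size are my own illustrative assumptions, not the actual research model.

```python
# Toy landmark regressor: a small CNN encoder plus a head that predicts 98 (x, y) points.
import torch
import torch.nn as nn

class LandmarkRegressor(nn.Module):
    def __init__(self, num_points: int = 98):
        super().__init__()
        self.num_points = num_points
        self.encoder = nn.Sequential(                  # tiny convolutional encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_points * 2)      # regress (x, y) per landmark

    def forward(self, img):
        return self.head(self.encoder(img)).view(-1, self.num_points, 2)

# usage: predict landmarks for a batch of 128x128 RGB face crops
points = LandmarkRegressor()(torch.randn(4, 3, 128, 128))  # -> shape (4, 98, 2)
```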

She also talked about developing a tool that uses computer vision and audio analysis to measure and track key parent-child interactions, and how these measurements could be used to improve real-life parent-child interactions at home. She discussed how responsible parenting led to positive outcomes. Unlike lab setups with eye trackers and head-mounted cameras, which cannot capture natural behavior, such long-term observation in natural home settings is much more effective at providing valuable insights.

She also talked about another way to do human mesh reconstruction from single images that leverages gaze direction or the most common child body positions. She also described visibility and touch features computed per frame — e.g., five visibility labels for adult-child interactions (who is looking at whom, etc.).
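As a concrete (and entirely hypothetical) way to picture those per-frame features, one could store something like the record below for each video frame; the exact label set here is my assumption, not the study's.

```python
# Illustrative per-frame feature record for adult-child interaction tracking.
from dataclasses import dataclass

@dataclass
class FrameFeatures:
    adult_visible: bool           # is the adult in the frame?
    child_visible: bool           # is the child in the frame?
    adult_looks_at_child: bool
    child_looks_at_adult: bool
    mutual_gaze: bool
    touching: bool                # physical contact between adult and child

frame = FrameFeatures(True, True, True, False, False, True)
print(frame)
```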

I found the talk quite inspiring, particularly the results in real-life interactions of parents and children.

More on Autonomous Vehicles

The last session of the day was a demo from former Stanford student Rachel Gardner on Aurora (a self-driving technology company). She started off discussing the challenges with self-driving and why it is taking so long. First, the safety implications require a much higher level of reliability than other AI applications. Second, data collection in the real world is very expensive. Finally, there is the long-tail problem: roughly 90% of your data looks the same, and the remaining 10% is the most important.

She also talked about control: transferability and generalizability are key, with learned models adapting to the specifics of each new platform. She also talked about building on engineered foundations (e.g., detectors and tracking).

She also talked about long-range sensing. A fully loaded semi-truck at 65 mph can take over 500 feet to stop, compared to only about 300 feet for a passenger vehicle. To address this, Aurora developed the first lidar solution that can see more than 300 meters away. It is not limited by solar loading, is immune to sensor interference, and measures velocity at every point.
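As a back-of-the-envelope check on those stopping distances, the usual braking-distance formula d = v^2 / (2a) with assumed deceleration values (which vary with load, brakes, and road surface) gives numbers in the same ballpark:

```python
# Rough braking-distance estimate under assumed decelerations -- illustrative only.
MPH_TO_MPS = 0.44704
M_TO_FT = 3.28084

def braking_distance_ft(speed_mph: float, decel_mps2: float) -> float:
    v = speed_mph * MPH_TO_MPS
    return (v ** 2) / (2 * decel_mps2) * M_TO_FT   # d = v^2 / (2a), converted to feet

print(braking_distance_ft(65, 2.5))  # loaded semi, gentle deceleration: ~550 ft
print(braking_distance_ft(65, 5.0))  # passenger car, firmer braking:   ~280 ft
```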

She ended with an overview of how self-driving works, discussing how maps, localization, perception, planning, and control together make an Aurora self-driving vehicle work.
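As a toy way to picture how those stages connect (my own simplification, not Aurora's actual stack), each control cycle can be thought of as the pipeline below; every function here is an illustrative stub.

```python
# Toy self-driving control cycle: maps + localization -> perception -> planning -> control.
def localize(sensors, hd_map):
    return sensors["gps"]                        # pretend pose = raw GPS fix on the map

def perceive(sensors):
    return sensors.get("detections", [])         # tracked objects from the sensor suite

def plan(pose, objects, hd_map):
    return "slow_down" if objects else "follow_lane"

def control(action, pose):
    return {"follow_lane": (0.0, 0.3), "slow_down": (0.0, -0.2)}[action]  # (steer, accel)

def drive_one_tick(sensors, hd_map):
    pose = localize(sensors, hd_map)             # maps + localization
    objects = perceive(sensors)                  # perception
    action = plan(pose, objects, hd_map)         # planning
    return control(action, pose)                 # control

print(drive_one_tick({"gps": (37.4, -122.1), "detections": ["pedestrian"]}, hd_map=None))
```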

Virtual Social

We ended the week with another virtual social.

The Summary For The Week

Even though this was a shorter week due to the July 4th holiday, we still learned a lot. It was great to learn about different machine learning models, and even more fun to start playing with our dataset and building our own models. There was lots of trial and error in building models and learning about different approaches (and using GPUs to run model training faster!). Looking forward to working more on the project next week!

Continue to the blog post on day 10.
