Unsupervised Activity Clustering for Energy Expenditure Monitoring

Paper Published in BSN2013, Presentation Slides.


Install NumPy, SciPy, scikit-learn on Mac OS X for data miners

Adapted and summarized from the blog post at http://penandpants.com/2012/02/24/install-python/

Outline: 1. install Xcode –> 2. install pip –> 3. install brew –> 4. install NumPy –> 5. install gfortran (important!) –> 6. install SciPy –> 7. install matplotlib (useful) –> 8. install scikit-learn –> 9. test

Preamble: Python 2.5 or later is preinstalled on the current Mac OS X Lion. To make sure, open Terminal (search for it in Spotlight) and type python after the $ prompt; you should see the installed Python version and be dropped into the Python interpreter. Otherwise, type "sudo easy_install python" to install Python 2.

1. Download Xcode from the App Store and install it. After that, open Xcode, go to Preferences –> Downloads –> Command Line Tools, and click 'Install' to add the command line tools that are not yet available in the shell.

2. The following steps are all done in Terminal. For this step, run: sudo easy_install pip


3. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
(updated according to source: http://brew.sh/)

export PATH=/usr/local/bin:/usr/local/share/python:$PATH

4. sudo pip install numpy

5. brew install gfortran  –> this is a critical step before installing scipy, as many of the latter's dependencies are contained in this package

6. sudo pip install scipy

7. sudo pip install matplotlib

8. sudo pip install scikit-learn

9. launch python and test the installed packages.

(after $) python –>

(after >>>)

import numpy

import scipy

import matplotlib

import sklearn

If the imports raise no errors, all the packages were installed successfully!

In addition, pandas is a handy library for data analysis. To install it:

sudo pip install pandas
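The manual import test in step 9 can also be scripted. Below is a minimal sketch, not part of the original instructions: the helper name check_packages is my own, and it simply tries to import each package and reports its version string if it exposes one.

```python
import importlib


def check_packages(names):
    """Try to import each package; map name -> version string, or None if missing."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None  # package is not installed (or failed to import)
        else:
            # Most scientific packages expose __version__; fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


# The packages installed in steps 4 to 8, plus pandas.
report = check_packages(["numpy", "scipy", "matplotlib", "sklearn", "pandas"])
for name in sorted(report):
    status = report[name] if report[name] is not None else "NOT INSTALLED"
    print("%s: %s" % (name, status))
```

Any package reported as NOT INSTALLED points back to the corresponding pip step above.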


Matlab Figure Plot Font Size Permanent Change

Tired of changing the font size of titles and labels after plotting a figure so that the characters stay legible at a smaller scale? Matlab's recommended solution is to get the handle of each text object by calling

h = xlabel('blah'); set(h,'FontSize',10);

h = ylabel('blah'); set(h,'FontSize',10);

h = title('blah'); set(h,'FontSize',10);

However, I found that I invariably had to change the font size, since Matlab's default label size is too small once the figure is scaled down. To spare myself from writing the lines that set the properties of the figure objects (labels, titles, etc.) every time, I decided to change the default permanently in Matlab's plotting toolbox. The functions are located in:


After the lines

if isempty(ax)

h = xlabel(gca,varargin{:}); %% /toolbox/matlab/graph2d/xlabel.m

or h = ylabel(gca,varargin{:}); %% /toolbox/matlab/graph2d/ylabel.m

or h = title(gca,varargin{:}); %% /toolbox/matlab/graph2d/title.m

add the line: set(h, 'FontSize', 14);

Overwrite the original functions, done!

Data cursor precision in matlab

Tired of changing the data cursor display precision? You could change it permanently if you have a copy of Matlab on your local machine (i.e. it’s not a server version Matlab).

Locate file: default_getDatatipText.m

in the directory:


Change the line that defines the number of display digits to a larger value:

DEFAULT_DIGITS = N;  % Display N digits of x,y position as you wish

Save the update, quit the current Matlab instance, restart, done!

Gait Speed Monitoring for Aging Conditions

 Gait speed is a particularly important parameter in geriatrics, as it is the number one predictor of mortality in adults over 65 years old, with differences of just a couple tenths of a meter per second predicting statistically significant outcome differences. The most common method for gait speed estimation in medical research and clinical practice is to simply use a stopwatch and a tape measure. This typically provides good accuracy but is insufficient for applications that require more continuous and longitudinal data, especially given that speed and many other gait parameters can vary significantly day-to-day and even hour-to-hour in geriatric and gait impaired populations. It is therefore highly desirable to be able to estimate gait speed using inertial BSNs and to do so with a resolution of better than 0.1 m/s.

However, for inertial BSNs, while simple temporal gait parameters such as step time and double stance time can be easily extracted from accelerometer and gyroscope data, parameters that depend on both temporal and spatial information are much more challenging to assess accurately because of integration drift (e.g., acceleration to velocity and position, rotational rate to angular displacement) and node placement uncertainty. Gait speed is one such spatio-temporal parameter, as it depends on both stride length and stride time. These challenges have so far prevented inertial BSNs from making significant progress towards accurate gait speed estimation.


To tackle this problem, a mounting calibration based on simple pre-defined movements and rotation matrices is applied to ensure accurate spatial analysis regardless of how the BSN nodes are placed on the body. In addition, application-specific methods are developed that leverage knowledge of biomechanics and human gait, including temporal knowledge of gait phases, in order to minimize integration drift and to better model stride length.

  • Mounting Calibration:

  • Gait Cycle Segmentation and Drift Elimination:

    Gait cycle extraction is critical for obtaining parameters such as gait phase, step time, and stride length, all of which are important for gait speed estimation. Based on the assumption that the angular velocity should be near zero during the foot-on-ground event, a local-maximum peak detection algorithm is used for gait cycle extraction. (This portion of the gait cycle was chosen because it also supports integration drift cancellation.) To suppress the ripples in the gyroscope signal, a zero-phase, 3rd-order Butterworth low-pass filter with a cutoff frequency of 3 Hz is used. The cutoff frequency was determined empirically by inspecting the spectrum of the gyroscope signal, whose main frequency components lie below 3 Hz. The filtering removes the spike that represents the heel-strike event, leaving the foot-on-ground event to be identified by the peak detection algorithm. The time of each foot-on-ground event is then recorded, and the original gyroscope signal is kept for later integration.
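    The filtering and peak detection described above can be sketched with SciPy. This is only an illustration under stated assumptions, not the code used in the paper: the 100 Hz sampling rate, the synthetic test signal, the function name find_foot_on_ground, and the min_gap_s and min_prominence parameters are all my own choices. Only the filter type, order, zero-phase property and 3 Hz cutoff come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks


def find_foot_on_ground(gyro, fs_hz, cutoff_hz=3.0, order=3,
                        min_gap_s=0.5, min_prominence=0.1):
    """Return (peak indices, smoothed signal) for a 1-D gyroscope trace.

    min_gap_s and min_prominence are illustrative assumptions, not paper values.
    """
    nyquist = 0.5 * fs_hz
    # 3rd-order Butterworth low-pass; the cutoff is normalized to Nyquist.
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    # filtfilt runs the filter forward and backward, which gives zero phase shift.
    smoothed = filtfilt(b, a, gyro)
    # Local maxima of the smoothed signal mark the candidate events.
    peaks, _ = find_peaks(smoothed,
                          distance=max(1, int(min_gap_s * fs_hz)),
                          prominence=min_prominence)
    return peaks, smoothed


# Synthetic stand-in for a gyroscope trace (assumed 100 Hz sampling):
# a 1 Hz "gait" component plus a 20 Hz ripple that the filter should remove.
fs_hz = 100.0
t = np.arange(500) / fs_hz
raw_gyro = np.sin(2 * np.pi * 1.0 * t) + 0.2 * np.sin(2 * np.pi * 20.0 * t)

peak_idx, smoothed = find_foot_on_ground(raw_gyro, fs_hz)
foot_on_ground_times = t[peak_idx]  # event times in seconds
# raw_gyro is left untouched so it can still be integrated later.
print(foot_on_ground_times)
```

    The peaks are located on the smoothed signal only; the raw trace stays available for integration, as in the description above.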

  • Refined Gait Model
To better examine the human gait model, the gait cycle is divided into 8 phases as shown in Figure 5. Research has shown that the angular velocity of the shank reaches its maximum when the leg is fully extended, and that the angle of the shank reaches its maximum afterwards, when the leg is flexed. These two events do not overlap in time, as illustrated in Figure 5 and verified by the data in Figure 6. Thus, using leg length and the maximum shank angle to compute step length during backward swing (the simplified pendulum model in Figure 2) is imprecise. This discrepancy suggests a more refined compound pendulum model for computing step length, as shown in Figure 7.


As shown in Figure 7, the step length calculation of our model differs from that of the reference model. The length of one stride is defined as the sum of the step length of the right leg and the step length of the left leg in one gait cycle. The total distance travelled is the sum of the stride lengths over all cycles. Finally, the average gait speed is the distance travelled divided by the total time elapsed.
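The aggregation in the last three sentences can be written out in a few lines. This sketch covers only that bookkeeping; it does not implement the compound pendulum step length model itself, and the function name and the step lengths below are made-up illustration values, not measurements.

```python
def average_gait_speed(right_steps_m, left_steps_m, total_time_s):
    """Average gait speed (m/s) from per-cycle step lengths of each leg."""
    if len(right_steps_m) != len(left_steps_m):
        raise ValueError("need one right and one left step per gait cycle")
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    # Stride length of one cycle = right step length + left step length.
    stride_lengths = [r + l for r, l in zip(right_steps_m, left_steps_m)]
    # Total distance = sum of the stride lengths over all cycles.
    distance_m = sum(stride_lengths)
    return distance_m / total_time_s


# Hypothetical values for three gait cycles (not data from the study).
right = [0.60, 0.62, 0.61]
left = [0.58, 0.60, 0.59]
speed = average_gait_speed(right, left, total_time_s=3.6)
print("average gait speed: %.2f m/s" % speed)
```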


The RMSE is computed by comparing the calculated gait speed to the treadmill speed, which was varied from 1 MPH to 3 MPH with a resolution of 0.2 MPH (0.09 m/s). The accuracy of the proposed model was significantly higher than that of the reference model, which commonly overestimates gait speed. The largest RMSE was only 0.095 m/s after mounting calibration. However, at very low and high speeds, the thigh angle can be critical for controlling the step length. At very low speeds, the thigh tends to swing forward ahead of the plumb line so as to maintain a very short step length on the treadmill, resulting in a step length that is shorter than predicted, and vice versa at high speeds. Thus, correction factors are needed to further reduce errors at very slow or fast walking speeds.
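For reference, the RMSE against the treadmill settings can be computed as below. The speed grid (1 to 3 MPH in 0.2 MPH steps) is taken from the text; the "estimated" speeds are a made-up constant offset used purely to exercise the function, not results from the study.

```python
import math

MPH_TO_MPS = 0.44704  # exact conversion: 1 mile per hour in metres per second


def rmse(reference, estimated):
    """Root-mean-square error between two equally long sequences."""
    if len(reference) != len(estimated) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - r) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Treadmill settings: 1.0, 1.2, ..., 3.0 MPH, converted to m/s.
treadmill_mph = [1.0 + 0.2 * i for i in range(11)]
treadmill_mps = [v * MPH_TO_MPS for v in treadmill_mph]

# Hypothetical estimates: every speed over-estimated by 0.05 m/s.
estimated_mps = [v + 0.05 for v in treadmill_mps]

error_mps = rmse(treadmill_mps, estimated_mps)
print("treadmill step: %.3f m/s" % (0.2 * MPH_TO_MPS))
print("RMSE: %.3f m/s" % error_mps)
```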

Future Work:

Work is underway to evaluate the estimation accuracy among various gaits, including both healthy and pathological gait at a greater range of speeds (including running), through experiments with more subjects. For healthy gait, a training set of data can be used to calibrate the algorithm for each individual subject. For certain types of pathological gait, including those with shuffling, a wide base, and out-of-plane motion, more refined gait models will be developed based on biomechanical knowledge.

China’s one-child policy

In order to limit the growth of its population, the Chinese government decided to limit families to having just one child. An alternative that was suggested was the "one-son" policy: as long as a woman has only female children she is allowed to have more children. One concern voiced about this policy was that no family would have more than one son, but many families would have several girls. This concern led to our question: How would the one-son policy affect the ratio of male to female births?

From Elementary Probability for Applications, By Rick Durrett
My thought: Why would it affect the female-to-male ratio? The natural sex ratio should stay equal, since you don't kill girls to break the balance. (Well, sadly, maybe that's not the case; pretend it's true for now.) Let's see the book's explanation.
To simplify the problem we assume that a family will keep having children until it has a male child. Assuming that male and female children are equally likely and the sexes of successive children are independent, the total number of children has a geometric distribution with success probability p = 1/2, so by the previous example the expected number of children is 1/p = 2 (the mean of a geometric distribution is 1/p). There is always one male child, so the expected number of female children is 2 - 1 = 1.
Does this continue to hold if some families stop before they have a male child? Consider for simplicity the case in which a family will stop when they have a male child or a total of three children. There are 4 outcomes:
P(M) = 1/2; P(FM) = 1/4; P(FFM) = 1/8; P(FFF) = 1/8;
The average number of male children is 1/2 + 1/4 + 1/8 = 7/8, while the average number of female children is 1*(1/4) + 2*(1/8) + 3*(1/8) = 7/8.
The last calculation makes the equality of the expected values look like a miracle, but it is not, and the claim holds true if a family with k female children continues with probability Pk and stops with probability 1-Pk. To explain this intuitively, if we replace M by +1 and F by -1, then childbirth is a fair game. For the stopping rules under consideration the average winnings when we stop have mean 0; that is, the expected number of male children equals the expected number of female children.
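The claim can be checked exactly for any stopping rule of this kind. The sketch below is my own illustration (the function name and the second example rule are not from the book): it enumerates the outcomes the same way as the calculation above, using exact fractions, where continue_probs[k] is the probability Pk that a family with k girls and no boy has another child.

```python
from fractions import Fraction


def expected_boys_girls(continue_probs):
    """Exact expected numbers of boys and girls per family.

    A family stops as soon as it has a boy, or when it decides not to
    continue, or after len(continue_probs) girls.
    """
    half = Fraction(1, 2)
    reach = Fraction(1)  # P(family currently has k girls and no boy)
    boys = Fraction(0)
    girls = Fraction(0)
    for k, p in enumerate(continue_probs):
        p = Fraction(p)
        girls += reach * (1 - p) * k   # stops voluntarily with k girls
        boys += reach * p * half       # outcome: k girls, then a boy
        girls += reach * p * half * k
        reach = reach * p * half       # next child is a girl: now k + 1 girls
    girls += reach * len(continue_probs)  # reached the cap with only girls
    return boys, girls


# The book's example: stop at the first boy or after three children.
print(expected_boys_girls([1, 1, 1]))

# A made-up rule: always have a second child after one girl only half the time.
print(expected_boys_girls([1, Fraction(1, 2)]))
```

In both cases the two expectations come out equal (7/8 each for the book's example, 5/8 each for the second rule), as the fair-game argument predicts.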

Waiting bus paradox

A textbook stochastic-process example that I experience every day. To win the battle against it, I need a smartphone and a GPS bus tracking system…

Back in the days before apps like 'Google Now' were available, we would just cross our fingers and hope that the buses arrive as a Poisson process, in which case the expected waiting time equals the average interval between two buses. However, a bus schedule can be inhumane, mixing many short gaps with a few long ones. Say 90% of the gaps between buses are 1 min and 10% are 60 min: the average gap is only 6.9 min, yet you can expect to wait about 26 min, close to half an hour, if you show up at the stop at a random time!
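The numbers for this example can be worked out with the standard result that a passenger arriving at a uniformly random time waits E[X^2] / (2 E[X]) on average, where X is the gap between consecutive buses. The sketch below assumes my reading of the example (90% of gaps are 1 min, 10% are 60 min, gaps independent); the function names are my own.

```python
def mean_gap(gaps_min, probs):
    """Average gap between consecutive buses, in minutes."""
    return sum(p * x for x, p in zip(gaps_min, probs))


def expected_wait(gaps_min, probs):
    """Expected wait for a passenger arriving at a uniformly random time.

    A random arrival is more likely to land in a long gap, and then waits
    half of that gap on average, which gives E[X^2] / (2 * E[X]).
    """
    first_moment = mean_gap(gaps_min, probs)
    second_moment = sum(p * x * x for x, p in zip(gaps_min, probs))
    return second_moment / (2.0 * first_moment)


gaps_min = [1.0, 60.0]   # gap lengths in minutes
probs = [0.9, 0.1]       # fraction of gaps with each length

avg_gap = mean_gap(gaps_min, probs)
avg_wait = expected_wait(gaps_min, probs)
print("average gap:  %.1f min" % avg_gap)
print("average wait: %.1f min" % avg_wait)
```

The wait is dominated by the rare 60 min gaps: they make up only 10% of the gaps but cover most of the clock time, so a random arrival almost always falls inside one.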