
Context fusion for recognising activities of daily living from indoor video data

Student thesis: Doctoral Thesis

Human activity recognition from images and videos is an area of intensive research in computer vision. In recent years, the topic has attracted many researchers because of its benefits in a number of applications, including surveillance systems and assisted living. There is a significant need for automated systems that can help elderly people live independently: as the elderly population grows, there will soon not be enough care workers to provide them with the help they need.
Available activity recognition methods target low-level activities, posture-based activities, and a few high-level activities. Techniques for identifying high-level activities depend mainly on extracting low-level features and using these to identify the activities. Although some of these techniques achieve high performance on certain high-level activities, a number of essential and instrumental activities of daily living have not yet been recognised. In addition, these methods are trained to identify staged activities recorded under unrealistic conditions and in unrealistic locations, and they do not use high-level features to identify activities.
Advances in computer systems and parallel computing, together with the development of deep learning frameworks, make it possible to build an automated system that identifies human activities and takes the necessary actions based on the recognised activity. These advances allow complex machine learning algorithms to be used in systems that support elderly people and people with special needs in living independently and performing their daily activities. Such a system can be built by applying different algorithms to identify features and activities from a sequence of images.
In this thesis, three major contributions are presented. The first is a method for identifying high-level activities of daily living and falls using high-level features. These four high-level features are the spatial location of the detected person, temporal information, the person’s posture, and the person’s orientation. The method uses several machine learning models, including the Convolutional Pose Machine, classifiers, the Long Short-Term Memory model, and the Hidden Markov Model, to extract high-level features from image sequences, and then uses these extracted features to identify high-level activities.
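As an illustration only, the sketch below shows one way the four high-level features could be stored per frame and mapped to a single discrete observation symbol for a sequence model such as a Hidden Markov Model. The zone, posture and orientation label sets, and the names `FrameFeatures`, `observation_symbol` and `N_SYMBOLS`, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass

# Hypothetical label sets; the categories used in the thesis may differ.
ZONES = ("bed", "sofa", "kitchen_counter", "doorway")
POSTURES = ("standing", "sitting", "lying")
ORIENTATIONS = ("front", "left", "back", "right")  # coarse facing direction


@dataclass(frozen=True)
class FrameFeatures:
    zone: str          # spatial feature: where the person was detected
    posture: str       # posture feature, e.g. output of a pose-based classifier
    orientation: str   # orientation feature: coarse facing direction
    timestamp: float   # temporal feature: seconds since the start of the video


def observation_symbol(f: FrameFeatures) -> int:
    """Map the (zone, posture, orientation) triple to one discrete symbol.

    The timestamp is deliberately left out of the symbol so that temporal
    models (duration thresholds, HMM transitions, LSTMs) can use it directly.
    """
    z = ZONES.index(f.zone)
    p = POSTURES.index(f.posture)
    o = ORIENTATIONS.index(f.orientation)
    return (z * len(POSTURES) + p) * len(ORIENTATIONS) + o


# Total number of distinct observation symbols (the HMM alphabet size).
N_SYMBOLS = len(ZONES) * len(POSTURES) * len(ORIENTATIONS)
```

For example, a person lying on the bed with orientation `back` maps to symbol 10 out of 48 possible symbols. Encoding the features jointly keeps each combination distinct, at the cost of an alphabet that grows multiplicatively with every label set.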
The second contribution is PortAD, a new dataset that addresses the major issues affecting existing datasets. It is used to evaluate how effectively the proposed method identifies high-level activities. PortAD overcomes many limitations of the available datasets, including missing activities, unrealistic locations, impractical camera placements, and too few cameras. In this work, 14 activities of daily living and instrumental activities of daily living are recorded using multiple cameras mounted in the top corners of the rooms.
The third major contribution is an evaluation of how effective the four selected high-level features are in identifying activities of daily living. Several activity recognition models are proposed for this purpose, including the fixed and adaptive time threshold model, the Hidden Markov Model, and the Long Short-Term Memory model.
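The abstract does not define the time threshold model, so the following is only a sketch of one plausible reading of its fixed variant: an activity is reported when a (location, posture) pair persists for at least a fixed duration. The rule table, activity labels and the functions `segment_runs` and `fixed_threshold_recognise` are hypothetical. An adaptive variant might, for instance, use a different threshold per activity instead of one constant.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from (zone, posture) to an activity label.
RULES: Dict[Tuple[str, str], str] = {
    ("bed", "lying"): "sleeping",
    ("sofa", "sitting"): "watching_tv",
    ("kitchen_counter", "standing"): "preparing_food",
}

Frame = Tuple[float, str, str]  # (timestamp in seconds, zone, posture)


def segment_runs(frames: Iterable[Frame]) -> List[list]:
    """Group consecutive frames with the same (zone, posture) into runs
    of the form [key, start_time, end_time]."""
    runs: List[list] = []
    for t, zone, posture in frames:
        key = (zone, posture)
        if runs and runs[-1][0] == key:
            runs[-1][2] = t  # extend the current run
        else:
            runs.append([key, t, t])  # start a new run
    return runs


def fixed_threshold_recognise(
    frames: Iterable[Frame], threshold_s: float
) -> List[Tuple[str, float, float]]:
    """Report (activity, start, end) for every run that matches a rule and
    lasts at least `threshold_s` seconds, measured as end minus start."""
    return [
        (RULES[key], start, end)
        for key, start, end in segment_runs(frames)
        if key in RULES and end - start >= threshold_s
    ]
```

With an 8-second threshold, ten seconds of sitting on the sofa would be reported as `watching_tv`, while a two-second stretch of lying on the bed would be ignored. A single global threshold is simple but treats short and long activities alike, which is one motivation for an adaptive variant.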
The first finding of the study is that the four selected high-level features, used in combination, achieved better results than using one, two, or three of them. This is because a weakness of an individual feature can be compensated for when it is combined with other features.
The proposed approaches were successfully tested and evaluated in practical experiments on two datasets: PortAD and a fall dataset used to identify sudden activity. High-level activities were identified using the Forward algorithm, the Forward–Backward algorithm, and the Long Short-Term Memory model, which achieved accuracy rates of 92.88%, 93.01%, and 93.32%, respectively, on PortAD, and 84.9%, 87.5%, and 87.7%, respectively, on the fall dataset.
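To make the distinction between the Forward and Forward–Backward algorithms concrete, the sketch below assumes a discrete HMM in which hidden states are activities and observations are discrete high-level feature symbols. This structure is an assumption, not a description of the thesis's implementation. The Forward pass gives filtered estimates that use only past and present observations, while Forward–Backward smooths each estimate using the whole sequence. The parameters `pi`, `A` and `B` are toy values, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """Normalised forward pass: alphas[t][i] = P(state_t = i | obs[0..t]).

    A[j][i] is P(next state i | current state j); B[i][o] is P(symbol o | state i).
    """
    n = len(pi)
    alphas: Matrix = []
    for t, o in enumerate(obs):
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alphas[-1]
            a = [sum(prev[j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
        s = sum(a)
        alphas.append([x / s for x in a])  # rescale to avoid numerical underflow
    return alphas


def forward_backward(pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]) -> Matrix:
    """Smoothed posteriors: gammas[t][i] = P(state_t = i | all observations)."""
    n, T = len(pi), len(obs)
    alphas = forward(pi, A, B, obs)
    betas: Matrix = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        b = [sum(A[i][j] * B[j][obs[t + 1]] * betas[t + 1][j] for j in range(n))
             for i in range(n)]
        s = sum(b)
        betas[t] = [x / s for x in b]  # per-step scaling cancels after normalising
    gammas: Matrix = []
    for a, b in zip(alphas, betas):
        g = [x * y for x, y in zip(a, b)]
        s = sum(g)
        gammas.append([x / s for x in g])
    return gammas


# Toy example: state 0 = "sleeping", state 1 = "watching_tv";
# symbol 0 = "lying on bed", symbol 1 = "sitting on sofa".
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1]
filtered = forward(pi, A, B, obs)
smoothed = forward_backward(pi, A, B, obs)
predicted = [max(range(len(g)), key=g.__getitem__) for g in smoothed]
```

The predicted activity at each step is the state with the highest posterior. At the final step the two methods agree, because no future observations remain to refine the estimate.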
Original language: English
Awarding Institution
Award date: Apr 2020

ID: 20788276