
Video and image processing based techniques for people detection and counting in crowded environments

Student thesis: Doctoral Thesis

  • Zeyad Qasim Habeeb Al-Zaydi
Several technologies can be used to count people, but systems based on computer vision are a good choice when priorities include accuracy, flexibility, cost and the ability to acquire information on how people are distributed. Such systems can use closed-circuit television (CCTV) cameras, which are already ubiquitous and increasingly widely used. This thesis aims to develop people counting systems that can be incorporated with existing CCTV cameras. People counting is useful for safety, security and operational purposes and can be important for improving awareness.
This thesis presents two intelligent people counting systems: a pixel-wise optimisation based system and a features regression based system. Each system counts people independently and may be more appropriate for particular scenarios.
The pixel-wise optimisation based people counting system is based on two algorithms that estimate the density of each pixel in each frame and use it as a basis for counting people. One algorithm uses scale-invariant feature transform (SIFT) features and clustering to represent the pixels of frames (the SIFT algorithm), and the other uses features from accelerated segment test (FAST) corner points with SIFT features (the SIFT-FAST algorithm). Both algorithms are designed using a novel combination of pixel-wise processing, motion edges, a grid map and background subtraction using a Gaussian mixture model (GMM).
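As a minimal sketch of the counting step only, the following assumes that a per-pixel density map and a foreground mask (for example, from background subtraction) are already available, and that the frame count is the sum of the densities over foreground pixels. The function name `count_from_density` and this summation rule are illustrative assumptions, not the thesis's actual implementation.

```python
from typing import Sequence


def count_from_density(
    density_map: Sequence[Sequence[float]],
    foreground_mask: Sequence[Sequence[bool]],
) -> float:
    """Sum per-pixel densities over foreground pixels only.

    Assumption: each density value is the fraction of a person
    represented by that pixel, so the sum approximates the count.
    """
    if len(density_map) != len(foreground_mask):
        raise ValueError("density map and mask must have the same height")

    total = 0.0
    for density_row, mask_row in zip(density_map, foreground_mask):
        if len(density_row) != len(mask_row):
            raise ValueError("density map and mask must have the same width")
        for density, is_foreground in zip(density_row, mask_row):
            # Background pixels are ignored entirely.
            if is_foreground:
                total += density
    return total
```

Restricting the sum to foreground pixels means that static background texture does not contribute to the estimate.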
The features regression based people counting system is composed of a pair of collaborative Gaussian process regression (GPR) models with different kernels, designed to count people while taking the level of occlusion into account. For each frame, the level of occlusion is measured and compared with a predefined threshold to select the regression model. In addition, this system dynamically identifies the best combination of features for people counting.
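The per-frame model selection described above can be sketched as follows. This is an illustration under stated assumptions: the occlusion level is taken as an already computed number, the direction of the comparison (above the threshold selects the high-occlusion model) is assumed, and the two regressors are simple stand-in functions rather than trained GPR models. All names are hypothetical.

```python
from typing import Callable, Sequence

# A regressor maps a frame's feature vector to an estimated count.
Regressor = Callable[[Sequence[float]], float]


def count_with_model_selection(
    features: Sequence[float],
    occlusion_level: float,
    occlusion_threshold: float,
    low_occlusion_model: Regressor,
    high_occlusion_model: Regressor,
) -> float:
    """Pick one of two regressors for this frame, then predict the count."""
    # Assumed rule: frames above the threshold use the high-occlusion model.
    if occlusion_level > occlusion_threshold:
        model = high_occlusion_model
    else:
        model = low_occlusion_model
    return model(features)


# Hypothetical stand-ins for the two trained GPR models.
def sparse_scene_model(features: Sequence[float]) -> float:
    return 0.5 * features[0]


def crowded_scene_model(features: Sequence[float]) -> float:
    return 0.5 * features[0] + 2.0 * features[1]
```

In a real system the two stand-ins would be replaced by regressors trained with different kernels, and the threshold would be fixed in advance, as the abstract describes.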
The University of California, San Diego (UCSD), Mall and New York Grand Central Station datasets were used to test and evaluate the proposed systems. These datasets were chosen because they cover a wide range of characteristics: frame rate (fps), resolution, colour, location, shadows, loitering, reflections, crowd size and frame type.
In comparisons with state-of-the-art methods, the proposed systems outperform the other methods with respect to the mean absolute error (MAE), mean squared error (MSE) and mean deviation error (MDE) metrics. The MAE, MSE and MDE of the proposed systems are 2.83, 13.92 and 0.092, respectively, for the Mall dataset; 1.63, 4.32 and 0.066, respectively, for the UCSD dataset; and 4.41, 25.62 and 0.029, respectively, for the New York Grand Central dataset. The computational efficiency of the proposed systems is 20.76 fps, 38.47 fps and 19.23 fps for the Mall, UCSD and New York Grand Central datasets, respectively.
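For reference, the three metrics can be computed from per-frame true and predicted counts as in the sketch below. It assumes the definitions commonly used in the crowd counting literature, in particular that MDE is the mean of the absolute error divided by the true count; the abstract does not state the formulas, and the function name `counting_errors` is illustrative.

```python
from typing import Sequence, Tuple


def counting_errors(
    true_counts: Sequence[float],
    predicted_counts: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (MAE, MSE, MDE) over a sequence of frames."""
    if not true_counts or len(true_counts) != len(predicted_counts):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t <= 0 for t in true_counts):
        # MDE divides by the true count, so empty frames are not supported.
        raise ValueError("true counts must be positive to compute MDE")

    n = len(true_counts)
    abs_errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]

    mae = sum(abs_errors) / n
    mse = sum(e * e for e in abs_errors) / n
    mde = sum(e / t for e, t in zip(abs_errors, true_counts)) / n
    return mae, mse, mde
```

Because MDE normalises each error by the frame's true count, it can be compared across datasets with very different crowd sizes, whereas MAE and MSE cannot.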
Original language: English
Awarding Institution
Award date: Sep 2017



ID: 10155553