Abstract
Closed-circuit television (CCTV) cameras are widely used for monitoring. Most systems rely on human operators and controllers for detection and information gathering, e.g. whether there is a person in the monitored area and how many people may be present. This paper presents an adaptive crowd counting system based on two algorithms that estimate the density of each pixel in each frame and use it as a basis for counting people. One algorithm uses scale-invariant feature transform (SIFT) features and clustering to represent the pixels of frames (SIFT algorithm), and the other uses features from accelerated segment test (FAST) corner points with SIFT features (SIFT-FAST algorithm). Each algorithm is designed using a novel combination of pixel-wise, motion-region, grid map, background segmentation using a Gaussian mixture model (GMM), and edge detection based on the Canny algorithm. The Mall and University of California, San Diego (UCSD) datasets were used to evaluate the proposed algorithms. Results show that their average accuracies and processing times for each dataset are similar, but their accuracies at frame level differ. A fusion technique is proposed and used to increase accuracy by combining the results of the algorithms at frame level. The mean deviation error and the mean absolute error for the two proposed algorithms are less than 0.1 and 3.1, respectively, for the Mall dataset and less than 0.07 and 1.9, respectively, for the UCSD dataset.
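
The two error measures quoted in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the crowd counting literature, which the abstract itself does not spell out: the mean absolute error (MAE) is the average of |estimated - actual| over frames, and the mean deviation error (MDE) is the average of |estimated - actual| / actual. The function names and the sample counts below are hypothetical and are not taken from the paper.

```python
def _check_inputs(estimated, actual):
    """Validate that both per-frame count sequences are usable."""
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one frame is required")


def mean_absolute_error(estimated, actual):
    """Average absolute difference between estimated and actual counts."""
    _check_inputs(estimated, actual)
    total = sum(abs(e - a) for e, a in zip(estimated, actual))
    return total / len(actual)


def mean_deviation_error(estimated, actual):
    """Average absolute difference relative to the actual count.

    Assumption: frames with an actual count of zero are rejected,
    because the relative error is undefined for them.
    """
    _check_inputs(estimated, actual)
    if any(a == 0 for a in actual):
        raise ValueError("actual counts must be non-zero for MDE")
    total = sum(abs(e - a) / a for e, a in zip(estimated, actual))
    return total / len(actual)


# Hypothetical per-frame counts, for illustration only.
estimated_counts = [10, 20, 33]
actual_counts = [12, 20, 30]

mae = mean_absolute_error(estimated_counts, actual_counts)
mde = mean_deviation_error(estimated_counts, actual_counts)
print(f"MAE = {mae:.3f}, MDE = {mde:.3f}")
```

Under these assumed definitions, the MAE is expressed in people per frame, while the MDE is a fraction of the true count, which is why the two reported figures differ by more than an order of magnitude.
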
Original language | English |
---|---|
Pages (from-to) | 23777–23804 |
Number of pages | 29 |
Journal | Multimedia Tools and Applications |
Volume | 75 |
Issue number | 23 |
Early online date | 23 Nov 2016 |
Publication status | Published - Dec 2016 |
Keywords
- crowd counting systems
- surveillance systems
- CCTV cameras
- image processing
- computer vision
- background segmentation
- monitoring