This paper investigates gaze estimation for interaction with children with Autism Spectrum Disorder (ASD). Previous research shows that satisfactory gaze estimation accuracy can be achieved in constrained settings; however, most existing methods cannot handle the large head movement (LHM) that frequently occurs when interacting with children with ASD. We propose a gaze estimation method that copes with large head movement while achieving real-time performance. An intervention table equipped with multiple sensors is designed to capture images under LHM. First, reliable facial features and head poses are tracked using the supervised descent method. Second, a convolution-based integro-differential eye localization approach locates the eye center efficiently and accurately. Third, a rotation-invariant gaze estimation model is built from the located facial features, eye centers, head pose, and the depth data captured by the Kinect. Finally, a multi-sensor fusion strategy adaptively selects the optimal camera for gaze estimation and fuses the Kinect depth information with the web camera images. Experimental results show that the method achieves acceptable accuracy even under LHM and could potentially be applied in therapy for children with ASD.
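The abstract does not detail the convolution-based integro-differential eye localizer. As a rough illustration only (not the authors' implementation), the classical integro-differential idea behind such localizers can be sketched in plain Python: the eye center is the point where the mean intensity along concentric circles jumps most sharply with radius, since the dark iris is surrounded by the brighter sclera. All function names, the sampling density, and the brute-force search are illustrative assumptions; a practical version would restrict the search region and evaluate the operator via convolution.

```python
import math

def circle_mean(img, cx, cy, r, n=64):
    """Mean intensity sampled at n points along a circle of radius r
    centered at (cx, cy); out-of-bounds samples are skipped."""
    total, count = 0.0, 0
    for k in range(n):
        a = 2 * math.pi * k / n
        x = int(round(cx + r * math.cos(a)))
        y = int(round(cy + r * math.sin(a)))
        if 0 <= y < len(img) and 0 <= x < len(img[0]):
            total += img[y][x]
            count += 1
    return total / count if count else 0.0

def locate_eye_center(img, r_min=2, r_max=8):
    """Integro-differential search (illustrative brute force): pick the
    center whose circular mean intensity changes most sharply between
    consecutive radii -- the dark-iris / bright-sclera boundary."""
    best, best_score = None, -1.0
    for cy in range(len(img)):
        for cx in range(len(img[0])):
            means = [circle_mean(img, cx, cy, r)
                     for r in range(r_min, r_max + 1)]
            # largest radial jump in boundary intensity
            score = max(abs(means[i + 1] - means[i])
                        for i in range(len(means) - 1))
            if score > best_score:
                best_score, best = score, (cx, cy)
    return best

# Synthetic test image: a dark disc (the "iris") on a bright background.
img = [[0 if (x - 10) ** 2 + (y - 10) ** 2 <= 16 else 255
        for x in range(20)] for y in range(20)]
print(locate_eye_center(img))  # a point at or near (10, 10)
```

The same operator is conventionally made fast by expressing the circular sums as convolutions with ring-shaped kernels, which is presumably what "convolution-based" refers to in the paper.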