Soft attention has attracted considerable interest in recent years because it can capture the most discriminative image features for action understanding. However, soft attention tends to focus on fine-grained image parts and to ignore global information, which can lead to misclassification. To address this issue, we propose a deep selective feature learning network (DSFNet) that automatically learns feature maps containing both fine-grained and global information. Specifically, DSFNet learns to adjust its feature-map selection actions by maximizing the cumulative discounted reward. Moreover, DSFNet is an easy-to-use extension of state-of-the-art base architectures for multiple tasks. Extensive experiments show that the proposed method achieves superior performance on two standard action recognition benchmarks, covering still images (PPMI) and videos (HMDB51).
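
For reference, the cumulative discounted reward mentioned above is conventionally defined in reinforcement learning as written below. The symbols are illustrative assumptions and not notation taken from this abstract: $r_t$ denotes the reward received after the selection action at step $t$, $\gamma$ the discount factor, $T$ the final step of an episode, and $\pi$ the selection policy. The specific reward used by DSFNet is not stated here.

\[
G_t = \sum_{k=0}^{T-t} \gamma^{k}\, r_{t+k}, \qquad 0 \le \gamma \le 1,
\qquad
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ G_0 \right]
\]

Under this standard formulation, a smaller $\gamma$ weights immediate rewards more heavily, while $\gamma$ close to 1 makes the policy account for the effect of early selections on later ones.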