Satisfactory image inpainting requires fine visual details and semantically plausible structures. Encoder-decoder networks have shown their potential for this task, but they suffer from undesired local and global inconsistencies, such as blurry textures. To address this issue, we incorporate a perception operation into the encoder, which extracts features from the known areas of the input image to improve textural details in the missing areas. We also propose an iterative guidance loss for the perception operation that guides the perceptual encoding features toward the ground-truth encoding features. The guidance-enhanced perceptual encoding features are transferred to the decoder through skip connections, mutually reinforcing the performance of the entire encoder-decoder. Since the inpainting task involves different levels of feature representation, we further apply atrous separable parallel convolutions (i.e., atrous separable pyramid convolutions, ASPC) with different receptive fields to the last guidance-enhanced perceptual encoding feature to learn high-level semantic features with multi-scale information. Experiments on public databases show that the proposed method achieves promising results in terms of visual details and semantic structures.
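The abstract names an ASPC block applied to the last encoding feature but does not specify its internals. The sketch below is a minimal PyTorch illustration, assuming four parallel branches of depthwise atrous convolutions followed by pointwise convolutions and a 1x1 fusion layer; the dilation rates (1, 2, 4, 8) and channel count are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    """Depthwise atrous convolution followed by a pointwise convolution."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(
            channels, channels, kernel_size=3, padding=dilation,
            dilation=dilation, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

class ASPC(nn.Module):
    """Parallel atrous separable convolutions with different receptive fields,
    fused by a 1x1 convolution (dilation rates are assumed for illustration)."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [AtrousSeparableConv(channels, d) for d in dilations])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        # Each branch processes the same feature map at a different dilation
        # rate; concatenation and 1x1 fusion aggregate multi-scale context.
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

# Example: apply ASPC to a hypothetical 512-channel encoding feature map.
feat = torch.randn(1, 512, 32, 32)
out = ASPC(512)(feat)
print(out.shape)  # torch.Size([1, 512, 32, 32])
```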