Image semantic segmentation has long been a research focus in robotics. Its goal is to partition an image into distinct objects and assign a semantic category label to each. In practical applications, however, a robot needs not only the semantic category of an object but also its position in order to complete more complex visual tasks. Targeting complex indoor environments, this study designs an image semantic segmentation framework jointly trained with object detection. By adding a semantic segmentation branch that runs in parallel with the heads of an object detection network, the framework performs the multiple vision tasks of object classification, detection, and semantic segmentation in a single network. A new loss function is designed, training is adjusted using the idea of transfer learning, and the method is validated on a self-built indoor scene dataset; experiments show that the proposed approach is feasible, effective, and robust.
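The abstract does not specify the exact form of the joint loss. A common formulation for such multi-task detection-plus-segmentation networks is a weighted sum of a detection loss (object classification plus bounding-box regression) and a per-pixel segmentation cross-entropy; the weights `w_det` and `w_seg` and the smooth-L1 box loss below are illustrative assumptions, not details taken from the paper. A minimal numpy sketch under those assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the class axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood; works for per-image class logits
    # (N, C) and for per-pixel segmentation logits (H, W, C)
    p = softmax(logits).reshape(-1, logits.shape[-1])
    idx = np.arange(labels.size)
    return -np.log(p[idx, labels.ravel()]).mean()

def smooth_l1(pred, target, beta=1.0):
    # standard smooth-L1 box-regression loss (assumed, not from the paper)
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def joint_loss(cls_logits, cls_labels, box_pred, box_target,
               seg_logits, seg_labels, w_det=1.0, w_seg=1.0):
    # detection branch: classification + bounding-box regression
    l_det = cross_entropy(cls_logits, cls_labels) + smooth_l1(box_pred, box_target)
    # parallel segmentation branch: per-pixel cross-entropy
    l_seg = cross_entropy(seg_logits, seg_labels)
    # hypothetical weighted combination of the two task losses
    return w_det * l_det + w_seg * l_seg
```

In this formulation, lowering the confidence-weighted error of either branch lowers the total loss, so both tasks are optimized jointly; the branch weights let training trade off detection accuracy against segmentation quality.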