The emergence of joint attention is still a matter of vigorous debate, with hypotheses ranging from innate modules dedicated to intention reading to more neuro-constructivist approaches. The aim of this study was to assess whether 12-month-old infants are able to recognize a “joint attention” situation when observing such a social interaction. Using a violation-of-expectation paradigm, we habituated infants to a “joint attention” video and then compared their looking times for “divergent attention” videos and “joint attention” videos in a 2 (familiar or novel perceptual component) × 2 (familiar or novel conceptual component) factorial design. These looking-time data were complemented with measures of pupil dilation, which are considered a reliable index of cognitive load. Infants looked longer at test events that involved a novel speaker and divergent attention, but no changes in pupil dilation were observed in any condition. Although the looking-time data suggest that infants may appreciate discrepancies from expectations related to joint attention behavior, in the absence of clear evidence from pupillometry the results do not demonstrate an understanding of joint attention, even at a tacit level. Our results suggest that infants may be sensitive to relevant perceptual variables in joint attention situations, which would help scaffold social cognitive development. This study supports a gradual, learning-based interpretation of how infants come to recognize, understand, and participate in joint attention.