Understanding the extent to which humans perceive the emotional state of animals has both theoretical and practical implications. Recent studies indicate that natural selection has led to some convergence in emotion coding among vertebrate species (including humans), highlighting the interspecific value of emotional signals. However, it has also been argued that interspecific communication of emotions can fail because species-specific signaling traits impair information decoding, because listeners lack familiarity with heterospecific communication systems, or both. Here we show that human listeners attend to the mean pitch of vocalizations when asked to rate the distress level expressed by human baby cries, and that they use a similar pitch scale to rate the emotional level of distress calls from infant non-human apes (bonobos and chimpanzees). As a consequence, the very high-pitched calls of bonobo infants were systematically rated as expressing high distress overall, even though they were recorded in contexts eliciting stress of varying intensity. Conversely, chimpanzee infant calls, which have a relatively lower pitch, were systematically rated as expressing relatively lower distress levels. These results indicate that, in the absence of exposure to or familiarity with these species, our spontaneous ability to rank the emotional content of vocalizations from closely related ape species remains biased by basic frequency differences, suggesting that the absolute interspecific value of emotional signals should not be overestimated.