Previous research has suggested that the ability to recognize vocal portrayals of socio-emotional expressions improves with age throughout childhood and adolescence. The current study examined whether stimulus-level factors (i.e., the age of the speaker and the type of expression being conveyed) interacted with listeners’ developmental stage to predict recognition accuracy. We assessed mid-adolescent (n = 50, aged 13–15 years) and adult (n = 87, aged 18–30 years) listeners’ ability to recognize basic emotions and social expressions in the voices of both adult and youth actors. Adult speakers’ emotional prosody was better recognized than that of youth speakers, and adult listeners were more accurate overall than mid-adolescent listeners. Interaction effects revealed that mid-adolescents’ accuracy was equivalent to that of adult listeners when hearing adult portrayals of anger, disgust, friendliness, happiness, and meanness, and youth portrayals of disgust, happiness, and meanness. Our findings highlight the influence of speaker characteristics and type of expression on listeners’ ability to recognize vocal cues of emotion and social intent.