The Speech Intelligibility Rating (SIR) scale was designed to classify children's global speech production into one of five hierarchical categories. Raters using the scale must judge which category is appropriate, raising the possibility that different raters could assign different scores to the same child. The aim of this study was to determine the inter-rater reliability of the scale, i.e. whether judges agree on the category membership of the assessed behaviour. Inexperienced ('naive') judges used the scale to rate videotape excerpts of implanted children in communication with a known adult. Each judge rated four children sequentially from one of four videos: there were two videos of different sets of children, each recorded in two different orders. The judges of both subsets ranked the children in the same order of intelligibility (Kendall's W = 0.86 and W = 0.98 respectively, both p < 0.01), and agreed on the category membership of each child (kappa = 0.45 and 0.68 respectively, both p < 0.001). Intra-class correlation coefficients showed that agreement between raters was high (ICC(2,1) = 0.80 and 0.81, both p < 0.001) and that ratings were consistent (ICC(3,1) = 0.82 and 0.97, both p < 0.001). This study indicated that the scale demonstrates good inter-rater reliability and can be used confidently by cochlear implant teams to monitor the progress of implanted children's speech intelligibility.
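As an illustration of the reliability statistics reported above, the sketch below computes Kendall's coefficient of concordance W and the Shrout and Fleiss intra-class correlations ICC(2,1) and ICC(3,1) from a judges-by-children matrix of SIR category ratings. The ratings matrix is entirely hypothetical (not the study's data), and the simple form of W shown here omits the tie correction; it is a minimal sketch of the standard formulas, not the study's own analysis code.

```python
import numpy as np
from scipy import stats

# Hypothetical ratings: 5 judges (rows) x 4 children (columns),
# each cell an SIR category from 1 (best) to 5 (worst) or vice versa.
# Illustrative data only, not taken from the study.
ratings = np.array([
    [1, 2, 4, 5],
    [1, 3, 4, 5],
    [2, 2, 4, 4],
    [1, 3, 5, 5],
    [2, 3, 4, 5],
], dtype=float)

def kendalls_w(x):
    """Kendall's coefficient of concordance W (no tie correction)."""
    m, n = x.shape                               # m judges, n children
    ranks = np.apply_along_axis(stats.rankdata, 1, x)
    r = ranks.sum(axis=0)                        # rank sum per child
    s = ((r - r.mean()) ** 2).sum()              # spread of rank sums
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def icc(x):
    """ICC(2,1) and ICC(3,1), Shrout & Fleiss, from a judges x children matrix."""
    m, n = x.shape
    grand = x.mean()
    # Two-way ANOVA mean squares (one rating per judge-child cell):
    bms = m * ((x.mean(axis=0) - grand) ** 2).sum() / (n - 1)  # between children
    jms = n * ((x.mean(axis=1) - grand) ** 2).sum() / (m - 1)  # between judges
    ss_total = ((x - grand) ** 2).sum()
    ems = (ss_total - bms * (n - 1) - jms * (m - 1)) / ((n - 1) * (m - 1))
    icc21 = (bms - ems) / (bms + (m - 1) * ems + m * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (m - 1) * ems)
    return icc21, icc31

print(f"Kendall's W = {kendalls_w(ratings):.2f}")
i21, i31 = icc(ratings)
print(f"ICC(2,1) = {i21:.2f}, ICC(3,1) = {i31:.2f}")
```

ICC(2,1) treats judges as a random sample of possible raters (absolute agreement), whereas ICC(3,1) treats these judges as the only raters of interest (consistency), which is why the study reports both.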