A developmental scale for the North Carolina End‐of‐Grade Mathematics Tests was created using a subset of identical test forms administered to adjacent grade levels. Thurstone scaling and item response theory (IRT) techniques were employed to analyze the changes in grade distributions across these linked forms.Three variations of Thurstone scaling were examined, one based on Thurstone's 1925 procedure and two based on Thurstone's 1938 procedure. The IRT scaling was implemented using both Bi Main and Multilog .All methods indicated that average mathematics performance improved from Grade 3 to Grade 8, with similar results for the two IRT analyses and one version of Thurstone's 1938 method.The standard deviations of the IRT scales did not show a consistent pattern across grades, whereas those produced by Thurstone's 1925 procedure generally decreased; one version of the 1938 method exhibited slightly increasing variation with increasing grade level, while the other version displayed inconsistent trends.