Interpreting Test Scores: Placement Tests

C. Alexander, MA (Applied Linguistics and TESOL), PGCert (TESOL), CTEFLA, LGSM, Doctor of Education student (Bristol)

The aim of placement tests is to supply information which will assist in placing students at the stage of the teaching programme most appropriate to their abilities; they must be relatively quick and easy to administer, score and interpret, so formal interviews or other direct tests of oral ability are usually out of the question. Placement tests are used to assign students to classes at different levels, and a good sub-test should identify weaker students (Wall et al. 1994, 327).

With regard to this paper, no information will be given about the institution, i.e. we do not know whether these placement tests suit the institution's teaching programme or how the institution places students. Alderson et al. (2001, 11-12) maintain that in some institutions students are placed according to their rank in the test, so that, for example, the students with the top 8 scores might go into the top class; in other centres the students' profile of ability may need to be identified, so that a student might be placed in the top reading class but in the bottom writing class.

Information regarding the proportion of students assigned to inappropriate classes, i.e. the number of misplacements, is vital when validating a placement test, though this will not be considered in this paper. In addition, the proficiency levels of entering students may vary considerably from one term to the next, which is a problem for placement and programme administration. Bachman (1991, 59) argues that an inflexible programme needs stable numbers of students enrolled per term; a norm-referenced test would probably be best suited to this need. If, however, the cut-offs for placement are to remain unchanged from one term to the next, a criterion-referenced test would probably be more appropriate.
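
To make this distinction concrete, the short Python sketch below contrasts the two approaches on a handful of invented scores. The student IDs, class names and cut-offs are hypothetical and are not drawn from any institution discussed in this paper.

```python
# Hypothetical sketch contrasting norm-referenced (rank-based) and
# criterion-referenced (fixed cut-score) placement. All names, scores
# and cut-offs are invented.
scores = {"S01": 72, "S02": 55, "S03": 81, "S04": 40, "S05": 63}

# Norm-referenced placement: class sizes are fixed in advance, so the
# top N students enter the top class whatever their absolute scores.
ranked = sorted(scores, key=scores.get, reverse=True)
top_class = ranked[:2]                        # e.g. the two highest scorers

# Criterion-referenced placement: the cut-offs stay the same from one
# term to the next, so class sizes vary with the ability of the intake.
cut_offs = {"upper": 70, "intermediate": 50}  # assumed cut-scores

def place(score):
    if score >= cut_offs["upper"]:
        return "upper"
    if score >= cut_offs["intermediate"]:
        return "intermediate"
    return "foundation"

placements = {student: place(score) for student, score in scores.items()}
print(top_class)     # ['S03', 'S01']
print(placements)    # e.g. {'S01': 'upper', 'S02': 'intermediate', ...}
```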


 

Interpreting Two Test Scores

Group A (N=126) took Placement Test 1; Group B (N=130) took Placement Test 2.

Test Scores

Table 1: Group A Placement Test 1

             Max    Range    M     S.D.
Reading
  Cloze      20     12-20    15    4.8
  Text 1     20     16-20    18    1.9
  Text 2     10     2-9      6     3.2
Listening    24     8-24     14    6.1
Writing 1    12     5-11     9     3.0
Writing 2    14     5-8      6     0.9

  1. What is meant by the column headings? 'Max' is the maximum number of points; 'Range' is the range of scores actually obtained; 'M' is the mean; 'S.D.' is the standard deviation. (A short sketch showing how these statistics are calculated follows this list.)
  2. Which test is the easiest? Text 1: the mean (18 out of a maximum of 20) is high and the standard deviation (1.9) is small, which suggests that the scores were clustered close together. This sub-test was easy for all the students, as the range was 16-20.
  3. Can you give a reason why the SDs are different in the two writing tests? Possibly one of the tests was harder because it tested different skills. Which test would you choose? Writing 2 could be the more difficult: the mean is lower and the highest score was 8. The standard deviation of 0.9 means that the distribution of scores is narrow. It would be difficult to select a cut-off point, as scores would probably be thickly clustered around it, and so identifying weaker students could be problematic. I would therefore choose Writing 1.
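
For readers who wish to reproduce such descriptive statistics for their own data, a minimal Python sketch is given below. The raw score list is invented purely for illustration, since the item-level data behind Table 1 are not available.

```python
import statistics

# Invented raw scores for one sub-test (the raw data behind Table 1
# are not available); the maximum possible score is assumed to be 20.
scores = [12, 13, 14, 15, 15, 16, 17, 18, 19, 20]

max_points = 20                              # 'Max' column
score_range = (min(scores), max(scores))     # 'Range' column
mean = statistics.mean(scores)               # 'M' column
sd = statistics.stdev(scores)                # 'S.D.' column (sample SD; the
                                             # table may use the population SD)

print(f"Max = {max_points}, Range = {score_range[0]}-{score_range[1]}, "
      f"M = {mean:.1f}, S.D. = {sd:.1f}")
```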

 

 

Group B Placement Test 2

 

 

Table 2: Group B Placement Test 2

              Max    Range    M     S.D.
Reading
  Cloze       20     6-19     14    6.1
  Texts       10     2-10     6     5.2
Listening
  Lecture     20     6-18     13    6.1
  Mini task   12     6-10     7     2.5
Writing 1     28     8-26     19    9.0
Writing 2     10     3-9      7     3.0

 

 

  1. Which listening component should be retained and why? The lecture: the scores range from 6 to 18 and the S.D. is 6.1; a greater range of scores and a larger SD will assist placement decisions, suggesting that it will be easier to assign students to classes at different levels. A lecture-listening task is also likely to be more germane to undergraduate or postgraduate students.
  2. Why do you think there is such a large difference in the SDs for the two writing components of this test? Writing 1: range = 8-26, mean = 19 (relative to the maximum, not quite as high as in Writing 2). The SD tends to vary in line with the mean, and the normalised SDs are in fact similar (to calculate the normalised SD, divide the SD by the mean: SD/M; a short calculation follows this list). Clearly there is a wide spread of ability in this group, i.e. considerable deviation from the mean. The nature of the writing task may also be relevant: choosing appropriate scoring procedures for productive skills is problematic, as are inter- and intra-rater reliability.
  3. If you had to choose one of the reading components, which would you choose? The Cloze. It captures a wide spread of abilities (range = 6-19) and has an S.D. of 6.1; this would help to assign students to classes at different levels.
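
As noted in question 2, dividing the SD by the mean gives a normalised SD (in statistics this is usually called the coefficient of variation). The short Python sketch below applies the formula to the two writing components of Table 2 to show that, once the difference in the means is allowed for, their spreads are indeed similar.

```python
# Normalised SD (coefficient of variation) for the two writing
# components of Placement Test 2, using the values from Table 2.
components = {
    "Writing 1": {"mean": 19, "sd": 9.0},
    "Writing 2": {"mean": 7,  "sd": 3.0},
}

for name, stats in components.items():
    normalised_sd = stats["sd"] / stats["mean"]   # normalised SD = SD / M
    print(f"{name}: normalised SD = {normalised_sd:.2f}")

# Output: Writing 1 = 0.47, Writing 2 = 0.43 -- similar spreads once
# the means are taken into account.
```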

Table 3: Correlations of sub-tests in Placement Test 1

             R Cloze    Text 1    Text 2
R Cloze      -          -         -
Text 1       0.3        -         -
Text 2       0.7        0.4       -
Writing 1    0.1        0.14      0.2
Writing 2    0.2        0.23      0.18

 

  1. What is meant by correlation? Correlation is the extent to which two sets of results agree with each other; correlation coefficients can provide valuable empirical information for supporting or rejecting specific interpretations (i.e. the amount of shared variance/covariance) and uses of test scores (Bachman 1991, 259). A perfect positive correlation is +1.0 and a perfect negative correlation is -1.0. Strong negative correlations are unlikely to occur between the results of two tests, but might be found, for example, between scores on a language test and some personality measures. Alderson et al. (2001, 184) state: 'Since the reason for having different test components is that they all measure something different and therefore contribute to the overall picture of language ability attempted by the test, we should expect these correlations to be fairly low'. (A short illustration of how a correlation coefficient is computed follows this list.)
  2. Which sub-tests show the highest and lowest correlations? R Cloze and Text 2 show the highest correlation (0.7); these tests may be testing much the same skills. R Cloze and Writing 1 show the lowest correlation (0.1); these tests appear to test different skills. NB Bachman (1991, 260): 'it is impossible to make clear, unambiguous inferences regarding the influence of various factors on test scores on the basis of a single correlation between two tests'.
  3. Why can we omit either the Reading Cloze or Reading Text 2? There is a high correlation (0.7) between these sub-tests; they may be testing much the same skills, so one of them adds little information.
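
For illustration, the sketch below computes a Pearson correlation coefficient for two invented sets of paired sub-test scores; the raw scores behind Table 3 are not available, so the numbers here are hypothetical.

```python
import statistics

# Invented paired scores for two sub-tests taken by the same students;
# the raw data behind Table 3 are not available.
r_cloze = [12, 14, 15, 16, 18, 20]
text_2  = [3, 4, 6, 6, 8, 9]

# Pearson product-moment correlation (statistics.correlation requires
# Python 3.10+); +1.0 is a perfect positive and -1.0 a perfect negative
# correlation.
r = statistics.correlation(r_cloze, text_2)

# Squaring r gives the proportion of variance the two sub-tests share.
print(f"r = {r:.2f}, shared variance = {r**2:.2f}")
```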

Table 4: Correlations of certain sub-tests in Placement Tests 1 and 2

                              Test 2
Test 1         R Cloze    Listening    Writing (Tabular    Writing
                          (Lecture)    Information)        (Lecture-Based)
R Cloze        0.9                     0.2                 0.3
Listening      0.2        0.18         0.31                0.24
Writing 1      0.18                    0.6                 0.5
Writing 2      0.3                     0.21                0.2

(Correlations for the blank cells are not available.)

 

 

  1. The listening tasks in Test 1 do not correlate highly with the listening-to-lecture task of Test 2 (0.18). Why? These tasks test different skills; the characteristics that affect the difficulty of a listening test include those related to information processing, what the test taker has to do with the information, and how quickly a response is required.
  2. Writing 2 of Test 1 (writing based on a topic of interest) correlates very poorly with both of the writing tasks in Test 2. Why? The correlations are 0.21 (Writing, Tabular Information) and 0.2 (Writing, Lecture-Based). There could be a number of reasons, e.g.: (1) the tasks test different skills (academic versus topic-of-interest genres); (2) the scoring procedures for these productive-skill tasks may be inappropriate; (3) intra- and inter-rater reliability may be low.

The correlations of the sub-tests within Placement Test 2 are not given. What might they be?

  1. Placement Test 2: Listening to Lecture and Writing Based on Lecture? The notion that writing is harder than listening is relevant, but the correlation may well be high, i.e. students who have difficulty listening to a lecture may also have problems writing it up. NB the listening-to-lecture component in Test 2 was quite difficult (M = 13; S.D. = 6.1).
  2. Placement Test 2: Reading Cloze and Reading of Interrelated Texts? I would expect the correlation to be fairly high, as in Table 3; both Text 2 in Table 1 and Texts in Table 2 have the same mean of 6.

 

Conclusions

In this paper I have given the reader an overview of how to interpret placement test scores. Validating a placement test would involve an enquiry, once courses were under way, into the proportion of students thought to be misplaced. It would then be a matter of weighing the number of misplacements, and their effect on teaching and learning, against the cost of developing and administering a test that would place students more accurately (i.e. predictive validity). In the case of placement tests, the proportion of students assigned to inappropriate classes would be the basis for assessing their validity.
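
By way of illustration only, the sketch below tallies a misplacement proportion from hypothetical teacher judgements gathered once courses are under way; the students, levels and judgements are all invented.

```python
# Hypothetical check of placement accuracy: compare the level assigned
# by the placement test with the level teachers judge appropriate once
# courses are under way. All data below are invented.
test_placement = {
    "S01": "upper", "S02": "upper", "S03": "intermediate",
    "S04": "intermediate", "S05": "foundation",
}
teacher_judgement = {
    "S01": "upper", "S02": "intermediate", "S03": "intermediate",
    "S04": "foundation", "S05": "foundation",
}

misplaced = [s for s in test_placement
             if test_placement[s] != teacher_judgement[s]]
proportion_misplaced = len(misplaced) / len(test_placement)

print(f"Misplaced: {misplaced}, proportion = {proportion_misplaced:.2f}")
# -> Misplaced: ['S02', 'S04'], proportion = 0.40
```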

I believe ‘predictive validity’ as a field of study will become increasingly important at tertiary level when Poland becomes a full member of the European Union.

 

Bibliography

Alderson, J.C., C. Clapham and D. Wall (2001) Language Test Construction and Evaluation. Cambridge: Cambridge University Press.

Bachman, L.F. (1991) Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Wall, D., C.M. Clapham and J.C. Alderson (1994) Evaluating a Placement Test. Language Testing 11(3): 321-343.