A critical review of Wigglesworth’s article on the influences on performance in task-based oral assessment.
C. Alexander, Bristol University
1. Conceptual framework and assumptions/researcher’s aims
This research aims to examine how different task types influence student L2 output in informal classroom-based assessments. The task aspects analysed comprised: the cognitive difficulty of the task; whether the interlocutor was a native speaker (NS) or non-native speaker (NNS); whether planning time was made available; and task familiarity and structure (i.e. the amount of assisting information); NB these last two task variables are only mentioned later in the article. Discussing cognitive load/difficulty/demand (terms which appear to be used synonymously), Wigglesworth maintains (2001, 186) that there are data to suggest that cognitive load does influence performance, although not always negatively. With regard to the differences between non-native and native speaker interlocutors, she states (2001, 187) that the interlocutor variable has a significant effect on L2 output, more so than task familiarity.
2. Design of study/analytic tools/research questions
Structure, cognitive load and familiarity of content are seen as features internal to the task; availability of planning time and whether the interlocutor is a NS or NNS are treated as external conditions. All the tasks were competency-based assessment tasks of the kind routinely used to evaluate achievement in the Australian Adult Migrant Education Program. Five tasks at two levels were identified. Task types were relevant to the competences required at each level, and the tasks were developed from a collection sent in by teachers from three Australian states. Level one tasks assessed learners at a functional level of proficiency; level two tasks graded learners at a vocational level of proficiency. One specially developed task was used as a control task (NB it was thought to be universally familiar to learners and skill-specific); the other tasks were manipulated using the following variables: structure, familiarity of activity, NS vs NNS interlocutor, and planning time.
Eighty learners from different ESL centres took part in the project at each level. All 80 learners did a non-manipulated task (task 1), and then approximately 20 were randomly assigned to one of the remaining four tasks. The tasks were administered by trained and experienced teachers. Each learner was tape-recorded taking four level two and six level one tasks; the number of cassettes used would therefore depend on the number of centres and the length of the cassettes, rather than being the number of students (80) multiplied by ten. Student feedback on task difficulty was elicited using a five-point Likert scale. The interviewers were said to be ‘familiar’ with the rating scales, and the scales themselves were those used in assessing the English language proficiency of adult immigrants. Performances were randomly double-rated by distributing them across 16 raters.
Three separate quantitative evaluations were made in order to identify variations in oral task difficulty: (1) analyses of variance and t-tests on raters’ raw scores measured subject performance; (2) a Rasch analysis using the statistical modelling program FACETS (four facets were included: candidate, task, rater and rating criteria) measured task difficulty; (3) learner feedback provided a measure of the subjects’ own evaluation of task difficulty.
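For readers unfamiliar with FACETS, a minimal sketch of the many-facet Rasch model the program implements may help. The article does not report Wigglesworth’s exact model specification, so the terms below are an illustrative reconstruction that simply mirrors the four facets she lists:

\[ \log\!\left(\frac{P_{ntjck}}{P_{ntjc(k-1)}}\right) = B_n - D_t - C_j - E_c - F_k \]

where \(P_{ntjck}\) is the probability of candidate \(n\) being awarded category \(k\) (rather than \(k-1\)) on rating criterion \(c\) of task \(t\) by rater \(j\); \(B_n\) is the candidate’s ability, \(D_t\) the task’s difficulty, \(C_j\) the rater’s severity, \(E_c\) the criterion’s difficulty, and \(F_k\) the step difficulty of the rating-scale category. Because all parameters are estimated jointly on a common logit scale, the program can report task difficulty separately from rater severity and candidate ability.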
3. Major findings/interpretation of findings
- It was found that where the interlocutor was a NNS the task appeared to be easier.
- Familiar activities appeared easier where planning time was not provided. Furthermore, planning time appeared to adversely influence performance, seeming to increase difficulty in both structured and unstructured tasks.
- Structure appeared to make the task easier in three tasks (i.e. tasks 2, 4 and 5); the exception was task 3.
- Familiarity results were problematic. A familiar activity was easier without planning time, but where the task was unfamiliar, planning time had no effect either way.
Wigglesworth suggests (2001, 204) three possible reasons why NS interlocutors made the task more difficult: (1) learners may be more relaxed with NNSs who were in fact fellow learners (i.e. a different social status); (2) raters may compensate for a perceived disadvantage in having a NNS interlocutor; (3) learners may produce less complex language with NNS interlocutors. It was held that planning time encouraged learners to try to introduce more complex ideas and structures, and that when this was translated into linguistic output, a learner’s performance was adversely affected. In information-elicitation tasks, structure appeared to be quite important, whereas in more negotiated interaction, where questions may be asked and answered, structure was not found to be as influential; the role of the interlocutor was, however, thought to be crucial here, i.e. the interlocutor could provide structure and could also dominate or assist.
4. Critique of the study
- This study appears to assume that data derived from people who are not taking a test for real will be reliable/useful. I would argue that the process of taking a test for real (e.g. for certification or entrance exams) may be very different from that of taking a test for the purposes of research. Did the participants perceive this to be a formal test? Even though the data are supposed to be relevant to informal classroom assessments, the method of assessment appears to be formal.
- The finding that the presence of a native speaker makes a task more difficult seems interesting, though to my mind questionable. We are not told to what extent the NNSs (other learners) knew each other, or whether the raters knew the students (this could affect grading); NB the frequency and type of negotiation may differ according to whether the interlocutor is familiar or unfamiliar to the test takers. There may also be differences between NSs: Lazaraton (1996, 166) found that native speaker examiners provide candidates with eight types of support, and that the type of support is not consistent and so could impact on a candidate’s language use and on the rating. Wigglesworth did not explain why learners were chosen to be the NNS interlocutors or on what grounds they were chosen (can a learner be a competent NNS interlocutor?). No definition of what constitutes a native speaker was provided. Much of the research undertaken in this area concerns the way professional NNSs and NSs judge oral performances (Brown 1995; Ellis 1995, 63-67); in light of this research, I maintain that whether the raters were NNSs or NSs is relevant, i.e. research suggests that there are significant differences (in harshness) in the way NNSs and NSs assess different productive skills. In a way, the fact that the interlocutors were learners is an interesting slant on this research; however, Wigglesworth should have made this clear on the first page of the article (p186) rather than on page 204. The background of the NSs may also be pertinent: Brown (1995, 7-8) found that NSs with an industrial background were harsher than those with a teaching background.
- To what degree did physical characteristics, psychological factors and experiential characteristics (e.g. test preparedness) affect these results? To what degree were the following considered: the amount of help available to the learner, and learner factors such as confidence, motivation, cultural knowledge/awareness and linguistic knowledge?
- Who were the learners (age, sex, cultural background, language level)? For instance, were they actually adults or teenagers? (The scales used for assessment, which were familiar to the raters, were those for assessing the English language proficiency of adult immigrants.)
- The control task was not manipulated, but what did ‘not manipulated’ mean? Was there no planning, structure, familiarity or interlocutor? Wigglesworth did not explain what an ‘unmanipulated’ task was. Furthermore, it was stated (p194) that there were no significant differences for learners participating at either level in these baseline (i.e. non-manipulated) tasks. I find this result surprising, as it would suggest that the students were all at the same language level (which is unusual at any ESL centre); yet Wigglesworth did not mention that the learners were same-level learners, or on what basis they were chosen; she only stated (p191-192) that they ‘were drawn from different ESL centres’. An indication of the level of the tasks was given on p206. If the learners were drawn randomly, how was it possible that they were all at the same linguistic level? I suppose, at least for the purposes of the research, it would be important to ‘ensure’ that there were no differences in the control-task data, as ‘differences’ at this stage would raise questions about the usefulness/relevance of the rest of the research data. I think presenting all the data for task one would have been of interest.
- There were two level types: a functional level of proficiency (tasks 1-3) and a vocational level of proficiency (tasks 4-5). Yet these terms were not defined or justified in detail, and how the tasks were grouped by level seemed debatable. For instance (p195), giving instructions on how to use a bank automatic teller machine was classed as functional, yet explaining to a 12-year-old child how to change a light bulb was classed as vocational (and not functional).
- A definition was provided for the variable ‘familiarity of activity’, yet I wondered how the familiarity variable was actually operationalised in practice.
- No information on intra- or inter-rater reliability was given (probably because there were 16 examiners), yet (p193) four facets were used in the FACETS analysis: the candidate, task, rater and rating criteria. It was not clear what aspects of these facets were used, e.g. what aspect of the candidate was modelled? There are various aspects of a speaking-test setting that should be considered: raters may react differently to particular candidates, to gender, to the time of day, or to the physical setting. Each of these aspects of the setting can be called a facet, and one can seek information about the effect of any of them. Once this information is found, it can be incorporated into the estimate of a candidate’s ability, i.e. the raw scores can be adjusted using the computer program FACETS (see the illustration following this list).
- The ‘structure’ variable was introduced quite late in the article, without much justification.
- With regard to the five-point Likert scale for student feedback, even though (p206) some categories had been ‘collapsed’ with others, I felt the ‘easy’ and ‘OK’ categories presented in the data analyses were vague; what is the semantic difference between ‘easy’ and ‘OK’, and would a learner not interpret these words as slightly synonymous?
- Wigglesworth states (p194) that ‘the higher the score, the easier the task is likely to be’. Yet in order to ensure that the polarity of the three tables was in the same direction, the average raw scores for each task were subtracted from 28; this, I felt, made the interpretation of the bar charts initially confusing, as it was actually the lowest scores presented in the bar charts that indicated the easiest tasks (e.g. a task averaging a raw score of 22 would be plotted as 28 - 22 = 6, a shorter bar than a harder task averaging 16 and plotted as 12).
- With regard to task type one (p195), it was stated that the more familiar task without planning, i.e. task 3, was the easiest; yet in my opinion the student feedback indicated that task 2 (familiar + planning) was the easiest.
- The ‘20’ students taking tasks 2-5 were ‘apparently’ chosen at random; I am curious how the students were actually selected at random, and why they had to be selected at random if there were no ‘significant’ differences in task 1 (mentioned earlier).
- In task type two the word ‘complex’ is used, yet in figure 9.3 ‘familiar’ is used; I find this confusing.
- In task type three the student feedback (i.e. more students finding the task easy) partly contradicts Wigglesworth’s finding that the NNS interlocutor made the task easier.
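To make the facet-adjustment point above concrete, here is a hypothetical illustration (none of the figures come from the study): if the Rasch analysis estimated one rater to be harsher than another by, say, half a logit, then two candidates with identical raw scores, one marked by each rater, would receive different ability estimates, with the candidate marked by the harsher rater credited with the higher ability. An analysis of raw scores alone, such as the ANOVA in the first evaluation, cannot make this kind of correction, which is presumably why the Rasch analysis was run alongside it.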
The number of research variables and the subsequent analyses of the findings are impressive, though in my opinion the research is over-ambitious. Reducing the number of variables to one, or two at the very most, would have made this research a lot easier to carry out (as in Wigglesworth 1997). The way variables were grouped seemed arbitrary, e.g. a planning-and-familiarity combination but no planning-and-NS/NNS combination. Wigglesworth did not explain the logic behind the way the task variables were combined for analysis; NB there are significantly more than five possible combinations of these task variables (see the note below).
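As a rough illustration, assuming each of the four manipulated variables (structure, familiarity of activity, NS/NNS interlocutor, planning time) is treated as binary, a full factorial crossing would yield 2 x 2 x 2 x 2 = 16 distinct task conditions. The five tasks used therefore cover less than a third of the possible combinations, which is why the choice of this particular subset calls for justification.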
I do not think the findings of this study should be generalised to other situations, and so they would not be relevant to my professional situation. I do, however, agree with Wigglesworth (p206) that in oral assessments close attention needs to be paid not only to possible task variables but also to the role of the interlocutor.
References
Brown, A. (1995) The effect of rater variables in the development of an occupation-specific language performance test. Language Testing 12/1: 1-15.
Ellis, R. (1995) The Study of Second Language Acquisition. Oxford: OUP.
Lazaraton, A. (1996) Interlocutor support in oral proficiency interviews: the case of CASE. Language Testing 13/2: 151-172.
Wigglesworth, G. (2001) Influences on performance in task-based oral assessments. In Bygate, M., Skehan, P. and Swain, M. (eds) Task-based Learning. Addison Wesley Longman.
Wigglesworth, G. (1997) An investigation of planning time and proficiency level on oral test discourse. Language Testing 14: 85-106.