Formative Assessment

Formative Assessment Definitions

‘Formative assessment’ is one of those educational terms that is almost as widely interpreted as it is used, yet is something that ‘everyone knows’ improves learning (Hargreaves, 2005; Lau, 2015; Shepard, 2005). Originating in 1960s curriculum studies as formative ‘evaluation’ (Scriven, 1967), it was first defined, in contrast to the more common summative evaluation tests, by Bloom and colleagues (1971). Whereas summative evaluations (which may be characterised as ‘assessment of learning’) are usually designed for the purpose of grading students or for evaluating a curriculum, formative evaluations (‘assessment for learning’) were described as “another type of evaluation which all who are involved—student, teacher, curriculum maker—would welcome because they find it so useful in helping them improve what they wish to do” (Bloom et al., 1971, p. 117).

Subsequently and inevitably, across the thousands of published academic papers and many books (cf. Andrade and Cizek, 2010) that include ‘formative assessment’ in their title, formative assessment has been defined in multiple ways (Dorn, 2010).

However, the definition given by Black and Wiliam in their seminal literature review (1998a), summarised in the well-known booklet ‘Inside the Black Box’ written especially for teachers and policy makers (Black and Wiliam, 1998b), is one of the most widely cited. Formative assessment, they write, encompasses “all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged” (1998a, pp. 7–8). Put another way, “assessment becomes ‘formative assessment’ when the evidence is actually used to adapt the teaching work to meet learning needs” (Black et al., 2004, p. 10).

For their analysis, Black and Wiliam examined two previous reviews (Crooks, 1988; Natriello, 1987) and some 250 academic papers on subjects as diverse as classroom practices, assessment practices, student motivation, learning theory, questioning, and feedback – a scope necessary because, before that time, very few studies used the term formative assessment. Their core conclusion (1998a, 1998b) was that improving formative assessment in the classroom leads to greater improvements in learning than do other typical educational interventions, with effect sizes (“effect size is the most important tool in reporting and interpreting effectiveness”, Higgins et al., 2013, p. 6) ranging from 0.40 to 0.70; an effect size of 0.40 “should be used as the benchmark to judge effects in education.... [effect sizes above 0.40] are worth having” (Hattie, 2008, p. 16). However, while these effect sizes are widely cited as justification for implementing formative assessment practices (cf. Boston, 2002), Black and Wiliam’s own later studies (Wiliam et al., 2004) showed a smaller effect size of 0.32, and a recent meta-analysis (Kingston and Nash, 2011, 2012; contested by Briggs et al., 2012) found a weighted mean effect size of only 0.20. The magnitude of these effect sizes is, however, less important than the relative scarcity of studies that might be included in any meta-analysis (Kingston and Nash, 2011): it has been argued that most published studies “lack the statistical reliability expected of assessment practices” (Clark, 2011, p. 165), which makes it difficult to draw robust conclusions about the general efficacy of formative assessment.
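The effect sizes at issue here are standardised mean differences. As a purely illustrative sketch (the scores, study effects, and weights below are hypothetical, not drawn from any of the studies cited), Cohen's d for a single study and the kind of weighted mean effect size reported by a meta-analysis can be computed as follows:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardised mean difference between two groups,
    using the pooled sample standard deviation (Cohen's d)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd

def weighted_mean_effect(effects, weights):
    """Combine per-study effect sizes into a single weighted mean,
    here weighting simply by sample size."""
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)

# Hypothetical test scores for one small intervention study
treatment = [68, 74, 71, 80, 77, 69]
control = [64, 70, 66, 73, 68, 65]
print(f"d = {cohens_d(treatment, control):.2f}")

# Hypothetical effect sizes from three studies of different sizes
print(f"weighted mean = {weighted_mean_effect([0.40, 0.32, 0.20], [50, 120, 300]):.2f}")
```

Weighting each study (here by sample size; meta-analyses more commonly use inverse-variance weights) is what allows a review such as Kingston and Nash's to combine studies of very different scales into a single summary figure, and is also why a few large null studies can pull the overall estimate well below the headline 0.40–0.70 range.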

In short, while there are reams of qualitative, correlational or small-scale studies that support using formative assessment in classrooms (cf. Bell and Cowie, 2001; Dorn, 2010; Herman et al., 2006), such that most educators ‘know’ that formative assessment ‘improves learning’ (“assessment which is explicitly designed to promote learning is the single most powerful tool we have for both raising standards and empowering lifelong learners”, Assessment Reform Group, 1999, p. 2), there is surprisingly little quantifiable supporting evidence of the type required by many researchers and policy makers (Dunn and Mulvenon, 2009; Higgins et al., 2013; Kingston and Nash, 2011).

References

Andrade, Heidi L., and Gregory J. Cizek, eds. Handbook of Formative Assessment. Routledge, 2010.

Assessment Reform Group. Assessment for Learning: Beyond the Black Box. University of Cambridge School of Education, 1999.

Bell, Beverley, and Bronwen Cowie. "The characteristics of formative assessment in science education." Science education 85.5 (2001): 536-553.

Black, Paul, and Dylan Wiliam. "Assessment and classroom learning." Assessment in Education: Principles, Policy & Practice 5.1 (1998): 7-74.

Black, Paul, and Dylan Wiliam. Inside the Black Box: Raising Standards through Classroom Assessment. King's College London School of Education, 1998.

Black, Paul, et al. "Working inside the black box: Assessment for learning in the classroom." Phi Delta Kappan 86.1 (2004): 8-21.

Bloom, Benjamin S., J. Thomas Hastings, and George F. Madaus. Handbook on Formative and Summative Evaluation of Student Learning. McGraw-Hill, 1971.

Clark, Ian. "Formative assessment: Policy, perspectives and practice." Florida Journal of Educational Administration & Policy 4.2 (2011): 158-180.

Crooks, Terence J. "The impact of classroom evaluation practices on students." Review of educational research 58.4 (1988): 438-481.

Dorn, Sherman. "The political dilemmas of formative assessment." Exceptional Children 76.3 (2010): 325-337.

Dunn, Karee E., and Sean W. Mulvenon. "A critical review of research on formative assessment: The limited scientific evidence of the impact of formative assessment in education." Practical Assessment, Research & Evaluation 14.7 (2009): 1-11.

Hargreaves, Eleanore. "Assessment for learning? Thinking outside the (black) box." Cambridge Journal of Education 35.2 (2005): 213-224.

Hattie, John. Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge, 2008.

Herman, Joan L., et al. "The Nature and Impact of Teachers' Formative Assessment Practices. CSE Technical Report 703." National Center for Research on Evaluation, Standards, and Student Testing (CRESST) (2006).

Kingston, Neal, and Brooke Nash. "Formative assessment: A meta‐analysis and a call for research." Educational measurement: Issues and practice 30.4 (2011): 28-37.

Higgins, Steve, et al. The Sutton Trust-Education Endowment Foundation Teaching and Learning Toolkit. Education Endowment Foundation, 2013.

Natriello, Gary. "The impact of evaluation processes on students." Educational Psychologist 22.2 (1987): 155-175.

Scriven, Michael. "The methodology of evaluation." Perspectives of Curriculum Evaluation, eds. R. Tyler, R. Gagne, and M. Scriven. AERA Monograph Series on Curriculum Evaluation. Rand McNally, 1967.

Shepard, Lorrie A. "Formative assessment: Caveat emptor." ETS Invitational Conference, New York, NY. 2005.

Wiliam, Dylan, et al. "Teachers developing assessment for learning: Impact on student achievement." Assessment in Education: Principles, Policy & Practice 11.1 (2004): 49-65.

How to Implement Formative Assessment

While there is no definitive manual of formative assessment, the Assessment Reform Group (1999) provides a useful digest of the key features: formative assessment, they summarise, “is embedded in a view of teaching and learning of which it is an essential part; it involves sharing learning goals with pupils; it aims to help pupils to know and to recognise the standards they are aiming for; it involves pupils in self-assessment; it provides feedback which leads to pupils recognising their next steps and how to take them; it is underpinned by confidence that every student can improve; [and] it involves both teacher and pupils reviewing and reflecting on assessment data” (Assessment Reform Group, 1999, p. 7). However, despite broad agreement on these features, and no shortage of authors putting forward ‘principles’ for successful classroom implementation (cf. Clarke, 2014; Furtak, 2009; Keeley, 2008), the realisation of a formative assessment approach in a typical classroom is not straightforward: “there is no one simple way to improve formative assessment” (Black and Wiliam, 1998b, p. 8) and “no prescribed model of effective classroom action” (Wiliam et al., 2004, p. 51). Nevertheless, there is some agreement about the key areas of classroom practice to which a formative assessment approach might contribute positively, core examples being: questioning, feedback, peer- and self-assessment, and the formative use of summative tests (ibid.).

Questioning

There is much research evidence that typical classroom dialogue, including the use of questions, is far from ideal: “many teachers do not plan and conduct classroom dialogue in ways that might help students to learn” (Black et al., 2004, p. 11) and “at its worst, classroom talk does the opposite of what one might reasonably expect it to do: it disempowers the student” (Alexander, 2006, p. 5). A formative approach to questioning moves away from the typical ‘fact’ or ‘guess what’s in my head’ questions – posed too often, with too little wait time for students to gather their thoughts and reply usefully – and refocuses on open questions that aim to evoke discussion or promote collaborative activities between students: “asking simple questions, such as ‘Why do you think that?’ or ‘How might you express that?’ can become part of the interactive dynamic of the classroom and can provide an invaluable opportunity to extend students’ thinking through immediate feedback on their work” (Black et al., 2004, p. 13).

Feedback

Formative feedback has been extensively researched (see Shute, 2008), such that a summary is beyond the scope of this review. However, the key feature of formative feedback as advocated by Black and Wiliam is the complete replacement of numerical marks or grades with written comments that “identify what has been done well and what still needs improvement and give guidance on how to make that improvement” (Black et al., 2004, p. 14). The evidence is that numerical marks are inevitably perceived negatively, while combining marks and comments leads the student to ignore the comments and focus on the marks. Most importantly, “a numerical score or a grade does not tell students how to improve their work, so an opportunity to enhance their learning is lost” (ibid., p. 13).

Peer and self-assessment

“Students can achieve a learning goal only if they understand that goal and can assess what they need to do to reach it. So self-assessment is essential to learning” (ibid., p. 14). Self-assessment also contributes to self-regulated learning (Nicol and Macfarlane-Dick, 2006) and to working at a metacognitive level (Lajoie, 2008). In this context, Black and colleagues also introduce a very practical strategy, the use of ‘traffic lights’, suggesting that teachers encourage their students to identify their self-assessed level of understanding by marking their work green, amber or red (Black et al., 2004). Meanwhile, peer-assessment can be valuable for various reasons: students may be more willing to accept criticism of their work from one another, they are likely to express their comments in a language style and pitch that the recipient also uses, and they learn both from critically considering an approach other than their own and from having to articulate their thoughts. A practical strategy suggested to support peer-assessment is ‘three stars and a wish’, where the peer reviewer has to identify and comment upon three things in the work that have been successful and one thing that could be improved (Bennett, 2011).

Formative use of summative tests

“From their earliest use it was clear that the terms ‘formative’ and ‘summative’ applied not to the assessments themselves, but to the functions they served” (Black and Wiliam, 2003, p. 623). Given that classroom and high-stakes summative assessments are unlikely to go away, how can this assessment of learning be appropriated to support learning? One approach is to use the impending arrival of a summative test as a further reason for the students to undertake self-assessment, identifying (perhaps with ‘traffic lights’) those topics that are sufficiently understood and those that need further effort. Encouraging the students to generate and answer their own questions on topics to be covered by the test can also be especially useful.

Peer-marking of the finished tests can also support learning, especially if the students have themselves been involved in developing the marking rubric or if they use the ‘three stars and a wish’ approach. Peer marking also has the practical benefit of freeing the teacher from the chore of marking thirty scripts and, more importantly, of enabling them to spend more time exploring and discussing the questions in class, especially those that most students found especially challenging.

Formative Assessment and Technology

One of the earliest (and most cited) researchers to explore the efficacy of formative assessment was D. Royce Sadler, whose work specifically addressed instructional systems. Given that he was writing a quarter of a century ago, it is unsurprising that he thought technology might only contribute to the simplest of formative feedback: “it would be difficult if not impossible... to automate or develop a computer-based system for feedback or formative assessment, or for generating remedial moves and appropriate corrective procedures” (Royce Sadler, 1989, p. 139).

Nonetheless, more recently, the usual technological suspects have been researched for their potential to enhance formative assessment practices. These have included: e-learning and learning management systems (Wang, 2007); the Internet (Buchanan, 2001, 1998; Chen and Chen, 2009; Wang et al., 2006); mobile technologies (Hwang and Chang, 2011; Isabwe, 2012; Susono and Shimomura, 2006); blogs (Olofsson et al., 2011); classroom response systems (Beatty and Gerace, 2009; Feldman and Capobianco, 2008); and computer games (Broussard, 2014; Delacruz, 2011; Tsai et al., 2015). More general computer-based approaches (Bull et al., 2006; Jenkins, 2004; Lewis and Sewell, 2007; Peat and Franklin, 2002; Whitelock, 2007) and other less specific ‘technology-enhanced’ approaches to formative assessment (Landauer et al., 2009; Vendlinski et al., 2005) have also been proposed.

Inevitably, as is typical of research into learning and technology (Selwyn, 2013), these research outputs are almost entirely positive but their coverage is patchy and there is little useful consensus. The nearest to a review of the field is provided by Russell, in ‘The Handbook of Formative Assessment’ (Andrade and Cizek, 2010). Russell (2010) identifies four ‘promising’ ways in which computer-based technologies might be used to support formative assessment: (1) systematically monitoring student progress to inform instructional decisions; (2) identifying misconceptions that may interfere with student learning; (3) providing rapid feedback on student writing; and (4) collecting information about student learning needs during instruction.

The first of these involves students using handheld devices (such as tablets or mobile phones) to self-monitor their understanding and progress in class topics, thus providing the teacher with large amounts of individual data, perhaps by means of easy-to-digest graphics, that they can use to inform discussions about the student’s weaknesses and strengths and decisions about how best to support their learning needs. The second possible use requires somewhat more sophisticated technology: online diagnostic tests that, in addition to providing a score, automatically identify what, if any, misconceptions the student holds about the topic in question. Students can then be directed to remedial learning activities. Russell’s third possible technology is more complex still: automatic essay marking (based on techniques such as Latent Semantic Analysis or Bayesian Essay Test Scoring), repurposed to provide students with almost instantaneous feedback on their writing. Such systems might provide information about the student’s use of English, the content of their writing, and the way in which they have structured their ideas, allowing them to rethink and revise their approach. The fourth technology identified by Russell, classroom response systems, is less technically sophisticated but is contingent upon how it is configured by the teacher. While such systems might enable teachers easily to identify patterns in individual responses, assess understanding and inform individualised teaching, they depend entirely on the quality of the questions and possible responses provided to the students.
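Russell's third use, rapid feedback on writing, rests on measuring how close a student's text is to some model of good content. Production systems use techniques such as Latent Semantic Analysis over large corpora; the sketch below substitutes a much cruder stand-in (raw bag-of-words vectors compared by cosine similarity) purely to illustrate the shape of such a pipeline. The function names, texts, and threshold are all hypothetical:

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Very crude tokenisation into lower-case word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 to 1)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def content_feedback(essay, model_answer, threshold=0.5):
    """Hypothetical formative feedback: flag essays whose content
    diverges strongly from a model answer. Real LSA-based scorers
    use SVD over large corpora rather than raw word counts."""
    score = cosine_similarity(bag_of_words(essay), bag_of_words(model_answer))
    if score >= threshold:
        return "Content broadly covers the expected ideas."
    return "Content diverges from the expected ideas: revisit the key topics."

model = "formative assessment uses evidence of learning to adapt teaching"
print(content_feedback("assessment evidence helps teachers adapt teaching to learning needs", model))
print(content_feedback("the history of summative examinations in england", model))
```

An LSA-based system would first project these vectors into a low-dimensional ‘semantic’ space via singular value decomposition, so that essays sharing meaning but not vocabulary still score as similar; the formative value lies in returning the feedback within seconds rather than after days of marking.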

Interestingly, of these four possible uses of technology to support formative assessment, in only one, automatic essay marking, is the student directly supported by the technology. The others might be described as providing ‘indirect’ formative assessment, focussing on giving information to the teacher to enable them to provide any necessary remedial support. However, this appears to be a limitation of Russell’s framing, rather than anything specific to the technologies, each of which might be reconfigured to give students direct opportunities to self-assess how they are progressing and how they might improve their own learning trajectory.

Key Points About Formative Assessment

Definition of Formative Assessment

  • Assessment becomes “formative assessment” when the evidence is used to adapt the teaching work to meet learning needs.
  • Quantitative evidence for the efficacy of formative assessment remains weak, owing to the methodological challenges of studying educational interventions.

Key features of formative assessment implementation

  • Sharing learning goals with pupils.
  • Helping pupils to know and to recognise the standards they are aiming for.
  • Involving pupils in peer and self-assessment.
  • Providing feedback which leads to pupils recognising their next steps and how to take them.
  • Involving both teacher and pupils in reviewing and reflecting on assessment data.
  • Given the lack of a universal model of how to undertake formative assessment, implementation needs to be flexible and adaptive to teachers’ needs.

Strategies for undertaking formative assessment around key areas of practice

  • Move away from questions on facts towards open questions that invite discussion, such as ‘Why do you think that?’ or ‘How might you express that?’, where the teacher can provide immediate feedback on the student’s understanding.
  • Avoid numerical scores and focus on written comments that identify what has been done well and what still needs improvement, and give guidance on how to make that improvement.
  • Embed peer and self-assessment strategies:
    • Students identify their self-assessed level of understanding by marking their work green, amber or red.
    • Peer-assessment uses ‘three stars and a wish’, where the peer reviewer has to identify and comment upon three things in the work that have been successful and one thing that could be improved.
    • Students identify (perhaps with ‘traffic lights’) those topics that they have sufficiently understood and those that need further effort.
    • Students generate and answer their own questions on topics to be covered by the test (summative assessment).
    • Peer-marking of tests (summative assessment), where the students have themselves been involved in developing the marking rubric or use the ‘three stars and a wish’ approach.

Examples of how technology has been used for formative assessment

  • e-learning and learning management systems.
  • The Internet.
  • Mobile technologies.
  • Blogs.
  • Classroom response systems.
  • Computer games.
  • Students self-monitor their understanding and progress in class topics, thus providing the teacher with large amounts of individual data that they can use to inform discussions about the student’s weaknesses and strengths and decisions about how best to support their learning needs.
  • Online diagnostic tests that, in addition to providing a score, automatically identify what, if any, misconceptions the student holds about the topic in question. Students can then be directed to remedial learning activities.
  • Automatic essay marking repurposed to provide students with almost instantaneous feedback on their writing.
  • Classroom response systems in which teachers identify patterns in individual responses, assess understanding and inform individualised teaching. These depend entirely on the quality of the questions and possible responses provided to the students.
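The self-monitoring strategy above – students flagging each topic green, amber or red so the teacher can see where to intervene – can be sketched in a few lines. This is a hypothetical illustration (the topics, responses, and threshold are invented), not a description of any real system:

```python
from collections import Counter

# Hypothetical student self-assessments: topic -> one traffic-light colour per student
responses = {
    "fractions": ["green", "green", "amber", "red", "green"],
    "decimals":  ["red", "amber", "red", "red", "amber"],
}

def topics_needing_attention(responses, threshold=0.5):
    """Flag topics where at least `threshold` of students report amber or red."""
    flagged = []
    for topic, colours in responses.items():
        counts = Counter(colours)
        struggling = (counts["amber"] + counts["red"]) / len(colours)
        if struggling >= threshold:
            flagged.append(topic)
    return flagged

print(topics_needing_attention(responses))  # → ['decimals']
```

A real classroom response system would collect these colours from handheld devices and chart them live, but the aggregation that turns individual self-assessments into an instructional decision is essentially this simple; the formative value comes from acting on the flagged topics, not from the data collection itself.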
