



This example shows, with a *, where the criteria match the APP, using strand 5.
Students have recently completed a scientific enquiry designed around some of these (8 of the
24 skills). This was an open-ended Science enquiry, requiring students to do a lot of thinking for themselves, or at least to make some decisions after being guided through the thinking. At the end of this task they self-assessed, with teacher discussion and guidance, to grade their work and thinking during the enquiry. Many were able to justify the levels they awarded themselves.
This is what the results tell me about:
1. How accurate are they at levelling their own work?
2. Is there any difference between ability groups in this?
The following is based upon a random sample of 38 students' work, producing 200 self-assessed levels across 8 of the APP-style criteria.
How the data works: each SAT level is divided into 3 parts (for example 5c, 5b, 5a, leading to 6c, 6b, 6a), so a score of 3 is one complete SAT level. All levels are compared against the end-of-year reported, teacher-derived SAT levels.
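As a rough illustration of the scoring described above, here is a minimal Python sketch. It assumes each grade is written as a level number followed by c, b or a, and that c is the lowest division; the function names `sublevel_to_score` and `gap` are my own, made up for this example.

```python
# Position of each division within a SAT level (c is lowest, a is highest).
SUBLEVEL_OFFSET = {"c": 0, "b": 1, "a": 2}


def sublevel_to_score(grade: str) -> int:
    """Turn a grade such as '5c' into a whole number of divisions."""
    grade = grade.strip().lower()
    level, letter = int(grade[:-1]), grade[-1]
    # Three divisions per level, so one full level is worth 3.
    return level * 3 + SUBLEVEL_OFFSET[letter]


def gap(self_assessed: str, teacher: str) -> int:
    """Positive means the student underestimated; negative means overestimated."""
    return sublevel_to_score(teacher) - sublevel_to_score(self_assessed)


print(gap("5c", "5a"))  # student said 5c, teacher reported 5a
print(gap("5a", "6b"))  # student said 5a, teacher reported 6b
print(gap("5c", "6c"))  # exactly one complete SAT level apart
```

With this convention a student who says 5c against a reported 5a is 2 divisions out, and 5c against 6c is a full level (3 divisions).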
Although students awarded themselves the correct SAT level (level 5, 6 etc.) only 26% of the time, overall they were fairly close to their reported SAT levels. Of the 200 levels derived by this method, 37% were the same as the teacher levels. On average students underestimated themselves by 2.02, meaning that they were around 2 divisions of a SAT level below their reported level; for example, they said 5c and the reported level was 5a. (This would be a score of 2, as would 5a against 6b.) The spread of the student self-assessed levels is 2.5, so students are on average within one level of their reported level, backing up the previous measure. So it is okay to trust the data produced by students.
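For anyone wanting to run the same kind of check on their own class, a sketch like the following could produce the summary figures. The pairs of grades here are invented purely for illustration and are not my students' data; the names `to_score` and `summarise` are hypothetical, and the grade format (level number plus c, b or a) is an assumption.

```python
from statistics import mean

# Position of each division within a SAT level (c is lowest, a is highest).
OFFSET = {"c": 0, "b": 1, "a": 2}


def to_score(grade):
    """Turn a grade such as '5c' into a whole number of divisions."""
    grade = grade.strip().lower()
    return int(grade[:-1]) * 3 + OFFSET[grade[-1]]


def summarise(pairs):
    """pairs is a list of (self_assessed, teacher) grades, e.g. ('5c', '5a')."""
    # Positive gap = student underestimated, negative = overestimated.
    gaps = [to_score(teacher) - to_score(own) for own, teacher in pairs]
    same_division = sum(g == 0 for g in gaps) / len(gaps)
    same_level = sum(
        own.strip()[:-1] == teacher.strip()[:-1] for own, teacher in pairs
    ) / len(pairs)
    return {
        "same_division": same_division,  # share matching exactly, e.g. 6b and 6b
        "same_level": same_level,        # share with the same whole level
        "mean_gap": mean(gaps),          # average divisions underestimated
    }


# Invented example pairs, NOT the real sample.
pairs = [("5c", "5a"), ("5a", "6b"), ("6b", "6b"), ("4c", "4a")]
summary = summarise(pairs)
print(summary)
```

Splitting the pairs by the teacher's reported level before calling `summarise` would give the same figures for each ability group.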
This underestimation is not a concern, and I am tempted to consider it of value, as it suggests that the students have thought about where they have rated themselves. Pleasingly, it also suggests the levels within the ladders are of some accuracy, contain real challenge for the students and are in some way a rigorous form of assessment. Consider a set of data that completely matched the teacher-assessed levels: would you trust it? I wouldn't. It would also be impossible to identify areas to work upon to improve.
Inevitably some of the skills identified will be more difficult than others, especially when the students have never previously been asked to think like this. So I think the data asks a lot of useful questions about how to go about developing these skills and about their genuine importance in the learning of Science. I am hopeful that these level ladders will help, as students have identified themselves across the board on each of the 8 skills in this sample.
A few interesting pointers have also come to light when looking at the consistency of the students across the ability range. Although the sample is slightly homogeneous, and therefore prone to skewing effects, there is some genuine food for thought.
Firstly, the level 5 students correctly identify their SAT level 40% of the time and the level 3 and 4 students 33% of the time, compared with only 21% for the level 6 students. (No difference was seen between the top-end and low-end level 6 students.) Why is this? Could it be due to the more able students being more reflective about their learning? Could it be down to these students understanding the criteria better? The data suggest yes: the students who most underestimate their grade are the level 6 students, by an average of 2.65, gladly still within one level. The level 5 students underestimate by around half a level (1.8), while the level 3 and 4 students overestimate their ability by a small amount (0.33), for example from 3a to 4c. Another possible explanation is that this form of assessment may actually be testing genuine student ability. It tests what a student can do, not what they can remember or have the ability to write down. I hope so. Oh dear, I'm beginning to defend the APP!
The big thing I'll take from this is that the first draft of my version of the Science APP is just that: a draft. Some of it will need rewriting to make it more accessible for lower-ability students and to make the differences between the levels more explicit. But it does seem to be close, and the evidence suggests it's a useful thing to have in the classroom.