On Classroom Observations

As STEM education matures, the field will profit from tools that support teacher growth and rich instruction. A central design issue concerns domain specificity. Can generic classroom observation tools suffice, or will the field need tools tailored to STEM content and processes? If the latter, how much will the specifics matter? This article begins by proposing desiderata for frameworks and rubrics used for observations of classroom practice. It then addresses questions of domain specificity by focusing on the similarities, differences, and affordances of three observational frameworks widely used in mathematics classrooms: the Framework for Teaching (FfT), Mathematical Quality of Instruction (MQI), and Teaching for Robust Understanding (TRU). It describes how each framework assesses selected instances of mathematics instruction, documenting the ways in which the three frameworks agree and differ. Specifically, these widely used frameworks disagree on what counts as high-quality instruction: whether a framework valorizes orderly classrooms or the messiness that often accompanies inquiry, and which aspects of disciplinary thinking it credits, are consequential questions. This observation has significant implications for tool choice, given that these and other observation tools are widely used for professional development and for teacher evaluation.


Notes

To be sure, teacher educators and professional developers hold tacit theories of proficiency, which shape their emphases in teacher preparation and professional development. The question is the degree to which such ideas are explicit, grounded in the literature, and empirically assessed.

The authoring team consists of members of the TRU team. We have done our best to provide enough evidence to allow readers to come to their own judgments about possible issues of bias.

Matters of pedagogy and content are intertwined. For example, a “demonstrate and practice” form of pedagogy may inhibit certain kinds of inquiry that are highly valued in STEM. Thus a rubric that assigns high value to such pedagogy may downgrade classrooms in which there are somewhat unstructured exploratory investigations. The question is how much disciplinary scores matter in assigning the overall score to an episode of instruction, and whether a more fine-grained examination of disciplinary practices reveals things not reflected in a general rubric.

The authors will gladly send interested readers our analysis of Video B, which is written up in detail comparable to the write-ups for Videos A and C.

In accord with our permission to examine the videos from the MET database, we have done everything we can to honor the confidentiality of the research process and to remove possible identifiers of the individuals, cities, and schools involved.

The other video with comparably high FfT scores also fared less well on the MQI scale, so the video we chose for exposition is not an anomalous example.

These represent “2x + 3 = −5” (Fig. 1) and “−5 = 3x − 2” (Fig. 2), respectively, although the equations are not yet written; they will appear later in the lesson.

There are also dark rectangles representing (−x); these cancel out the light rectangles.
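A minimal sketch of the arithmetic these tiles model, assuming the standard algebra-tiles convention (a light rectangle stands for +x and a dark rectangle for −x; the note itself does not spell this out): pairing a light and a dark rectangle yields a zero pair, which is what licenses the tile-by-tile solution of the equation in Fig. 1:

$$
x + (-x) = 0, \qquad \text{so, for Fig. 1:} \quad 2x + 3 = -5 \;\Rightarrow\; 2x = -8 \;\Rightarrow\; x = -4.
$$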

We did catch some slips on the part of the teacher, though not enough, in our judgment, to warrant downgrading the lesson that severely. We note that of the 11 videos in our sample of “very high or very low” scores, 9 received MQI scores of 1 for errors and imprecisions, so the MQI scoring on that component of the rubric was very stringent.

This parallels the issue of assessing student understanding: a test of STEM content can focus on superficial features or on real sense-making. The same is the case for the assessment of classroom environments.


Acknowledgments

The authors gratefully acknowledge support for this work from The Algebra Teaching Study (NSF Grant DRL-0909815 to PI Alan Schoenfeld, U.C. Berkeley, and NSF Grant DRL-0909851 to PI Robert Floden, Michigan State University) and from The Mathematics Assessment Project (Bill and Melinda Gates Foundation Grant OPP53342 to PIs Alan Schoenfeld, U.C. Berkeley, and Hugh Burkhardt and Malcolm Swan, The University of Nottingham). They are grateful for the ongoing collaboration and support of members of the Algebra Teaching Study and Mathematics Assessment Project teams.

Author information

Authors and Affiliations

  1. Education, EMST, M.C. 1670, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720-1670, USA: Alan H. Schoenfeld, Fady El Chidiac, Dennis Gillingham, Heather Fink, Alyssa Sayavedra, Anna Weltman & Anna Zarkh
  2. College of Education, Erickson Hall, Michigan State University, East Lansing, MI 48824-1034, USA: Robert Floden
  3. School of Education & Social Policy, Walter Annenberg Hall, Northwestern University, 2120 Campus Drive, Evanston, IL 60208, USA: Sihua Hu