Building a Better Student Evaluation

If you've been a student or faculty member at an American college or university in the past twenty years or so, you've almost certainly run across student course evaluation surveys. They're different in detail, but the key idea is always the same: toward the end of the term, students in every course are asked to fill out a questionnaire, usually a bubble sheet, assigning numerical values to various aspects of the course and the professor's teaching. Most schools also provide some option for free-form written comments as well.

These course surveys, particularly the numerical scores, figure highly in evaluations of faculty for things like merit pay, tenure, and promotion. And yet, almost everybody in academia agrees that they're highly flawed, easily gamed, and totally inadequate to the real task of evaluating faculty performance.

A passing mention of course evaluations in yesterday's links dump (see also Matt's front-line reporting) prompted some suggestions of alternative ways of doing course evaluations. So I thought I'd throw this out there more explicitly:

Suggest some practical ways of improving the standard student course evaluation process.

The constraints are that you have to provide some legitimate avenue for student feedback about the quality of the class, that the scheme has to be legal and ethical, and it has to be something that is not orders of magnitude more difficult or expensive than the current bubble-sheet systems (we do 20+ interviews of randomly selected students for tenure and promotion reviews, which is undoubtedly more accurate, but not remotely feasible for regular evaluations).

Suggestions can be minor tweaks (I personally favor the figure skating system, where you throw out the highest and lowest scores before calculating the average; other people do mid-term evaluations, and use them to adjust the course on the fly), or major overhauls (scrap the whole thing, and base promotion reviews on RateMyProfessors chili-pepper ratings). I'd love to hear some new ideas about how to make the process work better.
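For concreteness, the figure-skating tweak is just a trimmed mean. Here's a minimal sketch; the function name and the sample scores are purely illustrative, not taken from any school's actual system:

```python
def trimmed_mean(scores):
    """Average the scores after dropping the single highest and single lowest
    values, as in figure-skating judging. Falls back to a plain mean for
    samples too small to trim."""
    if len(scores) < 3:
        return sum(scores) / len(scores)
    trimmed = sorted(scores)[1:-1]  # drop one low and one high outlier
    return sum(trimmed) / len(trimmed)

# Example: one disgruntled 1 and one extra 5 no longer swing the average much.
print(trimmed_mean([1, 4, 4, 4, 5, 5, 5]))  # 4.4
```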


My department head says that the whole evaluation thing could be replaced with one question:

"Do you like this instructor?"

Other than that, it really depends on what the evaluations are for. If they are used for tenure decisions and stuff like that, not sure what can be done. I suggest just observing those faculty's classes.

The other use for evaluations is so that a faculty member can use the information to improve his/her courses. The form of most evaluations is not very good for this use. There are better mid-course methods for doing this kind of thing (more course specific questions with immediate feedback).

If you're using evaluations to improve your teaching, end-of-course evaluations are clearly too late, and I'd argue mid-term evaluations are too late too. You can do a casual feedback exercise in class (it only takes a couple of minutes) and get feedback every two or three weeks. If you respond to this feedback, you'll notice that your official feedback will improve a lot - not just because you improved, but because the students had other opportunities to vent their frustrations with the course.

A lot of the variance in ratings is known to be due to the variance in students' abilities and grades. If the class is hard for you, you'll rate the professor lower. It's probably practically impossible, but correlating ratings with students' aptitude, as measured before the class started, would be very useful. Heck, even just correlating the ratings with the answer to the question "what was your grade in the last class you took in this department?" would be very interesting.
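A rough sketch of the correlation this comment suggests, on invented numbers (the variable names and data are hypothetical; `statistics.correlation` requires Python 3.10+):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical per-student data: the rating given to the instructor and the
# student's grade (on a 4.0 scale) in their previous course in the department.
ratings      = [2, 3, 5, 4, 1, 5, 3]
prior_grades = [2.0, 2.7, 4.0, 3.3, 1.7, 3.7, 3.0]

# A strong positive r would suggest the ratings track student preparation
# at least as much as they track the instructor.
print(correlation(ratings, prior_grades))
```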

I think that there are two evaluations of teaching, not one. Student evaluation of the instructor and the course materials is, of course, important for what it is. But more important is the effectiveness of the evaluated instructor in getting the material over to the student. This more important evaluation cannot be self-evaluated by the recipient, the student. One solution is to assign to the evaluated instructor (for purposes of reappointment, promotion, tenure, constructive year-end criticism and such) the following statistic. Using the institution's grading computer system, follow each student of the evaluated instructor, particularly in core and major courses, into future courses which have the instructor's course(s) as a prerequisite. In each subsequent course A, calculate the numerical average of the term grades assigned by course A's instructor. For each student being followed, calculate the delta statistic equal to the followed student's term grade minus the average grade in course A. Average all the followed students' deltas in course A and assign that average to the evaluated instructor. The range of such average deltas (positive or negative) will give the appropriate dean or committee important quantities indicating the teaching effectiveness of the evaluated instructor.
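A minimal sketch of the delta statistic described above, with invented grades standing in for the registrar's data (the function and variable names are illustrative only):

```python
def instructor_delta(followed_grades, all_grades_in_A):
    """The proposed statistic: for each followed student, take their grade in
    the follow-on course A minus the class average in A, then average those
    deltas over all of the evaluated instructor's former students."""
    class_avg = sum(all_grades_in_A) / len(all_grades_in_A)
    deltas = [g - class_avg for g in followed_grades]
    return sum(deltas) / len(deltas)

# Hypothetical numbers: the evaluated instructor's former students averaged
# 0.3 grade points above the rest of course A.
former_students = [3.7, 3.3, 3.0, 3.7]
whole_class     = [3.7, 3.3, 3.0, 3.7, 2.7, 3.0, 2.3, 3.3]
print(round(instructor_delta(former_students, whole_class), 2))  # 0.3
```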

If memory serves, the forms that Union uses for the evaluations are quite terrible. They start off with the standard survey-type questions using a scale of 1-5 (labeled strongly disagree to strongly agree) and then ask questions that are poorly phrased, e.g. "This professor is very effective." and "How much has your knowledge of the subject improved?" Logically, these blow the usefulness of the scale and make it hard to get an accurate read of the student (e.g. "I strongly disagree that the professor is *very* effective, but I know that a score of 1 would be read as me thinking that they are completely *in*effective.")

I frequently skipped the survey questions and went straight for the short-answer on the back, but I'm not so sure that the answers I wrote would carry the same weight as the overall scores of the survey section.

A simple 3 question sheet would be more effective, I think:
1. Do you feel this professor teaches effectively?
2. What aspects of this professor's teaching would you encourage or discourage?
3. Any other thoughts?

This is an interesting question. We are going through a university-wide review of the system, so there is a lot to say; I might write something about that later. Let me just say that this is part of a larger issue, which is how to evaluate teaching. In this context one has to wonder what the student evaluations are for. Clearly the students are not in a position to answer some of the questions they are asked, which is one reason for the randomness. Student evaluations could make sense only as a (small) part of an evaluation of teaching.

If we distance ourselves from the evaluation part, and just concentrate on the feedback part, we can ask students which parts of the course worked for them, which parts need changing, and how. We now have online evaluation forms, and the course instructor is invited to provide course-specific questions. In addition we are testing a system of feedback forms to be handed to the students during the semester, so their feedback has some effect on the current course. I think any universal set of questions will almost always miss the point.

I'm re-submitting something that seems to have timed out on first try.

I programmed a 3-D graphics output (on hardcopy) of student evaluations in FORTRAN at the University of Massachusetts in 1973, which they continued to use for a very long time as legacy software.

All the bells and whistles are pointless if the department Chairman and Deans don't use them. I had the highest evaluation semester after semester in one private university class that I taught. When my contract was non-renewed after 5 semesters, I had a discussion with the Chair, with my advocate present (the then Vice-Chair of the Faculty Senate, now Chair). It seems that the Chairman had literally not known (as he admitted) that I'd been teaching this specific course, although he was vaguely aware of other courses that I'd been teaching.

He said it was out of his hands. It was up to the Dean of the College of Arts & Sciences. Said dean later appointed her spouse to teach the course.

"Those who cast the votes decide nothing. Those who count the votes decide everything." -- Communist Tyrant and mass murderer Josef Stalin (attributed).

Posted by: Jonathan Vos Post | January 14, 2009 12:37 PM

"And yet, almost everybody in academia agrees that they're highly flawed, easily gamed, and totally inadequate to the real task of evaluating faculty performance."

This conclusion is based on a huge number of scientific studies.

Let's think about this for a minute. If we know that something is useless, even counter-productive, then what's the rationale for continuing it?

Any department that relies on student evaluations for tenure and promotion decisions is doing a huge disservice to its faculty. As it turns out, there probably aren't many such departments in research universities.

I think student evaluations should be banned in all classes taught in the first two years of university. By the time they reach the third and fourth years they are in a position to make reasonable responses to the standard questions.

The only acceptable alternative is the one suggested in #1. That's to replace the existing questions with the only real question that's being answered: "Do you like your Professor?"

Harlan, you bring up a very good point. In fact, something similar to this was already being done at my school. There was a question that asked "What grade do you expect to get in this course?"

An experience I had that ties in to this and one of your other threads: when I took undergrad QM, the class (and, hence, the professor) was almost universally disliked by the students. The reason, as they would explain it: "The class is too mathematical; we never learn any physics." Those few of us who had a solid background in linear algebra and differential equations didn't see it that way, and liked the class just fine. But as for most of the class, they were woefully underprepared, spent the whole time scrambling to make sense of the mathematics, and concluded that the professor was awful at getting the physics across.

In short, I'm afraid that to some extent getting accurate student evaluations would also require overhauling the entire curriculum.

What are they for: Student evaluations came out of the '60s turmoil. Students felt they and their concerns were ignored by administration (quite true then). So they were given visible feedback on their instructors. The results are often available to students, to use in selecting/avoiding instructors.

Do they measure effective teaching? No.

Do they provide students with a sense of what other students experienced in this or that professor's classes? Yes.

The problem we have is that this sop to students' concerns has been perverted into a tool to measure teaching. The question, "How do we improve it?" is fundamentally misguided. If we want to measure teaching effectiveness (if it's even possible to do that), we need to start from scratch: first define what we mean by that and then work on ways to measure it. But we can't start with the student evaluation: "If I wanted to get there, I wouldn't start from here."

"As it turns out, there probably aren't many such departments in research universities." Even if these universities had a perfect way to evaluate teaching effectiveness, they wouldn't use it much in tenure/promotion decisions. Because teaching doesn't matter to them. So that's not evidence that teaching evaluations are useless (although there may well be some "scientific studies" supporting that; feel free to cite?).

To my mind, the most important thing that can be easily changed about course evaluations is to collect them at multiple points. It's much better for helping students feel listened to if you have that early data point.
(As far as format goes, I think Danny's proposal is pretty good, but I always liked the short-answer type questions much better than inane bubbles.)

@Bill Watson: so professors who recommend that their students take fewer classes at a time in future semesters will be rewarded?
Basically there are many techniques, ranging from legitimate study skills to true "gaming the system" instruction, that could affect how well students do in subsequent courses without having anything to do with subject-material teaching efficacy.
I get what you're saying, but any single measure of student achievement is just as fraught with problems as any single measure of teaching achievement. To say nothing of the many complicating variables which alter how the former is related to the latter. Not that it's bad to try...

I'm one of the students on a committee that provides student input for retention/tenure decisions for a department at a research university (posted anonymously for obvious reasons). The way it basically works here is that the student committee (there is one each for graduate and undergraduate students) provides input to the department during the decision process; if the students decide not to recommend, there is also a separate university committee that looks at why the students voted no (and they actually take this pretty seriously here), so compared to a lot of places my university is fairly liberal (although it's not exactly a liberal university by any standard).

This actually works pretty well; the people on the committee are mostly upper-level undergrads, so by then we have a good idea that everyone in gen-ed classes tends to complain. On the other hand, we look mostly for trends: significant, overwhelmingly positive or negative reviews with backing evidence, or numerical scores that are several standard deviations from the norm. Normally we end up saying that someone is a good teacher and make a few recommendations on points they might consider improving. Very rarely there is a serious issue that we write up. This seems to work pretty well, because it does a good job of filtering all the noise from the student comments.

Furthermore, if we have questions we can pretty easily go talk to people (whom we know, because we are students) who have been in the classes to ask follow-up questions that might not be so easy for faculty to ask students. We get training every year on how the tenure process works, so we are keenly aware of the consequences of what we are doing; everyone takes it incredibly seriously. So I'm not convinced that student evaluations have to be worthless (although I agree with the people who think these numerical questions are horrible).

By anonymous student (not verified) on 14 Jan 2009 #permalink
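The "several standard deviations from the norm" screen described in the comment above could look roughly like this sketch; the threshold and the scores here are invented:

```python
from statistics import mean, stdev

def flag_outliers(dept_scores, candidate_scores, threshold=2.0):
    """Flag a candidate's per-course scores that sit more than `threshold`
    standard deviations from the departmental norm, in either direction."""
    mu, sigma = mean(dept_scores), stdev(dept_scores)
    return [s for s in candidate_scores if abs(s - mu) > threshold * sigma]

# Hypothetical department norm centered near 4.0; only the 2.1 stands out.
print(flag_outliers([3.8, 4.0, 4.1, 3.9, 4.2, 3.7, 4.0], [4.1, 2.1, 3.9]))  # [2.1]
```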

Twenty years ago?

I was peripherally involved in discussions thirty years ago about revising a system that had been in place for at least a decade. I think it started during the campus strife of the late 60s.

Yes, it measures and mis-measures many things. There is a fine line between the students learning more because they like the instructor and engage in the class, and students liking the professor because they learn very little.

One thing that has proven eye opening is that there are quite a few students at my CC who actively complain about the second type. They are here for an education, often spending their own money, not to get free tickets to a football game. I was stunned when I overheard a discussion where someone was telling another student: Don't take Prof SoAndSo. You won't learn enough to do well in Next Class.

By CCPhysicist (not verified) on 14 Jan 2009 #permalink

I never finished university, but as a student I hated evaluations. Invariably half or more of the questions would be maddeningly vague, non-questions, or not make any sense at all. I'd fill up half the short answer sections complaining about just a few of the worst questions. Then I'd try to think up something that would be both useful to the prof and yet close enough to their biases they'd take it seriously. I had a hard time doing that without prior preparation, so sometimes my comments on the prof were not terribly useful.

I was also greatly bothered by the fact that as far as I knew, no one was testing the profs in any objective fashion (such as Bill Watson in #4 suggests). Nor, as far as I could determine, was there any effort to measure whether the evaluations themselves were of any use. (I didn't find studies on the usefulness of student evaluations until after I left college. When I did, it seemed they all agreed the bubble questions in student evaluations were of little or no value, but sometimes profs felt they learned something from the short-answer section.)

Abandon the bubble sheets!!! I don't think they provide useful information.

For improving the class, I think free form or short answer questions would be most useful. Ask about specific things that they found helpful in learning. You can in theory act on them.

I'd suggest peer observation for overall quality feedback. Perhaps across departments to avoid what are really curriculum debates. I think an experienced professor can tell when students are engaged and learning while observing a class. If everyone looks shell shocked it's a bad thing.