Summaries of Research Articles

These summaries were crowdsourced and compiled with the help of undergraduate students and research assistants. Please alert me to any errors or omissions.

Suggested Citation: Holman, Mirya, Ellen Key and Rebecca Kreitzer. 2019. “Evidence of Bias in Standard Evaluations of Teaching.” http://www.rebeccakreitzer.com/bias/

You can find a Google document with all summaries here.

You can submit an article or correction to the list here.

Academic Articles, Book Chapters, and Working Papers Finding Bias

In an experiment, 41 male and 46 female undergraduate students taking psychology courses at a regional comprehensive southern university read a lecture on pay disparities between men and women that was attributed to either a male or a female teacher. Students were then asked to rate the lecture. The male professor received a significantly higher overall rating, a higher rating for the accuracy of the lecture, and a lower rating for sexism. There was “no significant interaction between sex of student and sex of professor; male and female student ratings were similar for both professors.” More liberal student attitudes were associated with lower sexism ratings for the female professor. Students with more traditional views rated the female professor as more sexist than the male professor; there was no such difference among students with more liberal attitudes.

Students read a syllabus and then answered questions about the hypothetical professor and class. There were two different syllabi, one for each hypothetical course used, and they varied depending on the gender, ethnicity, and teaching style of the professor. Gender and ethnicity were indicated by gender pronouns and first and last names, while teaching style was conveyed through syllabus language signaling either a strict or a lenient approach. 594 students rated professors on characteristic dimensions such as warmth, professional competence, and difficulty. “Women professors were perceived to be warmer than men professors. Even though their syllabuses were identical, the simple presence of a gender identifiable name influenced students’ assumptions about the professor.” (469) “Latinas were viewed as more warm than Anglo women and Latino men. Latinas who teach composition were also viewed as more warm than Latinas who teach pre-calculus. These patterns suggest that students find Latina professors teaching the gender-consistent composition course as particularly warm compared to other combinations.” (469) “Women professors of composition (a female-stereotyped course) with lenient teaching styles were viewed as more competent than men who taught the same course with the same style. This pattern suggests that students use stereotype consistency to rate a professor’s competence.” (469) “Women’s and Latina/os’ student ratings were contingent on teaching style—they are rewarded or penalized for their style—while Anglo men’s ratings were not.” (470)

A sample of ethnically diverse undergraduates (622 students in Study 1; 545 students in Study 2) read a syllabus for a proposed Psychology of Human Sexuality course and responded to it. The manipulated sources of information in the syllabi were whether the course was taught by (a) a woman or a man; (b) a lesbian/gay man or a heterosexual; (c) a politically conservative or politically liberal professor; and (d) a professor whose syllabus contained typographical errors or no errors. Study 1 used a course syllabus paradigm to examine students’ perceptions of lesbian and gay professors teaching a course on human sexuality. Study 2 added students’ self-reported attitudes toward lesbians and gay men as another potential predictor of perceptions of professors; it used students’ responses to the Modern Homonegativity and Homonegativity scales to “categorize the students according to old-fashioned and modern views about homosexuality.” Study 1 found that lesbian and gay professors who teach a course on human sexuality were viewed as more biased than were heterosexual professors teaching the same course with the same syllabus. In Study 2, the authors find that “regardless of who taught the course, and regardless of the political perspective of the professor, homonegatives found the course to be problematic. Specifically, homonegatives (relative to non-homonegatives) tended to believe that the professors teaching the course were politically biased.” Regardless of political ideology, modern homonegatives viewed lesbian/gay professors as more politically biased than heterosexual professors with the same syllabus; modern homonegatives also viewed lesbian/gay professors as more biased than non-homonegatives did, but as less biased than old-fashioned homonegatives did.

The authors briefly review the literature on SETs, broadly synthesize major findings, and make recommendations for evaluation based on those findings. They draw three conclusions. First, measurement issues with SETs speak to the need for multiple means of evaluating teaching effectiveness; the authors suggest that heavy reliance on SETs should be avoided. Second, variation in teaching styles is a factor not often considered in evaluation, and the authors find that student-centered contact is often undervalued. Finally, students have gendered expectations of teaching styles that can affect how they evaluate faculty.

All students watched the same 35-minute audiotaped slide presentation. The slide was a picture of a computer-generated, genderless stick figure; the audiotape was read by a 45-year-old woman whose voice did not suggest her age or gender. Participants were 198 female and 154 male college students from a medium-sized western university (N = 352). They were given a 12-item evaluation form to rate the effectiveness of the professor on a 1-6 scale and to identify the age and gender of the professor speaking. Students rated the “‘young’ male professor description higher than they did the ‘young’ female, ‘old’ male, and ‘old’ female professors on speaking enthusiastically and using a meaningful voice tone during the class lecture regardless of the identical manner.” Participants rated the professors similarly on objective content information.

This article describes the influences of course evaluations on promotion, tenure, and merit decisions. It describes how faculty can combat biases to ensure that their teaching effectiveness is accurately portrayed. It reviews alternative evaluation methods, including portfolios, peer feedback sessions, and informal student surveys.

The authors used qualitative and quantitative teaching evaluation data from a female professor who taught a feminist course over three semesters while pregnant and carrying the child to term. They analyzed how student reactions to the professor shifted depending on her fulfillment of gendered expectations, and the impact on teaching evaluations. The student evaluations contained two parts: responses to 10 statements on a scale from 1 to 7 and voluntary written comments about the course and the teacher. The number of students varied by term: 99 students in the spring, 22 in the summer, and 124 in the fall filled out the questionnaire. Of the 99 in the spring, 52 wrote comments; of the 124 in the fall, 57 wrote comments; and of the 23 in the summer, 20 wrote comments. Gendered expectations of the professor were evident in the written evaluations. “When gendered expectations of their professor were met, students did not find fault with Dr. Baker for being pregnant or teaching from a feminist perspective. However, when students’ expectations for Dr. Baker as a woman were unmet, students’ written comments became negative and hostile and the content of their comments often touched on professor bias, male-bashing, and rudeness, which students attributed to the feminist course content and to Dr. Baker’s pregnancy” (page 40). The findings demonstrate that student expectations shift based on how professors present themselves and their situation, which presents challenges for anyone who is not a white, able-bodied man.

The sample consisted of 61 female and 47 male students at a small private liberal arts college in the northeast, predominantly White and, by design, almost entirely sophomores through seniors. The researchers used a 25-item questionnaire to evaluate 16 male and female professors who were matched on experience, tenure, and division of courses. Questions investigated scholarship, organization/clarity, instructor-group interaction, instructor-individual interaction, and dynamism/enthusiasm. The researchers also used a questionnaire consisting of two prompts: “Think of the best teacher you’ve had in college” and “describe what made him or her the best in your opinion.” They found a significant student gender by professor gender interaction on all five factors, such that male students gave female professors significantly poorer ratings than they gave male professors. The significant findings on the two interaction factors were attenuated when student-perceived ratings of their professors’ instrumental–active (stereotypically masculine) and expressive–nurturant (stereotypically feminine) personality traits were used as covariates. Both sets of traits were significantly and positively associated with student ratings, and the “best” professors typically combined both; that is, they were androgynous.

In this study, researchers analyzed student evaluations completed over a four-year period at a private liberal arts college for the effects of teacher gender, student gender, and divisional affiliation. Overall, the ratings of male professors appeared to be unaffected by student gender. By contrast, female professors tended to receive their highest ratings from female students and their lowest ratings from male students. The interaction between student gender and teacher gender generally remained when possible confounding factors (e.g., teacher rank) were partialed out. The mean ratings that female and male professors received also varied as a function of the divisional affiliation of the course.

553 male and 527 female college students evaluated 16 pairs of male and female professors (32 professors total). The evaluations asked students to answer 26 questions, rating the professors on various criteria using a 5-point scale (1 being well below average and 5 being well above average), and were analyzed with multivariate methods. The questions were equally divided across five factors: scholarship, organization, instructor-group interaction, instructor-individual student interaction, and enthusiasm. The first question asked about overall teaching quality. The evaluation was administered in the 5th or 6th week of a 14-week semester. Of the 1,080 students who participated, 48% majored in the social sciences, 19% in the natural sciences, 14% in engineering, 5% in the humanities, and 9% were undecided. 21% of the students were freshmen, 22% sophomores, 30% juniors, and 26% seniors. The professors’ fields of study varied, with 28% in the humanities, 47% in the social sciences, and 25% in the natural sciences; there was no professor from the engineering department, leaving those students without representation. The study found that both male and female students rated female professors less favorably than male professors, with male students rating female professors significantly lower. According to the results, scholarship, organization, and enthusiasm contributed the most to the discrimination by teacher sex. In other words, male students gave significantly different ratings to female professors on the basis of scholarship, organization, and clarity, while female students rated male and female professors more similarly.

This study analyzed 803 undergraduate student evaluations of 20 female and 23 male professors at a liberal arts college. Students rated their professors on 26 questions, each tied to one of five teaching factors meant to assess instructor effectiveness; professors rated themselves using the same questions plus 9 additional exploratory questions. The questionnaire sought specifically to identify the effects of professor gender, student gender, and divisional affiliation on student ratings of professors and on professor self-ratings. Based on the results, Basow and Montgomery concluded that professor gender and divisional affiliation (department/field of study) shaped student evaluations. Female professors were rated higher than male professors on two interpersonal factors and on scholarship, and natural science courses were rated lowest on most factors. Humanities professors received the highest overall ratings; however, male professors in the humanities received lower ratings than female professors. Professor self-ratings also varied with divisional affiliation. Overall, the researchers found no significant correlation between professor self-ratings and student ratings.

In this study, 253 students in non-science introductory courses at a liberal arts college completed a course evaluation questionnaire. Women instructors were perceived as warmer and more potent individuals. This largely accounted for their higher formal student ratings in specific areas of teaching performance. However, students required women instructors to offer greater interpersonal support and judged them more closely than male instructors in providing it.

Data from a randomized, controlled, blind experiment in the United States were combined with data from a natural experiment at a French university. The French natural experiment analyzed student evaluations from 2008 to 2013; students in different sections of the same course, each taught by a different professor, took the same final exam. Data were collected from 4,423 students in 1,177 sections taught by 379 instructors. The American experiment collected data from four online sections of the same class: one taught by a male professor, one by a male instructor presenting as female, one by a female professor, and one by a female instructor presenting as male. Data came from 43 of the 47 student evaluations. The combined data show that student evaluations of teaching (SETs) are influenced by the gender of the instructor: female instructors received lower scores than male instructors. In the US study, instructors who were female or thought to be female were given lower scores, and students reported “female” instructors to be less prompt even though assignments in all sections were released simultaneously. The French data show that bias in SETs varies with the gender of the student as well as other factors, and that it is impossible to adjust for these potential biases. The evaluations studied were more indicative of gender bias than of teaching effectiveness; gender bias can skew SETs so severely that more effective teachers receive worse evaluations.

This study collected student evaluations of teaching (SETs) to test for gender biases in students’ evaluations of professors and their consequences for teachers’ incentives. The study analyzed 22,665 SET observations from 4,423 students evaluating 372 undergraduate instructors at a French university. According to the data, male students were more likely to favor male instructors: a male student is 30% more likely to give an excellent overall satisfaction rating to a male instructor than to a female instructor. There was also an evident influence of gender stereotypes on the evaluation of teaching dimensions: students valued organization and class preparation in female instructors and class leadership in male instructors. However, students learned roughly equally well from male and female instructors. From these data, the authors concluded that SETs do not necessarily capture improvements in teaching.

Data collected from a French university were analyzed to identify potential gender bias in student evaluations of teaching (SETs). The data were compiled from a database of 20,197 SET scores collected over five years, along with student, professor, and course characteristics. 11,522 of these evaluations were completed by female students and 8,675 by male students; the evaluations covered 359 different professors, 33% identifying as female and 67% as male. The study found that male students give significantly higher scores to male professors on overall satisfaction and on teaching dimensions such as the professor’s perceived knowledge. Female students also give higher scores to male professors on the ability to lead the class, to relate the material to current issues, and to contribute to students’ intellectual development. Female students give higher scores to female professors on the dimensions of teaching related to course content and assignments. The results suggest that gender stereotypes may drive students’ evaluations of professors.

36 instructors in the College of Social Sciences at the University of Houston each selected one undergraduate class to participate in this study. 20 students were randomly selected from each class, for a total sample of 497 students. The study analyzed questionnaire data to assess the potential relationships among gender role orientation, teacher sex, student sex, and student evaluations of course progress and satisfaction with instructors. The questions were drawn from two rating instruments: the Bem Sex Role Inventory (BSRI) and the Instructional Development and Effectiveness Assessment (IDEA) system. The BSRI gave each instructor an androgyny score, classifying them as masculine, androgynous, or feminine, while the IDEA provided the student evaluations. The study found that androgynous teachers received somewhat higher student evaluations, and that female students reported higher rates of progress in classes taught by female professors than in classes taught by male professors.

This study examines the role gender (sex) plays in dominance behaviors exhibited in the classroom, both among students and in male and female professors’ classrooms. “Dominance” was taken to mean both assertiveness (measured by frequency and duration of speech) and aggression (measured by interruptions). These behaviors were recorded from students (n = 294; 72 male, 222 female) in six separate instructors’ (3 male, 3 female) graduate social work courses. In the male professors’ courses, no gender-based differences were found in assertiveness; however, in female professors’ courses men spoke both more often and at greater length than their female counterparts. Male students were more aggressive than female students in both male and female professors’ classrooms, and they interrupted female professors slightly more than they did male professors. Similarly, male students were more likely, in both male- and female-led classrooms, to interrupt other students, with a higher frequency of interruptions in female professors’ courses. Interestingly, while male students were more likely to interrupt female students than female students were to interrupt male students, a strong majority of interruptions by each gender (about 90% for both) were of the opposite gender.

An experiment was conducted at a Midwestern state university. The sample was composed of 42 undergraduate seniors, mostly Caucasian females; 10 of the 42 students were male and 3 were Black (all female). The students were shown one of four photographs: an attractive or unattractive teacher who was either male or female. Each photograph was accompanied by a description of the teacher’s teaching style/philosophy, conveying either an authoritarian or a humanistic style. The results showed that attractiveness did not affect ratings of instructor effectiveness, while authoritarianism was strongly associated with negative evaluations. However, contrary to this pattern, attractive authoritarian female instructors were rated significantly more positively than the other three combinations of authoritarian instructor characteristics.

78 student evaluations were analyzed to test the hypothesis of bias toward masculine personality traits. Participants (35% male, majority white) were asked to rate the desirability of 52 trait adjectives and the importance of 25 behaviors for three hypothetical professors: “Sam,” “Sarah,” or “Dr.” Lawson. The results indicated that male students were mostly responsible for gender distinctions in evaluations, and that 13 of the 20 highly desirable traits were considered traditionally masculine. In an analysis of discrimination, “Sam Lawson” was rated comparably to “Dr. Lawson,” suggesting that the label of professor or doctor is associated with a male schema. The characteristics “self-confident,” “stable,” and “steady” were rated as more important for “Sarah Lawson” than for the other identities.

This study was designed to identify gendered expectations, holding occupation constant, based on evaluations of instructors’ performance in the classroom. Undergraduate students (N= 198) evaluated male and female instructors from the same disciplines, using the same questionnaire. Questions were selected based on gendered expectations. The researchers found that male and female instructors are evaluated differently on aspects of presentation and classroom structure.

In 2011, Michigan adopted high-stakes teaching evaluations, which required sanctions for low-performing teachers. Using evaluations for 97,446 teachers, the authors find that Latinx and African American teachers were two and three times more likely, respectively, to receive a low rating than White teachers. Male teachers were 50% more likely to receive a low rating than female teachers. Low effectiveness ratings were also associated with working in a charter, high-poverty, or high-minority school.

Using both a survey of professors (n = 88) and an experiment asking students about special favor requests (n = 121), the researchers found that academically gifted students held stronger expectations that female (versus male) professors would grant their special favor requests, and had stronger negative emotional and behavioral responses when those requests were not granted.

This study used a matched-pairs sampling method to analyze student evaluation surveys from Southern Illinois University at Carbondale for 1,474 courses in 1971. 1,607 student evaluations were used to compile data from the Instructional Improvement Questionnaire, which included questions on instructor evaluation, course evaluation, strengths and weaknesses, research data, and other optional items. According to the results, there were few significant differences between male and female faculty. There was also no evident relationship between faculty sex and student sex, providing no evidence of gender bias in evaluations.

This study used 22 pairs of courses evaluated by 838 college students to examine the potential influence of teacher sex, student sex, and teacher warmth on student evaluations of teachers, as well as on teachers’ evaluations of themselves. The evaluations were compiled through a questionnaire assessing instructor performance via a three-factor analysis. The data showed that students gave higher ratings to teachers who were perceived as warmer and as primarily interested in students. When female professors evaluated themselves lower on either factor, they received higher effectiveness ratings from students; by contrast, when male professors rated themselves high on either factor, they received higher effectiveness ratings.

This study used an “attributional ambiguity paradigm” to compare student reactions to lectures when students did not know the sexual orientation of the speaker and when the speaker was identified as gay or lesbian. When the lecture delivered was “weak,” participants actually rated the gay and lesbian lecturers more highly, which the authors suggest may be an effort to avoid discrimination. However, when the lecture was “strong,” participants rated known gay and lesbian lecturers more negatively than the lecturers whose sexual orientations were not known. According to the authors, these findings suggest how biases and prejudices based on sexual orientation may affect perceptions of “merit, skill, or likeability,” which often weigh heavily in hiring decisions.

The researchers used course satisfaction data from student evaluations from 2010 to 2016 at a national Australian university. There is statistically significant evidence of gender and cultural bias from students, which tends to lower the SET scores of women and of teachers from non-English-speaking backgrounds; women from non-English-speaking backgrounds were the most affected. Female instructors from English-speaking backgrounds received harsher feedback when teaching in the sciences than in English. In Arts and Social Sciences there were no significant biases against women from English-speaking backgrounds, but both men and women from non-English-speaking backgrounds were more likely to receive harsh feedback. In Engineering and Medicine, the significant discrepancy in harsh feedback was directed at female instructors from non-English-speaking backgrounds. In every department except Engineering, male instructors were more likely than female instructors to receive the highest possible score of 6. Male students were more willing to give the highest score to male instructors than to female instructors, particularly in the sciences.

1,281 students at the University of Illinois at Urbana were sent questionnaires to test the hypothesis that women were prejudiced against women instructors/professionals. The questionnaires were composed of three questions, asking about the number of male and female professors the students had, their hypothetical preference for a male or female professor in different classroom settings, and whether they were more willing to agree with an opinion depending on whether a man or a woman stated it. The results showed that men received higher ratings from students in the sciences, social sciences, and agriculture, and women received higher ratings from students in home economics. Male students were also more likely to prefer male professors over female professors. The study further found that men were more likely to agree with an opinion stated by a man, while women were equally likely to agree with an opinion stated by a man or a woman. From the results, the researchers concluded that contact with sophisticated women helps to break stereotypes among both men and women.

In the first experiment, researchers examined the effects of course type, student gender, and instructor gender and gender role on student evaluations of instructor effectiveness. In the second experiment, researchers explored students’ perceptions of the importance of various gender role characteristics in instructors of different course types. Results suggested that instructor gender role is more important than instructor gender in affecting student evaluations. Female and male students preferred instructors (science instructors, in particular) who had both feminine and masculine characteristics, regardless of the instructor gender.

The researchers analyzed 19,952 student evaluations from a university in the Netherlands. Students at the university were randomly assigned to female or male professors, making the conditions ideal because gender bias in the selection process could be ruled out. 35% of the instructors and 38% of the students in the study were female. The researchers found that women receive lower ratings in teaching evaluations than men. Specifically, they found that “male students evaluate female instructors 20.7% of a standard deviation worse than male instructors,” while female students rate female professors “25.8% of a standard deviation higher than male professors.” Overall, women received lower teaching evaluations than men even though instructor gender had no clear effect on students’ grades or effort in the course.

Data were collected from 205 academic courses over three academic semesters at the University of Washington. The researchers analyzed various aspects of student evaluations, such as evaluation ratings, expected course grades, and course workloads. According to the results, instructors’ grading leniency influences student ratings; in other words, students gave higher ratings to courses in which they received higher grades and had lighter workloads.

Three studies were analyzed to determine whether women were penalized for success at typically male gender-typed tasks. All three studies used questionnaires that manipulated the sex of the individual in question and the information provided about the individual in the scenario. The first study….. The studies found that women who are successful in male gender-typed domains are penalized for their success: they are perceived more negatively, are disliked more than men described identically, and are seen as less desirable bosses.

The authors of this publication reviewed and referenced data from other studies, primarily student evaluations, to identify the potential for race and gender bias in higher education. The data show that female faculty, in general, are not rated significantly lower than male faculty. However, there are two exceptions in which female instructors are systematically rated lower than male instructors: large introductory courses and male-dominated disciplines (science, economics, engineering).

100 first-year and senior graduate students (50 male and 50 female) at San Jose State University were asked to evaluate the teaching methods of three male and three female professors using a scaling method. The professors were evaluated teaching in two “male,” two “female,” and two non-sex-linked academic fields. The mean ratings showed that male and female instructors were rated equally by female students, but male students rated male professors higher than female professors. With regard to perceived “power,” both male and female students rated male professors higher; however, students also said they would prefer to take the female professors’ courses over the male professors’. The researchers note that the lower power rating for female professors may not reflect student bias and may instead be attributable to the gender dynamics of the respective academic fields. They concluded that variation in academic field had little to no effect on the ratings of the professors, male or female.

Two experiments attempted to identify the impact of social contact, facial expression, and instructor sex on student ratings of instruction (SRIs). The first experiment was a survey that asked participants to read about and rate a hypothetical professor (male or female) who either did or did not have social contact with students outside of class. The second experiment asked students to watch videos of male or female professors lecturing, either smiling or not smiling, and then rate the instructor’s performance. The first experiment showed that male instructors were rated higher than female instructors and that female instructors who had social contact with students were rated higher (social contact did not influence male instructors’ ratings). In the second experiment, more students said they would take a course with the male instructors, and the male instructors were described with adjectives emphasizing intelligence, whereas female professors received fewer intelligence-related adjectives. The unsmiling man was rated slightly higher than the smiling woman, but the smiling woman was rated significantly higher than the unsmiling woman.

Some scholars have found gender to have no (or very little) influence on evaluations of teaching, whereas other scholars have found gender to affect evaluations significantly. The authors draw on insights from sociological scholarship on gender and evaluation and argue that this apparent inconsistency is itself an artifact of the way that quantitative measures can mask underlying gender bias. The authors offer concrete strategies for faculty, researchers, and administrators to improve the efficacy of the system of evaluation.

60 male and 60 female undergraduate students were asked to rate professors who varied in perceived attractiveness. The students were also provided with a short description that was identical for each hypothetical professor except for the name. The students then rated each professor on an 11-point scale for specific positive and negative personality traits. The study found that students rated the attractive professors higher than the unattractive professors, regardless of sex. However, the researchers also noted that, on the characteristics rated, almost all male instructors were rated as more competent than their female counterparts, regardless of attractiveness.

72 students in an online introductory course were asked to complete online evaluations of their instructors’ performance. Instructors were rated on a 5-point scale across 15 questions covering factors such as overall quality, effectiveness, interpersonal traits, and communication skills. The study found that students rated the male instructor identity significantly higher than the female identity, regardless of the instructor’s actual gender, which was unknown to students. The bias in this experiment is therefore not a result of instructors’ gendered behavior but of student bias. The online environment allowed the instructors’ actual genders to be concealed, so the biases could be effectively isolated.

This paper uses data from 24 consecutive semesters over nine years, focusing on economics courses taught at a large university. Data were collected from 618 introductory courses taught by 60 different professors and 379 upper-level courses taught by 22 instructors. Student evaluations from these courses were analyzed to identify factors that could affect evaluations of instructors. The study found that instructors who inflated grade expectations could “buy” better evaluation scores, and that class size and instructor experience play a role in evaluation ratings for introductory courses but not for upper-level courses. According to the data, male instructors received higher ratings than female instructors, and younger instructors were more popular than older faculty members.

The authors of this study referenced and reviewed data from other studies, specifically publicly available student evaluation data from a political science department at a large public university, to examine the relationship between gender and leadership assessments. According to the data, there is a relationship between instructor gender and student assumptions about leadership roles. In particular, traditionally male-oriented leadership stereotypes and roles heavily influence evaluations of large lecture courses. Interestingly, perceptions of leadership roles did not strongly influence student evaluations of smaller courses.

Students were randomly assigned to a male or female professor at a university in the Netherlands and 19,952 student evaluations were analyzed to identify potential gender bias in evaluating 735 different instructors. The researchers found that female faculty received systematically lower evaluations even though the students’ grades were relatively the same with male and female professors. Male students evaluated female instructors more negatively, 21% of a standard deviation worse than male professors. Female students rated female instructors 8% of a standard deviation lower than male instructors. Both male and female students rated female instructors lower than their male counterparts.

This research study collected data from over 30,000 student evaluations of 225 professors over six semesters to measure the effects of factors beyond the professors’ control. The researchers tested six hypotheses: (1) SETs in elective classes will be higher than in required classes; (2) SETs in large, medium, and small classes will be statistically different; (3) SETs for male and female professors will be unequal; (4) SETs and expected course grades will be positively correlated; (5) SETs in large required classes will be lower than those in large elective classes; and (6) SETs in medium and large sections of required classes will be statistically equal for male and female professors. All hypotheses were supported except the third: SET scores for male and female professors were roughly equal overall. However, female instructors teaching larger classes received substantially lower scores than their male colleagues.

300 student volunteers in five sociology classes completed a self-administered survey. The survey listed every professor or instructor who actively taught courses in the department, with two fictitious names added; one of the “fake” instructors was male and the other was female. The students were asked to indicate how familiar they were with each instructor on the survey and the highest degree they believed the professors they were most familiar with held. According to the data, students believed that, on average, the male faculty held higher degrees than the female faculty. In reality, all of the instructors on the survey held Ph.D.s.

This study analyzed data from student evaluations, specifically the evaluation comments and ordinal scores given to instructors. The data were collected from student evaluations of online courses at different universities as well as from RateMyProfessors ratings. According to the results, the language used to evaluate male and female professors differs significantly. Female instructors were evaluated differently on intelligence, perceived competence, and personality; women were evaluated more heavily on personality and appearance and were more likely to be labeled a “teacher” rather than a “professor.” The data also showed that male professors received higher ordinal scores even though the courses taught by male and female professors were administered identically. Overall, the data suggest that evaluations differ substantially between male and female instructors, making the use of evaluations in employment decisions discriminatory.

This study examined student resistance, or students’ unwillingness to accept or think about ideas that contradict their sense of social order or values. In this case, Moore (an instructor at a university) measured differences in student feedback and facial expressions when Moore taught the class compared with when a male colleague taught it. The male instructor was told to state that men in families were the “winners” and women in families were the “losers.” Students found the statements much more acceptable coming from the male instructor than when Moore repeated them. One student said the male instructor was more qualified to make such a statement because he was identified as having a Ph.D. However, Moore also had a Ph.D., indicating that the students were displaying a form of resistance by being more accepting of a statement made by a man than of the same statement made by a woman.

The authors conducted an experiment in four large introductory courses in spring 2018: two introduction to biology courses and two introduction to American politics courses. Within each pair of courses, one section was taught by a white female instructor and one by a white male instructor. Students were randomized into one of two conditions. In the control condition, students received the standard electronic SET survey for their department; in the treatment condition, the solicitation and the evaluation instrument used language that informed students about biases. For each of the evaluation questions, scores for the female faculty were significantly higher in the treatment condition than in the control. The authors conclude that a simple intervention informing students of the potential for gender bias can have significant effects on the evaluation of female instructors, with effects as large as half a point on a five-point scale.

This study by Chavella T. Pittman focuses on the classroom experiences of women of color faculty. Through interviews, Pittman found that women of color faculty experience gendered racism in the classroom and that their authority is challenged, their competency is questioned, and their scholarly expertise is disrespected “almost exclusively” by white male students. According to Pittman, the findings “are consistent across black, Asian American, and Latina faculty.” Similarly unequal power dynamics are reported among women of color faculty in the humanities, social sciences, and natural sciences, and among faculty from all ranks, junior to senior. In the conclusion, Pittman notes that “until additional knowledge and strategies are developed to eliminate race and gender oppression in the classroom, universities must take action to acknowledge it and protect women faculty of color from the ramifications of this difficult environment.”

Data were collected from student evaluations filled out by 30 male and 28 female students, with 17 males and 15 females evaluating a female professor and 13 males and 13 females evaluating a male professor. The students stated that they did not intentionally choose professors of a particular gender. Female students were less likely than male students to rate professors highly on teaching style and ability. Likeability did not influence the evaluations. The study concluded that any evidence of gender bias was unintentional.

This study used a natural experiment at a large North American university before and after student evaluations changed from a 10-point to a 6-point scale. The authors use instructor-course fixed effects to control for course quality, as well as a survey experiment that varied instructor gender and the number of points on the evaluation scale. The natural experiment “include[s] 105,034 student ratings of 369 instructors in 235 courses (and 625 instructor-course combinations)” (page 9); the survey experiment included 400 students. In the natural experiment, the gender gap in evaluations was smaller under the 6-point scale than under the 10-point scale, controlling for individual instructor, and the effect was most pronounced in male-dominated fields. The survey experiment replicates the natural experiment: respondents were more willing to give the perceived female instructor the top rating on a 6-point scale than on a 10-point scale. Open-ended and Likert questions yield some insight into the mechanisms. “First, even when raters received identical evidence of teaching performance, there was a gender gap in evaluations under the 10-point rating scale. Yet, that gap virtually disappeared when the raters used the 6-point scale. Second, raters saw the female instructor as less brilliant than her otherwise identical male counterpart, but their relatively lower expectations of brilliance for a 6/6 rating meant she was more likely to receive the highest rating on the 6-point than on the 10-point scale.” (page 20)
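
A minimal sketch of the kind of fixed-effects comparison described above; this is not the authors' code, and the file and column names (ratings.csv, rating, female, six_point_scale, instructor_course_id) are hypothetical placeholders.

```python
# Sketch: does the female-male gap in ratings shrink under the 6-point scale,
# holding each instructor-course combination fixed?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ratings.csv")  # hypothetical individual-level student ratings

# Standardize ratings within each scale regime so 10-point and 6-point scores
# are comparable (mean 0, SD 1 within regime).
df["rating_std"] = df.groupby("six_point_scale")["rating"].transform(
    lambda x: (x - x.mean()) / x.std()
)

# The female main effect is absorbed by the instructor-course fixed effects;
# the coefficient of interest is the female x six_point_scale interaction.
model = smf.ols(
    "rating_std ~ six_point_scale + female:six_point_scale + C(instructor_course_id)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["instructor_course_id"]})

print(model.params["female:six_point_scale"])  # change in the gender gap under 6 points
```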

Data were collected from RateMyProfessors to examine correlations among rating dimensions such as overall instructor quality, gender, discipline, physical attractiveness, and easiness. 7,882,80 ratings were evaluated; only professors with over 20 ratings were included in the data set. The researchers found statistically significant, strong positive correlations among clarity, helpfulness, overall quality, and easiness. Gender had a statistically significant impact, at least at the department level.
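
A brief sketch of the kind of correlation analysis described; not the authors' code, and the file and column names (rmp_ratings.csv, professor_id, clarity, helpfulness, overall_quality, easiness) are hypothetical.

```python
# Sketch: average ratings per professor, keep professors with more than 20
# ratings, and correlate the rating dimensions.
import pandas as pd

ratings = pd.read_csv("rmp_ratings.csv")
dims = ["clarity", "helpfulness", "overall_quality", "easiness"]

per_prof = ratings.groupby("professor_id")[dims].agg(["mean", "count"])
# Keep professors with more than 20 ratings.
per_prof = per_prof[per_prof[("overall_quality", "count")] > 20]

# Pairwise Pearson correlations among the professor-level mean ratings.
means = per_prof.xs("mean", axis=1, level=1)
print(means.corr())
```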

In this study, researchers examined the effects of students’ and professors’ sex on student evaluations of professors’ teaching effectiveness. Researchers analyzed the ratings of over 400 faculty made by over 9,000 students and controlled for many variables. Results showed that (a) students gave male faculty significantly higher evaluations on global teacher effectiveness and academic competence than they gave female faculty; (b) female faculty were not rated as more sensitive to student needs than male faculty; and (c) students seem to place more weight on academic competence for male faculty than for female faculty when making overall global judgments of faculty performance.

This study asked students at a university to fill out a teaching evaluation the week after final course grades were released. The evaluations used an 11-point scale and were designed to test for motivated stereotyping. 580 course evaluations of 318 instructors were collected, 334 from students who received high grades and 236 from students who received low grades (142 were disqualified for insufficient information). The research found that male and female instructors typically gave students the same grades, and both male and female professors received higher ratings from students to whom they gave higher grades. However, female instructors received significantly lower ratings from students who received low grades; the evaluations of female instructors were more heavily influenced by the grades they gave than were the evaluations of male instructors. Students were also more upset about a low grade when they received it from a female instructor.

The researchers analyzed data from 13,702 student evaluations compiled over three years. The evaluations contained 36 questions measured on a scale from 1 to 5. The university at which the study was conducted was predominantly white, with 87% of the student population identified as Caucasian and only 6.5% as African American. The purpose of the study was to identify whether the race of the student affected evaluations of Black faculty members. According to the results, faculty in general had above-average scores on most items; however, Black faculty received the lowest mean scores on the multidimensional questions and on “global” items (e.g., overall value of the course). The researchers suggested that evaluations should focus on multidimensional questions rather than global items to obtain a more accurate evaluation of each faculty member.

This article reviews recent literature on student evaluation of teaching (SET) in higher education. It is based on the SET meta-validation model, drawing upon research reports published in peer-reviewed journals since 2000.

The researchers conducted a qualitative analysis of the words students used in evaluations to describe their best- and worst-rated professors. 288 college students responded to an optional survey that asked behavioral questions about their professors, including questions about students’ opinions on social issues, confidence in social institutions, satisfaction, and other items. Students were also asked to write down four adjectives describing the best teacher they had had, and to repeat the same for the worst. Of the 288 students, 198 were women and 90 were men, and the majority of respondents were white. 135 students said their “best teacher” was a man and 153 said their “best teacher” was a woman. Male instructors were seen as performers and described as “funny,” while female instructors were seen as “caring and nurturing.” The worst male instructors were described as “boring and self-centered,” while the worst female instructors were “rigid, mean, and unfair.” The differences in descriptive words for male and female instructors suggest that students hold teachers to gendered expectations, and that these burdens fall more heavily on women.

The researcher reviewed various pieces of literature to compile the data for the study. The goal was to use statistical information from course evaluations to show that student evaluations are not accurate measures of teaching effectiveness. The publication emphasizes that, given the nature of course evaluations, universities should not rely on student evaluations when making decisions about promotion and tenure. According to Stark, using averages of evaluations to make employment decisions is not an accurate way to measure instructor effectiveness.

The publication analyzes various aspects of teaching in universities and how students respond to differences in teaching methods. Data were collected from classroom observations of professors and from student evaluations and feedback. The authors found that the professor’s gender shapes how critical students are of the instructor, because gender is socially constructed and shapes social expectations.

The researchers created a “brilliance language score” by standardizing the frequencies of words such as “brilliant” and “genius” for instructors within each discipline, yielding a numerical representation of how often these words were used to describe female and male instructors. Word-usage data were collected from RateMyProfessors.com. The results showed that male instructors were more likely to be described as “brilliant” or “genius” than female faculty, whereas there was no large discrepancy in the use of “excellent” and “amazing.” There was also a strong positive correlation between the use of “brilliant” and “genius” to describe instructors and the rigor of their courses. Overall, the researchers found that women received less praise for their intellectual ability and intelligence.
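
A minimal sketch, under stated assumptions, of the standardization step described above; this is not the authors' code, and the file and column names (reviews.csv, review_text, discipline, instructor_id, instructor_gender) are hypothetical.

```python
# Sketch: compute an ability-word rate per review, aggregate to instructors,
# z-score within discipline, and compare the score by instructor gender.
import re
import pandas as pd

ABILITY_WORDS = {"brilliant", "genius"}  # illustrative word list

reviews = pd.read_csv("reviews.csv")  # hypothetical per-review data

def ability_rate(text: str) -> float:
    """Share of tokens in a review that are ability words."""
    tokens = re.findall(r"[a-z']+", str(text).lower())
    if not tokens:
        return 0.0
    return sum(t in ABILITY_WORDS for t in tokens) / len(tokens)

reviews["ability_rate"] = reviews["review_text"].map(ability_rate)

# Aggregate to the instructor level, then z-score within each discipline so
# fields with different baseline language use are comparable.
per_instructor = (
    reviews.groupby(["discipline", "instructor_id", "instructor_gender"])["ability_rate"]
    .mean()
    .reset_index()
)
per_instructor["brilliance_score"] = per_instructor.groupby("discipline")["ability_rate"].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Compare mean standardized scores by instructor gender.
print(per_instructor.groupby("instructor_gender")["brilliance_score"].mean())
```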

The researchers analyzed data from the International Institute of Social Studies of Erasmus University in the Netherlands. The data were compiled from 688 teaching evaluations collected from 2010 to 2015. According to the results, female instructors were 11 percentage points less likely than male instructors to reach the threshold required for promotion. The researchers attempted to reduce potential bias by controlling for variables such as leadership and teaching experience; female instructors were still rated significantly lower than males. The data support the idea that students rate women more negatively in evaluations, which disadvantages female faculty seeking promotion.

Data were collected from 765 undergraduate and graduate students at a university in the U.S. The students were asked to rate instructors using a 25-question survey covering three factors (interpersonal characteristics, pedagogical characteristics, and course content characteristics) on a scale from 1 to 9. The data showed a significant interaction between student gender and professor gender for ratings of teaching characteristics and course content, but not for interpersonal characteristics: female students rated female instructors higher, and male students rated male instructors higher, on those dimensions. This suggests that gender bias plays a role in student evaluations of effective teaching and course content.

Academic Articles, Chapters and Working Papers Finding Bias Favoring Women

486 undergraduate students (out of 3,500) at a mid-size private university in California responded to a survey with closed-ended questions about perceptions of male and female faculty characteristics/practices and one open-ended question asking them to write about differences they had experienced between the two. “Female students tend to evaluate female faculty very highly across both the more traditionally feminine dimensions of caring-expressiveness and an interactive teaching style, as well as the more masculine dimensions of professionalism-challenge and organization. They also tend to evaluate male faculty considerably lower on these same dimensions. In contrast, the male students evaluated male and female faculty more equally across the five measures of teaching style.” (p. 201) “The most immediate surprise when reviewing the open-ended responses was that the majority of students elected to write (either in positive or negative terms) about their female professors. Despite the wording of the question, asking students for observations about male or female instructors, their responses were most often structured so as to point out how female faculty differed from male faculty. Their comments on how female faculty depart from male faculty (for better or worse) appear to reinforce this understanding of the female professor’s anomalous or outsider status.” (p. 203)

This study investigated gender differences in student evaluation of teaching through two analyses, based upon data from 741 college classes. In the first analysis, researchers compared female and male student ratings in the same classes for female and male instructors. In the second analysis, researchers examined how student ratings differed for male and female instructors. Female students gave higher ratings to female instructors on three of eight scales for all disciplines combined. Male students gave higher ratings to male instructors on only one scale. Male and female students showed no difference in their rankings of male teachers. For the total sample of classes, when students gave more favorable ratings, such ratings were largely given by female students to female instructors.

This study evaluated three different aspects: teacher immediacy, the instructor overall, and course evaluations. Data were collected from student evaluations from 197 undergraduate students. The data supported a positive correlation between students’ evaluations of instructors and the overall course evaluations. There was no significant difference in immediacy ratings between male and female instructors. However, female instructors received higher teacher and course ratings than male instructors.

In this study, the authors analyzed faculty evaluations from 12,153 students to investigate the effects of faculty gender, course type, and course level (graduate versus undergraduate) on the faculty evaluations. They found that female instructors received higher ratings than male instructors and that ratings differed significantly by course type and by students’ perceived amount of learning.

Students at a midwestern university completed a Curriculum and Instruction Survey (C&I) form during the 1985-86 academic year. The researchers identified four types of classes (lecture, lecture-discussion, discussion, and laboratory) in sufficient quantity to be analyzed. They obtained a total of 5,843 evaluations for 242 different classes. They studied a number of class and instructor variables, as well as the interactions among those variables. One finding in this study was that, overall, female instructors received higher ratings than male instructors did.

Academic Articles, Chapters, and Working Papers Finding No Gender or Race Bias

The data comprise student evaluations from 29,519 students who attended Auburn University. The goal of the study was to identify potential gender-oriented biases in student evaluations. The data were collected from student ratings of instructors using an eight-question survey that asked students to rate statements on a 5-point Likert scale. According to the results, student gender seemed to influence student evaluations; however, instructor gender did not seem to have an effect. The researchers found that female students gave higher ratings in general than male students did and that students gave higher ratings to instructors of their own gender. The researchers acknowledged that, in order to better understand external variables that cause differences in student evaluations, the interpretation of rankings must be changed.

In “Determinants of perceived teaching quality: The role of divergent interpretations of expectations” (2019), Wallisch and Cachia analyze a comprehensive set of data from RateMyProfessors.com. They find that gender differences in perceived quality are minute and that overall ratings can be well predicted by specific instructor attributes. They conclude that student-derived evaluations of teaching are largely unbiased and reflect instructor qualities.

Given that there is not a single study that summarizes research on student evaluations of teaching (SETs) with regard to their validity, susceptibility to bias, practical use, and effective implementation, the authors conducted a comprehensive overview of SETs by combining nine prior meta-analyses (covering 193 studies) related to SETs. This yielded a small-to-medium overall weighted mean effect size between SETs and the variables studied. The authors’ findings suggest that SETs appear to be valid, have practical use that is largely free from gender bias, and are most effective when implemented with consultation strategies.

Reports, Recommendations and Newsletters

http://advance.unl.edu/files/annualevalutationoffaculty3_2013.pdf

The authors describe and examine potential sources of bias in student evaluations of faculty teaching, such as professor gender, professor race/ethnicity, professor attractiveness, professor age, course difficulty, and expected grade.

In this paper, the authors attempt to summarize the conclusions of the major reviews of the student ratings research and literature from the 1970s to 2010.

Result of RIT’s NSF ADVANCE Grant; includes list of work of other NSF ADVANCE institutions with links.

https://www.rit.edu/nsfadvance/assets/pdf/gender_bias_in_faculty_teaching_evals_03.13.15forwebsite.pdf

News Articles and Newsletters

https://www.theguardian.com/lifeandstyle/womens-blog/2015/feb/13/female-academics-huge-sexist-bias-students?CMP=share_btn_fb

https://www.npr.org/sections/codeswitch/2015/03/05/390686619/study-at-rate-my-professor-a-foreign-accent-can-hurt-a-teachers-score

http://activehistory.ca/2017/03/shes-hot-female-sessional-instructors-gender-bias-and-student-evaluations/

https://www.insidehighered.com/news/2016/09/21/new-study-could-be-another-nail-coffin-validity-student-evaluations-teaching

http://www.insidehighered.com/news/2019/05/20/fighting-gender-bias-student-evaluations-teaching-and-tenures-effect-instruction#.XONiZ-91UfY.twitter

https://psmag.com/education/how-to-combat-gender-bias-in-teacher-evaluations

https://www.npr.org/sections/ed/2016/01/25/463846130/why-women-professors-get-lower-ratings

https://www.psychologytoday.com/us/blog/everybody-is-stupid-except-you/201305/do-the-best-professors-get-the-worst-ratings

http://www.slate.com/blogs/xx_factor/2014/12/09/gender_bias_in_student_evaluations_professors_of_online_courses_who_present.html

https://hbr.org/2019/04/one-way-to-reduce-gender-bias-in-performance-reviews

https://slate.com/human-interest/2014/04/student-evaluations-of-college-professors-are-biased-and-worthless.html

https://www.washingtonpost.com/news/monkey-cage/wp/2013/10/02/student-evaluations-of-teaching-are-probably-biased-does-it-matter/?utm_term=.f70a561214c3

Other Miscellaneous Tools and Articles