BIAS INTERRUPTERS TOOLKIT

Performance Evaluations

Incremental steps that improve diversity in your organization can yield large gains. Diverse work groups perform better and are more committed, innovative, and loyal.

It’s time to go beyond just talking about the problem of workplace bias. Bias Interrupters is an evidence-based model that provides solutions: small steps that can yield big changes.

We’ve distilled the huge literature on bias into simple steps that help you and your company perform better.

THE CHALLENGE

A study of performance evaluations in tech found that 66% of women’s performance reviews contained at least one negative personality criticism (“You come off as abrasive”) whereas only 1% of men’s reviews did.¹ In our performance evaluation audit at a law firm, we found that people of color and white women were far more likely to have their personality mentioned in their evaluations (including negative personality traits). What’s optional for white men (getting along with others) appears to be necessary for white women and people of color. Case in point: 83% of Black men were praised for having a “good attitude” vs. 46% of white men, and 27% of white women were praised for being “friendly and warm” vs. 10% of white men.²

Research also shows that white men tend to be judged on their potential while “prove-it-again groups” (women, people of color, individuals with disabilities,³ members of the LGBTQIA+ community,⁴ older employees,⁵ and first-generation professionals) are judged (or scrutinized) on their performance. Small biases can have large effects: According to one study, women received significantly lower “potential” ratings despite higher job performance ratings, and this accounted for 30-50% of the gender promotion gap.⁶

THE SOLUTION

1. Use Metrics
Data and metrics help you spot problems—and assess the effectiveness of the measures you’ve taken. Businesses use metrics to help them achieve any strategic goal.

Key metrics:
Do your performance evaluations show consistently higher ratings for majority men than for women, people of color, or other relevant groups?
Do your performance evaluations show consistently higher ratings for in-person workers than remote and hybrid workers?
Do women’s ratings fall after they have children? Do employees’ ratings fall after they take parental leave or adopt flexible work arrangements?
Do the same performance ratings result in different promotion or compensation rates for different groups?

Keep metrics by: 1) individual supervisor; 2) department; and 3) the organization as a whole.
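As a rough illustration of keeping the same metric at all three levels, the sketch below averages ratings per demographic group by supervisor, by department, and for the organization as a whole. The record layout and field names ("supervisor", "department", "group", "rating") are assumptions made for this example, not a required format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; the field names are assumptions, not a required schema.
evaluations = [
    {"supervisor": "S1", "department": "Eng", "group": "majority men", "rating": 4.0},
    {"supervisor": "S1", "department": "Eng", "group": "women", "rating": 3.0},
    {"supervisor": "S2", "department": "Eng", "group": "majority men", "rating": 4.0},
    {"supervisor": "S2", "department": "Eng", "group": "women", "rating": 4.0},
    {"supervisor": "S3", "department": "Sales", "group": "women", "rating": 5.0},
]

def mean_rating_by_group(records, level=None):
    """Average rating per (unit, group); level=None means the whole organization."""
    buckets = defaultdict(list)
    for r in records:
        unit = r[level] if level else "organization"
        buckets[(unit, r["group"])].append(r["rating"])
    return {key: mean(vals) for key, vals in buckets.items()}

by_supervisor = mean_rating_by_group(evaluations, "supervisor")
by_department = mean_rating_by_group(evaluations, "department")
org_wide = mean_rating_by_group(evaluations)
```

Looking at all three results side by side can show whether a gap is spread across the organization or concentrated in one department or under one supervisor.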

Collecting Data

Your organization already collects performance evaluations from managers, but you will need to pull this data in a way that allows you to analyze it by demographic group.

Keep in mind that your performance evaluation system data may be stored in a different location than demographic information (which typically is collected in HR records created when someone is first hired):

Demographic data of employees:
Race/Ethnicity: This is likely collected from employees when they first apply to work at your organization.
Gender identity: Again, this is likely collected from employees when they first apply to work at your organization.

Performance evaluations data:
Quantitative ratings: All quantitative ratings that are tracked.
Narrative answers: Answers to all open-ended questions.
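Because the two data sets often live in different systems, they usually need to be joined before any analysis. The sketch below assumes both extracts share an "employee_id" field; that key and the other field names are hypothetical and will depend on your HR and evaluation systems.

```python
# Hypothetical extracts; "employee_id" as the shared key is an assumption.
hr_records = [
    {"employee_id": 101, "race_ethnicity": "Black", "gender": "man"},
    {"employee_id": 102, "race_ethnicity": "white", "gender": "woman"},
]
evaluation_records = [
    {"employee_id": 101, "rating": 4, "narrative": "Met every deadline on project X."},
    {"employee_id": 102, "rating": 5, "narrative": "Led the Y launch."},
    {"employee_id": 103, "rating": 3, "narrative": "Closed two accounts."},
]

def join_on_employee_id(hr_rows, eval_rows):
    """Attach demographic fields to each evaluation; report IDs with no HR match."""
    demographics = {row["employee_id"]: row for row in hr_rows}
    merged, unmatched = [], []
    for ev in eval_rows:
        demo = demographics.get(ev["employee_id"])
        if demo is None:
            unmatched.append(ev["employee_id"])
        else:
            merged.append({**demo, **ev})
    return merged, unmatched

merged, unmatched = join_on_employee_id(hr_records, evaluation_records)
```

Reviewing the unmatched IDs before analyzing anything helps confirm that no group is being dropped from the data by accident.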

Interpreting Data

Examining the demographic breakdown of quantitative ratings will help you determine whether there is a pattern of group differences in your performance evaluations.

Looking at the indicators of bias will help you understand precisely how bias is playing out at your organization and provide a path forward.

Quantitative ratings: In an organization where bias is not playing out in performance evaluations, we would expect the average ratings for each demographic group to be roughly equal. If the ratings at your company differ across groups, that could be evidence of bias.

Narrative data: What managers write in evaluations is also important. For example, in a company with prove-it-again bias, we would expect to see white men being described as valuable assets to the company even when they had lower ratings than women or people of color. A company with tightrope bias would see more personality comments for women and people of color, while white men would be more likely to receive praise for leadership skills.

As you interpret your qualitative data, it is important to keep in mind that the objective is not to call out specific managers or even to spot bias in individual evaluations. Instead, we look for patterns that typically only become evident when we read a group of evaluations together.
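One way to get a first look at narrative patterns is to count how often personality-related words appear in each group's evaluations. The sketch below is only a rough screen under stated assumptions: the term list is illustrative, the field names are hypothetical, and keyword matching is no substitute for reading a group of evaluations together.

```python
import re
from collections import defaultdict

# Illustrative term list only; choose terms that fit your own evaluations.
PERSONALITY_TERMS = ("abrasive", "attitude", "friendly", "warm", "difficult")

def share_with_personality_comment(records, terms=PERSONALITY_TERMS):
    """Per group, the fraction of narratives containing at least one listed term."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        if pattern.search(r["narrative"]):
            hits[r["group"]] += 1
    return {g: hits[g] / totals[g] for g in totals}

sample = [
    {"group": "women", "narrative": "Strong results, but can come off as abrasive."},
    {"group": "women", "narrative": "Delivered the migration on schedule."},
    {"group": "majority men", "narrative": "Shows real leadership on the team."},
    {"group": "majority men", "narrative": "Exceeded sales targets."},
]
shares = share_with_personality_comment(sample)
```

The result is a share per group rather than a list of individual evaluations, in keeping with the goal of finding patterns rather than calling out specific managers.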

Acting on Data

Depending on the pattern(s) you see in the pre-intervention data, there are two key areas of focus for your structural intervention:

· Revising the written materials connected to performance evaluation
· Providing training to help people combat bias

If your process includes the following, we have Bias Interrupters for them, too:

· Self-evaluations
· Calibration meetings

We have curated Bias Interrupters for each area of focus. Read through the drop-down menu below to determine which Bias Interrupters should be implemented at your organization.

Interpreting Post-Intervention Data

After implementing your chosen interventions, you will want to examine the impact of the changes you have made. There are a few key indicators you should be looking for:

Ratings: Compare your pre-intervention ratings to the post-intervention ratings.

Are you closer to equal ratings for groups? That is a good indicator that your intervention was impactful.

Are you seeing the same issues as before? That is a good indicator that you need to add more bias interrupters.

Are you seeing more or different issues than before? Interrupting bias is an iterative process – you may need to make several rounds of changes. Consider the menu of options below, and decide whether you want to add in more bias interrupters to different parts of the performance evaluation process.
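A simple way to compare the two periods is to track the spread between the highest and lowest group averages before and after the intervention. The sketch below uses made-up ratings and a hypothetical helper, largest_gap; it is one possible summary, not the only way to judge impact.

```python
from statistics import mean

def largest_gap(ratings_by_group):
    """Difference between the highest and lowest group mean rating."""
    means = [mean(vals) for vals in ratings_by_group.values()]
    return max(means) - min(means)

# Hypothetical ratings from the cycle before and the cycle after an intervention.
pre = {"majority men": [4, 4, 5], "women": [3, 3, 4], "people of color": [3, 4, 3]}
post = {"majority men": [4, 4, 5], "women": [4, 4, 4], "people of color": [4, 4, 4]}

gap_before = largest_gap(pre)
gap_after = largest_gap(post)
narrowed = gap_after < gap_before
```

A narrower gap suggests movement toward equal ratings. With small groups, a single evaluation can shift the average noticeably, so read the result alongside the underlying numbers.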

Narrative data: Compare your pre-intervention narrative results to the post-intervention narrative results.

Are you closer to equal numbers for groups? That is a good indicator that your intervention was impactful.

Are you seeing the same issues as before? That is a good indicator that you need to add more bias interrupters.

Are you seeing more or different issues than before? Interrupting bias is an iterative process – you may need to make several rounds of changes. Consider the menu of options below, and decide whether you want to add in more bias interrupters to different parts of the performance evaluation process.

2. Implement Bias Interrupters detailed in the drop-down menu below.

Designing the Performance Evaluation Form

· Add an expected character or word count to text boxes.
This helps managers understand how much they should be writing for each employee and makes sure everyone gets around the same amount of feedback. Without it, managers have to guess how much to write, which leads them to write more for some employees than for others.

· Begin with clear and specific performance criteria directly related to job requirements.
Try: “She writes maintainable code, tests her work thoroughly, offers clear and useful suggestions during code reviews, and communicates well with clients to gather requirements,” instead of: “She’s a great programmer.”

· Require evidence from the evaluation period that justifies the rating.
Try: “This year, he did a great job in helping us win X project, writing a clear client proposal that defined a tight scope and communicated our fee structure in a way that was carefully and strategically considered.” instead of: “He’s great at helping us win projects.”

· Consider performance and potential separately for each candidate.
Given the tendency for majority men to be judged on their potential while others are judged on their performance, the two criteria should be evaluated separately.

· Separate personality issues from skill sets for each candidate.
Personal style should be appraised separately from skills because a narrower range of behavior often is accepted from women and people of color. For example, women may be labeled “difficult” for doing things that are accepted from majority men.
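If your form is electronic, some of these design choices can be checked automatically when a manager submits an entry. The sketch below is a minimal example, assuming a hypothetical check_entry helper; the word limits and the minimum number of evidence items are placeholders to replace with whatever your form specifies.

```python
def check_entry(narrative, evidence, min_words=50, max_words=150, min_evidence=1):
    """Return a list of problems with one form entry; an empty list means it passes.

    The limits are placeholders; set them to whatever your form specifies.
    """
    problems = []
    n_words = len(narrative.split())
    if n_words < min_words:
        problems.append(f"narrative too short ({n_words} words)")
    elif n_words > max_words:
        problems.append(f"narrative too long ({n_words} words)")
    if len(evidence) < min_evidence:
        problems.append(
            f"needs at least {min_evidence} piece(s) of evidence from the evaluation period"
        )
    return problems

# A vague entry with no supporting evidence fails both checks.
issues = check_entry("She's a great programmer.", [])
```

A check like this only confirms that the form was filled out as designed. It cannot tell whether the evidence is specific or relevant to the job requirements, so a human review is still needed.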

Controlling for Bias in the Process of Filling Out the Form

Equality Action Center conducted an experiment with Dr. Monica Biernat at the University of Kansas examining the effects of reading our Identifying Bias in Performance Evaluations Guide. Participants completed reviews for hypothetical employees. Half of the participants were randomly assigned to a group that read the Bias Guide and listened to a brief audio recording summarizing the main messages; the other half received no further instructions.

Our findings indicate that reading the guide leads participants to give higher ratings, monetary bonuses, and promotion recommendations for both women and Black workers.

Before the next round of performance evaluations, have everyone on your team watch this short two-minute video and read the Identifying Bias in Performance Evaluations Guide.

· Don’t eliminate your performance appraisal system.
Eliminating formal performance evaluation systems and replacing them with feedback-on-the-fly creates conditions for bias to flourish.

· Don’t accept global ratings without back-up.
Require evidence from the evaluation period that justifies the rating. Try: “In March, she gave X presentation in front of Y client on Z project, answered his questions effectively, and was successful in making the sale,” instead of: “She’s quick on her feet.”

In the performance evaluation experiment at a law firm (described above), we redesigned the form to focus on specific competencies that mattered to the organization and required that evaluators list 3 pieces of evidence to accompany every numerical rating. Doing so minimized the “halo-horns effect,” in which white men are artificially advantaged by global ratings because they get halos (one strength is generalized into an overall high rating) whereas other groups get horns (one mistake is generalized into an overall low rating).

· Combat in-person favoritism.
With more companies transitioning to hybrid models of work, it is important to ensure that “face-time” in the office doesn’t translate to higher ratings on performance evaluations, quicker promotions, and increased compensation. Instead, when assessing employee performance, be sure to use output-based evaluation.

· Evaluations for remote/hybrid workers should be done through video conference or in-person.
To prevent any potential misunderstandings, it is important to have context such as facial expressions.

· Equip yourself and others involved in the evaluation process by keeping a copy of our Performance Evaluation Checklist nearby when writing and reviewing performance evaluations.

· Provide a bounceback.
Managers whose performance evaluations show persistent bias should receive a bounceback (i.e., someone should talk through the evidence with them).

What’s a bounceback? An example: in one organization, when a supervisor’s ratings of an underrepresented group deviate dramatically from the mean, the evaluations are returned to the supervisor with the message: either you have an undiagnosed performance problem that requires a Performance Improvement Plan (PIP), or you need to take another look at your evaluations as a group. The organization found that a few people were put on PIPs, but that over time supervisors’ ratings of underrepresented groups converged with those of majority men. It also found that all groups considered performance evaluations equally fair.
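To decide which evaluations to return, you could flag supervisors whose average rating for a group sits far from the organization-wide average. The sketch below is one possible screen; the 1.0-point threshold, the field names, and the helper name are assumptions for illustration, and what counts as a dramatic deviation is for your organization to define.

```python
from collections import defaultdict
from statistics import mean

def supervisors_needing_bounceback(records, group, threshold=1.0):
    """Supervisors whose mean rating for `group` is more than `threshold`
    points away from the organization-wide mean rating."""
    org_mean = mean(r["rating"] for r in records)
    by_supervisor = defaultdict(list)
    for r in records:
        if r["group"] == group:
            by_supervisor[r["supervisor"]].append(r["rating"])
    return sorted(
        s for s, vals in by_supervisor.items()
        if abs(mean(vals) - org_mean) > threshold
    )

# Hypothetical records; the threshold of 1.0 is an assumed cutoff.
records = [
    {"supervisor": "S1", "group": "majority men", "rating": 4},
    {"supervisor": "S1", "group": "women", "rating": 2},
    {"supervisor": "S1", "group": "women", "rating": 2},
    {"supervisor": "S2", "group": "majority men", "rating": 4},
    {"supervisor": "S2", "group": "women", "rating": 4},
    {"supervisor": "S2", "group": "women", "rating": 4},
]
flagged = supervisors_needing_bounceback(records, "women")
```

A flag is a prompt for a conversation about the evidence, not a finding of bias: the supervisor may have a genuine performance problem on the team, which is exactly the question the bounceback asks.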

Calibration Meetings

In many organizations, managers meet to produce a target distribution of ratings or cross-calibrate rankings.
Adding structure to these meetings can help you avoid common pitfalls detailed in our Harvard Business Review article.

1) Have managers read our Identifying Bias in Performance Evaluations Guide before they meet.

2) Pre-commit. Require all managers to fill out and submit their evaluations before they walk into the room. Registering responses in this way ensures that all managers feel empowered to speak up, and opinions won’t be swayed based on the evaluation of whoever speaks first.

3) Use a consistent rubric. Establishing key competency criteria will ensure that you evaluate each employee on the same job-relevant dimensions.

4) Stick to it. If the conversation strays away from the established competency criteria, steer everyone back to what is relevant. For example, if an employee’s personality is brought up, you can say, “Is this relevant to the rubric?”

5) Have Bias Interrupters play an active role. Have a trained Bias Interrupter in the room who can take responsibility for ensuring that the conversation sticks to the established criteria.

Writing an Effective Self-Evaluation

Some people feel more comfortable with self-promotion than others. This partly depends on how you were raised: some people were taught to be forthcoming about their accomplishments. Others grew up with the “modesty mandate”—to be self-effacing and underplay their accomplishments.

Individuals can
· Self-promote effectively by using our Writing an Effective Self-Evaluation Guide, and hand it out to their reports (if they have them).

Managers and organizations can
· Level the playing field with respect to self-promotion by ensuring everyone knows they’re expected to do so and that they know how. Distribute our Writing an Effective Self-Evaluation Guide to help.

· Offer alternatives to self-promotion. Encourage or require managers to set up more formal systems for sharing successes, such as a monthly email that lists employees’ accomplishments.


Equality Action Center. This work is licensed under a Creative Commons Attribution 4.0 International License.

Footnotes

1. Snyder, K. (2014, August 26). The abrasiveness trap: High-achieving men and women are described differently in reviews. Fortune. Retrieved from http://fortune.com/2014/08/26/performance-review-gender-bias/
2. Williams, J. C., Lewin Loyd, D., Boginsky, M., & Armas-Edwards, F. (2021). How One Company Worked to Root Out Bias from Performance Reviews. Harvard Business Review. https://hbr.org/2021/04/how-one-company-worked-to-root-out-bias-from-performance-reviews
3. Ameri, M., Schur, L., Adya, M., Bentley, F. S., McKay, P., & Kruse, D. (2018). The disability employment puzzle: A field experiment on employer hiring behavior. ILR Review, 71(2), 329-364. doi: 10.1177/0019793917717474
4. Tilcsik, A. (2011). Pride and prejudice: Employment discrimination against openly gay men in the United States. American Journal of Sociology, 117(2), 586-626. doi: 10.1086/661653
5. Cuddy, A. J. C., Norton, M. I., & Fiske, S. T. (2005). This old stereotype: The pervasiveness and persistence of the elderly stereotype. Journal of Social Issues, 61(2), 265-283. doi: 10.1111/j.1540-4560.2005.00405.x
6. Benson, A., Li, D., & Shue, K. (2021). Potential and the Gender Promotion Gap. Working Paper.