Gender bias is deeply ingrained in hiring and work.
In Australia, women on average earn 23 per cent less than men, are less often invited for a job interview and are evaluated more harshly.
Blind resume screening, or blind hiring, which hides applicants’ names on their CVs during the application process, is one common strategy used to address this bias. Another is to use machine learning (ML) or artificial intelligence (AI) to streamline at least part of the decision-making. After all, the reasoning goes, AI should be oblivious to human stereotypes.
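For readers curious what blinding looks like in practice, here is a minimal Python sketch of a rule-based redaction step. It is an illustration under stated assumptions, not how any particular blind-hiring tool works: the function name `blind_cv` and the list of gendered terms are hypothetical and far from exhaustive. It only touches names, pronouns and titles, so other word choices, such as ‘assisted’, pass through untouched.

```python
import re

# Hypothetical, non-exhaustive list of gendered pronouns and titles to redact.
GENDERED_TERMS = ["he", "him", "his", "himself", "she", "her", "hers", "herself",
                  "mr", "mrs", "ms", "miss"]

def blind_cv(text, applicant_names):
    """Replace the applicant's names and gendered terms with neutral placeholders."""
    # Replace longer names first, so "Jane Smith" is handled before "Jane".
    for name in sorted(applicant_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[APPLICANT]", text, flags=re.IGNORECASE)
    # Whole-word match only, so "the" or "manager" are left alone.
    pattern = r"\b(?:" + "|".join(GENDERED_TERMS) + r")\b"
    return re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)

print(blind_cv("Jane Smith led her team. She assisted the manager.", ["Jane", "Jane Smith"]))
# → [APPLICANT] led [REDACTED] team. [REDACTED] assisted the manager.
```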
Our research shows that strategies like ‘resume blinding’, which may work for human hirers, do not work for AI.
That AI replicates the bias inherent in its training data is already well known.
Amazon, for instance, developed an automated CV screening system for engineering applicants. The system was subsequently revealed to be sexist (and swiftly deactivated): it had learnt an association between ‘maleness’ and applicant quality.
Our research takes a close look at the different levels at which gender bias can enter hiring algorithms.
We found that AI ingests and uses gender signals far more subtle than a name, an increasingly pressing issue as powerful generative AI such as ChatGPT is on the rise.
Gender is so deeply embedded in our society, in how we talk, where we work and what we study, that it is near-impossible to make a CV gender-blind to either AI or humans.
So, what does this mean? Well, even with the best intentions, an algorithm can pick up your gender. And an algorithm that can pick up gender can use it when predicting the quality of an applicant.
The parenthood proxy
ChatGPT (currently the most powerful language-based AI) is affecting so many facets of society, at such unprecedented speed, that developers have called for a halt to AI experiments to better understand its effects and consequences.
Our recent research examined gender bias in ChatGPT in a hiring context by asking it to rate applicants’ CVs.
We constructed CVs for a range of different occupations. Our CVs were high-quality and highly competitive, but we made two important changes.
Firstly, we swapped the applicant’s name to signal whether a man or a woman was applying for the job. Secondly, we added a parental leave gap for half of our applicants.
All of our applicants had the same qualifications and job experiences – but some were men, some were women, some were parents and some were not.
We showed ChatGPT the CVs and asked it to rate how qualified each person was for the job on a scale from zero to 100. We repeated this for six different occupations, and 30 times per CV, to ensure that our results were robust.
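The design described above can be sketched as a small factorial experiment. This is an illustrative outline, not the study’s code: the occupation labels, the `rate_cv` callable (standing in for building each CV and prompting ChatGPT) and the assumption of exactly one CV per gender and leave-gap condition are all hypothetical. The toy rater at the bottom uses arbitrary numbers purely to exercise the code.

```python
import itertools
import statistics

# Hypothetical placeholders: the six occupations are not named in the text.
OCCUPATIONS = [f"occupation_{i}" for i in range(1, 7)]
GENDERS = ["man", "woman"]       # signalled only through the applicant's name
PARENTAL_GAP = [False, True]     # whether the CV shows a parental leave gap
REPEATS = 30                     # ratings collected per CV

def run_experiment(rate_cv):
    """Collect REPEATS ratings for every occupation x gender x leave-gap CV.

    `rate_cv(occupation, gender, has_gap)` is a hypothetical callable standing in
    for building the CV text and asking the model for a 0-100 rating.
    """
    records = []
    for occupation, gender, has_gap in itertools.product(OCCUPATIONS, GENDERS, PARENTAL_GAP):
        for _ in range(REPEATS):
            records.append((occupation, gender, has_gap, rate_cv(occupation, gender, has_gap)))
    return records

def mean_score_by(records, key):
    """Average the score (last field) of records grouped by key(record)."""
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record[-1])
    return {k: statistics.mean(scores) for k, scores in groups.items()}

# Toy stand-in rater with arbitrary numbers: NOT ChatGPT and NOT the study's results.
def toy_rater(occupation, gender, has_gap):
    return 75 if has_gap else 80

records = run_experiment(toy_rater)
by_gender = mean_score_by(records, key=lambda r: r[1])
by_gap = mean_score_by(records, key=lambda r: r[2])
print(by_gender, by_gap)
```

Grouping the same records by gender, by leave gap, or by occupation is what allows the two manipulations to be compared separately.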
Here is where something interesting happened.
We found that ChatGPT did not rate men’s and women’s CVs any differently. Regardless of the name we used, men and women applying for the same job received equivalent scores.
But when we added a parental leave gap, ChatGPT rated our parents lower in every single occupation. This was true for both fathers and mothers: a gap for caregiving leave told the algorithm that this person was less qualified for the job.
Gender bias might have been scrubbed from ChatGPT’s predictions, but parenthood bias has not.
Why is this important?
Even if its developers stop ChatGPT from being gender biased, the same bias can sneak back in through a different mechanism: parenthood. We know that women take on the bulk of caring work in our societies, so women’s CVs are more likely than men’s to have parental leave gaps. A penalty for those gaps therefore falls hardest on women.
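A toy simulation makes this mechanism concrete. The rater below never sees gender, only whether a CV has a parental leave gap, yet it produces a gender gap in average scores because gaps are, by assumption here, more common on women’s CVs. Every number (the base score of 80, the 5-point penalty, and gap rates of 6 in 10 women and 1 in 10 men) is invented for illustration. In general, the expected disparity is the penalty multiplied by the difference in gap prevalence: here 5 × (0.6 − 0.1) = 2.5 points.

```python
import statistics

# Invented toy population: 6 of 10 women and 1 of 10 men have a parental leave gap.
applicants = (
    [{"gender": "woman", "parental_gap": i < 6} for i in range(10)]
    + [{"gender": "man", "parental_gap": i < 1} for i in range(10)]
)

def gender_blind_rater(applicant):
    """Ignores gender entirely; deducts an arbitrary 5 points for a leave gap."""
    return 80 - (5 if applicant["parental_gap"] else 0)

def mean_score(gender):
    """Average rating for one gender group."""
    return statistics.mean(
        gender_blind_rater(a) for a in applicants if a["gender"] == gender
    )

disparity = mean_score("man") - mean_score("woman")
print(mean_score("woman"), mean_score("man"), disparity)
```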
While we do not know whether ChatGPT will actually be used to rate CVs in real life, our research shows how easily bias can creep into models. It is almost impossible for AI developers to anticipate every bias, or every combination of variables that leads to bias, especially in models as complex as ChatGPT.
The language liability
Imagine two CVs describing identical qualifications and experience, except that one was written by a man and the other by a woman. If we hide the applicants’ identities from a hiring panel, is there any room left for discrimination?
Our research shows that the answer is a resounding ‘yes’ – if the hiring panel is AI.
Analysing 2,000 CVs, we found that men and women in the same occupation use subtly different language to describe their skills and education.
For instance, we found that women use significantly more verbs that evoke an impression of low power (like ‘assist’, ‘learn’ or ‘need’) than men.
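To illustrate the kind of measurement involved, here is a minimal sketch that counts low-power verbs in a CV. The word list is a hypothetical stand-in built from the three examples above plus their inflections; the study’s actual lexicon and method are not reproduced here, and simple word matching will miscount words such as ‘learning’ when used as a noun.

```python
import re

# Hypothetical lexicon built from the examples 'assist', 'learn' and 'need'
# and their inflections; not the study's actual word list.
LOW_POWER_VERBS = {
    "assist", "assists", "assisted", "assisting",
    "learn", "learns", "learned", "learnt", "learning",
    "need", "needs", "needed", "needing",
}

def low_power_rate(text):
    """Return (count, share) of word tokens in `text` found in LOW_POWER_VERBS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    count = sum(1 for token in tokens if token in LOW_POWER_VERBS)
    return count, (count / len(tokens) if tokens else 0.0)

print(low_power_rate("Assisted the team and learned new tools; need little supervision."))
# → (3, 0.3)
```

Comparing the average share across men’s and women’s CVs within the same occupation would then show whether such a difference holds.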
Now, here’s the problem.
In a follow-up experiment, our team found that the text representations used by AI models link these subtle language differences back to gender. This means machine learning models can identify an applicant’s gender from a CV even after names and pronouns have been removed.
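This finding can be illustrated with a far simpler model than the AI representations the study examined. The sketch below swaps in a bag-of-words multinomial naive Bayes classifier, trained on a few invented CV snippets that already have names and pronouns removed, to show that word choice alone can be enough to guess a gender label. The training sentences and labels are hypothetical toy data, not the study’s CVs.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
            vocab.update(self.counts[c])
        self.vocab_size = len(vocab)
        return self

    def log_score(self, doc, c):
        # Log prior plus smoothed log likelihood of each token under class c.
        denom = self.totals[c] + self.alpha * self.vocab_size
        return self.log_priors[c] + sum(
            math.log((self.counts[c][t] + self.alpha) / denom) for t in tokenize(doc)
        )

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))

# Invented toy training data: redacted CV snippets with hypothetical labels.
train_docs = [
    "assisted the lead engineer and learned new testing tools",
    "assisted with audits and need to learn reporting",
    "led the engineering team and delivered the platform",
    "managed audits and directed reporting",
]
train_labels = ["woman", "woman", "man", "man"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict("assisted and learned the tools"))  # → woman
print(clf.predict("led and directed the team"))       # → man
```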
And if AI can predict gender from language, it can use that prediction as evidence when rating a CV.
A strong policy response
So, what does this mean as we start to integrate AI into our work lives?
Well, our research shows that while blind resume screening may work for humans, it doesn’t work for AI. Even if we remove all explicitly identifying language (the shes, the hers and the names), other language still signals gender.
Careful bias auditing can remove the most obvious layer of discrimination, but more work is needed on less obvious proxies, such as parental leave gaps and word choice, that can also lead to bias.
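One simple starting point for auditing less obvious proxies is to check how unevenly each CV feature is distributed across genders. This minimal sketch assumes each CV has already been reduced to yes/no features; the feature names, toy records and 0.25 flagging threshold are all hypothetical. A feature that is much more common for one gender can carry gender into a model even after gender itself is removed.

```python
def proxy_gaps(cvs, features):
    """Prevalence of each yes/no feature among women minus prevalence among men."""
    women = [cv for cv in cvs if cv["gender"] == "woman"]
    men = [cv for cv in cvs if cv["gender"] == "man"]

    def prevalence(group, feature):
        return sum(1 for cv in group if cv[feature]) / len(group)

    return {f: prevalence(women, f) - prevalence(men, f) for f in features}

def flag_proxies(gaps, threshold=0.25):
    """Features whose prevalence gap meets the (hypothetical) threshold."""
    return sorted(f for f, gap in gaps.items() if abs(gap) >= threshold)

# Invented toy records: (gender, parental_gap, has_degree)
rows = [
    ("woman", True, True), ("woman", True, True),
    ("woman", False, True), ("woman", True, False),
    ("man", False, True), ("man", True, True),
    ("man", False, True), ("man", False, False),
]
toy_cvs = [{"gender": g, "parental_gap": p, "has_degree": d} for g, p, d in rows]

gaps = proxy_gaps(toy_cvs, ["parental_gap", "has_degree"])
flagged = flag_proxies(gaps)
print(gaps, flagged)
```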
This is where regulatory controls are needed: controls that respond to this more nuanced understanding of AI’s capacity to discriminate, and that ensure everyone understands AI is not inherently neutral or fair.
We all need to do our part to ensure that AI is fair and beneficial to everyone, including women and parents.