And that’s where the world of investment banking sits today (EEOC, 2015), on top of stark imbalances that many banks are ready and eager to address.
For many businesses, the bias that causes and perpetuates this type of disproportionate landscape starts in the hiring process. Studies have shown that humans tend to prefer people who are similar to themselves, a tendency known as the “like-me” bias. “Like-me” criteria can stretch from education and perceived social standing all the way to race and gender. Paradoxically, it has been shown that the desire to hire those who mirror ourselves and our life experiences causes firms to sacrifice the creativity, inclusivity, and increased revenue that real diversity and representation bring. Firms that invest in increasing their racial, ethnic, and gender diversity are 15-35% more likely to have financial returns above their national industry medians.
If a firm only considers finance majors with a 3.5+ GPA from the top 10 universities, it is likely only considering a relatively homogeneous population with very little dimensionality, leading to tons of selection bias. Instead, Suited allows firms to assess hundreds of characteristics that are distributed equally across racial, gender, ethnic, and socioeconomic groups. Using A.I., we are able to identify high-potential candidates with highly diverse backgrounds who have the raw characteristics required to be successful at each individual firm.
But, how can we ensure the machine learning models we create don’t themselves contain bias? In the world of A.I., it would not be absurd to assume that algorithms built in a vacuum of homogeny produce biased predictions. There are, however, scientific ways to mitigate bias and negate its adverse impact. Here’s how we do it:
As mentioned, we create unique prediction models for each partner we work with. To initiate this process, we collect data from employees who have worked there long enough to demonstrate their level of performance. We ask the employees to take our assessment, and then their managers provide a measure of employee performance, such as annual performance scores. In the aggregate, the data contains enough diverse employees to provide insight into the biasing factors, thereby allowing us to identify and at least begin removing bias.
However, we are not naive to the fact that the investment banking industry does not contain all the diversity data we need to produce bias-free models. For example, the investment banking analyst workforce is 41% female and 59% male. So, to develop technology to help solve the industry’s diversity problem, we use our existing data to programmatically generate “synthetic” data that compensates for the groups that are under-represented in our data sets. This new data is created by estimating attributes of the population in question based on the data we already have.
When training models, it is best practice to create balanced classes of sub-segments. For example, women are often underrepresented in the data we collect. Prior to building a model, we would generate a set of synthetic candidates that are similar to the existing set of female candidates until the proportion of men to women in the dataset becomes 1:1. We always strive to hit the 1:1 ratio for any gender, race, or age group, regardless of the percentage of the population they represent.
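To make the balancing step concrete, here is a minimal sketch of generating synthetic records until two groups reach a 1:1 ratio. It is not Suited's actual method: it assumes candidates are plain lists of numeric trait scores and creates each synthetic record by interpolating between two randomly chosen real records, a simplification of SMOTE-style oversampling. The function name `balance_with_synthetic` and the sample scores are hypothetical.

```python
import random


def balance_with_synthetic(majority, minority, seed=0):
    """Return the minority rows plus enough synthetic rows to match the majority size.

    Assumption: each row is a list of numeric trait scores, and the minority
    group has at least two real rows to interpolate between.
    """
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        a, b = rng.sample(minority, 2)   # two distinct real records
        t = rng.random()                 # position along the line from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return minority + synthetic


# Hypothetical trait scores: 6 men and 3 women in the collected data.
men = [[0.7, 0.2], [0.6, 0.4], [0.8, 0.3], [0.5, 0.5], [0.9, 0.1], [0.4, 0.6]]
women = [[0.3, 0.7], [0.5, 0.6], [0.4, 0.9]]

balanced_women = balance_with_synthetic(men, women)
print(len(men), len(balanced_women))
```

Because every synthetic record lies between two real ones, it stays inside the range of values already observed for that group. That is also the main limitation: the approach cannot invent variation that the real sample does not contain.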
Basically, we figure out if there is a dominant population of people that is causing specific questions to produce biased results.
For example, those who are successful in fighting sports, like boxing, are likely to have low variation, yet high values, on an attribute like aggression. If we trained a model to predict success in Mixed Martial Arts ("MMA"), it would almost certainly discriminate against anyone who comes from the Jain religion, which preaches a doctrine of peace and non-violence. Using a data science technique called principal component analysis ("PCA"), we would pick up on the low standard deviation of aggression among those who are successful in MMA and identify that trait as a target for elimination.
Applied to the investment banking industry, some firms may find a similar trend. Let’s say a firm has a lot of high-performing men who all score high on aggression. Without a PCA, the machine may be partial to aggressive men. Because we need to give everyone a fair shot, we would adjust or remove this trait from the model so that aggressiveness does not impact the predictions.
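The signal described above (a trait with low spread but high values among top performers) can be illustrated with a much simpler check than a full PCA. The sketch below is a stand-in, not PCA and not Suited's pipeline: it compares each trait's standard deviation among top performers with its standard deviation across everyone, and flags traits that are both tightly clustered and above average. The function name, the `ratio_threshold` value, and the data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_concentrated_traits(all_rows, top_rows, trait_names, ratio_threshold=0.5):
    """Flag traits that are tightly clustered at high values among top performers.

    Assumption: rows are lists of numeric trait scores in the same order as
    trait_names. ratio_threshold is a hypothetical cut-off, not a standard.
    """
    flagged = []
    for j, name in enumerate(trait_names):
        overall = [row[j] for row in all_rows]
        top = [row[j] for row in top_rows]
        overall_sd = pstdev(overall)
        if overall_sd == 0:
            continue  # trait never varies, so it cannot separate anyone
        tightly_clustered = pstdev(top) / overall_sd < ratio_threshold
        above_average = mean(top) > mean(overall)
        if tightly_clustered and above_average:
            flagged.append(name)
    return flagged


trait_names = ["aggression", "teamwork"]
# Hypothetical scores; the first four rows are the firm's top performers.
all_rows = [
    [0.90, 0.2], [0.95, 0.8], [0.90, 0.5], [0.92, 0.9],
    [0.20, 0.3], [0.40, 0.7], [0.10, 0.6], [0.50, 0.1],
]
top_rows = all_rows[:4]

print(flag_concentrated_traits(all_rows, top_rows, trait_names))
```

A flagged trait is only a candidate for review. Whether it actually drives adverse impact against a group still has to be tested separately.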
If PCA is a mechanism for finding sources of adverse impact, then a hyper-parameter adjustment is a mechanism for fixing it.
We start by visualizing the data to help us understand what traits we measure may be unintentionally causing bias. This helps us easily spot potential problems that could cause the final algorithm to be biased against a group of candidates. To confirm if certain traits are causing bias, we will increase or decrease the prevalence of identified attributes and make an adjustment to the model's hyper-parameters. If we determine that an attribute is causing an adverse impact with statistical significance, we will train our models to weight this particular trait as less important.
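One possible way to "weight a trait as less important" is to give that trait its own, larger regularization penalty, which is a hyper-parameter set before training. The sketch below assumes a plain linear model fit by gradient descent with a per-trait L2 penalty. The source does not say which model or which hyper-parameters are used, so `fit_linear`, the penalty values, and the data are hypothetical.

```python
def fit_linear(rows, targets, penalties, lr=0.1, steps=2000):
    """Fit weights by gradient descent on squared error with a per-trait L2 penalty.

    penalties[j] is a hyper-parameter: the larger it is, the more the weight
    of trait j is pulled toward zero.
    """
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(rows, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        # The penalty term p * w shrinks heavily penalized weights at every step.
        w = [wj - lr * (gj + pj * wj) for wj, gj, pj in zip(w, grad, penalties)]
    return w


# Hypothetical data: column 0 is the flagged trait, column 1 is another trait.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.9], [0.8, 0.6]]
targets = [x0 + x1 for x0, x1 in rows]

plain = fit_linear(rows, targets, penalties=[0.0, 0.0])
damped = fit_linear(rows, targets, penalties=[5.0, 0.0])
print("no penalty:   ", [round(v, 3) for v in plain])
print("trait 0 damped:", [round(v, 3) for v in damped])
```

Note that when traits are correlated, the model may shift weight onto the remaining traits, so the predictions should be re-checked for adverse impact after any such adjustment.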
Although a segment of the employee population may not be prevalent in the dataset, we can adjust the importance of underrepresented segments of data so the machine learning focuses on that set just as much as the more represented segments of the data.
For example, let’s say we don’t have enough data from African American women in the data set — not even enough to synthetically generate appropriate estimations (see above). To correct this, instead of artificially creating samples, we will tell the machine to assign more value to the female African American data in the algorithm. That way, the machine knows that the predictions associated with that data are equally important to other populations that are more represented.
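As an illustration of assigning more value to an under-represented segment, the sketch below computes a weight for each record that is inversely proportional to the size of its group, so every group contributes the same total weight during training. This is a common heuristic (the same formula as the "balanced" class weighting in scikit-learn) and is offered as an assumption, not as the weighting Suited uses. The group labels are placeholders.

```python
from collections import Counter


def balanced_sample_weights(groups):
    """Return one weight per record so that each group has equal total weight.

    weight = n_records / (n_groups * records_in_this_group)
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical dataset: 8 records from group "A", 2 from group "B".
groups = ["A"] * 8 + ["B"] * 2
weights = balanced_sample_weights(groups)

print(weights[0], weights[-1])  # 0.625 for each "A" record, 2.5 for each "B" record
```

These weights would then be passed to whatever training routine is in use, for example as a per-sample weight in the loss function, so that errors on the smaller group count as much in total as errors on the larger one.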
When we produce a final model for our partners to use in their recruiting efforts, it’s actually many models built on top of each other. We need to teach the A.I. involved in making predictions to look at attributes that are both predictive for specific firms and predictive industry-wide. Again, most firms have bias, but they are likely not all biased in the same way. If we use additional models based on aggregated data, we are more likely to reduce bias.
This works especially well with something like where a candidate went to college: we have so much data that tells us that where a candidate or employee studied does not significantly impact their performance at work. So, even if a firm has historically hired mostly from Ivy League schools, the aggregate data we feed the machine will outweigh the bias towards these universities.
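The effect of combining a firm-specific model with an industry-wide one can be sketched with a simple weighted average of their scores. The source only says the final model is "many models built on top of each other", so the averaging scheme, the `industry_weight` parameter, and the scores below are assumptions chosen for illustration.

```python
def blended_score(firm_score, industry_score, industry_weight=0.5):
    """Blend a firm-specific prediction with an industry-wide prediction."""
    if not 0.0 <= industry_weight <= 1.0:
        raise ValueError("industry_weight must be between 0 and 1")
    return (1 - industry_weight) * firm_score + industry_weight * industry_score


# Hypothetical case: two equally strong candidates. The firm-only model favors
# the Ivy League graduate, while the industry-wide model scores them the same.
firm_scores = {"ivy_grad": 0.9, "state_school_grad": 0.6}
industry_scores = {"ivy_grad": 0.7, "state_school_grad": 0.7}

blended = {
    name: blended_score(firm_scores[name], industry_scores[name], industry_weight=0.75)
    for name in firm_scores
}
gap_before = firm_scores["ivy_grad"] - firm_scores["state_school_grad"]
gap_after = blended["ivy_grad"] - blended["state_school_grad"]
print(round(gap_before, 3), round(gap_after, 3))
```

In this toy example the gap between the two candidates shrinks in proportion to the weight placed on the industry-wide model, which is the intuition behind letting aggregate data outweigh one firm's historical preference.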
We test a model over and over to make sure no discrimination is taking place against a certain group or groups of people. We will never deploy a model that does not meet the Equal Employment Opportunity Commission technical guidelines.
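One widely cited check from the EEOC's Uniform Guidelines on Employee Selection Procedures is the "four-fifths rule": a group's selection rate below 80% of the highest group's rate is generally regarded as evidence of adverse impact. The source does not specify which tests are run, so the sketch below is only an example of what such a check could look like. The function names and the counts are hypothetical.

```python
def adverse_impact_ratios(selected, totals):
    """Return each group's selection rate divided by the highest group's rate."""
    rates = {group: selected[group] / totals[group] for group in totals}
    best = max(rates.values())
    if best == 0:
        return {group: 1.0 for group in rates}  # nobody selected, so no disparity
    return {group: rate / best for group, rate in rates.items()}


def passes_four_fifths(selected, totals, threshold=0.8):
    """True if every group's impact ratio is at least the threshold."""
    return all(r >= threshold for r in adverse_impact_ratios(selected, totals).values())


# Hypothetical counts of candidates recommended by a model, per group.
totals = {"men": 100, "women": 100}
skewed = {"men": 30, "women": 20}
fair = {"men": 30, "women": 27}

print(adverse_impact_ratios(skewed, totals))
print(passes_four_fifths(skewed, totals), passes_four_fifths(fair, totals))
```

The four-fifths rule is a rule of thumb rather than a full statistical test, so in practice it is usually paired with a significance test, especially when group sizes are small.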
While our methods for de-biasing predictions have proven to be effective thus far, and are a substantial improvement over traditional candidate selection processes, there is always more that can be done to foster a fair and inclusive recruiting and selection process. For example, at the moment we only consider a binary conception of gender: male and female. Our goal is to provide everyone, regardless of sexual identity, a fair shot in the industries we serve, and we will work to be inclusive of those who identify as queer, transgender, and non-binary. Additionally, we plan to one day help eliminate biases that affect talented individuals post-hire.