And that’s where the world of investment banking sits today (EEOC, 2015), on top of stark imbalances that many banks are ready and eager to address.
For many businesses, the bias that creates and perpetuates this kind of disproportionate landscape begins during the hiring process. Studies have shown that people tend to prefer others who are similar to themselves, a tendency known as “like-me” bias. “Like-me” criteria can range from education and perceived social standing to race and gender. Paradoxically, the desire to hire people who mirror our own backgrounds and life experiences causes firms to sacrifice the creativity, inclusivity, and increased revenue that real diversity and representation bring. Firms that invest in increasing their racial, ethnic, and gender diversity are 15-35% more likely to have financial returns above their national industry medians.
If a firm only considers finance majors with a 3.5+ GPA from the top 10 universities, it is likely considering only a relatively homogeneous population that varies along very few dimensions, which leads to selection bias. Instead, Suited allows firms to assess hundreds of characteristics that are distributed equally across racial, gender, ethnic, and socioeconomic groups. Using A.I., we are able to identify high-potential candidates from diverse backgrounds who have the raw characteristics required to succeed at each individual firm.
But how can we ensure that the machine learning models we create are not biased themselves? In the world of A.I., it is reasonable to assume that algorithms built in a vacuum of homogeneity will produce biased predictions. There are, however, scientific ways to mitigate bias and reduce its adverse impact. Here’s how we do it:
Diverse Data Collection
Summary: We collect data directly from our partners
As mentioned, we create a unique prediction model for each partner we work with. To start this process, we collect data from employees who have worked at the partner firm long enough to demonstrate their level of performance. The employees take our assessment, and their managers then provide a measure of each employee's performance, such as annual performance scores. In aggregate, the data contains enough diverse employees to provide insight into the factors that introduce bias, which allows us to identify that bias and at least begin removing it.
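A minimal sketch of how such a training set might be assembled, assuming each employee record carries tenure, assessment scores, and a manager rating. The `Employee` record, the attribute name `grit`, and the 12-month tenure cutoff are hypothetical; the article does not say how long "long enough" is or how records are stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Employee:
    employee_id: str
    tenure_months: int
    assessment: Dict[str, float]   # hypothetical: attribute name -> assessment score
    performance: Optional[float]   # manager-provided rating, e.g. an annual review score


def build_training_set(
    employees: List[Employee], min_tenure_months: int = 12
) -> List[Tuple[Dict[str, float], float]]:
    """Pair each eligible employee's assessment scores with their performance rating."""
    rows = []
    for e in employees:
        # Too new to have demonstrated a level of performance (cutoff is an assumption)
        if e.tenure_months < min_tenure_months:
            continue
        # Manager has not provided a performance measure yet
        if e.performance is None:
            continue
        rows.append((e.assessment, e.performance))
    return rows


staff = [
    Employee("a1", 30, {"grit": 0.8}, 4.5),
    Employee("a2", 6, {"grit": 0.6}, 3.0),    # excluded: tenure below cutoff
    Employee("a3", 18, {"grit": 0.4}, None),  # excluded: no rating yet
]
print(build_training_set(staff))  # [({'grit': 0.8}, 4.5)]
```

Each resulting pair is one labeled example: the assessment scores are the model's inputs and the manager's rating is the target it learns to predict.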
Synthetic Data Generation
Summary: We also programmatically generate data to correct imbalances
However, we recognize that the investment banking industry does not yet provide all the diversity data we need to produce bias-free models. For example, the investment banking analyst workforce is 41% female and 59% male. So, to build technology that helps solve the industry’s diversity problem, we use our existing data to programmatically generate “synthetic” data that compensates for the groups underrepresented in our datasets. This new data is created by estimating the attributes of the population in question from the data we already have.
When training models, it is best practice to balance the representation of each sub-segment. As mentioned above, women are often underrepresented in the data we collect. Before building a model, we generate synthetic candidates similar to the existing female candidates until the ratio of men to women in the dataset reaches 1:1. We strive to achieve a 1:1 ratio in our dataset for every gender, race, and age group, regardless of the share of our partner's workforce that group represents.
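The article does not name the generator it uses. As an assumed stand-in, the sketch below uses SMOTE-style interpolation (Synthetic Minority Over-sampling Technique): each synthetic row is placed at a random point between a real minority row and one of its nearest minority neighbors, and rows are added until the minority group matches a target count. The function name, the data, and `k=3` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def oversample_to_parity(
    minority: Sequence[Row], target_count: int, k: int = 3, seed: int = 0
) -> List[Row]:
    """SMOTE-style oversampling: add synthetic minority rows until there are
    target_count rows. Each synthetic row lies on the segment between a real
    minority row and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic: List[Row] = []
    for _ in range(max(target_count - len(minority), 0)):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        # k nearest minority rows by Euclidean distance
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position in [0, 1) between base and neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return list(minority) + synthetic


# Hypothetical two-attribute assessment scores: 3 women vs. 5 men
women = [(0.2, 0.7), (0.3, 0.6), (0.25, 0.8)]
balanced_women = oversample_to_parity(women, target_count=5)
print(len(balanced_women))  # 5, giving a 1:1 ratio with the 5 men
```

Interpolating between real rows keeps synthetic candidates inside the range the real group already occupies, rather than inventing attribute values no real candidate has.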
Principal Component Analysis (PCA)
Summary: We determine if certain questions in our assessment are causing bias
In short, we determine whether a dominant population is causing specific questions to produce biased results.
For example, people who are successful in combat sports such as boxing are likely to show low variation, but high values, on an attribute like aggression. If we trained a model to predict success in Mixed Martial Arts ("MMA"), it would almost certainly discriminate against anyone who follows Jainism, a religion that preaches a doctrine of peace and non-violence. Using a data science technique called principal component analysis ("PCA"), we would pick up on the low standard deviation of aggression among those who are successful in MMA and consider eliminating aggression from the model.
Applied to investment banking, some firms may find a similar pattern. Suppose a firm has many high-performing men who all score high on aggression. Without PCA, the model may favor aggressive men; because we want to give everyone a fair shot, we would adjust or remove this trait so that aggressiveness does not drive the predictions.
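One way the check described above could look, as a sketch assuming NumPy: scale the top performers' scores by the whole population's mean and standard deviation, run PCA on that group, and inspect the lowest-variance principal component. If that component has very little variance and is dominated by one attribute, the group is nearly uniform on that attribute. The function name, the data, and both thresholds are hypothetical.

```python
import numpy as np


def weakest_component(population, group, names, max_variance=0.25, loading_threshold=0.9):
    """PCA on `group` (e.g. top performers) after scaling by population statistics.

    Returns (attribute, variance) when the lowest-variance principal component
    has variance below max_variance and is dominated by a single attribute,
    i.e. the group is nearly uniform on it; otherwise (None, variance)."""
    mu, sigma = population.mean(axis=0), population.std(axis=0)
    z = (group - mu) / sigma                   # group in population standard units
    cov = np.cov(z, rowvar=False)              # covariance of the scaled attributes
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    loadings = eigvecs[:, 0]                   # lowest-variance component
    top = int(np.argmax(np.abs(loadings)))
    if eigvals[0] < max_variance and abs(loadings[top]) >= loading_threshold:
        return names[top], float(eigvals[0])
    return None, float(eigvals[0])


rng = np.random.default_rng(0)
names = ["aggression", "numeracy", "teamwork"]
population = rng.normal(0.0, 1.0, size=(500, 3))   # hypothetical applicant pool
winners = rng.normal(0.0, 1.0, size=(50, 3))       # hypothetical top performers
winners[:, 0] = rng.normal(1.5, 0.1, size=50)      # high, tightly clustered aggression
print(weakest_component(population, winners, names))  # expected to flag "aggression"
```

Scaling by population statistics matters: without it, an attribute measured on a smaller scale would look "low variance" simply because of its units.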
Hyper-Parameter Adjustment (HPA)
Summary: If we determine that certain attributes we measure cause bias, we adjust them
If PCA is a mechanism for finding sources of adverse impact, then a hyper-parameter adjustment is a mechanism for fixing it.
We start by visualizing the data to understand which of the traits we measure may be unintentionally causing bias. Visualization helps us spot potential problems that could bias the final algorithm against a group of candidates. To confirm whether a trait is causing bias, we increase or decrease the prevalence of the identified attributes and adjust the model's hyper-parameters. If we determine that an attribute causes a statistically significant adverse impact, we train our models to treat that trait as less important.
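The article does not say which hyper-parameters it tunes. One plausible mechanism for "weight this trait as less important" is a per-feature L2 penalty: in ridge regression, raising the penalty on one feature shrinks that feature's coefficient toward zero while leaving the others free. A minimal NumPy sketch under that assumption; the feature names and penalty values are hypothetical, and the significance test itself is not shown.

```python
import numpy as np


def fit_ridge(X, y, penalties):
    """Closed-form ridge regression with one L2 penalty per feature.

    penalties[j] is a hyper-parameter: raising it shrinks feature j's
    coefficient toward zero, so the model leans on that trait less.
    Features and target are centered, so the intercept is not penalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    A = Xc.T @ Xc + np.diag(penalties)
    return np.linalg.solve(A, Xc.T @ yc)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                 # columns: aggression, numeracy (hypothetical)
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.5, size=200)

baseline = fit_ridge(X, y, penalties=[0.0, 0.0])
adjusted = fit_ridge(X, y, penalties=[1e4, 0.0])  # down-weight the flagged trait
print(baseline.round(2), adjusted.round(2))
```

With no penalty the fit recovers coefficients near 2 and 1; with a large penalty on the first column, the aggression coefficient drops close to zero.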
Summary: We compose a model that focuses equally on each segment of the population
Even when a segment of the employee population is scarce in the dataset, we can increase the importance of that underrepresented segment so the learning algorithm focuses on it as much as on the better-represented segments.
For example, suppose we don’t have enough data from African American women in the dataset, not even enough to synthetically generate appropriate estimates (see above). Instead of artificially creating samples, we tell the model to assign more weight to the data from African American women. That way, the model treats predictions for that group as just as important as predictions for better-represented populations.
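A minimal sketch of this kind of reweighting, assuming inverse-frequency weights: each row's weight is inversely proportional to the size of its segment, so every segment contributes the same total weight during training. The segment labels are hypothetical placeholders.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def balanced_segment_weights(segments: Sequence[Hashable]) -> List[float]:
    """Per-row weights that give every segment the same total weight.

    weight = n_rows / (n_segments * rows_in_segment), so a row from a small
    segment counts proportionally more during training."""
    counts = Counter(segments)
    n_rows, n_segments = len(segments), len(counts)
    return [n_rows / (n_segments * counts[s]) for s in segments]


# Hypothetical labels: 8 rows from a well-represented segment, 2 from a scarce one
segments = ["segment_a"] * 8 + ["segment_b"] * 2
weights = balanced_segment_weights(segments)
print(weights[0], weights[-1])  # 0.625 2.5
```

These weights can be passed to any learner that accepts per-sample weights; many scikit-learn estimators take a `sample_weight` argument in `fit`, and the formula matches scikit-learn's `"balanced"` class-weight heuristic.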
Firm-Specific Model Ensembles
Summary: We are able to reduce a firm’s specific biases by incorporating industry-wide data into a firm’s unique model
When we produce a final model for our partners to use in their recruiting efforts, it is actually many models combined into an ensemble. The A.I. that makes predictions has to learn attributes that are predictive both for a specific firm and across the industry. Again, most firms have some bias, but they are likely not all biased in the same way. By adding models trained on aggregated data, we are more likely to reduce bias.
This works especially well for an attribute like where a candidate went to college: we have extensive data showing that where a candidate or employee studied does not significantly affect their performance at work. So even if a firm has historically hired mostly from Ivy League schools, the aggregate data we feed the model outweighs the bias toward those universities.
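The article does not describe how the firm-specific and industry-wide models are combined. The sketch below assumes the simplest ensemble, a weighted average of two scores, to show how aggregate data can dilute a firm's learned preference for Ivy League schools. Both models, their coefficients, and `firm_weight` are hypothetical.

```python
from typing import Dict

Candidate = Dict[str, float]


def firm_model(c: Candidate) -> float:
    # Hypothetical firm-only model: learned a large bonus for Ivy League schools
    return 0.5 * c["analytical"] + 0.4 * c["ivy_league"]


def industry_model(c: Candidate) -> float:
    # Hypothetical industry-wide model: school has no effect in aggregated data
    return 0.5 * c["analytical"] + 0.0 * c["ivy_league"]


def ensemble_score(c: Candidate, firm_weight: float = 0.3) -> float:
    """Weighted average of firm-specific and industry-wide predictions."""
    return firm_weight * firm_model(c) + (1 - firm_weight) * industry_model(c)


ivy = {"analytical": 0.7, "ivy_league": 1.0}
state_school = {"analytical": 0.7, "ivy_league": 0.0}
firm_gap = firm_model(ivy) - firm_model(state_school)
ensemble_gap = ensemble_score(ivy) - ensemble_score(state_school)
print(round(firm_gap, 2), round(ensemble_gap, 2))  # 0.4 0.12
```

Two candidates identical except for their school differ by 0.4 under the firm-only model but by only 0.12 in the blend; how far the gap shrinks depends entirely on the assumed `firm_weight`.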
Summary: We rigorously test each model to ensure it does not cause adverse impact
We test each model repeatedly to make sure it does not discriminate against any group of people. We will never deploy a model that does not meet the Equal Employment Opportunity Commission's technical guidelines.
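The article does not list the specific tests it runs. One widely used check is the "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures, adopted by the EEOC: a selection rate for any group that is less than 80% of the rate for the group with the highest selection rate is generally regarded as evidence of adverse impact. A minimal sketch with hypothetical applicant counts:

```python
from typing import Dict


def impact_ratios(selected: Dict[str, int], applicants: Dict[str, int]) -> Dict[str, float]:
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    highest = max(rates.values())
    return {g: rate / highest for g, rate in rates.items()}


def passes_four_fifths(selected: Dict[str, int], applicants: Dict[str, int],
                       threshold: float = 0.8) -> bool:
    """True if no group's impact ratio falls below the four-fifths threshold."""
    return all(r >= threshold for r in impact_ratios(selected, applicants).values())


applicants = {"men": 100, "women": 80}  # hypothetical counts
print(passes_four_fifths({"men": 30, "women": 20}, applicants))  # True  (0.25 / 0.30 is about 0.83)
print(passes_four_fifths({"men": 30, "women": 16}, applicants))  # False (0.20 / 0.30 is about 0.67)
```

The four-fifths rule is a rule of thumb rather than a complete test; in practice it is usually paired with statistical significance testing on the same selection counts.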