
I know these studies are common (probably because they are easy to conduct) but they don't measure the intrinsic bias of the interviewer.

What they measure is a combination of the intrinsic bias of the interviewer, and the statistical inference that the interviewer does based on race/gender (statistical discrimination).

Because a resume is a noisy signal, it is still possible that race/gender contains extra information even after you have seen the resume. For example, suppose people tend to exaggerate, and this exaggeration introduces some randomness. Given the signal "led some impressive project", there is some probability that the person didn't really lead the project. Now if the probability of exaggerating is the same, but a Black person or woman was less likely to have led the project a priori, then even after observing the resume that claims to have led the project, the a posteriori probability of having led the project is lower for the Black person or woman.
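To make the arithmetic concrete, here is a minimal sketch of the Bayes' rule calculation described above. The priors (0.5 and 0.3) and the exaggeration rates (0.2 and 0.01) are made-up numbers chosen only for illustration, not empirical estimates, and `posterior_led` is a hypothetical helper name.

```python
def posterior_led(prior, p_claim_if_led=1.0, p_claim_if_not_led=0.2):
    """P(actually led the project | resume claims it), by Bayes' rule.

    prior              -- assumed P(led the project) before reading the resume
    p_claim_if_led     -- assumed P(resume claims it | led it)
    p_claim_if_not_led -- assumed P(resume claims it | did not lead it),
                          i.e. the exaggeration rate, same for both groups
    """
    true_claims = p_claim_if_led * prior
    false_claims = p_claim_if_not_led * (1.0 - prior)
    return true_claims / (true_claims + false_claims)


# Hypothetical priors for two groups; illustrative values only.
PRIOR_HIGH = 0.5
PRIOR_LOW = 0.3

# Noisy resume: 20% of people who did not lead the project claim they did.
gap_noisy = posterior_led(PRIOR_HIGH) - posterior_led(PRIOR_LOW)

# More reliable resume: only 1% exaggerate.
gap_precise = (posterior_led(PRIOR_HIGH, p_claim_if_not_led=0.01)
               - posterior_led(PRIOR_LOW, p_claim_if_not_led=0.01))

print(f"posterior gap with noisy resume:   {gap_noisy:.3f}")
print(f"posterior gap with precise resume: {gap_precise:.3f}")
```

Under these assumed numbers the two posteriors differ, but the gap shrinks as the resume signal becomes more reliable, so the size of the effect depends heavily on how noisy the resume is taken to be.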



how do you not understand that "statistical discrimination" is a subtype of "intrinsic bias"?


Many people, perhaps not you, are interested in distinguishing between discrimination based on Bayesian inference, and discrimination that isn't, and is instead, for example, based on incorrect views or inherent dislike of certain groups. You might say that both are equally bad. I suspect that if you were given the opportunity to argue that a particular form of discrimination is not justified by Bayesian inference, you would do so. It's only because I'm suggesting that it's possible discrimination is justified by Bayesian inference, that you are (rudely) objecting to the distinction.


I thought (and it occurs to me I have no evidence for this and don't know where the belief came from) that probability of exaggerating was lower for women and black folks.


Responding to your highly inflammatory hypothetical:

You have gone from a nostrum about an entire population ("...a Black person or woman was less likely to have led the project a priori..."), which could have included thousands-to-millions of people, to a statement about one particular individual.

The grossest error of this way of thinking is that it is mixing a vague, dubious, and unquantified signal (your a priori "knowledge") with a very high-quality signal (a specific and verifiable statement made by a single person about a single project).

If you're really proposing to do some kind of "Bayesian" weighting of these two pieces of knowledge, you're trusting your machinery for assessment of probabilities way too much. That a priori knowledge is junk compared to the statement on the resume.

Or, to look at it the other way round: If you're so well-calibrated that you're taking population-wide information into account, I shudder to think what you must be doing with other side information like the font, page layout, semicolon count, or paper composition. Lump it into the prior! What could possibly go wrong?!

I must add that you're deploying a hyper-logical argument in a real-world situation in what is honestly a stupid fashion. Nobody who does real-world inference should operate this way.


> I shudder to think what you must be doing with other side information like the font, page layout, semicolon count, or paper composition.

You joke, but one of the best predictors of being accepted to (a particular) graduate business school (while I was still working in admissions) was to simply look at the style, formatting, grammar, etc. of the applicant's resume.

With a large enough corpus, you could probably extract some meaningful signal out of just that information.


We are talking about academic studies, not how I would personally act.

You are asserting that the signal from race/gender is very noisy and the signal from the resume is very precise.

We can debate the precision of the signal from the resume, but race at least is highly predictive of many objective qualities, e.g. it is highly correlated with IQ. So what you call a vague, dubious, and unquantified signal is actually a highly informative signal.


> You are asserting that the signal from race/gender is very noisy and the signal from the resume is very precise.

> We can debate the precision of the signal from the resume, but race at least is highly predictive of many objective qualities, e.g. it is highly correlated with IQ. So what you call a vague, dubious, and unquantified signal is actually a highly informative signal.

You’re only making a very short statement, so I don’t know what you personally think. However, the statement is imprecise enough that others may mistake what you mean for the following fallacy.

Let’s say we have two kinds of Sneetches: Those with stars and those without stars. A star is highly correlated with success taking a certain type of test that we’ll say measures “scintillence.” I am interviewing Sneetches for a job where scintillence is also highly correlated with competence. I ask for Sneetches with five years of experience doing this job.

Now: Should I refuse to interview Sneetches without stars, because not having a star correlates with less success in the scintillence test, which then correlates with less competence in the job?

The trap that many fall into is saying that since there is a correlation in the general population of Sneetches, we can draw inferences about the Sneetches applying for this particular job. However, we are dealing with the subset of Sneetches who have already demonstrated their aptitude for the job by having five years of actual experience competently performing a job that correlates with scintillence. We are not selecting Sneetches at random from the general population; we are using a combination of self-selection (“apply for this job if you have a desire to do this job”) and external filtering (“apply for this job if you have five years of experience doing this job”).

The presence or lack of a star on a Sneetch may be highly informative about their ability to do this job if we pick Sneetches at random, but that’s not what we’re doing here, so no, it isn’t highly informative for the purpose of choosing whom to interview.

Summary:

The presence or lack of a star may be highly informative if we have no selection pressure on the sample, but when we apply other filters that are themselves correlated with the attribute that interests us, it loses its ability to inform us.
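A rough simulation of the selection effect described above, as a sketch under stated assumptions: competence is modeled as normally distributed with a hypothetical 0.5 standard-deviation gap between starred and plain Sneetches, and "five years of experience" is stood in for by a hypothetical competence cutoff (`THRESHOLD`). None of these numbers come from the thread.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 100_000        # Sneetches simulated per group (assumption)
GAP = 0.5          # assumed population gap in mean competence, in SD units
THRESHOLD = 1.5    # assumed cutoff standing in for the experience filter


def sample_competence(group_mean):
    """Draw N competence scores from a normal distribution with SD 1."""
    return [random.gauss(group_mean, 1.0) for _ in range(N)]


def average(values):
    return sum(values) / len(values)


starred = sample_competence(GAP)
plain = sample_competence(0.0)

# Gap in mean competence when picking Sneetches at random.
gap_before = average(starred) - average(plain)

# Gap among only those who pass the filter (the applicant pool).
starred_pool = [c for c in starred if c > THRESHOLD]
plain_pool = [c for c in plain if c > THRESHOLD]
gap_after = average(starred_pool) - average(plain_pool)

print(f"gap in general population: {gap_before:.3f}")
print(f"gap among filtered pool:   {gap_after:.3f}")
```

In this toy model the gap among filtered applicants is much smaller than in the general population, though not necessarily exactly zero. How much it narrows depends on the assumed distributions and on how strongly the filter tracks the attribute of interest.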


I notice you went right from race being correlated with IQ to race being predictive of IQ. Unsurprising.


This is a discussion on probability and statistics, so I was using the technical terms. In statistics if A and B are correlated, then A predicts B. But you're just a liberal who assumes everyone who isn't is dumb. Fuck you.


Hey, there's another thing that's very, very predictable: That a person whose argument is making bigoted remarks and claiming they're neutral statistical results very quickly devolves into saying 'liberals are stupid!'


You say Bayesian inference, I say bigotry.


I think unfortunately Bayesian inference often does mean bigotry. And bigots aren't necessarily always wrong.


I think what would really help these kinds of discussions is to first agree on whether we're discussing a 'should' or an 'is'.

In the latter case, there are many conclusions one can draw that are considered racist, misogynist, or otherwise discriminatory. And this knowledge can be valuable for study, or in other ways.

In the former case, however, all of that doesn't necessarily matter, partly because we assume that things are as they are as a result of unfair processes, and partly because we prefer to give the individual the benefit of the doubt over relying on statistics that would put this individual in a group that he might not really be part of.

I often feel that both types of discussions are significantly harmed by conflating the two.



