
Their method of identifying genders is to take a data source of GitHub user emails and cross-reference those with Google+ accounts. Then they scrape the Google+ account and attempt to automatically determine the gender. Using this method, they are able to identify only 35% of the users involved in these PRs.

I find this suspect, because, anecdotally, I can look at almost any author of a PR and determine their gender to a high degree of certainty. Probably well over 90% of the time, just from their name + profile picture + handle. Try it for yourself: look at the latest commits on a random project and see how obvious the genders are most of the time.

So the claim that "when a woman is identifiable, PRs are merged less" is totally suspect, because they themselves can only identify the genders of a small percentage relative to what a normal human can identify. If people can identify the genders way more often and accurately, then the claim being made is bogus. Perhaps there is a correlation of strongly signalling your gender (to the point where an inaccurate method of gender-identification has no problem) to being a below average developer.



If you make it to the end of the article, they do account for exactly this issue by comparing cases where they could find an out-of-band gender indication (Google+) but where the gender was not identifiable from the GitHub name/profile.

From the article:

> For gender-neutral profiles, we included GitHub users that used an identicon, that Michael’s tool could not infer a gender for, and that a mixed-culture panel of judges could not guess the gender for.


Only 35% of the accounts have their gender listed in a linked Google+ account. Checking someone's social media profile is a relatively sure way of automatically determining the gender of a lot of people. The authors did use another automated tool to see if they could figure out the gender of users from their GitHub profile as well, which is something they needed for the second part of their analysis. They don't specify how accurate that procedure was, so it's possible that they are more accurate than you think.

Of course, there is still the issue that they have effectively limited their sample to people with Google+ accounts, which may affect the results of the study. The fact that men's acceptance rate also dropped when their gender was identifiable (but not by as much) lends credence to the idea that there might be a flaw in their GitHub profile analyzer.


Well, they used two steps. First they identified a sample set of contributors who have self-identified, and thus validated to some extent, their gender.

Further down, they then distinguish between contributors whose gender can and cannot be inferred from their name & profile picture, splitting the 35% who were identified via Google+ into two separate groups: identifiable vs. non-identifiable.
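
To make that split concrete, here is a rough sketch of how I read the comparison, not the authors' actual code. The `PullRequest` record, the `acceptance_rates` helper, and the sample rows are all hypothetical, and the values are made-up placeholders rather than anything from the study.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    author_gender: str   # step 1: self-reported on a linked profile (assumed "F"/"M" labels)
    identifiable: bool   # step 2: could the gender be guessed from name/picture?
    merged: bool         # was the PR accepted?


def acceptance_rates(prs):
    """Return {(gender, identifiable): fraction merged} for every non-empty group."""
    totals = {}
    for pr in prs:
        key = (pr.author_gender, pr.identifiable)
        merged, count = totals.get(key, (0, 0))
        totals[key] = (merged + int(pr.merged), count + 1)
    return {key: merged / count for key, (merged, count) in totals.items()}


# Made-up placeholder records, NOT data from the study.
sample = [
    PullRequest("F", True, True),
    PullRequest("F", True, False),
    PullRequest("F", False, True),
    PullRequest("M", True, True),
    PullRequest("M", False, True),
    PullRequest("M", False, False),
]
rates = acceptance_rates(sample)
```

The point is that both groups come out of the same self-identified 35%, so the comparison is identifiable vs. non-identifiable within each gender, not identified vs. everyone else.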


You have talked me into not reading this.

Which leaves me frustrated. It seems I want things that mostly do not exist.


Not to mention it doesn't account for biological sex.


This is irrelevant, I think.


You thinking it does not magically make it so.


I'd like to hear your reasoning for why it's relevant enough to have added value to the results.



