# Three hundred programming interviews in thirty days

## By Ammon Bartram on Jun 13, 2015

We launched Triplebyte one month ago, with the goal of improving the way programmers are hired. Too many companies run interviews the way they always have, with resumes, whiteboards and gut calls. We described our initial ideas about how to do better than this in our manifesto. Well, a little over a month has now passed. In the last 30 days, we've done 300 interviews. We've started to put our ideas into practice, to see what works and what doesn't, and to iterate on our process. In this post, I'm going to talk about what we've learned from the first 300 interviews.

I go into a lot of detail in this post. The key findings are:

1. Performance on our online programming quiz is a strong predictor of programming interview success
2. Fizzbuzz-style coding problems are less predictive of ability to do well in a programming interview
3. Interviews where candidates talk about a past programming project are also not very predictive

## Process

Our process has four steps:

1. Online technical screen.
2. 15-minute phone call discussing a technical project.
3. 45-minute screen share interview where the candidate writes code.
4. 2-hour screen share where they do a larger coding project.

Candidates work on their own computers, using their own dev environments and strongest languages. In both of the longer interviews, they pick the problem or project to work on from a short list. We're looking to find strengths, so the idea is that most candidates should be able to pick something they're comfortable with. We keep the list of options short, however, to help standardize evaluation. We want to have a lot of data on each problem.

We're looking for programming process and understanding, not leaps of insight. To that end, we offer help with the design and algorithm of each problem (and don't penalize candidates for taking it). We evaluate interviews with a score card. For now we go a little overboard, tracking the time to reach a number of milestones in each problem. We also score on understanding, whether candidates speak specifically or generally, whether they seem nervous, and a bunch of other things (basically everything we can think of). Most of these, no doubt, are horrible measures of performance. We record them now so that we can figure out which are good measures later.

## Screening

The first experiment we ran was screening people without looking at resumes. Most job applicants are rejected at the screening stage. The sad truth is that a high percentage of the people applying for any job post on the Internet are bad. To protect the time of their interviewers, companies need a way to filter people early, at the mouth of the hiring funnel. Resumes are the traditional way to do this. However, as Aline Lerner has shown, resumes don't work. Good programmers can't be reliably distinguished from bad ones by looking at their resumes. This is a problem. What the industry needs is a way to screen candidates by looking at their actual ability, not where they went to school or worked in the past[1]. To this end, we tested two screening steps:

1. A fizzbuzz-like programming assignment. Applicants completed two simple problems. We tracked the time to complete each, and manually graded each on correctness and code quality.
2. An automated quiz. The questions on the quiz were multiple choice, but involved understanding actual code (e.g., look at a function, and select which of several bugs is present).
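
For readers who haven't met the term, "fizzbuzz" refers to a classic warm-up exercise. The sketch below shows the textbook version only; it is not one of the two problems we actually gave applicants, and the function name is just for illustration.

```python
def fizzbuzz(n):
    """Return the classic FizzBuzz sequence for 1..n as a list of strings."""
    lines = []
    for i in range(1, n + 1):
        if i % 15 == 0:          # divisible by both 3 and 5
            lines.append("FizzBuzz")
        elif i % 3 == 0:
            lines.append("Fizz")
        elif i % 5 == 0:
            lines.append("Buzz")
        else:
            lines.append(str(i))
    return lines


if __name__ == "__main__":
    print("\n".join(fizzbuzz(15)))
```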

We then correlated the results of these two steps with success in our subsequent 45 minute technical interview. The following graph shows the correlations after 300 interviews.

We can see that the quiz is a strong predictor of success in our interviews! Almost a quarter of interview performance (23%) can be explained by the score on the quiz. 15% can be explained by quiz completion time (faster is better). Speed and score are themselves only loosely correlated (being accurate means you're only slightly more likely to be fast). This means that they can be combined into what we're calling the composite score, which has the strongest correlation of all and explains 29% of interview performance![2]
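
To make "explained by" concrete: the percentages above are R^2 values, the square of the correlation coefficient R. Here is a minimal sketch of that calculation and of one plausible way to build a composite score. The formula for the composite is not spelled out in this post, so the z-score combination, the function names and the data below are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient R between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(quiz_scores, completion_minutes):
    """Hypothetical composite: reward accuracy, penalize slowness.

    Subtracting the time z-score encodes "faster is better".
    """
    return [a - t for a, t in zip(z_scores(quiz_scores),
                                  z_scores(completion_minutes))]


if __name__ == "__main__":
    # Made-up numbers, NOT Triplebyte data.
    quiz_scores = [4, 7, 5, 9, 6, 8, 3, 10]
    completion_minutes = [30, 18, 25, 20, 35, 15, 28, 22]
    interview_scores = [2, 4, 3, 5, 2, 5, 1, 4]

    predictors = {
        "quiz score": quiz_scores,
        "completion time": completion_minutes,
        "composite": composite(quiz_scores, completion_minutes),
    }
    for name, values in predictors.items():
        r = pearson_r(values, interview_scores)
        print(f"{name}: R = {r:+.2f}, R^2 = {r * r:.2f}")
```

Combining two predictors this way only helps when they are not strongly correlated with each other, which is why the loose relationship between speed and score matters.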

The fizzbuzz-style coding problems, however, did not perform as well. While the confidence intervals are large, the current data shows less correlation with interview results. I was surprised by this. Intuitively, asking people to actually program feels like the better test of ability, especially because our interviews (the measure we're using to evaluate screening effectiveness) are heavily focused on coding. However, the data shows otherwise. The coding problems were also harder for people to finish. We saw twice the drop-off rate on the coding problems as we saw on the quiz.
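
To give a feel for why the confidence intervals are large, here is a sketch of a 95% interval for a correlation coefficient using the standard Fisher z-transformation. This is a textbook method and not necessarily the one behind our graph. The sample sizes are hypothetical, and R = 0.48 is simply the approximate square root of the 23% figure quoted above.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; z_crit=1.96 gives roughly 95%.
    Requires n > 3.
    """
    z = math.atanh(r)                 # transform R to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on that scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    r = 0.48                          # roughly sqrt(0.23)
    for n in (30, 100, 300):          # hypothetical sample sizes
        low, high = correlation_ci(r, n)
        print(f"n = {n}: R = {r:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval narrows as the sample grows, which is why more interviews should tighten these comparisons.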

## Talking versus coding

Before launching, we spoke to a number of smart people with experience in technical hiring to collect ideas for interviewing. The one I liked the most was having candidates talk us through a technical project, including looking at source code. This seemed like it’d be the least adversarial, most candidate-friendly approach.

As soon as we started doing them, however, I saw a problem. Almost everyone was passing. Our filter was not filtering. We tried extending the duration of the interviews to probe deeper and looking at code over Google Hangouts. Still, the pass rate remained too high.

The problem was we weren’t getting enough signal from talking about projects to confidently fail people. So we started following up with interviews where we asked people to write code. Suddenly, a significant percentage of the people who had spoken well about impressive-sounding projects failed, in some cases spectacularly, when given relatively simple programming tasks. Conversely, people who spoke about very trivial-sounding projects (or communicated so poorly we had little idea what they had worked on) were among the best at actual programming.

In total we did 90 experience interviews, scoring across several factors (did the person seem smart, did they understand their project well, were they confident, and was the project impressive). Then we correlated our factors with performance in the 45 minute programming interview. Confidence had essentially zero correlation. Impressiveness, smartness and understanding each had about a 20% correlation. In other words, experience interviews underperformed our automated quiz in predicting success at coding.

Now, talking about past experience in more depth may be meaningful. This is how (I think) I know which of my friends are great programmers. But, we found, 45 minutes is not enough time to make talking about coding a reasonable analog for actually coding.

## Interview duration, and interviewer sentiment

A final test we ran was to look at when during the interview we make decisions. Laszlo Bock, VP of People at Google, has written much about how interviewers often make decisions in the first few minutes of an interview, and spend the rest of the time backing up this decision. I wanted to make sure this was not true for us. To test this, we added a pop-up to our interviewing software, asking us every five minutes during each interview if the candidate is performing well, or poorly. Looking at these sentiments in aggregate, we can tell exactly when during each interview we made the decision.

We found that in 50% of our 45-minute interviews, we decide (become positive for someone who ends up passing, or negative for someone who does not pass) in the first 20 minutes. In 20%, however, we do not settle on our final sentiment until the last 5 minutes. In the 2-hour interview, the results are similar. We decide in 60% of them within the first 20 minutes (both positively and negatively), but in 10% the decision comes almost at the 2-hour mark. (In those cases, unfortunately, it's positives turning to negatives, because we can't afford to send people we're unsure about to companies.)[3]
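
One way to turn the five-minute sentiment samples into a "decision time" is sketched below. It assumes that the last sample matches the final pass/fail outcome and that the decision is made at the first sample after which the sentiment never changes again. The function name and data layout are hypothetical, not taken from our interviewing software.

```python
def decision_minute(samples, interval_minutes=5):
    """Return the minute at which the interviewer's sentiment settled.

    samples: list of booleans (True = performing well), one per pop-up,
             where sample i was recorded at (i + 1) * interval_minutes.
    Assumes the last sample equals the final decision.
    """
    if not samples:
        raise ValueError("need at least one sentiment sample")
    final = samples[-1]
    settled_index = len(samples) - 1
    # Walk backwards while earlier samples still agree with the final sentiment.
    while settled_index > 0 and samples[settled_index - 1] == final:
        settled_index -= 1
    return (settled_index + 1) * interval_minutes


if __name__ == "__main__":
    # Made-up 45-minute interview: one wobble at minute 10, positive from minute 15 on.
    interview = [True, False, True, True, True, True, True, True, True]
    print(decision_minute(interview))
```

Aggregating this value across interviews gives the share of decisions made within any given time window.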

## Conclusion

It's been a crazy month. Guillaume, Harj and I have spent nearly all our time in interviews. Sometimes, at 10 PM on a Saturday, after a day of interviewing, I wonder why we started this company. But as I write this blog post, I remember. Hiring decisions are important, and too many companies are content to do what they've always done. In our first 30 days, we've come up with a replacement for resume screens, and shown that it works well. We've found that programming experience interviews (used at a bunch of companies) don't work particularly well. And we've written software to help us measure when and why we make decisions.

For now, we're evaluating all of our experiments against our final round interview decisions. This does create some danger of circular reasoning (perhaps we're just carefully describing our own biases). But we have to start somewhere, and basing our evaluations on how people write actual code seems like a good place. The really exciting point comes when we can re-run all this analysis, basing it on actual job performance, rather than interview results. Doing that is why we started this company.

Next, we want to experiment with giving candidates projects to do on their own time (I'm particularly interested in making this an option, to help with interview anxiety), and interviews where candidates are asked to work with an existing codebase. We're also adding harder questions to the quiz, to see if we can improve its effectiveness. We'd love to hear what you think about these ideas. Email us at founders@triplebyte.com.

Thanks to Emmett Shear, Greg Brockman and Robby Walker for reading drafts of this.

An earlier version of this post confused the correlation coefficient R with R^2, and overstated the correlations. Since this blog was posted, however, a new version of the quiz has increased the correlation of the composite score to 0.69 (0.47 R^2).

1. This is a complex issue. There are good arguments for allowing experienced programmers to skip screening steps, and not have to continually re-prove themselves. At some point, track record should be enough. However, this type of screening can also be done in very bad ways (e.g., only interviewing people who have worked at top companies or come from a few schools). Evaluating experience is something we plan to experiment with, but for now we're focusing on how to directly identify programming ability.

2. It’s worth noting the error bars (showing 95% confidence intervals). The true value for each of the correlations in the graph falls in the range shown with 95% confidence. The error bars are large because our sample is small. However, even comparing the bottom of our confidence interval to Aline Lerner’s results on resume screening (she found a correlation close to 0), shows our quiz is a far better first step in a hiring funnel than resumes are.

3. We're not perfect, and we certainly reject great people. I always like to mention this when talking about rejections. We know this (and think it's true of all interview processes). We're trying to get better.
