
JOSH WILLS


I ended up going to Duke University. The best part about Duke was that I got to take whatever math courses I wanted right away. My first course was graduate-level topology. That was interesting because I was taking it with the other math freshmen, who were really good mathematicians. It became apparent to me relatively quickly that while I was good, the other freshmen were on another level altogether, which was very humbling. I think everyone runs into this at some point in life, and I felt relatively lucky that I encountered it during my freshman year because it gave me time to recover.

Anyway, I stuck with math, and I thought I was going to become a math professor. But I was also interested in many little side things: I did philosophy, economics for a while, and then became interested in cognitive neuroscience. I was lucky enough to do a Research Experience for Undergraduates (REU) fellowship at Carnegie Mellon the summer after my sophomore year, modeling road and spatial navigation. That was my first introduction to real programming, in MATLAB, building large models to simulate brain function. That experience is what got me interested in programming in general.

Did this push you to start taking programming classes at Duke as well?

Yes. I took Duke’s introductory courses in computer science and I learned how to program in C++. I never really studied algorithms or operating systems or other things computer science majors study. In my professional career, I’ve discovered all of these huge and embarrassing gaps in my computer science knowledge, usually during job interviews. At the start of my senior year, I decided to put the academic career on hold and go get a real job. I was interviewing with some startups and accepted an offer, but it was rescinded as part of the whole dotcom implosion thing that was happening in late 2000/early 2001. I wasn’t alone here, and Duke’s recruiting office was really great in helping folks find jobs elsewhere. I ended up getting a job in IBM’s Austin office. My first day was June 17th, 2001, and the week after I started, IBM announced a hiring freeze, so I suppose I slid in just under the wire.

IBM Austin had a hardware group that did chip design and system bring-up, which is where you hack early-stage hardware to get around all of the bugs so that you can load and run an operating system. I was managing a MySQL database of test data for microprocessors. All in all, it was 15 gigabytes of data, which at the time seemed enormous but now seems laughable: my phone has more storage than that whole volume of test data! I was building dashboards and running statistical analyses of machine performance and chip performance, trying to predict how fast a chip would clock based on a number of measurements that were made during wafer fabrication. It was classic statistics, classic data analysis, and just learning how to program. To be honest, it was pretty dull, and I got bored with it fairly quickly. I also have this masochistic approach to achievement, and so sometimes I like to do things just to prove that I can do them, regardless of whether it’s actually a good idea. So in that vein, I applied for and got into the Operations Research (OR) graduate program at the University of Texas at Austin (UT). UT didn’t have a statistics department, which is what I actually wanted to study, and OR was as close as I could get without having to leave Austin, which was just a really great place to be at the time.

As an undergraduate, I didn’t take any statistics courses at all until my very last semester, which was really my blow-off semester. That was when I took music appreciation, introduction to logic (oddly enough, a philosophy course), and introduction to statistics. Intro stats was actually a requirement to graduate, but I felt like it was beneath me after all of the abstract algebra and hyperbolic partial differential equations. And the funny thing is that I completely fell in love with it. A lot of the philosophy and neuroscience stuff I was into involved epistemology and symbolic reasoning: understanding how we can say that we know something to be true.

And statistics is about quantifying uncertainty and what we can’t know.

Precisely! It is the quantification of what is knowable and what is not. Here is your data, what can you say that you know? It was deeply appealing to me. Personally, that kind of stuff really winds my clock. I loved statistics. Fast-forward a couple of years, and now I’m at UT and taking a full graduate course load in OR. I did three courses a semester for two years to get my Master’s degree, while simultaneously working at IBM. That was a terrible idea. It was absolutely horrible. I had no life.

It sounds like you learned how to do the relatively simple statistical analysis at IBM and thought, “I want to expand my intellectual horizon.”

Very much so. My IBM introductory software engineering job was pretty easy, and I wrote a bunch of crazy Perl scripts that more-or-less automated my job. But I had this kind of residual itch from my statistics class and from seeing that statistics was actually pretty useful to people in the real world. My mental model at the time was that if you wanted to learn more about something, school was a pretty good place to do it, and so I went back to school.

A semester into my graduate program, I made another switch: I changed teams at IBM to be able to do some “real” programming, not just dashboards and Perl scripts. I switched to a team that did very low-level firmware programming in C++. This was basically writing firmware for hardware systems that didn’t fully work yet because not all of the circuits had been debugged. I was working as part of a team and learning things like using source control and writing tests, all of those good practices that I never learned in school. More than anything, though, the most useful skill I learned was how to debug black box systems. I was trying to run firmware on a piece of hardware that didn’t really work yet, and my job was to figure out a way to make that software run by hacking around whatever bugs I came across in the hardware itself. I didn’t know anything about hardware. I still don’t know anything about hardware. I can’t even program a VCR. I think that I became a software engineer because I can’t understand any system that I didn’t design myself. Anyway, the black box system here was a piece of hardware that didn’t work. I would give it an input, and it would not give me an output. I had to figure out a hack, some sequence of commands, that would cause this piece of hardware to begin communicating with the rest of the system. And this skill, the art of debugging something that you don’t understand at all, is maybe the most useful thing I learned there.

What did you end up learning through this experience of debugging black box systems?

I don’t think there’s any secret to it: I’m obsessive. I was one of those kids that played with Legos for five or six hours straight. I’m still pretty much like that. I was born in 1979, so I’m borderline millennial. It is unacceptable to me for a computer system to not do what I want it to do. I was willing to beat on the black box hardware for whatever amount of time was required to make it do what I wanted.

I’ve had a few instances in my life where I have worked on a very satisfying problem. A satisfying problem is one where your technical skills are good, but the problem is just a little bit too hard for you. You’re trying to do something slightly more difficult than what you already know how to do, and that is a great, great feeling. I can lose myself in those kinds of problems. That’s typically when my personal relationships tend to fall apart, because I’m not really paying attention to anything else.

There was this trend for a while in data science job interviews to have candidates analyze real datasets during the interview. I’m a huge fan of this practice. I had one job interview where they gave me a problem and a dataset and two whole hours of quiet time to just sit and do data analysis. It was maybe the happiest two hours of my entire year. I should do more job interviews just so I can do that.


You had mentioned how, at one point in your college career, you were burned out from academia. One of the hallmarks of academia seems to be that once you’ve reached a certain point, you have the opportunity to spend all of your time obsessing over an open problem. Given that your personality type seems to fit that role, why wasn’t academia appealing to you anymore?

As a pseudo-millennial, I’m not just entitled, I’m impatient. I don’t think the requirements of academia were appealing anymore; there were large sets of things I would have to complete before I reached that point of being able to obsess over an open problem. Once you’re a graduate student, you work for a professor on that professor’s grant, doing largely what the grant says you’re supposed to do. Then, you do a post-doctorate for a couple of years and become an assistant professor. You go through that horror, and, after 10 years, you get tenure. It’s a really long time to wait before you can have that promise of obsessive problem solving fulfilled. Even then, I don’t feel the promise is fulfilled because you have to spend a lot of time working on grant proposals and managing your graduate students and postdocs.

Now I’m 35 years old. Time-wise, I may be roughly at that point in my career now. I have a really great job where I get to do what I want, whatever is interesting to me. But it’s also a be-careful-what-you-wish-for situation. The freedom to work on whatever you think is interesting is stressful because there’s no one else you can blame if you’re not working on the right thing or if you miss a technology shift that has a profound impact. Amr Awadallah (Cloudera’s CTO) wrote a blog post about what a chief technology officer does. He was comparing the CTO’s performance to the CFO’s performance. The CFO is not responsible for making the sales numbers every quarter, but if there is a big surprise miss, the CFO gets fired. Similarly, the CTO is not responsible for shipping products on time; that’s what the VP of Engineering is for. But if the CTO misses a major technology shift, he or she gets fired.

I have a CTO-kind of job right now. I am free in my job to think about analytics, the future of data science, what exactly is coming down the pike. If I miss something, I should be fired because that miss could have profoundly negative consequences for Cloudera. There’s tremendous pressure that comes with that freedom. Now that I get that, it’s slightly horrifying. I have a fair amount of anxiety about it.

Can you talk a little bit more about what happened in between IBM and Cloudera? How did you get to this point?


One of my professors worked with a local startup in Austin called Zilliant. I wanted a job focused on operations research, so my professor hired me to work as a data analyst there. At Zilliant, I went back to SAS and R and started doing data analysis and building models for things like market segmentation and price elasticity.

When you come from academia, you tend to think the world is more interesting than it actually is, or that a problem is more complex than it is. The reason that price optimization hasn’t really taken off as a software discipline is that the primary pricing problem for Fortune 500 companies is to sell things for more money than it costs to make them. If they don’t know how much it costs to make things, they can’t know how much they should sell those things for to ensure that they make a profit. It’s not rocket science. You don’t need a data scientist to do that. You just need good reporting.

Why is it that companies don’t know this bit of crucial information?

It seems like a fundamental component, and yet many of them do not actually know. The problem is incentives. The person who is selling the deal, the salesman, is going to get a commission, and his or her income depends on that commission. They’re putting together a package of things that are going to be sold as part of the deal. There are going to be some materials and professional services; that’s just text in contracts. These contracts get read and approved, but no one necessarily understands how much it’s going to cost to fulfill them. There’s way too much variance. And people have a tendency to be very optimistic. They don’t think they’re going to have conflicts. They don’t think they’re going to have errors. They don’t think there are going to be hurricanes.

These aren’t trivial problems, but they’re also not the kind of problems that are amenable to the complicated data analysis techniques that you typically learn in graduate school. They’re very different kinds of problems.

They are simple problems. They’re simple but not easy. Losing weight is simple but not easy. Most industrial problems are simple but not easy.

So after Zilliant, did you make it your goal to attack the industry problems?

I like to be useful more than anything. I like to solve people’s problems. I like to be helpful. I’m a helpful person by nature. I enjoy abstractions. I enjoy art and weird stuff aesthetically, but I would rather have my day-to-day work be more focused on people’s problems and making their lives better. The beauty and the theory are never so appealing that they manage to draw me away from the real problems.

You worked at a bunch of different startups before Google. Were you solving different initial problems at these startups? What prompted the shift to Google?

It really took me forever to leave Austin. I could make a list of all of the bad financial decisions that I made because I was too afraid to leave Austin. I had a job offer from Google to be an engineering analyst, which I turned down in 2005. I turned down a data science job at Facebook in 2007. I try not to think about that one too much.

The thing that finally got me to San Francisco was auction theory. I was working on my PhD at UT and had taken some classes in game theory and mechanism design, and we covered auction theory. I absolutely loved it; it was beautiful math that could also be used to create socially optimal outcomes. I was really curious about how auctions worked in the real world, but there weren’t really any places in Austin where I could go design auctions for a living. I was fortunate that I had kept in touch with Diane Tang, who had tried to hire me at Google back in 2005 and was running Google’s ads quality team, which was responsible for the ad auction. She’s now Google’s first and only female Google Fellow, but at the time, she was just my friend who hired me to go to Google and work on auctions full time. She has been an amazing mentor to me, one of the most important people in my career.

What was it like on Google’s ad quality team? Was that a confluence of smart people who had studied auction theory as well and then implemented it in the real world?

I think the thing to know about Google is that smart software engineers with no specific expertise designed most of the core systems. Eric Veach, who had a PhD in computer graphics but no machine learning experience, designed Google’s original machine learning system. Eric was tasked with the problem, read a book, and came up with a wholly new solution.

I remember when I first got to Google and read about how that system worked. It was the most brilliant and unique solution to the world’s first truly large-scale machine learning problem. His original algorithm was really clever and I’ve never seen anything like it published anywhere, and I don’t think we ever will because, of course, Google has now gone on to even more advanced machine learning systems.

Eric was also the person who designed Google’s original auction algorithm. Again, Eric is a graphics guy, not an auction theorist. So he read a book about second-price auctions, and he came up with this very simple generalization called the GSP, the generalized second-price auction.
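The textbook GSP rule is simple enough to sketch in a few lines. This is a minimal sketch under stated assumptions, not Eric Veach’s original design or Google’s production auction: it assumes one bid per advertiser, no reserve price, and ranking purely by bid (real ad auctions, Google’s included, also weight bids by estimated ad quality, which is left out here). The function name `gsp_auction` and its inputs are hypothetical. Advertisers are ranked by bid, and the winner of each slot pays, per click, the bid of the advertiser ranked just below them.

```python
from typing import Dict, List, Tuple


def gsp_auction(bids: Dict[str, float], num_slots: int) -> List[Tuple[str, float]]:
    """Hypothetical sketch of a textbook generalized second-price (GSP) auction.

    Assumptions: ranking by raw bid only (no quality weighting), no reserve
    price. Returns (advertiser, per-click price) for each filled slot, top slot first.
    """
    # Highest bid first; sorted() is stable, so tied bids keep their input order.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(num_slots, len(ranked))):
        advertiser, _ = ranked[i]
        # Pay the next-highest bid; with nobody ranked below, pay 0 (no reserve assumed).
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((advertiser, price))
    return results


if __name__ == "__main__":
    bids = {"a": 4.0, "b": 3.0, "c": 1.5, "d": 1.0}
    print(gsp_auction(bids, num_slots=2))  # [('a', 3.0), ('b', 1.5)]
```

With a single slot the rule collapses to an ordinary second-price (Vickrey) auction: the top bidder wins and pays the runner-up’s bid. Extending that pricing rule down a ranked list of slots is the sense in which GSP generalizes it.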

I worked on a number of auction-related features and launches at Google. I really enjoyed it, but at the end of the day, the auction can only be as complicated as the understanding of the auction participants. Advertisers are wonderful, but they’re still just people, while the really interesting bidding strategies and auction models are so complicated and computationally intensive that they require serious software engineering chops just to participate in them. It wasn’t in Google’s interest to have an auction that was so complicated that no one besides auction theorists could appreciate it.

This seems to be emblematic of one of the differences between academia and industry. In academia you’re focused on getting the optimal solution. In the real world, you find that your implementation priority queue is dominated not only by optimality but also by feasibility and expedience. Was this shift hard for you to see and interact with?

I don’t think so. I was fairly lucky. Most of my graduate work in operations research was working on impossible problems. Operations research consists primarily of very hard problems where you cannot find the optimal answer. The job is to do the best you can, and I actually love those kinds of problems because the expectations are low. If the
