Sit in any Silicon Valley coffee shop and you’ll soon overhear someone talking about data analytics. What is it and who’s crunching those numbers? It’s the job of a data scientist to find patterns in large amounts of data and connect them to real-world decisions.
Image adapted from Nemo (Pixabay)
Looking for trends in large pools of data seems a simple enough concept but is somewhat more difficult in execution. To learn about what a data scientist does, we spoke with Dan Mallinger, who works with Think Big’s Data Science Practice and uses his academic background to consult on business and engineering decisions.
Tell us a little about yourself and your experience.
I’m a data scientist with degrees in mathematical sciences and organisational psychology; I also have significant academic training in computer science and sociology. I’ve spent my career in statistics, analytics and technology roles but almost entirely under business groups, which has framed much of my professional outlook. Today, I am the Director of Data Science for Think Big and have been with the company for four years.
What drove you to choose your career path?
I’ve had a curiosity about human organisation and a love of quantitative decision making for most of my life. In college, I was the student who wanted to apply Axelrod and Hamilton’s work to sociology, and social psychology to game theory courses. Working to help businesses become data driven seems to be a professional extension of that identity.
How did you go about getting your job? What kind of education and experience did you need?
Professionally, I moved through different roles doing statistics, research, and technology early in my career. I did work similar to what I do now, namely analysis of real-world data with open-source technologies, but before the term “data scientist” existed. From there, I got into big data in 2010. But it took me a year or so to really appreciate what Hadoop and similar tools could do for data science. After that I crossed paths with Rick Farnell, president of Think Big, who got me very excited to build a data science team in professional services because of the impact data science is making in the enterprise. While my statistics and technology experiences were critical, I think my education in social sciences and experience working with business teams were most critical to my role. They enable me to think through challenges, consider the agency behind the maths, and do so with an eye towards operationalisation within the organisation.
What kinds of things do you do beyond what normal people see? What do you actually spend the majority of your time doing?
Most people have heard of “data wrangling” and now know it is a significant part of executing data science. However, many folks are not aware of how cross-functional data science is and how much time is spent aligning business, analytics, and technology teams. Especially in the enterprise, where teams have multiple competing agendas, getting these groups to speak the same language and align priorities is a significant part of the job.
What misconceptions do people often have about your job?
The single biggest misconception in data science is that it’s all about “algorithms”. I constantly run into people and would-be data scientists who think our work is about deciding between a neural net and a support vector machine. In truth, data science begins by translating a business case into an analytics agenda. Much more time is spent developing hypotheses, understanding data, exploring patterns, and measuring impact than selecting algorithms.
What are your average work hours?
Data scientists are professionals and should expect a professional work week. Nowadays, that seems to be 60 hours per week.
What personal tips and shortcuts have made your job easier?
Two tips make life doing data science easier for our teams: First, keeping an internal blog where daily results (even with visuals) are quickly recorded. These aren’t formal write ups, but casual documentation of insights over time that support a common understanding of insights and data across data science, project managers, etc. It also supports other scientists looking at the same data in later months.
A second tip is to make a “runbook” after doing any modelling. This is documentation of what models are run, why they were developed, and how to repeat any analysis done. This ensures our work is repeatable, even by yourself. It’s easy to forget an analysis from three months ago when you are busy.
What do you do differently from your coworkers or peers in the same profession? What do they do instead?
I spend less time chasing new technologies than many of my peers. Instead, I focus on a core set with which I am familiar. Today, tools like Hive over Hadoop, R, and Python get me very far. I’ve watched teams lose countless cycles trying to do something the “new” way — spending more time trying to get new technology to work rather than innovating on approach. It’s a delicate balance, but I try to wait until I see a sensible application of new tools without waiting until I feel the pain of my existing tools falling short.
What’s the worst part of the job and how do you deal with it?
As a data scientist, the most frustrating thing is to build models or do work that does not become part of the ongoing processes of the organisation. While a certain amount of data science is R&D, we want our work to be meaningful and to be used by the organisation. The canonical example is the Netflix Prize, whose winning solution was never implemented as it was considered too costly (though it certainly has importance within the profession). To deal with this, we have checklists we cover before starting a project. These ensure that we understand the business case, that there are key performance indicators (KPIs) tied to the outcomes, and that there is a path to operationalisation so our work is integrated and lasting.
What’s the most enjoyable part of the job?
I love seeing customers become data driven: clients who now have models running, tools to support answering questions, and, critically, meaningful processes to carry them from data to KPI to decision making. That’s the real goal of data science and it’s beautiful to see it in action.
Do you have any advice for people who need to enlist your services?
One of the things that is rarely talked about is how high the attrition rate (folks leaving their job) is within data science. While some of that can be explained by a competitive market, I’ve long believed that much of the rate is due to companies hiring data scientists before they have a plan for how to use them, or expecting data scientists to solve business woes from a bubble. I frequently see data scientists in client organisations who sit in technology groups making models that never get used in a meaningful way. And I have seen these groups dissolve in their lack of mission.
You don’t hire a plumber to build you a house; you expect them to work with other professionals, to even be guided by the architects. Similarly, don’t land a data scientist and expect them to build you a business. Your job listing most likely asked for statistics and technology skills. Have a goal and a plan to join those skills with your business drivers before you even start hiring.
What kind of money can one expect to make at your job?
It certainly varies, but it is a well-paid role. Even first-year data scientists often make over $US80k. Seasoned data scientist salaries vary by where they sit in the organisation. Those in technical roles leading teams can certainly make more than double that. But the highest paid data scientists are those who have learned to work in business roles, similar to how analytics is typically structured in enterprises. Those can make up to $US400k.
How do you move up in your field?
There are multiple paths. Some data scientists sit under the technology organisation (more common for those in the big data space) and have a growth path similar to that of many engineers: promotion into team management. Others work under business (similar to how traditional analytics is structured in enterprises) and may grow to management, ownership of solutions and products, etc. I don’t know if we’ve seen many promotion paths from this new field into Chief Analytics Officers yet (at least in large companies) but I suspect they will come from the business side.
What do your clients under/over value?
They undervalue the importance of clearly defined and communicated KPIs. These measures of throughput, not output, are the most likely thing that data scientists will be able to measure and communicate with regard to model impact. In enterprises, the relationship between throughput and revenue is complex and slow to evaluate. Having clearly defined KPIs centres the communication between data science and business, creates clear missions and objectives, and is core to being data driven. It also helps data scientists to answer an often asked question: “When do I stop iterating a model?” When model performance is more than a percentage or an error rate, when it is a KPI, one can clearly identify success, or alternatively, when one is spinning one’s wheels.
What advice would you give to those aspiring to join your profession?
Spend as much time learning analytics communication as learning models. The popularity of machine learning has led to lots of data scientists who hunch over a computer analysing data but can’t communicate the results. I’ve seen data scientists attempt to explain results by trying to teach C-levels what a random forest is (with obvious fallout). Communicating analytics isn’t about teaching your CEO to be a data scientist; it’s about interpreting models and relating them to outcomes that matter. Sadly, even statistical methodologies related to this, like sensitivity and robustness analysis, have been forgotten as “the algorithm” reigns in many data science curricula.
Career Spotlight is an interview series on Lifehacker that focuses on regular people and the jobs you might not hear much about — from doctors to plumbers to aerospace engineers and everything in between.