50 Important Things You Need to Know About Data Science
According to IBM, the world generates 2.5 quintillion bytes of data every day. A decent chunk of those quintillion bytes is made up of people asking the experts how to break into and excel in the dynamic, lucrative field of data science. An even larger chunk of those bytes consists of convoluted, contradictory answers to that question.
This is, on one hand, a great thing. Multiple prominent data science innovators are out there giving you free advice on your most pressing questions, backed up by years of their experience and training. Add to that the plethora of graduate degrees now available online, for everything from data science to counseling to an online EdD, and you’ve got a bona fide smorgasbord of educational options to jump-start your entry into one of the fastest-growing careers of the century.
However, as you’ve probably noticed, the task of sifting through the glut of information to extract what’s actually relevant to you can be a tedious, time-consuming, even discouraging challenge. That’s why we’ve interviewed the internet to develop this handy list of 50 quotes from some of the leading minds in the field. Whether you’re thinking about getting your Master’s in data science or are already an established data wrangler, this is your opportunity to eavesdrop on some of the greatest data science influencers in thought-provoking (albeit time-lapsed) conversation. Some offer advice; others, humor. But the similarities between them are eye-opening... as are the differences. So read all the way to the end for a list of essential takeaways!
Who is a data scientist?
1. “By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo―starry-eyed explorers and skeptical detectives.” ―Monica Rogati, Independent Data Science Advisor
2. “‘Possessed’ is probably the right word. I often tell people, ‘I don’t want to necessarily be a data scientist. You just kind of are a data scientist. You just can’t help but look at that data set and go, ‘I feel like I need to look deeper. I feel like that’s not the right fit.’” ―Jennifer Shin, Senior Principal Data Scientist at Nielsen; Lecturer at UC Berkeley
3. “I think of data science as more like a practice than a job. Think of the scientific method, where you have to have a problem statement, generate a hypothesis, collect data, analyze data and then communicate the results and take action…. If you just use the scientific method as a way to approach data-intensive projects, I think you’re more apt to be successful with your outcome.” ―Bob Hayes, Ph.D., Chief Research Officer at Appuri
4. “As a data scientist, I can predict what is likely to happen, but I cannot explain why it is going to happen. I can predict when someone is likely to attrite, or respond to a promotion, or to commit fraud, or pick the pink button over the blue button, but I cannot tell you why that’s going to happen. And I believe that the inability to explain why something is going to happen is why I struggle to call ‘data science’ a science.” ―Bill Schmarzo, Chief Technology Officer at Dell EMC
5. “Data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.” ―John Foreman, Vice President of Product Management at MailChimp
6. “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” ―Josh Wills, Director of Data Engineering at Slack
7. “As data scientists, our job is to extract signal from noise.” ―Daniel Tunkelang, Consultant / Advisor
8. “The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.” ―Hilary Mason, Founder, Fast Forward Labs
9. “A data scientist does model-driven analyses of our data; analyzes to improve our planning, increase our productivity, and develop our deeper levels of subject matter expertise. A data scientist works at the tactical, operational, and strategic levels, sharing insights with the business.” ―Chris Pehura, Practice Director, Management Consultant at C-SUITE DATA
10. “[Data scientists are] able to think of ways to use data to solve problems that otherwise would have been unsolved, or solved using only intuition.” ―Peter Skomoroch, Former Principal Data Scientist at LinkedIn
11. “What sort of personality makes for an effective data scientist? Definitely curiosity…. The biggest question in data science is ‘Why?’ Why is this happening? If you notice that there’s a pattern, ask, ‘Why?’ Is there something wrong with the data or is this an actual pattern going on? Can we conclude anything from this pattern? A natural curiosity will definitely give you a good foundation.” ―Carla Gentry, Data Scientist at Talent Analytics
12. “There is a saying, ‘A jack of all trades and a master of none.’ When it comes to being a data scientist you need to be a bit like this, but perhaps a better saying would be, ‘A jack of all trades and a master of some.’” ―Brendan Tierney, Principal Consultant at Oralytics
How do I become a data scientist?
13. “My number one piece of advice always is to follow your passions first. Know what you are good at and what you care about, and pursue that…. As a successful data scientist, your day can begin and end with you counting your blessings that you are living your dream by solving real-world problems with data.” ―Dr. Kirk Borne, Principal Data Scientist, Booz Allen Hamilton
14. “The No. 1 thing is you’ve got to have passion. This rich passion for going ruthlessly after the problem and being deeply intellectually honest with yourself about whether this is a reasonable answer…. The second part is having the ability to be extremely clever with the data. And what I mean by that is: You’re working with ambiguity. And very often you can’t approach the problem with the rigor you would a homework assignment. The only way to survive through that is by being clever―to think of a different question that gets at the answer.” ―DJ Patil, Former US Chief Data Scientist
15. “Nobody ever talks about motivation in learning. Data science is a broad and fuzzy field, which makes it hard to learn. Really hard. Without motivation, you'll end up stopping halfway through and believing you can't do it, when the fault isn't with you―it's with the teaching. Take control of your learning by tailoring it to what you want to do, not the other way around.” ―Vik Paruchuri, Founder, Dataquest
16. “I do not know how you teach someone to love to learn, but being self-motivated is integral to this field. Once you have the core concepts, to be able to be really excited about, and continue to seek out, new information is something that I look for, for example, when we are recruiting people.” ―Shelly D. Farnham, Ph.D., Executive Director & Research Scientist, Third Place Technologies
17. “You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don't forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of big data.” ―Gregory Piatetsky-Shapiro, Founder, KDNuggets
18. “The best way to become a data scientist is to do data science.” ―Daniel Tunkelang, Consultant / Advisor
19. “Learning how to do data science is like learning to ski. You have to do it.” ―Claudia Perlich, Chief Scientist, Dstillery
20. “Read as much as possible, and if you wish to become a Big Data Scientist, practice is vital.” ―Mark van Rijmenam, Founder, Datafloq
21. “As a data scientist, even if you don’t have the domain expertise you can learn it, and can work on any problem that can be quantitatively described.” ―Erin Shellman, Statistician and Data Scientist, Nordstrom Data Lab
22. “Data science’s learning curve is formidable. To a great degree, you will need a degree, or something substantially like it, to prove you’re committed to this career…. Classroom instruction is important, but a curriculum that is 100 percent devoted to reading books, taking tests and sitting through lectures is insufficient. Hands-on laboratory work is paramount for a truly well-rounded data scientist…. It should not degenerate into a program that produces analytics geeks with heads stuffed with theory but whose diplomas are only fit for hanging on the wall.” ―James Kobielus, Lead Analyst at SiliconANGLE Media, Inc.
23. “In my view, success for data science professionals relies on becoming trained and able data scientists with the ability to perform data processing and computation at a massive scale. To achieve this, professionals must invest time in ongoing education through institutions with multidisciplinary programs that include elements from engineering, mathematical sciences, and social sciences. Converting big data into meaningful information begins with skilled professionals who are educated in all disciplines to be both data scientists and statisticians.” ―Devavrat Shah, Professor at MIT’s Department of Electrical Engineering and Computer Science
24. “There is no bottleneck for data scientists…. The bottleneck is very often for companies who don’t have a culture of working with data to actually cut down the process into the right steps.” ―Lutz Finger, Director of Data Science at Snap
25. “Once you have a certain amount of math/stats and hacking skills, it is much better to acquire a grounding in one or more subjects than in adding yet another programming language to your hacking skills, or yet another machine learning algorithm to your math/stats portfolio…. Clients will rather work with some data scientist A who understands their specific field than with another data scientist B who first needs to learn the basics―even if B is better in math/stats/hacking.” ―Stephan Kolassa, Data Science Expert at SAP Switzerland AG
What makes a good data scientist?
26. “The thought process is the most important ingredient in data science.” ―Catalin Ciobanu, Senior Director Data & Analytics at Carlson Wagonlit Travel
27. “Being a data scientist is not only about data crunching. It’s about understanding the business challenge, creating some valuable actionable insights to the data, and communicating their findings to the business.” ―Jean-Paul Isson, Global VP Predictive Analytics & BI, Monster Worldwide Inc.
28. “Without a grounding in statistics, a Data Scientist is a Data Lab Assistant.” ―Martyn Jones, Managing Director at Cambriano Energy
29. “Having skills in statistics, math, and programming is certainly necessary to be a great analytic professional, but they are not sufficient to make a person a great analytic professional.” ―Bill Franks, Chief Analytics Officer at Teradata
30. “Talented data scientists leverage data that everybody sees; visionary data scientists leverage data that nobody sees.” ―Vincent Granville, Executive Data Scientist & Co-Founder at Data Science Central
31. “What makes a good scientist great is creativity with data, skepticism and good communication skills. Getting all of that together in the same person is difficult―because traditionally, different people follow different paths in their careers―some are more technical, others are more creative and communicative. A data scientist has to have both.” ―Monica Rogati, Independent Data Science Advisor
32. “Good data science is exactly the same as good science…. Good data science will never be measured by the terabytes in your Cassandra database, the number of EC2 nodes your jobs is using, or the volume of mappers you can send through a Hadoop instance. Having a lot of data does not license you to have a lot to say about it.” ―Drew Conway, Founder and CEO at Alluvium
33. “Critical thinking skills...really [set] apart the hackers from the true scientists, for me…. You must must MUST be able to question every step of your process and every number that you come up with.” ―Jake Porway, Founder and Executive Director of DataKind
34. “How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models.” ―Cathy O’Neil, Author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
35. “With too little data, you won't be able to make any conclusions that you trust. With loads of data you will find relationships that aren't real... Big data isn't about bits, it's about talent.” ―Douglas Merrill, Founder & CEO at ZestFinance
36. “Data analysts who don't organize their transformation pipelines often end up not being able to repeat their analyses, so the advice I would give to myself is the same advice often given to traditional scientists: make your experiments repeatable!” ―Mike Driscoll, Founder & CEO at Metamarkets
37. “Great data scientists never assume they know something without in-depth analysis, they think in hypotheses which need to be either rejected or proved, and they ask a lot of questions, even if they are 99.9% sure they know the answer.” ―Karolis Urbonas, Head of Business Intelligence, Amazon Devices at Amazon
38. “For me, data science is a mix of three things: quantitative analysis (for the rigor necessary to understand your data), programming (so that you can process your data and act on your insights), and storytelling (to help others understand what the data means).” ―Edwin Chen, Data Scientist and Blogger
39. “The difference between a great [data scientist] and a good one is like the difference between lightning and a lightning bug.” ―Thomas C. Redman, Ph.D., “The Data Doc” at Data Quality Solutions
How should I communicate data science findings?
40. “The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning…. Data-driven predictions can succeed―and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.” ―Nate Silver, Founder and Editor-in-Chief of FiveThirtyEight
41. “One of the big challenges of being a data scientist that people might not usually think about – is that the results or the insights you come up with have to make sense and be convincing. The more intelligible you can make them, the more likely it is that your recommendations will be put into effect.” ―Victor Hu, Head of Data Science at QBE Insurance
42. “A data scientist must possess the knack of being able to ‘identify business value from mathematical models.’ But that vital business value can only materialize if the data scientist also networks with other departments, understands their objectives, is familiar with their data and processes – and can spot the analysis options they provide.” ―Alexander Linden, VP of Data Science, Gartner
43. “People like simple explanations for complex phenomena. If you work as a data scientist, or if you are planning to become/hire one, you’ve probably seen storytelling listed as one of the key skills that data scientists should have. Unlike ‘real’ scientists that work in academia and have to explain their results mostly to peers who can handle technical complexities, data scientists in industry have to deal with non-technical stakeholders who want to understand how the models work. However, these stakeholders rarely have the time or patience to understand how things truly work. What they want is a simple hand-wavy explanation to make them feel as if they understand the matter―they want a story, not a technical report.” ―Yanir Seroussi, Independent Data Science Consultant & Entrepreneur
44. “You need to be able to take a dataset and discover and communicate what's interesting about it for your users. To turn this into a product requires understanding how to turn one-off analysis into something reliable enough to run day after day, even as the data evolves and grows, and as different users experience different aspects of it.” ―Amy Heineike, VP Technology, Stealth Startup
What is the value of data and data science?
45. “We’re entering a new world in which data may be more important than software.” ―Tim O’Reilly, Founder, Chairman & CEO at O’Reilly Media
46. “Every business is or will become a digital business. Data scientists are key players in this process. The tools that exist today exist because of the talent of data scientists.” ―Ronald van Loon, Director of Advertisement
47. “What does it mean to live in an era where things and people are infinitely observed? ... Just because you can’t measure it easily doesn’t mean it’s not important.” ―Jeffrey Hammerbacher, Founder and Chief Scientist at Cloudera
48. “Data is a precious thing and will last longer than the systems themselves.” ―Sir Timothy Berners-Lee, Inventor of the World Wide Web
What is the future of data science?
49. “We should expect a ‘Big Data 2.0’ phase to follow ‘Big Data 1.0’. Once firms have become capable of processing massive data in a flexible fashion, they should begin asking: ‘What can I do that I couldn’t do before, or do better than I could do before?’ This is likely to be the golden era of data science.” ―Foster Provost and Tom Fawcett, Co-authors of Data Science for Business
50. “As the field grows, keep an open mind and evolve with it. Work hard, think outside the box, and learn as much as you can about the technical side of being a data scientist. Be responsible with the data and realize the potential the data can have to solving problems. Always ask yourself how the data can be used to positively impact the lives around you, and use that to guide your design and development.” ―Dr. Shanji Xiong, Chief Scientist at Experian’s Global DataLab
Conclusions
Takeaways from this article:
Do data science because you love data. Weaker motivations won’t sustain you in the long run.
Learn data science by doing it. And then by doing it some more. The word “possessed” rings true here. Longer versions of these quotes reveal a common denominator in these influencers’ schedules: they spent, and continue to spend, a lot of time practicing their craft.
Don’t forget the importance of theory amid all that practice.
Don’t expect to learn data science overnight (or even over several nights). It’s a lifelong learning curve, especially as the field evolves so quickly.
Communicate. Learn how to tell a meaningful, understandable story with the insights you draw from your data. This is critical in just about any application of data science.
So, to sum up the experts, data science as a field is fast-paced, diverse, challenging, and at times, even grueling.
Some opinions expressed in this article may be those of a guest author and not necessarily those of Analytikus. Staff authors are listed at http://www.datasciencecentral.com/profiles/blogs/50-important-things-you-need-to-know-about-data-science