Below is a more in-depth chapter-by-chapter account of my book review “Data Scientist – The Definitive Guide to Becoming a Data Scientist by Dr Zacharias Voulgaris”.
In chapter 1 Voulgaris starts with an introduction to Big Data, because he links the profession of the Data Scientist very closely to the rise of “Big Data”. He sets out the four Vs of Big Data (Volume, Velocity, Variety and Veracity). We have written about these before: The Vs and Ss to conquer to make ‘Big’ data smart. Voulgaris argues that Data Science exists as a result of the new challenges that Big Data (and its Vs and Ss) presents for turning data into insights. Next he makes some fairly obvious points about the industries where big data plays a role (after all, data and big data have value to offer anywhere), then gives an interesting overview of the origin of the notion of Data Science. The term Data Science first appeared in the spotlight in 1996 at the “Data Science, Classification and Related Methods” conference in Kobe, Japan – there’s something you probably didn’t know! The ideas behind Data Science had been around for much longer, and the full timeline is explained in more detail in chapter 2. The timeline starts in 1962 with the leading statistician John Tukey and his paper The Future of Data Analysis. Incidentally, this historical overview is almost identical to a 2013 article in Forbes. It is not clear from the references who was inspired by whom (Voulgaris <-> Forbes?), but it certainly is an interesting summary of the origins of the term and the hype surrounding Data Science. What follows is a brief but significant introduction to Big Data terms such as MapReduce, Hadoop, text analysis, programming languages and alternative database structures, some of which are explained in more detail in later chapters.
Chapter 2 ends with a description of what I consider to be one of the most important characteristics of a Data Scientist: the mindset. Data Science requires a systematic approach, and the Data Scientist needs to combine imagination about the problem with a sufficient level of pragmatism. Put more simply: technical skills alone are not enough. The Data Scientist has to build an extremely clear picture of the problem, break it down into bite-sized sub-questions and resolve them one by one with the right balance between speed and accuracy. The ability to collaborate in a multidisciplinary team is therefore another crucial facet of a good Data Scientist. Chapter 4 breaks down the most important character traits of a Data Scientist in more detail: curiosity, willingness to experiment, creativity combined with working systematically and, finally, communication. With this point, I think Voulgaris gets to the essence of what a good Data Scientist is, and he mentions traits that are much trickier to train than the hard qualities and skills he describes in the rest of chapter 4 and in chapter 5. Voulgaris thereby makes a distinct addition to, for example, Conway’s Venn Diagram, which still mainly focuses on the “hard” knowledge of programming, methods and techniques and subject knowledge.
I skipped over chapter 3 above, but it does contain an interesting classification of Data Scientists into different types. Voulgaris makes this classification with the help of insights from a 2013 study by Harris, Murphy and Vaisman. This study, based on research involving approximately 250 international Data Scientists, reveals four varieties of Data Scientist:
And then Voulgaris adds one more type of his own:
Up to and including chapter 5, the book is absolutely fascinating for anyone interested in the field of Data Science. I won’t look as closely at some of the chapters following chapter 5. These chapters are particularly valuable for those about to become Data Scientists themselves:
The chapters above contain useful tips and references to other sources. They are not really a proper introduction to the tools and languages, as they only summarise them, but they do provide a good overview. The overview of MOOCs (Massive Open Online Courses – free/cheap online training courses) is comprehensive but, due to the rapid pace of developments in this field, it is already partially out of date. For example, Coursera once held the top position among MOOC providers but has since lost it, as Udemy, DataCamp, Codecademy and others have gained a much greater market share. The same is true of the summary of R packages. Voulgaris did include a list of good packages for machine learning, but the field is still developing at lightning speed and there are already new, better or more user-friendly packages available. This is unavoidable when writing a book, and the overview of packages still serves as a good starting point for new Data Scientists learning the ropes.
Chapter 12 is extremely useful for current data professionals, as this is where Voulgaris describes the transition to Data Scientist from the perspective of various roles (programmer/software developer, statistician/data miner, IT specialist/business intelligence specialist, new starters, and so on). Where are the greatest challenges? What strengths does each professional bring with their existing expertise?
Chapters 13 to 15 are about finding a job in the Data Science field: where should you look (13), how should you present yourself (14), and how do you work as a freelance Data Scientist (15)? I find much of this fairly obvious, but it may be worthwhile for a new starter to go through it. Finally, Voulgaris concludes with a number of real Data Science vacancies (chapter 18) and interviews with real Data Scientists (chapter 17). Both of these certainly add something extra for the reader. The interviews are particularly valuable, and emphasise Voulgaris’s main points relating to the Data Scientist’s mindset: the importance of answering the right question by first asking questions, and the major importance of creativity in the role of the Data Scientist, both now and in the future.
Do you want to know more about this subject? Please contact Jurriaan Nagelkerke using the details below