September 27, 2016
by Francesco Gadaleta
Produced by: worldofpiggy.com
In this episode I want to point out the minimum required of a Data Scientist in terms of knowledge and technical skills. I will also explain why I think the job of the data scientist could disappear sooner than we think, and what data scientists should do in order to survive.
So you are a data scientist, right? Data scientists have the sexiest job in the US and Europe, and soon in Asia too, if not already. But how many real Data Scientists are out there? There are a lot of statisticians who rebranded themselves as Data Scientists, many applied mathematicians, and also a lot of bioinformaticians.
So what is the minimum required, or rather expected, of a Data Scientist in terms of knowledge and technical skills?
Very briefly, a data scientist should master concepts like clustering, decision trees, regression, neural networks, principal component analysis, singular value decomposition, naive Bayes classifiers, and so on. If you want to listen to episodes dedicated to a specific machine learning method, feel free to leave your request in the comments at worldofpiggy.com or on iTunes.
This knowledge can be automated very easily, which makes Data Scientist the most vulnerable job, not the sexiest at all. That’s why there are some skills a Data Scientist must have if they don’t want to become a useless resource in any company that makes use of predictive analytics.
Knowledge of algorithm complexity is mandatory. As data grows in size, a quadratic algorithm becomes too slow to be feasible, so linear or log-linear algorithms should be considered instead. This usually means keeping things simple: fancy algorithms just cannot be applied, but at least the problem can be approached.
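To make the point about complexity concrete, here is a minimal sketch in Python. The task (checking a list for duplicate values) and both function names are illustrative assumptions chosen for this example, not something taken from the episode.

```python
def has_duplicates_quadratic(values):
    """Compare every pair of items: about n*(n-1)/2 comparisons, i.e. O(n^2)."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_loglinear(values):
    """Sort once (O(n log n)), then scan neighbouring items (O(n))."""
    ordered = sorted(values)
    # After sorting, equal values sit next to each other.
    return any(a == b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    sample = [7, 3, 9, 3, 1]  # hypothetical data
    print(has_duplicates_quadratic(sample), has_duplicates_loglinear(sample))
```

Both functions give the same answer. With a million items, though, the first one may need on the order of 5 x 10^11 comparisons, while the second needs roughly 2 x 10^7 operations for the sort plus one linear pass.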
Programming skills in languages like Python and R, plus shell scripting for UNIX, are essential. This point is much debated: some say that data scientists need not be great developers, others say that they should. I’m going to explain what I think and, more importantly, why I think that data scientists should be quite advanced programmers and software engineers.
Analytics applied to small datasets is called statistics (essentially what we saw some years ago with cohorts of a few hundred observations and/or survival analysis). Real-time, streaming and big data analytics require more than pure statistics, and it is where things get big that optimization and elegance in coding really make the difference. Hence I think that coding is a great asset for the Data Scientist of the future.
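As a small illustration of what streaming analytics asks of the code, here is a sketch in Python of a running mean and variance that never holds the full dataset in memory. It uses Welford's online algorithm, which is my choice for this example and is not named in the episode. The class name RunningStats and the sample readings are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm.

    Uses constant memory: each observation is seen once and then discarded.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than two points."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)


if __name__ == "__main__":
    stats = RunningStats()
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical stream
        stats.update(reading)
    print(stats.count, stats.mean, stats.variance)
```

The same numbers could be computed in one line on a small dataset that fits in memory. The point of the sketch is that on an unbounded stream you cannot wait for the data to end, so the update has to be incremental.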
There is inflation in data science, just as there was in academia. Everybody has a PhD today, when a master's degree was more than enough 10 years ago. Data scientists will be expected to know more and more as things get automated. If you were great at random forest, now that random forest is applied automatically you should also offer data collection and data cleaning skills and, when all of these are automated too, infrastructure allocation skills. Whoever offers more wins the battle.
In a previous episode we mentioned that the future of data science will not be played out around deep learning or any other fancy technology, but around data collection. Knowing what to collect is extremely important, yet most self-proclaimed data scientists decide to collect as much as they can because, you know what, we can deal with big data. Well, no! Collecting the right data not only avoids allocating resources that might be useless, but also helps simple algorithms perform way better than the fancy stuff that few people know about.
I would like to launch a provocation here. Data Scientist is the sexiest job today, but not forever. Soon Data Scientists will be completely automated.
Would you like a tip?
Listen to this episode again and focus on the human aspect of the Data Scientist. This will help you keep your job for a while, at least until you decide to retire.