Big data analytics does not only mean running algorithms over high-dimensional data for weeks. It also means preparing the data so that it can be processed with more or less standard tools. A common pipeline that every data scientist should follow is reported … Continue reading
Category Archives: General
R code of some analytic tools I have been developing/using in real-life problems: https://github.com/worldofpiggy/R-code. Recent commits: running parallel code in R (Francesco Gadaleta); canonical correlation analysis (Francesco Gadaleta); Principal Component Analysis DIY (Francesco Gadaleta); metropolis hastings algo in R, … Continue reading
One paper that, in my opinion, will be more influential than the garbage constantly published in many paid journals is “Train faster, generalize better: Stability of stochastic gradient descent”, written by Moritz Hardt at Google together with Benjamin Recht and Yoram Singer. The authors published it on arXiv, … Continue reading
Many data scientists are familiar with the tools for analysing large or small datasets. Not so many know which one to use when. Here is a summary of the methods that are most appropriate in specific situations. Feel free to download … Continue reading
Extracting knowledge from large datasets with a large number of variables is always tricky. Dimensionality reduction helps in analyzing high-dimensional data while still retaining most of the information hidden behind the complexity. Here are some methods that you must try before further … Continue reading