Representative Subsets For Big Data Learning


Back to episodes

Listen now

May 03, 2016

by Francesco Gadaleta

Produced by: worldofpiggy.com

Support us

Did you like the show?
Please support us with a small donation. We will really appreciate!

How would you perform accurate classification on a very large dataset, by just looking at a sample of it?

In this episode I interview friend and colleague Rocco Langone, Machine Learning Researcher at the University of Leuven, Belgium. One of his recent papers is about big data and similarity metrics.

In this work Rocco proposes a deterministic method to obtain subsets from Big Data which are a good representative of the inherent structure in the data itself. This allows one to consider only a subset of the entire dataset, still performing at high accuracy if not better than traditional (eg. random) sampling.

As you can see, there is always a solution in Big Data. More details in this episode. Enjoy!

Show notes

Representative Subsets For Big Data Learning using k -NN Graphs

Flowchart</a> Algorithm Flowchart[/caption]

</a>

<img src=h”ttps://s3-eu-west-1.amazonaws.com/wopcontent/uploads/2016/05/mapreduce.jpg”> Distributed Map-Reduce graph construction