I recently attended a seminar by Ariel Rabkin, who received his PhD in Computer Science from UC Berkeley. His talk was on a programming model called MapReduce, which was created at Google and is used by big companies like Google and Facebook to store and process large quantities of data. These companies are known to process petabytes of data on a daily basis, and the question arises: how do you manage such large quantities of data without system malfunction? Well, this is where MapReduce comes into play.
There are two parts to MapReduce:
1. Map: First, a master node takes the large quantity of data and divides it into smaller sub-problems, which it hands out to worker nodes. The worker nodes process their pieces of the data and return the results to the master node.
2. Reduce: The master node then collects the solutions from the worker nodes and merges them into the single answer to the problem that was originally specified.
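To get a feel for the two steps above, here is a minimal sketch in Java of the classic word-count example. This is my own simplified, single-machine version, not the actual code from the seminar: the `map` step turns each line of text into (word, 1) pairs, and the `reduce` step merges the counts for each word. In a real cluster the map calls would run in parallel on many worker nodes.

```java
import java.util.*;

public class WordCount {
    // Map step: split one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: merge all the pairs into one count per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each string stands in for a chunk of data sent to one worker node.
        String[] chunks = {"the quick brown fox", "the lazy dog", "the fox"};
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String chunk : chunks) {
            allPairs.addAll(map(chunk)); // in a real cluster, done in parallel
        }
        Map<String, Integer> result = reduce(allPairs);
        System.out.println(result.get("the")); // prints 3
        System.out.println(result.get("fox")); // prints 2
    }
}
```

Even this toy version shows the idea: no single piece of code ever has to look at all the data at once, which is what makes the model scale to petabytes.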
I really enjoyed this seminar because he showed us how MapReduce was programmed in Java, and I am currently taking CSE 017, which deals with Programming and Data Structures. I was able to make the connection: large data can sometimes be complicated to handle, but with the right programming tools it can be done easily.