Recording: Introduction to Hadoop for Glolent community meetup

Last Thursday evening I had the opportunity to talk about Hadoop at a Glolent Global Talent community virtual meetup. Glolent connects remote IT workers across the globe and facilitates skill-sharing sessions that any member can join or present at. I rarely demonstrate Hadoop anymore, so I needed a couple of evenings to brush up on the fundamentals. The talk is a distilled product of that study.

I approached Hadoop’s architecture from a historical perspective. I started by introducing the root problem – the I/O bottleneck in processing Big Data – and positioned the Hadoop Distributed File System as the remedy. There was an obligatory intro to MapReduce, the original processing paradigm on Hadoop, with the classic word count example run on Shakespeare’s collected works. That was followed by a review of programming abstractions built on MapReduce and of alternative processing engines, with an emphasis on Spark. A 30-minute slot is very little time, so I had to cut out any mention of resource or cluster management. You can judge for yourself how it turned out – I shared the recording below.