A guide to deploying and administering Hadoop Clusters like a smart administrator

Image by Free-Photos from Pixabay


A journey into the evolution of Big Data Compute Platforms like Hadoop and Spark. Sharing my perspective on where we were, where we are and where we are headed.

Image by Gerd Altmann from Pixabay

The age of on-premises clusters…


A guide to implementing effective Kafka cluster design strategies using partitioning and replication

Image by Tumisu from Pixabay


Learn how to track real-time gold prices using Apache Kafka and Pandas. Plot the latest prices on a bar chart.

Photo by Chris Liverani on Unsplash


Easily process data changes over time from your database to your Data Lake using Apache Hudi on Amazon EMR

Image by Gino Crescoli from Pixabay


Easily capture data changes over time from your database to your Data Lake using AWS Database Migration Service (DMS)

Image by Gino Crescoli from Pixabay


How we auto-scaled an Optical Character Recognition pipeline to convert thousands of PDF documents into text per day, using an event-driven microservices architecture built on Docker and Kubernetes

Image by mohamed Hassan from Pixabay


Easily create Spark ETL jobs using AWS Glue Studio — no Spark experience required

Image by Gerd Altmann from Pixabay


Performance comparison of well-known Big Data formats: CSV, JSON, Avro, Parquet and ORC

Photo by Mika Baumeister on Unsplash


Comparison of two well-known engines for Optical Character Recognition (OCR) and Natural Language Processing

Image by Felix Wolf from Pixabay

Manoj Kukreja

Big Data Engineering, Data Science, Data Lakes, Cloud Computing and IT security specialist.
