In this article we will extract data (the youtubedata.csv file) from Google Cloud Storage, transform it with Apache Beam, and load the results into BigQuery, working in Eclipse. Along the way we will find the top 5 categories by number of videos uploaded, the top 10 rated videos, and the most viewed video.

For the sake of simplicity we will stick to a single extraction source.
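
To make the first of those questions concrete, here is a minimal sketch in plain Java (deliberately not Apache Beam, so it runs with only the JDK) of the counting logic behind "top N categories by number of videos". The column layout of youtubedata.csv is not described here, so the category index used below is an assumption, and the class and method names are hypothetical. A Beam pipeline would express the same idea as a per-key count followed by a top-N step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Apache Beam or of the article's project.
class TopCategoriesSketch {

    // ASSUMPTION: the category is the 4th comma-separated field (index 3).
    // Adjust this to match the real layout of youtubedata.csv.
    static final int CATEGORY_COLUMN = 3;

    // Counts rows per category and returns the n most frequent categories,
    // highest count first; ties are broken alphabetically by category name.
    static List<Map.Entry<String, Long>> topCategories(List<String> rows, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String row : rows) {
            // Naive split: does not handle quoted fields that contain commas.
            String[] fields = row.split(",", -1);
            if (fields.length <= CATEGORY_COLUMN) {
                continue; // skip malformed rows
            }
            String category = fields[CATEGORY_COLUMN].trim();
            if (category.isEmpty()) {
                continue; // skip rows with no category
            }
            counts.merge(category, 1L, Long::sum);
        }

        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> rows = List.of(
                "v1,alice,100,Music,10",
                "v2,bob,200,Comedy,20",
                "v3,carol,300,Music,30");
        for (Map.Entry<String, Long> e : topCategories(rows, 5)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The same shape (group, aggregate, rank) applies to the other two questions: only the grouping key and the ranking value change.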


  • Some Java knowledge
  • Eclipse IDE and the Cloud Tools for Eclipse plugin: follow [1] to set them up
  • GCP project: to get a free-tier GCP account with $300 in free credits, visit [2] → click on the button…

In this tutorial we will look at one way to write and test a Hadoop program with Maven in IntelliJ, without configuring a Hadoop environment on your own machine or using a cluster. This is not a word-count MapReduce tutorial; a basic understanding of MapReduce is assumed.

  • SDK
  • IntelliJ
  • Linux or Mac OS

Click Create New Project, choose Maven, then click Next.

