In this article we will extract data (the youtubedata.csv file) from Google Cloud Storage, transform it with Apache Beam, and load the results into BigQuery, working in Eclipse. Along the way we will find the top 5 categories with the most videos uploaded, the top 10 rated videos, and the most viewed video.
For the sake of simplicity we will stick to a single extraction source.
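To make the "top 5 categories" goal concrete before any Beam code appears, here is a minimal sketch of the same aggregation in plain Java collections. It is not the Beam pipeline itself. The class name `TopCategories`, the `CATEGORY_COLUMN` index, and the sample rows are assumptions made for illustration, since the actual column layout of youtubedata.csv is not shown here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopCategories {
    // Hypothetical position of the category field; adjust to the real file layout.
    static final int CATEGORY_COLUMN = 3;

    // Counts rows per category and returns the n categories with the most rows.
    static List<Map.Entry<String, Long>> topCategories(List<String> csvLines, int n) {
        Map<String, Long> counts = csvLines.stream()
            // Naive split: does not handle quoted fields that contain commas.
            .map(line -> line.split(",", -1))
            // Skip malformed rows that are too short to have a category.
            .filter(fields -> fields.length > CATEGORY_COLUMN)
            .map(fields -> fields[CATEGORY_COLUMN].trim())
            .filter(category -> !category.isEmpty())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
            // Highest count first; ties are broken alphabetically by category name.
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows, used only to exercise the method.
        List<String> sample = List.of(
            "id1,uploaderA,100,Music",
            "id2,uploaderB,200,Comedy",
            "id3,uploaderC,300,Music");
        topCategories(sample, 5)
            .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}
```

In a Beam pipeline the same two steps (count per key, then keep the largest n) would be expressed as transforms over a distributed collection rather than an in-memory list, but the logic being computed is the same.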
In this tutorial we will look at a way to write and test a Hadoop program with Maven in IntelliJ, without configuring a Hadoop environment on your own machine or using a cluster. This is not a word-count MapReduce tutorial; a basic understanding of MapReduce is assumed.
Click Create New Project, choose Maven, then click Next.