Big Data Start: Batch In and Batch Out

Let’s say you have already built your big data platform. It could be the Hortonworks Sandbox, Hortonworks on Windows, the Cloudera sandbox, or HDInsight. Now, what is next?

A good first challenge is to push data into the platform (HDFS) and then fetch it back out. This is a good way to start understanding the ecosystem.

Start with getting data in. The easiest way is the Hadoop shell command “copyFromLocal”, which copies raw files from your local file system to HDFS (the Hadoop Distributed File System). The other option is “Sqoop”, which imports data from relational databases into HDFS.
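As a rough sketch of the first option, the snippet below builds the copyFromLocal command line in Python. The file name sales.csv and the HDFS directory /user/demo/raw/ are made-up placeholders, and the helper name is only for illustration; adjust them to your own cluster.

```python
def build_copy_from_local(local_path, hdfs_dir):
    """Return the argument list for `hadoop fs -copyFromLocal`."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_dir]


# Placeholder paths (assumptions, not from any real cluster).
cmd = build_copy_from_local("sales.csv", "/user/demo/raw/")

# Print the command so it can be reviewed or pasted into a terminal.
print(" ".join(cmd))
```

On a machine where the hadoop client is installed, the same list can be passed to subprocess.run(cmd, check=True) to actually perform the copy.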

Sqoop tutorial: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
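To give a feel for what a Sqoop import looks like, here is a minimal sketch that assembles a typical "sqoop import" command in Python. The JDBC URL, user name, table, and target directory are all hypothetical; the flags themselves (--connect, --username, -P, --table, --target-dir, --num-mappers) are described in the user guide linked above.

```python
def build_sqoop_import(jdbc_url, username, table, target_dir, mappers=1):
    """Return the argument list for a basic `sqoop import` of one table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL of the source database
        "--username", username,
        "-P",                           # prompt for the password at run time
        "--table", table,               # table to import
        "--target-dir", target_dir,     # HDFS directory for the output files
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


# Hypothetical connection details, for illustration only.
sqoop_cmd = build_sqoop_import(
    "jdbc:mysql://dbhost.example.com/salesdb",
    "demo",
    "orders",
    "/user/demo/orders",
)
print(" ".join(sqoop_cmd))
```

Using -P instead of putting the password on the command line keeps it out of your shell history, which is why the sketch prefers it.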

Now you have data in your big data system, specifically in HDFS. How do you present this data? Well, in this market, the only tool I know is Excel Power Query, a free add-in for Excel. Once you install the add-in, you have the option to connect to HDFS and fetch the data. For more details, please check out this post: http://joeydantoni.com/2013/08/15/power-query-and-hadoop-2/
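If you would rather check the "data out" step from a script, one possible route is Hadoop's WebHDFS REST API, which serves file contents over HTTP. The sketch below only builds the URL for reading a file. The host name and file path are placeholders, the helper name is invented for this example, and port 50070 is the usual NameNode HTTP port on Hadoop 2.x (Hadoop 3.x defaults to 9870), so treat all of them as assumptions about your setup.

```python
from urllib.parse import quote, urlencode


def webhdfs_open_url(host, hdfs_path, port=50070, user=None):
    """Build a WebHDFS URL that reads (op=OPEN) the file at hdfs_path."""
    params = {"op": "OPEN"}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a query parameter.
        params["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


# Placeholder host and path, for illustration only.
url = webhdfs_open_url("sandbox.example.com", "/user/demo/raw/sales.csv", user="demo")
print(url)
```

Fetching that URL with a browser, curl, or urllib.request.urlopen should return the file contents, provided WebHDFS is enabled on the cluster and the host is reachable.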

Stay tuned, I will be back with more big data discussions.

Published by

Derek Dai

Focusing on DB, Big Data and BI tech.
