Cloudera Certified Developer for Apache Hadoop Sample Questions:
1. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A) Write a custom MapRunner that iterates over all key-value pairs in the entire file.
B) Increase the parameter that controls minimum split size in the job configuration.
C) Set the number of mappers equal to the number of input files you want to process.
D) Write a custom FileInputFormat and override the method isSplitable to always return false.
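For context on option D: in the classic org.apache.hadoop.mapred API, splitting is controlled by isSplitable in the input format. A minimal sketch, assuming that API (the class name is illustrative):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Illustrative class name. Extending TextInputFormat reuses its
    // line-oriented record reader; only the splitting behaviour changes.
    public class NonSplittableTextInputFormat extends TextInputFormat {

        // Returning false tells the framework never to split a file, so each
        // input file is processed by exactly one map task, no matter how many
        // HDFS blocks it spans.
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }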
2. You need to import a portion of a relational database every day as files to HDFS, and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
A) fuse-dfs
B) Hue
C) Oozie
D) Pig
E) Sqoop
F) Hive
G) Flume
3. You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?
A) Hue
B) Oozie
C) Hadoop Streaming
D) Pig
E) Sqoop
F) Hive
G) Flume
4. What is the difference between a failed task attempt and a killed task attempt?
A) A failed task attempt is a task attempt that threw a RuntimeException (i.e., the task fails). A killed task attempt is a task attempt that threw any other type of exception (e.g., IOException); the execution framework catches these exceptions and reports them as killed.
B) A failed task attempt is a task attempt that threw an unhandled exception. A killed task attempt is one that was terminated by the JobTracker.
C) A failed task attempt is a task attempt that did not generate any key-value pairs. A killed task attempt is a task attempt that threw an exception and was therefore killed by the execution framework.
D) A failed task attempt is a task attempt that completed, but with an unexpected status value. A killed task attempt is a duplicate copy of a task attempt that was started as part of speculative execution.
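For background on the killed-attempt case described above: killed attempts most commonly come from speculative execution, which can be toggled per job. A minimal sketch using the classic JobConf API (the class name is illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class SpeculativeExecutionSettings {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Speculative execution launches a duplicate attempt of a slow task;
            // whichever copy loses the race is terminated by the framework and
            // reported as "killed", so it does not count as a failure.
            conf.setMapSpeculativeExecution(true);
            conf.setReduceSpeculativeExecution(false);
            System.out.println("Map speculative execution enabled: "
                    + conf.getMapSpeculativeExecution());
        }
    }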
5. Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadf jhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfslkj
You want to send each line as one record to your Mapper. Which InputFormat would you use to complete the line: setInputFormat(________.class);
A) SequenceFileInputFormat
B) KeyValueTextInputFormat
C) BDBInputFormat
D) SequenceFileAsTextInputFormat
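For reference, a driver using KeyValueTextInputFormat with the classic mapred API might look like the following sketch (class, job, and path names are illustrative; the default identity mapper and reducer are assumed):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class LineRecordsJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(LineRecordsJob.class);
            conf.setJobName("line-records-example");

            // KeyValueTextInputFormat delivers one record per line, splitting
            // at the first tab: the line number becomes the key and the rest
            // of the line the value.
            conf.setInputFormat(KeyValueTextInputFormat.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }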
Solutions:
Question #1 Answer: D | Question #2 Answer: E | Question #3 Answer: F | Question #4 Answer: D | Question #5 Answer: B |