1.
You have an Azure HDInsight cluster.
You need to build a solution to ingest real-time streaming data into a nonrelational distributed database.
What should you use to build the solution?
2.
You have an Apache Hive table that contains one billion rows.
You plan to use queries that will filter the data by using the WHERE clause. The values of the columns will be known only while the data loads into a Hive table.
You need to decrease the query runtime.
What should you configure?
3.
You plan to copy data from Azure Blob storage to an Azure SQL database by using Azure Data Factory.
Which file formats can you use?
4.
You have an Apache Spark cluster in Azure HDInsight.
You plan to join a large table and a lookup table.
You need to minimize data transfers during the join operation.
What should you do?
5.
You have an Apache Spark cluster in Azure HDInsight.
You execute the following command.
What is the result of running the command?
6.
You use YARN to manage the resources for a Spark Thrift Server running on a Linux-based Apache Spark cluster in Azure HDInsight.
You discover that the cluster does not fully utilize the resources. You want to increase resource allocation.
You need to increase the number of executors and the allocation of memory to the Spark Thrift Server driver.
Which two parameters should you modify? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
7.
You are configuring the Hive views on an Azure HDInsight cluster that is configured to use Kerberos.
You plan to use the YARN logs to troubleshoot a query that runs against Apache Hadoop.
You need to view the method, the service, and the authenticated account used to run the query.
Which method call should you view in the YARN logs?
8.
You have an Azure HDInsight cluster.
You need to store data in a file format that maximizes compression and increases read performance.
Which type of file format should you use?
9.
You have an Apache Hadoop cluster in Azure HDInsight that has a head node and three data nodes. You have a MapReduce job.
You receive a notification that a data node failed.
You need to identify which component caused the failure.
Which tool should you use?
10.
You deploy Apache Kafka to an Azure HDInsight cluster.
You plan to load data into a topic that has a specific schema.
You need to load the data while maintaining the existing schema.
Which file format should you use to receive the data?