Amazon AWS Certified Big Data Specialist Set 2

Questions: 15. Maximum marks: 15.

A travel website needs to present a graphical quantitative summary of its daily bookings to website visitors for marketing purposes. The website has millions of visitors per day, but wants to control costs by implementing the least-expensive solution for this visualization.
What is the most cost-effective solution?

A. Generate a static graph with a transient EMR cluster daily, and store it in Amazon S3.
B. Generate a graph using MicroStrategy backed by a transient EMR cluster.
C. Implement a Jupyter front-end provided by a continuously running EMR cluster leveraging spot instances for task nodes.
D. Implement a Zeppelin application that runs on a long-running EMR cluster.

An organization needs a data store to handle the following data types and access patterns:
Faceting
Search
Flexible schema (JSON) and fixed schema
Noise word elimination

Which data store should the organization choose?

A. Amazon Relational Database Service (RDS)
B. Amazon Redshift
C. Amazon DynamoDB
D. Amazon Elasticsearch Service
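
To make the listed access patterns concrete, here is a minimal pure-Python sketch of faceting (counting documents per attribute value) and noise word elimination (dropping stop words before indexing). The sample documents, the `NOISE_WORDS` set, and both function names are hypothetical; a search-oriented data store does this internally and at scale.

```python
from collections import Counter

# Hypothetical noise-word (stop-word) list; real engines ship larger ones.
NOISE_WORDS = {"a", "an", "the", "of", "and", "in"}

def index_terms(text):
    """Lowercase, split on whitespace, and drop noise words."""
    return [w for w in text.lower().split() if w not in NOISE_WORDS]

def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(d[field] for d in docs if field in d)

# Flexible schema: the third document has an extra "rating" attribute.
docs = [
    {"title": "The history of an island", "category": "travel"},
    {"title": "A guide to the Alps", "category": "travel"},
    {"title": "Cooking in the wild", "category": "food", "rating": 4},
]

terms = index_terms(docs[0]["title"])
facets = facet_counts(docs, "category")
```

Here `terms` holds only the meaningful words of the first title, and `facets` gives the per-category counts a search page would show next to its results.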

The items in the table contain several string attributes describing the transaction as well as a JSON attribute containing the shopping cart and other details corresponding to the transaction. Average item size is approximately 250 KB, most of which is associated with the JSON attribute. The average customer generates approximately 3 GB of data per month.
Customers access the table to display their transaction history and review transaction details as needed. Ninety percent of the queries against the table are executed when building the transaction history view, with the other 10% retrieving transaction details. The table is partitioned on CustomerID and sorted on transaction date.
The client has very high read capacity provisioned for the table and experiences very even utilization, but complains about the cost of Amazon DynamoDB compared to other NoSQL solutions. Which strategy will reduce the cost associated with the client’s read queries while not degrading quality?

A. Modify all database calls to use eventually consistent reads and advise customers that transaction history may be one second out-of-date.
B. Change the primary table to partition on TransactionID, create a GSI partitioned on customer and sorted on date, project small attributes into GSI, and then query GSI for summary data and the primary table for JSON details.
C. Vertically partition the table, store base attributes on the primary table, and create a foreign key reference to a secondary table containing the JSON data. Query the primary table for summary data and the secondary table for JSON details.
D. Create an LSI sorted on date, project the JSON attribute into the index, and then query the primary table for summary data and the LSI for JSON details.
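
The cost pressure in this scenario comes from item size. DynamoDB charges one read capacity unit (RCU) per strongly consistent read of up to 4 KB, and half that for an eventually consistent read. The sketch below works through the arithmetic for one item; the 1 KB size for the small summary attributes is an assumption, and the per-item accounting is a simplification (a Query rounds up the combined size of the items it reads).

```python
import math

RCU_BLOCK_KB = 4  # one RCU covers a strongly consistent read of up to 4 KB

def read_capacity_units(item_kb, strongly_consistent=True):
    """RCUs consumed by reading one item of `item_kb` kilobytes."""
    blocks = math.ceil(item_kb / RCU_BLOCK_KB)
    return blocks if strongly_consistent else blocks / 2

full_item = read_capacity_units(250)   # whole item, JSON attribute included
summary_only = read_capacity_units(1)  # assumed 1 KB of small attributes
full_item_eventual = read_capacity_units(250, strongly_consistent=False)
```

Under these assumptions a history view that reads whole items spends 63 RCUs per transaction, while one that reads only the small attributes spends 1, which is why the 90% of queries that build the history view dominate the bill.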

A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each unit sends its sensor information in JSON format every two seconds for processing and analysis. The company also needs to consume this data to predict possible equipment problems before they occur. A few thousand pre-purchased units will be delivered in the next couple of months. The company expects high market growth in the next year and needs to handle a massive amount of data and scale without interruption.
Which ingestion solution should the company use?

A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications for the end-consumer dashboard and anomaly detection workflows.
B. Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data downstream to the end-consumer dashboard and to the anomaly detection application.
C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3) as the destination. Consume the data with a KCL application for the end-consumer dashboard and anomaly detection.
D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-consumer dashboard and anomaly detection application on top of Amazon RDS.
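
A rough sizing exercise shows what "scale without interruption" means for a streaming option. A Kinesis stream shard accepts up to 1,000 records per second and 1 MB per second of writes. The unit counts and the 1 KB record size below are assumptions chosen for illustration; the scenario only says "a few thousand" units sending every two seconds.

```python
import math

# Per-shard write limits for an Amazon Kinesis stream.
MAX_RECORDS_PER_SEC = 1000
MAX_BYTES_PER_SEC = 1_000_000  # 1 MB/s, treated here as decimal megabytes

def shards_needed(units, interval_sec, record_bytes):
    """Shards required so neither the record nor the byte limit is exceeded."""
    records_per_sec = units / interval_sec
    by_count = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * record_bytes / MAX_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)

# Assumed fleet sizes: 5,000 units at launch, 500,000 after a year of growth.
launch = shards_needed(units=5_000, interval_sec=2, record_bytes=1_024)
growth = shards_needed(units=500_000, interval_sec=2, record_bytes=1_024)
```

With these numbers the stream grows from 3 shards to 256, a change that can be made by resharding while producers and consumers keep running.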

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into 5-minute chunks stored in Amazon S3. Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries always reference a single IP address. Data must be optimized for querying based on IP address using Hive running on Amazon EMR.
What is the most efficient method to query the data with Hive?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.
B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/year=yy/month=mm/day=dd/hour=hh/filename.
C. Store the data in an HBase table with the IP address as the row key.
D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.
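
The key=value naming scheme that appears among the options is the layout Hive uses for partitioned tables: when a table is partitioned on `source`, a query that filters on one IP address reads only the objects under that prefix. The sketch below builds such an object key; the function name, the sample IP address, and the file name are hypothetical, and the bucket name is left out.

```python
from datetime import datetime, timezone

def partitioned_key(ip_address, event_time, filename):
    """Build an S3 object key in key=value partition layout."""
    return (
        f"source={ip_address}/"
        f"year={event_time:%y}/month={event_time:%m}/"
        f"day={event_time:%d}/hour={event_time:%H}/{filename}"
    )

key = partitioned_key(
    "203.0.113.7",
    datetime(2017, 3, 9, 14, 5, tzinfo=timezone.utc),
    "events-0001.gz",
)
```

The resulting key is `source=203.0.113.7/year=17/month=03/day=09/hour=14/events-0001.gz`.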

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema. How should the company determine the most appropriate distribution key for the ORDERS table?

A. Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution.
B. Identify the largest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
C. Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
D. Identify the largest and the most frequently joined dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
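
The point of a distribution key is collocation: when two tables are distributed on the same join column, matching rows land on the same slice and the join needs no data movement. The toy simulation below illustrates that idea only. `slice_for` uses CRC32 as a stand-in (Redshift's actual hash is internal), and the table contents and slice count are made up.

```python
import zlib

def slice_for(key, num_slices):
    """Stand-in hash; Redshift's real distribution hash is internal."""
    return zlib.crc32(str(key).encode()) % num_slices

NUM_SLICES = 4
# Toy data: (order_id, customer_id) fact rows and a customer dimension.
orders = [(order_id, order_id % 7) for order_id in range(100)]
customers = list(range(7))

# Dimension table distributed on customer_id.
customer_slice = {c: slice_for(c, NUM_SLICES) for c in customers}

# ORDERS also distributed on customer_id: every matching row shares a slice.
colocated = sum(
    slice_for(cust, NUM_SLICES) == customer_slice[cust] for _, cust in orders
)

# ORDERS distributed on order_id instead: count rows that sit on another slice.
misplaced = sum(
    slice_for(oid, NUM_SLICES) != customer_slice[cust] for oid, cust in orders
)
```

`colocated` equals the number of order rows, whereas `misplaced` counts the rows that would have to be redistributed at query time if ORDERS were keyed on a column the dimension table does not share.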

An Amazon Kinesis stream needs to be encrypted.
Which approach should be used to accomplish this task?

A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the producer.
B. Use a partition key to segment the data by MD5 hash function, which makes it undecipherable while in transit.
C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the consumer.
D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while in transit.
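
It helps to separate routing from encryption here. Kinesis applies an MD5 hash to the partition key to decide which shard receives a record; the record's data blob is not transformed. The sketch below mimics that routing under the assumption that the shards split the 128-bit hash space evenly; the function name and sample record are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers

def shard_for(partition_key, num_shards):
    """Map a partition key to a shard, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_SPACE

record = {"partition_key": "sensor-42", "data": b"temperature=21.5"}
shard = shard_for(record["partition_key"], num_shards=4)

# Hashing the key selects a shard; the payload is still readable plaintext.
payload_after_routing = record["data"]
```

Because the payload is untouched by partitioning and sharding, any confidentiality has to come from encrypting the data itself.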

An online photo album app has a key design feature to support multiple screens (e.g., desktop, mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be saved in different resolutions and layouts. The image-processing Java program takes an average of five seconds per upload, depending on the image size and format. Each image upload captures the following image metadata: user, album, photo label, upload timestamp.

The app should support the following requirements:

Hundreds of user image uploads per second

Maximum image upload size of 10 MB

Maximum image metadata size of 1 KB

Image displayed in optimized resolution in all supported screens no later than one minute after image upload

Which strategy should be used to meet these requirements?

A. Write images and metadata to Amazon Kinesis. Use a Kinesis Client Library (KCL) application to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.
B. Write image and metadata to RDS with BLOB data type. Use AWS Data Pipeline to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.
C. Upload image with metadata to Amazon S3, use Lambda function to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.
D. Write image and metadata to Amazon Kinesis. Use Amazon Elastic MapReduce (EMR) with Spark Streaming to run image processing and save the image output to Amazon S3 and metadata to app repository DB.
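
The requirements translate into two numbers that any candidate design has to absorb: how many images are being processed at once, and how much data arrives per second at peak. The sketch below applies Little's law (work in progress = arrival rate x time in system). The rate of 300 uploads per second is an assumption standing in for "hundreds"; the 5 seconds and 10 MB come from the scenario.

```python
def concurrent_workers(uploads_per_sec, seconds_per_image):
    """Little's law: images in flight = arrival rate x processing time."""
    return uploads_per_sec * seconds_per_image

def peak_ingest_mb_per_sec(uploads_per_sec, max_upload_mb):
    """Worst-case inbound volume if every upload is the maximum size."""
    return uploads_per_sec * max_upload_mb

# Assumed arrival rate of 300 uploads/s.
workers = concurrent_workers(uploads_per_sec=300, seconds_per_image=5)
ingest = peak_ingest_mb_per_sec(uploads_per_sec=300, max_upload_mb=10)
```

At that rate roughly 1,500 images are in processing at any moment and up to 3,000 MB per second may arrive, so the chosen design needs highly parallel, per-object processing and storage suited to large binary objects.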

A data engineer is running a DWH on a 25-node Redshift cluster of a SaaS service. The data engineer needs to build a dashboard that will be used by customers. Five big customers represent 80% of usage, and there is a long tail of dozens of smaller customers. The data engineer has selected the dashboarding tool. How should the data engineer make sure that the larger customer workloads do NOT interfere with the smaller customer workloads?

A. Apply query filters based on customer-id that can NOT be changed by the user and apply distribution keys on customer-id.
B. Place the largest customers into a single user group with a dedicated query queue and place the rest of the customers into a different query queue.
C. Push aggregations into an RDS for Aurora instance. Connect the dashboard application to Aurora rather than Redshift for faster queries.
D. Route the largest customers to a dedicated Redshift cluster. Raise the concurrency of the multi-tenant Redshift cluster to accommodate the remaining customers.
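
For the option that separates customers into query queues, Redshift workload management (WLM) is configured as a JSON list of queues, each of which can be tied to user groups. The sketch below builds such a list in Python. The group names, concurrency values, and memory percentages are hypothetical, and the property names are intended to follow the `wlm_json_configuration` parameter; check them against the current Redshift documentation before use.

```python
import json

# Hypothetical user group names and percentages; tune for the real workload.
wlm_queues = [
    {"user_group": ["big_customers"], "query_concurrency": 5,
     "memory_percent_to_use": 50},
    {"user_group": ["small_customers"], "query_concurrency": 10,
     "memory_percent_to_use": 40},
    {"query_concurrency": 5},  # default queue: no group filter
]

# Serialized form, as it would be supplied in a cluster parameter group.
wlm_json = json.dumps(wlm_queues)

# Sanity check: explicit memory allocations must not exceed 100 percent.
total_memory = sum(q.get("memory_percent_to_use", 0) for q in wlm_queues)
```

Queries from users in each group are routed to that group's queue, so a burst from one group waits in its own queue rather than consuming the other group's slots.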

A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable. What is the appropriate model choice and target attribute combination for this problem?

A. Multi-class classification model with a categorical target attribute.
B. Regression model with a numeric target attribute.
C. Binary Classification with a categorical target attribute.
D. K-Nearest Neighbors model with a multi-class target attribute.
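
Amazon Machine Learning offers three model types: binary classification, multi-class classification, and regression. Which one applies depends on the target attribute. The heuristic below is a hypothetical illustration of that mapping, not part of any AWS API, and it deliberately treats every numeric target as a regression problem.

```python
def model_type_for(target_values):
    """Pick a model family from the kind of values in the target column."""
    distinct = set(target_values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )
    if numeric:
        return "regression"
    if len(distinct) == 2:
        return "binary classification"
    return "multi-class classification"

# Blood type is one of eight unordered labels.
blood_types = ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]
choice = model_type_for(blood_types)
```

Blood type has more than two possible values and no numeric ordering, so the heuristic lands on a multi-class model with a categorical target.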
