You are building an Azure Machine Learning workflow by using Azure Machine Learning Studio.
You create an Azure notebook that supports the Microsoft Cognitive Toolkit.
You need to ensure that the stochastic gradient descent (SGD) configuration maximizes the samples per second and supports parallel modeling that is managed by a parameter server.
Which SGD algorithm should you use?
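The scenario describes data-parallel training coordinated by a parameter server. A minimal sketch of the idea (plain NumPy, not CNTK code; workers, batch layout, and learning rate are all made up for illustration): a server holds the shared weights, each worker computes a gradient on its own minibatch and pushes an update, and asynchronous variants let workers push without waiting on each other, which is what drives samples-per-second up.

```python
import numpy as np

# Parameter-server-style SGD sketch (illustrative only, not CNTK code).
# The "server" is just the shared weight vector w; each simulated worker
# pulls w, computes a gradient on its own data shard, and pushes an update.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                      # noiseless synthetic regression data

w = np.zeros(3)                     # parameter-server state
lr = 0.1

def worker_gradient(w, batch_idx):
    """One worker's squared-loss gradient on its minibatch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return Xb.T @ (Xb @ w - yb) / len(batch_idx)

for step in range(200):
    # Round-robin over 4 simulated workers, each owning a shard of the data.
    for worker in range(4):
        idx = np.arange(worker, 200, 4)[(step % 5)::5]   # small minibatch
        w -= lr * worker_gradient(w, idx)                # push update to server

print(np.round(w, 2))
```

In a real asynchronous run the four workers would push concurrently rather than in turn; the sequential loop above only shows the update rule each one applies.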
You are analyzing taxi trips in New York City. You use Azure Data Factory to create data pipelines and orchestrate data movement.
You plan to develop a predictive model for 170 million rows (37 GB) of raw data in Apache Hive by using Microsoft R Server to identify which factors contribute to passenger tipping behavior.
All of the platforms that are used for the analysis are the same. Each worker node has eight processor cores and 28 GB of memory.
Which type of Azure HDInsight cluster should you use to produce results as quickly as possible?
You have an Azure Machine Learning experiment.
You discover that a model causes many errors in a production dataset. The model causes only a few errors in the training data.
What is the cause of the errors?
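Low training error combined with high error on new data is the signature of overfitting. A small sketch (synthetic data, NumPy only): fitting a high-degree polynomial through a handful of noisy training points reproduces the training set almost exactly but generalizes poorly to fresh points from the same underlying curve.

```python
import numpy as np

# Overfitting sketch: a degree-7 polynomial through 8 noisy samples of a
# sine curve interpolates the training data, then misses unseen points.

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=8)

coef = np.polyfit(x_train, y_train, deg=7)        # degree 7 through 8 points
train_err = np.mean((np.polyval(coef, x_train) - y_train) ** 2)

x_test = np.linspace(0.03, 0.97, 50)              # unseen inputs
y_test = np.sin(2 * np.pi * x_test)               # true values, no noise
test_err = np.mean((np.polyval(coef, x_test) - y_test) ** 2)

print(train_err, test_err)   # training error is near zero, test error larger
```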
You are building an Azure Machine Learning experiment.
You need to transform a string column that has 47 distinct values into a binary indicator column. The solution must use the One-vs-All Multiclass model.
Which module should you use?
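The transformation being asked about is one-hot (indicator) encoding: each distinct string value becomes its own 0/1 column. A pandas sketch of the same operation (the column and category names here are invented for illustration):

```python
import pandas as pd

# Indicator-value conversion: one 0/1 column per distinct category,
# the representation One-vs-All classifiers can consume directly.

df = pd.DataFrame({"borough": ["Bronx", "Queens", "Bronx", "Brooklyn"]})
indicators = pd.get_dummies(df["borough"], prefix="borough", dtype=int)
print(indicators)
```

A column with 47 distinct values would expand to 47 indicator columns in the same way.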
You are building a classification experiment in Azure Machine Learning.
You need to ensure that you can use the Evaluate Model module in the experiment.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
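Evaluate Model consumes a scored dataset: the true labels alongside the scored labels produced upstream by scoring a trained model. A toy sketch of the comparison it performs for a two-class experiment (pure Python, made-up labels):

```python
# Scored dataset sketch: true labels vs. scored (predicted) labels.
true_labels   = [1, 0, 1, 1, 0, 1, 0, 0]
scored_labels = [1, 0, 1, 0, 0, 1, 1, 0]

correct = sum(t == s for t, s in zip(true_labels, scored_labels))
accuracy = correct / len(true_labels)
print(accuracy)   # 6 of the 8 predictions match
```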
The manager of a call center reports that staffing the center is difficult because the number of calls is unpredictable.
You have historical data that contains information about the calls.
You need to build an Azure Machine Learning experiment to predict the number of total calls each hour.
Which model should you use?
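Predicting a numeric quantity such as calls per hour is a regression task. A minimal least-squares sketch on synthetic historical data (the hour-of-day feature and coefficients are invented for illustration):

```python
import numpy as np

# Regression sketch: fit call volume as a linear function of hour of day.
rng = np.random.default_rng(2)
hours = rng.integers(0, 24, size=100)
calls = 50 + 3 * hours + rng.normal(scale=2, size=100)   # synthetic history

A = np.column_stack([np.ones_like(hours, dtype=float), hours])
coef, *_ = np.linalg.lstsq(A, calls, rcond=None)

predicted_at_9am = coef[0] + coef[1] * 9
print(round(predicted_at_9am))
```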
You have an Apache Spark cluster in Azure HDInsight. The cluster includes 200 TB in five Apache Hive tables that have multiple foreign key relationships.
You have an Azure Machine Learning model that was built by using the Spark Accelerated Failure Time (AFT) survival regression model (spark.survreg).
You need to prepare the Hive data into a single table as input for the Machine Learning model. The Hive data must be prepared in the least amount of time possible.
What should you use to prepare the data?
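The preparation step is a join that denormalizes the related tables into one model-input table. At 200 TB this would run as a distributed Spark SQL or Hive query; pandas stands in here only to show the join logic, on toy tables whose columns are made up for illustration:

```python
import pandas as pd

# Join sketch: denormalize two related tables into a single input table,
# following a foreign-key column. Spark SQL would do this at cluster scale.

trips = pd.DataFrame({"trip_id": [1, 2, 3], "driver_id": [10, 10, 20]})
drivers = pd.DataFrame({"driver_id": [10, 20], "years_active": [4, 9]})

model_input = trips.merge(drivers, on="driver_id", how="inner")
print(model_input)
```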
You plan to use Azure Machine Learning to develop a predictive model. You plan to include an Execute Python Script module.
What capability does the module provide?
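In classic Azure Machine Learning Studio, the Execute Python Script module runs custom Python through an entry-point function named azureml_main, which receives up to two pandas DataFrames and returns a DataFrame inside a tuple. A sketch with an invented transformation and made-up column names:

```python
import pandas as pd

# Execute Python Script convention (classic Studio): the module calls
# azureml_main with up to two input DataFrames and expects a tuple back.

def azureml_main(dataframe1=None, dataframe2=None):
    # Example of custom logic not available as a built-in module:
    dataframe1 = dataframe1.copy()
    dataframe1["fare_per_mile"] = dataframe1["fare"] / dataframe1["miles"]
    return dataframe1,

# Local smoke test with toy data:
out, = azureml_main(pd.DataFrame({"fare": [10.0, 8.0], "miles": [2.0, 4.0]}))
print(out["fare_per_mile"].tolist())
```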
You are building an Azure Machine Learning experiment.
You are preparing the output of a Boosted Decision Tree Regression module. You add a Normalize Data module to the experiment.
You need to ensure that the range of the transformation method produces an output on a scale of -1 to 1.
Which transformation method should you use?
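The hyperbolic tangent is the standard function that maps any real input into the open interval (-1, 1), which matches the required output scale. A NumPy sketch of the mapping itself (illustrative only; any input scaling Azure ML applies before the tanh is not shown):

```python
import numpy as np

# tanh maps every real value into (-1, 1): large negatives approach -1,
# large positives approach 1, and zero stays at zero.

values = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
normalized = np.tanh(values)
print(np.round(normalized, 3))
```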