1.
You have a Microsoft SQL Server instance that has R Services (In-Database) installed. The server has a comma-separated values (CSV) file stored in the local file system.
For analytic purposes, you need to read the CSV file into a database table in the SQL Server instance.
You connect to the SQL Server instance by using SQL Server Management Studio.
What should you use with sp_execute_external_script?
2.
You plan to read data from an Oracle database table and to store the data in the file system for later processing by dplyrXdf. The size of the data is larger than the memory on the server to be used for modelling.
You need to ensure that the data can be processed by dplyrXdf in the least amount of time possible.
How should you transfer the data from the Oracle database?
3.
You are running a parallel function that uses the following R code segment. (Line numbers are included for reference only.)
01 cp <- 0.01; xval <- 0; maxdepth <- 5
02
03 (form, data = "segmentationDataBig", maxDepth = maxdepth, cp = cp, xval = xval, blocksPerRead = 250)
You need to complete the R code. The solution must support chunking.
Which function should you insert at line 02?
4.
You are using a one-class support vector machine (SVM).
You have a large dataset, but you do not have enough training time to fully test the model.
What is an alternative method to validate the model?
5.
You need to build a model that predicts the probability of an outcome. You must be able to balance between L1 and L2 regularization.
Which classification method should you use?
6.
You have a dataset.
You need to repeatedly and randomly split the dataset so that 80 percent of the data is used as a training set and the remaining 20 percent is used as a test set.
Which method should you use?
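The splitting procedure described in this scenario can be sketched in base R. This is a minimal illustration under stated assumptions: the built-in iris data frame stands in for "a dataset", and the seed and the number of repetitions (n_repeats = 5) are hypothetical choices. It shows only the repeated 80/20 split, not any model training.

```r
# Assumption: the built-in iris data frame (150 rows) stands in for the dataset.
data(iris)

set.seed(42)        # hypothetical seed, makes the random splits reproducible
n_repeats <- 5      # hypothetical number of repetitions
train_frac <- 0.8   # 80 percent training, 20 percent test

splits <- lapply(seq_len(n_repeats), function(i) {
  n <- nrow(iris)
  # Draw 80 percent of the row indices at random, without replacement.
  train_idx <- sample.int(n, size = floor(train_frac * n))
  # Rows not drawn for training form the 20 percent test set.
  list(train = iris[train_idx, ], test = iris[-train_idx, ])
})

# Row counts of the training and test sets for each repetition.
print(sapply(splits, function(s) c(train = nrow(s$train), test = nrow(s$test))))
```

Because each repetition draws a fresh random sample, a given row can land in the test set of several repetitions, or of none.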
7.
You need to build a decision forest model on a large dataset by using rxDForest. The model must use cross-validation.
Which rxDForest option should you use?
8.
You have the following regression forest.

Which variable contributes the most to the dependent variable?