Problem 4: Statistical Description of Multivariate Data for a Real-World Dataset [40 points]
To complete this task, use the file crx.data, which contains data collected from credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. The dataset was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php).
This dataset is interesting because it has a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values. Read the data into R using the following command.
data <- read.table("path/crx.data", sep = ",")
Here, replace path with the location of the file crx.data on your computer. After loading the data into R, you can access each column using data[ , 1], data[ , 2], … , data[ , 15]. You can view the data using the View(data) command. The data may be loaded in character format, so you have to convert the numeric columns from character to numeric using the as.numeric() function, as follows.
attribute1 <- as.numeric(data[ , 2])
Missing values are converted to NA, and R prints the warning "NAs introduced by coercion".
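The conversion step can be tried out on a small stand-in before running it on the real file. The sketch below is only an illustration: the three rows are made up, the name toy is hypothetical, and it assumes that missing entries are written as "?" in the file. The na.strings argument of read.table() is shown as an optional alternative and is not required by the assignment.

```r
# Toy stand-in for crx.data: three made-up rows.
# Assumption: a missing value is written as "?".
sample_text <- "b,30.83,0,u
a,?,4.46,u
b,24.50,0.5,y"

toy <- read.table(text = sample_text, sep = ",", stringsAsFactors = FALSE)

# Column 2 contains "?", so it is read as character rather than numeric.
class(toy[ , 2])

# as.numeric() turns "?" into NA; suppressWarnings() hides the coercion warning.
attribute1 <- suppressWarnings(as.numeric(toy[ , 2]))

# Count the missing values and compute a statistic that skips them.
n_missing <- sum(is.na(attribute1))
mean_value <- mean(attribute1, na.rm = TRUE)

# Alternative: tell read.table() that "?" means missing, so the
# column is read as numeric directly.
toy_na <- read.table(text = sample_text, sep = ",", na.strings = "?")
class(toy_na[ , 2])
```

Most summary functions such as mean(), var(), and quantile() accept na.rm = TRUE, which is one simple way to handle the NA entries when describing a column.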
There are 16 columns in the data: the first 15 columns are the attributes and the 16th column is the label. You only have to analyze the attributes.
*Do not forget to label the axes of the plots.
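As a reminder of how axis labels are set in base R, here is a minimal sketch. The values are made up for illustration and the label text is only a placeholder; choose labels that describe the attribute you are plotting.

```r
# Made-up values standing in for one numeric attribute (NA = missing).
values <- c(30.83, 24.50, NA, 27.83, 20.17, 32.08)

# xlab, ylab, and main set the axis labels and the title.
# hist() drops NA values before binning.
h <- hist(values,
          xlab = "Attribute 2 (anonymized units)",
          ylab = "Frequency",
          main = "Histogram of attribute 2")
```

The same xlab, ylab, and main arguments work for plot() and boxplot().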