B. A dataset has 1000 records and 50 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove all records that have missing values. About how many records would you expect to be removed? (20 points)
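One way to approach question B is sketched below. It assumes (this is an interpretation, not something the question states outright) that each of the 50 values in a record is missing independently with probability 0.05, so a record survives only if all 50 of its values are present. The variable names are illustrative.

```python
# Sketch for question B, assuming each value is missing independently
# with probability 0.05 (an assumption about "spread randomly").
n_records = 1000
n_variables = 50
p_missing = 0.05

# A record is kept only if none of its 50 values is missing.
p_complete = (1 - p_missing) ** n_variables

# Expected number of records with at least one missing value.
expected_removed = n_records * (1 - p_complete)

print(f"P(record complete) = {p_complete:.4f}")
print(f"Expected records removed = {expected_removed:.0f}")
```

Under that independence assumption, most records contain at least one missing value, which is why listwise deletion is costly for wide datasets.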
C. Given a database table containing weather data as follows:
Outlook | Temperature | Humidity | Windy | Class: Play
Sunny | Hot | High | False | No
Sunny | Hot | High | True | No
Overcast | Hot | High | False | Yes
Rainy | Mild | High | False | Yes
Rainy | Cool | Normal | False | Yes
Rainy | Cool | Normal | True | No
Overcast | Cool | Normal | True | Yes
Sunny | Mild | High | False | No
Sunny | Cool | Normal | False | Yes
Rainy | Mild | Normal | False | Yes
Sunny | Mild | Normal | True | Yes
Overcast | Mild | High | True | Yes
Overcast | Hot | Normal | False | Yes
Rainy | Mild | High | True | No
Where Outlook, Temperature, Humidity, and Windy are the input variables (predictors), and Play is the output variable (response).
a. Compute the prior probabilities
P(Play = ‘Yes’) =
P(Play = ‘No’) =
b. Compute the conditional probabilities
P(Outlook = ‘Sunny’ | Play = ‘Yes’) =
P(Outlook = ‘Sunny’ | Play = ‘No’) =
P(Temperature = ‘Mild’ | Play = ‘Yes’) =
P(Temperature = ‘Mild’ | Play = ‘No’) =
P(Humidity = ‘High’ | Play = ‘Yes’) =
P(Humidity = ‘High’ | Play = ‘No’) =
P(Windy = ‘False’ | Play = ‘Yes’) =
P(Windy = ‘False’ | Play = ‘No’) =
c. Use the naïve Bayes classification method to classify the following unknown record and indicate whether to play or not.
(Outlook = ‘Sunny’, Temperature = ‘Mild’, Humidity = ‘High’, Windy = ‘False’)
(20 points)
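The counting behind parts a to c can be sketched in code. This is a minimal illustration, not a prescribed solution: it estimates probabilities as plain relative frequencies with no Laplace smoothing (an assumption, since the question does not mention smoothing), stores every value as a string, and uses hypothetical helper names (`prior`, `conditional`, `score`).

```python
from fractions import Fraction

COLUMNS = ("Outlook", "Temperature", "Humidity", "Windy")

# The 14 weather records from the table; the last field is the class (Play).
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def prior(label):
    """Relative frequency of a class label, P(Play = label)."""
    labels = [row[-1] for row in ROWS]
    return Fraction(labels.count(label), len(labels))


def conditional(column, value, label):
    """Relative frequency P(column = value | Play = label), unsmoothed."""
    idx = COLUMNS.index(column)
    in_class = [row for row in ROWS if row[-1] == label]
    matches = sum(1 for row in in_class if row[idx] == value)
    return Fraction(matches, len(in_class))


def score(record, label):
    """Unnormalized naive Bayes score: prior times product of conditionals."""
    result = prior(label)
    for column, value in record.items():
        result *= conditional(column, value, label)
    return result


record = {"Outlook": "Sunny", "Temperature": "Mild",
          "Humidity": "High", "Windy": "False"}
scores = {label: score(record, label) for label in ("Yes", "No")}
prediction = max(scores, key=scores.get)

for label, value in scores.items():
    print(f"score(Play = {label}) = {value} ~ {float(value):.4f}")
print("Predicted class:", prediction)
```

Using `Fraction` keeps the intermediate values exact, so they can be compared directly with hand-computed fractions. The scores are not normalized; dividing each by their sum gives posterior probabilities, but the comparison between classes is unchanged.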
D. Association Rule Mining: (20 points)
Given a transaction database for mining association rule as follows:
Database D
TID | Items
100 | A C D
200 | B C E
300 | A B C E
400 | B E
Please use the Apriori algorithm to mine association rules with a minimum support count of 2.
(Please show the derivation process step by step with candidate itemsets.)
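The level-wise candidate generation can be sketched as follows. This is an illustrative implementation of the frequent-itemset phase only, with hypothetical names (`support`, `apriori`); it stops before rule generation because the question gives no minimum confidence threshold. Each pass prints the candidate itemsets with their support counts and then the itemsets that survive.

```python
from itertools import combinations

# Database D from the question: transaction ID -> set of items.
TRANSACTIONS = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MIN_SUPPORT = 2  # minimum support count


def support(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)


def label(itemset):
    """Compact display form, e.g. frozenset({'B', 'E'}) -> 'BE'."""
    return "".join(sorted(itemset))


def apriori():
    """Return all frequent itemsets mapped to their support counts."""
    all_items = sorted(set().union(*TRANSACTIONS.values()))
    candidates = [frozenset([item]) for item in all_items]
    frequent = {}
    k = 1
    while candidates:
        # Scan the database to count each candidate k-itemset.
        counts = {c: support(c) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}
        print(f"C{k}:", {label(c): n for c, n in sorted(counts.items(), key=lambda x: label(x[0]))})
        print(f"L{k}:", {label(c): n for c, n in sorted(level.items(), key=lambda x: label(x[0]))})
        frequent.update(level)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori()
```

From the frequent itemsets, a rule X → Y would be scored by confidence = support(X ∪ Y) / support(X); which rules to keep depends on a confidence threshold that would have to be chosen separately.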