**Instructions**

Perform your own calculations and show them in the answer document. Answers with insufficient calculation steps will earn reduced points.

__Imbalanced Classifiers__

1. Begin by writing the formula for each calculation, then show your steps to arrive at your answer.

a. Calculate Accuracy

b. Precision

c. Recall

d. F-Measure

2. Begin by writing the formula for each calculation, then show your steps to arrive at your answer.

a. Calculate Accuracy

b. Precision

c. Recall

d. F-Measure
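The four measures requested in questions 1 and 2 can be sketched in a few lines of Python. The confusion matrices for those questions are not reproduced here, so the counts below are hypothetical placeholders, and the function name `classification_metrics` is an illustrative choice. The sketch assumes the positive class is the class of interest and that no denominator is zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall, F-measure) for the positive class."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total          # correct predictions / all predictions
    precision = tp / (tp + fp)            # true positives / predicted positives
    recall = tp / (tp + fn)               # true positives / actual positives
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f_measure


# Hypothetical counts, NOT taken from the assignment's confusion matrices.
acc, prec, rec, f1 = classification_metrics(tp=40, fn=10, fp=20, tn=930)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F={f1:.4f}")
```

With these placeholder counts the accuracy is high (0.97) even though precision is only about 0.67, which is the typical pattern on imbalanced data.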

__Bayes Theorem__

3. (a) Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student?

Answer:

(b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student?

Answer:

(c) Repeat part (b) assuming that the student is a smoker.

Answer:

(d) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke.

Answer:
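The Bayes' theorem calculations in question 3 can be checked with exact fractions. This is a minimal sketch using only the percentages stated in parts (a) and (d). The variable and function names are illustrative, and the independence assumption in part (d) is interpreted here as smoking and dorm residence being independent given the student type.

```python
from fractions import Fraction

# Priors and class-conditional probabilities stated in the question.
p_grad = Fraction(1, 5)
p_ugrad = 1 - p_grad
p_smoke_grad, p_smoke_ugrad = Fraction(23, 100), Fraction(15, 100)
p_dorm_grad, p_dorm_ugrad = Fraction(30, 100), Fraction(10, 100)


def posterior_grad(likelihood_grad, likelihood_ugrad):
    """Bayes' theorem: P(grad | evidence) from the two likelihoods."""
    numerator = likelihood_grad * p_grad
    evidence = numerator + likelihood_ugrad * p_ugrad
    return numerator / evidence


# Parts (a) and (c): the evidence is "smokes".
p_grad_given_smoker = posterior_grad(p_smoke_grad, p_smoke_ugrad)

# Part (d): the evidence is "smokes and lives in a dorm"; the likelihoods
# multiply under the assumed conditional independence.
p_grad_given_smoker_dorm = posterior_grad(
    p_smoke_grad * p_dorm_grad, p_smoke_ugrad * p_dorm_ugrad
)

print(p_grad_given_smoker, float(p_grad_given_smoker))            # 23/83
print(p_grad_given_smoker_dorm, float(p_grad_given_smoker_dorm))  # 23/43
```

A posterior below 1/2 means the undergraduate hypothesis is more likely, and a posterior above 1/2 means the graduate hypothesis is more likely.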

__Bayes Theorem__

4. Consider the data set below.

(a) Estimate the conditional probabilities P(A|+), P(B|+), P(C|+), P(A|-), P(B|-), and P(C|-).

(b) Use the conditional probabilities estimated in part (a) to predict the class label for a test sample (A = 0, B = 1, C = 0) using the naïve Bayes approach.
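The naïve Bayes procedure in question 4 can be sketched as follows. The data set for this question is not reproduced here, so the records below are a small hypothetical data set made up purely for illustration. The helper names are also illustrative. Conditional probabilities are estimated as simple class-wise fractions, with no smoothing.

```python
from fractions import Fraction

# HYPOTHETICAL records (A, B, C, class); not the assignment's data set.
records = [
    (0, 0, 0, "+"), (0, 1, 0, "+"), (1, 1, 0, "+"), (1, 0, 1, "+"),
    (0, 0, 1, "-"), (1, 0, 1, "-"), (1, 1, 1, "-"), (1, 0, 0, "-"),
]


def conditional(attr_index, value, label):
    """Estimate P(attribute = value | class = label) as a fraction of counts."""
    in_class = [r for r in records if r[3] == label]
    matches = sum(1 for r in in_class if r[attr_index] == value)
    return Fraction(matches, len(in_class))


def naive_bayes_scores(sample):
    """Return P(class) * product of P(attribute | class) for each class."""
    scores = {}
    for label in ("+", "-"):
        prior = Fraction(sum(1 for r in records if r[3] == label), len(records))
        score = prior
        for attr_index, value in enumerate(sample):
            score *= conditional(attr_index, value, label)
        scores[label] = score
    return scores


scores = naive_bayes_scores((0, 1, 0))    # test sample A = 0, B = 1, C = 0
predicted = max(scores, key=scores.get)   # class with the larger score
print(scores, predicted)
```

The scores are proportional to the posteriors because both share the same denominator P(A, B, C), so comparing them is enough to choose the label.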

1. Consider a binary classification problem with the following set of attributes and attribute values:

• Air Conditioner = {Working, Broken}

• Engine = {Good, Bad}

• Mileage = {High, Medium, Low}

• Rust = {Yes, No}

Suppose a rule-based classifier produces the following rule set:

(a) Are the rules mutually exclusive?

Answer:

(b) Is the rule set exhaustive?

Answer:

(c) Is ordering needed for this set of rules?

Answer:

(d) Do you need a default class for the rule set?

Answer:
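Whether a rule set is mutually exclusive and exhaustive can be checked by enumerating every attribute combination and counting how many rules each one triggers. The rule set for this question is not reproduced here, so the three rules below are hypothetical stand-ins. Only the attribute domains come from the question.

```python
from itertools import product

# Attribute domains as listed in the question.
attributes = {
    "Air Conditioner": ["Working", "Broken"],
    "Engine": ["Good", "Bad"],
    "Mileage": ["High", "Medium", "Low"],
    "Rust": ["Yes", "No"],
}

# HYPOTHETICAL rule antecedents, for illustration only.
rules = [
    {"Mileage": "High"},
    {"Engine": "Bad"},
    {"Air Conditioner": "Working", "Engine": "Good"},
]


def covers(rule, record):
    """A rule covers a record when every condition in its antecedent matches."""
    return all(record[attr] == value for attr, value in rule.items())


names = list(attributes)
records = [dict(zip(names, values)) for values in product(*attributes.values())]

# Number of rules triggered by each possible record.
triggered = [sum(covers(rule, record) for rule in rules) for record in records]

mutually_exclusive = all(count <= 1 for count in triggered)  # never more than one
exhaustive = all(count >= 1 for count in triggered)          # never zero
print(mutually_exclusive, exhaustive)
```

A rule set that is not mutually exclusive needs an ordering (or a voting scheme) to resolve conflicts, and a rule set that is not exhaustive needs a default class for uncovered records.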

2. Consider a training set that contains 100 positive examples and 400 negative examples, and the following candidate rules:

R1: A → + (covers 4 positive and 1 negative examples)

R2: B → + (covers 30 positive and 10 negative examples)

R3: C → + (covers 100 positive and 90 negative examples)

**Note:** **The rules do not cover the entire training set. This is not an exhaustive rule set.**

a. Determine the best and the worst candidate rule according to rule accuracy.

Answer:
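Rule accuracy is the fraction of covered examples that belong to the predicted class, p / (p + n). A minimal sketch using the coverage counts stated for R1, R2, and R3 (the dictionary layout is an illustrative choice):

```python
# (positive, negative) examples covered by each candidate rule.
coverage = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

# Rule accuracy = covered positives / all covered examples.
accuracy = {name: p / (p + n) for name, (p, n) in coverage.items()}

best = max(accuracy, key=accuracy.get)
worst = min(accuracy, key=accuracy.get)

for name, value in accuracy.items():
    print(f"{name}: {value:.4f}")
print("best:", best, "worst:", worst)
```

Note that this measure ignores how many examples a rule covers, so a rule covering only five examples can still rank first.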

b. Determine which is the best and worst candidate rule according to FOIL’s information gain.

**Review of FOIL’s Information Gain**

**R0: {} => class (initial rule)**

**R1: {A} => class (rule after adding conjunct)**

**Gain(R0, R1) = t × [ log(p1 / (p1 + n1)) – log(p0 / (p0 + n0)) ]**

**where:**

**t: number of positive instances covered by both R0 and R1**

**p0: number of positive instances covered by R0**

**n0: number of negative instances covered by R0**

**p1: number of positive instances covered by R1**

**n1: number of negative instances covered by R1**

Answer:
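The gain formula from the review can be evaluated directly for the three candidate rules. This sketch assumes a base-2 logarithm (the review writes only "log"; a different base rescales every gain by the same factor and does not change the ranking). It also assumes the initial rule R0 has an empty antecedent and therefore covers all 100 positive and 400 negative examples, so t equals p1.

```python
import math

P0, N0 = 100, 400  # positives and negatives covered by the empty rule R0


def foil_gain(p1, n1, p0=P0, n0=N0):
    """FOIL's information gain for extending R0 to a rule covering (p1, n1)."""
    t = p1  # every positive covered by the new rule is also covered by R0
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# (positive, negative) examples covered by each candidate rule.
coverage = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}
gains = {name: foil_gain(p, n) for name, (p, n) in coverage.items()}

best = max(gains, key=gains.get)
worst = min(gains, key=gains.get)

for name, value in gains.items():
    print(f"{name}: {value:.2f}")
print("best:", best, "worst:", worst)
```

Because the gain is weighted by the number of positives covered, the ranking here is the reverse of the ranking by rule accuracy.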

3. Consider the one-dimensional data set shown below.

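Data set for Exercise 3.

| x | 0.5 | 3.0 | 4.5 | 4.6 | 4.9 | 5.2 | 5.3 | 5.5 | 7.0 | 9.5 |
|---|---|---|---|---|---|---|---|---|---|---|
| y – 1st | | | | | | | | | | |
| y – 2nd | | | | | | | | | | |
| y – 3rd | | | | | | | | | | |
| y – 4th | | | | | | | | | | |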

a. Place the indicated symbol (+ or –) into each cell for the purpose of classifying the data point x = 5.0 according to its 1-, 3-, 5-, and 9-nearest neighbors (using majority vote).

Answer:

| | Number of data points | Symbol to be used/inserted into y row |
|---|---|---|
| 1st Row | 1-nearest neighbor | + |
| 2nd Row | 3-nearest neighbor | – |
| 3rd Row | 5-nearest neighbor | + |
| 4th Row | 9-nearest neighbor | – |
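The majority-vote classification of x = 5.0 can be sketched as below. The class labels of the ten data points are not shown in the data set above, so the labels in `y_labels` are an assumption, chosen to be consistent with the answers listed above. The function name `knn_predict` is illustrative.

```python
from collections import Counter

x_values = [0.5, 3.0, 4.5, 4.6, 4.9, 5.2, 5.3, 5.5, 7.0, 9.5]
# ASSUMED class labels (not given in the data set above).
y_labels = ["-", "-", "+", "+", "+", "-", "-", "+", "-", "-"]


def knn_predict(query, k):
    """Classify `query` by majority vote among its k nearest points."""
    by_distance = sorted(
        zip(x_values, y_labels), key=lambda pair: abs(pair[0] - query)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


predictions = {k: knn_predict(5.0, k) for k in (1, 3, 5, 9)}
print(predictions)
```

Under the assumed labels, the points tied in distance at the k = 5 and k = 9 cut-offs share the same label, so the tie-breaking order does not affect the vote. With an odd k and two classes, the vote itself cannot tie.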