Dataset: ICLR
In this assignment, we work with manuscripts and their reviews from ICLR (International Conference on Learning Representations), a top computer science conference on machine learning.
Each manuscript has 2 to 3 reviews. Each row in training.csv and test_contentonly.csv represents one review of a specific manuscript. The files contain the following columns:
The decision column is not included in test_contentonly.csv. Instead, it is provided in test_label.csv.
Grading policy
We will grade your code notebook (Python notebook or R Markdown file) on GitHub. Your code should clearly document the process you follow and the decisions you make. Also discuss your results where appropriate (see the problem descriptions below).
1. Supervised methods (60 pts)
Please use Python or R to do the assignment.
In this task, you need to predict whether a manuscript is accepted (1) or rejected (0), based on the review texts.
1.1 Dictionary method (20 pts)
Use the dictionary method to predict whether manuscripts in the test data were accepted or rejected.
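As a rough illustration of what a dictionary method could look like, the sketch below counts positive and negative words in each review, averages the scores over all reviews of a manuscript, and predicts acceptance when the average is above a threshold. The word lists, the threshold, the manuscript ids, and the toy reviews are all hypothetical and chosen only for illustration; they are not part of the assignment data, and a real dictionary would need to be chosen and justified.

```python
import re
from collections import defaultdict

# Hypothetical word lists, for illustration only.
POSITIVE = {"novel", "strong", "clear", "convincing", "solid", "interesting"}
NEGATIVE = {"weak", "unclear", "limited", "incremental", "lacks", "unconvincing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def review_score(text):
    """Return (# positive words) - (# negative words) for one review."""
    tokens = tokenize(text)
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos - neg


def predict_manuscripts(rows, threshold=0):
    """Predict 1 (accept) or 0 (reject) per manuscript.

    rows is an iterable of (manuscript_id, review_text) pairs, one per review.
    The scores of all reviews of a manuscript are averaged before thresholding.
    """
    scores = defaultdict(list)
    for manuscript_id, text in rows:
        scores[manuscript_id].append(review_score(text))
    return {
        manuscript_id: int(sum(vals) / len(vals) > threshold)
        for manuscript_id, vals in scores.items()
    }


# Toy reviews (made up); the real input would come from test_contentonly.csv.
rows = [
    ("paper_a", "A novel idea with strong and convincing experiments."),
    ("paper_a", "Clear writing, although the evaluation is limited."),
    ("paper_b", "The contribution is incremental and the motivation is unclear."),
    ("paper_b", "Weak baselines; the paper lacks ablations."),
]
predictions = predict_manuscripts(rows)
print(predictions)
```

Averaging over reviews is only one way to combine the 2 to 3 reviews of a manuscript; summing the scores or taking a majority vote over per-review predictions would be equally reasonable design choices to discuss.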
1.2 Supervised methods (20 pts)
Use a supervised learning method to predict whether manuscripts in the test data were accepted or rejected, using training.csv as the training data.
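One possible way to learn from labeled training text is a bag-of-words classifier. The sketch below implements a small multinomial naive Bayes model with add-one smoothing using only the standard library. The choice of naive Bayes, the function names, and the toy texts and labels are assumptions made for illustration; the assignment does not prescribe a particular model, and in practice a library implementation would normally be used. Each toy text stands in for the review text of one manuscript.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(texts, labels):
    """Count words per class (0 = rejected, 1 = accepted).

    Assumes both classes appear at least once in labels.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def log_odds(model, text):
    """Return log P(accepted | text) - log P(rejected | text), up to a shared constant."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return scores[1] - scores[0]


def predict(model, text):
    """Predict 1 (accepted) if the log-odds are positive, else 0 (rejected)."""
    return int(log_odds(model, text) > 0)


# Toy training data (made up); the real data would come from training.csv.
train_texts = [
    "novel method with strong results",
    "clear and convincing experiments",
    "incremental contribution and weak baselines",
    "unclear motivation and limited evaluation",
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_texts, train_labels)
test_texts = ["strong and novel experiments", "weak and unclear evaluation"]
test_predictions = [predict(model, text) for text in test_texts]
print(test_predictions)
```

Because the data has one row per review, the reviews of a manuscript would need to be combined before or after prediction, for example by concatenating their text or by aggregating per-review predictions; which option works better is worth documenting in the notebook.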
1.3 Evaluation (20 pts)