Topic – Titanic: Machine Learning from Disaster
Part 1 – Proposal and Sample cases
a) Submit a proposal (no more than TWO pages) that includes:
• a brief description of the problem/opportunity
• specific business objective(s) of your analysis
• a brief explanation of the predictive modeling task(s)
• potential dataset(s) that you plan to use and their sources
• approximate number of cases in your dataset
• approximate number of cases you plan to use for i) training and ii) validation
• potential target/response/dependent variable(s)
• potential predictor/explanatory/independent variables
• data mining techniques (e.g., decision tree, logistic regression, neural network) that you are considering for the analysis
• data mining software (e.g., SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, R) that you are considering for the analysis
Note: Your proposal should explicitly address each requirement listed above. Predictive modeling is required for the project. Do not submit a proposal that includes only descriptive and exploratory analysis.
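To estimate the training and validation case counts requested above, a quick partition sketch can help. The example below is illustrative only: the function name `partition`, the 70/30 split, and the fixed seed are assumptions rather than course requirements, and the case IDs 1 to 891 assume the labeled passengers in the Kaggle Titanic training file.

```python
import random


def partition(case_ids, train_fraction=0.7, seed=42):
    """Shuffle the case IDs and split them into training and validation lists."""
    ids = list(case_ids)
    # A fixed seed makes the partition reproducible between runs.
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Assumption: 891 labeled cases, numbered 1..891, and a 70/30 split.
train_ids, valid_ids = partition(range(1, 892))
print(len(train_ids), len(valid_ids))
```

With these assumed settings the split yields 624 training cases and 267 validation cases; a different `train_fraction` changes the counts to report in the proposal.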
b) Submit an Excel or CSV file containing a sample of 50 to 100 cases (with appropriate column headers) from your dataset.
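One possible way to produce the sample file is sketched below using only the Python standard library. The helper names `sample_cases` and `write_sample`, the default of 75 cases, and the seed are hypothetical choices for illustration; any tool that keeps the header row and draws 50 to 100 cases would serve equally well.

```python
import csv
import io
import random


def sample_cases(rows, n, seed=42):
    """Return n rows drawn without replacement, kept in their original order."""
    if not 50 <= n <= 100:
        raise ValueError("the assignment asks for 50 to 100 cases")
    if n > len(rows):
        raise ValueError("dataset has fewer rows than the requested sample")
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in chosen]


def write_sample(in_file, out_file, n=75, seed=42):
    """Copy the header row plus n randomly chosen data rows to out_file."""
    reader = csv.reader(in_file)
    header = next(reader)  # column headers must appear in the sample file
    rows = list(reader)
    writer = csv.writer(out_file)
    writer.writerow(header)
    writer.writerows(sample_cases(rows, n, seed))


# Demonstration on synthetic stand-in data (not real passenger records).
demo_in = io.StringIO("id,label\n" + "".join(f"{i},{i % 2}\n" for i in range(1, 201)))
demo_out = io.StringIO()
write_sample(demo_in, demo_out)
print(len(demo_out.getvalue().splitlines()) - 1, "cases written")
```

For a real dataset, pass file objects opened with `newline=""` (for example, an input file you downloaded and a new output file of your choosing) in place of the in-memory buffers.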
If you plan to use a competition or dataset from Kaggle (or any other source) for your project, include the link (i.e., URL) to the competition/dataset. Repeating verbatim the text from the competition is plagiarism. Write the proposal in your own words.
Part 2 – Data (this is applicable only if you plan to use the on-demand version of Enterprise Miner)
To upload your project data set(s) to the SAS server, follow the instructions provided here:
Part 3 – Final Report
Submit a written report (12 pages excluding appendices) that includes the following:
• executive summary of the project
• business problem/opportunity (from the proposal)
• specific business objective(s) (from the proposal)
• process followed for selecting and gathering data
• discussion of preliminary data exploration and findings
• description of data preparation – repairs, replacements, reductions, partitions, derivations, transformations, and variable clustering
• description of data modeling/analyses and assessments
• explanation of model comparisons and model selection
• conclusions and recommendations (e.g., what did you learn from the analysis; did you meet your stated business objective(s); how can the results of your analysis address the business problem/opportunity; what further analyses that build on your work can be done in the future)
Relevant output from your analyses should be included in the Appendix and referenced in the body of your report.