Assignment 2: Learning from a Data Set

[Machine Learning in Practice Course]


Overview

Your job includes:

1) Read the description of the task and download the data set

2) Implement an algorithm and submit the prediction

3) Write a report

4) Submit your report and code

The TA's job is to evaluate your work.

CAUTION: DO YOUR JOB ALONE!

Task Description and Dataset Download

The Story: Most financial companies provide loan services. To obtain a loan, a user first submits a loan application; the company then checks the user's eligibility and decides whether to grant the loan based on that application. The company wants to automate this eligibility check using the details the user provides in an online application form. This is a classification problem. You can start with data analysis, move on to data preprocessing, explore various feature engineering techniques, train different models to predict whether the loan will be paid off, and finally carry out model evaluation and selection.

Download:

Original data file

  • training set (train.csv): 346 records plus 1 header line. Details of the training set are shown in Figure 1.
  • testing set (test.csv): 54 records plus 1 header line
  • README.md

Figure 1. Training Set

     

    Task: Predict "loan_status" for each record in the testing set.
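Before choosing a model, it can help to look at how the target values are distributed in the training set. The following is a minimal sketch using only the Python standard library. It assumes that train.csv is a comma-separated file with a header row containing a "loan_status" column; the helper name label_counts is hypothetical and not part of the assignment materials.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column="loan_status"):
    """Count how often each value of label_column occurs.

    Assumes a comma-separated file whose first line is a header
    that contains label_column (an assumption about train.csv).
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter(row[label_column] for row in reader)
```

Calling label_counts("train.csv") returns a Counter that maps each label to its number of records. If the counts are far from balanced, that is worth discussing in your problem analysis.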

    Implementation and Output

    Implementation: You are free to implement any learning algorithm in any programming language. You are encouraged to analyze in detail what makes this task difficult, since this will help you choose an appropriate learning algorithm. Problem analysis and innovative ideas will earn you a higher score.

    Output: The output of your learning algorithm should be a text file named "yourId.txt" that contains 54 lines and no header, where each line is the prediction for the corresponding test example. Keep the predictions in the same order as the test examples; otherwise your score may be very low.
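One way to guard against a wrong line count or a stray header is to write the file through a small checking helper. The sketch below is only an illustration: the function name write_predictions is hypothetical, and it assumes that each prediction is written as a single label string in the order of the rows in test.csv. The exact label format is not specified here, so check README.md.

```python
def write_predictions(predictions, path, expected_count=54):
    """Write one prediction per line, without a header, in the given order."""
    lines = [str(p).strip() for p in predictions]
    if len(lines) != expected_count:
        raise ValueError(
            f"expected {expected_count} predictions, got {len(lines)}"
        )
    if any(line == "" or "\n" in line for line in lines):
        raise ValueError("each prediction must be a non-empty single-line value")
    # newline="\n" keeps line endings identical on every platform
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines) + "\n")
```

Pass the predictions in exactly the order in which the test records were read, for example write_predictions(preds, "12345678.txt") for a student whose ID is 12345678.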

    How to write and what to write

    Your report should include:

    1) Your understanding and analysis of the problem;

    2) The motivation for your algorithm and an introduction to its background;

    3) Full technical details of your algorithm, including in particular its pseudocode;

    4) A description or analysis of the performance you obtained;

    5) Conclusion and (optional) discussion

     

    CAUTION: DO NOT PLAGIARIZE! OTHERWISE, YOU WILL BE PENALIZED!

    Please use the MS Word template or the LaTeX template to write your report in Chinese with an English abstract. Note that you must convert your source file to PDF for submission.

    Name your PDF file "report.pdf".

     

    How to submit

    Your submission includes:

    1) 'yourId.txt' file: contains 54 lines of predictions (submit online [click here]);

    2) 'report.pdf' file: your report;

    3) source code [a Python notebook (.ipynb) is also acceptable]

    Please check your submission carefully.

    Note that "yourId" must be replaced with your ID, and the files must not be given any other names.

    Pack all your files into a single compressed file in ZIP format.

    Name the compressed file using your student ID, name, and version number, e.g., "12345678_name_v1.zip". We will take the file with the highest version number as your final submission, e.g., "12345678_name_v2.zip".
    Please delete all backup (.bak) files from your final ZIP file.
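If you prefer to build the archive from a script, the packaging rules above can be sketched with Python's standard zipfile module. The helper name pack_submission and the idea of keeping all submission files in a single folder are assumptions made for this illustration, not course requirements.

```python
import zipfile
from pathlib import Path


def pack_submission(source_dir, zip_path, skip_suffixes=(".bak",)):
    """Zip every file under source_dir except backup files.

    Returns the list of archived names so they can be checked by eye.
    """
    source_dir = Path(source_dir)
    zip_path = Path(zip_path)
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(source_dir.rglob("*")):
            if not file.is_file() or file.suffix.lower() in skip_suffixes:
                continue
            if file.resolve() == zip_path.resolve():
                continue  # never add the archive to itself
            name = file.relative_to(source_dir).as_posix()
            zf.write(file, arcname=name)
            added.append(name)
    return added
```

For example, pack_submission("submission", "12345678_name_v1.zip") archives the contents of a folder named "submission" and returns the names that were stored, which makes it easy to confirm that the prediction file, report.pdf, and the source code are all present.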

    Upload your ZIP file to 东大云盘 (the university cloud drive).

    Evaluate your work

    Evaluation of your prediction: We will score the predictions in your "yourId.txt" with the macro F1 score. For details on the macro F1 score, see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html.
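To make the metric concrete: macro F1 computes an F1 score for each class separately and then takes their unweighted mean, so every class counts equally regardless of how many records it has. The sketch below is a plain-Python illustration of that definition, not the grading script. The label values "yes" and "no" are placeholders, and a class with no true positives is given an F1 of 0.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels: three "yes" records and one "no" record.
y_true = ["yes", "yes", "yes", "no"]
always_yes = ["yes", "yes", "yes", "yes"]
print(round(macro_f1(y_true, always_yes), 4))  # prints 0.4286
```

In this toy example, always predicting the majority class is 75% accurate but scores only about 0.43 in macro F1, because the F1 for the "no" class is 0. When working with scikit-learn, the same quantity is available as f1_score(y_true, y_pred, average="macro"), as documented at the link above.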

    Evaluation of your report: Novel ideas, sound techniques, and clear writing earn high scores. See also "The evaluation of your report" in Assignment 1.

    Evaluation of your source code: Fake or plagiarized source code receives a low score.