Description: The goal of this project is for students to demonstrate that they can follow the main steps of a machine learning project and develop a machine learning model. Each student must select a dataset, either from the provided links or from any other source. The final project covers the basic data science steps needed to solve a multiclass classification problem.

The dataset: the Excel file

Project Requirements: Run the K-Nearest Neighbors (KNN) model in Python to predict the class label from the measurements in the dataset.
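
To make the requirement concrete, here is a minimal from-scratch sketch of how a KNN prediction can be computed: measure the distance from a new row to every training row, keep the K closest rows, and take a majority vote over their labels. The function names and the toy data are hypothetical and chosen only for illustration. Euclidean distance is assumed here, and other distance metrics are possible.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Distance between two measurement vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(train_X, train_y, query, k):
    """Labels of the k training rows closest to the query row."""
    distances = [
        (euclidean_distance(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    return [label for _, label in distances[:k]]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest labels."""
    votes = Counter(nearest_neighbors(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two measurements per row, three classes.
train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [9.0, 1.0], [9.1, 1.2]]
train_y = ["A", "A", "B", "B", "C", "C"]

prediction = knn_predict(train_X, train_y, [5.1, 5.1], k=3)
print(prediction)
```

With a multiclass problem the vote can end in a tie. This sketch simply returns whichever tied label was encountered first, so a real project should state its tie-breaking rule. In practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used instead of hand-written code.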

1. Introduction: Start with an introduction to your project. It should describe (1) the problem you want to solve and (2) the dataset: its size, the number of measurements, the types of the measurements, and the number of classes and their labels.
2. Load the data, then explore and visualize it to gain insights: generate graphs to discover whether there are relationships between measurements or any clustering.
3. Prepare the dataset: Apply whatever preprocessing your dataset needs, for example dimension reduction, removing outliers, handling text and categorical variables, cleaning the data, and/or data standardization (all of the variables used for the K-NN model must be on the same order of magnitude to produce accurate results).
4. Data partitioning: After preprocessing your dataset, split it into non-overlapping sets for the training and testing phases.
5. Different values of K: Choose three different values of K. Discuss your reasons for choosing the different values of K.
6. Training Phase: Run the model using the three values of K you chose in the previous step. Discuss the three main steps of the KNN algorithm: calculating the distances, finding the nearest neighbors, and making predictions.
7. Testing Phase: Compare the accuracy between the training phase and the testing phase. Discuss these results.
8. Evaluation Phase: Check the accuracy of each model's predictions (one model per value of K) by creating the confusion matrix and computing the recall and precision scores. Discuss the prediction results in terms of accuracy and misclassification error.
9. Present the best model: Choose the best model based on the results of the evaluation phase. Consider any improvements that could yield better results.
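
The partitioning, standardization, and evaluation steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names (`split_rows`, `fit_scaler`, and so on) are hypothetical, the split is a simple shuffled split rather than a stratified one, and the labels in the example are made up rather than taken from any real model.

```python
import random
from statistics import mean, pstdev


def split_rows(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into non-overlapping train/test sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def fit_scaler(train_rows):
    """Column means and standard deviations, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def standardize(rows, means, stds):
    """Rescale every column to zero mean and unit variance (training statistics)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] = number of rows of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of correct predictions; misclassification error is 1 - accuracy."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_scores(matrix):
    """List of (precision, recall) pairs, one per class, in matrix order."""
    scores = []
    for i in range(len(matrix)):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        scores.append((precision, recall))
    return scores


# Made-up labels for illustration only; real values come from your own model.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

matrix = confusion_matrix(y_true, y_pred, classes)
acc = accuracy(matrix)
scores = per_class_scores(matrix)
print(matrix)
print("accuracy:", acc, "misclassification error:", 1 - acc)
for label, (precision, recall) in zip(classes, scores):
    print(label, "precision:", precision, "recall:", recall)
```

Fitting the scaler on the training rows only, and then applying the same means and standard deviations to the test rows, keeps information from the test set out of the training phase. To compare three values of K, the same evaluation would be repeated once per value on both the training and the test predictions. scikit-learn offers ready-made equivalents (`train_test_split`, `StandardScaler`, `confusion_matrix`, `precision_score`, `recall_score`) that would normally be preferred over hand-written helpers.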