All the information is in the attached file; please read it carefully. I have also attached the ipynb notebook with the work I have done so far.
Tasks
Data Understanding: Conduct a thorough EDA to gain a better understanding of the given data and business objectives. This includes, but is not limited to: checking for and dealing with missing data and outliers, if any; the most popular items sold and their characteristics; recommendation and rating patterns across departments and product types; and the buying and reviewing behavior of different age groups. Carefully present your analysis and findings in your report.
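As a rough illustration of the EDA checks listed above, here is a minimal pandas sketch on a toy table. The column names (`Age`, `Department`, `Rating`, `Recommended`, `Review Text`), the age bins, and the data itself are assumptions made up for this example; the real columns should be taken from the provided files.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training data. All column names and values are
# hypothetical; replace them with the columns in the provided files.
df = pd.DataFrame({
    "Age": [25, 34, 47, 52, 61, 29, np.nan, 38],
    "Department": ["Tops", "Dresses", "Tops", "Bottoms",
                   "Dresses", "Tops", "Bottoms", "Dresses"],
    "Rating": [5, 4, 2, 5, 3, 1, 4, 5],
    "Recommended": [1, 1, 0, 1, 0, 0, 1, 1],
    "Review Text": ["love it", "nice fit", "too small", "great", None,
                    "poor quality", "comfortable", "perfect"],
})

# 1. Missing values per column.
missing = df.isna().sum()
print(missing)

# 2. Review count, recommendation rate and mean rating per department.
by_dept = df.groupby("Department").agg(
    n_reviews=("Recommended", "size"),
    recommend_rate=("Recommended", "mean"),
    mean_rating=("Rating", "mean"),
)
print(by_dept)

# 3. Recommendation rate per age group (bins chosen arbitrarily here).
age_bins = [0, 30, 45, 60, 120]
age_labels = ["<30", "30-44", "45-59", "60+"]
df["Age Group"] = pd.cut(df["Age"], bins=age_bins, labels=age_labels, right=False)
by_age = df.groupby("Age Group", observed=True)["Recommended"].agg(["size", "mean"])
print(by_age)
```

Rows with a missing age fall outside every bin and are left out of the age-group table, so how to treat them (drop, impute, or report separately) is a decision to justify in the report.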
Build a Benchmark Model to Predict Recommendation: Build a simple logistic regression model to assess the feasibility of recommendation prediction and establish a baseline model. For this task, you are required to build your baseline model using a bag of words of the review text only. Use scikit-learn’s logistic regression model with “solver” set to ‘liblinear’ and all other parameters set to default. Use scikit-learn’s CountVectorizer with “max_features” set to 500 and all other parameters set to default. You need to choose appropriate evaluation metrics and model evaluation strategies to validate your model. Present your analysis and discuss your findings.
Improving Your Benchmark Model: You are required to make attempts to improve the performance of your benchmark model as much as you can. You should consider using more advanced feature engineering techniques and adding extra features to rebuild your model. Your decisions should be justified based on the evidence from the data and accompanied by a detailed explanation. You must properly validate your model and optimize the hyperparameters that apply. Simply building a model without any consideration of validation and optimization does not meet the minimum requirements. You should demonstrate evidence of your efforts, and you will be assessed based on the depth of your exploration. Provide a summary of what has worked and what has not. Report on your improved models and make comparisons with the benchmark model. Note: You must use logistic regression; no other models are allowed for this task.
Interpreting Results: Decide on your best model and provide analysis and interpretation of its behavior. For example, you may report on the features associated with positive/negative recommendation. For your interpretation, you should focus on identifying general rules that might be useful for the company to improve its business in the future.
Final Test Results: Finally, apply your best model to the test data. You are asked to report the classification results on the test data. Save your results in a CSV file containing two columns: one for the Review Index (the ID from product_test.csv) and one, Recommended, for the predicted labels (1s or 0s). An example file of test results, test_results_example.csv, is also provided. Name your file GroupID_Test_Results.csv. The results on the test data will be assessed to decide your group's performance among the entire class (group competition!).
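A minimal sketch of writing the results file described above. The helper name `save_test_results` is hypothetical, the exact header spellings (`Review Index`, `Recommended`) are a guess that should be checked against test_results_example.csv, and the IDs and predictions are made up. The example writes to a temporary directory only so that it runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_test_results(review_ids, predictions, path):
    """Write a two-column CSV of review IDs and 0/1 predicted labels."""
    results = pd.DataFrame({
        "Review Index": list(review_ids),                  # assumed header name
        "Recommended": [int(p) for p in predictions],      # assumed header name
    })
    # Guard against anything other than binary labels slipping through.
    if not results["Recommended"].isin([0, 1]).all():
        raise ValueError("Predicted labels must be 0 or 1.")
    results.to_csv(path, index=False)
    return results


# Usage with made-up IDs and predictions; GroupID is a placeholder.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "GroupID_Test_Results.csv"
    save_test_results([101, 102, 103], [1, 0, 1], out_path)
    written = pd.read_csv(out_path)

print(written)
```

Passing `index=False` keeps pandas from adding a third, unnamed index column to the file.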