Coffee Shop Reviews


For the linear regression, we fit a multiple linear regression with 65 explanatory variables to predict the daily count of ratings and to check whether there is a linear relationship between the features and the response. Although several variables were significant and the overall p-value of the model was extremely small, the adjusted R-squared was only 0.01516, meaning the linear model explains just 1.516% of the variation in the response. We then evaluated the model on the test data. To judge how well it predicts whether an app is worth investing in, we applied a threshold at the 90th percentile of the predicted results: an app whose daily rating count is above the 90% quantile is labeled “good to invest”, and one at or below it is labeled “bad to invest”. Comparing the labels from the actual daily rating counts of the test data with those from the predictions gives an accuracy of 86.74%, which looks reasonably high. However, because the data consist mostly of “bad to invest” cases, accuracy is not a good reflection of model quality. Investors mainly want to know whether an investment flagged as good will actually pay off, so precision is a better criterion. Precision measures the proportion of cases identified as positive that are actually positive. Here the precision is 33.76%, which is too low: 66.24% of the investments the model recommends would not be successful.
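The thresholding and the two metrics described above can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names `label_good_to_invest` and `accuracy_and_precision` are hypothetical, the toy arrays are made up, and it assumes the actual and the predicted daily rating counts are each cut at their own 90th percentile, which the text does not state explicitly.

```python
import numpy as np

def label_good_to_invest(values, q=90):
    """Return True where a value is strictly above the q-th percentile of `values`.

    Assumption: the cutoff is computed from the same array that is being labeled.
    """
    values = np.asarray(values, dtype=float)
    return values > np.percentile(values, q)

def accuracy_and_precision(actual_good, predicted_good):
    """Compute accuracy and precision for boolean 'good to invest' labels."""
    actual_good = np.asarray(actual_good, dtype=bool)
    predicted_good = np.asarray(predicted_good, dtype=bool)
    true_pos = np.sum(actual_good & predicted_good)
    false_pos = np.sum(~actual_good & predicted_good)
    accuracy = float(np.mean(actual_good == predicted_good))
    flagged = true_pos + false_pos
    # Precision is undefined when nothing is flagged as positive.
    precision = float(true_pos / flagged) if flagged > 0 else float("nan")
    return accuracy, precision

# Made-up daily rating counts, for illustration only.
actual_counts = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
predicted_counts = np.array([2, 1, 3, 5, 4, 6, 8, 7, 30, 10], dtype=float)

actual_good = label_good_to_invest(actual_counts)
predicted_good = label_good_to_invest(predicted_counts)
accuracy, precision = accuracy_and_precision(actual_good, predicted_good)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}")
```

With a 90th-percentile cutoff, roughly 10% of cases are “good to invest”, so a model that labels every case “bad to invest” would already reach about 90% accuracy while recommending nothing. That is why precision is the more informative number here.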