using predict in R
- This topic has 1 reply, 2 voices, and was last updated 9 years, 11 months ago by Johnvereen.
- April 19, 2013 at 3:03 am, #908, ArchieIndian (Member)
I am reading through predict() in R and am confused:
There is a dataset, Spam, from which we created a training set and a test set by random sampling. We used the training set (trainSpam) to fit the model, and we want to see how good the model is by testing it on the test set (testSpam).
predictionModel = glm(numType ~ charDollar, family = "binomial", data = trainSpam)
predictionTest = predict(predictionModel, testSpam)
predictedSpam = rep("nonspam", dim(testSpam)[1])
predictedSpam[predictionModel$fitted > 0.5] = "spam"  # Here is my problem
table(predictedSpam, testSpam$type)
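To make the question's code runnable without the original data, here is a minimal sketch of a random train/test split. It assumes a hypothetical, simulated stand-in for the Spam data (presumably the `spam` data set from the kernlab package) with only the two columns the question uses, `charDollar` and `type`. The data-generating rule, the seed, and the 50/50 split probability are illustrative assumptions, not necessarily what the lecture does.

```r
# Hypothetical stand-in for the Spam data: only the two columns used in the
# question are simulated, with more "$" characters making spam more likely.
set.seed(1)
n <- 500
spam <- data.frame(charDollar = rexp(n, rate = 5))
spam$type <- factor(ifelse(runif(n) < plogis(-1 + 8 * spam$charDollar),
                           "spam", "nonspam"),
                    levels = c("nonspam", "spam"))

# Random split: each row goes to the training set with probability 0.5
trainIndicator <- rbinom(n, size = 1, prob = 0.5)
trainSpam <- spam[trainIndicator == 1, ]
testSpam  <- spam[trainIndicator == 0, ]

# 0/1 response for glm(): as.numeric() gives 1 for "nonspam", 2 for "spam"
trainSpam$numType <- as.numeric(trainSpam$type) - 1

# The two sets generally have different numbers of rows
c(train = nrow(trainSpam), test = nrow(testSpam))
```

With these objects in the workspace the question's code can be tried directly. Because the split is random, `nrow(trainSpam)` and `nrow(testSpam)` usually differ, which matters for the line the question is about.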
In the line where we say:
predictedSpam[predictionModel$fitted > 0.5] = "spam"
how does predictionModel$fitted predict spam in the test data? It seems to use predictionModel$fitted, which comes from the training data, and then we go on to compare the result with the spam labels of the test data. Can someone explain?
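One way to see what `predictionModel$fitted` actually contains is to compare its length with the two data sets. In R, `$` partially matches list component names, so `$fitted` returns the glm component `fitted.values`: the fitted probabilities for the rows the model was trained on. This sketch uses hypothetical simulated data and a hypothetical 300/200 split in place of the real Spam data.

```r
# Hypothetical simulated data and a hypothetical 300/200 train/test split
set.seed(1)
n <- 500
spam <- data.frame(charDollar = rexp(n, rate = 5))
spam$type <- factor(ifelse(runif(n) < plogis(-1 + 8 * spam$charDollar),
                           "spam", "nonspam"),
                    levels = c("nonspam", "spam"))
idx <- sample(n, 300)
trainSpam <- spam[idx, ]
testSpam  <- spam[-idx, ]
trainSpam$numType <- as.numeric(trainSpam$type) - 1

predictionModel <- glm(numType ~ charDollar, family = "binomial",
                       data = trainSpam)

# `$fitted` partially matches the component `fitted.values`
length(predictionModel$fitted)   # one value per TRAINING row (300)
nrow(testSpam)                   # but the test set has 200 rows

# These are the same probabilities predict() returns without newdata,
# i.e. predictions for the training rows, not the test rows
all.equal(predictionModel$fitted.values,
          predict(predictionModel, type = "response"))
```

Since the logical index `predictionModel$fitted > 0.5` describes training rows, it does not line up with the rows of `testSpam`: R recycles a shorter index, and a longer one can lengthen `predictedSpam` with `NA` entries, in which case the later `table()` call fails because its arguments differ in length.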
Here is what I understood. In the line:
predictionModel = glm(numType ~ charDollar, family = "binomial", data = trainSpam)
We create a model using the trainSpam data.
In the next line:
predictionTest = predict(predictionModel, testSpam)
We create predictionTest using the same model but the test data.
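A related detail about the `predictionTest` line: for a binomial `glm`, `predict()` returns values on the link scale by default (`type = "link"`), that is, log-odds rather than probabilities. A minimal sketch on hypothetical simulated data showing how the two scales relate:

```r
# Hypothetical simulated data and split (illustrative stand-in for Spam)
set.seed(1)
n <- 500
spam <- data.frame(charDollar = rexp(n, rate = 5))
spam$type <- factor(ifelse(runif(n) < plogis(-1 + 8 * spam$charDollar),
                           "spam", "nonspam"),
                    levels = c("nonspam", "spam"))
idx <- sample(n, 300)
trainSpam <- spam[idx, ]
testSpam  <- spam[-idx, ]
trainSpam$numType <- as.numeric(trainSpam$type) - 1
predictionModel <- glm(numType ~ charDollar, family = "binomial",
                       data = trainSpam)

# Default type = "link": log-odds for each TEST row, any real number
predictionTest <- predict(predictionModel, testSpam)
# type = "response": probabilities in (0, 1) for the same rows
probTest <- predict(predictionModel, testSpam, type = "response")

length(predictionTest) == nrow(testSpam)     # one value per test row
all.equal(probTest, plogis(predictionTest))   # inverse logit links the scales

# Thresholding probabilities at 0.5 matches thresholding log-odds at 0
all((probTest > 0.5) == (predictionTest > 0))
```

So when switching to the test-set predictions, either pass `type = "response"` and keep the 0.5 cutoff, or keep the default scale and compare with 0.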
In the next line:
predictedSpam = rep("nonspam", dim(testSpam)[1])
We create a vector with one entry per row of testSpam, in which every value is “nonspam”.
In the next line:
predictedSpam[predictionModel$fitted > 0.5] = "spam"
We are using predictionModel$fitted, which holds the values fitted on the training data, to decide which rows to classify as spam. Shouldn't we use something like predictionTest to identify the spam messages instead?
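The poster's suggestion can be sketched as follows: classify the test rows using predictions made for the test rows, on the probability scale. This is a minimal sketch on hypothetical simulated data (an illustrative stand-in for Spam), not the lecture's own code; with `type = "response"` the 0.5 threshold applies to probabilities.

```r
# Hypothetical simulated data and split (illustrative stand-in for Spam)
set.seed(1)
n <- 500
spam <- data.frame(charDollar = rexp(n, rate = 5))
spam$type <- factor(ifelse(runif(n) < plogis(-1 + 8 * spam$charDollar),
                           "spam", "nonspam"),
                    levels = c("nonspam", "spam"))
idx <- sample(n, 300)
trainSpam <- spam[idx, ]
testSpam  <- spam[-idx, ]
trainSpam$numType <- as.numeric(trainSpam$type) - 1

predictionModel <- glm(numType ~ charDollar, family = "binomial",
                       data = trainSpam)

# Predicted probabilities for the TEST rows
predictionTest <- predict(predictionModel, testSpam, type = "response")

# Start with every test row labelled "nonspam", then flag likely spam
predictedSpam <- rep("nonspam", nrow(testSpam))
predictedSpam[predictionTest > 0.5] <- "spam"

# Both vectors are aligned with the rows of testSpam
confusion <- table(predictedSpam, testSpam$type)
confusion

# Share of test rows whose predicted label differs from the true one
errorRate <- mean(predictedSpam != as.character(testSpam$type))
errorRate
```

Because `predictedSpam` and `testSpam$type` now both have one entry per test row in the same order, the table compares each test message's prediction with its own true label.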
This is where I am reading from: https://github.com/jtleek/dataanalysis/blob/master/week2/002structureOfADataAnalysis2/structureOfADataAnalysis2.pdf