ISLR 5.3 Lab: Cross-Validation and the Bootstrap
5.3.1 The Validation Set Approach
The sample() function splits the set of observations into two halves by selecting a random subset of 196 observations out of the original 392. We refer to these observations as the training set.
> library(ISLR)
> set.seed(1)
> train=sample(392,196)
We then use the subset option in lm() to fit a linear regression using only the observations corresponding to the training set.
lm.fit=lm(mpg~horsepower,data=Auto,subset=train)
We then use the predict() function to estimate the response for all 392 observations, and the mean() function to calculate the MSE of the 196 observations in the validation set.
> attach(Auto)
> mean((mpg-predict(lm.fit,Auto))[-train]^2)
[1] 26.14142
We can also use the poly() function to estimate the test error for the quadratic and cubic regressions.
> lm.fit2=lm(mpg~poly(horsepower,2),data=Auto,subset=train)
> mean((mpg-predict(lm.fit2,Auto))[-train]^2)
[1] 19.82259
> lm.fit3=lm(mpg~poly(horsepower,3),data=Auto,subset=train)
> mean((mpg-predict(lm.fit3,Auto))[-train]^2)
[1] 19.78252
5.3.2 Leave-One-Out Cross-Validation
In this lab, we will perform linear regression using the glm() function rather than the lm() function because the former can be used together with cv.glm(). If glm() is called without a family argument, it performs linear regression just like lm(). The cv.glm() function is part of the boot library.
> library(boot)
> glm.fit=glm(mpg~horsepower,data=Auto)
> cv.err=cv.glm(Auto,glm.fit)
> cv.err$delta
[1] 24.23151 24.23114
Our cross-validation estimate for the test error is approximately 24.23. The two numbers in delta are the standard cross-validation estimate and a bias-corrected version; for LOOCV they are essentially identical.
To automate the process, we use the for() function to initiate a for loop which iteratively fits polynomial regressions for polynomials of order i = 1 to i = 5, computes the associated cross-validation error, and stores it in the ith element of the vector cv.error. We begin by initializing the vector.
> cv.error=rep(0,5)
> for (i in 1:5){
+ glm.fit=glm(mpg~poly(horsepower,i),data=Auto)
+ cv.error[i]=cv.glm(Auto,glm.fit)$delta[1]
+ }
> cv.error
[1] 24.23151 19.24821 19.33498 19.42443 19.03321
The trend in cv.error, a sharp drop from the linear to the quadratic fit with no clear improvement from higher-order polynomials, illustrates how cross-validation is used for model selection.
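The loop above relies on cv.glm() from the boot package; the following base-R sketch shows what LOOCV computes under the hood, using simulated data (an assumption, standing in for the Auto set) with a mildly quadratic truth. Each observation is held out in turn, the model is refit, and the squared prediction errors are averaged.

```r
# Simulated stand-in for Auto (hypothetical data, not the book's):
# y has a genuine quadratic component, so degree-2 should win.
set.seed(1)
n <- 100
x <- runif(n, 1, 10)
y <- 5 - 0.5 * x + 0.05 * x^2 + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

loocv.error <- rep(0, 5)
for (d in 1:5) {
  errs <- rep(0, n)
  for (i in 1:n) {
    # Fit on all observations except i, then predict observation i
    fit <- lm(y ~ poly(x, d), data = dat[-i, ])
    errs[i] <- (dat$y[i] - predict(fit, dat[i, , drop = FALSE]))^2
  }
  loocv.error[d] <- mean(errs)  # LOOCV estimate of test MSE for degree d
}
loocv.error
```

As in the Auto example, the LOOCV error drops sharply once the model can capture the curvature, then flattens out for higher degrees.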
5.3.3 k-Fold Cross-Validation
The cv.glm() function can also be used to implement k-fold CV.
> cv.error.10=rep(0,10)
> for (i in 1:10){
+ glm.fit=glm(mpg~poly(horsepower,i),data=Auto)
+ cv.error.10[i]=cv.glm(Auto,glm.fit,K=10)$delta[1]
+ }
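When cv.glm() is called with K=10, it randomly partitions the observations into ten folds and uses each fold once as the validation set. A minimal base-R sketch of those fold mechanics, on simulated data (an assumption in place of Auto):

```r
# Hypothetical simulated data; a simple linear truth with noise sd = 1.
set.seed(2)
n <- 100
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

K <- 10
# Random fold assignment: each observation gets a fold label 1..K
folds <- sample(rep(1:K, length.out = n))
fold.mse <- rep(0, K)
for (k in 1:K) {
  fit <- lm(y ~ x, data = dat[folds != k, ])   # train on 9 folds
  hold <- dat[folds == k, ]                    # validate on the held-out fold
  fold.mse[k] <- mean((hold$y - predict(fit, hold))^2)
}
cv.estimate <- mean(fold.mse)  # k-fold CV estimate of test MSE
```

Because the noise standard deviation is 1, the CV estimate lands near 1; with K = 10 only ten models are fit, which is why k-fold CV is much cheaper than LOOCV for models without a shortcut formula.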
5.3.4 The Bootstrap
Estimating the Accuracy of a Statistic of Interest
We first create a function, alpha.fn(), which takes as input the (X, Y) data as well as a vector indicating which observations should be used to estimate α. The function then outputs the estimate for α based on the selected observations. The final command below tells R to estimate α using all 100 observations.
> alpha.fn=function(data,index){
+ X=data$X[index]
+ Y=data$Y[index]
+ return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
+ }
> alpha.fn(Portfolio,1:100)
[1] 0.5758321
The next command uses the sample() function to randomly select 100 observations from the range 1 to 100, with replacement. This is equivalent to constructing a new bootstrap data set and recomputing α̂ based on the new data set.
> alpha.fn(Portfolio,sample(100,100,replace=T))
We can implement a bootstrap analysis by performing this command many times, recording all of the corresponding estimates for α, and computing the resulting standard deviation. However, the boot() function automates this approach. Below we produce R = 1,000 bootstrap estimates for α.
> boot(Portfolio,alpha.fn,R=1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Portfolio, statistic = alpha.fn, R = 1000)

Bootstrap Statistics :
     original        bias    std. error
t1* 0.5758321 -7.315422e-05  0.08861826
The final output shows that using the original data, α̂ = 0.5758, and that the bootstrap estimate for SE(α̂) is 0.0886.
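The manual approach that boot() automates can be sketched directly: resample row indices with replacement R times, recompute the statistic each time, and take the standard deviation of the resulting estimates. Here simulated returns stand in for the Portfolio data (an assumption), and alpha.fn is redefined so the sketch is self-contained.

```r
alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index]
  (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

# Hypothetical simulated returns in place of Portfolio
set.seed(3)
n <- 100
sim <- data.frame(X = rnorm(n), Y = rnorm(n))

R <- 1000
boot.est <- rep(0, R)
for (r in 1:R) {
  # One bootstrap replicate: resample n rows with replacement
  boot.est[r] <- alpha.fn(sim, sample(n, n, replace = TRUE))
}
se.alpha <- sd(boot.est)  # bootstrap estimate of SE(alpha-hat)
```

boot() does exactly this loop internally (plus bias estimation and bookkeeping), which is why its statistic argument must accept a data set and an index vector, just as alpha.fn() does.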
Estimating the Accuracy of a Linear Regression Model
We first create a simple function, boot.fn(), which takes in the Auto data set as well as a set of indices for the observations, and returns the intercept
and slope estimates for the linear regression model. We then apply this function to the full set of 392 observations in order to compute the estimates of β0 and β1 on the entire data set using the usual linear regression coefficient estimate formulas from Chapter 3.
> boot.fn=function(data,index)
+ return(coef(lm(mpg~horsepower,data=data,subset=index)))
> boot.fn(Auto,1:392)
(Intercept)  horsepower
 39.9358610  -0.1578447
Next, we use the boot() function to compute the standard errors of 1,000 bootstrap estimates for the intercept and slope terms.
> boot(Auto,boot.fn,1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Auto, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
       original        bias    std. error
t1* 39.9358610  0.0269563085 0.859851825
t2* -0.1578447 -0.0002906457 0.007402954
Below we compute the bootstrap standard error estimates and the standard linear regression estimates that result from fitting the quadratic model to the data.
> boot.fn=function(data,index)
+ coefficients(lm(mpg~horsepower+I(horsepower^2),data=data,subset=index))
> set.seed(1)
> boot(Auto,boot.fn,1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Auto, statistic = boot.fn, R = 1000)

Bootstrap Statistics :
        original          bias     std. error
t1* 56.900099702  6.098115e-03 2.0944855842
t2* -0.466189630 -1.777108e-04 0.0334123802
t3*  0.001230536  1.324315e-06 0.0001208339
Source: http://www.cnblogs.com/jiajiaxingxing/p/4685271.html