[R Basics] Statistics, Distribution Functions, t-Test


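Statistical distribution functions

Prefixes

d(ensity): value of the probability density function f(x) (for a discrete distribution, the probability P[X=x])

p(robability): cumulative probability from the cumulative distribution function, e.g. P[X≤2]=?

q(uantile): the quantile x for a given cumulative probability, e.g. P[X≤?]=0.2

r(andom): random number generation

R names each distribution function by combining one of these prefixes with the name of the distribution (e.g. norm, t), so every distribution gets all four functions.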
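A minimal sketch of how the four prefixes relate to each other, using the standard normal distribution and, for the discrete case, a binomial distribution with size = 5 and prob = 0.5 (these values are chosen only for illustration):

```r
# p is the integral of d: P[X <= x] is the area under the density up to x
x <- 1.645
area <- integrate(dnorm, lower = -Inf, upper = x)$value
all.equal(area, pnorm(x), tolerance = 1e-6)   # TRUE

# q is the inverse of p: qnorm(pnorm(x)) gives back x
all.equal(qnorm(pnorm(x)), x)                 # TRUE

# r draws random values; the share of draws <= x approximates pnorm(x)
set.seed(1)
mean(rnorm(1e5) <= x)                         # close to 0.95

# For a discrete distribution, d really is a probability P[X = x]
dbinom(2, size = 5, prob = 0.5)               # 10/32 = 0.3125
pbinom(2, size = 5, prob = 0.5)               # (1 + 5 + 10)/32 = 0.5
```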

Commonly used distribution functions

Normal distribution

dnorm(x, mean=0, sd=1, log=FALSE)

pnorm(q, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)

qnorm(p, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)

rnorm(n, mean=0, sd=1)

# lower.tail: TRUE gives P[X≤x], FALSE gives P[X>x]; log / log.p: whether to return the density or probability on the log scale
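A short sketch of what lower.tail and log change in practice (the input values are arbitrary):

```r
# Upper-tail probability P[X > 1.645], computed two ways
upper1 <- pnorm(1.645, lower.tail = FALSE)
upper2 <- 1 - pnorm(1.645)
all.equal(upper1, upper2)                                 # TRUE

# Critical value with 5% in the upper tail: same as qnorm(0.95)
all.equal(qnorm(0.05, lower.tail = FALSE), qnorm(0.95))   # TRUE

# log = TRUE returns the log of the density
all.equal(dnorm(1, log = TRUE), log(dnorm(1)))            # TRUE

# Far in the tail, lower.tail = FALSE keeps precision that 1 - p loses
pnorm(10, lower.tail = FALSE)                             # about 7.6e-24
1 - pnorm(10)                                             # 0 (lost to rounding)
```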

Example:

> dnorm(1.645, mean=0,sd=1)

[1] 0.1031108

> pnorm(1.645, mean=0,sd=1)

[1] 0.9500151

> qnorm(0.95, mean=0,sd=1)

[1] 1.644854

> rnorm(10)

[1] -2.33395030 -0.78676791 0.82354830 -1.17745923 1.19559888 -0.11425960 -0.38693942

[8] 1.02974891 -0.04596817 -0.26000348
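The rnorm() values above change on every run. A minimal sketch, assuming reproducible draws are wanted, using set.seed() (the seed 123 is arbitrary):

```r
set.seed(123)
a <- rnorm(10)
set.seed(123)
b <- rnorm(10)
identical(a, b)                  # TRUE: same seed, same draws

# With many draws, the sample mean and sd approach mean = 0 and sd = 1
set.seed(123)
big <- rnorm(1e5)
c(mean = mean(big), sd = sd(big))
```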

t-distribution

> dt(1.6, df=5)

[1] 0.1098193

> pt(1.6, df=5)

[1] 0.9147524

> qt(0.914, df=5)

[1] 1.593179

> rt(10, df=5)

[1] 1.12855004 0.38523979 1.12946584 1.88121382 -0.65580245 -0.80290157 -1.86671479

[8] 0.32523323 -0.73656271 0.08811627

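As the degrees of freedom increase, the t-distribution gets closer to the standard normal distribution.

Example (black: standard normal; red, blue, green: t with df = 3, 10, 30):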

> par(mfrow=c(1,1))

> x<-seq(-3,3,by=0.01)

> z<-dnorm(x)

> plot(x,z,type="l")

> t.3<-dt(x,df=3)

> lines(x,t.3,col="red")

> t.10<-dt(x,df=10)

> lines(x,t.10,col="blue")

> t.30<-dt(x,df=30)

> lines(x,t.30,col="green")
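The plot can also be checked numerically. A small sketch (the df values 3, 10, 30, 100 are chosen for illustration) measuring how far each t density is from the normal density on the same grid, and how the two-sided 5% critical value approaches qnorm(0.975):

```r
x   <- seq(-3, 3, by = 0.01)
dfs <- c(3, 10, 30, 100)

# Largest vertical gap between each t density and the standard normal density
max_gap <- sapply(dfs, function(df) max(abs(dt(x, df = df) - dnorm(x))))
names(max_gap) <- dfs
max_gap                    # shrinks as df grows

# Two-sided 5% critical values: about 3.18, 2.23, 2.04, 1.98
qt(0.975, df = dfs)
qnorm(0.975)               # 1.959964, the limit as df grows
```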

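Central limit theorem

Reference: http://pubdata.tistory.com/40

Central limit theorem: for a sufficiently large sample size n, the sample mean approximately follows a normal distribution whose mean is the population mean μ (the sample mean is an unbiased estimator of μ) and whose standard deviation (the standard error) is the population standard deviation divided by the square root of the sample size, σ/√n.

Example: to be added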
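A minimal simulation sketch of the statement above. The population (Exponential with rate 1, a skewed distribution with μ = 1 and σ = 1), the sample size 30 and the 10000 repetitions are all assumptions chosen for illustration:

```r
set.seed(1)
n    <- 30       # size of each sample
reps <- 10000    # number of simulated samples

# Draw many samples from a skewed population and keep each sample mean
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean 1
sd(sample_means)     # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
```

Running hist(sample_means, breaks = 50) afterwards shows an approximately bell-shaped histogram even though the population itself is skewed.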

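Statistical hypothesis testing

Reference: http://pubdata.tistory.com/41

Null hypothesis (H0): there is no difference

Alternative hypothesis (H1): there is a difference

Significance level α: the threshold that fixes the critical value(s) of a two-sided or one-sided test; the region beyond the critical value(s) is the rejection region

If the test statistic falls in the rejection region (equivalently, if the p-value is smaller than the significance level), the result is statistically significant and H0 is rejected

Type I error: concluding that there is a difference when there actually is none (rejecting a true H0); its probability is α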
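A small simulation sketch of the last point, assuming data drawn from N(0, 1) so that H0 (mean = 0) is really true; the sample size 20 and the 5000 repetitions are arbitrary:

```r
set.seed(42)
alpha <- 0.05

# H0 is true here: every sample comes from N(0, 1) and we test mu = 0
p_values <- replicate(5000, t.test(rnorm(20), mu = 0)$p.value)

# Share of tests that wrongly reject H0 (Type I errors): close to alpha
mean(p_values < alpha)
```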

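Comparing means with t.test(): compares the mean of a single sample with a given value, or the means of two samples

t.test(x, y=NULL, alternative=c("two.sided","less","greater"), mu=0, paired=FALSE, var.equal=FALSE, conf.level=0.95, ...)

alternative: "two.sided" for a two-sided test; "less" or "greater" gives the direction of a one-sided test

mu: hypothesized mean (or mean difference for two samples); default 0

paired: TRUE when the two samples are paired (default FALSE)

var.equal: TRUE when the two samples are assumed to have equal variances; the default FALSE runs the Welch test

conf.level: confidence level of the interval, i.e. 1 - α (default 0.95)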
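Besides printing, t.test() returns an "htest" object whose parts can be used in code. A sketch using the built-in mtcars data and the formula interface (mpg split by am), which here runs the default Welch test:

```r
res <- t.test(mpg ~ am, data = mtcars, alternative = "two.sided", conf.level = 0.95)

class(res)       # "htest"
res$statistic    # t value
res$parameter    # degrees of freedom
res$p.value      # p-value
res$conf.int     # confidence interval at conf.level
res$estimate     # the two group means
```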

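Example 1) One-sample problem

To check the hypothesis that manual-transmission cars have good fuel economy, test at significance level 0.05 whether their mean mpg is greater than 20

Null hypothesis: the mean mpg of manual cars is 20

Alternative hypothesis: the mean mpg of manual cars is greater than 20

Significance level: 0.05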

> str(mtcars)

'data.frame': 32 obs. of 11 variables:

$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...

$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...

$ disp: num 160 160 108 258 360 ...

$ hp : num 110 110 93 110 175 105 245 62 95 123 ...

$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...

$ wt : num 2.62 2.88 2.32 3.21 3.44 ...

$ qsec: num 16.5 17 18.6 19.4 17 ...

$ vs : num 0 0 1 1 0 1 0 1 1 1 ...

$ am : num 1 1 1 0 0 0 0 0 0 0 ...

$ gear: num 4 4 4 3 3 3 3 4 4 4 ...

$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

> mtcars$am

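[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1

# am is the transmission type (0 = automatic, 1 = manual), so mpg[am==1] is the mpg of the manual cars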

> with(mtcars,t.test(mpg[am==1],mu=20,alternative = "greater"))

One Sample t-test

data: mpg[am == 1]

t = 2.5682, df = 12, p-value = 0.01231

alternative hypothesis: true mean is greater than 20

95 percent confidence interval:

21.3441 Inf

sample estimates:

mean of x

24.39231

# The p-value (0.01231) is smaller than the significance level 0.05, so H0 is rejected in favor of H1

> qt(0.95,df = 12)

[1] 1.782288

# The t statistic 2.5682 is larger than the 0.95 quantile of the t-distribution with 12 df (the critical value 1.782288), so H0 is rejected
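The same one-sided test can be reproduced by hand from the formula t = (mean - μ0) / (s / √n), which shows where the t.test() numbers come from:

```r
x   <- mtcars$mpg[mtcars$am == 1]   # mpg of the manual cars
n   <- length(x)
mu0 <- 20                           # hypothesized mean under H0

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))         # t statistic
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-sided: P[T > t]
crit   <- qt(0.95, df = n - 1)                        # critical value for alpha = 0.05

c(t = t_stat, p = p_val, critical = crit)   # same t and p as t.test above
t_stat > crit                               # TRUE: reject H0
```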

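Example 2) Paired two-sample problem

Test whether the sleeping drugs have an effect, at significance level 0.025. In the sleep data, extra is the increase in hours of sleep, group is the drug given and ID is the patient; each of the 10 patients received both drugs, so the two samples are paired.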

> str(sleep)

'data.frame': 20 obs. of 3 variables:

$ extra: num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...

$ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

$ ID : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

> with(sleep,t.test(extra[group==1],extra[group==2],paired=T))

Paired t-test

data: extra[group == 1] and extra[group == 2]

t = -4.0621, df = 9, p-value = 0.002833

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-2.4598858 -0.7001142

sample estimates:

mean of the differences

-1.58

# The p-value (0.002833) is smaller than the significance level 0.025, so H0 is rejected in favor of H1

> qt(0.025, df=9)

[1] -2.262157

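# The t statistic -4.0621 is smaller than the 0.025 quantile of the t-distribution with 9 df (-2.262157), i.e. it lies in the rejection region, so H0 is rejected (this critical value belongs to a one-sided test at α = 0.025, or to the lower tail of a two-sided test at α = 0.05)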
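A paired t-test is a one-sample t-test on the within-pair differences. A sketch with the same sleep data (it relies on the rows of each group being in the same ID order, which holds for this dataset):

```r
# Difference between the two drugs for each patient
d <- with(sleep, extra[group == 1] - extra[group == 2])

one_sample <- t.test(d, mu = 0)
paired     <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))

# Both give t = -4.0621 with df = 9
c(one_sample = unname(one_sample$statistic), paired = unname(paired$statistic))

# One-sided version (H1: mean difference < 0): here half the two-sided p-value
t.test(d, alternative = "less")$p.value
```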

Example 3) Independent two-sample problem

Test whether insecticides A and B differ in effectiveness, at significance level 0.05

> str(InsectSprays)

'data.frame': 72 obs. of 2 variables:

$ count: num 10 7 20 14 14 12 10 23 17 20 ...

$ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

> attach(InsectSprays)

# Use the column names of the InsectSprays data frame directly as variables

The following objects are masked from InsectSprays (pos = 3):

count, spray

> A<-count[spray=="A"]

> B<-count[spray=="B"]

# Store the insect counts of the units treated with spray A and spray B in the variables A and B

> var.test(A,B)

# Run an F-test to check the assumption that the two groups have equal variances

# The p-value (0.7464) is large, so the null hypothesis of equal variances cannot be rejected

F test to compare two variances

data: A and B

F = 1.2209, num df = 11, denom df = 11, p-value = 0.7464

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.3514784 4.2411442

sample estimates:

ratio of variances

1.22093

> t.test(A,B,var.equal = T)

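# Run the t-test with the equal-variance option

# The p-value (0.6546) is greater than the significance level 0.05, so the null hypothesis of no difference in effectiveness cannot be rejected (two-sided test)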

Two Sample t-test

data: A and B

t = -0.45352, df = 22, p-value = 0.6546

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-4.643994 2.977327

sample estimates:

mean of x mean of y

14.50000 15.33333
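The equal-variance test above uses the pooled variance. A sketch reproducing it by hand (it selects A and B by indexing instead of attach(), so it runs on its own):

```r
A <- InsectSprays$count[InsectSprays$spray == "A"]
B <- InsectSprays$count[InsectSprays$spray == "B"]
nA <- length(A)
nB <- length(B)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((nA - 1) * var(A) + (nB - 1) * var(B)) / (nA + nB - 2)

t_stat <- (mean(A) - mean(B)) / sqrt(sp2 * (1 / nA + 1 / nB))
df     <- nA + nB - 2                # 12 + 12 - 2 = 22
p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

c(t = t_stat, df = df, p = p_two)    # matches t = -0.45352, df = 22, p = 0.6546
```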

> t.test(A,B,alternative = "greater",var.equal = T)

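# Run a one-sided test (H1: the mean of A is greater than the mean of B)

# The p-value (0.6727) is greater than the significance level 0.05, so H0 cannot be rejected (one-sided test)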

Two Sample t-test

data: A and B

t = -0.45352, df = 22, p-value = 0.6727

alternative hypothesis: true difference in means is greater than 0

95 percent confidence interval:

-3.988519 Inf

sample estimates:

mean of x mean of y

14.50000 15.33333

> t.test(A,B,alternative = "greater",var.equal = F)

# Without assuming equal variances (var.equal = FALSE, the default), R runs the Welch two-sample t-test, which adjusts the degrees of freedom (21.784 here instead of 22)

Welch Two Sample t-test

data: A and B

t = -0.45352, df = 21.784, p-value = 0.6727

alternative hypothesis: true difference in means is greater than 0

95 percent confidence interval:

-3.989891 Inf

sample estimates:

mean of x mean of y

14.50000 15.33333
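A sketch of the Welch-Satterthwaite degrees of freedom computed by hand. Because A and B both have 12 observations, the Welch standard error equals the pooled one, which is why t = -0.45352 in both tests and only df differs:

```r
A <- InsectSprays$count[InsectSprays$spray == "A"]
B <- InsectSprays$count[InsectSprays$spray == "B"]

# Squared standard error of each mean
vA <- var(A) / length(A)
vB <- var(B) / length(B)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (vA + vB)^2 / (vA^2 / (length(A) - 1) + vB^2 / (length(B) - 1))
t_welch  <- (mean(A) - mean(B)) / sqrt(vA + vB)

c(t = t_welch, df = df_welch)   # t = -0.45352, df about 21.784, as in the output above
```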

> detach(InsectSprays)

Attribution-NonCommercial-NoDerivs
