[R]4. Social Network Sentiment Analysis


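[Research Question]

Given social network data that tags or mentions a specific keyword, we want to find out whether the sentiment expressed is positive or negative.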

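[Concept]

For tweets that mention a keyword, count how often positive and negative words appear in each tweet, and express the result as a single number: (number of positive words) - (number of negative words).

The English positive / negative word lists were taken from http://www.cs.uic.edu/ and are matched against the words in each tweet.

The lists can be extended or changed as you see fit.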
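As a minimal sketch of this scoring rule, assuming tiny made-up word lists rather than the real lexicon files, the score of a single tweet can be computed in base R as follows. The names `toy.pos`, `toy.neg`, and `toy.tweet` are hypothetical and only serve the illustration.

```r
# Hypothetical toy word lists; the real lists come from the lexicon files.
toy.pos <- c("good", "great", "love")
toy.neg <- c("bad", "slow", "hate")

# A made-up tweet, already lowercased and free of punctuation.
toy.tweet <- "i love the great camera but hate the update"

# Split on whitespace to get the individual words.
toy.words <- strsplit(toy.tweet, "\\s+")[[1]]

# Count the words that appear in each list.
n.pos <- sum(toy.words %in% toy.pos)
n.neg <- sum(toy.words %in% toy.neg)

# Score = number of positive words - number of negative words.
toy.score <- n.pos - n.neg
print(c(positive = n.pos, negative = n.neg, score = toy.score))
```

A score above 0 means the tweet contains more positive than negative words, a score below 0 the opposite, and 0 means they balance out or no listed word appears at all.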

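[Required Data]

The data has to be collected through the Twitter API and preprocessed.

The data used in the code is provided here.

It consists of 1,500 tweets that mentioned or tagged apple or samsung during a certain period, stored as RData files. In RStudio, use Open File to read apple.RData and samsung.RData into the workspace, and then follow the code below.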

apple.RData

samsung.RData

The positive and negative word lists provided by http://www.cs.uic.edu/ are attached as well.

negative-words.txt

positive-words.txt

[Required Packages]

library(twitteR)

library(ROAuth)

library(plyr)

library(stringr)

library(ggplot2)

[Code] *Lines beginning with > or + are script, lines beginning with # are notes, and the rest is output.

> load("C:/data/apple.RData")

> load("C:/data/samsung.RData")

> length(apple.tweets)

[1] 1500

> class(apple.tweets)

[1] "list"

> apple.tweets[1:5]

[[1]]

[1] "NorthIsUp: “Hey Siri, lay down a beat” - @apple this needs to happen"

[[2]]

[1] "FPiednoel: The @apple #watch statistics. https://t.co/hTpJ1i1aZ0 https://t.co/jnz6hxoKd2"

[[3]]

[1] "SwagginCactus: @tim_cook @apple how the fuck u gonna tell me you have an anal beads emoji \xed��\xed�� but no emoji for \"horny\" or \"lemme smash\""

[[4]]

[1] "mosaicofchange: RT @MIAuniverse: someone just found a phone @apple and sent me this Hi DEF link > \nhttps://t.co/oLdnZSD4zW"

[[5]]

[1] "brasscitygamers: Apple's next iPhone reportedly ditches the headphone jack \n@Apple #BCGT https://t.co/bDzHkgQxhl"

> tweet<-apple.tweets[[1]]

> tweet$getScreenName()

[1] "NorthIsUp"

> tweet$getText()

[1] "“Hey Siri, lay down a beat” - @apple this needs to happen"

> apple.text<-lapply(apple.tweets,function(t){t$getText()})

> head(apple.text,3)

[[1]]

[1] "“Hey Siri, lay down a beat” - @apple this needs to happen"

[[2]]

[1] "The @apple #watch statistics. https://t.co/hTpJ1i1aZ0 https://t.co/jnz6hxoKd2"

[[3]]

[1] "@tim_cook @apple how the fuck u gonna tell me you have an anal beads emoji \xed��\xed�� but no emoji for \"horny\" or \"lemme smash\""

> pos.word=scan("c:\\data\\positive-words.txt",what="character",comment.char=";")

Read 2006 items

> neg.word=scan("c:\\data\\negative-words.txt",what="character",comment.char=";")

Read 4783 items

# You can add words to, or remove words from, the reference word lists.

> pos.words<-c(pos.word,"upgrade")

> neg.words<-c(neg.word,"wait","waiting")
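Removing a word works the same way in reverse. A small sketch with a hypothetical toy vector (`toy.pos` is not the real lexicon), using base R `union()` and `setdiff()`:

```r
# Hypothetical toy list standing in for the real positive word list.
toy.pos <- c("good", "great", "cheap")

# union() adds words without creating a duplicate of "good".
toy.pos <- union(toy.pos, c("upgrade", "good"))

# setdiff() drops words that should no longer count as positive.
toy.pos <- setdiff(toy.pos, "cheap")

print(toy.pos)
```

Unlike `c()`, `union()` keeps the list free of duplicates, although a duplicate would not change the score here because each word of a tweet is only checked for membership.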

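# It is convenient to wrap the scoring in a function. The function below removes meaningless characters, splits each sentence into words, scores it by matching the words against the positive and negative word lists, and returns the result as a data frame.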

> score.sentiment <- function(sentences, pos.words, neg.words, .progress='none')
+ {
+   require(plyr)
+   require(stringr)
+   # Score every sentence; laply collects the scores into a vector.
+   scores <- laply(sentences, function(sentence, pos.words, neg.words){
+     # Remove punctuation, control characters, and digits, then lowercase.
+     sentence <- gsub('[[:punct:]]', "", sentence)
+     sentence <- gsub('[[:cntrl:]]', "", sentence)
+     sentence <- gsub('\\d+', "", sentence)
+     sentence <- tolower(sentence)
+     # Split the sentence into words on whitespace.
+     word.list <- str_split(sentence, '\\s+')
+     words <- unlist(word.list)
+     # match() returns NA for words that are not in the list.
+     pos.matches <- match(words, pos.words)
+     neg.matches <- match(words, neg.words)
+     pos.matches <- !is.na(pos.matches)
+     neg.matches <- !is.na(neg.matches)
+     # Score = number of positive words - number of negative words.
+     score <- sum(pos.matches) - sum(neg.matches)
+     return(score)
+   }, pos.words, neg.words, .progress=.progress)
+   # One row per sentence: its score and its original text.
+   scores.df <- data.frame(score=scores, text=sentences)
+   return(scores.df)
+ }
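To see what the cleaning steps inside the function do, the sketch below applies them one at a time to a single made-up tweet. It uses base R `strsplit()` as a stand-in for `str_split()` so that it runs without extra packages; `sample.tweet` is a hypothetical example, not part of the provided data.

```r
# A made-up tweet containing punctuation, a digit, and a newline.
sample.tweet <- "Great camera on the iPhone 6, but the battery is bad! \n#apple"

# Apply the same cleaning steps, in the same order, as the scoring function.
cleaned <- gsub("[[:punct:]]", "", sample.tweet)  # drops , ! #
cleaned <- gsub("[[:cntrl:]]", "", cleaned)       # drops the newline
cleaned <- gsub("\\d+", "", cleaned)              # drops the digit 6
cleaned <- tolower(cleaned)

# Split on runs of whitespace (base R stand-in for stringr::str_split).
words <- strsplit(cleaned, "\\s+")[[1]]
print(words)
```

After cleaning, words such as "great" and "bad" are in a form that can be looked up directly in the word lists.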

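# apple.text is a list, so flatten it to a character vector first; otherwise data.frame() would turn every tweet into its own column.

> apple.scores=score.sentiment(unlist(apple.text),pos.words,neg.words,.progress='text')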

|======================================================| 100%

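# Each score is the number of positive word matches minus the number of negative word matches in one tweet, so the further the histogram leans above 0, the more positive the tweets were during the period.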

> hist(apple.scores$score)

> samsung.text<-laply(samsung.tweets,function(t){t$getText()})

# Keep only the tweets whose encoding is not marked as UTF-8.

> samsung.text<-samsung.text[!Encoding(samsung.text)=="UTF-8"]

> samsung.scores=score.sentiment(samsung.text,pos.words,neg.words,.progress='text')

|======================================================| 100%

> hist(samsung.scores$score)
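Besides looking at the two histograms, the score distributions can be compared numerically. The sketch below uses made-up score vectors (`toy.apple` and `toy.samsung` are hypothetical and are not the results of the analysis above) to count negative, neutral, and positive tweets per brand and to compare the mean scores.

```r
# Made-up scores standing in for apple.scores$score and samsung.scores$score.
toy.apple   <- c(2, 0, -1, 1, 0, 3, -2, 0)
toy.samsung <- c(0, -1, -1, 1, 0, 0, -3, 2)

# Label each score as negative, neutral, or positive by its sign.
label.score <- function(s) {
  factor(sign(s), levels = c(-1, 0, 1),
         labels = c("negative", "neutral", "positive"))
}

# Number of tweets per label for each brand, side by side.
counts <- rbind(apple   = table(label.score(toy.apple)),
                samsung = table(label.score(toy.samsung)))
print(counts)

# Mean score per brand; a higher mean suggests more positive wording overall.
means <- c(apple = mean(toy.apple), samsung = mean(toy.samsung))
print(means)
```

With the real data, the same two lines can be applied to `apple.scores$score` and `samsung.scores$score` in place of the toy vectors.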

If something does not work or you have any questions, please leave a comment :)
