
[Python] 4. Daum Movie Site Data Collection via Web Crawling (Data Preprocessing and Visualization: Word Cloud)

by ITyranno 2023. 12. 13.

 

 

 

 

 

 

 

Let's explore the world of programming.

 

 

 

 

 

 

 

 

For Parts 1, 2, and 3, please see the previous posts.

 

2023.12.08 - [IT/Python] - [Python] 1. Daum Movie Site Data Collection via Web Crawling

 


 

 

2023.12.11 - [IT/Python] - [Python] 2. Daum Movie Site Data Collection via Web Crawling (Data Preprocessing and Visualization: Bar Chart, Scatter (Distribution) Plot)

 


 

 

 

2023.12.12 - [IT/Python] - [Python] 3. Daum Movie Site Data Collection via Web Crawling (Data Preprocessing and Visualization: Pie Chart)

 


 

 

 

 

 

 

 

<  Daum Movie Site Data Collection via Web Crawling (Data Preprocessing and Visualization)  >

 

 

 

 

 

 

Collected data

Movie title, rating, comments

Data to generate

Positive/negative label

 

 

 

 

 

URL

 

https://movie.daum.net

 


 

 

 

 

 

<  Word Cloud Visualization  >

r"""

<Korean morphological analysis library: KoNLPy>
 - KoNLPy relies on Java-based analyzers, so a JDK must be installed and configured
 - Register environment variables (System > Advanced system settings > Advanced > Environment Variables)
  * JAVA_HOME : the JDK installation folder
  * Edit Path and add two entries : %JAVA_HOME%, %JAVA_HOME%\bin
 - Reboot the PC, then verify the setup : open a command prompt and run java and javac; if the help text is printed, the setup succeeded

<Library installation>
 - Install nltk : English natural language processing toolkit (used here alongside KoNLPy)
  * pip install nltk

 - Install the additional nltk data packages (download-based installation)
  ->> The data only needs to be downloaded once; other virtual environments can reuse it
  ->> In other virtual environments, only pip install nltk is needed
  > python
  > import nltk
  > nltk.download()
  > The NLTK Downloader window opens
  > Select the All Packages tab > double-click punkt, double-click stopwords
  > exit()


 - Install the word cloud library
  > pip install wordcloud

 - Before installing konlpy, install the library that lets Python recognize Java libraries
   > pip install JPype1 (the last character is the digit 1)
   > pip install konlpy

 - Delete the asterisk (*) in jvm.py
  * Location : C:\Users\user\anaconda3\envs\gj_env_01\Lib\site-packages\konlpy
  * Open jvm.py in Notepad
  * Find and delete the asterisk (*) inside the folder_suffix[] list > save > close


"""

 

 

cmd - javac
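
The same check can also be done from Python before importing konlpy. The following is a minimal sketch using only the standard library; the helper name `check_java_setup` is hypothetical and not part of KoNLPy or JPype.

```python
import os
import shutil


def check_java_setup():
    """Report whether JAVA_HOME is set and whether java/javac are on PATH."""
    return {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),  # None if the variable is not set
        "java": shutil.which("java"),              # None if java is not on PATH
        "javac": shutil.which("javac"),            # None if javac is not on PATH
    }


java_status = check_java_setup()
for name, value in java_status.items():
    print(f"{name}: {value if value else 'NOT FOUND'}")
```

If any entry prints NOT FOUND, the environment variable steps above are the first thing to recheck.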

 

 

 

 

NLTK Downloader

 

 

 

konlpy test

 

from konlpy.tag import Okt

 

okt = Okt()
okt

 

# Extract nouns from a sample sentence ("Hello~ This is Python. Hello")
okt.nouns("안녕 하세요~ 파이썬입니다. 안녕")

 

 

 

 

 

 

 

<  Frequency Analysis and Word Cloud Visualization of Positive/Negative Movie Reviews  >

 

 

Import libraries

### Import libraries
import pandas as pd

 

 

 

Load the dataset

### Load the dataset
# DataFrame variable name: df_org
df_org = pd.read_csv("./data/df_new.csv")
df_org

 

 

 

<  Filter the Data into Positive and Negative Reviews  >

 

 

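Filter positive reviews

### Filter positive reviews (label == 1)
# - DataFrame variable name: pos_reviews
# - .copy() creates an independent DataFrame, so editing its "comment"
#   column later does not trigger SettingWithCopyWarning

pos_reviews = df_org[df_org["label"] == 1].copy()
pos_reviews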

 

 

 

 

 

 

 

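Filter negative reviews

### Filter negative reviews (label == 0)
# - DataFrame variable name: neg_reviews
# - .copy() creates an independent DataFrame, so editing its "comment"
#   column later does not trigger SettingWithCopyWarning

neg_reviews = df_org[df_org["label"] == 0].copy()
neg_reviews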
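
Filtering with a boolean mask and then assigning into the result can raise SettingWithCopyWarning in some pandas versions, which is why taking an explicit copy is a common precaution. The sketch below shows the pattern on a small hypothetical DataFrame (`df_demo` stands in for the real `df_new.csv`, which is not reproduced here).

```python
import pandas as pd

# Hypothetical stand-in for df_new.csv with the two columns used in this post
df_demo = pd.DataFrame({
    "comment": ["최고의 영화!!", "시간 아까움..", "배우 연기 good"],
    "label": [1, 0, 1],
})

# Boolean mask + .copy(): each result owns its own data
pos_demo = df_demo[df_demo["label"] == 1].copy()
neg_demo = df_demo[df_demo["label"] == 0].copy()

# Editing the copy leaves the original DataFrame untouched
pos_demo.loc[:, "comment"] = pos_demo["comment"].str.upper()

print(len(pos_demo), len(neg_demo))
print(df_demo.loc[2, "comment"])
print(pos_demo.loc[2, "comment"])
```

The filtered frames keep the original row labels, so row 2 of `df_demo` is still addressed as row 2 in `pos_demo`.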

 

 

 

 

 

 

<  Review Data Preprocessing  >

Remove everything except Hangul from the positive and negative reviews

### Use the regular expression library
import re

 

 

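Remove everything except Hangul from the positive reviews

### Remove everything except Hangul from the positive reviews
# - [^ㄱ-ㅣ가-힣]+ matches each run of characters that are not Hangul
#   (the quantifier + belongs after the brackets, not inside them)
pos_reviews.loc[ : , "comment"] = [re.sub(r"[^ㄱ-ㅣ가-힣]+",
                                          " ",
                                          data) for data in pos_reviews["comment"]]
pos_reviews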

 

 

 

 

 

 

 

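Remove everything except Hangul from the negative reviews

### Remove everything except Hangul from the negative reviews
# - [^ㄱ-ㅣ가-힣]+ matches each run of characters that are not Hangul
#   (the quantifier + belongs after the brackets, not inside them)
neg_reviews.loc[ : , "comment"] = [re.sub(r"[^ㄱ-ㅣ가-힣]+",
                                          " ",
                                          data) for data in neg_reviews["comment"]]
neg_reviews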
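
Where the `+` sits in the pattern changes the result: inside the brackets it is a literal plus sign that the negated class keeps, while after the brackets it is a quantifier that collapses a run of non-Hangul characters into one space. A small sketch on made-up sample strings (the helper `keep_hangul` is hypothetical):

```python
import re

# "+" inside the brackets: a literal plus sign, excluded from removal
PATTERN_PLUS_INSIDE = re.compile(r"[^ㄱ-ㅣ가-힣+]")
# "+" after the brackets: a quantifier over runs of non-Hangul characters
PATTERN_PLUS_OUTSIDE = re.compile(r"[^ㄱ-ㅣ가-힣]+")


def keep_hangul(text, pattern=PATTERN_PLUS_OUTSIDE):
    """Replace everything the pattern matches with a single space."""
    return pattern.sub(" ", text)


sample = "정말 재미있어요!! 10/10 ㅋㅋ"
cleaned = keep_hangul(sample)
print(repr(cleaned))

plus_kept = keep_hangul("좋아요+1", PATTERN_PLUS_INSIDE)
plus_removed = keep_hangul("좋아요+1", PATTERN_PLUS_OUTSIDE)
print(repr(plus_kept), repr(plus_removed))
```

For noun extraction the extra spaces make no practical difference, but the plus sign surviving in the text is usually not what is intended.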

 

 

 

 

 

 

 

 

<  Extract Morphemes from the Positive and Negative Reviews  >

### Morphological analysis libraries
# jpype : lets Python use Java libraries
# - the analyzers that konlpy wraps, including Okt, run on the JVM
import jpype

# Okt (Open Korean Text) : a widely used Korean morphological analyzer
from konlpy.tag import Okt

 

 

 

Create the morphological analyzer object

### Create the morphological analyzer object
okt = Okt()
okt

 

 

 

 

 

 

 

Extract nouns from the positive reviews

### Extract nouns from the positive reviews
# - List that will hold only the nouns
pos_comment_nouns = []

for cmt in pos_reviews["comment"] :
    # print(okt.nouns(cmt))
    ### extend() : adds each element of the given list individually (flat result)
    # - append() : adds the given list itself as a single element (nested result)
    pos_comment_nouns.extend(okt.nouns(cmt))

print(pos_comment_nouns)

 

 

 

 

 

 

Extract nouns from the negative reviews

### Extract nouns from the negative reviews
# - List that will hold only the nouns
neg_comment_nouns = []

for cmt in neg_reviews["comment"] :
    # print(okt.nouns(cmt))
    ### extend() : adds each element of the given list individually (flat result)
    # - append() : adds the given list itself as a single element (nested result)
    neg_comment_nouns.extend(okt.nouns(cmt))

print(neg_comment_nouns)
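
The difference between extend() and append() noted in the comments is easiest to see side by side. In this sketch the per-review noun lists are hypothetical stand-ins for what `okt.nouns(cmt)` would return, so it runs without konlpy.

```python
# Hypothetical per-review noun lists, standing in for okt.nouns(cmt) results
nouns_per_review = [["영화", "배우"], ["연기"], []]

flat, nested = [], []
for nouns in nouns_per_review:
    flat.extend(nouns)    # adds each noun individually
    nested.append(nouns)  # adds the whole list as one element

print(flat)
print(nested)
```

Counter needs the flat form, since lists are not hashable and cannot be counted as elements.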

 

 

 

 

 

<  Exclude All One-Character Words from the Positive and Negative Nouns  >

### Exclude one-character nouns from the positive reviews
pos_comment_nouns2 = [w for w in pos_comment_nouns if len(w) > 1]
print(pos_comment_nouns2)

 

 

 

 

 

### Exclude one-character nouns from the negative reviews
neg_comment_nouns2 = [w for w in neg_comment_nouns if len(w) > 1]
print(neg_comment_nouns2)

 

 

 

 

 

 

 

<  Frequency Analysis of Positive and Negative Nouns  >

### Word-count library for the review nouns
from collections import Counter

 

 

Count the positive nouns

### Count the positive nouns
pos_word_count = Counter(pos_comment_nouns2)
print(pos_word_count)

 

 

 

 

Count the negative nouns

### Count the negative nouns
neg_word_count = Counter(neg_comment_nouns2)
print(neg_word_count)

 

 

 

 

 

 

<  Extract Only the Top 20 Words by Count for Positive and Negative Reviews  >

Positive -> extract the top 20 words by count

### Positive -> extract the top 20 words by count
# - Counter provides most_common(20), which sorts by count in descending
#   order and returns the top 20 (word, count) pairs
# - dict() turns those pairs into a {word: count} dictionary
pos_top_20 = dict(pos_word_count.most_common(20))
pos_top_20

 

 

 

 

 

 

 

Negative -> extract the top 20 words by count

### Negative -> extract the top 20 words by count
# - Counter provides most_common(20), which sorts by count in descending
#   order and returns the top 20 (word, count) pairs
# - dict() turns those pairs into a {word: count} dictionary
neg_top_20 = dict(neg_word_count.most_common(20))
neg_top_20
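
To make the behavior of most_common() concrete, here is a small sketch on a hypothetical noun list (the words and counts are invented for illustration, not taken from the review data).

```python
from collections import Counter

# Hypothetical noun list standing in for pos_comment_nouns2
nouns = ["영화", "연기", "영화", "배우", "연기", "영화", "감동"]

word_count = Counter(nouns)

# most_common(n) returns a list of (word, count) tuples, highest count first
top_2 = word_count.most_common(2)

# dict() keeps that order, giving a {word: count} mapping
top_2_dict = dict(top_2)

print(top_2)
print(top_2_dict)
```

Because dictionaries preserve insertion order, iterating over the result later (for example when drawing bars) still goes from most to least frequent.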

 

 

 

 

 

 

<  Frequency Visualization of the Top 20 Positive and Negative Nouns  >

### Visualization library
import matplotlib.pyplot as plt

### Font setting (Malgun Gothic renders Hangul on Windows)
plt.rc("font", family="Malgun Gothic")

### Minus sign setting (keeps the minus sign from rendering as a broken glyph)
plt.rcParams["axes.unicode_minus"] = False

 

 

 

Bar chart of the top 20 positive nouns by frequency

### Frequency visualization with a bar chart
plt.figure(figsize=(10, 5))

### Title ("Top 20 word frequencies in positive reviews")
plt.title("긍정 리뷰의 단어 상위 (20개) 빈도 시각화", fontsize=17)

### Draw the bars
for key, value in pos_top_20.items() :
    ### The word "영화" (movie) carries little meaning in movie reviews
    # - skip it
    if key == "영화" :
        continue
    plt.bar(key, value, color="lightgrey")

### x-axis and y-axis labels ("review noun", "frequency (count)")
plt.xlabel("리뷰 명사")
plt.ylabel("빈도(count)")

### Rotate the x-axis tick labels
plt.xticks(rotation=70)

### Show the graph
plt.show()

 

 

 

 

 

 

 

 

 

 

Bar chart of the top 20 negative nouns by frequency

### Frequency visualization with a bar chart
plt.figure(figsize=(10, 5))

### Title ("Top 20 word frequencies in negative reviews")
plt.title("부정 리뷰의 단어 상위 (20개) 빈도 시각화", fontsize=17)

### Draw the bars
for key, value in neg_top_20.items() :
    ### The word "영화" (movie) carries little meaning in movie reviews
    # - skip it
    if key == "영화" :
        continue
    plt.bar(key, value, color="lightgrey")

### x-axis and y-axis labels ("review noun", "frequency (count)")
plt.xlabel("리뷰 명사")
plt.ylabel("빈도(count)")

### Rotate the x-axis tick labels
plt.xticks(rotation=70)

### Show the graph
plt.show()
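
Skipping "영화" inside the plotting loop works, but the same exclusion could also be done once, before plotting, so the filtered result can be reused elsewhere. A minimal sketch, assuming a hypothetical helper `drop_stopwords` and invented counts:

```python
def drop_stopwords(freqs, stopwords):
    """Return a new {word: count} dict without the given stopwords."""
    return {word: count for word, count in freqs.items() if word not in stopwords}


# Hypothetical counts standing in for pos_top_20 / neg_top_20
top_words = {"영화": 120, "연기": 45, "배우": 30}

filtered = drop_stopwords(top_words, stopwords={"영화"})
words, counts = list(filtered.keys()), list(filtered.values())

print(words)
print(counts)
```

With the words and counts as two lists, a single `plt.bar(words, counts, color="lightgrey")` call draws all bars at once, and the original dictionary is left unchanged.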

 

 

 

 

 

 

 

 

 

 

<  Word Cloud Visualization of Positive and Negative Review Words  >

### Word cloud library
from wordcloud import WordCloud

 

 

 

Word cloud of the positive review words

### Word cloud of the positive review words
plt.figure(figsize=(8, 8))

### Figure title ("[Positive] review word cloud")
plt.title("[긍정] 리뷰 단어 워드클라우드 시각화")

### Font file to use (a Hangul-capable font is required)
font_path = "C:/Windows/Fonts/malgunsl.ttf"

### Word cloud properties
wc = WordCloud(
            ### Font
            font_path=font_path,
            ### Background color
            background_color="ivory",
            ### Image width
            width=800,
            ### Image height
            height=600
        )

### Feed data into the word cloud
# - generate_from_frequencies() : builds the cloud from a {word: count} mapping
# cloud = wc.generate_from_frequencies(pos_top_20)

### Use all positive words
cloud = wc.generate_from_frequencies(pos_word_count)

### Show the word cloud image
plt.imshow(cloud)

### Hide the x and y axes
plt.axis("off")

### Save (the ./img folder must already exist)
plt.savefig("./img/긍정_리뷰_단어_워드클라우드_시각화.png")

### Show the figure
plt.show()

 

 

 

 

 

 

 

 

Word cloud of the negative review words

### Word cloud of the negative review words
plt.figure(figsize=(8, 8))

### Figure title ("[Negative] review word cloud")
plt.title("[부정] 리뷰 단어 워드클라우드 시각화")

### Font file to use (a Hangul-capable font is required)
font_path = "C:/Windows/Fonts/malgunsl.ttf"

### Word cloud properties
wc = WordCloud(
            ### Font
            font_path=font_path,
            ### Background color
            background_color="black",
            ### Image width
            width=800,
            ### Image height
            height=600
        )

### Feed data into the word cloud
# - generate_from_frequencies() : builds the cloud from a {word: count} mapping
# cloud = wc.generate_from_frequencies(neg_top_20)

### Use all negative words
cloud = wc.generate_from_frequencies(neg_word_count)

### Show the word cloud image
plt.imshow(cloud)

### Hide the x and y axes
plt.axis("off")

### Save (the ./img folder must already exist)
plt.savefig("./img/부정_리뷰_단어_워드클라우드_시각화.png")

### Show the figure
plt.show()
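
Two things commonly go wrong in the cells above: the font file is missing on the machine, or the folder passed to savefig does not exist yet. The sketch below shows one way to guard against both with the standard library; `prepare_output_path` and `font_available` are hypothetical helper names, and the demo writes only into a temporary directory.

```python
import tempfile
from pathlib import Path


def prepare_output_path(directory, filename):
    """Create the output directory if needed and return the full file path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out_dir / filename


def font_available(font_path):
    """Return True if the font file exists at the given path."""
    return Path(font_path).is_file()


# Demo in a temporary directory so nothing is written to the project folder
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_output_path(Path(tmp) / "img", "wordcloud.png")
    created = target.parent.is_dir()

print(created)
print(font_available("C:/Windows/Fonts/malgunsl.ttf"))
```

In the notebook, the returned path could be passed straight to `plt.savefig(...)`, and the font check could run before constructing `WordCloud` to give a clearer error message.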

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
