[Python] 1. Web Crawling: Collecting Data from the Daum Movie Site

by ITyranno 2023. 12. 8.

Let's explore the world of programming.

<  Web Crawling the Daum Movie Site  >

Data to collect

Movie title, rating, comment

Data to generate

Positive/negative label

URL

 

https://movie.daum.net

 


Data is collected from Daum Movie > Ranking > Box Office > Monthly

Web crawling libraries

 - Static web crawling : BeautifulSoup
   : used when collecting only what is visible on a single page
   
 - Dynamic web crawling : selenium
   : used when collecting while moving between pages through events such as clicks
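
To make the "static" case concrete, here is a minimal sketch that pulls link text out of a fixed HTML string. It uses Python's built-in html.parser as a stand-in for BeautifulSoup so that it runs with no extra installs. The HTML snippet, the class name TitleLinkParser, and the movie names are made up for illustration and are not taken from the Daum page.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect the text of every <a> that sits inside a <strong> tag."""

    def __init__(self):
        super().__init__()
        self._in_strong = False
        self._in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
        elif tag == "a" and self._in_strong:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        # Keep only non-empty text found inside <strong><a>...</a></strong>
        if self._in_link and data.strip():
            self.titles.append(data.strip())


# Hypothetical page content: everything needed is already in the HTML,
# so no clicks or page transitions are required.
html = """<ol>
<li><strong><a href="/m/1">Movie A</a></strong></li>
<li><strong><a href="/m/2">Movie B</a></strong></li>
</ol>"""

parser = TitleLinkParser()
parser.feed(html)
print(parser.titles)
```

When the data only appears after clicking a tab or a [More] button, this approach is not enough, which is why the rest of this post uses selenium.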

 

 

 

 

Installing selenium

Installation required : pip install selenium

### Installation required : pip install selenium
# Library for handling dynamic web pages
from selenium import webdriver

# Provides the locator strategies (By) used to find data within a page
from selenium.webdriver.common.by import By

# Time library
import time

 

 

 

Opening the Daum page

Running the code below launches Chrome and opens the page.

### Launch the Chrome browser
# - Controls the browser
driver = webdriver.Chrome()

### Open the page by URL
# - get() : loads the page and reads in its HTML
# - the driver object holds all of the page information
driver.get("https://movie.daum.net/ranking/boxoffice/monthly")

 

 

 

 

 

 

 

 

 

Collecting reviews from the [Ratings] tab without expanding [More]

Once all web crawling is finished, the driver must be quit. (Always!!!)

driver.quit()
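
One way to make sure quit() always runs is to put it in a finally block and guard against the case where the browser never started. The sketch below shows just that pattern. FakeDriver is a hypothetical stand-in for webdriver.Chrome() so the example runs without a browser, and the simulated load failure is made up for illustration.

```python
class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome(), so this runs without a browser."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        # Simulate a failure partway through the crawl.
        raise RuntimeError(f"could not load {url}")

    def quit(self):
        self.closed = True


driver = None
error_message = None
try:
    driver = FakeDriver()
    driver.get("https://movie.daum.net/ranking/boxoffice/monthly")
except Exception as e:
    error_message = str(e)
finally:
    # Runs on success and on failure. The None check covers the case
    # where the driver was never created in the first place.
    if driver is not None:
        driver.quit()

print(error_message, driver.closed)
```

With a real driver the same shape applies: create it inside try, and clean up only in finally.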

 

 

 

Saving the location of the movie title

- Chrome browser > F12 (developer mode) > right-click a movie title > click [Inspect] > hover over the a tag and right-click > Copy > click Copy selector
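
Copy selector returns a path that points at one specific list item (li:nth-child(1)). Removing that index makes the same selector match every item in the list. A small sketch of that cleanup, assuming the copied string looks like the one used in this post; the helper name generalize is made up for illustration:

```python
import re

# Selector as copied from the developer tools for the first movie title
copied = (
    "#mainContent > div > div.box_boxoffice > ol > li:nth-child(1)"
    " > div > div.thumb_cont > strong > a"
)


def generalize(selector):
    """Drop every :nth-child(n) so the selector matches all siblings."""
    return re.sub(r":nth-child\(\d+\)", "", selector)


movie_path = generalize(copied)
print(movie_path)
```

Note that this removes every :nth-child(...) in the string. When the index is intentional, as with a selector that targets only the fourth tab, keep the copied selector as it is.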

 

 

 

 

 

 

Saving the location of the movie rating

- Chrome browser > F12 (developer mode) > right-click a rating > click [Inspect] > right-click the highlighted tag > Copy > click Copy selector

 

 

 

 

try:
    ### Launch the Chrome browser
    # - Controls the browser
    driver = webdriver.Chrome()
    
    ### Open the page by URL
    # - get() : loads the page and reads in its HTML
    # - the driver object holds all of the page information
    driver.get("https://movie.daum.net/ranking/boxoffice/monthly")
    
    ### Get the HTML tag path of the element that holds the title
    # - Chrome browser > F12 (developer mode) > right-click a movie title > click [Inspect]
    #   > hover over the a tag and right-click > Copy > click Copy selector
    # - the location of that title is copied
    
    ### Path to the a tag
    movie_path = "#mainContent > div > div.box_boxoffice > ol > li > div > div.thumb_cont > strong > a"
    
    # Selector as copied; :nth-child(1) was removed from li above so that every item matches:
    # #mainContent > div > div.box_boxoffice > ol > li:nth-child(1) > div > div.thumb_cont > strong > a
    
    ### Extract every movie title currently shown in the Chrome browser
    # - find_element() : returns one match, find_elements() : returns all matches (as a list)
    # - By.CSS_SELECTOR : tells selenium to read the path as a CSS selector
    movie_elements = driver.find_elements(By.CSS_SELECTOR, movie_path)
    # print(f"movie_elements Length = {len(movie_elements)}")
    # print(f"title[0] => {movie_elements[0].text}")
    # print(f"title[1] => {movie_elements[1].text}")
    # print(f"movie_elements(titles) = {movie_elements}")


    ### -------------------------------

    ### Save the collected data to a txt file
    # - the ./data directory must already exist
    f = open("./data/movie_reviews.txt","w",encoding="UTF-8")
    

    ### -------------------------------
    ### Process only the first 10 titles
    for i in range(10) :
        title = movie_elements[i].text.strip()
        print(f"No[{i}] / title[{title}] Start ----------->>>")

        ### Click the title to move to the detail page
        # - equivalent to clicking the title with the mouse
        # - fires a click() event
        movie_elements[i].click()

        ### Get the handle of the detail page that was just opened
        # - this is what gives access to the detail page
        # - window_handles : list of window handles, in the order the windows were opened
        #                  : -1 refers to the last one in the list
        movie_handle = driver.window_handles[-1]
        # - switch to the newly opened page
        driver.switch_to.window(movie_handle)

        ### Give the page time to load and its HTML to be read
        time.sleep(1)

        ### --------------------------------
        ### Click the [Ratings] tab
        tab_score_path = "#mainContent > div > div.box_detailinfo > div.tabmenu_wrap > ul > li:nth-child(4) > a"
        ### Get the a tag
        tab_score_element = driver.find_element(By.CSS_SELECTOR, tab_score_path)
        ### Click the [Ratings] tab, i.e. the a tag
        tab_score_element.click()

        ### Get the handle of the [Ratings] page
        tab_score_handle = driver.window_handles[-1]
        # - switch to the newly opened page
        driver.switch_to.window(tab_score_handle)

        ### Give the page time to load and its HTML to be read
        time.sleep(1)

        
        ### ------------------------------------
        ### [Ratings] Click the [More] button to expand all reviews
        ### - counter for the number of expansions
        more_view_cnt = 0

        ### Expand everything ([More])
        while True :
            try:
                ### The selector is left empty in this version, so find_element()
                ### fails on the first pass and nothing is expanded
                more_view_path = ""
                more_view_element = driver.find_element(By.CSS_SELECTOR, more_view_path)
                more_view_element.click()

                ### After clicking [More], get the current window handle
                movie_handle = driver.window_handles[-1]
                # - switch to the newly opened page
                driver.switch_to.window(movie_handle)
        
                ### Give the page time to load and its HTML to be read
                time.sleep(1)

                ### Increment by 1 to track the number of [More] clicks
                more_view_cnt += 1
                    
            except Exception as e :
                ### An error is raised once the [More] button can no longer be found
                # - the error marks the point where there is nothing left to expand
                break

        ### Check the number of [More] clicks
        print(f"Number of [More] clicks : [{more_view_cnt}]")


        ### ------------------------
        ### Extract all ratings
        #comment922467050 > div > div.ratings.rating_10
        score_path = "ul.list_comment div.ratings"
        score_lists = driver.find_elements(By.CSS_SELECTOR, score_path)
        # print(f"Number of ratings : {len(score_lists)}")

        ### Extract all reviews
        #comment922467050 > div > p
        comment_path = "ul.list_comment p.desc_txt"
        comment_lists = driver.find_elements(By.CSS_SELECTOR, comment_path)
        # print(f"Number of reviews : {len(comment_lists)}")

        ### ----------------------
        ### Extract each rating and review
        # - use the rating to generate a positive/negative value
        for j in range(len(score_lists)):
            ### Extract the rating
            score = score_lists[j].text.strip()

            ### Extract the review
            comment = comment_lists[j].text.strip().replace("\n", "")

            ### Generate the positive/negative label from the rating
            # - positive : rating of 8 or higher, labeled 1
            # - negative : rating of 4 or lower, labeled 0
            # - other    : everything else, labeled 2
            label = 0
            if int(score) >= 8 :
                label = 1
            elif int(score) <= 4 :
                label = 0
            else :
                label = 2

            ### Print each record to check it
            print(f"{title} \t{score} \t{comment} \t{label} \n")

            ### Write to the file
            f.write(f"{title}\t{score}\t{comment}\t{label}\n")

        ### -------------------------
        ### When one movie is finished, go back to the main page
        # - execute_script() : runs JavaScript in the browser
        driver.execute_script("window.history.go(-2)")
        time.sleep(1)
            

    ### The driver must be quit once all crawling is finished (always!!!)
    # - this is done in the finally block below

except Exception as e :
    print(e)

finally :
    ### Close the file
    # - finally runs whether or not an error occurred, so cleanup lives only here
    f.close()
    
    ### The driver must be quit once all crawling is finished (always!!!)
    driver.quit()
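
The labeling rule in the loop above (8 or higher is positive, 4 or lower is negative, the rest is "other") can also be pulled out into a small function, which makes it easy to check on its own. This is only a sketch: the function name label_from_score and the sample ratings are made up for illustration.

```python
def label_from_score(score_text):
    """Map a rating string to 1 (positive), 0 (negative) or 2 (other)."""
    score = int(score_text.strip())
    if score >= 8:
        return 1
    if score <= 4:
        return 0
    return 2


# Sample rating strings, as they might come back from element.text
labels = [label_from_score(s) for s in ["10", " 8", "5", "4", "0"]]
print(labels)  # [1, 1, 2, 0, 0]
```

Like the original code, this raises ValueError when the rating text is not a number, so an empty rating would still need to be handled by the caller.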

 

 

 

 

 

 

 

 

 

Collecting reviews from the [Ratings] tab by clicking [More] to expand

 

 

try:
    ### Launch the Chrome browser
    # - Controls the browser
    driver = webdriver.Chrome()
    
    ### Open the page by URL
    # - get() : loads the page and reads in its HTML
    # - the driver object holds all of the page information
    driver.get("https://movie.daum.net/ranking/boxoffice/monthly")
    
    ### Get the HTML tag path of the element that holds the title
    # - Chrome browser > F12 (developer mode) > right-click a movie title > click [Inspect]
    #   > hover over the a tag and right-click > Copy > click Copy selector
    # - the location of that title is copied
    
    ### Path to the a tag
    movie_path = "#mainContent > div > div.box_boxoffice > ol > li > div > div.thumb_cont > strong > a"
    
    # Selector as copied; :nth-child(1) was removed from li above so that every item matches:
    # #mainContent > div > div.box_boxoffice > ol > li:nth-child(1) > div > div.thumb_cont > strong > a
    
    ### Extract every movie title currently shown in the Chrome browser
    # - find_element() : returns one match, find_elements() : returns all matches (as a list)
    # - By.CSS_SELECTOR : tells selenium to read the path as a CSS selector
    movie_elements = driver.find_elements(By.CSS_SELECTOR, movie_path)
    # print(f"movie_elements Length = {len(movie_elements)}")
    # print(f"title[0] => {movie_elements[0].text}")
    # print(f"title[1] => {movie_elements[1].text}")
    # print(f"movie_elements(titles) = {movie_elements}")


    ### -------------------------
    ### Save the collected data to a txt file
    # - the ./data directory must already exist
    f = open("./data/movie_reviews.txt","w",encoding="UTF-8")
    

    ### -------------------------------
    ### Process only the first 10 titles
    for i in range(10) :
        title = movie_elements[i].text.strip()
        print(f"No[{i}] / title[{title}] Start ----------->>>")

        ### Click the title to move to the detail page
        # - equivalent to clicking the title with the mouse
        # - fires a click() event
        movie_elements[i].click()

        ### Get the handle of the detail page that was just opened
        # - this is what gives access to the detail page
        # - window_handles : list of window handles, in the order the windows were opened
        #                  : -1 refers to the last one in the list
        movie_handle = driver.window_handles[-1]
        # - switch to the newly opened page
        driver.switch_to.window(movie_handle)

        ### Give the page time to load and its HTML to be read
        time.sleep(1)

        ### --------------------------------
        ### Click the [Ratings] tab
        tab_score_path = "#mainContent > div > div.box_detailinfo > div.tabmenu_wrap > ul > li:nth-child(4) > a"
        ### Get the a tag
        tab_score_element = driver.find_element(By.CSS_SELECTOR, tab_score_path)
        ### Click the [Ratings] tab, i.e. the a tag
        tab_score_element.click()

        ### Get the handle of the [Ratings] page
        tab_score_handle = driver.window_handles[-1]
        # - switch to the newly opened page
        driver.switch_to.window(tab_score_handle)

        ### Give the page time to load and its HTML to be read
        time.sleep(1)

        
        ### ------------------------------------
        ### [Ratings] Click the [More] button to expand all reviews
        ### - counter for the number of expansions
        more_view_cnt = 0

        ### Expand everything ([More])
        while True :
            try:
                more_view_path = "#alex-area button.link_fold"
                more_view_element = driver.find_element(By.CSS_SELECTOR , more_view_path)
                more_view_element.click()
                
                ### Switch to the newly opened page: get its handle and select it
                more_view_handle = driver.window_handles[-1]
                driver.switch_to.window(more_view_handle)
                time.sleep(1)

                # Increment by 1 to track the number of [More] clicks
                more_view_cnt += 1
                
                ### Temporarily stop after 2 clicks
                if more_view_cnt == 2:
                    break
                
            except Exception as e:
                ### An error is raised once the [More] button can no longer be found
                # - the error marks the point where there is nothing left to expand
                break
                
        ### Check the number of [More] clicks
        print(f'Number of [More] clicks:  {more_view_cnt}')


        ### ------------------------
        ### Extract all ratings
        #comment922467050 > div > div.ratings.rating_10
        score_path = "ul.list_comment div.ratings"
        score_lists = driver.find_elements(By.CSS_SELECTOR, score_path)
        # print(f"Number of ratings : {len(score_lists)}")

        ### Extract all reviews
        #comment922467050 > div > p
        comment_path = "ul.list_comment p.desc_txt"
        comment_lists = driver.find_elements(By.CSS_SELECTOR, comment_path)
        # print(f"Number of reviews : {len(comment_lists)}")

        ### ----------------------
        ### Extract each rating and review
        # - use the rating to generate a positive/negative value


        ### A rating or a review may be missing, so
        # - use the smaller of the two list lengths
        # - entries without a rating or review are left out of the collection
        for_cnt = min(len(score_lists), len(comment_lists))
            
        for j in range(for_cnt):
            ### Extract the rating
            score = score_lists[j].text.strip()

            ### Extract the review
            comment = comment_lists[j].text.strip().replace("\n", "")

            ### Generate the positive/negative label from the rating
            # - positive : rating of 8 or higher, labeled 1
            # - negative : rating of 4 or lower, labeled 0
            # - other    : everything else, labeled 2
            label = 0
            if int(score) >= 8 :
                label = 1
            elif int(score) <= 4 :
                label = 0
            else :
                label = 2

            ### Print each record to check it
            print(f"{title} \t{score} \t{comment} \t{label} \n")

            ### Write to the file
            f.write(f"{title}\t{score}\t{comment}\t{label}\n")

        ### -------------------------
        ### When one movie is finished, go back to the main page
        # - execute_script() : runs JavaScript in the browser
        driver.execute_script("window.history.go(-2)")
        time.sleep(1)
            

    ### The driver must be quit once all crawling is finished (always!!!)
    # - this is done in the finally block below

except Exception as e :
    print(e)

finally :
    ### Close the file
    # - finally runs whether or not an error occurred, so cleanup lives only here
    f.close()
    
    ### The driver must be quit once all crawling is finished (always!!!)
    driver.quit()
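
The "use the shorter list" step and the tab-separated write can also be sketched with zip() and the csv module. zip() stops at the shorter of the two lists, which gives the same cutoff as comparing the lengths by hand. The sample ratings, reviews, and "Sample Title" below are made up, and the output goes to an in-memory buffer instead of ./data/movie_reviews.txt so the sketch runs anywhere.

```python
import csv
import io

# Made-up sample data: three ratings but only two reviews
scores = ["10", "3", "7"]
comments = ["Great movie", "Not for me"]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")

# zip() pairs items by position and stops when the shorter list runs out
for score, comment in zip(scores, comments):
    writer.writerow(["Sample Title", score, comment.replace("\n", " ")])

rows = buffer.getvalue().splitlines()
print(rows)
```

Pairing by position assumes the missing item is at the end of the list. If a review in the middle has no rating, the pairs after it would be shifted, and that applies to the length comparison in the code above as well.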

 

 

 

 

 

 

 

 

 

 

 

 
