[Python] Machine Learning 05: Multiple Regression Model and Feature Engineering (PolynomialFeatures, Regularization, Ridge, Lasso)

by ITyranno 2023. 12. 27.

Let's explore the world of programming.

 

 

 

 

 

 

 

 

 

Multiple Regression Model

 - A regression model that uses several features
 - The more features there are, the more complex the model becomes (training takes longer, although on a fast enough system it may still be quick)
 - Multiple regression formula
   y = a*x1 + b*x2 + c*x3 + ... + intercept
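
As a minimal sketch of the formula above, a prediction is just a weighted sum of the features plus the intercept. The coefficient, intercept, and feature values below are made up purely for illustration and do not come from the perch data.

```python
import numpy as np

# Hypothetical coefficients (a, b, c) and intercept, chosen only for illustration
demo_coef = np.array([2.0, 0.5, -1.0])
demo_intercept = 10.0

# One sample with three features (x1, x2, x3)
demo_x = np.array([3.0, 4.0, 5.0])

# y = a*x1 + b*x2 + c*x3 + intercept
demo_y = demo_coef @ demo_x + demo_intercept
print(demo_y)
```

A fitted LinearRegression stores the same two pieces in its coef_ and intercept_ attributes.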

 

 

 

 

Loading the data

### Load the data
# DataFrame variable to use: df
import pandas as pd

df = pd.read_csv("./data/03_농어의_길이_높이_두께_데이터.csv")
df.info()
df.head()
df.describe()

 

 

 

 

 

 

Predicting weight from the perch's length, thickness, and height

 - Independent variables: length, thickness, height
 - Dependent variable: weight

 

 

 

Creating the independent variables

"""
Create the independent variables
 - The DataFrame columns used as independent variables
 - must be arranged as a 2-D list or array.
"""

perch_full = df.to_numpy()
perch_full.shape

 

 

 

 

 

Creating the dependent variable

"""
Create the dependent variable
"""

import numpy as np

### Perch weights
perch_weight = np.array(
                        [5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0, 
                         110.0, 115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0, 
                         130.0, 150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0, 
                         197.0, 218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0, 
                         514.0, 556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0, 
                         820.0, 850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0, 
                         1000.0, 1000.0]
                         )

perch_weight.shape

 

 

 

 

 

 

Splitting into training and test data

"""
Split into training and test data
 - Split rule: 30% test data, random seed 42
"""
from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(perch_full,
                                                                      perch_weight,
                                                                      test_size=0.3,
                                                                      random_state=42)
print(f"{train_input.shape}, {train_target.shape}")
print(f"{test_input.shape}, {test_target.shape}")

 

 

 

 

 

Creating the model

""" Create the model """
### Import and instantiate the regression model (class)
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr

""" Train the model """
### Train (fit) the model
# - Linear regression is a supervised learning model: it uses both the independent and dependent variables
# - Training function: fit()
lr.fit(train_input, train_target)

""" Check the training and validation (test) accuracy (R^2) """

train_r2 = lr.score(train_input, train_target)
test_r2 = lr.score(test_input, test_target)

train_r2, test_r2

"""Judging overfitting and underfitting
  - Judging by the training and validation (test) R^2 scores, underfitting did not occur;
    the gap between them is only about 0.07 to 0.08, so the model can also be regarded as generalized, without overfitting.
  - Still, consider whether the validation (test) accuracy can be raised from the 0.8 range to the 0.9 range.

  - Apply feature engineering to add features, giving the model more to learn from, and check whether performance improves.
"""

 

 

 

Library for generating features

   - Package: sklearn.preprocessing
   - Class: PolynomialFeatures (usually called a transformer)
   - Functions: fit (finds the pattern of features to generate from the training independent variables), transform (generates features using the found pattern)
   - The dependent variable is not used.

"""Predict on the test data"""
test_pred =  lr.predict(test_input)
test_pred

"""Check the mean absolute error (MAE)"""
from sklearn.metrics import mean_absolute_error
mean_absolute_error(test_target, test_pred)
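
For reference, MAE is simply the average of the absolute differences between the targets and the predictions. The sketch below checks this by hand against mean_absolute_error; the three target and prediction values are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up targets and predictions (grams), for illustration only
demo_true = np.array([100.0, 250.0, 400.0])
demo_pred = np.array([110.0, 240.0, 430.0])

# MAE = mean(|target - prediction|)
mae_manual = np.mean(np.abs(demo_true - demo_pred))
mae_sklearn = mean_absolute_error(demo_true, demo_pred)
print(mae_manual, mae_sklearn)
```

Because MAE is in the same unit as the target, it can be read directly as "off by this many grams on average".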

 

 

 

 

 

 

PolynomialFeatures (usually called a transformer)

 

 

### Import the package
from sklearn.preprocessing import PolynomialFeatures

 

 

### Create the class
# - By default, a bias column (a constant 1 that plays the role of the intercept) is generated along with the features.
# - Only the features are needed, so to leave the bias column out
#    -> set include_bias=False.
poly = PolynomialFeatures(include_bias=False)
poly

 

 

 

 

 

 

First, check with sample data which features get generated

### First, check with sample data which features get generated
temp_data = [[2, 3, 4]]
temp_data

 

 

 

 

 

 

Finding the pattern for generating features

### Find the pattern for generating features
poly.fit(temp_data)

 

 

 

 

 

Generating features with the found pattern

### Generate features with the found pattern
poly.transform(temp_data)
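
To make the generated pattern concrete, the sketch below rebuilds the degree-2 features for the sample [2, 3, 4] by hand and compares them with PolynomialFeatures. It assumes include_bias=False, so no constant 1 column is produced; the variable names are illustrative.

```python
from itertools import combinations_with_replacement

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

demo_sample = [2, 3, 4]

# Degree-1 terms are the original values themselves
manual_features = list(demo_sample)

# Degree-2 terms: every product of two features, squares included
for i, j in combinations_with_replacement(range(len(demo_sample)), 2):
    manual_features.append(demo_sample[i] * demo_sample[j])

sklearn_features = PolynomialFeatures(degree=2, include_bias=False).fit_transform([demo_sample])[0]

print(manual_features)    # [2, 3, 4, 4, 6, 8, 9, 12, 16]
print(sklearn_features)
```

So three inputs become nine columns: the three originals, their three squares, and the three pairwise products.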

 

 

 

 

 

 

 

Generating features from the actual independent variables

### Create the class
# - degree=2 : the maximum degree of the generated terms; 2 adds squares and pairwise products
#            : 3 additionally generates third-degree terms (cubes and three-way products)
#            : 4 additionally generates fourth-degree terms
#            : default is 2 (used when degree is omitted)
poly = PolynomialFeatures(degree=2, include_bias=False)
poly
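
As a rough sketch of how quickly the feature count grows with degree, the loop below fits PolynomialFeatures on a dummy row with three inputs and reads n_output_features_. The dummy data and variable names are assumptions for illustration; only the number of input columns matters here.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_inputs = 3                          # length, height, thickness
X_dummy = np.ones((1, n_inputs))      # values are irrelevant for counting

feature_counts = {}
for degree in (2, 3, 4):
    demo_poly = PolynomialFeatures(degree=degree, include_bias=False).fit(X_dummy)
    feature_counts[degree] = demo_poly.n_output_features_
    # Closed form: C(n + degree, degree) terms including the constant, minus 1 for the bias
    assert feature_counts[degree] == comb(n_inputs + degree, degree) - 1

print(feature_counts)   # {2: 9, 3: 19, 4: 34}
```

With only a few dozen training samples, a high degree can produce nearly as many features as samples, which is where overfitting becomes a concern.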

 

 

 

 

 

### Find the pattern
# - Uses the training independent variables
poly.fit(train_input)

 

 

 

 

 

Generating the features

### Generate the features
### Add features to the training independent variables
train_poly = poly.transform(train_input)

### Add features to the test independent variables
test_poly = poly.transform(test_input)

train_poly.shape, test_poly.shape

 

 

 

 

 

Checking the pattern used

### Check the pattern used
poly.get_feature_names_out()

 

 

 

 

 

### Create the model
lr = LinearRegression()
lr

### Train the model
lr.fit(train_poly, train_target)

### Check the training and test (validation) accuracy
train_r2 = lr.score(train_poly, train_target)
test_r2 = lr.score(test_poly, test_target)

### Predict
test_pred = lr.predict(test_poly)

### Evaluate the model (MAE)
mae = mean_absolute_error(test_target, test_pred)

train_r2, test_r2, mae

 

 

 

 

 

Interpretation

 - The model without feature engineering had somewhat low validation (test) accuracy, with an error of about 50g,
 - whereas after adding features through feature engineering
   -> both training and validation (test) accuracy rose, with no overfitting, giving a generalized model
   -> with an error of about 30g, which is judged to be a very good model.
 - To use this model, the three independent-variable features (length, thickness, height) must be supplied, and they must be expanded with degree 2 at feature-generation time.

 

 

 

Changing to degree 3

 

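### Recreate the transformer with degree=3 and regenerate the features
poly = PolynomialFeatures(degree=3, include_bias=False)
train_poly = poly.fit_transform(train_input)
test_poly = poly.transform(test_input)

### Check the pattern used
poly.get_feature_names_out()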

 

 

 

 

### Create the model
lr = LinearRegression()
lr

### Train the model
lr.fit(train_poly, train_target)

### Check the training and test (validation) accuracy
train_r2 = lr.score(train_poly, train_target)
test_r2 = lr.score(test_poly, test_target)

### Predict
test_pred = lr.predict(test_poly)

### Evaluate the model (MAE)
mae = mean_absolute_error(test_target, test_pred)

train_r2, test_r2, mae

 

 

 

 

 

 

 

Regularization

  - Of overfitting and underfitting, it is mainly used when overfitting has occurred.
  - Training accuracy tends to drop slightly, but validation (test) accuracy tends to improve.
  - It is a method mainly used to generalize a trained model.
  - Models that build in regularization: Ridge and Lasso.
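
One way to see what regularization does is to watch the size of the learned coefficients shrink as the penalty grows. The sketch below uses synthetic data (not the perch data) and Ridge's alpha parameter; the data, seed, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = X_demo @ np.array([4.0, -3.0, 2.0, 0.0, 1.0]) + rng.normal(scale=1.0, size=40)

coef_norms = {}
for alpha in (0.01, 1, 100):
    demo_model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # The L2 norm summarizes how large the coefficients are overall
    coef_norms[alpha] = np.linalg.norm(demo_model.coef_)

print(coef_norms)
```

Smaller coefficients make the model less sensitive to any single feature, which is why training accuracy drops a little while the model tends to generalize better.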

 

 

 

Regularization steps

 1. Normalization (standardizing the scale of the features)
 2. Train and validate the regularized model
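
The two steps above can also be chained with the feature-generation step in a scikit-learn Pipeline, so that every transformer is fitted on the training data only. This is an alternative arrangement, not the one used in this post, and the synthetic data and names below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic three-feature data with a quadratic relationship, for illustration only
rng = np.random.default_rng(42)
X_demo = rng.uniform(1.0, 10.0, size=(60, 3))
y_demo = 2.0 * X_demo[:, 0] * X_demo[:, 1] + X_demo[:, 2] ** 2 + rng.normal(scale=1.0, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# Feature generation -> scaling -> regularized model, in one estimator
demo_pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
demo_pipe.fit(X_tr, y_tr)

pipe_train_r2 = demo_pipe.score(X_tr, y_tr)
pipe_test_r2 = demo_pipe.score(X_te, y_te)
print(pipe_train_r2, pipe_test_r2)
```

Calling fit on the pipeline runs fit and transform on each step in order, and score or predict applies only transform, which mirrors the manual sequence followed in this post.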

 

 

 

Normalizing the training and test independent variables

### Independent variables currently in use
train_poly.shape, test_poly.shape

"""Library for normalization"""
from sklearn.preprocessing import StandardScaler

 

 

 

Normalization steps

  1. Create the scaler class
  2. fit() : find the scaling pattern (uses the training independent variables)
  3. transform() : convert the data with the found pattern (applied to both the training and test independent variables)
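
The "pattern" that fit() finds is the per-column mean and standard deviation of the training data. The sketch below reproduces StandardScaler by hand on a tiny made-up array to show that the test data is scaled with the training statistics, not its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up arrays, for illustration only
train_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 600.0]])
test_demo = np.array([[4.0, 300.0]])

# fit(): learn the mean and standard deviation from the training data only
col_mean = train_demo.mean(axis=0)
col_std = train_demo.std(axis=0)      # population std (ddof=0), as StandardScaler uses

# transform(): apply the same statistics to both training and test data
train_manual = (train_demo - col_mean) / col_std
test_manual = (test_demo - col_mean) / col_std

demo_scaler = StandardScaler().fit(train_demo)
print(np.allclose(demo_scaler.transform(train_demo), train_manual))
print(np.allclose(demo_scaler.transform(test_demo), test_manual))
```

Fitting the scaler on the test data as well would leak information from the test set into the model, which is why only the training independent variables are passed to fit().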

 

 

### Create the scaler
ss = StandardScaler()
ss

### Find the scaling pattern
ss.fit(train_poly)

### Transform the training and test independent variables with the found pattern
train_scaled = ss.transform(train_poly)
test_scaled = ss.transform(test_poly)
print(f"{train_scaled.shape} / {test_scaled.shape}")

 

 

 

 

 

Ridge model

### Create the model
from sklearn.linear_model import Ridge

ridge = Ridge()
ridge

 

 

 

 

 

### Train the model
ridge.fit(train_scaled, train_target)

 

 

 

 

 

 

### Check the training and validation (test) accuracy
train_r2 = ridge.score(train_scaled, train_target)
test_r2 = ridge.score(test_scaled, test_target)
train_r2, test_r2
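
For a regressor, score() returns the coefficient of determination (R^2), which is what the post calls "accuracy". A minimal sketch of the calculation on synthetic data (not the perch data), with illustrative variable names:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, for illustration only
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + 5.0 + rng.normal(scale=0.5, size=50)

demo_model = LinearRegression().fit(X_demo, y_demo)
demo_pred = demo_model.predict(X_demo)

# R^2 = 1 - (sum of squared residuals) / (total sum of squares around the mean)
ss_res = np.sum((y_demo - demo_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, demo_model.score(X_demo, y_demo))
```

A value of 1 means the predictions match the targets exactly, and 0 means the model does no better than always predicting the mean.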

 

 

 

 

 

### Predict
test_pred = ridge.predict(test_scaled)

### Evaluate (MAE)
mae = mean_absolute_error(test_target, test_pred)

train_r2, test_r2, mae

 

 

 

 

 

Interpretation

- Checking for overfitting and underfitting, underfitting did not occur,
- and compared with the earlier, already good feature-engineered model, training accuracy fell by about 0.005,
  while validation (test) accuracy rose by about 0.013.
- The mean absolute error (MAE) also dropped by 1g.
- So the Ridge model, which generalizes well and has a small error, is judged to be a very good model.

 

 

 

Lasso model

### Package to use
from sklearn.linear_model import Lasso

 

 

### Create the model
lasso = Lasso()
lasso

 

 

 

 

### Train the model
lasso.fit(train_scaled, train_target)

 

 

 

 

### Check the training and validation (test) accuracy
train_r2 = lasso.score(train_scaled, train_target)
test_r2 = lasso.score(test_scaled, test_target)
train_r2, test_r2

"""Predict"""
test_pred = lasso.predict(test_scaled)

"""Evaluate (MAE)"""
mae = mean_absolute_error(test_target, test_pred)

train_r2, test_r2, mae
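
One practical difference from Ridge, not covered in this post, is that Lasso's L1 penalty can set some coefficients to exactly zero, effectively dropping those features. The sketch below illustrates this on synthetic data where only the first of ten features matters; the data and names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data: only feature 0 influences the target (illustration only)
rng = np.random.default_rng(0)
X_demo = StandardScaler().fit_transform(rng.normal(size=(200, 10)))
y_demo = 5.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=1.0).fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Count coefficients that are exactly zero in each model
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))
print(n_zero_lasso, n_zero_ridge)
```

On the perch data, inspecting lasso.coef_ in the same way would show which generated polynomial features the model actually kept.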

 

 

 

 

 

Interpretation

 - Underfitting of about 0.0002 appears to be present.
 - The error also shrank by about 3g.
 - Because the underfitting is marginal, this is not a bad model compared with Ridge, but it is judged to fall short for actual use.

 

 

 

Hyperparameter tuning (applying regularization)

Tuning the regularization of the Ridge model

 - alpha : regularization strength (default = 1)
 - Values from about 0.001 to 100 are typically tried
 - The larger the value, the stronger the regularization: training accuracy drops, which helps against overfitting
 - The smaller the value, the weaker the regularization: training accuracy rises, which may not help against overfitting

ridge = Ridge(alpha = 0.1)
ridge.fit(train_scaled, train_target)
ridge.score(train_scaled, train_target), ridge.score(test_scaled, test_target)
# alpha=1   -> (0.9849041294689239, 0.9845173591615219)
# alpha=0.1 -> (0.9882780161390031, 0.9868237771849514)
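
Rather than editing alpha by hand and rerunning the cell, the candidates can be looped over and the scores collected side by side. The sketch below does this on synthetic data (not the perch data); the alpha grid, data, and names are assumptions for illustration, and in practice a separate validation set or cross-validation would be a safer basis for the choice than the test score.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic three-feature data, for illustration only
rng = np.random.default_rng(42)
X_demo = rng.uniform(1.0, 10.0, size=(60, 3))
y_demo = 2.0 * X_demo[:, 0] * X_demo[:, 1] + X_demo[:, 2] ** 2 + rng.normal(scale=5.0, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# Same preparation as in the post: polynomial features, then standardization
poly_demo = PolynomialFeatures(degree=3, include_bias=False).fit(X_tr)
scaler_demo = StandardScaler().fit(poly_demo.transform(X_tr))
X_tr_s = scaler_demo.transform(poly_demo.transform(X_tr))
X_te_s = scaler_demo.transform(poly_demo.transform(X_te))

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
results = []
for alpha in alphas:
    demo_ridge = Ridge(alpha=alpha).fit(X_tr_s, y_tr)
    results.append((alpha, demo_ridge.score(X_tr_s, y_tr), demo_ridge.score(X_te_s, y_te)))

for alpha, train_score, test_score in results:
    print(f"alpha={alpha:<7} train R^2={train_score:.4f}  test R^2={test_score:.4f}")

# Candidate with the highest test score in this sweep
best_alpha = max(results, key=lambda r: r[2])[0]
print(best_alpha)
```

The training score falls as alpha grows, which matches the two results recorded above for alpha=1 and alpha=0.1.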

 

 

 

 

 

 

 

 

 

 

 
