
[Python] Machine Learning 02_Splitting Training and Test Data

by ITyranno 2023. 12. 21.

 

 

 

 

 

 

 

 

 

Let's explore the world of programming.

 

 

 

 

 

 

 

 

 

Splitting Training and Test Data

 

 

 

 

Variable names commonly used when splitting training, validation, and test data

 - There are no fixed names; the ones below are common conventions.

 - Training data : data used for training (fit)
             : (training independent variable) train_input, train_x, x_train
             : (training dependent variable) train_target, train_y, y_train
 - Validation data : data used to check accuracy (score) during training
             : (validation independent variable) val_input, val_x, x_val
             : (validation dependent variable) val_target, val_y, y_val
 - Test data : data used for prediction (predict)
             : (test independent variable) test_input, test_x, x_test
             : (test dependent variable) test_target, test_y, y_test

 

 

 

 

Order of splitting the data

 1. Split into training and test data by ratio first
  - Training : test ratio : usually 7 : 3, or alternatively 7.5 : 2.5 or 8 : 2

 2. Split the training data into training and validation data
  - Training : validation ratio : usually 4 : 2 or 6 : 2

 3. The most commonly used training : validation : test ratio is roughly 6 : 2 : 2
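
The order above can be sketched with scikit-learn's `train_test_split` called twice. This is only a minimal sketch: the arrays `X` and `y` are placeholder data rather than the fish data, and the 0.2 / 0.25 fractions are just one way to arrive at roughly 6 : 2 : 2.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (an assumption for illustration): 100 samples, 2 features
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# Step 1: split off the test set first (train+validation : test = 8 : 2)
x_trainval, x_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 2: split the remainder into training and validation data
# 0.25 of the remaining 80% equals 20% of the whole data set
x_train, x_val, y_train, y_val = train_test_split(
    x_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval
)

print(len(x_train), len(x_val), len(x_test))
```

Splitting off the test set first keeps it untouched while the training and validation parts are used for fitting and tuning.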

 

 

 

 

Defining the data to use

### Define the data to use
# - The first 35 values are bream and the last 14 are smelt
fish_length = [25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7, 31.0, 31.0, 
                31.5, 32.0, 32.0, 32.0, 33.0, 33.0, 33.5, 33.5, 34.0, 34.0, 34.5, 35.0, 
                35.0, 35.0, 35.0, 36.0, 36.0, 37.0, 38.5, 38.5, 39.5, 41.0, 41.0, 9.8, 
                10.5, 10.6, 11.0, 11.2, 11.3, 11.8, 11.8, 12.0, 12.2, 12.4, 13.0, 14.3, 15.0]

fish_weight = [242.0, 290.0, 340.0, 363.0, 430.0, 450.0, 500.0, 390.0, 450.0, 500.0, 475.0, 500.0, 
                500.0, 340.0, 600.0, 600.0, 700.0, 700.0, 610.0, 650.0, 575.0, 685.0, 620.0, 680.0, 
                700.0, 725.0, 720.0, 714.0, 850.0, 1000.0, 920.0, 955.0, 925.0, 975.0, 950.0, 6.7, 
                7.5, 7.0, 9.7, 9.8, 8.7, 10.0, 9.9, 9.8, 12.2, 13.4, 12.2, 19.7, 19.9]

 

 

 

Building the 2-D data used for training

### Build the 2-D data used for training: one [length, weight] row per fish
fish_data = [[l, w] for l, w in zip(fish_length, fish_weight)]
print(fish_data)
len(fish_data)

 

 

 

 

 

 

 

Dependent variable

# Dependent variable (target): 1 = bream (first 35), 0 = smelt (last 14)
fish_target = [1] * 35 + [0] * 14
print(fish_target)
len(fish_target)

 

 

 

 

 

 

 

Splitting into training and test data

Training data (train)

### Training data (train)
# - Training independent variable
train_input = fish_data[:35]

# - Training dependent variable
train_target = fish_target[:35]

print(len(train_input), len(train_target))

 

 

 

 

 

 

Test data (test)

### Test data (test)
# - Test independent variable
test_input = fish_data[35:]

# - Test dependent variable
test_target = fish_target[35:]

print(len(test_input), len(test_target))

 

 

 

 

 

 

Creating the model

 

 

from sklearn.neighbors import KNeighborsClassifier

 

 

 

Creating the model (class)

### Create the model (class)
# - Use the default number of neighbors (n_neighbors=5)
kn = KNeighborsClassifier()
kn

 

 

 

 

 

 

Training the model

### Train the model
# - Fit on the training data
kn.fit(train_input, train_target)

 

 

 

 

 

 

Checking the training accuracy

### Check the training accuracy
# - Uses the training data
train_score = kn.score(train_input, train_target)

### Validate : validation accuracy
# - Uses the test data
test_score = kn.score(test_input, test_target)

train_score, test_score

### (Interpretation)
# - The training accuracy is 1, so the model is overfitted,
# - and the validation accuracy came out as 0
# -> Therefore, this model needs tuning to improve its performance

 

 

 

 

 

 

Cause analysis

 - When splitting the data : the problem arose because the model was trained only on the 35 bream samples
 In other words, when the validation accuracy is 0 or very low,
   *** suspect that the data may be biased.
 - Sampling bias : occurs when the training data is concentrated on particular data
   -> How to resolve : mix the data well when building the training/validation/test sets. (This is called shuffling.)
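
A quick way to confirm this kind of bias is to count the classes in each part of the split. The sketch below assumes the same labelling as in this post (35 bream labelled 1 followed by 14 smelt labelled 0) and uses only the standard library.

```python
from collections import Counter

# Same labelling as in this post: 35 bream (1) followed by 14 smelt (0)
fish_target = [1] * 35 + [0] * 14

# Ordered split used above: first 35 samples for training, the rest for testing
train_target = fish_target[:35]
test_target = fish_target[35:]

train_counts = Counter(train_target)
test_counts = Counter(test_target)

print("train:", dict(train_counts))
print("test :", dict(test_counts))
```

The training part contains only class 1 and the test part only class 0, so the model never sees a smelt during training and gets every test sample wrong.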

 

 

 

 

Resolving the sampling bias

# Use numpy's shuffling function

### Resolve the sampling bias
# Use numpy's shuffling function
import numpy as np

 

 

 

Converting to numpy arrays

### Convert to numpy arrays
input_arr = np.array(fish_data)
target_arr = np.array(fish_target)
input_arr
### Check the number of samples
# - shape : numpy attribute that gives the dimensions (rows, columns)
input_arr.shape, target_arr.shape

 

 

 

 

 

 

 

Shuffling randomly

### Shuffle randomly
# - Fix the random rule
# - np.random.seed(42) : makes the shuffle reproducible, so the same order comes out on every run
#                      : the number itself has no special meaning; it only identifies the rule
np.random.seed(42)

 

 

index = np.arange(49)
index

 

 

 

 

np.random.shuffle(index)
index
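
The effect of the seed can be checked with a small sketch. `shuffled_index` is a hypothetical helper introduced only for this illustration; it repeats the seed, `np.arange`, and `np.random.shuffle` steps used above.

```python
import numpy as np

def shuffled_index(n, seed):
    """Return a shuffled index array of length n using the given seed."""
    np.random.seed(seed)
    index = np.arange(n)
    np.random.shuffle(index)  # shuffles the array in place
    return index

first = shuffled_index(49, seed=42)
second = shuffled_index(49, seed=42)

# Same seed -> same order on every run
print(np.array_equal(first, second))
# Shuffling only reorders: every index from 0 to 48 is still present once
print(sorted(first.tolist()) == list(range(49)))
```

Because the shuffled array is used as an index into both `input_arr` and `target_arr`, each sample stays paired with its own label.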

 

 

 

 

 

Splitting into training and test data

### Split into training and test data using the shuffled index
train_input = input_arr[index[ :35]]
train_target = target_arr[index[ :35]]

test_input = input_arr[index[35 : ]]
test_target = target_arr[index[35 : ]]

print(train_input.shape, train_target.shape)
print(test_input.shape, test_target.shape)

 

 

 

 

 

 

Drawing a scatter plot

### Library
import matplotlib.pyplot as plt

### Draw the scatter plot

plt.scatter(train_input[:, 0], train_input[:, 1], c="red", label="train")
plt.scatter(test_input[:, 0], test_input[:, 1], c="blue", label="test")
plt.xlabel("length")
plt.ylabel("weight")
plt.legend()
plt.show()

 

 

 

 

 

red - training data

blue - test data

 

 

 

 

Creating the model

### Create the model
kn = KNeighborsClassifier()
kn

 

 

 

 

 

 

Training

### Train the model
kn.fit(train_input, train_target)

 

 

 

 

 

 

 

Checking the training and validation accuracy

### Check the training accuracy
train_score = kn.score(train_input, train_target)

### Check the validation accuracy
test_score = kn.score(test_input, test_target)

train_score, test_score

 

 

 

 

 

 

 

Predicting

### Predict
test_pred = kn.predict(test_input)
print(f"predict : {test_pred}")
print(f"actual  : {test_target}")

 

 

 

 

 

 

Run after setting n_neighbors = 19

 

 

 

 

 

 

Tuning the number of neighbors

### Tune the number of neighbors

### Find the number of neighbors that gives the best accuracy below 1
## Create the model (class)
kn = KNeighborsClassifier()
### Train
kn.fit(train_input, train_target)

# - Use a loop : odd values from 3 up to the number of training samples

### Number of neighbors at the best accuracy
nCnt = 0
### Best accuracy found so far
nScore = 0

for n in range(3, len(train_input), 2):
    kn.n_neighbors = n
    score = kn.score(train_input, train_target)
    print(f"{n} / {score}")

    ### Only consider accuracies below 1
    if score < 1:
        ### Keep the result when it beats the best score so far
        if nScore < score:
            nScore = score
            nCnt = n

print(f"nCnt = {nCnt} / nScore = {nScore}")

### Result of the hyperparameter tuning used to find the number of
# neighbors at which the model performs best: the model performed
# best with 19 neighbors
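
The loop above scores every candidate on the training data itself. A commonly used alternative, swapped in here as an illustration and not the method used in this post, is to score each candidate with cross-validation so that every score comes from data the model was not fitted on. The sketch uses synthetic data from `make_blobs` as a stand-in for the fish data, and the range of `k` is an arbitrary assumption.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the training set (an assumption)
X, y = make_blobs(n_samples=60, centers=2, random_state=42)

best_k, best_score = None, -1.0
for k in range(3, 20, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: each fold is scored on samples held out from fitting
    scores = cross_val_score(model, X, y, cv=5)
    mean_score = scores.mean()
    if mean_score > best_score:
        best_k, best_score = k, mean_score

print(f"best_k = {best_k} / mean CV accuracy = {best_score:.3f}")
```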

 

 

 

 

 

 

Splitting the data - 2

 

 

print(fish_length, fish_weight)

 

 

 

 

 

Creating the 2-D data

### Create the 2-D data (independent variables)
fish_data = np.column_stack((fish_length, fish_weight))

### Create the 1-D data (dependent variable)
fish_target = np.concatenate((np.ones(35), np.zeros(14)))
fish_target

 

 

 

 

 

 

Shuffling and splitting the data

### Data-splitting function used in machine learning and deep learning
# - Shuffles randomly and splits the data into two parts (training : test)
from sklearn.model_selection import train_test_split

### First argument : independent variables
### Second argument : dependent variable
### test_size=0.3 : split ratio (training : test = 7 : 3)
### random_state : random rule (seed)
### stratify=fish_target : keeps the class proportions of the dependent variable the same in the training and test sets, so that neither is biased

### First return value : training independent variables
### Second return value : test independent variables
### Third return value : training dependent variable
### Fourth return value : test dependent variable
train_input, test_input, train_target, test_target = train_test_split(fish_data, fish_target,
                                                           test_size=0.3, random_state=42,
                                                           stratify=fish_target)

print(f"{train_input.shape}, {train_target.shape} / {test_input.shape}, {test_target.shape}")
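
What `stratify` does can be checked by comparing the share of bream in each part. The sketch below is self-contained and uses placeholder features in `X` (an assumption); only the labels follow this post (35 bream = 1, 14 smelt = 0).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features; the labels match the post (35 bream = 1, 14 smelt = 0)
X = np.arange(49 * 2).reshape(49, 2)
y = np.concatenate((np.ones(35), np.zeros(14)))

_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Share of bream in each part; with stratify these stay close to 35/49
print(f"all  : {y.mean():.3f}")
print(f"train: {y_train.mean():.3f}")
print(f"test : {y_test.mean():.3f}")
```

Without `stratify`, a small data set like this one can end up with noticeably different class shares in the two parts purely by chance.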

 

 

 

 

 

 

 

Creating the model (class)

### Create the model (class)
kn = KNeighborsClassifier(n_neighbors=5)
kn

### Train the model
kn.fit(train_input, train_target)

 

 

 

 

 

 

 

Setting n_neighbors=21

### Create the model (class)
kn = KNeighborsClassifier(n_neighbors=21)
kn

### Train the model
kn.fit(train_input, train_target)

### Check the training accuracy
train_score = kn.score(train_input, train_target)

### Check the validation accuracy
test_score = kn.score(test_input, test_target)

train_score, test_score

### (Interpretation)
# Overfitting : training > validation, or training accuracy is 1
# Underfitting : training < validation, or validation accuracy is 1
# - A model that underfits cannot be used
# - Among overfitted models, one whose training accuracy is 1 cannot be used
# - When the gap is 0.1 or more, suspect that the accuracies differ too much
# Model selection criteria : no underfitting,
#                          : training accuracy is not 1,
#                          : and the gap between training and validation is within 0.1
# *** A model selected this way is called a "generalized model".
# However, evaluation metrics are an additional selection criterion

### *** The most desirable result is training > validation > test
#        (training > validation < test can also occur)
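
The selection criteria above can be written as a small check. `is_generalized` is a hypothetical helper introduced only for illustration, and the 0.1 threshold is this post's rule of thumb rather than a universal rule.

```python
def is_generalized(train_score, val_score, max_gap=0.1):
    """Apply this post's rule of thumb to a pair of accuracies."""
    if train_score < val_score:   # treated as underfitting in this post
        return False
    if train_score == 1:          # perfect training accuracy is rejected
        return False
    # Accept only when the training/validation gap is within max_gap
    return (train_score - val_score) <= max_gap

print(is_generalized(1.0, 0.0))    # perfect training accuracy -> False
print(is_generalized(0.97, 0.93))  # small gap, training > validation -> True
print(is_generalized(0.90, 0.95))  # training < validation -> False
```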


### Find the number of neighbors that gives the best accuracy below 1
## Create the model (class)
kn = KNeighborsClassifier()
### Train
kn.fit(train_input, train_target)

# - Use a loop : odd values from 3 up to the number of training samples

### Number of neighbors at the best accuracy
nCnt = 0
### Best accuracy found so far
nScore = 0

for n in range(3, len(train_input), 2):
    kn.n_neighbors = n
    score = kn.score(train_input, train_target)
    print(f"{n} / {score}")

    ### Only consider accuracies below 1
    if score < 1:
        ### Keep the result when it beats the best score so far
        if nScore < score:
            nScore = score
            nCnt = n

print(f"nCnt = {nCnt} / nScore = {nScore}")

### Result of the hyperparameter tuning used to find the number of
# neighbors at which the model performs best: the model performed
# best with 19 neighbors

 

 

 

 

 

 


 

 

 

Testing with arbitrary data

### Test with arbitrary data
kn.predict([[25,250]])

 

 

 

 

 

 

Drawing a scatter plot

### Draw the scatter plot
plt.scatter(train_input[:, 0], train_input[:, 1], c="red", label="bream")
plt.scatter(30, 600, marker="^", c="green", label="pred")
plt.xlabel("length")
plt.ylabel("weight")
plt.legend()
plt.show()

 

 

 

 

 

 

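Checking the neighbors used

### Check the neighbors used
# - kneighbors returns the distances and the indexes of the nearest training samples
dist, indexes = kn.kneighbors([[25, 150]])
dist, indexes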

 

 

 

 

 

 

At this point, do not run the part with the for loop above.

Set n_neighbors=5

 

 

 

Drawing a scatter plot including the neighbors

### Draw a scatter plot including the neighbors
plt.scatter(train_input[:, 0], train_input[:, 1], c="red", label="bream")
plt.scatter(25, 150, marker="^", c="green", label="pred")
plt.scatter(train_input[indexes, 0], train_input[indexes, 1], c="blue", label="nei")
plt.xlabel("length")
plt.ylabel("weight")
plt.legend()
plt.show()

### The prediction came out as smelt,
# - but visually the point looks closer to the bream
# - Checking the actual neighbors shows that the neighbors used are all on the smelt side
# => Cause : the two features are on different scales
#    -> This is described as "the scales are different".

### How to resolve : preprocess the data with normalization
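
The scale problem can be made concrete by looking at how much each feature contributes to the Euclidean distance. The sketch below uses the query point [25, 150] and two rows taken from the fish data in this post; the variable names are chosen for illustration only.

```python
import numpy as np

# Query point from the post and two samples taken from the fish data (length, weight)
query = np.array([25.0, 150.0])
samples = {
    "bream": np.array([26.3, 290.0]),
    "smelt": np.array([15.0, 19.9]),
}

results = {}
for name, sample in samples.items():
    squared = (sample - query) ** 2          # per-feature squared differences
    distance = np.sqrt(squared.sum())        # Euclidean distance
    weight_share = squared[1] / squared.sum()  # part of the squared distance due to weight
    results[name] = (distance, weight_share)
    print(f"{name}: distance = {distance:.1f}, weight share = {weight_share:.3f}")
```

The bream sample is much closer in length, yet the smelt sample has the smaller distance, because the weight values are far larger than the length values and dominate the sum.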

 

 

 

 

 

 

 

Normalization

<Steps so far>
 1. Collect the data
 2. Assemble the independent variables as 2-D data and the dependent variable as 1-D data
 3. Shuffle and split into training, validation, and test data
 4. Normalize only the independent variables of the training, validation, and test data
 5. Create the model
 6. Train the model
 7. Check the training and validation accuracy
 8. Tune the hyperparameters
 9. Predict

 

 

 

 

Normalization

### Normalization -> convert to standard scores
# standard score = (each value - mean of the data) / standard deviation of the data
# standard score : how many standard deviations each value lies from the mean, which becomes 0 after scaling
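
As a minimal sketch of the formula above, the following applies it column by column with numpy. The five rows are taken from the fish data in this post; the variable names are chosen for illustration.

```python
import numpy as np

# A few (length, weight) rows taken from the fish data in this post
samples = np.array([
    [25.4, 242.0],
    [30.0, 450.0],
    [35.0, 700.0],
    [10.5, 7.5],
    [15.0, 19.9],
])

mean = samples.mean(axis=0)   # one mean per column
std = samples.std(axis=0)     # one (population) standard deviation per column
scaled = (samples - mean) / std

# After scaling, each column has mean close to 0 and standard deviation close to 1
print(np.round(scaled.mean(axis=0), 6))
print(np.round(scaled.std(axis=0), 6))
```

Both columns end up on the same scale, so length and weight contribute comparably to the distance.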

 

 

 

Computing the mean of the data

### Compute the mean of the data (axis=0 : one mean per column)
mean = np.mean(train_input, axis=0)
mean

 

 

 

 

 

 

Computing the standard deviation of the data

### Compute the standard deviation of the data (axis=0 : one value per column)
std = np.std(train_input, axis=0)
std

 

 

 

 

Applying the normalization (standard score)

### Apply the normalization (standard score)
train_scaled = (train_input - mean) / std
train_scaled

 

 

 

 

 

 

 

 

 

Drawing a scatter plot including the neighbors

### Draw a scatter plot including the neighbors
plt.scatter(train_scaled[:, 0], train_scaled[:, 1], c="red", label="bream")
plt.scatter(25, 150, marker="^", c="green", label="pred")
plt.scatter(train_scaled[indexes, 0], train_scaled[indexes, 1], c="blue", label="nei")
plt.xlabel("length")
plt.ylabel("weight")
plt.legend()
plt.show()

 

 

 

 

 

 

 

 

### The value to predict must also be normalized
new = ([25, 150] - mean) / std
new

 

 

 

 

 

 

 

 

 

 

 

Creating the model

### Create the model
kn = KNeighborsClassifier()

### Train the model
kn.fit(train_scaled, train_target)

 

 

 

 

 

 

Validating with the training data

### Validate with the training data
train_score = kn.score(train_scaled, train_target)
train_score

 

 

 

 

 

 

Validating with the test data

### Validate with the test data
# - Scale (normalize) the validation or test data
# - Reuse the mean and std computed from the training data
test_scaled = (test_input - mean) / std
test_score = kn.score(test_scaled, test_target)
test_score
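
The same "fit on training data, reuse on test data" rule is what scikit-learn's `StandardScaler` does. This is a sketch of an alternative to the manual calculation, not what the post itself uses; the small arrays are stand-in rows taken from the fish data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in arrays (assumptions for illustration) with (length, weight) rows
train_input = np.array([[25.4, 242.0], [30.0, 450.0], [35.0, 700.0], [10.5, 7.5]])
test_input = np.array([[29.0, 363.0], [15.0, 19.9]])

scaler = StandardScaler()
scaler.fit(train_input)                      # learns mean and std from the training data only
train_scaled = scaler.transform(train_input)
test_scaled = scaler.transform(test_input)   # reuses the training mean and std

# Same result as the manual formula with the training statistics
manual = (test_input - train_input.mean(axis=0)) / train_input.std(axis=0)
print(np.allclose(test_scaled, manual))
```

Computing a separate mean and std from the test data would put the two sets on different scales, so the distances seen by the model would no longer be comparable.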

 

 

 

 

 

 

Predicting

### Predict
kn.predict([new])

 

 

 

 

 

 

 

Checking and visualizing the neighbors used for the prediction

### Check and visualize the neighbors used for the prediction
dist, indexes = kn.kneighbors([new])
indexes
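
To show what `kneighbors` returns, here is a minimal sketch on a tiny stand-in data set. The six points and their labels are assumptions made for illustration and are not the fish data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny stand-in training set (an assumption): two well-separated groups
train_scaled = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                         [2.0, 2.0], [2.1, 1.9], [1.9, 2.1]])
train_target = np.array([0, 0, 0, 1, 1, 1])

kn = KNeighborsClassifier(n_neighbors=3)
kn.fit(train_scaled, train_target)

new = np.array([1.8, 1.8])
# Both arrays have shape (number of queries, n_neighbors)
dist, indexes = kn.kneighbors([new])

print(dist.shape, indexes.shape)
print(train_target[indexes[0]])   # labels of the neighbors that take part in the vote
print(kn.predict([new]))
```

Indexing the training targets with `indexes[0]` shows directly which classes voted, which is the same check the scatter plot makes visually.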

 

 

 

 

 

 

### Draw a scatter plot including the neighbors
plt.scatter(train_scaled[:, 0], train_scaled[:, 1], c="red", label="bream")
plt.scatter(new[0], new[1], marker="^", c="green", label="pred")
plt.scatter(train_scaled[indexes, 0], train_scaled[indexes, 1], c="blue", label="nei")
plt.xlabel("length")
plt.ylabel("weight")
plt.legend()
plt.show()

 

 

 

 

 

 

 

 

 

 

 

 
