AI/K-Digital Training

010. 네이버 증권에서 달러 환율과 내 주식 데이터 크롤링

찌리남 2021. 8. 6. 13:00

728x90

금 시세와 달러 환율 데이터 크롤링¶

둘 사이의 종가 및 등락율을 분석해서 둘 사이의 상관관계를 알아보는 EDA를 가정합니다.

In [1]:

from urllib.request import urlopen
import requests
import bs4
import pandas as pd

[5]:

source = bs4.BeautifulSoup(src, 'lxml')
source

Out[5]:

In [11]:

import datetime as dt

In [12]:

date = source.find_all('td', class_="date")[0].text.replace('\t','').replace('\n','')

yyyy, mm, dd = [int(x) for x in date.split('.')]
yyyy, mm, dd

Out[12]:

(2021, 7, 30)

In [13]:

this_date = dt.date(yyyy, mm, dd)
this_date

Out[13]:

datetime.date(2021, 7, 30)

In [14]:

def date_format(date):
    yyyy, mm, dd = [int(x) for x in date.split('.')]
    return dt.date(yyyy, mm, dd)

달러 인덱스 (주요국 달러 환율)¶

달러 인덱스 = 50.14348112 EURUSD^-0.576 USDJPY^-0.136 GBPUSD^-0.119 USDCAD^-0.091 USDSEK^-0.042 USDCHF^-0.036
원자재의 가격과 가격 변동률을 크롤링 하는 방법 차용

유로달러 환율 구하기

https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd=FX_EURUSD&page=1

In [93]:

index_cd = "FX_EURUSD"
page_n = 1
naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"

In [94]:

src = urlopen(naver_index).read()

In [96]:

source = bs4.BeautifulSoup(src, 'lxml')
td= source.find_all('td')

In [97]:

source.find_all('td', class_="date")

Out[97]:

[<td class="date">
 		
 		2021.07.31				
 		</td>,
 <td class="date">
 		
 		2021.07.30				
 		</td>,
 <td class="date">
 		
 		2021.07.29				
 		</td>,
 <td class="date">
 		
 		2021.07.28				
 		</td>,
 <td class="date">
 		
 		2021.07.27				
 		</td>,
 <td class="date">
 		
 		2021.07.26				
 		</td>,
 <td class="date">
 		
 		2021.07.24				
 		</td>]

날짜 크롤링¶

In [98]:

source.find_all('td', class_="date")[0].text

Out[98]:

'\n\t\t\n\t\t2021.07.31\t\t\t\t\n\t\t'

In [99]:

p = source.find_all('td', class_='date')[0].text.replace('\n','').replace('\t','').strip()
p

Out[99]:

'2021.07.31'

In [100]:

type(p)

Out[100]:

str

종가 크롤링¶

In [101]:

source.find_all('td', class_="num")[0].text

Out[101]:

'\n\t\t\t\n\t\t\t\t1.1869\n\t\t\t\t\n\t\t\t\n\t\t'

In [102]:

source.find_all('td', class_="num")[1].text

Out[102]:

'\n 0.0012\n\t\t\t\t\n\t\t\t\n\t\t'

In [103]:

source.find_all('td', class_="num")[2].text

Out[103]:

'\n\t\t +0.10%\n\t\t'

In [104]:

source.find_all('td', class_="num")[3].text

Out[104]:

'\n\t\t\t\n\t\t\t\t1.1857\n\t\t\t\t\n\t\t\t\n\t\t'

0, 3, 6, 9, ... 순서로 종가가 크롤링 됩니다.

등락률 크롤링¶

In [105]:

source.find_all('td', class_="num")[2]

Out[105]:

<td class="num">
		 +0.10%
		</td>

In [109]:

source.find_all('td', class_="num")[2].text

Out[109]:

'\n\t\t +0.10%\n\t\t'

In [110]:

dates = source.find_all('td', class_="date")
prices = source.find_all('td', class_="num")

해당 페이지 데이터 크롤링 방법¶

In [116]:

for i in range(len(dates)):
    this_date = dates[i].text.replace('\n', '').replace('\t', '').strip()
    this_date = date_format(this_date)  
    this_close = prices[i*3].text.replace(',','')
    this_close = float(this_close)
    this_ratio = prices[i*3+2].text.replace('\n','').replace('\t','').replace('%','')
    this_ratio = float(this_ratio)
    print(this_date, this_close, this_ratio)

2021-07-31 1.1869 0.1
2021-07-30 1.1857 -0.21
2021-07-29 1.1883 0.72
2021-07-28 1.1798 -0.23
2021-07-27 1.1826 0.15
2021-07-26 1.1808 0.31
2021-07-24 1.1771 0.07

100페이지에 있는 데이터 크롤링 하기¶

In [119]:

index_cd = "FX_EURUSD"
page_n = 100
naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"
src = urlopen(naver_index).read()
source = bs4.BeautifulSoup(src, 'lxml')
dates = source.find_all('td', class_="date")
prices = source.find_all('td', class_="num")
for i in range(len(dates)):
    this_date = dates[i].text
    this_date = date_format(this_date)   
    this_close = prices[i*3].text.replace(',','')
    this_close = float(this_close)
    print(this_date, this_close)

2019-05-15 1.1209
2019-05-14 1.1211
2019-05-13 1.1242
2019-05-11 1.1235
2019-05-10 1.1245
2019-05-09 1.123
2019-05-08 1.1197

In [124]:

def crawl_dollar_index(index_cd, end_page):
    date_list = []
    price_list = []
    ratio_list = []
    for page_n in range(1, end_page+1):
        naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"
        src = urlopen(naver_index).read()
        source = bs4.BeautifulSoup(src, 'lxml')
        td = source.find_all('td')
        dates = source.find_all('td', class_='date')
        prices = source.find_all('td', class_='num')
        for i in range(len(dates)):
            this_date = dates[i].text.replace('\n', '').replace('\t', '').strip()
            this_date =date_format(this_date)
            this_close = prices[i*3].text.replace('\n','').replace('\t','').replace(',','')
            this_close = float(this_close)
            this_ratio = prices[i*3+2].text.replace('\n','').replace('\t','').replace('%','')
            this_ratio = float(this_ratio)
            date_list.append(this_date)
            price_list.append(this_close)
            ratio_list.append(this_ratio)
    df = pd.DataFrame({'날짜' : date_list, "체결가" : price_list, "등락률" : ratio_list})
    return df
crawl_dollar_index('FX_EURUSD', 100)

Out[124]:

	날짜	체결가	등락률
0	2021-07-31	1.1869	0.10
1	2021-07-30	1.1857	-0.21
2	2021-07-29	1.1883	0.72
3	2021-07-28	1.1798	-0.23
4	2021-07-27	1.1826	0.15
...	...	...	...
695	2019-05-13	1.1242	0.06
696	2019-05-11	1.1235	-0.08
697	2019-05-10	1.1245	0.13
698	2019-05-09	1.1230	0.29
699	2019-05-08	1.1197	0.08

700 rows × 3 columns

EURUSD 달러 환율 데이터¶

In [125]:

crawl_dollar_index('FX_EURUSD', 100)

Out[125]:

	날짜	체결가	등락률
0	2021-07-31	1.1869	0.10
1	2021-07-30	1.1857	-0.21
2	2021-07-29	1.1883	0.72
3	2021-07-28	1.1798	-0.23
4	2021-07-27	1.1826	0.15
...	...	...	...
695	2019-05-13	1.1242	0.06
696	2019-05-11	1.1235	-0.08
697	2019-05-10	1.1245	0.13
698	2019-05-09	1.1230	0.29
699	2019-05-08	1.1197	0.08

700 rows × 3 columns

JPYUSD 환율 데이터¶

In [127]:

crawl_dollar_index('FX_USDJPY', 100)

Out[127]:

	날짜	체결가	등락률
0	2021-07-31	109.68	-0.06
1	2021-07-30	109.75	0.12
2	2021-07-29	109.61	-0.43
3	2021-07-28	110.09	0.24
4	2021-07-27	109.82	-0.43
...	...	...	...
695	2019-05-13	109.12	-0.74
696	2019-05-11	109.94	0.31
697	2019-05-10	109.59	0.03
698	2019-05-09	109.55	-0.49
699	2019-05-08	110.09	-0.28

700 rows × 3 columns

GBPUSD 환율 데이터¶

In [128]:

crawl_dollar_index('FX_GBPUSD', 100)

Out[128]:

	날짜	체결가	등락률
0	2021-07-31	1.3903	-0.01
1	2021-07-30	1.3905	-0.49
2	2021-07-29	1.3974	0.74
3	2021-07-28	1.3871	-0.08
4	2021-07-27	1.3883	0.44
...	...	...	...
695	2019-05-13	1.2987	-0.06
696	2019-05-11	1.2996	-0.26
697	2019-05-10	1.3031	0.09
698	2019-05-09	1.3018	0.16
699	2019-05-08	1.2997	-0.39

700 rows × 3 columns

USDCAD 환율 데이터¶

In [129]:

crawl_dollar_index('FX_USDCAD', 100)

Out[129]:

	날짜	체결가	등락률
0	2021-07-31	1.2477	-0.04
1	2021-07-30	1.2482	0.29
2	2021-07-29	1.2445	-1.06
3	2021-07-28	1.2579	0.11
4	2021-07-27	1.2565	0.23
...	...	...	...
695	2019-05-13	1.3443	0.16
696	2019-05-11	1.3421	-0.08
697	2019-05-10	1.3433	-0.39
698	2019-05-09	1.3486	0.11
699	2019-05-08	1.3470	-0.02

700 rows × 3 columns

USDSEK 환율 데이터¶

In [130]:

crawl_dollar_index('FX_USDSEK', 100)

Out[130]:

	날짜	체결가	등락률
0	2021-07-31	8.5949	-0.11
1	2021-07-30	8.6050	0.64
2	2021-07-29	8.5502	-0.98
3	2021-07-28	8.6355	0.34
4	2021-07-27	8.6057	-0.25
...	...	...	...
695	2019-05-11	9.6274	0.11
696	2019-05-10	9.6163	-0.29
697	2019-05-09	9.6447	0.50
698	2019-05-08	9.5965	0.16
699	2019-05-07	9.5808	0.16

700 rows × 3 columns

USDCHF 환율 데이터¶

In [131]:

crawl_dollar_index('FX_USDCHF', 100)

Out[131]:

	날짜	체결가	등락률
0	2021-07-31	0.9053	-0.08
1	2021-07-30	0.9061	-0.01
2	2021-07-29	0.9062	-0.87
3	2021-07-28	0.9142	0.00
4	2021-07-27	0.9142	-0.21
...	...	...	...
695	2019-05-13	1.0062	-0.54
696	2019-05-11	1.0117	0.08
697	2019-05-10	1.0108	-0.26
698	2019-05-09	1.0135	-0.45
699	2019-05-08	1.0181	-0.40

700 rows × 3 columns

국제 금이 내가 가진 주식의 헷징 수단이 될 수 있는가¶

내가 가진 주식들 크롤링하기( 네이버, 펄어비스)

In [135]:

url_stock = "https://finance.naver.com/item/sise_day.nhn?code=035420&page=1"
stock_src = urlopen(url_stock).read()
bs4.BeautifulSoup(stock_src, 'lxml')

Out[135]:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>네이버 :: 세상의 모든 지식, 네이버</title>
<style type="text/css">
.error_content * {margin:0;padding:0;}
.error_content img{border:none;}
.error_content em {font-style:normal;}
.error_content {width:410px; margin:80px auto 0; padding:57px 0 0 0; font-size:12px; font-family:"나눔고딕", "NanumGothic", "돋움", Dotum, AppleGothic, Sans-serif; text-align:left; line-height:14px; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_thumb.gif) no-repeat center top; white-space:nowrap;}
.error_content p{margin:0;}
.error_content .error_desc {margin-bottom:21px; overflow:hidden; text-align:center;}
.error_content .error_desc2 {margin-bottom:11px; padding-bottom:7px; color:#888; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_desc3 {clear:both; color:#888;}
.error_content .error_desc3 a {color:#004790; text-decoration:underline;}
.error_content .error_list_type {clear:both; float:left; width:410px; _width:428px; margin:0 0 18px 0; *margin:0 0 7px 0; padding-bottom:13px; font-size:13px; color:#000; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_list_type dt {float:left; width:60px; _width /**/:70px; padding-left:10px; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_dot.gif) no-repeat 2px 8px;}
.error_content .error_list_type dd {float:left; width:336px; _width /**/:340px; padding:0 0 0 4px;}
.error_content .error_list_type dd span {color:#339900; letter-spacing:0;}
.error_content .error_list_type dd a{color:#339900;}
.error_content p.btn{margin:29px 0 100px; text-align:center;}
</style>
</head>
<!-- ERROR -->
<body>
<div class="error_content">
<p class="error_desc"><img alt="페이지를 찾을 수 없습니다." height="30" src="https://ssl.pstatic.net/static/common/error/090610/txt_desc5.gif" width="319"/></p>
<p class="error_desc2">방문하시려는 페이지의 주소가 잘못 입력되었거나,<br/>
		페이지의 주소가 변경 혹은 삭제되어 요청하신 페이지를 찾을 수 없습니다.<br/>
		입력하신 주소가 정확한지 다시 한번 확인해 주시기 바랍니다.
	</p>
<p class="error_desc3">관련 문의사항은 <a href="https://help.naver.com/" target="_blank">고객센터</a>에 알려주시면 친절히 안내해드리겠습니다. 감사합니다.</p>
<p class="btn">
<a href="javascript:history.back()"><img alt="이전 페이지로" height="35" src="https://ssl.pstatic.net/static/common/error/090610/btn_prevpage.gif" width="115"/></a>
<a href="https://finance.naver.com"><img alt="금융홈으로" height="35" src="https://ssl.pstatic.net/static/nfinance/btn_home.gif" width="115"/></a>
</p>
</div>
</body>
</html>

In [147]:

stock_cd = '035420'
page_n = 1

url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
    'authority': 'finance.naver.com',
    'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')

In [148]:

src = bs4.BeautifulSoup(source, 'lxml')
src

Out[148]:

날짜 크롤링¶

In [200]:

src.find_all('table')[0].find_all('tr')[2].find_all('td')[0]

## /html/body/table[1]/tbody/tr[3]/td[1]/span

Out[200]:

<td align="center"><span class="tah p10 gray03">2021.08.02</span></td>

In [202]:

src.find_all('td', align='center')[0]

Out[202]:

<td align="center"><span class="tah p10 gray03">2021.08.02</span></td>

In [155]:

a = src.find_all('span', class_='tah p10 gray03')[0].text

In [156]:

date_format(a)

Out[156]:

datetime.date(2021, 8, 2)

종가 크롤링¶

In [190]:

src.find_all('td', class_='num')[0]

## /html/body/table[1]/tbody/tr[3]/td[2]/span

Out[190]:

<td class="num"><span class="tah p11">433,500</span></td>

In [192]:

src.find_all('td', class_='num')[6]

Out[192]:

<td class="num"><span class="tah p11">433,500</span></td>

In [193]:

src.find_all('td', class_='num')[12].text

Out[193]:

'439,500'

0, 6, 12, ... 번째 원소들이 종가

해당 페이지 크롤링¶

In [203]:

dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
len(prices)
prices[52]

Out[203]:

<td class="num"><span class="tah p11">428,000</span></td>

In [204]:

for i in range(len(dates)):
    this_time = dates[i].text
    this_time = date_format(this_time)
    this_close = prices[i*6].text.replace(',','')
    this_close = float(this_close)
    print(this_time, this_close)

2021-08-02 433500.0
2021-07-30 433500.0
2021-07-29 439500.0
2021-07-28 442000.0
2021-07-27 452000.0
2021-07-26 452000.0
2021-07-23 452000.0
2021-07-22 440000.0
2021-07-21 428000.0
2021-07-20 439000.0

100 페이지에 있는 데이터 크롤링¶

In [206]:

stock_cd = "035420"
page_n = 100
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
    'authority': 'finance.naver.com',
    'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
for i in range(len(dates)):
    this_time = dates[i].text
    this_time = date_format(this_time)
    this_price = prices[i*6].text.replace(',','')
    this_price = float(this_price)
    print(this_time, this_price)

2017-07-21 839000.0
2017-07-20 835000.0
2017-07-19 835000.0
2017-07-18 830000.0
2017-07-17 839000.0
2017-07-14 839000.0
2017-07-13 830000.0
2017-07-12 821000.0
2017-07-11 830000.0
2017-07-10 813000.0

In [235]:

def crawl_stock_price(stock_cd, end_page):
    time_list = []
    price_list = []
    for page_n in range(1, end_page+1):
        
        url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
        headers = {
        'authority': 'finance.naver.com',
        'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
        }
        res = requests.get(url_stock, headers=headers)
        source = res.text
        src = bs4.BeautifulSoup(source, 'lxml')
        dates = src.find_all('td', align='center')
        prices = src.find_all('td', class_='num')
        for i in range(len(dates)):
            this_time = dates[i].text
            this_time = date_format(this_time)
            this_price = prices[i*6].text.replace(',','')
            this_price = float(this_price)
            time_list.append(this_time)
            price_list.append(this_price)
    df = pd.DataFrame({"날짜" : time_list, "종가" : price_list})
    return df

In [246]:

stock_cd = "263750"
page_n = 50
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
'authority': 'finance.naver.com',
'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
a = src.find_all('td', align = 'center')[1].text
type(a)
this_time = src.find_all('td', align = 'center')[0].text
date_format(this_time)
for i in range(len(dates)):
    this_time = dates[i].text
    this_time = date_format(this_time)
    this_close = prices[i*6].text.replace(',','')
    this_clsoe = float(this_close)
    print(this_time, this_close)

2019-08-08 172000
2019-08-07 160700
2019-08-06 162300
2019-08-05 166000
2019-08-02 173900
2019-08-01 172200
2019-07-31 171600
2019-07-30 168300
2019-07-29 165400
2019-07-26 171600

펄어비스 주가¶

In [249]:

crawl_stock_price("263750",95)

Out[249]:

	날짜	종가
0	2021-08-02	76100.0
1	2021-07-30	74000.0
2	2021-07-29	78400.0
3	2021-07-28	76900.0
4	2021-07-27	79000.0
...	...	...
945	2017-09-25	118600.0
946	2017-09-22	119000.0
947	2017-09-21	116200.0
948	2017-09-20	118000.0
949	2017-09-19	114300.0

950 rows × 2 columns

네이버 주가¶

In [240]:

crawl_stock_price('035420', 100)

Out[240]:

	날짜	종가
0	2021-08-02	433500.0
1	2021-07-30	433500.0
2	2021-07-29	439500.0
3	2021-07-28	442000.0
4	2021-07-27	452000.0
...	...	...
995	2017-07-14	839000.0
996	2017-07-13	830000.0
997	2017-07-12	821000.0
998	2017-07-11	830000.0
999	2017-07-10	813000.0

1000 rows × 2 columns

728x90

'AI > K-Digital Training' 카테고리의 다른 글

012. 데이터베이스 개념 요약1 (0)	2021.08.09
011. 네이버 증권에서 내 주식 뉴스 데이터크롤링 (0)	2021.08.07
009. 네이버 금융에서 원자재 시세 데이터 크롤링하기 (0)	2021.08.05
008. json 활용한 뉴스 본문 및 댓글 크롤링 (0)	2021.08.04
007. 로그인 후 웹크롤링 및 셀레니움 활용 웹크롤링 (0)	2021.08.03

현재글010. 네이버 증권에서 달러 환율과 내 주식 데이터 크롤링

찌질하게 리뷰하는 남자

010. 네이버 증권에서 달러 환율과 내 주식 데이터 크롤링

금 시세와 달러 환율 데이터 크롤링¶

달러 인덱스 (주요국 달러 환율)¶

날짜 크롤링¶

종가 크롤링¶

등락률 크롤링¶

해당 페이지 데이터 크롤링 방법¶

100페이지에 있는 데이터 크롤링 하기¶

EURUSD 달러 환율 데이터¶

JPYUSD 환율 데이터¶

GBPUSD 환율 데이터¶

USDCAD 환율 데이터¶

USDSEK 환율 데이터¶

USDCHF 환율 데이터¶

국제 금이 내가 가진 주식의 헷징 수단이 될 수 있는가¶

날짜 크롤링¶

종가 크롤링¶

해당 페이지 크롤링¶

100 페이지에 있는 데이터 크롤링¶

펄어비스 주가¶

네이버 주가¶

'AI > K-Digital Training' 카테고리의 다른 글

'AI/K-Digital Training'의 다른글

티스토리툴바

010. 네이버 증권에서 달러 환율과 내 주식 데이터 크롤링

금 시세와 달러 환율 데이터 크롤링¶

달러 인덱스 (주요국 달러 환율)¶

날짜 크롤링¶

종가 크롤링¶

등락률 크롤링¶

해당 페이지 데이터 크롤링 방법¶

100페이지에 있는 데이터 크롤링 하기¶

EURUSD 달러 환율 데이터¶

JPYUSD 환율 데이터¶

GBPUSD 환율 데이터¶

USDCAD 환율 데이터¶

USDSEK 환율 데이터¶

USDCHF 환율 데이터¶

국제 금이 내가 가진 주식의 헷징 수단이 될 수 있는가¶

날짜 크롤링¶

종가 크롤링¶

해당 페이지 크롤링¶

100 페이지에 있는 데이터 크롤링¶

펄어비스 주가¶

네이버 주가¶

'AI > K-Digital Training' 카테고리의 다른 글

'AI/K-Digital Training'의 다른글

관련글

티스토리툴바