728x90
금 시세와 달러 환율 데이터 크롤링¶
- 둘 사이의 종가 및 등락율을 분석해서 둘 사이의 상관관계를 알아보는 EDA를 가정합니다.
In [1]:
from urllib.request import urlopen
import requests
import bs4
import pandas as pd
[5]:
source = bs4.BeautifulSoup(src, 'lxml')
source
Out[5]:
In [11]:
import datetime as dt
In [12]:
date = source.find_all('td', class_="date")[0].text.replace('\t','').replace('\n','')
yyyy, mm, dd = [int(x) for x in date.split('.')]
yyyy, mm, dd
Out[12]:
(2021, 7, 30)
In [13]:
this_date = dt.date(yyyy, mm, dd)
this_date
Out[13]:
datetime.date(2021, 7, 30)
In [14]:
def date_format(date):
yyyy, mm, dd = [int(x) for x in date.split('.')]
return dt.date(yyyy, mm, dd)
달러 인덱스 (주요국 달러 환율)¶
- 달러 인덱스 = 50.14348112 EURUSD^-0.576 USDJPY^-0.136 GBPUSD^-0.119 USDCAD^-0.091 USDSEK^-0.042 USDCHF^-0.036
- 원자재의 가격과 가격 변동률을 크롤링 하는 방법 차용
- 유로달러 환율 구하기
In [93]:
index_cd = "FX_EURUSD"
page_n = 1
naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"
In [94]:
src = urlopen(naver_index).read()
In [96]:
source = bs4.BeautifulSoup(src, 'lxml')
td= source.find_all('td')
In [97]:
source.find_all('td', class_="date")
Out[97]:
[<td class="date">
2021.07.31
</td>,
<td class="date">
2021.07.30
</td>,
<td class="date">
2021.07.29
</td>,
<td class="date">
2021.07.28
</td>,
<td class="date">
2021.07.27
</td>,
<td class="date">
2021.07.26
</td>,
<td class="date">
2021.07.24
</td>]
날짜 크롤링¶
In [98]:
source.find_all('td', class_="date")[0].text
Out[98]:
'\n\t\t\n\t\t2021.07.31\t\t\t\t\n\t\t'
In [99]:
p = source.find_all('td', class_='date')[0].text.replace('\n','').replace('\t','').strip()
p
Out[99]:
'2021.07.31'
In [100]:
type(p)
Out[100]:
str
종가 크롤링¶
In [101]:
source.find_all('td', class_="num")[0].text
Out[101]:
'\n\t\t\t\n\t\t\t\t1.1869\n\t\t\t\t\n\t\t\t\n\t\t'
In [102]:
source.find_all('td', class_="num")[1].text
Out[102]:
'\n 0.0012\n\t\t\t\t\n\t\t\t\n\t\t'
In [103]:
source.find_all('td', class_="num")[2].text
Out[103]:
'\n\t\t +0.10%\n\t\t'
In [104]:
source.find_all('td', class_="num")[3].text
Out[104]:
'\n\t\t\t\n\t\t\t\t1.1857\n\t\t\t\t\n\t\t\t\n\t\t'
0, 3, 6, 9, ... 순서로 종가가 크롤링 됩니다.
등락률 크롤링¶
In [105]:
source.find_all('td', class_="num")[2]
Out[105]:
<td class="num">
+0.10%
</td>
In [109]:
source.find_all('td', class_="num")[2].text
Out[109]:
'\n\t\t +0.10%\n\t\t'
In [110]:
dates = source.find_all('td', class_="date")
prices = source.find_all('td', class_="num")
해당 페이지 데이터 크롤링 방법¶
In [116]:
for i in range(len(dates)):
this_date = dates[i].text.replace('\n', '').replace('\t', '').strip()
this_date = date_format(this_date)
this_close = prices[i*3].text.replace(',','')
this_close = float(this_close)
this_ratio = prices[i*3+2].text.replace('\n','').replace('\t','').replace('%','')
this_ratio = float(this_ratio)
print(this_date, this_close, this_ratio)
2021-07-31 1.1869 0.1
2021-07-30 1.1857 -0.21
2021-07-29 1.1883 0.72
2021-07-28 1.1798 -0.23
2021-07-27 1.1826 0.15
2021-07-26 1.1808 0.31
2021-07-24 1.1771 0.07
100페이지에 있는 데이터 크롤링 하기¶
In [119]:
index_cd = "FX_EURUSD"
page_n = 100
naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"
src = urlopen(naver_index).read()
source = bs4.BeautifulSoup(src, 'lxml')
dates = source.find_all('td', class_="date")
prices = source.find_all('td', class_="num")
for i in range(len(dates)):
this_date = dates[i].text
this_date = date_format(this_date)
this_close = prices[i*3].text.replace(',','')
this_close = float(this_close)
print(this_date, this_close)
2019-05-15 1.1209
2019-05-14 1.1211
2019-05-13 1.1242
2019-05-11 1.1235
2019-05-10 1.1245
2019-05-09 1.123
2019-05-08 1.1197
In [124]:
def crawl_dollar_index(index_cd, end_page):
date_list = []
price_list = []
ratio_list = []
for page_n in range(1, end_page+1):
naver_index = f"https://finance.naver.com/marketindex/worldDailyQuote.nhn?fdtc=4&marketindexCd={index_cd}&page={page_n}"
src = urlopen(naver_index).read()
source = bs4.BeautifulSoup(src, 'lxml')
td = source.find_all('td')
dates = source.find_all('td', class_='date')
prices = source.find_all('td', class_='num')
for i in range(len(dates)):
this_date = dates[i].text.replace('\n', '').replace('\t', '').strip()
this_date =date_format(this_date)
this_close = prices[i*3].text.replace('\n','').replace('\t','').replace(',','')
this_close = float(this_close)
this_ratio = prices[i*3+2].text.replace('\n','').replace('\t','').replace('%','')
this_ratio = float(this_ratio)
date_list.append(this_date)
price_list.append(this_close)
ratio_list.append(this_ratio)
df = pd.DataFrame({'날짜' : date_list, "체결가" : price_list, "등락률" : ratio_list})
return df
crawl_dollar_index('FX_EURUSD', 100)
Out[124]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 1.1869 | 0.10 |
1 | 2021-07-30 | 1.1857 | -0.21 |
2 | 2021-07-29 | 1.1883 | 0.72 |
3 | 2021-07-28 | 1.1798 | -0.23 |
4 | 2021-07-27 | 1.1826 | 0.15 |
... | ... | ... | ... |
695 | 2019-05-13 | 1.1242 | 0.06 |
696 | 2019-05-11 | 1.1235 | -0.08 |
697 | 2019-05-10 | 1.1245 | 0.13 |
698 | 2019-05-09 | 1.1230 | 0.29 |
699 | 2019-05-08 | 1.1197 | 0.08 |
700 rows × 3 columns
EURUSD 달러 환율 데이터¶
In [125]:
crawl_dollar_index('FX_EURUSD', 100)
Out[125]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 1.1869 | 0.10 |
1 | 2021-07-30 | 1.1857 | -0.21 |
2 | 2021-07-29 | 1.1883 | 0.72 |
3 | 2021-07-28 | 1.1798 | -0.23 |
4 | 2021-07-27 | 1.1826 | 0.15 |
... | ... | ... | ... |
695 | 2019-05-13 | 1.1242 | 0.06 |
696 | 2019-05-11 | 1.1235 | -0.08 |
697 | 2019-05-10 | 1.1245 | 0.13 |
698 | 2019-05-09 | 1.1230 | 0.29 |
699 | 2019-05-08 | 1.1197 | 0.08 |
700 rows × 3 columns
JPYUSD 환율 데이터¶
In [127]:
crawl_dollar_index('FX_USDJPY', 100)
Out[127]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 109.68 | -0.06 |
1 | 2021-07-30 | 109.75 | 0.12 |
2 | 2021-07-29 | 109.61 | -0.43 |
3 | 2021-07-28 | 110.09 | 0.24 |
4 | 2021-07-27 | 109.82 | -0.43 |
... | ... | ... | ... |
695 | 2019-05-13 | 109.12 | -0.74 |
696 | 2019-05-11 | 109.94 | 0.31 |
697 | 2019-05-10 | 109.59 | 0.03 |
698 | 2019-05-09 | 109.55 | -0.49 |
699 | 2019-05-08 | 110.09 | -0.28 |
700 rows × 3 columns
GBPUSD 환율 데이터¶
In [128]:
crawl_dollar_index('FX_GBPUSD', 100)
Out[128]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 1.3903 | -0.01 |
1 | 2021-07-30 | 1.3905 | -0.49 |
2 | 2021-07-29 | 1.3974 | 0.74 |
3 | 2021-07-28 | 1.3871 | -0.08 |
4 | 2021-07-27 | 1.3883 | 0.44 |
... | ... | ... | ... |
695 | 2019-05-13 | 1.2987 | -0.06 |
696 | 2019-05-11 | 1.2996 | -0.26 |
697 | 2019-05-10 | 1.3031 | 0.09 |
698 | 2019-05-09 | 1.3018 | 0.16 |
699 | 2019-05-08 | 1.2997 | -0.39 |
700 rows × 3 columns
USDCAD 환율 데이터¶
In [129]:
crawl_dollar_index('FX_USDCAD', 100)
Out[129]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 1.2477 | -0.04 |
1 | 2021-07-30 | 1.2482 | 0.29 |
2 | 2021-07-29 | 1.2445 | -1.06 |
3 | 2021-07-28 | 1.2579 | 0.11 |
4 | 2021-07-27 | 1.2565 | 0.23 |
... | ... | ... | ... |
695 | 2019-05-13 | 1.3443 | 0.16 |
696 | 2019-05-11 | 1.3421 | -0.08 |
697 | 2019-05-10 | 1.3433 | -0.39 |
698 | 2019-05-09 | 1.3486 | 0.11 |
699 | 2019-05-08 | 1.3470 | -0.02 |
700 rows × 3 columns
USDSEK 환율 데이터¶
In [130]:
crawl_dollar_index('FX_USDSEK', 100)
Out[130]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 8.5949 | -0.11 |
1 | 2021-07-30 | 8.6050 | 0.64 |
2 | 2021-07-29 | 8.5502 | -0.98 |
3 | 2021-07-28 | 8.6355 | 0.34 |
4 | 2021-07-27 | 8.6057 | -0.25 |
... | ... | ... | ... |
695 | 2019-05-11 | 9.6274 | 0.11 |
696 | 2019-05-10 | 9.6163 | -0.29 |
697 | 2019-05-09 | 9.6447 | 0.50 |
698 | 2019-05-08 | 9.5965 | 0.16 |
699 | 2019-05-07 | 9.5808 | 0.16 |
700 rows × 3 columns
USDCHF 환율 데이터¶
In [131]:
crawl_dollar_index('FX_USDCHF', 100)
Out[131]:
날짜 | 체결가 | 등락률 | |
---|---|---|---|
0 | 2021-07-31 | 0.9053 | -0.08 |
1 | 2021-07-30 | 0.9061 | -0.01 |
2 | 2021-07-29 | 0.9062 | -0.87 |
3 | 2021-07-28 | 0.9142 | 0.00 |
4 | 2021-07-27 | 0.9142 | -0.21 |
... | ... | ... | ... |
695 | 2019-05-13 | 1.0062 | -0.54 |
696 | 2019-05-11 | 1.0117 | 0.08 |
697 | 2019-05-10 | 1.0108 | -0.26 |
698 | 2019-05-09 | 1.0135 | -0.45 |
699 | 2019-05-08 | 1.0181 | -0.40 |
700 rows × 3 columns
국제 금이 내가 가진 주식의 헷징 수단이 될 수 있는가¶
- 내가 가진 주식들 크롤링하기( 네이버, 펄어비스)
In [135]:
url_stock = "https://finance.naver.com/item/sise_day.nhn?code=035420&page=1"
stock_src = urlopen(url_stock).read()
bs4.BeautifulSoup(stock_src, 'lxml')
Out[135]:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title>네이버 :: 세상의 모든 지식, 네이버</title>
<style type="text/css">
.error_content * {margin:0;padding:0;}
.error_content img{border:none;}
.error_content em {font-style:normal;}
.error_content {width:410px; margin:80px auto 0; padding:57px 0 0 0; font-size:12px; font-family:"나눔고딕", "NanumGothic", "돋움", Dotum, AppleGothic, Sans-serif; text-align:left; line-height:14px; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_thumb.gif) no-repeat center top; white-space:nowrap;}
.error_content p{margin:0;}
.error_content .error_desc {margin-bottom:21px; overflow:hidden; text-align:center;}
.error_content .error_desc2 {margin-bottom:11px; padding-bottom:7px; color:#888; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_desc3 {clear:both; color:#888;}
.error_content .error_desc3 a {color:#004790; text-decoration:underline;}
.error_content .error_list_type {clear:both; float:left; width:410px; _width:428px; margin:0 0 18px 0; *margin:0 0 7px 0; padding-bottom:13px; font-size:13px; color:#000; line-height:18px; border-bottom:1px solid #eee;}
.error_content .error_list_type dt {float:left; width:60px; _width /**/:70px; padding-left:10px; background:url(https://ssl.pstatic.net/static/common/error/090610/bg_dot.gif) no-repeat 2px 8px;}
.error_content .error_list_type dd {float:left; width:336px; _width /**/:340px; padding:0 0 0 4px;}
.error_content .error_list_type dd span {color:#339900; letter-spacing:0;}
.error_content .error_list_type dd a{color:#339900;}
.error_content p.btn{margin:29px 0 100px; text-align:center;}
</style>
</head>
<!-- ERROR -->
<body>
<div class="error_content">
<p class="error_desc"><img alt="페이지를 찾을 수 없습니다." height="30" src="https://ssl.pstatic.net/static/common/error/090610/txt_desc5.gif" width="319"/></p>
<p class="error_desc2">방문하시려는 페이지의 주소가 잘못 입력되었거나,<br/>
페이지의 주소가 변경 혹은 삭제되어 요청하신 페이지를 찾을 수 없습니다.<br/>
입력하신 주소가 정확한지 다시 한번 확인해 주시기 바랍니다.
</p>
<p class="error_desc3">관련 문의사항은 <a href="https://help.naver.com/" target="_blank">고객센터</a>에 알려주시면 친절히 안내해드리겠습니다. 감사합니다.</p>
<p class="btn">
<a href="javascript:history.back()"><img alt="이전 페이지로" height="35" src="https://ssl.pstatic.net/static/common/error/090610/btn_prevpage.gif" width="115"/></a>
<a href="https://finance.naver.com"><img alt="금융홈으로" height="35" src="https://ssl.pstatic.net/static/nfinance/btn_home.gif" width="115"/></a>
</p>
</div>
</body>
</html>
In [147]:
stock_cd = '035420'
page_n = 1
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
'authority': 'finance.naver.com',
'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
In [148]:
src = bs4.BeautifulSoup(source, 'lxml')
src
Out[148]:
날짜 크롤링¶
In [200]:
src.find_all('table')[0].find_all('tr')[2].find_all('td')[0]
## /html/body/table[1]/tbody/tr[3]/td[1]/span
Out[200]:
<td align="center"><span class="tah p10 gray03">2021.08.02</span></td>
In [202]:
src.find_all('td', align='center')[0]
Out[202]:
<td align="center"><span class="tah p10 gray03">2021.08.02</span></td>
In [155]:
a = src.find_all('span', class_='tah p10 gray03')[0].text
In [156]:
date_format(a)
Out[156]:
datetime.date(2021, 8, 2)
종가 크롤링¶
In [190]:
src.find_all('td', class_='num')[0]
## /html/body/table[1]/tbody/tr[3]/td[2]/span
Out[190]:
<td class="num"><span class="tah p11">433,500</span></td>
In [192]:
src.find_all('td', class_='num')[6]
Out[192]:
<td class="num"><span class="tah p11">433,500</span></td>
In [193]:
src.find_all('td', class_='num')[12].text
Out[193]:
'439,500'
0, 6, 12, ... 번째 원소들이 종가
해당 페이지 크롤링¶
In [203]:
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
len(prices)
prices[52]
Out[203]:
<td class="num"><span class="tah p11">428,000</span></td>
In [204]:
for i in range(len(dates)):
this_time = dates[i].text
this_time = date_format(this_time)
this_close = prices[i*6].text.replace(',','')
this_close = float(this_close)
print(this_time, this_close)
2021-08-02 433500.0
2021-07-30 433500.0
2021-07-29 439500.0
2021-07-28 442000.0
2021-07-27 452000.0
2021-07-26 452000.0
2021-07-23 452000.0
2021-07-22 440000.0
2021-07-21 428000.0
2021-07-20 439000.0
100 페이지에 있는 데이터 크롤링¶
In [206]:
stock_cd = "035420"
page_n = 100
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
'authority': 'finance.naver.com',
'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
for i in range(len(dates)):
this_time = dates[i].text
this_time = date_format(this_time)
this_price = prices[i*6].text.replace(',','')
this_price = float(this_price)
print(this_time, this_price)
2017-07-21 839000.0
2017-07-20 835000.0
2017-07-19 835000.0
2017-07-18 830000.0
2017-07-17 839000.0
2017-07-14 839000.0
2017-07-13 830000.0
2017-07-12 821000.0
2017-07-11 830000.0
2017-07-10 813000.0
In [235]:
def crawl_stock_price(stock_cd, end_page):
time_list = []
price_list = []
for page_n in range(1, end_page+1):
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
'authority': 'finance.naver.com',
'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
for i in range(len(dates)):
this_time = dates[i].text
this_time = date_format(this_time)
this_price = prices[i*6].text.replace(',','')
this_price = float(this_price)
time_list.append(this_time)
price_list.append(this_price)
df = pd.DataFrame({"날짜" : time_list, "종가" : price_list})
return df
In [246]:
stock_cd = "263750"
page_n = 50
url_stock = f"https://finance.naver.com/item/sise_day.nhn?code={stock_cd}&page={page_n}"
headers = {
'authority': 'finance.naver.com',
'user-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
}
res = requests.get(url_stock, headers=headers)
source = res.text
src = bs4.BeautifulSoup(source, 'lxml')
dates = src.find_all('td', align='center')
prices = src.find_all('td', class_='num')
a = src.find_all('td', align = 'center')[1].text
type(a)
this_time = src.find_all('td', align = 'center')[0].text
date_format(this_time)
for i in range(len(dates)):
this_time = dates[i].text
this_time = date_format(this_time)
this_close = prices[i*6].text.replace(',','')
this_clsoe = float(this_close)
print(this_time, this_close)
2019-08-08 172000
2019-08-07 160700
2019-08-06 162300
2019-08-05 166000
2019-08-02 173900
2019-08-01 172200
2019-07-31 171600
2019-07-30 168300
2019-07-29 165400
2019-07-26 171600
펄어비스 주가¶
In [249]:
crawl_stock_price("263750",95)
Out[249]:
날짜 | 종가 | |
---|---|---|
0 | 2021-08-02 | 76100.0 |
1 | 2021-07-30 | 74000.0 |
2 | 2021-07-29 | 78400.0 |
3 | 2021-07-28 | 76900.0 |
4 | 2021-07-27 | 79000.0 |
... | ... | ... |
945 | 2017-09-25 | 118600.0 |
946 | 2017-09-22 | 119000.0 |
947 | 2017-09-21 | 116200.0 |
948 | 2017-09-20 | 118000.0 |
949 | 2017-09-19 | 114300.0 |
950 rows × 2 columns
네이버 주가¶
In [240]:
crawl_stock_price('035420', 100)
Out[240]:
날짜 | 종가 | |
---|---|---|
0 | 2021-08-02 | 433500.0 |
1 | 2021-07-30 | 433500.0 |
2 | 2021-07-29 | 439500.0 |
3 | 2021-07-28 | 442000.0 |
4 | 2021-07-27 | 452000.0 |
... | ... | ... |
995 | 2017-07-14 | 839000.0 |
996 | 2017-07-13 | 830000.0 |
997 | 2017-07-12 | 821000.0 |
998 | 2017-07-11 | 830000.0 |
999 | 2017-07-10 | 813000.0 |
1000 rows × 2 columns
728x90
반응형
'AI > K-Digital Training' 카테고리의 다른 글
012. 데이터베이스 개념 요약1 (0) | 2021.08.09 |
---|---|
011. 네이버 증권에서 내 주식 뉴스 데이터크롤링 (0) | 2021.08.07 |
009. 네이버 금융에서 원자재 시세 데이터 크롤링하기 (0) | 2021.08.05 |
008. json 활용한 뉴스 본문 및 댓글 크롤링 (0) | 2021.08.04 |
007. 로그인 후 웹크롤링 및 셀레니움 활용 웹크롤링 (0) | 2021.08.03 |