Web Crawling Basics 3: Extracting Daily KPI200 Index Closing Prices from Naver (HTML)

'''
Naver domestic stocks: KPI200
Show the closing price for each date.
Scrape and parse the HTML (Naver does not provide domestic stock data as JSON).
'''
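def get_Max_pagenumber(code):  # find the number of the last page
    # The 'pgRR' cell holds the link to the last page; its href ends with "&page=N".
    last_page_link = get_InCounty_DateNCost(code, 1)[0].find('td', class_='pgRR').find('a')
    return int(last_page_link['href'].split('&')[1].split('=')[1])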
    
def get_InCounty_DateNCost(code, page_number):  # return the page source, dates, and closing prices for a code
    from urllib.request import urlopen
    import bs4

    url = "https://finance.naver.com/sise/sise_index_day.nhn?code=" + code + "&page=" + str(page_number)
    url_response = urlopen(url)
    source = bs4.BeautifulSoup(url_response, 'lxml')

    dates = source.find_all('td', class_="date")
    dates = [value.text for value in dates]
    closed_cost = source.find_all('td', class_="number_1")
    # Among the cells with class "number_1", the closing price is every fourth one.
    # Compare with == rather than "is": identity checks on integers are unreliable.
    closed_cost = [value.text for idx, value in enumerate(closed_cost) if idx % 4 == 0]
    #print(closed_cost)

    return (source, dates, closed_cost)
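def extractAll_Cost_Date(code):  # given a code, collect every date and closing price
    data_all = dict()
    total_page_number = get_Max_pagenumber(code)
    # Pages are numbered from 1, so start at 1 and include the last page.
    for page in range(1, total_page_number + 1):
        (source, dates, closed_cost) = get_InCounty_DateNCost(code, page)
        for i in range(len(dates)):
            # Remove thousands separators so float() accepts values such as "1,234.56".
            data_all[dates[i]] = float(closed_cost[i].replace(',', ''))

    return data_all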
# end of function definitions
# main
data_all = extractAll_Cost_Date("KPI200")
print(data_all['2019.04.26'])
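
The cell-selection step in `get_InCounty_DateNCost` can be tried without a network connection. The sketch below is only an illustration, assuming each table row yields four `number_1` cells in a fixed order with the closing price first. The cell texts are made-up placeholders, not real index values, and `pair_dates_with_prices` is a hypothetical helper name that does not appear in the code above.

```python
def pair_dates_with_prices(dates, number_cells, stride=4):
    """Pair each date with the first of every `stride` numeric cells."""
    # Slicing with a step keeps indexes 0, 4, 8, ... (same as idx % 4 == 0).
    closing = number_cells[::stride]
    # Strip whitespace and thousands separators before converting to float.
    return {
        d.strip(): float(c.strip().replace(",", ""))
        for d, c in zip(dates, closing)
    }


# Placeholder cell texts (not real market data): two rows, four cells each.
sample_dates = ["2019.04.26", "2019.04.25"]
sample_cells = [
    "100.50", "-0.10%", "1,000", "2,000",
    "1,100.25", "+0.20%", "3,000", "4,000",
]

prices = pair_dates_with_prices(sample_dates, sample_cells)
print(prices)
```

Using `zip` here also means a page with fewer price cells than dates is truncated quietly instead of raising an `IndexError`, which may or may not be what you want.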

The code fetches the KPI200 index and stores it in a dictionary keyed by date, so the closing price for any day can be looked up directly.
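
Because the keys are date strings in `YYYY.MM.DD` form (as in the `'2019.04.26'` lookup), the dictionary can also be put in chronological order. A small sketch, assuming that key format holds for every entry; the values are placeholders and `sort_by_date` is a hypothetical helper.

```python
from datetime import datetime


def sort_by_date(data):
    """Return (date, value) pairs ordered from oldest to newest."""
    return sorted(
        ((datetime.strptime(k, "%Y.%m.%d").date(), v) for k, v in data.items()),
        key=lambda pair: pair[0],
    )


# Placeholder values, not real index data.
sample = {"2019.04.26": 3.0, "2019.04.24": 1.0, "2019.04.25": 2.0}

ordered = sort_by_date(sample)
latest_date, latest_value = ordered[-1]
print(latest_date.isoformat(), latest_value)
```
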
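The difference from the previous post is that this one extracts the data from every page, not just one.
Naver does not exchange domestic stock data as JSON, so the HTML is parsed page by page instead.
If the data were served as JSON, the URL would probably not carry page information at the end.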
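
The page number lives in the query string of the pagination link, which is what `get_Max_pagenumber` reads. As a hedged alternative to splitting on `&` and `=`, the standard library's `urllib.parse` can read the parameter by name, so the result does not depend on parameter order. The `href` value below is an assumed example shaped like the URL built in the code, and `page_from_href` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def page_from_href(href, default=1):
    """Read the `page` query parameter from a link, or return `default`."""
    query = parse_qs(urlparse(href).query)
    values = query.get("page")
    return int(values[0]) if values else default


# Assumed example link; the real last-page number will differ.
href = "/sise/sise_index_day.nhn?code=KPI200&page=12"
print(page_from_href(href))
print(page_from_href("/sise/sise_index_day.nhn?code=KPI200"))
```

Returning a default of 1 when the parameter is missing is a design choice for this sketch: it treats a link without `page` as pointing at the first page.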
