Web Crawling Basics 1

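The code below uses BeautifulSoup to parse an HTML document and pull data out of it, which is the basic building block for scraping data from a website.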

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Office</title></head>
<body>
<p class="title"><b>The Office</b></p>
<p class="story">In my office, there are four officers,
<a href="http://example.com/YW" class="member">YW</a>,
<a href="http://example.com/JK" class="member">JK</a>,
<a href="http://example.com/YJ" class="member">YJ</a> and
<a href="http://example.com/KS" class="member">KS</a>
.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())  # print the parsed document with indentation

print(soup.title)  # print only the <title> tag

print(soup.find_all("a"))  # print every <a> tag as a list

# Print the text inside each <a> tag, one per line

for i in soup.find_all("a"):
    print(i.text)

With the BeautifulSoup library, you can search HTML source code for specific tags and extract the text they contain.
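As a small extension of the same idea, the sketch below collects each link's text together with its `href` attribute, keeping only `<a>` tags whose class is `member`. The helper name `extract_members` is hypothetical and not part of the original script, and the sample HTML is a trimmed copy of the document used above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document from this post.
html_doc = """
<html><head><title>The Office</title></head>
<body>
<p class="story">In my office, there are four officers,
<a href="http://example.com/YW" class="member">YW</a>,
<a href="http://example.com/JK" class="member">JK</a>,
<a href="http://example.com/YJ" class="member">YJ</a> and
<a href="http://example.com/KS" class="member">KS</a>
.</p>
</body></html>
"""


def extract_members(html):
    """Hypothetical helper: return (text, href) pairs for <a class="member"> tags."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ is used because "class" is a reserved word in Python.
    links = soup.find_all("a", class_="member")
    # .get("href") returns None instead of raising if the attribute is missing.
    return [(a.get_text(strip=True), a.get("href")) for a in links]


members = extract_members(html_doc)
for name, url in members:
    print(name, url)
```

Returning plain tuples keeps the parsing step separate from whatever is done with the data afterwards, such as writing it to a file.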

This post is excerpted from the book 파이썬으로 배우는 금융공학레시피.

