Posts tagged 'Crawl'

BLOG ARTICLE Crawl | 2 ARTICLE FOUND

2009.07.07 yes24 sales tracking program
2009.07.01 Maso series begins

yes24 sales tracking program

Development/python 2009. 7. 7. 00:35

yes24 keeps no history of its sales index,
and I wanted to see how sales change over time, so I put together a simple script.

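Add it to Scheduled Tasks or your startup programs, or register it in crontab or the like, and let it run.
Only one record is stored per day: if a record for that date already exists, it is not written again.
It uses sqlite3, so you can check the data directly from the console or use the check functions in the DB class. I check it with SQLite Manager for Firefox, as in the picture below.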
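
The once-per-day behavior rests on a unique index over the date column: SQLite rejects a second insert for the same date. A minimal sketch of that idea, written for Python 3 with an in-memory database, might look like the following. The table name `demo`, the helper `insert_once`, and the sample values are hypothetical and not part of the script.

```python
import sqlite3

def insert_once(conn, date, sale):
    """Insert one (date, sale) row; return False if that date already exists."""
    try:
        conn.execute("INSERT INTO demo VALUES (?, ?)", (date, sale))
    except sqlite3.IntegrityError:
        # The unique index on date rejected a duplicate record.
        return False
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS demo(date text, sale int)")
conn.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_demo_date ON demo(date)")

# Hypothetical sample data: two attempts on the same day.
first = insert_once(conn, "2009/07/07", 100)
second = insert_once(conn, "2009/07/07", 120)

# Read back the most recent records, newest first.
rows = conn.execute(
    "SELECT date, sale FROM demo ORDER BY date DESC LIMIT 10"
).fetchall()
```

Because the date is stored as zero-padded `YYYY/MM/DD` text, sorting it as a string also sorts it chronologically.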

The source code is as follows.
To change what is collected, edit the books dictionary near the top.

import urllib2, time, traceback
from BeautifulSoup import BeautifulSoup
import sqlite3

# Books to track: table name -> yes24 product page URL
books = {
    'python':'http://www.yes24.com/24/goods/3432490',
    'lua':'http://www.yes24.com/24/goods/3081202'
}

def getContent( url ):
    "Download the page at url and return its raw HTML"
    req = urllib2.Request( url )
    response = urllib2.urlopen(req)
    return response.read()

class DB:
    "SQLITE3 wrapper class"
    def __init__(self):
        self.conn = sqlite3.connect('bookDB')
        self.cursor = self.conn.cursor()
        for title in books.keys():
            self.cursor.execute('CREATE TABLE IF NOT EXISTS %s(date text, sale int)'%title)
            # Index names are unique per database in SQLite, so each table
            # gets its own index name (a shared name would be silently skipped)
            self.cursor.execute('CREATE UNIQUE INDEX IF NOT EXISTS IDX_%s ON %s(date)'%(title,title))

    def __del__(self):
        self.conn.commit()
        self.cursor.close()

    def insertPython(self, title, date, sale):
        "Insert one record; return 1 on success, 0 if the date already exists"
        try:
            # Table names cannot be bound as parameters; the values can
            self.cursor.execute("INSERT INTO %s VALUES (?,?)"%title, (date,sale))
        except sqlite3.IntegrityError:
            print '%s : maybe already inserted'%title
            return 0
        else:
            self.conn.commit()
            print '%s: success'%title
            return 1

    # The original defined printPythonResult twice, so the second definition
    # replaced the first; both variants are merged into one method here
    def printPythonResult(self, title, num=None):
        "Print all rows oldest first, or only the num most recent rows"
        if num is None:
            self.cursor.execute('SELECT * FROM %s ORDER BY date ASC'%title)
        else:
            self.cursor.execute('SELECT * FROM %s ORDER BY date DESC LIMIT %d'%(title,num))
        for row in self.cursor.fetchall():
            print row[0],'\t', row[1]

db = DB()

if __name__ == "__main__":
    curtime = time.localtime()
    curday = "%d/%02d/%02d"%(curtime[0],curtime[1],curtime[2])

    for title,url in books.items():
        content = getContent( url )
        soup = BeautifulSoup( content )

        a = soup('dt', {'class':'saleNum'})
        salenum = -1    # stays -1 if the sales index cannot be parsed
        if len(a)>0:
            try:
                text = str( a[0].contents[0] ).split('|')[1]
                #print text
                splited = text.split(' ')
                # take the first token that consists only of digits
                for s in splited:
                    if s.isdigit():
                        salenum = int(s)
                        break
            except:
                traceback.print_exc()

        print title, ': try to insert :',curday, salenum
        db.insertPython( title, curday, salenum )

        print title, ': === recent 10 sale points ==='
        db.printPythonResult( title, 10 )

    time.sleep(5) # for reading results....
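
The parsing step in the main loop can be looked at in isolation: take the part after the first '|', split it on spaces, and keep the first token made up only of digits. The sketch below restates that logic as a standalone function for Python 3. The name `extract_sale_index` and the sample string are hypothetical; the real text on the yes24 page may look different.

```python
def extract_sale_index(text):
    """Return the first all-digit token after the first '|', or -1 if none."""
    parts = text.split('|')
    if len(parts) < 2:
        # No '|' separator, so there is nothing to parse.
        return -1
    for token in parts[1].split(' '):
        if token.isdigit():
            return int(token)
    return -1

# Hypothetical input, only for illustration.
sample = "label | index 1234 points"
value = extract_sale_index(sample)
```

One consequence of the `isdigit()` check is that a number written with a thousands separator, such as `1,234`, is not recognized and the result stays at -1.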

File download: [ salepoint_checker.py ]

P.S. This is based on Python 2.5.
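
Since the script targets Python 2.5, it does not run unchanged on Python 3, where `urllib2` no longer exists and `print` is a function. As a rough sketch, assuming Python 3, the download helper could be written with `urllib.request` as below. The name `get_content` and the `timeout` parameter are additions for illustration, not part of the original script; the BeautifulSoup 3 import would likewise have to move to the `bs4` package.

```python
import urllib.request

def get_content(url, timeout=10):
    """Fetch url and return the response body as bytes."""
    req = urllib.request.Request(url)
    # The response object is a context manager and is closed on exit.
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()
```

The function returns bytes, so the caller decides how to decode the page.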

AND

Maso series begins

Release note 2009. 7. 1. 01:19

The article I contributed to Maso, which I briefly mentioned before,
has finally come out in the July issue.

http://www.imaso.co.kr/?doc=bbs/gnuboard.php&bo_table=article&page=1&wr_id=33214

The title is "Building a Search Engine with Python - Implementing a Blog Crawler" (파이썬 이용해 검색엔진 만들기 - 블로그 크롤러 구현).

When the Daum site template changed in mid-April, I had to make some fixes,
and I have confirmed that it works well as of late June and today, July 1. :)
