Commit 47ba0c84, authored by binux

update readme

Parent 4b331432
pyspider [![Build Status](https://travis-ci.org/binux/pyspider.png?branch=master)](https://travis-ci.org/binux/pyspider) [![Coverage Status](https://coveralls.io/repos/binux/pyspider/badge.png)](https://coveralls.io/r/binux/pyspider) pyspider [![Build Status](https://travis-ci.org/binux/pyspider.png?branch=master)](https://travis-ci.org/binux/pyspider) [![Coverage Status](https://coveralls.io/repos/binux/pyspider/badge.png)](https://coveralls.io/r/binux/pyspider)
======== ========
A spider system in python. [Try It Now!](http://demo.pyspider.org/) A Powerful Spider System in Python. [Try It Now!](http://demo.pyspider.org/)
- Write script with python - Write script in python with powerful API
- Web script editor, debugger, task monitor, project manager and result viewer - Powerful WebUI with script editor, task monitor, project manager and result viewer
- MySQL, MongoDB, SQLite as database backend
- Javascript pages supported!
- Task priority, retry, periodical and recrawl by age or marks in index page (like update time)
- Distributed architecture - Distributed architecture
- MySQL, MongoDB and SQLite as database backend
- Full control of crawl process with powerful API
- Javascript pages Support! (with phantomjs fetcher)
![debug demo](http://f.binux.me/debug_demo.png) Sample Code:
demo code: [gist:9424801](https://gist.github.com/binux/9424801)
```python
from libs.base_handler import *

class Handler(BaseHandler):
    '''
    this is a sample handler
    '''
    @every(minutes=24*60, seconds=0)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10*24*60*60)
    def index_page(self, response):
        for each in response.doc('a[href^="http://"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
```
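The `age=10*24*60*60` setting in the sample handler relates to the "recrawl by age" feature listed above. One way to read it is as a validity period in seconds: a page fetched more recently than that is treated as still fresh. The following is a minimal sketch of that idea only, not pyspider's actual scheduler code; `should_recrawl`, its parameters, and `TEN_DAYS` are hypothetical names introduced for illustration.

```python
import time

# Same value as the sample handler's age setting: ten days, in seconds.
TEN_DAYS = 10 * 24 * 60 * 60


def should_recrawl(last_crawl_time, age_seconds, now=None):
    """Decide whether a page is due for another fetch (illustrative only).

    last_crawl_time: Unix timestamp of the previous fetch, or None if the
                     page has never been fetched.
    age_seconds:     assumed validity period of a fetched page, in seconds.
    now:             current Unix timestamp; defaults to time.time().
    """
    if now is None:
        now = time.time()
    if last_crawl_time is None:
        # Never fetched before, so it is always due.
        return True
    # Due again once the validity period has fully elapsed.
    return now - last_crawl_time >= age_seconds
```

Under this reading, a link rediscovered on the index page every day would be skipped for ten days after its last fetch and picked up again after that.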
[![demo](http://ww1.sinaimg.cn/large/7d46d69fjw1emavy6e9gij21kw0uldvy.jpg)](http://demo.pyspider.org/)
Installation Installation
============ ============
* python2.6/2.7 * python2.6/2.7
* `pip install -r requirements.txt` * `pip install --allow-all-external -r requirements.txt`
* `./run.py` , visit [http://localhost:5000/](http://localhost:5000/) * `./run.py` , visit [http://localhost:5000/](http://localhost:5000/)
Docker if ubuntu: `apt-get install python python-dev python-distribute python-pip libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml`
======
``` or [Run with Docker](https://github.com/binux/pyspider/wiki/Run-pyspider-with-Docker)
# mysql
docker run -it -d --name mysql dockerfile/mysql
# rabbitmq
docker run -it -d --name rabbitmq dockerfile/rabbitmq
# phantomjs link to fetcher and webui
docker run --name phantomjs -it -d -v `pwd`:/mnt/test --expose 25555 cmfatih/phantomjs /usr/bin/phantomjs /mnt/test/fetcher/phantomjs_fetcher.js 25555
# scheduler
docker run -it -d --name scheduler --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider scheduler
# fetcher, run multiple instance if needed.
docker run -it -d -m 64m --link rabbitmq:rabbitmq binux/pyspider fetcher
# processor, run multiple instance if needed.
docker run -it -d -m 128m --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider processor
# webui
docker run -it -d -p 5000:5000 --link mysql:mysql --link rabbitmq:rabbitmq --link scheduler:scheduler binux/pyspider webui
```
Documents Documents
========= =========
...@@ -53,8 +60,8 @@ Documents ...@@ -53,8 +60,8 @@ Documents
Contribute Contribute
========== ==========
* Deploy and use it, report bugs and request features via [Issue](https://github.com/binux/pyspider/issues) * Use It, Open [Issue](https://github.com/binux/pyspider/issues), PR is welcome.
* Join the [feature discussion](https://github.com/binux/pyspider/issues?labels=discussion&state=open) and [improve the documentation](https://github.com/binux/pyspider/wiki) * [Discuss](https://github.com/binux/pyspider/issues?labels=discussion&state=open) [Document](https://github.com/binux/pyspider/wiki)
License License
......
...@@ -3,7 +3,6 @@ ...@@ -3,7 +3,6 @@
# vim: set et sw=4 ts=4 sts=4 ff=unix fenc=utf8: # vim: set et sw=4 ts=4 sts=4 ff=unix fenc=utf8:
# Created on __DATE__ # Created on __DATE__
from libs.pprint import pprint
from libs.base_handler import * from libs.base_handler import *
class Handler(BaseHandler): class Handler(BaseHandler):
...@@ -12,7 +11,7 @@ class Handler(BaseHandler): ...@@ -12,7 +11,7 @@ class Handler(BaseHandler):
''' '''
@every(minutes=24*60, seconds=0) @every(minutes=24*60, seconds=0)
def on_start(self): def on_start(self):
self.crawl('http://www.baidu.com/', callback=self.index_page) self.crawl('http://scrapy.org/', callback=self.index_page)
@config(age=10*24*60*60) @config(age=10*24*60*60)
def index_page(self, response): def index_page(self, response):
......