Tornado Source Code Analysis Notes (1)

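Tornado is a web server framework written in Python. Its most distinctive feature is that it uses asynchronous, non-blocking I/O to sustain high load with high performance. So how is Tornado implemented under the hood? Let's take a look together.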

A note up front: the Tornado code shown here is from version 4.2.1, and other versions may differ somewhat.

Let's begin our journey with the official helloworld example:

helloworld.py:

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web

from tornado.options import define, options

define("port", default=8888, help="run on the given port", type=int)


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")


def main():
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()

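The basic structure of a Tornado program is as follows. First, implement RequestHandler classes that handle the various URL patterns. Next, use those patterns and their handlers to initialize an Application, and use the Application to initialize an HTTPServer. Finally, set the port to listen on and start the IOLoop. Let's look at Application first: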

web.py:

class Application(httputil.HTTPServerConnectionDelegate):

    def __init__(self, handlers=None, default_host="", transforms=None,
                 **settings):

        ...  # transform and UI module setup omitted

        self.handlers = []
        self.named_handlers = {}
        self.default_host = default_host
        self.settings = settings
        if self.settings.get("static_path"):
            path = self.settings["static_path"]
            handlers = list(handlers or [])
            static_url_prefix = settings.get("static_url_prefix",
                                             "/static/")
            static_handler_class = settings.get("static_handler_class",
                                                StaticFileHandler)
            static_handler_args = settings.get("static_handler_args", {})
            static_handler_args['path'] = path
            for pattern in [re.escape(static_url_prefix) + r"(.*)",
                            r"/(favicon\.ico)", r"/(robots\.txt)"]:
                handlers.insert(0, (pattern, static_handler_class,
                                    static_handler_args))
        if handlers:
            self.add_handlers(".*$", handlers)

        ...
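One detail of the static-file branch above is easy to miss. Every pattern is inserted at index 0, so the three static patterns end up in reverse order and ahead of all user handlers. The sketch below replays only that loop on plain tuples. The function name prepend_static_patterns and the handler name strings are hypothetical placeholders, not Tornado APIs.

```python
import re


def prepend_static_patterns(handlers, static_url_prefix="/static/"):
    """Replay the static-handler loop from Application.__init__ on plain
    (pattern, name) tuples. Names are placeholder strings (hypothetical)."""
    handlers = list(handlers or [])
    for pattern in [re.escape(static_url_prefix) + r"(.*)",
                    r"/(favicon\.ico)", r"/(robots\.txt)"]:
        # insert(0, ...) places each new pattern in front of the previous one,
        # so the final order is robots.txt, favicon.ico, static prefix, user.
        handlers.insert(0, (pattern, "StaticFileHandler"))
    return handlers


routes = prepend_static_patterns([(r"/", "MainHandler")])
for pattern, name in routes:
    print(pattern, name)
```

Assuming the first URLSpec whose regex matches the path wins, a user handler for /robots.txt would be shadowed once static_path is set.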

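Application's initialization mainly sets up transforms, loads UI modules, and registers a StaticFileHandler for static files. Further down, it also sets up autoreload for debug mode. The most important step, though, is the call to self.add_handlers(".*$", handlers), which adds the handlers: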

def add_handlers(self, host_pattern, host_handlers):
    """Appends the given handlers to our handler list."""
    if not host_pattern.endswith("$"):
        host_pattern += "$"
    handlers = []

    # Keep the wildcard ('.*$') host group at the end of self.handlers
    if self.handlers and self.handlers[-1][0].pattern == '.*$':
        self.handlers.insert(-1, (re.compile(host_pattern), handlers))
    else:
        self.handlers.append((re.compile(host_pattern), handlers))

    for spec in host_handlers:
        if isinstance(spec, (tuple, list)):
            assert len(spec) in (2, 3, 4)
            spec = URLSpec(*spec)
        handlers.append(spec)
        if spec.name:
            if spec.name in self.named_handlers:
                app_log.warning(
                    "Multiple handlers named %s; replacing previous value",
                    spec.name)
            self.named_handlers[spec.name] = spec
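The result of add_handlers is a two-level table: a list of (compiled host regex, list of URLSpec) pairs, with the '.*$' group kept last. The sketch below models that table with plain tuples to show why the ordering matters. HandlerTable and its find method are hypothetical. The lookup rule (collect the handlers of every matching host group in list order, then take the first path match) is an assumption about how routing consumes the table, not code copied from Tornado.

```python
import re


class HandlerTable:
    """Hypothetical, simplified model of Application.handlers: a list of
    (compiled host regex, [(compiled path regex, handler name), ...])."""

    def __init__(self):
        self.handlers = []

    def add_handlers(self, host_pattern, host_handlers):
        if not host_pattern.endswith("$"):
            host_pattern += "$"
        group = []
        # Same rule as add_handlers above: keep the wildcard group last.
        if self.handlers and self.handlers[-1][0].pattern == ".*$":
            self.handlers.insert(-1, (re.compile(host_pattern), group))
        else:
            self.handlers.append((re.compile(host_pattern), group))
        for path, name in host_handlers:
            if not path.endswith("$"):
                path += "$"  # URLSpec anchors its pattern the same way
            group.append((re.compile(path), name))

    def find(self, host, path):
        # Gather handlers from every matching host group, in list order,
        # then return the first one whose path regex matches (assumed rule).
        candidates = []
        for host_re, group in self.handlers:
            if host_re.match(host):
                candidates.extend(group)
        for path_re, name in candidates:
            if path_re.match(path):
                return name
        return None


table = HandlerTable()
table.add_handlers(".*$", [("/", "MainHandler"), ("/about", "AboutHandler")])
table.add_handlers(r"api\.example\.com", [("/", "ApiRootHandler")])
```

Because the wildcard group stays last, the handlers registered for api.example.com are tried before the catch-all ones, even though they were added later.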

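Next, let's look at the HTTPServer class. Its initialization isn't very interesting. The one thing worth noting is that it stores the Application instance in its self.request_callback attribute. Let's go straight to listen: HTTPServer does not override it and inherits listen directly from TCPServer.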

tcpserver.py:

def listen(self, port, address=""):
    sockets = bind_sockets(port, address=address)
    self.add_sockets(sockets)
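listen itself only chains two calls: bind_sockets, then add_sockets. To make the first half concrete, here is a minimal stdlib sketch of what a bind_sockets-style helper does for each address returned by getaddrinfo. The name bind_listening_sockets and the default backlog of 128 are assumptions for illustration, and Tornado's real function handles more options.

```python
import socket


def bind_listening_sockets(port, address=None, backlog=128):
    """Hypothetical simplified bind_sockets: create, configure, bind and
    listen on one socket per address that getaddrinfo returns."""
    if address == "":
        address = None  # empty string means "all interfaces"
    infos = socket.getaddrinfo(address, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
    sockets = []
    for family, socktype, proto, _canonname, sockaddr in set(infos):
        sock = socket.socket(family, socktype, proto)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if (family == socket.AF_INET6 and hasattr(socket, "IPPROTO_IPV6")
                and hasattr(socket, "IPV6_V6ONLY")):
            # Keep IPv6 sockets IPv6-only so an IPv4 socket can share the port.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        sock.setblocking(False)  # an event loop must never block on accept()
        sock.bind(sockaddr)
        sock.listen(backlog)
        sockets.append(sock)
    return sockets
```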

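Setting up a listening socket takes three steps: create -> bind -> listen. bind_sockets takes care of all three. For each address, it creates a socket and configures it (for example, SO_REUSEADDR and non-blocking mode). It then binds the socket and calls listen() on it. The given address may resolve to several IP addresses. For example, on a host with IPv6 enabled, an empty address yields both an IPv4 and an IPv6 wildcard address. This is why bind_sockets returns a list of sockets. What remains is accepting connections, which is wired up in add_sockets: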

def add_sockets(self, sockets):
    if self.io_loop is None:
        self.io_loop = IOLoop.current()

    for sock in sockets:
        self._sockets[sock.fileno()] = sock
        add_accept_handler(sock, self._handle_connection,
                           io_loop=self.io_loop)
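add_sockets starts from IOLoop.current(). The sketch below is a simplified model, written for this note, of how IOLoop.current() and IOLoop.instance() relate in Tornado 4.x: a per-thread "current" loop that falls back to one lazily created, process-wide instance. MiniLoop is a hypothetical class, not Tornado's IOLoop, and it has no event-loop behavior at all.

```python
import threading


class MiniLoop:
    """Hypothetical stand-in modeling only the instance()/current() lookup."""

    _instance_lock = threading.Lock()
    _current = threading.local()

    @staticmethod
    def instance():
        # Double-checked locking: create the process-wide loop at most once.
        if not hasattr(MiniLoop, "_instance"):
            with MiniLoop._instance_lock:
                if not hasattr(MiniLoop, "_instance"):
                    MiniLoop._instance = MiniLoop()
        return MiniLoop._instance

    @staticmethod
    def current(instance=True):
        # Prefer the loop made current in this thread, else the singleton.
        current = getattr(MiniLoop._current, "instance", None)
        if current is None and instance:
            return MiniLoop.instance()
        return current

    def make_current(self):
        MiniLoop._current.instance = self
```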

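add_sockets first obtains the IOLoop instance. IOLoop.current() returns the IOLoop that has been made current in this thread. If there is none, it falls back to the global singleton IOLoop.instance(). IOLoop itself will be covered in detail elsewhere. add_sockets then stores each socket in self._sockets, keyed by its file descriptor. It also calls add_accept_handler, which sets TCPServer._handle_connection as the callback to run when the socket receives a connection. Let's look at add_accept_handler first: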

netutil.py:

def add_accept_handler(sock, callback, io_loop=None):

    if io_loop is None:
        io_loop = IOLoop.current()

    def accept_handler(fd, events):

        for i in xrange(_DEFAULT_BACKLOG):
            try:
                connection, address = sock.accept()
            except socket.error as e:
                # _ERRNO_WOULDBLOCK indicate we have accepted every
                # connection that is available.
                if errno_from_exception(e) in _ERRNO_WOULDBLOCK:
                    return
                # ECONNABORTED indicates that there was a connection
                # but it was closed while still in the accept queue.
                # (observed on FreeBSD).
                if errno_from_exception(e) == errno.ECONNABORTED:
                    continue
                raise
            callback(connection, address)
    io_loop.add_handler(sock, accept_handler, IOLoop.READ)
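The same accept pattern can be reproduced without Tornado. In this stdlib-only sketch, a selectors selector stands in for the IOLoop, and register_accept_handler, ACCEPT_LIMIT and run_once are hypothetical names. Python 3's BlockingIOError and ConnectionAbortedError replace the explicit errno checks used in Tornado 4.2.1.

```python
import selectors
import socket

ACCEPT_LIMIT = 128  # hypothetical stand-in for Tornado's _DEFAULT_BACKLOG


def register_accept_handler(sel, sock, callback):
    """Register a non-blocking listening socket with a selector so that each
    READ event accepts pending connections and passes them to callback."""

    def accept_handler():
        # Cap the work per event so other handlers are not starved.
        for _ in range(ACCEPT_LIMIT):
            try:
                connection, address = sock.accept()
            except BlockingIOError:
                return  # EAGAIN/EWOULDBLOCK: every pending connection taken
            except ConnectionAbortedError:
                continue  # ECONNABORTED: peer left while still queued
            callback(connection, address)

    sel.register(sock, selectors.EVENT_READ, accept_handler)


def run_once(sel, timeout=0.5):
    """One pass of a toy event loop: call the handler of each ready socket."""
    for key, _events in sel.select(timeout):
        key.data()
```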

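add_accept_handler registers the socket with io_loop, so that accept_handler is called whenever an IOLoop.READ event arrives on the socket. accept_handler is defined inside add_accept_handler. As you can see, all it really does is call sock.accept() to get the connected socket and its address, and then invoke the callback. Because the listening socket is non-blocking, it keeps accepting (up to _DEFAULT_BACKLOG connections per event) until accept() fails with a "would block" error. That error means every pending connection has been taken. The callback here is TCPServer._handle_connection: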

tcpserver.py:

def _handle_connection(self, connection, address):
    if self.ssl_options is not None:
        ...  # SSL handling omitted

    try:
        if self.ssl_options is not None:
            stream = SSLIOStream(connection, io_loop=self.io_loop,
                                 max_buffer_size=self.max_buffer_size,
                                 read_chunk_size=self.read_chunk_size)
        else:
            stream = IOStream(connection, io_loop=self.io_loop,
                              max_buffer_size=self.max_buffer_size,
                              read_chunk_size=self.read_chunk_size)
        future = self.handle_stream(stream, address)
        if future is not None:
            self.io_loop.add_future(future, lambda f: f.result())
    except Exception:
        app_log.error("Error in connection callback", exc_info=True)

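Leaving out the SSL handling, the main job here is to wrap the socket in an IOStream (or SSLIOStream) object and hand it to the layer above. handle_stream is implemented in HTTPServer and returns either None or a Future object. If it returns a Future, the IOLoop calls f.result() once the Future resolves. Any exception the Future holds is then re-raised and logged by the IOLoop instead of being silently lost. Futures will be covered in detail later.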
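The Future handling at the end of _handle_connection can be modeled with concurrent.futures.Future standing in for Tornado's Future, and add_done_callback standing in for io_loop.add_future. In this sketch, Stream, handle_connection and _check_result are hypothetical names. Unlike the original lambda, _check_result catches and logs the exception itself, because here there is no IOLoop around to do the logging.

```python
import logging
from concurrent.futures import Future

log = logging.getLogger("tcpserver_sketch")


class Stream:
    """Hypothetical stand-in for IOStream: it only wraps the raw socket."""

    def __init__(self, connection):
        self.connection = connection


def _check_result(future):
    # Same idea as add_future(future, lambda f: f.result()): calling
    # result() re-raises whatever exception the future finished with.
    try:
        future.result()
    except Exception:
        log.error("Error in connection callback", exc_info=True)


def handle_connection(server, connection, address):
    """Wrap the connection and hand it to server.handle_stream, which may
    return None or a Future (server is any object with that method)."""
    try:
        stream = Stream(connection)
        future = server.handle_stream(stream, address)
        if future is not None:
            # Runs immediately if the future is already done, else on completion.
            future.add_done_callback(_check_result)
    except Exception:
        log.error("Error in connection callback", exc_info=True)
```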

At this point, the Application and the HTTPServer are ready. Next, let's look at the core of Tornado: the IOLoop.