HTTP Parser
===========

This is a parser for HTTP messages written in C. It parses both requests and
responses. The parser is designed for use in high-performance HTTP
applications. It makes no syscalls or allocations, does not buffer data, and
can be interrupted at any time. Depending on your architecture, it requires
only about 40 bytes of data per message stream (in a web server, that is per
connection).

Features:

  * No dependencies
  * Handles persistent streams (keep-alive).
  * Decodes chunked encoding.
  * Upgrade support
  * Defends against buffer overflow attacks.

The parser extracts the following information from HTTP messages:

  * Header fields and values
  * Content-Length
  * Request method
  * Response status code
  * Transfer-Encoding
  * HTTP version
  * Request URL
  * Message body


Usage
-----

One `http_parser` object is used per TCP connection. Initialize the struct
using `http_parser_init()` and set the callbacks. That might look something
like this for a request parser:

    http_parser_settings settings = {0};  /* callbacks left NULL are skipped */
    settings.on_url = my_url_callback;
    settings.on_header_field = my_header_field_callback;
    /* ... */

    http_parser *parser = malloc(sizeof(http_parser));
    http_parser_init(parser, HTTP_REQUEST);
    parser->data = my_socket;

When data is received on the socket, execute the parser and check for errors.

    size_t len = 80*1024, nparsed;
    char buf[len];
    ssize_t recved;

    recved = recv(fd, buf, len, 0);

    if (recved < 0) {
      /* Handle error; do not run the parser on a failed read. */
    }

    /* Start up / continue the parser.
     * Note we pass recved==0 to signal that EOF has been received.
     */
    nparsed = http_parser_execute(parser, &settings, buf, recved);

    if (parser->upgrade) {
      /* handle new protocol */
    } else if (nparsed != (size_t)recved) {
      /* Handle error. Usually just close the connection. */
    }

The parser needs to know where the end of the stream is. For example, servers
sometimes send responses without Content-Length and expect the client to
consume input (for the body) until EOF. To tell http_parser about EOF, pass
`0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors
can still occur during an EOF, so be prepared to receive them.

Scalar-valued message information such as `status_code`, `method`, and the
HTTP version is stored in the parser structure. This data is only
temporarily stored in `http_parser` and is reset on each new message. If
this information is needed later, copy it out of the structure during the
`on_headers_complete` callback.

The parser decodes the transfer-encoding for both requests and responses
transparently. That is, a chunked encoding is decoded before being sent to
the `on_body` callback.
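
To make "decoded transparently" concrete, here is a minimal sketch of what
chunked decoding does to a body. It is an illustration only and not the
parser's implementation: `decode_chunked`, `body_cb`, `struct sink` and
`collect` are names invented for this example, the whole body is assumed to
be in memory at once, and chunk extensions and trailers are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as a body callback, with a context pointer standing in for
 * the http_parser* argument (an assumption made to keep this standalone). */
typedef int (*body_cb)(void *ctx, const char *at, size_t length);

/* Hypothetical sink that collects decoded body bytes. */
struct sink { char data[64]; size_t len; };

static int collect(void *ctx, const char *at, size_t length)
{
    struct sink *s = ctx;
    if (length > sizeof s->data - 1 - s->len)
        return -1;                       /* would overflow: report an error */
    memcpy(s->data + s->len, at, length);
    s->len += length;
    s->data[s->len] = '\0';
    return 0;
}

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a complete chunked body: <hex size>CRLF <data>CRLF ... 0CRLF.
 * Calls cb once per chunk with the payload only.
 * Returns 0 on success, -1 on malformed input or callback failure. */
static int decode_chunked(const char *in, size_t len, body_cb cb, void *ctx)
{
    size_t i = 0;
    assert(in != NULL && cb != NULL);
    for (;;) {
        size_t size = 0;
        int digits = 0, v;
        /* At most 8 hex digits, so the size cannot overflow 32 bits. */
        while (i < len && digits < 8 && (v = hex_val(in[i])) >= 0) {
            size = size * 16 + (size_t)v;
            digits++;
            i++;
        }
        if (digits == 0 || len - i < 2 || in[i] != '\r' || in[i + 1] != '\n')
            return -1;
        i += 2;
        if (size == 0)
            return 0;                    /* last chunk; trailers are ignored */
        if (len - i < 2 || size > len - i - 2)
            return -1;                   /* chunk data or its CRLF is missing */
        if (cb(ctx, in + i, size) != 0)
            return -1;
        i += size;
        if (in[i] != '\r' || in[i + 1] != '\n')
            return -1;
        i += 2;
    }
}
```

Given the wire bytes `5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n`, the callback
receives `hello` and then ` world`; the chunk sizes and CRLF framing never
reach it.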


The Special Problem of Upgrade
------------------------------

HTTP supports upgrading the connection to a different protocol. An
increasingly common example of this is the Web Socket protocol, which sends
a request like

        GET /demo HTTP/1.1
        Upgrade: WebSocket
        Connection: Upgrade
        Host: example.com
        Origin: http://example.com
        WebSocket-Protocol: sample

followed by non-HTTP data.

(See http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75 for more
information on the Web Socket protocol.)

To support this, the parser treats the request as a normal HTTP message
without a body, issuing both the `on_headers_complete` and
`on_message_complete` callbacks. However, `http_parser_execute()` will stop
parsing at the end of the headers and return.

The user is expected to check whether `parser->upgrade` has been set to 1
after `http_parser_execute()` returns. The non-HTTP data begins in the
supplied buffer at the offset given by the return value of
`http_parser_execute()`.
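
The offset arithmetic can be sketched without the parser itself. The helper
below is hypothetical (`upgrade_payload` is not part of http-parser); it
assumes `nparsed` is the value returned by `http_parser_execute()` and
`recved` is the number of bytes that were passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of http-parser.
 *   buf     - the buffer that was handed to http_parser_execute()
 *   recved  - number of bytes in buf
 *   nparsed - return value of http_parser_execute()
 * Returns a pointer to the first non-HTTP byte and stores the number of
 * bytes that follow the HTTP headers in *rest_len. */
static const char *upgrade_payload(const char *buf, size_t recved,
                                   size_t nparsed, size_t *rest_len)
{
    assert(nparsed <= recved);   /* cannot have consumed more than was given */
    *rest_len = recved - nparsed;
    return buf + nparsed;
}
```

In the `parser->upgrade` branch of the receive loop, the returned pointer and
length would be handed to whatever code speaks the new protocol.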


Callbacks
---------

During the `http_parser_execute()` call, the callbacks set in
`http_parser_settings` will be executed. The parser maintains state and
never looks behind, so buffering the data is not necessary. If you need to
save certain data for later usage, you can do that from the callbacks.

There are two types of callbacks:

* notification `typedef int (*http_cb) (http_parser*);`
    Callbacks: on_message_begin, on_headers_complete, on_message_complete.
* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
    Callbacks: (requests only) on_url,
               (common) on_header_field, on_header_value, on_body.

Callbacks must return 0 on success. Returning a non-zero value indicates
error to the parser, making it exit immediately.

If you parse an HTTP message in chunks (e.g. `read()` the request line
from the socket, parse, read half of the headers, parse, etc.), your data
callbacks may be called more than once. http-parser guarantees that the data
pointer is valid only for the lifetime of the callback. You can also `read()`
into a heap-allocated buffer to avoid copying memory around if this fits your
application.
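
Because the data pointer does not outlive the callback and one URL may arrive
in several pieces, a data callback usually copies the bytes into storage it
owns. The following is a minimal sketch under stated assumptions:
`struct url_buf`, `append_url` and `URL_MAX` are hypothetical names, and the
function takes the storage directly instead of an `http_parser*` so that it
compiles on its own. A real `on_url` callback would fetch the storage from
`parser->data` and forward to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define URL_MAX 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical per-connection storage; in a real program a pointer to it
 * would typically be kept in parser->data. */
struct url_buf {
    char   data[URL_MAX + 1];
    size_t len;
};

/* Body of a data callback such as on_url, minus the http_parser* argument.
 * `at` is only valid during the call, so the bytes are copied out.
 * Returns 0 on success, non-zero to make the parser stop. */
static int append_url(struct url_buf *u, const char *at, size_t length)
{
    assert(u->len <= URL_MAX);
    if (length > URL_MAX - u->len)
        return -1;                    /* URL too long: report an error */
    memcpy(u->data + u->len, at, length);
    u->len += length;
    u->data[u->len] = '\0';
    return 0;
}
```

Returning non-zero on overflow relies on the rule stated earlier in this
section: the parser exits as soon as a callback reports an error.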

Reading headers can be tricky if you read and parse them partially.
Basically, you need to remember whether the last header callback was a field
or a value and apply the following logic:

    (on_header_field and on_header_value shortened to on_h_*)
     ------------------------ ------------ --------------------------------------------
    | State (prev. callback) | Callback   | Description/action                         |
     ------------------------ ------------ --------------------------------------------
    | nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
    |                        |            | into it                                    |
     ------------------------ ------------ --------------------------------------------
    | value                  | on_h_field | New header started.                        |
    |                        |            | Copy current name,value buffers to headers |
    |                        |            | list and allocate new buffer for new name  |
     ------------------------ ------------ --------------------------------------------
    | field                  | on_h_field | Previous name continues. Reallocate name   |
    |                        |            | buffer and append callback data to it      |
     ------------------------ ------------ --------------------------------------------
    | field                  | on_h_value | Value for current header started. Allocate |
    |                        |            | new buffer and copy callback data to it    |
     ------------------------ ------------ --------------------------------------------
    | value                  | on_h_value | Value continues. Reallocate value buffer   |
    |                        |            | and append callback data to it             |
     ------------------------ ------------ --------------------------------------------
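
The table above can be sketched in C roughly as follows. This is one possible
arrangement, not code from http-parser: `struct header_acc`, `append_bytes`,
`header_total`, `header_acc_free` and `MAX_HEADERS` are hypothetical, and the
two callbacks take the accumulator directly where a real callback would
receive an `http_parser*` and read the accumulator from `parser->data`.
Headers are built in place in an array, so "copy to headers list" becomes
advancing an index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32               /* arbitrary limit for this sketch */

enum last_cb { LAST_NONE, LAST_FIELD, LAST_VALUE };

struct header { char *name; char *value; };

/* Hypothetical per-connection state; must start zero-initialised. */
struct header_acc {
    struct header headers[MAX_HEADERS];
    size_t        count;             /* index of the header being built */
    enum last_cb  last;              /* which callback ran most recently */
};

/* Append len bytes to the heap string *dst (NULL means empty).
 * Returns 0 on success, -1 if allocation fails. */
static int append_bytes(char **dst, const char *at, size_t len)
{
    size_t old = *dst ? strlen(*dst) : 0;
    char *p = realloc(*dst, old + len + 1);
    if (p == NULL)
        return -1;
    memcpy(p + old, at, len);
    p[old + len] = '\0';
    *dst = p;
    return 0;
}

static int on_h_field(struct header_acc *a, const char *at, size_t len)
{
    if (a->last == LAST_VALUE) {     /* new header started */
        if (a->count + 1 >= MAX_HEADERS)
            return -1;               /* too many headers: stop the parser */
        a->count++;
    }
    /* First call or new header: name is NULL, so this allocates.
     * Previous callback was a field: the name continues and is extended. */
    if (append_bytes(&a->headers[a->count].name, at, len) != 0)
        return -1;
    a->last = LAST_FIELD;
    return 0;
}

static int on_h_value(struct header_acc *a, const char *at, size_t len)
{
    assert(a->last != LAST_NONE);    /* a value never precedes a field */
    /* After a field this allocates; after a value it extends the value. */
    if (append_bytes(&a->headers[a->count].value, at, len) != 0)
        return -1;
    a->last = LAST_VALUE;
    return 0;
}

/* Number of headers seen so far, including the one in progress. */
static size_t header_total(const struct header_acc *a)
{
    return a->last == LAST_NONE ? 0 : a->count + 1;
}

static void header_acc_free(struct header_acc *a)
{
    size_t i, n = header_total(a);
    for (i = 0; i < n; i++) {
        free(a->headers[i].name);
        free(a->headers[i].value);
    }
    memset(a, 0, sizeof *a);
}
```

Feeding it `Content-`, `Type` as two field calls and `text/`, `html` as two
value calls yields a single header named `Content-Type` with the value
`text/html`, regardless of where the input happened to be split.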


Parsing URLs
------------

A simplistic zero-copy URL parser is provided as `http_parser_parse_url()`.
Users of this library may wish to use it to parse URLs constructed from
consecutive `on_url` callbacks.

See examples of reading in headers:

* [partial example](http://gist.github.com/155877) in C
* [from http-parser tests](http://github.com/joyent/http-parser/blob/37a0ff8/test.c#L403) in C
* [from Node library](http://github.com/joyent/node/blob/842eaf4/src/http.js#L284) in JavaScript