divák-tornado¶
Helper classes for observing your Tornado-based HTTP code.
Source Code | https://github.com/dave-shawley/divak-tornado |
Download | https://pypi.org/project/divak-tornado |
Documentation | https://divak-tornado.readthedocs.io |
Issue Tracker | https://github.com/dave-shawley/divak-tornado |
Divák implements patterns for observing inter-service communication in distributed systems from both a client and a server perspective. The primary goal is to provide a nice framework for all aspects of observability:
- measuring time spent during execution
- counting interesting events
- propagating tokens across inter-service communication that allow for tracing of messages
- injecting tracing tokens into log messages
In the past I have approached the problem of observing data flows from the perspective of the individual aspects of logging, metrics recording, and error reporting. Divák takes a different approach. It provides the tools to include measurements in your code and the ability to propagate tokens that correlate all parts of a data flow across multiple services.
This particular library implements observation as an application level observer, a mix-in class that reads tokens from incoming HTTP requests, a wrapper that propagates tokens to other services when you make HTTP requests, and other similar helpers. The goal is to make it possible to easily instrument your application code in such a way that operational metrics are recorded, correlation identifiers are created and propagated between services, and the same identifiers are included in log messages. All of this using a single context-manager based instrumentation mechanism.
Here’s an example of the API that I am aiming for, The request handler uses
two new mix-in classes: divak.api.RequestTracker
and divak.api.Logger
.
The request tracker adds methods that implement operation tracing and the
logger is essentially a logging.LoggingAdapter
that injects the
correlation identifiers into LogRecord
instances. I also used the
divak.api.HttpClientMixin
which adds a method to send HTTP requests to
other services that propagates the active span information.
The application also uses two mix-in classes: divak.api.Recorder
and
divak.api.ProbabilisticSampler
. The record is responsible for submitting
trace data out-of-band to whatever aggregators you add and the sampler is what
determines how frequently to sample requests.
from tornado import gen, web
import divak.api
class MyHandler(divak.api.RequestTracker, divak.api.Logger,
divak.api.HttpClientMixin, web.RequestHandler):
def initialize(self):
super(self, MyHandler).initialize()
self.set_divak_name('foo')
@gen.coroutine
def get(self):
with self.record_operation('query-db'):
rows = yield self.query('...')
self.add_divak_tag('query-db.rowcount', len(rows))
output = []
for row in rows:
self.logger.debug('requesting %r', row['id'])
response = yield self.send_request(
'http://.../?id={}'.format(row['id']),
method='GET')
output.append(response.body)
self.set_status(200)
self.send_response(output)
class Application(divak.api.Recorder, divak.api.ProbabilisticSampler,
web.Application):
def __init__(self, *args, **kwargs):
handlers = [web.url(r'/', MyHandler)]
super(Application, self).__init__(handlers, *args, **kwargs)
self.set_divak_service('my-service')
self.set_divak_sample_probability(1.0 / 1000.0)
self.add_divak_reporter(
divak.api.ZipkinReporter('http://127.0.0.1:9411/api/v2'))
self.add_divak_propagator(divak.api.ZipkinPropagation())
This example will result in zipkin tracing for GET /
requests which
record spans for the database query and each HTTP API call.
Related Work¶
- sprockets.http - implements helpers that run Tornado HTTP applications and configure logging to facilitate tracing
- sprockets.mixins.correlation - creates a correlation ID that can be passed from service to service using HTTP headers
- sprockets.mixins.metrics - implements recording of runtime metrics using either statsd or InfluxDB
- sprockets.mixins.sentry - implements error reporting to sentry for unhandled exceptions
- sprockets.mixins.statsd - implements recording of runtime metrics using statsd
- sprockets-influxdb - implements recording of runtime measurements using InfluxDB as a back end
User’s Guide¶
Application Requirements¶
Your application needs to inherit from Recorder
to use divák.
The Recorder
class provides methods to inject observers into the
request processing pipeline. The only notable functionality that it provides
is to ensure that every request has a divak_request_id
attribute that is
set to None
by default and that the attribute is added to the logging
context when writing to tornado.log.access_log
. Beyond that, it is a
strictly opt-in interface in that you must explicitly add observers to
enable functionality.
Request Tracing¶
Correlation Headers¶
The simplest form of tracing a request is to pass a header with a correlation
ID through the system. The RequestIdPropagator
class simply
propagates a named header from the request into the response. The default
behavior is to relay a request header named Request-ID
from request to
response. This functionality is enabled by adding
a RequestIdPropagator
instance to your application by calling
add_divak_propagator()
as shown below.
class MyApplication(divak.api.Recorder, web.Application):
def __init__(self, *args, **kwargs):
super(MyApplication, self).__init__(
[web.url('/status', StatusHandler)],
*args, **kwargs)
self.set_divak_service('my-service')
self.add_divak_propagator(divak.api.RequestIdPropagator())
This simple change will copy the Request-ID
header from the request
headers to the response headers. Having a HTTP header that is relayed
from request to response makes it possible to trace the processing of
a request through multiple services provided that you record it somewhere.
GET /status HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: 127.0.0.1:8000
Request-ID: a6bd045cdcf64451a076a59bdb367ba6
User-Agent: HTTPie/0.9.9
HTTP/1.1 200 OK
Content-Length: 61
Content-Type: application/json
Date: Sat, 10 Feb 2018 13:48:41 GMT
Etag: "e3c1480268930c91ab9eb759a1fa5fcad3ab5e28"
Request-Id: a6bd045cdcf64451a076a59bdb367ba6
Server: TornadoServer/4.5.3
{
"service": "my-service",
"status": "ok",
"version": "0.0.0"
}
If the header is not included in the request, then it will generate a UUIDv4
value and insert it into the response. Of course, the value generation
function can be overridden to anything you want by passing the
value_factory
keyword to RequestIdPropagator
.
GET /status HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: 127.0.0.1:8000
User-Agent: HTTPie/0.9.9
HTTP/1.1 200 OK
Content-Length: 61
Content-Type: application/json
Date: Sat, 10 Feb 2018 13:52:42 GMT
Etag: "e3c1480268930c91ab9eb759a1fa5fcad3ab5e28"
Request-Id: 31510218-abb2-49b6-bfd3-a036fe4f6113
Server: TornadoServer/4.5.3
{
"service": "my-service",
"status": "ok",
"version": "0.0.0"
}
Request Logging¶
The divak.api.Recorder
class modifies the Python logging
module to ensure that you can use %(divak_request_id)s
in log formats.
It also overrides the log_request()
method and
outputs something closer to a standard access log line using the following log
format:
$REMOTE-IP "$METHOD $URI" $STATUS "$USER-AGENT" $PROCESSING-TIME
For example:
127.0.0.1 "GET /status" 200 "curl/7.54.0" 0.001341104507446289
The $PROCESSING-TIME
is in decimal seconds and the user-agent & request id
portions both default to "-"
if they are missing for some reason. You can
modify this behavior by overriding the method in your application class or by
adding a custom formatting pattern to tornado.log.access_log
. The
following dictionary is passed as the extra
parameter to the log method:
extra = {
'remoteip': '127.0.0.1',
'status': handler.get_status(),
'elapsed': request.request_time(),
'method': request.method,
'uri': request.uri,
'useragent': request.headers.get('User-Agent', '-'),
'divak_request_id': getattr(request, 'divak_request_id', '-'),
}
If you want to insert the request ID into the access log, then you should modify the default log format to include it when you initialize the logging module:
def main():
logging.basicConfig(
level=logging.INFO,
format=('%(levelname)1.1s - %(name)s: %(message)s '
'{%(divak_request_id)s}'))
app = MyApplication(debug=True)
app.listen(8000)
ioloop.IOLoop.current().start()
This will include the request ID on every log line that it is available on.
The divak.api.Logger
mix-in ensures that it is available on it’s
self.logger
instance. The divak.api.Recorder
class inserts the
request id into access log entries as well. Access logs entries from the
example look like:
127.0.0.1 "GET /status" 200 "curl/7.54.0" 0.001341104507446289 {84DC5B74-752A-468F-A786-806696A5DE01}
API Reference¶
Application Configuration¶
-
class
divak.api.
Recorder
(*args, **kwargs)¶ Imbues an application with recording abilities.
-
add_divak_propagator
(propagator)¶ Add a propagation instance that inspects each request.
Parameters: propagator – a propagator instance to inspect requests and potentially modify responses
-
add_divak_reporter
(reporter)¶ Add a reporter instance.
Parameters: reporter – a reporter instance to receive observations
-
log_request
(handler)¶ Override
log_request
to improve logging format.Parameters: handler (tornado.web.RequestHandler) – the handler that processed the request
-
Header Modifiers¶
-
class
divak.api.
RequestIdPropagator
(header_name='Request-Id', *args, **kwargs)¶ Propagates Request-IDs between services.
Parameters: - header_name (str) – the name of the request header to propagate.
If this value is unspecified, then the header name defaults to
Request-ID
. - value_factory – if this keyword is specified, then it’s value is called to generate a response header if a new header value is required. If this value is unspecified, then a UUID4 will be generated.
This class implements propagation of a request header into the response. If the incoming request does not include a matching header, then a new value will be generated by calling value_factory. You can disable the generation of new values by setting value_factory to
None
.An instance of
HeaderRelayTransformer
is wired in to transform the request output by inserting the header value from thedivak_request_id
property of the active request.-
handle_request
(request)¶ Initial transform function.
Parameters: request (tornado.web.httpserver.HTTPRequest) – the request that is being processed Returns: a new instance of HeaderRelayTransformer
that is configured to insert the request ID headerReturn type: HeaderRelayTransformer This function is called to process each request. It pulls the header value out of the request and assigns it to
request.divak_request_id
. If the incoming request does not have a request ID header, then a new value may be generated before assigning it. Finally, a new instance ofHeaderRelayTransformer
is created to process the output generated by processing request.
-
install
(application)¶ Install the propagator into the application.
Parameters: application (tornado.web.Application) – the application to install this propagator into Returns: True
if the propagator wants to be called in the future orFalse
otherwiseReturn type: bool
- header_name (str) – the name of the request header to propagate.
If this value is unspecified, then the header name defaults to
Observation Points¶
-
class
divak.api.
Logger
(*args, **kwargs)¶ Imbues a
tornado.web.RequestHandler
with a contextual logger.This class adds a
logger
attribute that inserts divak tags into the logging record. Tags added by callingadd_divak_tag()
are automatically made available in log messages. Thedivak_request_id
value is guaranteed to be available in all log messages provided that you are usingApplication
in your application’s class list.The
logger
attribute is set inprepare()
and will wrap an existinglogger
attribute or create a new one using the self’s class module and class name as the logger name.-
logger
¶ A
logging.LoggerAdapter
that inserts divak tags into log records using theextra
dict.
-
Test Helpers¶
-
class
divak.testing.
RecordingLogHandler
(*args, **kwargs)¶ Log handler that keeps track of log records.
Install an instance of this class as a handler on a
logging.Logger
to keep track of messages that are logged.-
records
¶ list
oflogging.LogRecord
instances for messages that were logged.
-
Implementation Details¶
Request Id Management¶
Propagating incoming request details through the system is at the very heart of any observability attempt. The simplest case is copying a header from the request to the response and making the same identifier available in emitted log messages. It is a simple concept and has a profound impact on your ability to trace a message through the internals of a service and through the entire system. The implementation is spread throughout a few classes:
RequestIdPropagator
ensures that a named request header is stored as a first-class property on the requestHeaderRelayTransformer
then ensures that the property value is added as a response header of the same nameRecorder
is atornado.web.Application
mix-in that installs instances of theHeaderRelayTransformer
when your application callsadd_divak_propagator()
Logger
is atornado.web.RequestHandler
mix-in that adds alogging.LoggerAdapter
instance and inserts the request ID property from the request into the logging context – this makes%(divak_request_id)s
work in log messages
Simply making the request ID available throughout the application is a bit more work than one would expect. The process relies on an under-documented Tornado feature called transformers. A transformer is a function that is called immediately after creating the request handler instance. It is an injection point that can extract information from an incoming request or even modify the request object before it is processed. The transformer function returns an object instance that is called immediately before sending the HTTP response line and again on each subsequent outgoing chunk. The simplest implementation is a class since the class itself is callable and returns an instance. The following class is a simple no-op implementation of a Tornado transformer.
class NoOpTransformer(object):
def __init__(self, request):
pass
def transform_first_chunk(self, status_code, headers, chunk,
include_footers):
return status_code, headers, chunk
def transform_chunk(self, chunk, include_footers):
return chunk
The application installs the transformer by calling
add_transform()
. Then it is called on each
request. The detailed sequence of events during request processing is:
- The
tornado.web.HTTPServerRequest
instance is created. - The HTTP start line and headers are read and stored in the request object
- the
tornado.web.RequestHandler
sub-class is determined by the application routing rules. - A new instance of the request handler is instantiated with the request
instance. The
initialize()
method is called at this time. - The transformer functions are invoked with the request object. In the simple case, new transformer instances are created.
- The
prepare()
method is called on the request handler instance. - The request handler’s “action method” (e.g.,
get
,post
) is invoked. - The request is “finished” if necessary.
A few very important details about this sequence of events:
When the request handler instance is created in step 4, the request has not passed through the transform functions so:
- The request handler cannot rely on any information that is injected by a
transformer inside of
initialize
. - The request handler has a chance to modify the request instance before it is transformed.
- The request handler cannot rely on any information that is injected by a
transformer inside of
Request handler SHOULD NOT call
finish()
from withininitialize
. Doing so wreaks havoc with any installed transforms because they will be created and initialized after the request is already finished and are never called again.Since
prepare()
is permitted to be asynchronous, transformers need to be per-request instances or completely stateless. Ifprepare
yields, then there could be parallel active requests so the transformed SHOULD NOT contain state.
There is a rather glaring omission in the sequence of events. When are the
transform methods invoked? transform_first_chunk
is called BEFORE
the HTTP response line and headers are written so they can influence the
response headers and status code after the request handler is done with
them. transform_chunk
is called for each body chunk from within
write()
.
The extra wrinkle for the HeaderRelayTransformer
is that the name of
the header to relay is configurable so we cannot simply let Tornado create
a class instance for us – we need to get the header name into the transformer.
Instead, RequestIdPropagator
implements a factory method that
returns a new HeaderRelayTransformer
instance and installs the
factory method as a transformer function. The HeaderRelayTransformer
instance inserts the divak_request_id property from the request into the
response headers if it is set.
Logging¶
The Recorder
mix-in (over tornado.web.Application
) ensures
that it is safe to refer to the %(divak_request_id)s
in any log formatter.
During initialization, the Recorder
instance installs the
DivakLogger
class as the “logger class” by calling
logging.setLoggerClass()
. Future calls to logging.getLogger()
will return instances of DivakLogger
which is a
sub-class of logging.Logger
. This makes the request ID available for
logger instances created after the application is initialized but it doesn’t
address existing logger instances.
A DivakRequestIdFilter
instance is inserted into
existing logging.Handler
instances so that any existing loggers can
safely refer to the request ID in their messages as well. This is done during
application initialization after setting the logging class. Existing loggers
are visited via the logging.Manager
global instance. If the assigned
handlers do not have a DivakRequestIdFilter
, then
one is added to the filter chain.
That makes it possible to refer to %(divak_request_id)s
in log formats but
it doesn’t actually get the value into the log messages. This is where the
divak.api.Logger
mix-in comes into play. It’s raison d’êtra is to
add self.logger
to a RequestHandler
instance but it
does one other thing for you – it inserts the
self.request.divak_request_id
into the logging context for you. This is
what makes the request id available in log messages from your request
handlers.
The next place that we want the request ID to show up is in the Tornado access
log lines. Recorder
re-implements
tornado.web.Application.log_request()
so that it passes the current
request ID into the Tornado logger. What it does not do is add the request ID
to the log format – you are required to do that if you wish. See
Request Logging for the details.
Implementation Details¶
DivakLogger¶
-
class
divak.internals.
DivakLogger
(name, level=0)¶ Extends class:logging.Logger to ensure that divak_request_id is set.
This class is installed via
logging.setLoggerClass()
to ensure that thedivak_request_id
attribute is set on all records.
DivakRequestIdFilter¶
-
class
divak.internals.
DivakRequestIdFilter
(name='')¶ Logging filter that sets the divak_request_id attribute on records.
This makes it possible to use
divak_request_id
in log formats without having to do additional work. You shouldn’t need to tinker with this yourself sincedivak.api.Recorder
does it for you when a instance is created.
HeaderRelayTransformer¶
-
class
divak.api.
HeaderRelayTransformer
(header_name, request)¶ Tornado transformer that relays a header from request to response.
Parameters: - header_name (str) – the name of the header to relay from request to response
- value_factory – callable that generates a new value
Setting value_factory to
None
disables the generation of response header values when the header is missing from the request.This class implements the under-documented Tornado transform interface. Transforms are called when the application starts processing a request. Per-request instances of this class are returned from
RequestIdPropagator.handle_request()
with the appropriate header name and value to insert when the first chunk is processed.-
transform_chunk
(chunk, include_footers)¶ Called to transform subsequent chunks.
Parameters: - chunk – the data chunk to transform
- include_footers (bool) – should footers be included?
Returns: the possibly transformed chunk
This implementation returns chunk as-is.
-
transform_first_chunk
(status_code, headers, chunk, include_footers)¶ Called to process the first chunk.
Parameters: - status_code (int) – status code that is going to be returned in the response
- headers (tornado.httputil.HTTPHeaders) – response headers
- chunk – the data chunk to transform
- include_footers (bool) – should footers be included?
Returns: the status code, headers, and chunk to use as a tuple
This method will inject
request.divak_request_id
into headers if the value is notNone
. The remaining parameters are passed through as-is.
initialize_logging¶
-
divak.internals.
initialize_logging
()¶ Ensure that LogRecords have a divak_request_id defined.
This is called during initialization to ensure that
logging.LogRecord
instances will always have adivak_request_id
property. This makes it possible to refer to the request id in your log formats.Firstly, it installs a new
logging.Logger
class by callinglogging.setLoggerClass()
that sets the attribute in it’slogging.Logger.makeRecord()
method if it not already present. Then it spins through the existing loggers and addsDivakRequestIdFilter
to any handler that doesn’t already have one. This process ensures that new and existing loggers can refer to the request id in log formats.
Distribution License¶
Copyright (c) 2018, Dave Shawley All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the {organization} nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contributing to Divák¶
So you want to contribute back to Divák.
Thank you
This project is released under the permissive BSD license so you aren’t required to contribute modifications. But you are obviously a person of excellent taste since you found your way here. The first thing that you will want to do is to create a fork of
and clone it to your workstation/laptop/phone/whatever:
git clone git@github.com:your-org/divak-tornado.git
Then you can move forward with your development.
Development Environment¶
This library is primarily developed in Python 3 so start by setting up a new
virtual environment named env
in the root directory. I’m not usually so
opionated as to tell you what to name things but env
is wired through the
.gitignore file and that’s what I will be using in examples. If you put the
environment somewhere else, then that works too but you will have to do a
minor amount of mental gymnastics to follow along:
python3.6 -mvenv --copies env
Next you want to install the development tools using the pip-formatted requirements file requires/development.txt. This will ensure that you are using the same versions of development tools that tests will be run with:
./env/bin/pip -r requires/development.txt
Before you start writing code, do yourself a big favor and make sure that the tests pass on your local machine. They should pass but you are better off safe than sorry:
./env/bin/python setup.py nosetests
If they pass, then it’s time to start writing code; otherwise, reach out and
file a github issue with the output of pip freeze
, the test output, and
any logs that you have handy.
Development Tasks¶
Most of the tasks are wired up to be run using setup.py so activate the environment and run ./setup.py … to run a utility. Of course you are more than welcome to use nosetests, sphinx-build, and other utilities directly.
- setup.py nosetests will run the test suite. Add –with-coverage to generate coverage reports to build/coverage as well.
- setup.py flake8 will run the PEP8 checker
- setup.py build_sphinx will build the documentation suite into build/sphinx/html
- setup.py bdist_wheel will build a wheel distribution into the dist directory
Submitting a PR¶
Once you have made your changes, its time to submit a pull request against the upstream repository. Just go through the following checklist before you create the PR since the integrated CI will fail unless you do:
- ensure that all of the tests are passing
- ensure that any new code or branches are covered by tests – I aim for 100% test coverage on lines and branches
- describe your changes in the Next Release section of docs/changelog.rst
- push your changes to your fork and issue the PR
Once again, thank you very much for taking the time to contribute back.