Tuesday, March 25, 2014

Building a Django app that uses ZeroMQ: an annotated webliography


Introduction:
I wanted to build a website that lets people search their data against a database. (In the not too distant future, when the website is live, I'll link it and the source code and give more of an explanation. Edit: and here it is: the code, where the ZeroMQ stuff is mostly at "/nmr/management/commands", and the website.) Each search takes a few seconds, so to serve multiple clients at a time and allow scaling, I wanted a system where the main wsgi process does not block but passes the search request off to another process, which puts it in a queue and executes the queued requests one by one.

I ended up following a simple approach using ZeroMQ. A scheduler runs as a thread in the main wsgi process. When the search input view receives a search request, it writes the search parameters into the database, opens a connection to the scheduler thread, and passes it the unique ID of the database record storing the search parameters. One or more worker processes each run as a subprocess and stay permanently attached to the scheduler via a socket. When a worker is free, the scheduler sends it the ID of the next job in the queue; the worker executes the job, writes the results to a database table, and tells the scheduler it is ready for another job. Many workers can be attached to the scheduler, so multiple searches can run concurrently.
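For readers who want to see the shape of that scheduler/worker arrangement, here is a minimal sketch using pyzmq. It is not the code from my project: to keep it self-contained it runs the workers as threads over an inproc endpoint instead of subprocesses over TCP, it skips the database entirely, and the message names (READY, QUEUED, SHUTDOWN, STOP), the endpoint name, and the function names are all made up for illustration.

```python
import threading
from collections import deque

import zmq

ENDPOINT = "inproc://scheduler"  # hypothetical; a real setup would use tcp://127.0.0.1:<port>


def scheduler(ctx, n_workers, bound):
    """Queue incoming job IDs and hand them to idle workers one at a time."""
    sock = ctx.socket(zmq.ROUTER)
    sock.bind(ENDPOINT)
    bound.set()
    jobs, idle = deque(), deque()
    shutting_down = False
    stopped = 0
    while stopped < n_workers:
        identity, _, body = sock.recv_multipart()
        if body == b"READY":            # a worker is free
            idle.append(identity)
        elif body == b"SHUTDOWN":       # demo-only message so the sketch can exit
            shutting_down = True
            sock.send_multipart([identity, b"", b"OK"])
        else:                           # a client submitted a job ID
            jobs.append(body)
            sock.send_multipart([identity, b"", b"QUEUED"])
        while idle and jobs:            # dispatch as long as both are available
            sock.send_multipart([idle.popleft(), b"", jobs.popleft()])
        if shutting_down and not jobs:
            while idle:
                sock.send_multipart([idle.popleft(), b"", b"STOP"])
                stopped += 1
    sock.close()


def worker(ctx, name, results):
    """Announce readiness, receive a job ID, 'run' it, repeat."""
    sock = ctx.socket(zmq.REQ)
    sock.connect(ENDPOINT)
    while True:
        sock.send(b"READY")
        job_id = sock.recv()
        if job_id == b"STOP":
            break
        # Stand-in for loading the search parameters, searching, and saving results.
        results.append((name, int(job_id)))
    sock.close()


def run_demo(job_ids, n_workers=2):
    ctx = zmq.Context()
    bound = threading.Event()
    results = []
    threads = [threading.Thread(target=scheduler, args=(ctx, n_workers, bound))]
    threads[0].start()
    bound.wait()  # inproc endpoints should be bound before anyone connects
    for i in range(n_workers):
        t = threading.Thread(target=worker, args=(ctx, f"worker-{i}", results))
        t.start()
        threads.append(t)
    client = ctx.socket(zmq.REQ)  # plays the role of the Django view
    client.connect(ENDPOINT)
    for job_id in job_ids:
        client.send(str(job_id).encode())
        client.recv()
    client.send(b"SHUTDOWN")
    client.recv()
    client.close()
    for t in threads:
        t.join()
    ctx.term()
    return results


results = run_demo([1, 2, 3, 4, 5])
print(sorted(job for _, job in results))
```

The ROUTER socket is what lets one scheduler talk to both clients and workers: it sees the identity of whoever sent each message and can address replies to a specific idle worker.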

Here then is a list of (some of) the websites I used for reference while writing this program.



Asynchronous job execution for Django apps:

Celery: this is the job queue that everyone recommends, and it is probably the way to go for asynchronous job execution. But I'm kind of stubborn and I don't like all of the dependencies, so I'm making my own with ZeroMQ instead.
http://www.celeryproject.org/ 

How to spawn a child process that runs asynchronously from the calling script and closes when it finishes, even if the calling script has already terminated, rather than becoming a stuck orphan (subprocess.Popen does not block):
http://stackoverflow.com/questions/11203167/how-do-i-start-a-subprocess-in-python-and-not-wait-for-it-to-return
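A small sketch of the non-blocking behaviour, using only the standard library. The child here is a throwaway Python one-liner that sleeps for half a second; it is a stand-in I made up, not one of my actual workers.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running job: a child Python that sleeps briefly.
child_code = "import time; time.sleep(0.5)"

# Popen returns as soon as the child has been started; it does not wait for it.
proc = subprocess.Popen([sys.executable, "-c", child_code])

# poll() returns None while the child is still running.
status_right_after_spawn = proc.poll()
print("still running:", status_right_after_spawn is None)

# Collect the exit status eventually so the finished child gets reaped.
exit_code = proc.wait()
print("exit code:", exit_code)
```

In a web process you would normally skip the wait() and check poll() later, or let a cleanup hook deal with the children.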

A more detailed discussion of spawning non-blocking child processes from Django applications
http://stackoverflow.com/questions/8068945/django-long-running-asynchronous-tasks-with-threads-processing

RQ (Redis Queue): a simple job queue for Python; depends on the Redis database
http://python-rq.org/

brokest: another simple job queue, depends on pyzmq and cloud
https://www.jeffknupp.com/blog/2014/02/11/a-celerylike-python-task-queue-in-55-lines-of-code/
https://github.com/jeffknupp/brokest


Django configuration:
To execute a script when Django initializes, call it from wsgi.py
http://eldarion.com/blog/2013/02/14/entry-point-hook-django-projects/
http://stackoverflow.com/questions/6791911/execute-code-when-django-starts-once-only

To create a custom management command:
https://docs.djangoproject.com/en/1.6/howto/custom-management-commands/

To call a management script programmatically, use django.core.management
https://docs.djangoproject.com/en/1.6/ref/django-admin/#running-management-commands-from-your-code

Other Django:
To get a dict from a model instance, you can use .values() on a queryset object or django.forms.models.model_to_dict. The former gives a list of dicts or, more properly, an iterable that returns dicts when iterated over; if you want to modify those dicts, you have to convert it into a real, genuine list of dicts first.
http://timsaylor.com/index.php/2012/05/21/convert-django-model-instances-to-dictionaries/

To get the set of objects that share the same many-to-one foreign key, use _set on the related instance, as in one.many_set.all()
https://docs.djangoproject.com/en/1.6/topics/db/queries/#following-relationships-backward

To make complex database queries (QuerySet filters) put individual criteria in Q objects (django.db.models.Q), and then combine those with parentheses(()), ampersands(&), and pipes(|)
https://docs.djangoproject.com/en/1.6/topics/db/queries/#complex-lookups-with-q-objects
http://www.michelepasin.org/blog/2010/07/20/the-power-of-djangos-q-objects/

For JOIN-like queries (querying across relations), use __ to separate the related model from the field:
https://docs.djangoproject.com/en/1.6/topics/db/queries/#lookups-that-span-relationships

ZeroMQ:

Python version of the zeromq guide. An extensive document describing zeromq usage. Includes lots of good, free to use examples:
http://zguide.zeromq.org/py:all

PyZMQ documentation, not as extensive or as useful as the zeromq guide, but PyZMQ does provide some extra methods that may be helpful in some cases:
http://zeromq.github.io/pyzmq/index.html

ØMQ and pyzmq Basics. A nice short tutorial on zmq using Python examples. Quite a bit of overlap with the zeromq guide, but maybe the best place to start (I didn't run into it until after I'd already spent a couple of days going over the guide, though)
http://learning-0mq-with-pyzmq.readthedocs.org/en/latest/pyzmq/basics.html


Use 127.0.0.1, not localhost (tested this with zmq 4, and it still seems to be true):
http://stackoverflow.com/questions/6024003/why-doesnt-zeromq-work-on-localhost/8958414#8958414

Use port * or 0 to bind to an arbitrary open port, then use getsockopt (with the LAST_ENDPOINT option) to query which port was found:
http://stackoverflow.com/questions/16699890/connect-to-first-free-port-with-tcp-using-0mq
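In pyzmq that looks roughly like the sketch below. The REP socket type is an arbitrary choice for the example; the trailing-null strip is just defensive, since some older versions returned the endpoint with a null byte on the end.

```python
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)  # socket type chosen arbitrarily for this sketch

# "*" as the port asks the OS for any free port.
socket.bind("tcp://127.0.0.1:*")

# LAST_ENDPOINT reports the address that was actually bound, port included.
endpoint = socket.getsockopt(zmq.LAST_ENDPOINT).decode().rstrip("\x00")
port = int(endpoint.rsplit(":", 1)[1])
print(endpoint, port)

socket.close()
context.term()
```

pyzmq also offers socket.bind_to_random_port("tcp://127.0.0.1"), which returns the port number directly and may be more convenient if you don't need the full endpoint string.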

Example code for handling keyboard interrupt without locking:
http://zguide.zeromq.org/py:interrupt

To use keyboard interrupt to kill a script that is running zeromq or pyzmq use ctrl+break (yes, the key above Page Up that you've probably never touched before in your life) instead of ctrl+c (also, yes, that is my answer on that stack-overflow question):
https://github.com/zeromq/pyzmq/issues/100
http://stackoverflow.com/questions/17174001/stop-pyzmq-receiver-by-keyboardinterrupt

Generic Python:
Don't use a list as a queue; instead use collections.deque (pop = popleft, push = append):
http://docs.python.org/3.3/tutorial/datastructures.html#using-lists-as-queues
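A tiny example of a deque used as a first-in, first-out job queue. The job IDs are made-up numbers.

```python
from collections import deque

job_queue = deque()

# "push": new jobs go on the right-hand end.
job_queue.append(101)
job_queue.append(102)
job_queue.append(103)

# "pop": the oldest job comes off the left-hand end.
next_job = job_queue.popleft()
print(next_job)          # 101
print(list(job_queue))   # [102, 103]
```

The point of deque over list is that popleft() is cheap, while list.pop(0) has to shift every remaining element.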

Register functions to run on exit (for example to clean up data and processes before closing) using the atexit module:
http://docs.python.org/3.3/library/atexit.html
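A sketch of the kind of cleanup hook I mean, assuming the things to clean up are child processes started with subprocess.Popen. The names (workers, stop_workers) and the sleeping child are hypothetical.

```python
import atexit
import subprocess
import sys

workers = []  # hypothetical registry of child processes started by this script


def stop_workers():
    """Terminate any children that are still running, then forget them."""
    for proc in workers:
        if proc.poll() is None:   # still running
            proc.terminate()
        proc.wait()               # reap the child
    workers.clear()


# Runs automatically on normal interpreter exit.
atexit.register(stop_workers)

# Stand-in worker: a child Python that would otherwise sleep for a minute.
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
workers.append(worker)
```

Note that atexit handlers run on normal exit, not when the process is killed outright by a signal it doesn't handle, so this is a convenience rather than a guarantee.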

The decimal library is a nice way to represent numbers with fixed precision that you want to display in a pretty way and compare predictably with greater-than or less-than. Use the quantize method to round.
http://docs.python.org/3.3/library/decimal.html
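For example (the numbers are arbitrary):

```python
from decimal import Decimal, ROUND_HALF_UP

measured = Decimal("7.2651")

# quantize rounds to the same number of decimal places as its argument.
rounded = measured.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)  # 7.27

# Comparisons behave the way the written numbers suggest...
exact_sum = Decimal("0.1") + Decimal("0.2")
print(exact_sum == Decimal("0.3"))  # True

# ...unlike binary floats.
print(0.1 + 0.2 == 0.3)  # False
```

Build Decimals from strings rather than floats, otherwise the float's rounding error comes along for the ride.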

<metablog>I enjoyed this. I think I'll make this a thing, despite my disappointment to find that someone else invented the word "webliography" long before I did.</metablog>
