2 Framework for Implementing Additional Tools
2.1 Managing Python Processes from Racket
(require pydrnlp/support) | package: pydrnlp |
procedure
(python-worker? v) → boolean?
v : any/c
syntax
(define-python-worker id mod-bytes-literal arg-bytes-literal ...)
id? : (-> any/c boolean?)
id-revision : jsexpr?
launch-id : (->* [] [#:quiet? any/c] id?)
id-send/raw : (->* [id? jsexpr?] [#:who symbol?] (stream/c jsexpr?))
procedure
(python-worker-running? worker) → boolean?
worker : python-worker?
procedure
(python-worker-kill worker) → any
worker : python-worker?
procedure
(python-worker-dead-evt worker) → (evt/c (or/c #f exn:fail?))
worker : python-worker?
Every Python worker is an instance of some concrete Python worker type, which specifies a particular Python module with which to communicate. A Python worker type is defined by the define-python-worker form, which binds id?, id-revision, launch-id, and id-send/raw (synthesized with the lexical context of id) to values implementing the type-specific portion of the interface. The mod-bytes-literal must name one of pydrnlp’s Python modules using the same syntax as python -m: for example, #"pydrnlp.trends". Any additional arg-bytes-literals are passed as command-line arguments to the Python module. The Python module named by mod-bytes-literal must define a Python revision function (see pydrnlp/support/python-lang and python-revision-value/c): if it does not, the define-python-worker form will raise a syntax error.
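The mod-bytes-literal uses the same module syntax as python -m, and any arg-bytes-literals become command-line arguments. The following rough Python illustration starts such a module with the standard subprocess module. The helper launch_module and its discard_stderr parameter are hypothetical and not part of pydrnlp. The sketch also ignores everything pydrnlp adds on top, such as revisions, the jsonio protocol, and custodians.

```python
import subprocess
import sys


def launch_module(mod: str, *args: str, discard_stderr: bool = False) -> subprocess.Popen:
    """Start `python -m MOD ARGS...` as a child process (hypothetical helper).

    stdin and stdout are pipes so the parent can exchange text with the
    module; stderr is either discarded or inherited from the parent.
    """
    return subprocess.Popen(
        [sys.executable, "-m", mod, *args],  # same module syntax as `python -m`
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL if discard_stderr else None,
        text=True,
    )
```

For example, launch_module("json.tool", "--sort-keys") starts the standard library's JSON pretty-printer and passes it one extra argument, much as an arg-bytes-literal would be passed.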
The launch-id function creates Python worker values of the concrete Python worker type, which are recognized by the predicate id?. If the #:quiet? argument is given and is not #false, the Python process’s standard error output is discarded; otherwise, it is written to the current-error-port. Creating a Python worker allocates system-level resources, which are placed under the management of the current custodian.
Calling id-send/raw sends a jsexpr as a request to the process encapsulated by the Python worker and returns its response as a lazy stream of jsexpr values. To implement the Python side of this interaction, use pydrnlp.jsonio. Messages are sent to the Python process asynchronously, in sequential order, and id-send/raw returns immediately, though forcing the returned stream (e.g. with stream-first or stream-rest) will block until the request has been sent and the response has begun to be received. Python worker values are thread-safe: id-send/raw can be called with the same worker concurrently from multiple Racket threads, and one client thread being terminated, blocking, etc. will not interfere with use of the worker from other client threads. However, workers intentionally are not fully “kill-safe” in the sense of [Flatt04]: shutting down the managing custodian must release the system-level resources, and clients can cause the worker to become dead (intentionally or not) in various other ways.
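You can imitate these ordering guarantees in Python with a single-threaded executor. The call returns at once, requests are handled one at a time in submission order, and only consuming the result blocks. This is a hypothetical analog for illustration, not pydrnlp's implementation. SequentialSender is an invented name, handle stands in for the round trip to the Python process, and each response is produced as a complete list rather than streamed incrementally.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Iterator, List


class SequentialSender:
    """Hypothetical analog of id-send/raw's ordering guarantees (not pydrnlp code).

    All requests go to one background thread, so they are handled one at a
    time, in the order they were submitted. send() itself never blocks.
    """

    def __init__(self, handle: Callable[[Any], List[Any]]) -> None:
        self._handle = handle  # stands in for the round trip to the Python process
        self._executor = ThreadPoolExecutor(max_workers=1)

    def send(self, request: Any) -> Iterator[Any]:
        # submit() is thread-safe and returns immediately with a Future.
        future: Future = self._executor.submit(self._handle, request)

        def responses() -> Iterator[Any]:
            # Consuming the iterator blocks until the response is ready; if the
            # handler raised, the exception surfaces here, not in send().
            yield from future.result()

        return responses()

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

As with the stream returned by id-send/raw, a failure is reported only when the caller forces the part of the response that depends on it.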
Process-level parallelism can be obtained by using launch-id to create multiple workers of the same Python worker type. Note, however, that the digitalricoeur.org server currently has only a few cores.
The functions implemented by Python worker types are often expensive. The id-revision value is defined to support caching and avoid redundant calls to id-send/raw. When id-revision is #false, any cached value should be ignored. Otherwise, if id-revision is equal? to a cached value of id-revision from a previous run, the Python module encapsulated by the Python worker type promises that calling id-send/raw with “the same” request would produce “the same” response, so cached responses can be used. Of course, the Python module must take care to live up to this promise when implementing its Python revision function. Note that the applicable notion of “the same” is specific to the Python module and Python worker type: “the same” may mean something either stronger or weaker than equal?. In addition to the Python revision function, the value of id-revision also reflects the versions of the spaCy library and the language models being used.
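Here is a minimal Python sketch of this caching discipline. Responses are reused only while the revision stays equal to the one they were computed under, and a false revision disables the cache. RevisionCache and compute are hypothetical names. Python's == stands in for equal? and thus for the module-specific notion of "the same". Requests are assumed to be hashable.

```python
from typing import Any, Callable, Dict, Hashable


class RevisionCache:
    """Reuse responses only while the revision is unchanged (hypothetical sketch)."""

    def __init__(self) -> None:
        self._revision: Any = None  # revision the stored entries were computed under
        self._entries: Dict[Hashable, Any] = {}

    def get(self, revision: Any, request: Hashable,
            compute: Callable[[Hashable], Any]) -> Any:
        if revision is False or revision is None:
            # #false: cached values must be ignored, so always recompute.
            return compute(request)
        if revision != self._revision:
            # New revision: earlier responses are no longer trustworthy.
            self._entries.clear()
            self._revision = revision
        if request not in self._entries:
            self._entries[request] = compute(request)
        return self._entries[request]
```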
Note that access to id-revision does not require running Python (or even having it installed), even though id-revision incorporates values defined in Python code. Instead, by enforcing constraints on the syntax of Python revision functions, pydrnlp/support/python-lang is able to analyse their definitions statically and compile them to Racket code.
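Python's standard ast module offers a way to illustrate the underlying idea: recovering a value from source code without executing it. The constraint enforced here is a function named revision that takes no parameters and returns a single literal. This is a hypothetical stand-in for the real syntactic restrictions of pydrnlp/support/python-lang, which this section does not spell out. Unlike this sketch, python-lang does its analysis without running Python at all.

```python
import ast
from typing import Any


def static_revision(source: str, name: str = "revision") -> Any:
    """Return the literal that function `name` in `source` returns, without running it.

    Hypothetical restriction: the function takes no parameters and its body is
    an optional docstring followed by a single `return <literal>`.
    """
    module = ast.parse(source)
    for node in module.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            a = node.args
            if a.posonlyargs or a.args or a.vararg or a.kwonlyargs or a.kwarg:
                raise SyntaxError(f"{name} must take no parameters")
            body = node.body
            # Skip a leading docstring, if any.
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body = body[1:]
            if len(body) != 1 or not isinstance(body[0], ast.Return) or body[0].value is None:
                raise SyntaxError(f"{name} must be a single return statement")
            # literal_eval accepts only literal syntax and raises ValueError otherwise.
            return ast.literal_eval(body[0].value)
    raise SyntaxError(f"no top-level function named {name!r}")
```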
The name of id-send/raw reflects the fact that it enforces only the raw, jsexpr-based communication protocol common to all Python workers. In practice, the Racket and Python parties to a particular interaction will both have invariants about the messages they expect to send and receive. When designing a new Python worker type, id-send/raw should be used to implement higher-level communication functions, which can enforce specific contracts and convert values to and from the jsexpr representation used for communication. To facilitate such wrapper functions, when id-send/raw reports an error that cannot be detected by a first-order test, it uses its #:who argument, if given, rather than its own name to identify the source of the error. On the other hand, id-send/raw does enforce its documented contract and will blame its callers for violating their obligations.
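Because a Python worker can become dead at any moment, even a guarded call like the following may still raise an exception:
    (when (python-worker-running? a-worker)
      (id-send/raw a-worker "Hi, world!"))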
If the Python worker given to id-send/raw becomes dead before id-send/raw can enqueue the jsexpr value to be sent asynchronously, id-send/raw raises an exception, which will refer to its #:who argument, if given. If the Python worker becomes dead before it has finished producing its response, an exception is raised when the corresponding part of the stream returned by id-send/raw is forced.
A Python worker is dead when it has freed all of its system-level resources, and thus is no longer in communication with a Python process. Programmers must ensure that the worker is dead when they no longer need it, and generally they will need to do so explicitly. The function python-worker-kill causes its argument to become dead immediately; calling it on a Python worker that is already dead has no effect. Using python-worker-kill is equivalent to shutting down the worker’s managing custodian, except that python-worker-kill only affects resources encapsulated by the given Python worker value.
It is almost always best to make a Python worker dead with python-worker-kill or custodian-shutdown-all as soon as you can determine that the worker value is no longer needed. However, calls to launch-id incur significant overhead, so it is much better to reuse Python workers than to create and free them repeatedly. Even better, by consulting id-revision, you may be able to avoid calling launch-id in the first place.
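In Python, the closest analog to this discipline ties a child process's lifetime to a with block. The process is started once, reused for as long as the block runs, and killed on the way out even if an exception escapes. launched and kill are hypothetical helpers built on subprocess. Like python-worker-kill, kill does nothing further to a process that has already exited.

```python
import contextlib
import subprocess
import sys
from typing import Iterator, Sequence


def kill(proc: subprocess.Popen) -> None:
    """Terminate `proc` and reap it; a no-op for a process that already exited."""
    if proc.poll() is None:
        proc.kill()
    proc.wait()  # reap, so no OS-level resources linger


@contextlib.contextmanager
def launched(cmd: Sequence[str]) -> Iterator[subprocess.Popen]:
    """Start `cmd` once, hand it to the block for reuse, and always kill it afterwards."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        kill(proc)


if __name__ == "__main__":
    # Reuse one child for the whole block instead of relaunching it repeatedly.
    with launched([sys.executable, "-c", "import time; time.sleep(30)"]) as child:
        print("running:", child.poll() is None)
    print("exit status after kill:", child.returncode)
```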
Even if it is never subjected to python-worker-kill or custodian-shutdown-all, a Python worker may still become dead for other reasons. In particular, a Python worker will become dead if the Python process it manages exits of its own accord, either successfully or, for example, due to an unhandled exception. Nonetheless, this possibility does not relieve programmers of the burden of ensuring that all Python workers do, in fact, become dead. Even if a Python worker type implements a communication protocol in which the Python module is expected to exit, the Racket side of the communication should still check that the worker actually is dead: if it isn’t, the Racket side should clean up and signal that the invariants of the communication protocol have been violated.
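The same check can be written in Python against a plain child process. The sketch waits a bounded time for the expected exit. If the exit does not happen, it cleans up and reports the violated invariant. await_expected_exit and ProtocolViolation are hypothetical names, and the timeout policy is an assumption.

```python
import subprocess
import sys


class ProtocolViolation(RuntimeError):
    """The child did not behave as the (hypothetical) protocol requires."""


def await_expected_exit(proc: subprocess.Popen, timeout: float) -> int:
    """Wait for a child that the protocol says should exit on its own.

    Returns its exit status. If it is still running after `timeout` seconds,
    kill and reap it (so its resources are freed either way), then raise.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        raise ProtocolViolation(
            f"process {proc.pid} was expected to exit but was still running"
        ) from None


if __name__ == "__main__":
    done = subprocess.Popen([sys.executable, "-c", "pass"])
    print("exit status:", await_expected_exit(done, timeout=10))
    stuck = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        await_expected_exit(stuck, timeout=0.5)
    except ProtocolViolation as exc:
        print("cleaned up:", exc)
```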
The function python-worker-dead-evt takes any Python worker value and produces a synchronizable event that becomes ready for synchronization when the worker is dead. The event’s synchronization result is either #false or an exn:fail that caused the worker to become dead. Currently, a #false result does not necessarily mean that the worker became dead “normally”: this may be improved in the future.
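A rough Python analog of such an event is a watcher thread. It waits for a child process to exit, then records either None (standing in for #false) or an exception object. DeadEvent is a hypothetical name. Treating a nonzero exit status as the cause of death is an assumption made for illustration; the real event's #false result does not guarantee a normal exit.

```python
import subprocess
import sys
import threading
from typing import Optional


class DeadEvent:
    """Becomes ready once `proc` exits (hypothetical analog of python-worker-dead-evt).

    The result is None, standing in for #false, or an exception object
    describing the exit; here a nonzero exit status is treated as the cause.
    """

    def __init__(self, proc: subprocess.Popen) -> None:
        self._ready = threading.Event()
        self._result: Optional[Exception] = None
        threading.Thread(target=self._watch, args=(proc,), daemon=True).start()

    def _watch(self, proc: subprocess.Popen) -> None:
        status = proc.wait()
        if status != 0:
            self._result = subprocess.CalledProcessError(status, proc.args)
        self._ready.set()  # publish the result only after it is recorded

    def wait(self, timeout: Optional[float] = None) -> Optional[Exception]:
        """Block until the process is dead, then return (not raise) the result."""
        if not self._ready.wait(timeout):
            raise TimeoutError("process is still running")
        return self._result


if __name__ == "__main__":
    dead = DeadEvent(subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"]))
    print(repr(dead.wait(timeout=10)))
```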
Conversely, the predicate python-worker-running? recognizes any Python worker that is not currently dead. Taking a point-in-time snapshot has some limitations, but python-worker-running? has the benefit of being an inexpensive first-order test.
value
python-revision-value/c
 = (or/c #f exact-integer? (listof python-revision-value/c))
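Read literally, this recursive contract accepts #f, any exact integer, or a list of such values, which may be nested or empty. Below is a sketch of a Python predicate with the same shape. It assumes that #f corresponds to Python False and that Racket lists correspond to Python lists. The function name is hypothetical.

```python
from typing import Any


def is_python_revision_value(v: Any) -> bool:
    """Python reading of (or/c #f exact-integer? (listof python-revision-value/c))."""
    if v is False:
        return True  # the #f case
    if isinstance(v, bool):
        return False  # bool subclasses int in Python, but True is neither #f nor an integer
    if isinstance(v, int):
        return True  # Python ints are exact integers of any size
    if isinstance(v, list):
        return all(is_python_revision_value(x) for x in v)
    return False
```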
2.2 Python–Racket Bridge Language
#lang pydrnlp/support/python-lang | package: pydrnlp |
Documentation forthcoming.
    #!/usr/bin/env python3
    #lang pydrnlp/support/python-lang
    # -*- coding: utf-8 -*-
    """Module docstring

    Lots of great documentation here ...
    """
In Python 3, the default source encoding is UTF-8, so Python itself does not require the coding declaration shown above.
2.3 Scribbling Python Documentation
Documentation forthcoming.
2.4 Python Utility Modules
2.4.1 pydrnlp.language
import pydrnlp.language | package: pydrnlp |
procedure
= (let* () 0)
Python method
def get(lang_str)
2.4.2 pydrnlp.jsonio
import pydrnlp.jsonio | package: pydrnlp |
Python method
def start_loop(on_input, *, [description])