Architecture¶
pymerlin is to merlin as pyspark is to spark. This means that pymerlin uses py4j as a bridge between a python process and a java process. This allows pymerlin to use the Aerie simulation engine directly, without having to re-implement it in python.
As a result, running simulate starts a subprocess using java -jar /path/to/pymerlin.jar.
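A minimal sketch of what launching that subprocess might look like, using only the standard library. The helper names build_command and start_java_process are hypothetical, and pymerlin itself goes through py4j rather than calling subprocess like this; the sketch only illustrates the "one Python process, one Java process" shape.

```python
import subprocess


def build_command(jar_path, java_executable="java"):
    """Return the argv list for starting the Java side (hypothetical helper)."""
    return [java_executable, "-jar", jar_path]


def start_java_process(jar_path):
    """Start the Java process as a child of the current Python process.

    Not called in this sketch, since it needs a real jar and a Java runtime.
    """
    return subprocess.Popen(
        build_command(jar_path),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


command = build_command("/path/to/pymerlin.jar")
print(" ".join(command))
```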
Approachability over performance¶
The main tenet of pymerlin is approachability: its aim is to enable rapid prototyping of models and activities. Performance will be considered where possible, but anyone who wants to seriously engineer the performance of their simulation is expected to port their code to Java. Porting has two benefits: it removes the socket communication overhead, and it gives the engineer a single Java process to instrument and analyze, rather than a hybrid system that may be more difficult to characterize.
Round trips¶
Some objects that the mission model provides to the simulation driver are pass-through objects - i.e. the driver
merely returns these objects to the mission model when it is appropriate to do so. These objects need not be converted
to Java types - it is sufficient to pass a handle to the simulation driver, as long as the handle can be used to look
up the original object when needed. This is why certain global variables exist in _globals.py - they are used to cache
python objects and pass only their id to the Java process.
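The handle idea can be sketched as follows. This is an illustration under stated assumptions, not the contents of _globals.py: the names _object_cache, register, and lookup are hypothetical, and a simple counter stands in for whatever id pymerlin actually sends across the bridge.

```python
import itertools

# Hypothetical module-level cache, standing in for the globals in _globals.py.
_object_cache = {}
_next_handle = itertools.count(1)


def register(obj):
    """Store obj and return an integer handle that can cross the bridge."""
    handle = next(_next_handle)
    _object_cache[handle] = obj
    return handle


def lookup(handle):
    """Recover the original Python object from a handle sent back by the driver."""
    return _object_cache[handle]


class Task:
    """Placeholder for a pass-through object that the driver never inspects."""


task = Task()
handle = register(task)        # only this integer would be sent to Java
assert lookup(handle) is task  # the very same object comes back, unconverted
```

Because the driver only stores and returns the integer, the Python object never needs a Java representation.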
Warning
As implemented, this can cause a memory leak: cached objects are never released. Careful cleanup is required to make this approach viable for larger use cases.
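One possible shape for that cleanup is to tie the cache's lifetime to a context manager, so that every cached object is released when a simulation ends. The ScopedCache class below is hypothetical and is not part of pymerlin; it is a sketch of the idea only.

```python
import itertools


class ScopedCache:
    """Handle cache that forgets everything when its with-block exits."""

    def __init__(self):
        self._objects = {}
        self._counter = itertools.count(1)

    def register(self, obj):
        """Cache obj and return an integer handle for it."""
        handle = next(self._counter)
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        """Return the object previously registered under handle."""
        return self._objects[handle]

    def __len__(self):
        return len(self._objects)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        # Drop every reference so the cached objects can be garbage collected.
        self._objects.clear()
        return False  # do not swallow exceptions raised inside the block


with ScopedCache() as cache:
    handle = cache.register(object())
    size_inside = len(cache)

size_after = len(cache)
```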
For resources and activity arguments (i.e. things that are represented as “SerializedValue” on the Java side), it is important not to make use of the python cache. This will help with integrating a python mission model with the rest of the Aerie system by allowing the python code to handle inputs generated elsewhere in the system, and provide outputs that can be understood by the rest of the system.
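To illustrate the contrast with pass-through objects, the sketch below encodes activity arguments by value instead of by handle. JSON is used here purely as a stand-in for SerializedValue, and to_wire and from_wire are hypothetical names; the real conversion in pymerlin may look quite different.

```python
import json


def to_wire(value):
    """Encode a plain value as JSON text; raises TypeError for opaque objects."""
    return json.dumps(value)


def from_wire(text):
    """Decode JSON text produced here or by any other part of the system."""
    return json.loads(text)


arguments = {"duration_s": 30, "targets": ["A", "B"], "enabled": True}
round_tripped = from_wire(to_wire(arguments))


class Opaque:
    """An arbitrary Python object with no by-value representation."""


try:
    to_wire(Opaque())
    rejected = False
except TypeError:
    rejected = True  # such objects would need the handle cache instead
```

Because the encoded form carries the full value, it can be produced or consumed by code that has never seen the Python process.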
Async/await vs threads¶
pymerlin originally required tasks to be defined as async functions (a.k.a coroutines), but that was in tension with the “approachability over performance” principle. Version 0.0.8 replaced async functions with regular functions, and used threads instead. This significantly simplified the implementation, as well as the mental model. If thread switching turns out to be a performance bottleneck, async tasks should be reintroduced as an optional alternative.
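A rough sketch of how a regular function can be suspended and resumed by running it on its own thread. The ThreadedTask class and the convention of passing delay as an argument are hypothetical simplifications, not pymerlin's actual API; the point is only that the task body contains no async or await.

```python
import threading


class ThreadedTask:
    """Run a plain function on its own thread, pausing whenever it calls delay()."""

    def __init__(self, func):
        self._resume = threading.Event()
        self._paused = threading.Event()
        self.finished = False
        self.requested_delay = None
        self._thread = threading.Thread(target=self._run, args=(func,), daemon=True)
        self._thread.start()

    def _run(self, func):
        self._resume.wait()  # do nothing until the driver takes the first step
        self._resume.clear()
        try:
            func(self.delay)
        finally:
            self.finished = True
            self._paused.set()  # wake the driver one last time

    def delay(self, duration):
        """Called from the task thread: report the delay and block until resumed."""
        self.requested_delay = duration
        self._paused.set()
        self._resume.wait()
        self._resume.clear()

    def step(self):
        """Called by the driver: run the task until it next pauses or finishes."""
        if self.finished:
            return None
        self._paused.clear()
        self._resume.set()
        self._paused.wait()
        return None if self.finished else self.requested_delay


log = []


def my_activity(delay):
    log.append("start")
    delay(5)
    log.append("middle")
    delay(10)
    log.append("end")


task = ThreadedTask(my_activity)
delays = []
while (requested := task.step()) is not None:
    delays.append(requested)
```

Only one of the two threads runs at a time, so the task author can reason about the code as if it were sequential.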
Use pythonic idioms¶
To the extent possible, pymerlin should expose pythonic APIs. This means:
Use snake_case for functions, methods, and variables
Use CAPITAL_SNAKE_CASE for constants
Use lowercasenospaces for modules
Use TitleCase for classes
Leverage context managers for cleanup
Prefer duck typing to explicit inheritance
Public API should include docstrings and type annotations
Prefix private attributes with one underscore
Prefer simple attributes to properties
Override operators where appropriate
Enumerate entry points into the public API in the __all__ attribute of __init__.py.
Consult PEP8 for additional ideas.
But what about Aerie idioms?¶
For now, stick to pythonic style. Cross that bridge once pymerlin can build Aerie-compatible jars for upload.
Emphasize debuggability¶
On one hand, we need to do our best not to show users Java stack traces - or at least show them when they’re useful and not overwhelming. On the other hand, we must not obscure useful debugging information. This is a delicate balance to strike.
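One way to approach that balance, sketched under assumptions: show the user a short Python exception, but keep the Java stack trace attached so that it is still available on request. JavaSideError stands in for whatever exception the bridge actually raises, and SimulationError, run_step, and the sample trace text are all hypothetical.

```python
class JavaSideError(Exception):
    """Stand-in for the exception raised by the Java bridge (hypothetical)."""

    def __init__(self, summary, java_stack_trace):
        super().__init__(summary)
        self.java_stack_trace = java_stack_trace


class SimulationError(Exception):
    """User-facing error: short message, with the Java trace kept but not printed."""

    def __init__(self, message, java_stack_trace=None):
        super().__init__(message)
        self.java_stack_trace = java_stack_trace


def run_step(step):
    """Run step(), translating Java-side failures into a concise Python error."""
    try:
        return step()
    except JavaSideError as err:
        # "from err" preserves the original exception as __cause__ for debugging.
        raise SimulationError(
            f"simulation failed: {err}", err.java_stack_trace
        ) from err


def failing_step():
    # Made-up trace text, used only to exercise the sketch.
    raise JavaSideError(
        "IllegalStateException in activity 'example'",
        "java.lang.IllegalStateException\n\tat Example.run(Example.java)",
    )


try:
    run_step(failing_step)
except SimulationError as error:
    caught = error
```

The default message stays short, while caught.java_stack_trace and caught.__cause__ remain available to anyone who needs the full detail.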