Developer Guide

This document describes the internal architecture and the main concepts behind PyScaffold. It assumes the reader has some experience using PyScaffold (especially its command line interface, putup) and some familiarity with Python’s package ecosystem.

Please note that this document does not target PyScaffold’s users; instead, it provides internal documentation for those involved in PyScaffold’s development.

Architecture

As indicated in the figure below, PyScaffold can be divided into two main execution blocks: a pure Python API and a command line interface that wraps it as an executable program run from the shell.

Figure: PyScaffold's architecture

The CLI is responsible for defining all arguments putup accepts and parsing the user input accordingly. The result is a dict that contains options expressing the user preference and can be fed into PyScaffold’s main API, create_project.

This function is responsible for combining the provided options dict with pre-existing project configurations that might be available in the project directory (the setup.cfg file, if present) and globally defined default values (via PyScaffold’s own configuration file). It then creates an (initially empty) in-memory representation of the project structure and runs PyScaffold’s action pipeline, which in turn (among other tasks) writes customized versions of PyScaffold’s templates to disk as project files, according to the combined scaffold options.
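The flow described above can be sketched with the standard library alone. This is a minimal illustration, not PyScaffold's implementation: the flags, the `parse_cli` and `combine_options` helpers, the sample values, and the precedence order (command line first, then existing project configuration, then global defaults) are all assumptions made for the example.

```python
import argparse
from collections import ChainMap


def parse_cli(argv):
    """Parse command line arguments into a plain dict (hypothetical flags)."""
    parser = argparse.ArgumentParser(prog="putup-sketch")
    parser.add_argument("project_path")
    parser.add_argument("--license", default=None)
    parser.add_argument("--description", default=None)
    namespace = parser.parse_args(argv)
    # Keep only what the user actually provided, so that lower-priority
    # sources can fill in the gaps later
    return {k: v for k, v in vars(namespace).items() if v is not None}


def combine_options(cli_opts, project_config, global_defaults):
    """Merge option sources; earlier mappings win (assumed precedence)."""
    return dict(ChainMap(cli_opts, project_config, global_defaults))


cli_opts = parse_cli(["my-project", "--license", "MIT"])
project_config = {"description": "Description found in setup.cfg"}
global_defaults = {"license": "MPL-2.0", "author": "Jane Doe"}

options = combine_options(cli_opts, project_config, global_defaults)
```

Here the license comes from the command line, the description from the assumed project configuration, and the author from the assumed global defaults.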

The project representation and the action pipeline are two key concepts in PyScaffold’s architecture and are described in detail in the following sections.

Project Structure Representation

Each Python package project is internally represented by PyScaffold as a tree data structure that directly mirrors the directory entries in the file system. This tree is implemented as a simple (and possibly nested) dict in which keys indicate the path where files will be generated, while values indicate their content. For instance, the following dict:

{
    "folder": {
        "file.txt": "Hello World!",
        "another-folder": {
            "empty-file.txt": ""
        }
    }
}

represents a project directory in the file system that contains a single directory named folder. In turn, folder contains two entries. The first entry is a file named file.txt with content Hello World! while the second entry is a sub-directory named another-folder. Finally, another-folder contains an empty file named empty-file.txt.
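To make the mapping between the dict and the file system concrete, the following sketch flattens a nested structure into relative paths. The `iter_files` helper is hypothetical and not part of PyScaffold; it handles only plain string contents.

```python
from pathlib import PurePosixPath


def iter_files(structure, parent=PurePosixPath(".")):
    """Yield (relative path, content) pairs for every file in a nested dict."""
    for name, value in structure.items():
        path = parent / name
        if isinstance(value, dict):
            # A nested dict stands for a directory: recurse into it
            yield from iter_files(value, path)
        else:
            # Anything else is treated as the content of a file
            yield path, value


structure = {
    "folder": {
        "file.txt": "Hello World!",
        "another-folder": {"empty-file.txt": ""},
    }
}

files = {str(path): content for path, content in iter_files(structure)}
```

Because the helper yields only files, an empty directory (an empty dict) would not appear in the result.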

Note

Changed in version 4.0: Prior to version 4.0, the project structure included the top level directory of the project. Now it considers everything under the project folder.

Additionally, tuple values are also allowed in order to specify a file operation (or simply file op) that will be used to produce the file. In this case, the first element of the tuple is the file content, while the second element will be a function (or more generally a callable object) responsible for writing that content to the disk. For example, the dict:

from pyscaffold.operations import create

{
    "src": {
        "namespace": {
            "module.py": ('print("Hello World!")', create)
        }
    }
}

represents a src/namespace/module.py file, under the project directory, with content print("Hello World!"), that will be written to disk. When no operation is specified (i.e. when using a simple string instead of a tuple), PyScaffold will assume create by default.

Note

The create function simply writes a text file to disk using UTF-8 encoding and the default file permissions. This behaviour can be modified by wrapping create within other functions/callables, for example:

from pyscaffold.operations import create, no_overwrite

{"file": ("content", no_overwrite(create))}

will prevent the file from being written if it already exists. See pyscaffold.operations for more information on how to write your own file operation and other options.
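The wrapping idea can be illustrated with simplified stand-ins. The `create_sketch` and `no_overwrite_sketch` functions below are hypothetical and use an assumed `(path, contents)` signature, which may differ from the real functions in pyscaffold.operations. They show only how one file op can decorate another.

```python
import tempfile
from pathlib import Path


def create_sketch(path, contents):
    """Write contents to path as UTF-8 text and return the path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(contents, encoding="utf-8")
    return path


def no_overwrite_sketch(file_op):
    """Wrap a file op so that existing files are left untouched."""

    def wrapped(path, contents):
        if path.exists():
            return None  # skip: the file is already there
        return file_op(path, contents)

    return wrapped


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "src" / "module.py"
    careful_create = no_overwrite_sketch(create_sketch)
    first = careful_create(target, 'print("Hello World!")')
    second = careful_create(target, "this text is never written")
    final_text = target.read_text(encoding="utf-8")
```

The second call is skipped because the file already exists, so the content written by the first call survives.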

Finally, while it is simple to represent file contents as a string directly, most of the time we want to customize them according to the parameters of the project being created (e.g. package or author’s name). So PyScaffold also accepts string.Template objects and functions (with a single dict argument and a str return value) to be used as contents. These templates and functions will be called with PyScaffold's options when it is time to write the file to disk.

Note

string.Template objects will have safe_substitute called (not simply substitute).
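A minimal sketch of how the three kinds of content could be resolved, using only the standard library. The `resolve_content` helper and the option names are hypothetical; the sketch mirrors only the behaviour described above.

```python
from string import Template


def resolve_content(content, opts):
    """Turn a str, string.Template or callable into the final file text."""
    if isinstance(content, Template):
        # safe_substitute leaves unknown placeholders untouched
        # instead of raising KeyError
        return content.safe_substitute(opts)
    if callable(content):
        return content(opts)
    return content


opts = {"package": "my_package", "author": "Jane Doe"}

from_template = resolve_content(Template("# ${package} by ${author}, ${year}"), opts)
from_function = resolve_content(lambda o: f"import {o['package']}\n", opts)
from_string = resolve_content("plain text", opts)
```

Since `year` is missing from the options, the `${year}` placeholder stays in the output unchanged, which is the practical difference between safe_substitute and substitute.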

This tree representation is often referred to in this document as the project structure or simply the structure.

Action Pipeline

PyScaffold organizes the generation of a project into a series of steps with well-defined purposes. As shown in the figure below, each step is called an action and is implemented as a simple function that receives two arguments: a project structure and a dict with options (some of them parsed from command line arguments, others from default values).

Figure: PyScaffold's action pipeline

An action MUST return a tuple that is also composed of a project structure and a dict with options. The return values, thus, are usually modified versions of the input arguments. Additionally, an action can also have side effects, like creating directories or adding files to version control. The following pseudo-code illustrates a basic action:

def action(project_structure, options):
    new_struct, new_opts = modify(project_structure, options)
    some_side_effect()
    return new_struct, new_opts

The output of each action is used as the input of the subsequent action, forming a pipeline. Initially the structure argument is just an empty dict. Each action is uniquely identified by a string in the format <module name>:<function name>, similar to the convention used for a setuptools entry point. For example, if an action is defined in the action function of the extras.py file that is part of the pyscaffoldext.contrib project, the action identifier is pyscaffoldext.contrib.extras:action.
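The chaining and the naming convention can be sketched as follows. The two actions and the `run_pipeline` and `action_id` helpers are hypothetical and far simpler than PyScaffold's own code; the sketch shows only how each output feeds the next input.

```python
from functools import reduce


def define_greeting(struct, opts):
    """Add a file to the structure without mutating the input dict."""
    new_struct = {**struct, "hello.txt": f"Hello {opts['name']}!"}
    return new_struct, opts


def mark_done(struct, opts):
    """Leave the structure as is and record a flag in the options."""
    return struct, {**opts, "done": True}


def run_pipeline(actions, opts):
    """Feed the (structure, options) pair through each action in order."""
    # The pipeline starts with an empty structure, as described above
    return reduce(lambda state, action: action(*state), actions, ({}, opts))


def action_id(action):
    """Build an identifier in the <module name>:<function name> format."""
    return f"{action.__module__}:{action.__name__}"


struct, opts = run_pipeline([define_greeting, mark_done], {"name": "World"})
```

Returning new dicts instead of mutating the arguments keeps each action easy to test in isolation, although this is a design choice of the sketch rather than a documented requirement.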

By default, the sequence of actions taken by PyScaffold is:

  1. pyscaffold.actions:get_default_options

  2. pyscaffold.actions:verify_options_consistency

  3. pyscaffold.structure:define_structure

  4. pyscaffold.actions:verify_project_dir

  5. pyscaffold.update:version_migration

  6. pyscaffold.structure:create_structure

  7. pyscaffold.actions:init_git

  8. pyscaffold.actions:report_done

(as given by pyscaffold.actions.DEFAULT)

The project structure is usually empty until define_structure. This action just loads the in-memory dict representation, which is only written to disk by the create_structure action.

Note that this sequence varies according to the command line options. To retrieve an updated list, please use putup --list-actions or putup --dry-run.

Extensions

Extensions are a mechanism provided by PyScaffold to modify its action pipeline at runtime and the preferred way of adding new functionality. There are built-in extensions (e.g. pyscaffold.extensions.cirrus) and external extensions (e.g. pyscaffoldext-dsproject), but both types of extensions work in exactly the same way. This division is purely based on the fact that some of PyScaffold’s features are implemented as extensions that ship by default with the pyscaffold package, while others require the user to install additional Python packages.

Extensions are required to add at least one CLI argument that allows users to opt in to their behaviour. When putup runs, PyScaffold will dynamically discover installed extensions via setuptools entry points and add their defined arguments to the main CLI parser. Once activated, an extension can use the helper functions defined in pyscaffold.actions to manipulate PyScaffold’s action pipeline and therefore the project structure.
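The idea of an extension inserting its own action into the pipeline can be sketched with a plain list. The `register_sketch` helper below is a simplified, hypothetical stand-in for the helpers in pyscaffold.actions, whose real signatures may differ. Actions are represented by their string identifiers to keep the example short, and pyscaffoldext.contrib.extras:action is the illustrative identifier used earlier in this guide.

```python
def register_sketch(actions, action, after):
    """Return a new pipeline with `action` placed right after `after`."""
    position = actions.index(after) + 1
    return actions[:position] + [action] + actions[position:]


pipeline = [
    "pyscaffold.actions:get_default_options",
    "pyscaffold.structure:define_structure",
    "pyscaffold.structure:create_structure",
]

extended = register_sketch(
    pipeline,
    "pyscaffoldext.contrib.extras:action",
    after="pyscaffold.structure:define_structure",
)
```

Placing the new action between define_structure and create_structure lets it adjust the in-memory structure before anything is written to disk.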

For more details on extensions, please consult our Extending PyScaffold guide.

Code base Organization

PyScaffold is organized in a series of internal Python modules, the main ones being:

  • api: top level functions for accessing PyScaffold functionality, by combining together the other modules

  • cli: wrapper around the API to create a command line executable program

  • actions: default action pipeline and helper functions for manipulating it

  • structure: functions for defining the in-memory project structure representation and for materializing it in the file system

  • update: steps required for updating projects generated with old versions of PyScaffold

  • extensions: main extension mechanism and subpackages corresponding to the built-in extensions

Additionally, a series of internal auxiliary libraries is defined in:

  • dependencies: processing and manipulation of package dependencies and requirements

  • exceptions: custom PyScaffold exceptions and exception handlers

  • file_system: wrappers around file system functions that make them easier to use from PyScaffold

  • identification: creation and processing of project/package/function names and other general identifiers

  • info: general information about the system, user and package being generated

  • log: custom logging infrastructure for PyScaffold, specialized in its verbose execution

  • operations: file operations that can be embedded in the in-memory project structure representation

  • repo: wrapper around the git command

  • shell: helper functions for working with external programs

  • termui: basic support for ANSI code formatting

  • toml: thin adapter layer around third-party TOML parsing libraries, focused on API stability

For more details about each module and its functions and classes, please consult our module reference.

When contributing to PyScaffold, please try to maintain this overall project organization by respecting each module’s own purpose. Moreover, when introducing new files or renaming existing ones, please try to use meaningful names and avoid terms that are too generic, e.g. utils.py (when in doubt, Peter Hilton has a great article about naming smells and a nice presentation about how to name things).