API: Endpoints

This document discusses the interface to the database for clients. We discuss the procedure names and data transfer objects (DTOs). We also discuss the endpoints, i.e., the locations where the procedures are called.

repository: https://git.sr.ht/~qeef/dd-hot-tm
documentation: https://dd-hot-tm.mapathon.cz/

The main problem HOT TM solves is managing the mapping of a large area by many mappers. From the mapper’s point of view, the solution to this problem is straightforward: split the large area into smaller parts and let the mappers lock and unlock them to communicate which part of the area is being worked on.

HOT TM uses the Project and Task terminology to denote a large area with additional information, such as the description of the purpose of mapping, and its smaller parts. Each task has a state, e.g., locked or unlocked. (In the HOT TM code base, the “state” is often called “status”.) However, a task can also be understood as a thing to do instead of a part of the area. To avoid this misunderstanding, we introduce the Action to denote a transition from one task state to another.

Using the terminology of Project, Task and Action introduced above, we can describe the common mapping workflow. First, Celestine creates a new project with all tasks unlocked. Then, Monica, Michal, Marcel and Miriam each lock a random task and map that part of the project. When Miriam and Marcel finish their tasks, they unlock them and lock other random tasks. The following figure shows the task states (rounded boxes) with the corresponding actions (arrows):

digraph {
rankdir="LR"
node [shape="box" style="rounded"]

unlocked
locked

unlocked -> locked [label="map"]
locked -> unlocked [label="finish"]
}

In addition, Radek and Ramona come into the mapping workflow. They find tasks that have been recently mapped and check that these tasks have been mapped properly. To improve the representation of the workflow, we give meaning to the locked and unlocked states and we name the actions, as shown in the following figure:

digraph {
rankdir="LR"
node [shape="box" style="rounded"]

lm [label="locked\nfor mapping"]
lc [label="locked\nfor checking"]

tm [label="unlocked\nto map"]
tc [label="unlocked\nto check"]
d [label="unlocked\ndone"]

tm -> lm [label="map"]
lm -> tc [label="finish"]
tc -> lc [label="check"]
lc -> d [label="good"]
lc -> tm [label="bad"]

{rank="source" ; tm}
{rank="sink" ; d}
}

Figure: Mapping workflow
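The workflow in the figure can also be read as a lookup table from a (state, action) pair to the next state. The following is only a minimal sketch of that reading; the names `TRANSITIONS` and `apply_action` are hypothetical and do not come from the HOT TM code base.

```python
# Task states and actions exactly as named in the workflow figure.
# The table and function names are illustrative assumptions.
TRANSITIONS = {
    ("unlocked to map", "map"): "locked for mapping",
    ("locked for mapping", "finish"): "unlocked to check",
    ("unlocked to check", "check"): "locked for checking",
    ("locked for checking", "good"): "unlocked done",
    ("locked for checking", "bad"): "unlocked to map",
}


def apply_action(state: str, action: str) -> str:
    """Return the state that follows `state` after `action`.

    Raises ValueError when the figure has no such arrow.
    """
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed in state {state!r}"
        ) from None


# Walk one task through mapping, a failed check, and a second mapping attempt.
state = "unlocked to map"
for action in ("map", "finish", "check", "bad", "map"):
    state = apply_action(state, action)
    print(f"{action:>6} -> {state}")
```

Keeping the arrows in one table means that an action requested in the wrong state is rejected in a single place.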

The names of the actions are the first half of the database interface. The other half, the DTOs, identify the project, the task, the transition, and the mapper. The DTOs should not be confused with the states.

Note that the terminology and the naming of task states only resemble the HOT TM naming; there are more states with slightly different names in the HOT TM code.

The purpose of the endpoints

The endpoints are the places where the communication between clients and the server happens. How the communication happens is given by the interface, i.e., the procedure names and DTOs related to the endpoints.

We understand clients as whatever mappers use to send requests to the server: a web page, the JOSM editor, or a mapper’s script.

We aim at communication between clients and the server via HTTP, using JSON to encode the values of DTOs. This is the de facto standard for client-server communication in the web environment, and leveraging other (or more) technologies would unnecessarily increase technical debt.

Well-defined constraints on where and how the communication happens simplify the implementation of the clients.

The interface to endpoints (how)

The interface consists of procedure names and DTOs. The procedure names are given by HTTP: GET retrieves information, POST sends new information, PUT updates existing information, and DELETE destroys existing information available at the endpoint.

The DTOs make the request concrete by providing additional information, such as an identifier or a reason for the change.

The endpoints (where)

The endpoints specify the places, resources, or objects that can be retrieved (GET), created (POST), changed (PUT), and/or deleted (DELETE).

The endpoints do not necessarily need to reflect the database; they serve a different purpose, namely looking friendly to the clients.

So, in the end, the API design is concerned with the endpoints and their interface, with the aim of keeping the communication about the mapping workflow between the server and clients simple.

Non-goals

This document does not cover all endpoints HOT TM uses. It covers only the endpoints related to tasks and the transitions between the task states. It is expected that similar documents cover the endpoints related to other parts of HOT TM, such as groups or campaigns, and that another document puts all these parts together and introduces the overall endpoint schema.

This document does not cover task issues or annotations. It focuses solely on the main function of tasks within HOT TM and on keeping the task history.

This document does not introduce production-ready endpoints or a corresponding interface.

Authentication and authorization are out of the scope of this document.

Balance between endpoints and DTOs

Because we limit ourselves to HTTP, one half of the interface is given (GET, POST, PUT, and DELETE). What is left are the endpoints (where the communication happens) and the DTOs (how the communication happens). We try to find a balance between the two.

Endpoints extreme

When there is an endpoint for everything, we call it endpoints extreme. The HOT TM API is close to it. Paraphrasing the HOT TM API, there are task-related endpoints accepting POST:

/project/{pid}/task/{tid}/map: Expect the task tid of the project pid to be in unlocked to map state, changing the task into the locked for mapping state.
/project/{pid}/task/{tid}/finish: Expect the task tid of the project pid to be in locked for mapping state, changing the task into the unlocked to check state.
/project/{pid}/task/{tid}/check: Expect the task tid of the project pid to be in unlocked to check state, changing the task into the locked for checking state.
/project/{pid}/task/{tid}/good: Expect the task tid of the project pid to be in locked for checking state, changing the task into the unlocked done state.
/project/{pid}/task/{tid}/bad: Expect the task tid of the project pid to be in locked for checking state, changing the task into the unlocked to map state.

and GET:

/project/{pid}/tasks/states: Retrieve the state of all tasks of the project pid.
/project/{pid}/task/{tid}/state: Retrieve the state of the task tid of the project pid.

Here, pid is the project identifier and tid is the task identifier. In such a case, little to no information is transferred in DTOs.

DTOs extreme

When there is a single endpoint for everything, we call it DTOs extreme, because all the information is encoded in DTOs:

/whatever

Accepts POST and GET.

For POST, the DTOs must always contain pid – the project identifier, tid – the task identifier, and action, where action can be map, finish, check, good, or bad.

For GET, the DTO must always contain pid, the project identifier. However, a GET request conventionally has no body, unlike POST, so there is no place to put the values of the DTO. To keep the “DTOs extreme” approach, we need to encode the DTO’s values in the URL of the endpoint, i.e., /whatever?pid={pid}.

(Please note that /whatever?pid={pid} is indeed different from /whatever/{pid}: by the URL syntax, the former is understood as the /whatever path with the pid={pid} query, whereas the latter is only the /whatever/{pid} path.)

When there is no tid (the task identifier) in the DTO, i.e., in the query part of the URL, it is expected that the client requests the state of all the tasks of the project pid. If tid is specified within the DTO, the state of the task tid of the project pid is sent back.
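The difference between the path and the query part of a URL, and the way a GET DTO would have to be recovered from the query, can be illustrated with Python's standard `urllib.parse` module. This is only a sketch under the assumptions of this section; the function name `dto_from_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def dto_from_url(url: str) -> dict:
    """Rebuild the GET DTO of the "DTOs extreme" endpoint from the URL query.

    pid is required; tid is optional (no tid means "all tasks of the project").
    """
    query = parse_qs(urlsplit(url).query)  # every value is a list of strings
    if "pid" not in query:
        raise ValueError("pid is required in the query part of the URL")
    dto = {"pid": int(query["pid"][0])}
    if "tid" in query:
        dto["tid"] = int(query["tid"][0])
    return dto


for url in ("/whatever?pid=1", "/whatever?pid=1&tid=22", "/whatever/1"):
    parts = urlsplit(url)
    print(url, "-> path:", repr(parts.path), "query:", repr(parts.query))

print(dto_from_url("/whatever?pid=1&tid=22"))
```

Note that `/whatever/1` carries no query at all, so `dto_from_url` rejects it.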

Finding the balance

We can see that both extreme approaches suffer from scalability issues:

- An example for Endpoints extreme is extending the workflow with the reset action that changes the state of the task tid of the project pid from unlocked done to unlocked to map: a new endpoint needs to be introduced.
- An example for DTOs extreme is any extension of the GET request, which is already cumbersome enough in the DTOs extreme example. (Encoding DTO values in the query part of the URL makes sense for a small number of parameters, as in pagination, but it does not scale.)

When finding the balance between endpoints and DTOs, we aim for simplicity of the implementation in the clients. Having the right balance between endpoints and DTOs improves the scalability and overall maintainability of the code base.

Our endpoints draft is based on the terminology used at the beginning of this design document – Project, Task, and Action.

The endpoint path, where path refers to the path component of the URL syntax, consists of endpoint parts. By convention, an endpoint part uses the plural, like .../projects, for endpoints representing a list of objects, and the singular followed by an identifier, like .../project/{pid}, for endpoints representing a particular object. Here, ... stands for zero or more endpoint parts, as in .../project/{pid}/tasks or .../project/{pid}/task/{tid}.

The values between { and } are the identifiers of the objects.

Considering the mapping workflow and targeting load testing, our API proposal consists of the following endpoints and DTOs:

/project/{pid}/tasks

Accepts GET, returns the list of tasks and their corresponding states.

An example of returned DTO is:

[
    {
        "pid": 1,
        "tid": 1,
        "state": "unlocked to map"
    },
    {
        "pid": 1,
        "tid": 2,
        "state": "locked for mapping"
    },
    ...
    {
        "pid": 1,
        "tid": 1000,
        "state": "unlocked to map"
    }
]

/project/{pid}/task/{tid}

Accepts GET, returns the task and its corresponding state.

An example of returned DTO is:

{
    "pid": 1,
    "tid": 1,
    "state": "unlocked to map"
}
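On the client side, the DTOs returned by the two GET endpoints above could be decoded into small typed objects. The following is a sketch using only the Python standard library; the class name `TaskState`, the helper `parse_tasks`, and the sample body are assumptions made for illustration.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskState:
    """One task DTO as returned by the GET endpoints (hypothetical name)."""
    pid: int
    tid: int
    state: str


def parse_tasks(body: str) -> List[TaskState]:
    """Decode the JSON list returned by GET /project/{pid}/tasks."""
    return [TaskState(**item) for item in json.loads(body)]


# Sample response body, shortened to three tasks for illustration.
body = """[
    {"pid": 1, "tid": 1, "state": "unlocked to map"},
    {"pid": 1, "tid": 2, "state": "locked for mapping"},
    {"pid": 1, "tid": 3, "state": "unlocked to map"}
]"""

tasks = parse_tasks(body)
summary = Counter(task.state for task in tasks)  # number of tasks per state
print(summary)
```

A single-task reply from /project/{pid}/task/{tid} has the same fields, so `TaskState(**json.loads(reply))` would decode it in the same way.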

/project/{pid}/actions

Accepts POST, returns the task and its boundary.

An example of the DTO from a client to the backend that requests mapping of a random task:

{
    "what": "map"
}

An example of the DTO with the reply from the backend to the client:

{
    "pid": 1,
    "tid": 22,
    "geometry": {"some": "geom"}
}

An example of the DTO from a client to the backend that requests finishing the task, where tid must be specified:

{
    "what": "finish",
    "tid": 22
}
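A client could prepare such action requests with the Python standard library as sketched below. The helper `build_action_request` is hypothetical; the host `http://localhost:8000` is the address used for load testing later in this document. The request is only constructed here, not sent.

```python
import json
from typing import Optional
from urllib.request import Request


def build_action_request(
    host: str, pid: int, what: str, tid: Optional[int] = None
) -> Request:
    """Build a POST request for /project/{pid}/actions.

    `tid` is left out of the DTO when it is None, as in the "map" example
    where the backend chooses a random task.
    """
    dto = {"what": what}
    if tid is not None:
        dto["tid"] = tid
    return Request(
        url=f"{host}/project/{pid}/actions",
        data=json.dumps(dto).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


map_request = build_action_request("http://localhost:8000", 1, "map")
finish_request = build_action_request("http://localhost:8000", 1, "finish", tid=22)
print(finish_request.get_method(), finish_request.full_url, finish_request.data)
```

Passing one of these objects to `urllib.request.urlopen` would send it, which requires the API application to be running.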

Addressing the scalability issues of the Endpoints extreme, extending the workflow only means adding support for one more DTO in the /project/{pid}/actions endpoint:

{
    "what": "reset",
    "tid": 22
}

Addressing the scalability issues of the DTOs extreme, the hierarchy of the exposed objects is leveraged, e.g., having a project with tasks and actions leads to the /project/{pid}/tasks and /project/{pid}/actions endpoints.
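To illustrate why a new action such as reset needs no new endpoint in the balanced proposal, the following sketch dispatches action DTOs for one project in memory. It is not the implementation from the repository: `ACTIONS` and `handle_action` are hypothetical names, the reply omits the geometry, and choosing a random eligible task whenever tid is missing is an assumption (the text above describes it only for map).

```python
import random

# action -> (state the task must be in, state it moves to)
ACTIONS = {
    "map": ("unlocked to map", "locked for mapping"),
    "finish": ("locked for mapping", "unlocked to check"),
    "check": ("unlocked to check", "locked for checking"),
    "good": ("locked for checking", "unlocked done"),
    "bad": ("locked for checking", "unlocked to map"),
    # Extending the workflow is one more table entry, not one more endpoint.
    "reset": ("unlocked done", "unlocked to map"),
}


def handle_action(tasks: dict, dto: dict) -> dict:
    """Apply an action DTO to `tasks`, a mapping tid -> state of one project.

    An unknown "what" raises KeyError; a task in the wrong state raises
    ValueError; no eligible task for a request without tid raises LookupError.
    """
    expected, new_state = ACTIONS[dto["what"]]
    tid = dto.get("tid")
    if tid is None:
        # Assumption: without tid, pick any task that is in the expected state.
        candidates = [t for t, state in tasks.items() if state == expected]
        if not candidates:
            raise LookupError(f"no task in state {expected!r}")
        tid = random.choice(candidates)
    elif tasks[tid] != expected:
        raise ValueError(f"task {tid} is {tasks[tid]!r}, expected {expected!r}")
    tasks[tid] = new_state
    return {"tid": tid, "state": new_state}


project_tasks = {1: "unlocked done", 2: "locked for mapping"}
print(handle_action(project_tasks, {"what": "reset", "tid": 1}))
print(handle_action(project_tasks, {"what": "map"}))
```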

Load testing

The load_test.py module is in the repository root. Its documentation follows.

HOT TM proposal load testing.

This file is meant to be run using locust -f load_test.py.

To prepare the databases for load testing, build and run the database containers. In the first terminal:

cd hot_tm_proposal/database
docker-compose build --no-cache almost_tm_admin
docker-compose run --rm --name almost_tm_admin almost_tm_admin

In the second terminal:

cd hot_tm_proposal/database
docker-compose build --no-cache actions_history
docker-compose run --rm --name actions_history actions_history

Then, new projects need to be created in the databases. If there is no “testing virtual environment” in the database directory, start with creating one:

cd hot_tm_proposal/database
python3 -m venv tve
. tve/bin/activate
pip install -r requirements.txt

and then, in the database directory (cd hot_tm_proposal/database), run the script to prepare the databases:

python3 drop_all_and_create_10_projects.py

The last step before load testing is to start the FastAPI application. If there is no “testing virtual environment” in the repository root, create one:

python3 -m venv tve
. tve/bin/activate
pip install -r hot_tm_proposal/database/requirements.txt
pip install -r hot_tm_proposal/api/requirements.txt

Then run the application either for the almost_tm_admin database schema:

HOT_TM_DB_SCHEMA=almost_tm_admin PYTHONPATH=hot_tm_proposal/database/ uvicorn hot_tm_proposal.api.balanced:app --workers 4

or for the actions_history database schema:

HOT_TM_DB_SCHEMA=actions_history PYTHONPATH=hot_tm_proposal/database/ uvicorn hot_tm_proposal.api.balanced:app --workers 4

Finally, run locust.io (in another terminal, but in the virtual environment tve in the repository root):

. tve/bin/activate
locust -f load_test.py

Then, open http://localhost:8089/ in a web browser, set the parameters of the new load test (we use 100 mappers, a ramp-up of 10, and the http://localhost:8000 host address), and start load testing.

We compare how the balanced API works for the almost_tm_admin and actions_history database schemas. We ran the “average” load test for one hour for each of them. We also stressed the server with the “extreme” load test for ten minutes for each of them.

almost_tm_admin

for average load test (wait 30 to 60 seconds between requests):
- average response time is 21.47 ms
- 100%ile response time is 170 ms
- #requests 8664
- see full report
for extreme load test (wait 1 to 2.5 seconds between requests):
- average response time is 27.89 ms (v2 23.27 ms)
- 100%ile response time is 3100 ms (v2 2000 ms)
- #requests 35711 (v2 34657)
- see full report
- see full report v2

actions_history

for average load test (wait 30 to 60 seconds between requests):
- average response time is 26.2 ms
- 100%ile response time is 180 ms
- #requests 8370
- see full report
for extreme load test (wait 1 to 2.5 seconds between requests):
- average response time is 22.45 ms (v2 22.24 ms)
- 100%ile response time is 170 ms (v2 320 ms)
- #requests 35578 (v2 36405)
- see full report
- see full report v2
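As a rough sanity check of the request counts, the expected number of requests can be estimated from the number of simulated mappers, the test duration, and the mean wait time. The sketch below assumes 100 mappers in every run, a uniformly distributed wait, exactly one request per wait interval, and negligible response and ramp-up time; these are assumptions, not facts taken from load_test.py.

```python
def expected_requests(
    mappers: int, duration_s: float, wait_min_s: float, wait_max_s: float
) -> float:
    """Estimate the request count: one request per mean wait per mapper."""
    mean_wait_s = (wait_min_s + wait_max_s) / 2
    return mappers * duration_s / mean_wait_s


# "Average" load test: one hour, 30 to 60 seconds between requests.
average_estimate = expected_requests(100, 3600, 30, 60)
# "Extreme" load test: ten minutes, 1 to 2.5 seconds between requests.
extreme_estimate = expected_requests(100, 600, 1, 2.5)

print(round(average_estimate), round(extreme_estimate))
```

The estimates are of the same order as the reported counts (8664 and 8370 for the “average” test, about 34657 to 36405 for the “extreme” test). The reported counts are somewhat higher, which suggests that the one-request-per-wait assumption is only approximate.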

Conclusion

We conducted load testing experiments in order to stress different database designs in an environment of many (simulated) mappers. Recalling the Backend diagram, our code base consists solely of the API endpoints and the database. In a real application, the three dots (...) in the Backend diagram would be the layers connecting the API with the database. We can afford this simplification because a lot of functionality is currently out of the scope of these documents.

When we study the results of load testing, there is not much difference between almost_tm_admin and actions_history. This is interesting in relation to the database experiments described in Experiments.

The only difference observed is that actions_history looks more stable: in the “extreme” load test, its maximum response time stayed within 320 ms, whereas almost_tm_admin reached 2000 to 3100 ms. The “extreme” load test had to be conducted to find this.

However, we need to be aware that we tested only limited functionality. For example, retrieving the history of the actions per project was not tested. Functionality like this should be described in the load test, implemented, and its performance measured.