Basics
For Loom I knew what kind of building blocks were required since it is basically just a continuation of my work on Mesh.
In the past I choose the approach to generate the spec and database schema from code. At least for the database that seemed a good choice at the time since the embedded DB of Mesh could be directly controlled via Java.
This time I’ll however decided to handle things differently. First I want to define the API specification and from that information I would derive the needed domain model.
Let’s take a look at the building blocks first.
Assets
The assets are the key component of Loom around various other elements are arranged.
Properties:
-
uuid
-
filename
-
hashes.sha512
,hashes.sha256
,hashes.md5
-
mimeType
-
dominantColor
-
duration
-
timeline.from
,timeline.to
-
timeline.thumbnail
-
timeline.meta
-
timeline.tags
-
timeline.description
-
location.lon
,location.lat
-
meta
-
creator.username
,creator.uuid
-
editor.username
,editor.uuid
-
cdate
,edate
ACL
Additionally an ACL
system is needed which consists of User
, Group
, Role
and permission management.
SSO
Loom does not directly participate a typical OAuth flow since it is only a third party consumer of OAuth tokens.
Instead a frontend app must handle the OAuth flow and process the SSO authentication. After that has been done the generated API token can be used for Loom requests as well.
Loom can now validate the token using the public keys from the SSO provider.
Next the JWT
public claims can be utilized to map the information to a Loom user which is needed to perform the request.
For all options it is a good idea to prepared the domain model in order to support Single-Sign-On. For Loom this means that a user model also has a sso
flag which indicates that the user was created via SSO process.
Option 1 - In-memory user mapping
In order to avoid cluttering the Loom database with SSO users it would be one option to use an in-memory user instead of creating and persisting the user in the database. Downside of this approach is that newly created or edited resources can’t store the user reference.
Option 2 - Surrogate user mapping
Another approach is to map the SSO user to a surrogate user which acts as a stand-in for SSO users. This approach however only works if the users don’t need dedicated permissions.
Option 3 - Dedicate mapping
Finally the SSO token information can be mapped directly to a dedicated Loom users. This way individual permissions and creator/editor references can be stored in Loom.
Permissions
Tagging
Taggings can be applied to assets, contents, namespaces.
Often taggings are user based and thus it is a good choice to store tags globally and on a per-user basis.
Namespaces
Namespaces
are used to scope content and asset elements.
Content Models
Content Models
are used to define the structure of your content elements.
A model defines a list of fields which can hold the content information.
Field
types can be numbers, dates, JSON, text, booleans.
Contents
The Content
are elements which can be created in order to supplement the information from the assets.
API Functions / OpenAPI
This time I wanted to follow the design first approach as it would give me time to rethink and plan the API upfront. The OpenAPI spec can be utilized to create a full REST API specification.
The Swagger Editor is a great tool when verifying and visualizing the written spec.
There are many resources out there on the net that cover good practices when creating an API. I thus won’t go into detail about those topics here.
A good summary can for example be found here.
My bulletpoint list:
-
Don’t use verbs in your path - use http methods instead
-
Use plural for resources
-
Be consistent (e.g. same patterns, same identifier format)
-
Plan on how the API can be seen by humans and software
-
Keep caching in mind
-
Try to find a good middleground when returning information (
overfetching
,underfetching
) -
Use JSON where possible
-
Enable server-side compression
-
Enable prettyprinting of JSON
-
Add options to hide server tokens
-
Use meaningful http error codes for server errors
-
Return
204
on delete -
Include error messages
openapi: '3.0.2'
info:
description: "Central Loom REST API"
version: "1.0.0"
title: "MetaLoom Loom API"
termsOfService: "https://metaloom.io/terms/"
contact:
email: "api@metaloom.io"
license:
name: "Apache 2.0"
url: "http://www.apache.org/licenses/LICENSE-2.0.html"
servers:
- url: https://demo.metaloom.io/v1
tags:
- name: "namespace"
description: "The namespace within a loom instance."
externalDocs:
description: "Find out more"
url: "https://metaloom/docs/namespaces"
security:
- ApiKeyAuth: []
paths:
# User
/users:
summary: "Endpoint to access users."
head:
tags: ["User"]
operationId: "checkUsers"
responses:
'204':
description: "Users checked"
get:
parameters:
- $ref: "#/components/parameters/perPage"
- $ref: "#/components/parameters/page"
tags: ["User"]
operationId: "loadUsers"
responses:
'200':
description: "Users loaded"
content:
'application/json':
schema:
$ref: '#/components/schemas/UserListResponse'
examples:
UserListResponse:
$ref: '#/components/schemas/UserListResponse/example'
…
Resource Identifier
For the resource identifier in API paths (/users/:uuid
) I choose UUIDv4.
In some situations using UUIDs can be complicated for the client. Tags are a good example for this. A frontend may interact with the Loom backend using tag names instead of UUIDs.
A resource path that matches /tagsByName/:tagName
would thus be more convenient.
HTTP Methods
The REST API CRUD methods should not be be directly mapped to HTTP methods. The usage of the method must match the RFC2616.
A good resource on this topic is restfulapi.net.
Method | Description |
---|---|
|
Load the resource/s (idempotent) |
|
Create a new resource |
|
Partial update of a resource |
|
Check the resource/s without returning the content |
|
Update / create resource with id |
|
Delete the resource |
Note that POST requests are not idempotent and thus may create multiple elements when invoked multiple times.
For Loom I’m however considering to omit PATCH
and use POST
for partial updates. This does not validate the RFC and simplifies the API.
REST Models
The OpenAPI spec also contains the response and request format.
…
UserResponse:
allOf:
- $ref: "#/components/schemas/EditableElement"
required:
- username
properties:
uuid:
type: string
description: "UUID of the user."
username:
type: string
description: "Username of the user."
firstname:
type: string
description: "Firstname of the user."
lastname:
type: string
description: "Lastname of the user."
email:
type: string
description: "E-Mail address of the user."
enabled:
type: boolean
description: "Flag which indicates that the user is enabled."
admin:
type: boolean
description: "Flag which indicates that the user has admin permissions."
meta:
type: object
description: "Additional user properties."
example:
uuid: "217f8c981ada4642bf8c981adaa642c3"
username: "joedoe"
email: "joedoe@metaloom.io"
firstName: "Joe"
lastName: "Doe"
enabled: true
admin: false
meta:
birthday: "02-04-1987"
favoriteColor: "blue"
securityQuestionAnswer: "42"
creator:
username: "joedoe"
uuid: "eaf23fea-f6b2-46a7-84af-a7335471df5e"
editor:
username: "joedoe"
uuid: "2e20fca1-dbe7-4f28-99d7-15bf366b4a82"
cdate: "2021-03-20T20:12:01.084Z"
edate: "2021-03-20T21:12:01.084Z"
…
Pagination
Implementing an efficient paging solution can be a challenge in itself. Often many parameters need to be taken into account when retrieving a page of elements for a specific resource.
For loom the result of a page request can be affected by:
-
applied filters
-
user permissions
-
page selection
-
sort order
It is thus desired to generate database queries which all include these factors. The database can in turn cache the query and return the resulting page of elements.
There are two different paging options that are common:
Limit & offset based - /users?offset=10&limit=2
In this option you need to manually specify how many items you want to skip and afterwards select. It is easy to map these parameters to the needed SQL query. This option is however not very convenient when using the REST API for a web frontend which usually exposes pages to the users.
Page size based - /users?perPage=10&page=2
In this option page size and selected page need to be specified by the user. The offset and limit can derived from these parameters.
-
perPage * page = offset
-
perPage = limit
Response Caching / eTag Handling
Caching is an important aspect that is rarely mentioned when designing a REST API. In this section I’ll mention a few aspects that influence caching.
Foreign values
Whenever the response of a resource contains fields from foreign elements it can negatively affect the cacheability.
The element needs to be flushed from the cache whenever an included value of a foreign element is altered in the response. This not only affects client side caches but also the eTag generation for responses.
{
"uuid" : "fbfd0632-7d10-464f-ada4-ebe5511d226d",
"creator" : {
"firstName" : "Joe",
"lastName" : "Doe",
"uuid" : "89389d87-6af3-43ed-8374-23cda00c3750"
},
…
"name" : "Admin Group",
"roles" : [ {
"name" : "admin",
"uuid" : "8cb8c1c0-7a32-4b27-a15b-d28819ae5952"
} ]
}
Including foreign values can’t or should not always be avoided. This is mainly a design and usability decision. In this example the role names and uuid are included. Whenever the role name changes the cached group response needs to be invalidated / the eTag in the response needs to change.
Inclusion of immutable foreign values
Another option is to omit the mutable value from the referenced resource. This reduces the probability that the cache needs to be updated. It however may also increase the burden on the API user since additional resources may need to be manually fetched afterwards. This aspect is called underfetching.
Needless to say that GraphQL does solve the fetching aspect in a great way.
…
"roles" : [ {
"uuid" : "8cb8c1c0-7a32-4b27-a15b-d28819ae5952"
} ]
…
ETag handling
Returning an ETag is a good practice for REST APIs as it with helps client and proxy caching.
The ETag itself is a hash which needs to be computed for the requested resource.
Adding a version
column on each resource table can help a lot with the generation of the ETag.
The value of the column for the element needs to be incremented whenever it gets updated.
Using this value it is easy to compute a hash by selecting all version
values that influence the response.
A group response which also returns the role names must thus also include the role version’s in the ETag.
Server side caching can be used to store computed hashes in order to avoid database queries.
For Mesh I created an event aware caching implementation which could invalidate server-side caches across a clustered setup. For Loom this might also be a good option.
Caching of page responses
Including the totalCount
and pageCount
values in the response can also negatively affect caching since the response needs to be invalidated when ever new elements get added/removed which may alter the counts.
{
"data" : [ … ],
"_metainfo" : {
"currentPage" : 1,
"perPage" : 2,
"pageCount" : 10,
"totalCount" : 20
}
}
Alternatively it is also possible to just include information whether there are further pages. This may limit the effect of newly added elements to the resource on caching. The needed SQL select may in theory also be faster since a full table scan can be avoided in this case. The scan would otherwise needed to return the count values.
{
"data" : [ … ],
"_metainfo" : {
"currentPage" : 1,
"perPage" : 2,
"hasNext" : true,
"hasPrev" : true
}
}
Streaming / Time to first byte
Reducing the time to first byte can be a desired feature.
Progressive Media
A very typical usecase is the handling of progressive images. Images which are for example encoded in WebP can already be displayed by the client when only a part of the full image data has been loaded.
JSON Streaming
Streaming JSON data which was formatted in NDJSON can also desired when handling large bulk responses. The server side response handling gets a bit more complicated since individual entries need to be encoded and flushed to the connected client.
Returning paged response data in this way could improve the responsiveness of client applications. A reactive server design in combination with result streaming from a source database can allow this kind of streaming in an efficient way. Streaming the data also reduces the memory footprint of request processing since the full result in not kept in memory at one time. Instead only the currently retrieved database entry needs to be processed at one time.
Domain Model / Database Design
Final words
I hope you liked this post. It certainly gave me an option to reflect on the whole design and to summarize it.
Things might change in the future. I welcome any feedback/ideas via feedback@metaloom.io