Query Languages For The Web

Querying data over the web is a common problem with many often adhoc solutions. In this article, I survey some common solution patterns in use. 31 January 2020

# Background

Data can often be conceptually represented as tables. Each column represents a particular dimension or attribute, and each row represents a different entity. For example, an address book might have columns to represent name, phone, and email, while each row represents a different contact.

With this tabular representation, the types of operations of interest include (often more than one at once):

horizontal filtering
selects a subset of rows; eg. contacts with a particular last name
vertical filtering
selects a subset of columns; eg. first name and last name only
sorts results; eg. sort by first name
consume subsets of results; eg. first 10 results, next 10 results, etc.
resource embedding
request related resources in the one query, avoiding the N+1 problem  and reducing latency

# Structured-QL

Naturally, the first query language that comes to mind is Structured-QL or SQL. It has been around since 1974, is standardised , and is considered a stable language.

The basic query in SQL is FROM/JOINWHERESELECTORDER BYLIMIT; here written in its conceptual execution order . Horizontal filtering is achieved with the WHERE clause, vertical filtering with SELECT, ordering with ORDER BY, and pagination with a combination of SELECT and LIMIT. Since SQL query results are themselves tabular, resource embedding is not natively supported.

Postgres roles  offers Row-Level-Security as well as Column-Level-Security to restrict access.

Finally, LISTEN and NOTIFY provide a subscription mechanism for change notification.

# PostgREST

In many web applications, the application layer (Ruby / Java / etc.) exposes some query syntax over HTTP, then depending on application logic, translates that into another query syntax over SQL. PostgREST  aims to eliminate this layer, by directly providing a RESTful  HTTP API to the database. The database structure enforces which tables and columns are exposed, and accessible to which roles  (using the Authorization HTTP header). It does so without any additional application programming, as all the data constraints live within the database itself.

Horizontal filtering of tables is achieved with conditions on columns. More complicated queries can be achieved using views .

GET /people?age=gte.18&student=is.true HTTP/1.1

Vertical filtering is similarly achieved. Notice that there is no distinction between the operator and the column, ie, we can't create a column named select — not that we necessarily want to.

GET /people?select=first_name,age HTTP/1.1

Ordering has a similar syntax.

GET /people?order=age.desc,height.asc HTTP/1.1

Pagination is achieved in one of two ways. Either with HTTP headers or with query parameters. The former is preferred because the response uses the same mechanism to indicate if/when the server is unable to meet the request.

GET /people HTTP/1.1
Range-Unit: items
Range: 30-45
GET /people?limit=15&offset=30 HTTP/1.1

Resource embedding allows foreign keys and join tables to be traversed within a single API query.

GET /people?select=first_name,purchases(id,description) HTTP/1.1

With a strong focus on the persistence level, any additional non-persistence related logic can be deferred using Aspect Oriented Programming  techniques, perhaps with hooks in an API gateway layer.

# Feathers-JS

FeathersJS  is a framework for creating realtime applications. It is datastore agnostic, but uses a JSON-based query language over WebSockets or serialised over HTTP query parameters. This language is a dialect of MongoDB query operators , which finds common use in several popular Javascript packages too.

Keywords like $select, $sort, or $limit have a prefix to distinguish them from column names. The same operations of horizontal filtering, vertical filtering, ordering, and pagination can be similarly performed.

GET /people?age[$gte]=21&student[$eq]=true HTTP/1.1
GET /people?$select[]=first_name&$select[]=age HTTP/1.1
GET /people?$sort[age]=-1&$sort[height]=1 HTTP/1.1
GET /people?$limit=15&$skip=30 HTTP/1.1

In order to support real time notifications in a uniform way, FeathersJS does not natively support resource embedding, but this can be supported by extending the query language similar to PostgREST. Whenever a resource is created, updated, or deleted, the event is published to authorized subscribers.

Aspect Oriented Programming  for additional non-persistence related logic is natively supported through Feathers Hooks .

# Graph-QL

GraphQL  models data storage and retrieval as a graph, allowing deeply nested resources to be retrieved in one query. It is not prescriptive about the protocol between client and server, and is primarily concerned about the request and response formats. Typically though, GraphQL is hosted on a HTTP server with a single endpoint:

{`GET /graphql?query=GRAPHQL_QUERY HTTP/1.1 `}

It's query language is inspired by JSON, and can be deeply nested. Filtering is achieved using fields . Where a field is plural, horizontal filtering is applied, where it is singular, vertical filtering is applied. Fields may have additional arguments  to allow additional data transformations. In the example below, the people table is horizontally filtered for a minAge and student status, and its desired columns vertically filtered for firstName and age.

people(minAge: 21, student: true) {
"data": {
"people": [{
"firstName": "Alice",
"age": 25
}, {
"firstName": "Bob",
"age": 26

Ordering and pagination are left up to the implementation. For example, it could be implemented using query arguments , using keywords like order, limit, and skip or similar. Subscriptions are also defined in the language, but is also left up to the implementation.

GraphQL is also often used even if the underlying data is not graph-like. Practioners will cite resource embedding as the primary reason for its use.

# Geo tile servers

Slippy Map  applications serve geospatial data in small bundles or tiles. In a sense, it is a query language optimised for geospatial pagination. The basic query has the current zoom level and x-y coordinates in Web Mercator projection  embedded in the URL.

GET /:zoom/:x/:y.pbf HTTP/1.1

Dynamic  vector tiles  where the payload can contain data that is horizontally and vertically filtered, can be supported through additional query strings. There is no standard in how that can be specified – look to PostgREST or FeathersJS for inspiration on the query syntax.

# Summary

Plain SQL is a powerful language for querying data, but it is not directly accessible from a browser. PostgREST offers a HTTP-to-SQL bridge without having to write an application server, and instead using the database structure directly.

FeatherJS is database agnostic and is geared towards realtime applications using WebSockets, but can support HTTP also with a JSON based query language. GraphQL extends this even further by using a JSON-like language to allow querying and filtering of deeply nested embedded resources.

Finally, though not really a query language, I've included Slippy Map applications here as an example optimised for pagination.