Building a Stripe-like Search Language from Scratch

At Zuplo, we're enthusiastic advocates of the Stripe API. Its consistent and straightforward design significantly enriches the developer experience. At Zuplo, we focus on API design since we create tools that aid developers in deploying exceptional APIs. While developing our REST APIs, we often begin by considering how Stripe would approach a feature. Although we do not always adopt their methods, there are instances where emulating a well-implemented strategy is more efficient than creating a new one from scratch.

Recently, we have been refining our internal APIs to streamline our codebase and minimize duplication. In the past, we enabled filtering for various list requests, but our implementation lacked consistency and repeatability. Initially, we simply added query parameters to API requests, like this:

GET /accounts/:accountId/users?role=admin

This really isn’t a terrible practice, but over time we started to add more of these filters and we ended up with some pretty ugly APIs that started to have lots of query parameters.

GET /accounts/:accoundId/users?role=admin&role=editor&created-after=2023-01-01&created-before=2024-01-01

From the perspective of API consumers, this isn't particularly problematic if the parameters remain consistent. However, for API maintainers, it imposes a significant burden. Each new query variant demands custom coding to manage the query strings and update the database queries accordingly. This becomes increasingly challenging to maintain and test across a large codebase with numerous endpoints.

To address this, we decided to implement a stripe like search query language and parser to get a simple and consistent query language across our APIs. We drew inspiration from the query languages used by Github and Stripe, both of which are straightforward yet robust enough to significantly reduce our reliance on multiple filter query parameters.

Info

This is probably the point where some people are thinking, why not just use GraphQL. The short answer is that we prefer REST for most of our APIs. The long answer can be read on this post.

Building a Parser#

We chose to develop a subset of the Stripe Search Language. The first step was to explore existing open-source libraries that could meet our needs in Typescript and Go. However, finding none that were suitable, we decided to create our own.

There are two main approaches to building a parser: manually or using a generator that produces a parser from a specified language grammar. We opted for the latter due to its ease of development and the potential for reuse in other programming languages, like Go.

The language fundamentals are simple and outlined below. A query clause consists of a field, an operator, and a value:

clauseemail:"amy@rocketrides.io"
fieldemail
operator:
valueamy@rocketrides.io

Negations are handled by prefixing a field-value pair with a -, and logical operators like AND and OR can be used to combine conditions. Basic comparison operators such as <, >, >=, and <= are also supported.

The complete specification is available here: Stripe Search Query Language

Writing the Grammar#

After deciding on the language specification, we proceeded to define and generate the grammar. We settled on using Peggy.js, which simplifies the generation of a parser from a grammar. To initiate, I utilized GitHub Copilot to draft the initial code.

Copilot Query

After a few iterations with Copilot I had a simple grammar.

{
  function makeTerm(field, value) {
    return {field: field.join(''), value: value.join('')};
  }
}

Search
  = ws* terms:TermList ws* { return terms; }

TermList
  = head:Term tail:(_ "AND" _ Term)* {
    return [head].concat(tail.map(function(t) { return t[3]; }));
  }

Term
  = ws* field:Field ws* ":" ws* value:Value ws* { return makeTerm(field, value); }

Field
  = chars:[a-zA-Z]+ { return chars; }

Value
  = chars:[a-zA-Z0-9]+ { return chars; }

_ "whitespace"
  = [ \t\n\r]*

ws "whitespace"
  = [ \t\n\r]+

Next, I generated the parser using Peggy.js.

peggy -o parser.mjs --format es grammar.peggy

This output the parser.mjs file containing the parser. I can then use this to parse some queries.

import * as parser from "./parser.mjs";
 
const result = parser.parse(`account:foo AND project:baz`);
console.dir(result, { depth: null });

The parsed output of that query is shown below.

[
  { "field": "account", "value": "foo" },
  { "field": "project", "value": "baz" }
]

Building the Database Query#

I won’t go into all the details of how we build the database query from this parsed output (maybe in another post), but you can imagine how this would be done using something like Prisma for example.

import { PrismaClient } from "@prisma/client";
import * as parser from "./parser.mjs";
 
const prisma = new PrismaClient();
 
const result = parser.parse(`name:Alice AND email:alice@prisma.io`);
const user = await prisma.account.findFirst({
  where: {
    name: result.name?.value,
    email: result.email?.value,
  },
});
console.log(user);

Safely Expose your Query to Users#

When handling untrusted queries, it's crucial to ensure that only permitted inputs are allowed, both in terms of key/value pairs and query types. To safeguard our API, we implemented an API Gateway using Zuplo to filter queries based on predefined rules.

In our Zuplo API Gateway setup, we created a route /users that accepts a search query parameter. We wrote a custom inbound policy that uses our parser, and verifies the fields against a list of permitted ones. If a query includes disallowed fields, the API returns an error.

This method addresses some security concerns related to dynamic queries but does not cover all potential issues. When implementing this yourself, be sure to test a variety of scenarios.

import { ZuploContext, ZuploRequest } from "@zuplo/runtime";
import { parse } from "./parser";
 
const allowedFields = ["name", "email", "createdOn"];
 
export default async function policy(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const search = request.query.search;
  const query = parse(search);
 
  context.log.info(`Initial query`, query);
 
  const notAllowedFields = query.filter(
    (q) => !allowedFields.includes(q.field),
  );
 
  if (notAllowedFields.length > 0) {
    context.log.warn(`Fields not allowed`, notAllowedFields);
    return new Response("Query not allowed", { status: 400 });
  }
 
  return request;
}

When I try to call the API with a query that contains a field that is not on the list of allowed fields, the API returns an error.

Bad query

Conclusion#

In this article, I showed you how to build a custom query syntax and parser using modern tools like Peggy.js, GitHub Copilot, and Zuplo, enhancing both the developer and user experience by simplifying API queries. Hopefully you find this helpful in your own projects.

Designed for Developers, Made for the Edge