Our customer needed a robust, scalable application to serve high volumes of search requests with no downtime for upgrades or maintenance. Their goal was to spin off a new product from a portion of an existing product in response to market needs, and they were aiming for search that was as close to real time as possible.
The Solution Design
Early in the design process we identified Elasticsearch as the search technology of choice. We chose to use Elastic Cloud to avoid having to provision hardware or manage the cluster. Because of high volume and automatic scalability requirements, we utilized AWS Serverless architecture to allow for increased volume without having to provision worst-case-scenario hardware.
Due to the nature of the data being served (95% of a stored entity needed to be returned to the user) we chose to store the entire document in Elasticsearch even though only a small portion of the document was involved with the lookup. This made Elasticsearch the repository of record for this data.
In this case study, we will outline the architecture, design decisions, and the process Axispoint undertook to meld these two symbiotic tool sets.
Setting up Elastic Cloud was a painless process. We took advantage of the free trial tier to begin development and set up a proof of concept. Once all parties had vetted the PoC, we engaged with Elastic and converted our trial tier to a production-ready cluster. Elastic Cloud offers to host the cluster on either Amazon (AWS) or Google (GCP), and we chose AWS because we wanted to keep the architecture in the same ecosystem. A tailored index mapping, in which only the searchable fields are indexed, provided the best performance possible.
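Such a tailored mapping might look like the following sketch. The field names are hypothetical and the exact shape varies by Elasticsearch version; the key ideas are "dynamic": false (unmapped fields are stored but not indexed) and "enabled": false (the bulk of the document stays retrievable in _source without being parsed into the index):

```json
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "title":   { "type": "text" },
      "writers": { "type": "text" },
      "payload": { "type": "object", "enabled": false }
    }
  }
}
```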
It's worth noting that AWS offers its own Elasticsearch service, but after comparing the two we made the architectural decision to go directly with Elastic Cloud, because we wanted access to all of the configuration available to a private instance, which the AWS product does not support.
Serverless architecture is a broad umbrella of technologies, all of which function together to make the final application work. The following describes the various components used for this solution.
In the AWS world, Lambda is the "functions as a service" offering. While there are many languages you can write these functions in, we chose NodeJS because of the deep ecosystem of libraries available.
In order to turn Lambda functions into an API, you pair them with AWS API Gateway, which maps an HTTP endpoint to a Lambda function. The user experience looks something like:
- The user sends a PUT request to https://www.mydomain.com/searchForStuff
- API Gateway invokes the associated Lambda function to get results
- The results are returned to the user.
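A minimal sketch of the handler behind such an endpoint, assuming API Gateway's Lambda proxy integration (the field names are hypothetical, and the Elasticsearch call is stubbed out to keep the sketch self-contained):

```javascript
// In a real deployment this function would be exported as the module's
// `handler` so Lambda can invoke it.
async function handler(event) {
  // API Gateway's proxy integration delivers the HTTP body as a string.
  const criteria = JSON.parse(event.body || '{}');

  // This is where the real application queries Elasticsearch; here we
  // just echo the search term back.
  const results = { hits: [], searchedFor: criteria.term || null };

  // API Gateway expects a statusCode/headers/body shape in return.
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(results)
  };
}
```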
Cloud Formation and the Serverless Package
One NodeJS package that takes a lot of the pain out of developing serverless applications is "serverless" (see https://serverless.com/ for detailed information). This package lets a developer set up a configuration file (serverless.yml) defining the relationships between Lambda functions and the various ways those functions can be invoked. In our case we married the Lambda functions to API Gateway for user interaction, AWS SNS for messaging, and CloudWatch for job scheduling.
Serverless reads the configuration file and provisions the necessary services through AWS CloudFormation, Amazon's provisioning service.
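A serverless.yml for this kind of setup might look roughly like the following sketch. The service, function, topic, and handler names are hypothetical; what matters is that one file wires functions to all three event sources described above (HTTP, SNS, and a schedule):

```yaml
service: search-api

provider:
  name: aws
  runtime: nodejs8.10   # runtime of the era; adjust to the current one

functions:
  searchForStuff:
    handler: src/search.handler
    events:
      - http:
          path: searchForStuff
          method: put
  indexDocument:
    handler: src/indexer.handler
    events:
      - sns: document-ready      # hypothetical topic name
  pollSftp:
    handler: src/poller.handler
    events:
      - schedule: rate(5 minutes)
```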
We utilized several add-ons to the base Serverless package.
- serverless-aws-documentation: This allows the specification of inputs, outputs, and descriptions to document the API endpoints. It also allows export to tools such as Swagger for API documentation.
- serverless-offline: This very handy plugin allows developers to run functions locally and test them before spending the time to do a deployment.
- serverless-reqvalidator-plugin: Used along with serverless-aws-documentation so that API Gateway will validate requests before firing off Lambda functions.
- serverless-plugin-warmup: This one is interesting. Lambda functions take time to spin up after they go idle, so this plugin lets you configure your functions to wake up periodically so that they are always ready for your users.
CloudWatch is where all of the log messages go. The Serverless library sets up a CloudWatch log group for every Lambda function created. These logs can then be analyzed for errors or system health.
The Application Solution
The application (from the consumer's point of view) is an API endpoint that fronts searches against our client's data repository. The application is two-fold: one part receives data from the client, and the other allows our client's customers to search through that data.
Indexing the Data with No Down Time
The client had an existing process to gather all of the data relevant to a given use case and package it for delivery. The contents are zipped together, encrypted, and then dropped onto an SFTP server. The package contains several XML files: a list of parent objects, and a series of child objects, normalized much as they would be in a relational database.
We created a Lambda function running on a schedule that would figure out the next file in the sequence and check whether it had arrived. If the correct file had arrived, the function would pull it down, decrypt it, unzip it, and process its contents. To be efficient, our process loaded all child objects into memory and, for each parent, assembled the whole searchable document into a single object. That object was then added to the index through Elastic's API.
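The in-memory join described above can be sketched as follows. The field names (id, parentId) are hypothetical stand-ins for whatever keys the XML files actually carry; children are grouped by parent key once, then each parent is merged with its children into a single indexable document:

```javascript
// Group child records by their parent key, then merge each parent with
// its children into one document ready for indexing.
function assembleDocuments(parents, children) {
  const byParent = new Map();
  for (const child of children) {
    if (!byParent.has(child.parentId)) byParent.set(child.parentId, []);
    byParent.get(child.parentId).push(child);
  }
  return parents.map((parent) => ({
    ...parent,
    children: byParent.get(parent.id) || []
  }));
}
```

Building the Map once keeps the join linear in the number of records, rather than scanning the child list for every parent.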
Because of the nature of the application, the object had to be further manipulated to enhance its searchability. Once this processing was added to the equation, the process no longer ran within the allotted 5 minutes, the maximum processing time for a Lambda function. We refactored the process so that once a parent object was joined with its children, it was put on a queue using SNS. A second Lambda function would then pick it up, manipulate it as required, and index it. We throttled the process, simply by configuring the queue correctly, so we didn't overload our cluster.
For the moment we are not utilizing Elastic's bulk API. Instead, each entity is processed by a separate child Lambda process. Future iterations may package entities together and insert them in a single call, but unlike database bulk inserts, the only savings with Elastic is the HTTP overhead, so given satisfactory performance metrics the process will remain as currently implemented. There is also a balancing act between how large SNS messages can be and any gains made by eliminating HTTP chatter.
The bulk of the data comes in once a month; incremental updates follow but are processed in seconds rather than minutes. When the full refresh file comes in, a new monthly index is started and is loaded in parallel with the old month. Once the process is complete the Elasticsearch alias is repointed to the new month, resulting in no down time during re-indexing.
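The cutover at the end of a full load uses Elasticsearch's alias API: a single POST to the _aliases endpoint swaps the alias in one atomic step, so searches never see a half-loaded index. The index and alias names below are hypothetical:

```json
{
  "actions": [
    { "remove": { "index": "catalog-2018-05", "alias": "catalog-live" } },
    { "add":    { "index": "catalog-2018-06", "alias": "catalog-live" } }
  ]
}
```

Because the application always queries the alias rather than a concrete index name, the old monthly index can then be dropped at leisure.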
Executing the Search
The client required a certain template for providing the key information to be searched on. That template was fed into our Serverless documentation so that API gateway would enforce it.
We noticed early on that API Gateway's error messages were not helpful, telling the user only that the request was bad but not providing any reason why. We made the decision to enforce only the most general structure within API Gateway; if the user got at least that correct, we would then validate the object using joi (https://github.com/hapijs/joi). Here we traded the expense of potentially firing off Lambda for invalid input against the user experience of getting descriptive error messages. AWS has reported that returning more robust messages is on their roadmap, but no delivery date is defined.
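The application does this validation with joi; purely as a dependency-free illustration of the trade-off, a check like the one below turns an opaque 400 into actionable messages. The field names are hypothetical:

```javascript
// Illustration of descriptive request validation (the real application
// uses joi). Returns a list of human-readable problems, empty if valid.
function validateSearchRequest(body) {
  const errors = [];
  if (typeof body.term !== 'string' || body.term.length === 0) {
    errors.push('"term" is required and must be a non-empty string');
  }
  if (body.maxResults !== undefined && !Number.isInteger(body.maxResults)) {
    errors.push('"maxResults" must be an integer');
  }
  return errors;
}
```

The Lambda can return this list in a 400 response body, which is far more useful to an API consumer than API Gateway's bare "Invalid request body".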
Assuming the information passes the validation, it must be manipulated using a similar algorithm as the index creation in order to compare apples to apples. It can then be translated to an Elasticsearch query.
The client required a "match score": a consistent measurement of how closely the search term matched the document. Elastic provides a relative score based on how the returned results compare to each other and to the rest of the documents in the store, which means the score would change as documents are added to or removed from the index. This led to a decision to calculate a score once the results were returned. We likely could have done this directly in Elastic, but rather than spending time figuring out how, we moved the rescore functionality into our application. Performance doesn't seem to be degraded; if anything, we offloaded that process from Elastic to Lambda, which in our opinion was the more scalable solution.
The other thing we did that may be unusual was to take advantage of an Elastic feature called highlighting. This allowed us to know which fields the search hit on, so we could focus our rescoring algorithm on only the terms within the document that were close enough to cause it to be returned in the first place.
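Highlight-driven rescoring can be sketched as follows. Each Elasticsearch hit carries a highlight section mapping matched field names to fragments, so only those fields are compared against the search term. The bigram Dice similarity used here is a simple stand-in for the application's actual scoring algorithm, which the case study does not detail:

```javascript
// Character-bigram set of a string, used for a simple similarity measure.
function bigrams(s) {
  const text = s.toLowerCase();
  const grams = new Set();
  for (let i = 0; i < text.length - 1; i++) grams.add(text.slice(i, i + 2));
  return grams;
}

// Dice coefficient over bigram sets: 1.0 for identical strings, 0 for
// strings sharing no bigrams.
function similarity(a, b) {
  const ga = bigrams(a);
  const gb = bigrams(b);
  if (ga.size === 0 || gb.size === 0) return a === b ? 1 : 0;
  let shared = 0;
  for (const g of ga) if (gb.has(g)) shared++;
  return (2 * shared) / (ga.size + gb.size);
}

// Score a hit against the search term, looking only at the fields that
// Elasticsearch's highlight section says actually matched.
function matchScore(hit, term) {
  const matchedFields = Object.keys(hit.highlight || {});
  let best = 0;
  for (const field of matchedFields) {
    const value = String(hit._source[field] || '');
    best = Math.max(best, similarity(value, term));
  }
  return best;
}
```

Because the score depends only on the document and the search term, it stays stable as the index grows, unlike Elastic's relative relevance score.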
Once the documents were returned they were additionally filtered by score and some other information, translated to the format the client wanted returned, and returned to the user.
By utilizing this architecture we were able to meet the goal of a 300ms average response time at 1,000,000 transactions per day. We used Gatling (https://gatling.io/) to run the performance tests, setting up a random list of search terms and firing 2,000 requests in 100 seconds (which equates to 1,728,000 transactions per day), to account for bursts or uneven usage during the day. Our mean was 194ms, and 95% of the requests finished within the 300ms goal. Some of the outliers involved exceptionally large objects or searches that returned many documents.