Building a Notification system that delivers 1,50,000 messages in a month | Documenting my Startup learning journey - #1


High-level architecture of the notification system

Overview of the Backend

We have divided our backend into multiple microservices. Two of them matter here:
- The hotel service manages all the core business logic.
- The notification service contains the provider integrations for each communication channel (Email, WhatsApp, SMS), along with wrapper functions that transform our data into the template format each provider expects when sending a notification.

Workflow:

Stage 1

  • As seen in the diagram, Stage 1 triggers a specific event whenever a particular action is completed. E.g., when a guest books a room in a hotel (the action), we trigger an event and dispatch it to SQS.

  • Each action is mapped to a specific event. We keep this mapping directly in code, as there is no real need to store it in a database.

  • All these events are dispatched to AWS SQS asynchronously.
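
The action-to-event mapping and the dispatch step could look roughly like the sketch below. It is only an illustration under assumptions: `ACTION_TO_EVENT`, `build_event_message`, `dispatch`, the action names, and the `bookingCancelled` event are hypothetical names, and the queue client is passed in as a plain `send` callable so the example runs without AWS credentials.

```python
import json
from typing import Callable

# Hypothetical action -> event mapping, kept in code rather than a database.
ACTION_TO_EVENT = {
    "room_booked": "bookingCompleted",
    "booking_cancelled": "bookingCancelled",
}


def build_event_message(action: str, data: dict) -> str:
    """Return the JSON body for the event mapped to `action`."""
    event = ACTION_TO_EVENT[action]  # raises KeyError if the action has no event
    return json.dumps({"event": event, "data": data})


def dispatch(action: str, data: dict, send: Callable[[str], None]) -> None:
    """Hand the event body to `send`, which wraps the real queue client."""
    send(build_event_message(action, data))


# With boto3, `send` could wrap the SQS client, for example:
#   sqs = boto3.client("sqs")
#   send = lambda body: sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
# Here a list stands in for the queue so the sketch runs locally.
sent = []
dispatch("room_booked", {"guestEmail": "guest@example.com"}, sent.append)
```

Keeping the queue client behind a small callable also makes the dispatch logic easy to unit test.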

Stage 1.5

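  • We use AWS SQS (Simple Queue Service) here to process the events dispatched by the various services (the hotel service in this diagram) one by one, asynchronously, on a first-come, first-served basis. Note that strict ordering is only guaranteed by SQS FIFO queues; standard queues provide best-effort ordering.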

  • Notifications generally take a few seconds to go through.

  • The main advantage of using a queuing system:

    • Data is processed asynchronously, so the action can complete without waiting for the notifications to reach the end user. The API call for that action can respond immediately.
  • NOTE: We could have used an open-source alternative such as RabbitMQ instead of AWS SQS. Since we are in the early stages right now, we are more focused on the business vertical: building and experimenting with innovative features to bring in new cash flows and more revenue.
    Once the business is at a good stage, we will give more focus to the engineering vertical to cut costs and make things highly efficient and scalable.

Stages 2, 3, 4

  • The notification service consumes from the SQS queue and continuously polls it for new events.

  • Once we receive and process an event, it is deleted from the queue. We then proceed to process the next event, and this cycle continues.

  • Each event message has the following meta information:

    • The event type/name.

    • The data needed to send the notification along with the user's contact details like phone number, email address, etc.
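
The receive, process, delete cycle can be sketched as follows. This is a simplified illustration, not our production consumer: `InMemoryQueue` and `drain` are hypothetical names, the in-memory queue only stands in for SQS, and the field names inside the message are assumed. With boto3, the corresponding client calls would be `receive_message` and `delete_message`.

```python
import json
from collections import deque
from typing import Callable, Optional


class InMemoryQueue:
    """Stand-in for SQS so the sketch runs locally (not the real client)."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        # Return the oldest message without removing it, or None if empty.
        return self._messages[0] if self._messages else None

    def delete(self, body: str) -> None:
        # Called only after the message has been handled.
        self._messages.remove(body)


def drain(queue: InMemoryQueue, handle: Callable[[str, dict], None]) -> int:
    """Process messages one by one, deleting each after it is handled."""
    processed = 0
    while (body := queue.receive()) is not None:
        message = json.loads(body)
        handle(message["event"], message["data"])
        queue.delete(body)
        processed += 1
    return processed


queue = InMemoryQueue()
queue.send(json.dumps({
    "event": "bookingCompleted",  # the event type/name
    "data": {"email": "guest@example.com", "phone": "+10000000000"},
}))
handled = []
count = drain(queue, lambda event, data: handled.append((event, data["email"])))
```

Deleting only after the handler returns means a message whose handler raises stays in the queue and can be retried.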

Processing of event messages:

  • All the notifications for a particular event are mapped in our SQL database in a separate configuration table.

  • Eg: bookingCompleted event will have:

    • An email notification with its associated template ID.

    • A WhatsApp notification with its associated template ID.

  • All these templates are created with third-party providers (Mailjet, Wati, etc). We create the templates in their user interface with dynamic variables, then map each template ID to its event.

  • We store the event and notification configurations in the database because they are subject to change at any time. We might decide to change the static content in a template, or to deactivate WhatsApp and only send emails for a particular event. Changing this configuration in the database is faster, and we do not have to wait for a new code deployment for the change to take effect.
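
A configuration table along these lines could be sketched with SQLite as below. The table name, column names, and template IDs are assumptions made for illustration; our actual schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notification_config (
        event_name  TEXT NOT NULL,
        channel     TEXT NOT NULL,              -- 'email', 'whatsapp', 'sms'
        template_id TEXT NOT NULL,              -- ID issued by the provider
        is_active   INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (event_name, channel)
    )
    """
)
conn.executemany(
    "INSERT INTO notification_config (event_name, channel, template_id) "
    "VALUES (?, ?, ?)",
    [
        ("bookingCompleted", "email", "tmpl-email-001"),  # made-up IDs
        ("bookingCompleted", "whatsapp", "tmpl-wa-001"),
    ],
)


def active_notifications(event_name: str) -> list:
    """Return (channel, template_id) pairs currently enabled for an event."""
    rows = conn.execute(
        "SELECT channel, template_id FROM notification_config "
        "WHERE event_name = ? AND is_active = 1 ORDER BY channel",
        (event_name,),
    )
    return rows.fetchall()


before = active_notifications("bookingCompleted")

# Turning WhatsApp off is a data change, not a code deployment.
conn.execute(
    "UPDATE notification_config SET is_active = 0 "
    "WHERE event_name = ? AND channel = ?",
    ("bookingCompleted", "whatsapp"),
)
after = active_notifications("bookingCompleted")
```

After the update, only the email row is returned for `bookingCompleted`, which is the kind of change the database-backed configuration is meant to make cheap.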

Stage 5

  • Once we have the notification template IDs for an event, we use the wrapper functions to transform the data and send each notification to the appropriate third-party provider, based on the notification's channel.
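
A minimal sketch of such a wrapper is shown below. All names here (`send_email`, `send_whatsapp`, `SENDERS`, `CONTACT_FIELD`, `send_notifications`, and the `guestName` variable) are hypothetical, and the provider functions only record the call in a list instead of contacting Mailjet or Wati.

```python
from typing import Callable, Dict, List, Tuple

# Records calls instead of sending, so the sketch runs without providers.
outbox: List[Tuple[str, str, str, dict]] = []


def send_email(to: str, template_id: str, variables: dict) -> None:
    # Placeholder for the email provider call; hypothetical.
    outbox.append(("email", to, template_id, variables))


def send_whatsapp(to: str, template_id: str, variables: dict) -> None:
    # Placeholder for the WhatsApp provider call; hypothetical.
    outbox.append(("whatsapp", to, template_id, variables))


# Channel -> provider function, and channel -> contact field in the event data.
SENDERS: Dict[str, Callable[[str, str, dict], None]] = {
    "email": send_email,
    "whatsapp": send_whatsapp,
}
CONTACT_FIELD = {"email": "email", "whatsapp": "phone"}


def send_notifications(data: dict, configs: List[Tuple[str, str]]) -> None:
    """Send one notification per (channel, template_id) pair."""
    variables = {"guestName": data["guestName"]}  # dynamic template variables
    for channel, template_id in configs:
        recipient = data[CONTACT_FIELD[channel]]
        SENDERS[channel](recipient, template_id, variables)


send_notifications(
    {"guestName": "Asha", "email": "guest@example.com", "phone": "+10000000000"},
    [("email", "tmpl-email-001"), ("whatsapp", "tmpl-wa-001")],
)
```

Adding a channel such as SMS would then mean registering one more function in `SENDERS` and one more entry in `CONTACT_FIELD`.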

Scope for improvements

  • AWS SQS is a single point of failure in the current architecture.

    • We could add a backup queue, or simply store each message in a new table and track its delivery state in that same table. SQL databases can also serve as queues themselves.
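
The table-as-queue idea could be sketched like this with SQLite. We have not built this; the table name, the `status` values, and the helper functions are assumptions, and the sketch assumes a single consumer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notification_outbox (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        body   TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'  -- pending | sent | failed
    )
    """
)


def enqueue(event: str, data: dict) -> int:
    """Store the message as a pending row and return its id."""
    cur = conn.execute(
        "INSERT INTO notification_outbox (body) VALUES (?)",
        (json.dumps({"event": event, "data": data}),),
    )
    conn.commit()
    return cur.lastrowid


def process_next(handle) -> bool:
    """Handle the oldest pending row and record the outcome in the same table."""
    row = conn.execute(
        "SELECT id, body FROM notification_outbox "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False  # nothing left to process
    row_id, body = row
    try:
        handle(json.loads(body))
        status = "sent"
    except Exception:
        status = "failed"  # kept in the table so it can be inspected or retried
    conn.execute(
        "UPDATE notification_outbox SET status = ? WHERE id = ?",
        (status, row_id),
    )
    conn.commit()
    return True


first_id = enqueue("bookingCompleted", {"email": "guest@example.com"})
seen = []
processed = process_next(seen.append)
```

With several consumers, rows would need to be claimed atomically (for example with `SELECT ... FOR UPDATE SKIP LOCKED` in PostgreSQL) so that two workers never pick up the same message.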

Summary

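  • The architecture is decoupled: the hotel service only publishes events, so the notification service can be scaled independently.

  • Introducing an async workflow lets the action complete immediately, which improves API response times.

  • Every message sent to the queue is persisted until it is consumed (within the queue's retention period), and messages are consumed one by one in FIFO order (strictly guaranteed only on SQS FIFO queues).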

  • Templates are managed in the third-party providers' UI.
