The Customer Success SaaS Platform Architecture
Businesses are shifting from selling products to providing software and services with a subscription-based model. The model generates hundreds – sometimes thousands – of monthly revenue streams instead of a few large product sales. This complicates the process of measuring and managing “success” with a customer. While revenue is a good metric of success, it is a lagging indicator. A customer may downgrade or cancel a subscription within months or even weeks if they stop using, or reduce their use of, some features of a subscription-based product. That is why it is important to understand how a customer success platform should be architected so that it addresses the needs of a SaaS platform or subscription-based product.
Just tracking revenue or profit hides the problem of customers going away or downgrading in a few months only to be replaced by new customers who also last only a few months – a phenomenon the industry calls “customer churn” or “logo churn”. High customer churn is a ticking time bomb but is difficult to spot, especially if revenues continue to grow.
Recurring revenues are far more critical in a subscription-based model than one-off revenue. Traditional cost accounting systems and performance measurement systems like the balanced scorecard are not sufficient in a subscription-based model.
The SmartKarrot Customer Success platform aims to be the go-to point for managing customer success for subscription-based platforms. The platform provides early warnings and indicators like feature usage, user engagement, customer churn and revenue churn that are invaluable drivers for customer success and business success.
A Success Oriented Architecture for the SaaS Platform
As we started architecting the platform, we realized that it needed to track and measure every meaningful user activity and engagement in real-time, mash it together with operational data from CRM systems, data from sales planning and tracking systems like Salesforce, and financial data from enterprise systems.
The platform needed to mine and analyze this large mound of data to generate succinct scorecards and clean dashboards with well-thought-out widgets. At the same time, it needed to be a robust, high-volume, and secure SaaS on Cloud platform.
So what did we need from this customer success SaaS platform?
We needed a robust integration platform that would connect with enterprise finance and planning systems and third-party sources like Salesforce, Freshdesk, Asana, Hubspot, JIRA, SugarCRM, and others. We built this integration using microservices, webhooks, an ETL engine and a NoSQL datastore. With a serverless platform, we could do this without running or managing a single server ourselves.
The platform’s Web user interface is built with React – with carefully designed state management, routing, AJAX networking, and graph and chart libraries. Widgets are architected as micro front-ends.
Transactions on the server are managed with a serverless microservices architecture running on top of a NoSQL database.
The whole system is built as a multi-tenant app with a sharded multi-tenant database model. This approach is a combination of two approaches: tenant name table partitioning and tenant index partitioning.
The system needs strong analytics. We use data lakes and a query engine to manage this. A data lake allows us to store structured and semi-structured information. On top of this, we run big data processing, real-time analytics and – in the future – machine learning.
The platform itself runs on a SaaS on Cloud model. We meter the usage of our own services by our customers. Our metering module uses log analysis to measure resource usage for compute (to the 100-millisecond level), data usage (at rest and in motion for the NoSQL data, data in data lakes and file storage), and notifications (SMS, emails, and in-app notifications).
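As a rough illustration of compute metering at the 100-millisecond level, the sketch below rounds each invocation duration up to whole 100 ms units and totals them per tenant. The function names, the input shape (durations already parsed from logs and grouped by tenant), and the choice to bill at least one unit per invocation are assumptions made for this example, not a description of the actual metering module.

```python
import math

BILLING_UNIT_MS = 100  # granularity named in the text; everything else is assumed


def billed_units(duration_ms: float) -> int:
    """Round one invocation's duration up to whole 100 ms units (minimum 1)."""
    if duration_ms < 0:
        raise ValueError("duration must be non-negative")
    return max(1, math.ceil(duration_ms / BILLING_UNIT_MS))


def meter_compute(durations_by_tenant: dict[str, list[float]]) -> dict[str, int]:
    """Sum billed units per tenant from durations parsed out of the logs."""
    return {
        tenant: sum(billed_units(d) for d in durations)
        for tenant, durations in durations_by_tenant.items()
    }


# Hypothetical tenants and durations (in milliseconds).
usage = meter_compute({"acme": [12.4, 101.0, 250.0], "globex": [99.9]})
print(usage)
```

Rounding up per invocation, rather than rounding the tenant total, mirrors how per-invocation billing typically works and keeps the metered figure comparable with the underlying compute cost.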
We soon concluded that our bleeding-edge customer success platform architecture needed a brand new form of architecture – what we called the “Success-Oriented” Architecture.
Our customers love having a sandbox environment where they can quickly try out our service. Most of our features are available in the sandbox. Our architecture neatly ringfences the sandbox and allows us to throttle some sandbox services to ensure that the sandbox does not affect the production system.
The system has sensitive customer data and is designed ground-up to be secure. The platform is used by customer success managers and senior managers. Even when processing large volumes of data, it still needs to be quick and performant. Good use of caching, pre-processing and optimized widgets help with this.
Finally, a platform is only as good as its documentation. For API documentation, we liked the approach that Stripe and PayPal have taken. Built using the Slate API document generator, our documentation puts everything on a single page, with a table of contents in the left pane, details in the center and code examples on the right.
Want to know more about how we architected a cutting edge customer success SaaS platform? Read on.
SaaS on Cloud Platform
SmartKarrot is delivered as a SaaS on Cloud platform – not as a product. While most customers prefer a SaaS model, a small number want to deploy it on a private cloud. The platform is multi-tenanted and is designed to support hundreds of customers and hundreds of thousands of their users.
Business components and services use the microservices variant of SOA (service-oriented architecture).
The SaaS platform is structured as a collection of loosely coupled services as opposed to being a monolithic architecture. Code written with the microservices pattern is forced to be modular, which makes it easier to understand, develop, and test, and more amenable to parallel development and continuous refactoring.
We use the AWS Lambda serverless compute engine to run our microservices. We like the scaling and the 100ms metering. This keeps costs low even with fluctuating computing demands.
Reporting, Analytics and Data Warehouse
SmartKarrot has significant logging, metering, reporting, and analytics. At the data layer, this will be separated into a reporting database. The data could use the S3 file storage. At a later stage, we could shift to using Amazon Redshift. Redshift works well with denormalized fact tables and data warehouses organized in the star and snowflake schemas.
AWS Glue will be used as the ETL tool to shunt transaction data in DynamoDB into Redshift. Amazon Kinesis Firehose will do the job for real-time data.
Business intelligence (BI) and data visualization will come from the cloud-based Amazon QuickSight. It can be used for ad-hoc analysis and to get business insights from data quickly.
The SaaS platform stores its data in a NoSQL database. We chose Amazon DynamoDB.
Low Latency Operations
Unencumbered by data joins and relational mappings, and with data hosted on fast solid-state drives, we consistently get sub-10-millisecond response times even at scale.
For the moment, this is more than enough for our systems. In the future, we will enable the use of the DynamoDB Accelerator (DAX) in-memory cache to improve the latency from milliseconds to microseconds.
Data at rest is encrypted using the AES-256 algorithm. Data in motion, to and from the database, is also similarly encrypted.
Though the database supports ACID properties completely, we like the speed that comes from eventually consistent read operations. We almost always use eventually consistent database reads. This satisfies most of our use cases – other than a handful – where we switch to strongly consistent reads.
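To make the choice between the two read modes concrete, here is a minimal sketch that builds the arguments for a DynamoDB GetItem request. `ConsistentRead` is the real GetItem parameter that switches a read from eventually consistent (the default) to strongly consistent. The helper name, the `accounts` table and the `account_id` key are hypothetical, chosen only for illustration.

```python
def get_item_request(table: str, key: dict, strong: bool = False) -> dict:
    """Build keyword arguments for a DynamoDB GetItem call.

    strong=False keeps the default, cheaper eventually consistent read;
    strong=True asks for a strongly consistent read.
    """
    return {
        "TableName": table,
        "Key": key,
        "ConsistentRead": strong,
    }


# Hypothetical table and key, in the low-level attribute-value format.
account_key = {"account_id": {"S": "a-42"}}

default_read = get_item_request("accounts", account_key)
strict_read = get_item_request("accounts", account_key, strong=True)

# With boto3 installed and credentials configured, either dict could be
# passed on as: boto3.client("dynamodb").get_item(**strict_read)
print(default_read["ConsistentRead"], strict_read["ConsistentRead"])
```

Keeping the flag as an explicit argument that defaults to `False` matches the policy described above: eventually consistent unless a caller opts in.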
Web User Interface with React
We use React to build our user interface. The clean, encapsulated component-based design pattern that React articulates goes well with our widget-based UX design.
Each widget, or a set of related widgets, is decoupled from the others. They connect with the analytics backend, intelligently cache, refresh, and invalidate data, and reload themselves on the user interface.
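The cache, refresh and invalidate behaviour can be sketched in a language-neutral way. The Python below is not the platform's React code. It is a small time-to-live cache under assumed names (`WidgetCache`, `fake_fetch`, the `churn-gauge` widget id) that shows the three operations a widget relies on: serve cached data while it is fresh, refetch once it is stale, and drop it on explicit invalidation.

```python
import time
from typing import Any, Callable


class WidgetCache:
    """Caches widget data for ttl_seconds and refetches once it is stale."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, widget_id: str) -> Any:
        now = self._clock()
        entry = self._entries.get(widget_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no backend call
        value = self._fetch(widget_id)  # missing or stale: refresh
        self._entries[widget_id] = (now, value)
        return value

    def invalidate(self, widget_id: str) -> None:
        """Force the next get() to go back to the analytics backend."""
        self._entries.pop(widget_id, None)


# Demo with a fake backend and a controllable clock (both hypothetical).
fetch_log: list[str] = []


def fake_fetch(widget_id: str) -> dict:
    fetch_log.append(widget_id)
    return {"widget": widget_id, "version": len(fetch_log)}


now = [0.0]
cache = WidgetCache(fake_fetch, ttl_seconds=30, clock=lambda: now[0])
first = cache.get("churn-gauge")    # fetched
second = cache.get("churn-gauge")   # served from cache
now[0] = 31.0
third = cache.get("churn-gauge")    # stale, fetched again
cache.invalidate("churn-gauge")
fourth = cache.get("churn-gauge")   # invalidated, fetched again
```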
With this widget-component based pattern, we make full use of React’s mapping of its virtual DOM to the browser DOM and ensure that only the minimum necessary DOM updates are made on screen.
All this results in a highly responsive user interface.
The Redux state container is the first thing that comes to mind when architects think of a state management solution for React. Though cumbersome to use, Redux solves the state management riddle well.
Another very attractive option is React’s own Context API. The current form of the API shipped with React 16.3 in March 2018. It provides a shared context to manage shared data and actions, and it is part of the React library itself.
The Context API has a problem though. It is not up to snuff with high-frequency updates. We plan to switch to the Context API when we are sure that this is not an issue anymore. In the meantime, we have rolled up a custom state and action management library of our own.
AJAX Networking and API Access
The SmartKarrot Customer Success SaaS platform exposes its services through a RESTful API. Some good choices for pre-built client solutions are Axios, the browser’s Fetch API, and jQuery AJAX.
The Fetch API is built into modern browsers, needs no extra dependency, and is often the natural choice in a React app. We use Axios on our platform. When comparing Axios and the Fetch API, we liked two aspects of Axios:
- The code is more concise and clean. We don’t need an intermediate function call to convert the data returned from the server into a JSON object. Axios automates the JSON transformation.
- The Axios promise handles errors in a more intuitive way. For example, on a 400 error response from the server, the Axios promise runs the “catch” block rather than the “then” block.
Single Page Applications (SPAs) – like those built using React – do not load new content each time a user navigates to a new page.
We use the React Router library to manage links and route the user to the new page. As a fresh page is not loaded from the server, the link loads very quickly.
The SmartKarrot Customer Success SaaS platform uses graphs and charts extensively to concisely display information to senior managers. The platform has pie charts, bar graphs, heat maps and gauges. This calls for an effective and flexible chart library.
A year back we started building the SmartKarrot Customer Success Platform using AngularJS as the front-end technology.
The platform has multiple dashboards, each with carefully thought through widgets. We offer a choice to our customers to integrate their finance, HR, and operations enterprise systems and external systems like Salesforce, Asana, and Freshdesk. Widgets switch on and off based on what systems are integrated.
Each component in the micro front-end connects with its own micro-service on the backend. This deeply specialized ecosystem lets us build large and complex dashboards without the inefficiencies from a monolithic single-page app architecture on the front-end with a similar monolith on the back-end.
Mobile SDK Architecture for SaaS Platform
The mobile SDK exposes two layers of services:
- A native iOS and Android SDK wrapper over the functional REST API.
- A UI view that lets developers quickly build functionality.
We think of widget-based micro front-ends providing similar benefits on the front-end as microservices do on the backend.
Mobile Hierarchy of Layers
Much of the UI layer on the mobile is structured using the MVVM design pattern.
UI Theme Management and Customization
This module implements the default themes and exposes a view customization interface to developers.
MVVM (Model-View-Viewmodel) Pattern
Our system screens have to display and accept input for a large number of fields (for example, surveys), perform validations, access business services on the backend, and marshal data for storage. Using the traditional MVC pattern would result in bloated and unmanageable controller classes. We will structure the mobile-side code using MVVM.
API Event Dispatch
App usage events are generated at a high frequency – sometimes tens of them in a minute. Such API calls need to be buffered and dispatched in a separate thread. This API Event Dispatch module will buffer and dispatch all high-frequency APIs.
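The buffering idea can be sketched as follows. The real SDK is native iOS and Android code, so this Python version only illustrates the pattern: callers enqueue events and return immediately, while a background thread groups them into batches and hands each batch to a sender. The class name, the `send_batch` callback and the batch size are assumptions. A production version would likely also flush on a timer and retry failed sends, which is left out here.

```python
import queue
import threading
from typing import Callable


class EventDispatcher:
    """Buffers events and sends them in batches from a background thread."""

    def __init__(self, send_batch: Callable[[list[dict]], None],
                 batch_size: int = 20):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def track(self, event: dict) -> None:
        """Called from the UI thread; only enqueues, so it returns at once."""
        self._queue.put(event)

    def close(self) -> None:
        """Flush whatever is still buffered and stop the worker."""
        self._queue.put(None)  # sentinel that tells the worker to finish
        self._worker.join()

    def _run(self) -> None:
        batch: list[dict] = []
        while True:
            item = self._queue.get()
            if item is None:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._send_batch(batch)
                batch = []
        if batch:
            self._send_batch(batch)


# Demo: the "sender" just records batches instead of calling a real API.
sent: list[list[dict]] = []
dispatcher = EventDispatcher(send_batch=sent.append, batch_size=3)
for i in range(7):
    dispatcher.track({"event": "screen_view", "seq": i})
dispatcher.close()
print([len(batch) for batch in sent])
```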
Persistent Storage, Cache
The single-version-of-truth data will reside on the server. The mobile will cache a relevant part of it for quicker user responses and offline use. An SQLite database will be used for persistent storage and cache.
Object Relational Mapping (ORM)
An ORM utility will be used to convert the table-style relational storage in SQLite into an object-oriented structure. WaveORM could be used for this on Android systems.
Cached data and offline facilities will be critical to good response times on the mobile. A GraphQL utility will be used for automatic sync and conflict resolution of the cache with the server storage.
Identity, Access, Security
The security layer will come from AWS Cognito and IAM. TLS will be used to enable secure communication between the mobile and the server. Cached data on the mobile need not be encrypted and will rely on the mobile OS providing a secure sandbox.
Multi-tenanted Customer Success Platform Architecture
Our platform follows a multi-tenant architecture. Multi-tenant architecture simply means that the same app, running on the same OS, with the same hardware and the same data storage mechanism, serves multiple tenants (customers). This architecture is very cost-effective (fewer licenses mean lower cost), keeps the data aggregation and data mining effort minimal, and simplifies release management for the tenants. However, it is more complex, and security testing has to be more stringent because multiple customers’ data is commingled.
DynamoDB gives us three approaches to partitioning our tenants’ data:
Linked Account Partitioning (Separate Database)
This is the most extreme option available. It provides a separate database namespace and footprint to every tenant. This is achieved by introducing a separate linked AWS account for each tenant (enabling the AWS Consolidated Billing feature) and one common payer account. Once the mechanism is established, we can provide a separate linked account for each new tenant. These tenants would then have distinct AWS account IDs and, in turn, have a scoped view of the DynamoDB tables that are owned by that account.
Advantages:
- A bit simpler to manage the scope and schema of each tenant’s data
- Provides a natural model for evaluating and metering a tenant’s usage of AWS resources
Disadvantages:
- Cumbersome to manage
- Impractical if there are a large number of tenants
Tenant Name Table Partitioning (Same Database, Separate Schema)
This model embraces all the freedoms that come with an isolated tenant scheme, allowing each tenant to have its own unique data representation. We may use a distinct naming schema that prepends a table name with some tenant id, helping us to identify ownership of the table.
Advantages:
- We can apply AWS IAM roles at the table level to constrain access based on tenant role
- AWS CloudWatch metrics can be captured at the table level
- Throughput (IOPS) can be set per table, allowing us to create distinct scaling policies for each tenant
Disadvantages:
- The downside is mostly on the operational and management side. For example, the operations team will need some awareness of the tenant table naming scheme in order to filter and present information in a tenant-centric context.
- It adds a layer of indirection to any code that meters tenant consumption of DynamoDB resources.
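A naming scheme of this kind might look like the sketch below. The separator, the restriction of tenant ids to lowercase letters and digits, and both function names are assumptions made for the example. The restriction is what lets the owner be recovered unambiguously from a table name, which is the indirection that metering and operations code has to deal with.

```python
import re

# Assumed convention: "<tenant_id>_<base_table>", tenant ids are [a-z0-9]+.
_TENANT_ID = re.compile(r"[a-z0-9]+")


def tenant_table(tenant_id: str, base_table: str) -> str:
    """Return the physical table name for a tenant's copy of base_table."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"{tenant_id}_{base_table}"


def owner_of(table_name: str) -> str:
    """Recover the tenant id from a physical table name."""
    return table_name.split("_", 1)[0]


# Hypothetical tenant and table.
name = tenant_table("acme", "survey_responses")
print(name, owner_of(name))
```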
Tenant Index Partitioning (Shared Everything)
This approach places all the tenant data in the same table(s) and partitions it with a DynamoDB index. This is achieved by populating the hash key of an index with a tenant’s unique ID. This means that the keys that would typically be your hash key (Customer ID, Account ID, etc.) are now represented as range keys.
Advantages:
- It promotes a unified approach to managing and migrating the data for all tenants without requiring table-by-table processing of the information.
- It enables a simpler model for performing tenant-wide analytics of the data, which helps in profiling trends.
Disadvantages:
- Inability to have more granular, tenant-centric control over access, performance, and scaling.
- Data has to be isolated very carefully, as queries can, in error, access another customer’s data.
- This approach could be viewed as creating a single point of failure. Any problem with the shared table could affect the entire population of tenants.
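The key layout described above can be sketched as request builders for DynamoDB. `KeyConditionExpression` and `ExpressionAttributeValues` are real Query parameters. The attribute names `tenant_id` and `customer_id`, the `customers` table and the helper names are hypothetical. Routing every read through a helper that always pins the hash key to one tenant is one way to reduce the risk of accidentally reading another customer’s data.

```python
def item_key(tenant_id: str, customer_id: str) -> dict:
    """Composite key: tenant id as hash key, the former hash key as range key."""
    return {
        "tenant_id": {"S": tenant_id},      # hash (partition) key
        "customer_id": {"S": customer_id},  # range (sort) key
    }


def tenant_query(table: str, tenant_id: str) -> dict:
    """Build Query arguments that can only return one tenant's items."""
    return {
        "TableName": table,
        "KeyConditionExpression": "tenant_id = :tid",
        "ExpressionAttributeValues": {":tid": {"S": tenant_id}},
    }


key = item_key("acme", "cust-001")
request = tenant_query("customers", "acme")

# With boto3 installed and credentials configured, the request could be
# passed on as: boto3.client("dynamodb").query(**request)
print(key)
print(request)
```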
Preferred Approach for Customer Success Platform Architecture
Multi-tenancy can be present at any layer or at all layers. As described above, there are various approaches to achieving it. We are going ahead with a combination of Tenant Name Table Partitioning and Tenant Index Partitioning. This approach is known as a multi-tenant app with a sharded multi-tenant database model.
Most SaaS platforms access the data of only one tenant at a time, which allows tenant data to be distributed across multiple databases or shards, where all the data for any one tenant is contained in one shard. Combined with a multi-tenant database pattern, a sharded model allows an almost limitless scale.
Sharding adds complexity. A catalog is required to maintain the mapping between tenants and databases. In addition, management procedures are required to manage the shards and the tenant population. For example, procedures must be designed to add and remove shards, and to move tenant data between shards.
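A catalog of this kind can be very small at its core. The sketch below keeps the tenant-to-shard mapping in memory and supports the management procedures mentioned: adding a shard, placing a new tenant, and moving a tenant between shards. The class, its method names and the least-loaded placement rule are assumptions for illustration. A real catalog would be persisted, and moving a tenant would also involve copying its data before the mapping is switched.

```python
class ShardCatalog:
    """Maps each tenant to the shard that holds all of its data."""

    def __init__(self, shards: list[str]):
        self._shards = list(shards)
        self._tenants: dict[str, str] = {}

    def _load(self, shard: str) -> int:
        return sum(1 for s in self._tenants.values() if s == shard)

    def add_shard(self, shard: str) -> None:
        if shard not in self._shards:
            self._shards.append(shard)

    def assign(self, tenant_id: str) -> str:
        """Place a new tenant on the least-loaded shard (ties: first listed)."""
        if tenant_id in self._tenants:
            return self._tenants[tenant_id]
        if not self._shards:
            raise RuntimeError("no shards available")
        shard = min(self._shards, key=self._load)
        self._tenants[tenant_id] = shard
        return shard

    def shard_for(self, tenant_id: str) -> str:
        return self._tenants[tenant_id]

    def move(self, tenant_id: str, target_shard: str) -> None:
        """Repoint a tenant after its data has been copied to target_shard."""
        if target_shard not in self._shards:
            raise ValueError(f"unknown shard: {target_shard}")
        if tenant_id not in self._tenants:
            raise KeyError(tenant_id)
        self._tenants[tenant_id] = target_shard


# Hypothetical shards and tenants.
catalog = ShardCatalog(["shard-a", "shard-b"])
placements = [catalog.assign(t) for t in ("acme", "globex", "initech")]
catalog.add_shard("shard-c")
catalog.move("acme", "shard-c")
print(placements, catalog.shard_for("acme"))
```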
Smaller database – Easily manageable
By distributing tenants across multiple databases, the sharded multi-tenant solution results in smaller databases that are more easily managed. For example, restoring a specific tenant to a prior point in time now involves restoring a single smaller database from a backup, rather than a larger database that contains all tenants.
Tenant identifier in the schema
Depending on the sharding approach used, additional constraints may be imposed on the database schema. With this model, every table needs a tenant identifier, which serves as the primary key for any user or tenant.
Elastic pool of shards
Sharded multi-tenant databases can be placed in elastic pools. In general, having many single-tenant databases in a pool is as cost-efficient as having many tenants in a few multi-tenant databases. Multi-tenant databases are advantageous when there are a large number of relatively inactive tenants.
Metering means accurately tracking client usage and also providing the capacity to analyze client usage patterns. The main thing to meter here is the API usage of every user. API usage comprises both API Gateway and Lambda calls. All of this will be metered in a centralized manner. There are different approaches we can take to log every user’s API usage. The two main ways are:
- Store logs in DynamoDB
- Store logs as text files
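Whichever store is chosen, the metering step comes down to aggregating log records per tenant. The sketch below assumes the text-file option with one JSON object per line and hypothetical `tenant_id` and `source` fields. It counts calls per tenant and per source (API Gateway or Lambda). The record format is invented for this example and is not the platform’s actual log schema.

```python
import json
from collections import Counter
from typing import Iterable


def count_api_calls(log_lines: Iterable[str]) -> Counter:
    """Count calls per (tenant_id, source) from JSON-lines log records."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log file
        record = json.loads(line)
        counts[(record["tenant_id"], record["source"])] += 1
    return counts


# Hypothetical log records.
logs = [
    '{"tenant_id": "acme", "source": "api_gateway", "path": "/surveys"}',
    '{"tenant_id": "acme", "source": "lambda", "function": "saveSurvey"}',
    '{"tenant_id": "globex", "source": "api_gateway", "path": "/scores"}',
    '{"tenant_id": "acme", "source": "api_gateway", "path": "/scores"}',
]
calls = count_api_calls(logs)
print(dict(calls))
```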
Published March 22, 2020, Updated August 26, 2022