Choosing a Datastore For The Flipkart User Engagement Platform

A User Engagement Platform

Tersely defined, a “User Engagement Platform” is a platform that solicits, accepts, and displays user-generated content and actions. These collections of content and actions are usually associated with both user and non-user entities.

Such a platform enables engagement paradigms like reviews, user lists, ratings, votes, comments, karma, etc. It’s easy to imagine such a system being used by millions of users who are constantly submitting and accessing varied, rich content.
This access pattern implies near-real-time updates of stats, ratings, votes, likes, plays, views, etc. at scale. This usage creates loads of data, much of which is volatile and ephemeral in nature.

In short, imagine a customer engagement platform for Flipkart, with about a million unique visitors per day, most of whom are actively engaging with the site and the community. We envision our user engagement platform handling such a load pattern with ease, ready to scale. One of the most critical operational components in such a system is the data persistence layer. Ideally, this persistence layer should be highly performant, horizontally scalable with minimal overhead, consistent, always available, and partition tolerant. But…

CAP: Or why you can’t have your cake and eat it too

The CAP theorem, or Brewer’s theorem of distributed computing systems, states that it is impossible for a distributed system to simultaneously provide all three of the following guarantees:

  1. Consistency: All nodes see the same data at the same time.
  2. Availability: Every request receives a response indicating whether it succeeded or failed.
  3. Partition tolerance: The system continues to operate despite arbitrary message loss or failure of part of the system.

The implications of the CAP theorem are crucial for selecting the persistence layer of an application or service. One has to carefully analyze the app/service requirements and pick the appropriate guarantees from CAP that the prospective persistence layer must support.

Data access patterns for user engagement services

Historically, the data access pattern for user-generated content was low on writes and heavy on reads. However, modern social engagement around user content has made this pattern write heavy as well, to accommodate more likes, more ratings, more upvotes, more comments, more shares. So today’s and tomorrow’s engagement services must accommodate heavy write loads, heavy read loads, and heavy aggregate (counter) modify-and-read loads. What becomes apparent when we look at user engagement services this way is that aggregation needs to be a first-class function of engagement services: near real time, scalable, and highly available.

Eventual consistency and user experience

At the same time, we can note that most of today’s engagement-heavy applications trade strict consistency for eventual consistency, to achieve better scalability (through horizontal partitioning) and availability. The degree to which a coherent user experience can be maintained under eventual consistency varies greatly. Reddit’s upvote experience is a good example of using eventual consistency without it adversely affecting how the user perceives consistency on the platform.

YouTube’s “301 Views” for a video with “20,000 likes” falls at the other end of the spectrum of good user experience under eventual consistency. So, with careful application and service design, effective tradeoffs on data consistency can be made without hurting the user experience. Doing this immediately frees us from the “C” constraint of CAP, leaving us free to pursue the availability and partition-tolerance guarantees that are very much desired in this context. The following section gives a brief example of the kind of use case our engagement platform should support and what it implies for the persistence layer.

A playlist of the people

Imagine a community-created playlist on Flipkart’s Flyte: people add songs to a global playlist and then upvote or downvote songs added by other users. Users should have an always-on experience, and neither their submissions nor their votes should be missed; the implication is that there must be no single point of failure in the write path. Hundreds or thousands of users could be simultaneously upvoting or downvoting the same song, so locking of counters should be avoided and distributed counters preferred. Not every user needs to see the same view of the data, since the data is transient to begin with, so eventual consistency should do fine. Given the massive amount of user engagement, the sort order of the playlist will change very often, so one should avoid query-time sorting and prefer data that is natively sorted. Adequate clarity around such usage scenarios enabled us to confidently transition from requirements assessment to technology selection.
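To make this concrete, here is a rough sketch of what such a vote path might look like. It is only illustrative: it assumes a store with distributed counters accessed via the DataStax Python driver (we use Cassandra/CQL syntax since that is where this evaluation eventually lands), and the keyspace, table, and column names are hypothetical.

```python
# Minimal sketch of the playlist vote path, assuming a hypothetical
# 'engagement' keyspace containing a counter table 'song_votes'.
from cassandra.cluster import Cluster

# Connect through several contact points: no single point of failure
# in the write path.
cluster = Cluster(['10.24.1.1', '10.24.1.2', '10.24.1.3'])
session = cluster.connect('engagement')

def upvote(playlist_id, song_id):
    # Counter increments are commutative, so thousands of concurrent
    # voters never contend on a lock; replicas converge eventually.
    session.execute(
        "UPDATE song_votes SET upvotes = upvotes + 1 "
        "WHERE playlist_id = %s AND song_id = %s",
        [playlist_id, song_id])

upvote('community-top-40', 'song:42')
```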

Technology Selection: What we want from a persistence layer

Let us consider the attributes of an appropriate persistence layer for engagement services.

  • Availability and Partition tolerance, with tunable consistency: As discussed above, we should be able to trade off consistency to build highly available, partition-tolerant engagement services.
  • Linear Scalability: In the face of massive amounts of content being created by users, the system should be able to scale out without degrading performance.
  • Operational Efficiency: The operational requirements of the highly available, distributed, partition tolerant persistence layer should be minimal.
  • Community Support: There should be a thriving community of users and developers that is both effective and helpful.
  • Grokkable Code Base: The code base should be grokkable in both size and complexity, with either good documentation or help from the community.
  • Professional Support: It is preferable to have companies providing professional support for the platform.

Though this isn’t an exhaustive list, it’s a good starting point to explore different alternatives. However, there are also functional requirements of the persistence layer that must be considered.

  • Aggregators are a first-class concern: Aggregate (counter) modification is potentially the heaviest write load in an engagement service, so the persistence layer should support highly performant counter updates over a distributed infrastructure.
  • Sorting is a first-class concern: Most user-generated content is displayed in sorted form, from most-recent-first comments (as in a news feed) to sort-by-helpfulness. Sorting large amounts of data should be handled efficiently (see the sketch after this list).
  • Multiple pivot points for data elements: Each complex data entity should be accessible by its attributes, via a reverse index or filtering.
  • Offline stats with map-reduce: The persistence layer should support map-reduce on data natively, or be easily pluggable into a map-reduce framework like Hadoop.
  • Search integration: Text search should either be native or easily pluggable into the persistence layer.
  • Selectable/updatable individual attributes: Attributes of a data entity should be individually selectable and updatable.
  • Schemaless: The data model should be flexible and should not impose constraints on what and how much data is stored. Schemaless data stores, such as columnar or key-value stores, provide great data-model flexibility.
  • Native support for ephemeral data: Engagement services generate lots of data of an ephemeral nature, i.e. data that is important or valid only for a short period. Ephemeral data should not clog up the system.
  • Replication should be a first-class concern: Replication and replication awareness should be deeply integrated into the design and implementation of the database.
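To ground a few of these requirements, here is a minimal sketch of natively sorted, ephemeral, attribute-indexed data. Again, CQL via the DataStax Python driver is used purely for illustration, and every keyspace, table, and column name is hypothetical.

```python
# Sketch: native sorting, ephemeral data, and a second pivot point,
# against a hypothetical 'engagement' keyspace.
from cassandra.cluster import Cluster

session = Cluster(['10.24.1.1']).connect('engagement')

# Rows are stored pre-sorted by the clustering column, so a
# "most recent first" read needs no query-time sorting.
session.execute("""
    CREATE TABLE IF NOT EXISTS comments (
        entity_id  text,
        created_at timeuuid,
        author     text,
        body       text,
        PRIMARY KEY (entity_id, created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC)
""")

# A secondary index gives a second pivot point on the same entity.
session.execute(
    "CREATE INDEX IF NOT EXISTS comments_by_author ON comments (author)")

# Ephemeral data expires by itself: with a TTL of one day, the row is
# discarded by the store and never clogs up the system.
session.execute(
    "INSERT INTO comments (entity_id, created_at, author, body) "
    "VALUES (%s, now(), %s, %s) USING TTL 86400",
    ['song:42', 'some_user', 'great track!'])
```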

Cassandra

Considering all the above factors, we ruled out traditional RDBMSs and ventured into the wild west of databases, aka “NoSQL”. After evaluating and eliminating a bunch of databases (document stores, key-value stores, Riak – attributes not individually selectable/updatable, HBase – caters to consistency and partition tolerance…), we arrived at Cassandra. Cassandra purportedly supports many of the above-mentioned functional and non-functional requirements.

The Good

  • Online load balancing and cluster growth: Cassandra is designed from the ground up to be deployed on a cluster. Clusters can be grown and shrunk transparently, and node loss is handled gracefully.
  • Flexible Schema: Cassandra is a column-oriented database, and there is inherent flexibility in the schema for the data.
  • Key-Oriented Queries: All queries are key oriented, making the database highly performant when appropriately configured.
  • CAP aware: Cassandra is CAP aware and allows one to make tradeoffs between consistency and latency while retaining availability and partition tolerance. Consistency can be configured at different levels, and can even be tuned per query (see the sketch after this list).
  • No SPOF: No single point of failure. Cassandra can be configured to be incredibly resilient in the face of massive node loss in the cluster, because replication is a first-class function of Cassandra.
  • DC and rack aware: Cassandra was built to be datacenter and rack aware, and can be configured with different replica placement strategies to achieve either better DR (disaster recovery) resiliency or minimal latency.
  • Row and Key Caching: Row-level and key-level caching are baked into Cassandra, which makes it viable as a persistent cache.
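As a minimal sketch of that per-query consistency tuning (using the DataStax Python driver; the keyspace and table names are hypothetical):

```python
# Sketch: tuning consistency per query rather than per cluster.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['10.24.1.1']).connect('engagement')

# Vote tallies tolerate staleness: read at ONE for the lowest latency.
fast_read = SimpleStatement(
    "SELECT upvotes FROM song_votes "
    "WHERE playlist_id = %s AND song_id = %s",
    consistency_level=ConsistencyLevel.ONE)

# A user viewing their own submission should see it immediately:
# QUORUM reads paired with QUORUM writes give read-your-writes.
safe_read = SimpleStatement(
    "SELECT * FROM submissions WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM)

print(session.execute(fast_read, ['community-top-40', 'song:42']).one())
print(session.execute(safe_read, ['user:7']).one())
```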

The Bad

  • Limited Ad Hoc Querying Capability: Due to the absence of a query language as comprehensive as SQL, ad hoc querying is severely limited on a Cassandra cluster.
  • Tight coupling of the data model: Cassandra data models are designed around their access patterns from the application layer, whereas RDBMSs model data based on entities and their relationships. The Cassandra data model is therefore tightly coupled with application access patterns, so application-level feature changes can impact the data model to a large degree.
  • No codebase stability: Cassandra is still rapidly evolving, and the codebase and feature set change constantly.
  • Bad documentation: The documentation is sparse, and quickly becomes outdated or defunct due to the rapid pace of change.
  • Lack of transactions: Cassandra has no mechanism for rolling back writes. Since it chooses AP out of CAP, there is no built-in transaction support.

The Ugly

  • Steep learning curve: The steep learning curve, combined with having to sift through the internet for appropriate documentation, makes Cassandra hard to get a handle on.
  • No operational experience: The organization has little or no operational experience with Cassandra, compared with RDBMSs.
  • Here be dragons: One of the scary prospects of going the Cassandra way is fear of the unknown. If something goes wrong, how will we debug it, do an RCA in a timely manner, and fix the issue without any tooling or experience?

Appendix I

                                 Cassandra          Riak                     HBase
CAP                              AP                 AP                       CP
Language                         Java               Erlang, C, JavaScript    Java
Type                             Column oriented    Key-value                Column oriented
Protocol                         Thrift / custom    REST / custom            REST / Thrift
Tunable tradeoffs for            Yes (N,R,W)        Yes (N,R,W)              No
distribution and replication
Map/Reduce                       Through Hadoop     Built in                 Through Hadoop
Distributed Counters             Yes                No                       Yes
Built-in Sorting                 Yes                No (map-reduce)          Yes