Progressive Web App: A New Way to Experience Mobile

There have been a few turning points in the history of the web platform that radically changed how web apps were built, deployed and experienced. Ajax was one such pivot, and it led to a profound shift in web engineering. It allowed web applications to be responsive enough to challenge conventional desktop apps. On mobile, however, the experience was defined by native apps, and web apps hardly came close to them, at least until now. The Mobile Engineering team at Flipkart discovered that, with the right set of capabilities in a browser, a mobile web app can be as performant as a native app.

Thanks to the Extensible Web Manifesto’s efforts to tighten the feedback loop between the editors of web standards and web developers, browser vendors started introducing new low-level APIs based on feedback from developers. The advent of these APIs brings unprecedented capabilities to the web. We, at Flipkart, decided to live on this bleeding edge and build a truly powerful and technically advanced web app while working to further evolve these APIs.

Here’s a sneak peek into how we’ve created an extremely immersive, engaging and high-performance app.

Immersive: While native apps are rich in experience, they come at the price of an install. Web apps solved the instant access problem, but network connectivity still plays a significant role in defining the web experience. There have been multiple attempts at enabling offline web apps in the past, such as AppCache and using LocalStorage/IndexedDB. However, these solutions failed to model the complex offline use cases described below, which made it painful to develop and debug offline behaviour. Service Workers replace these approaches by providing a scriptable network proxy in the browser that allows you to handle requests programmatically. With Service Workers, we can intercept every network request and serve a response from cache even when the user is offline.


We chose to use SW-Toolbox, a Service Worker wrapper library that enables simple patterns such as NetworkFirst, CacheFirst or NetworkOnly. SW-Toolbox provides an LRU cache, which we use in our app to store previous search results on the browse page and the last few visited product pages. The toolbox also has a TTL-based cache invalidation mechanism that we use to purge out-of-date content. Service Workers provide the low-level scriptable primitives that make this possible.
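To make the combination of LRU eviction and TTL-based invalidation concrete, here is a minimal sketch in Python. It is not SW-Toolbox code and does not use the Service Worker cache API; the class name, parameters and injectable clock are illustrative assumptions.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Illustrative LRU cache whose entries also expire after max_age_seconds."""

    def __init__(self, max_entries, max_age_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = OrderedDict()  # key -> (stored_at, value), least recent first

    def put(self, key, value):
        self._entries.pop(key, None)
        self._entries[key] = (self.clock(), value)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.max_age_seconds:
            del self._entries[key]  # TTL expired: purge the out-of-date entry
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value
```

With a small max_entries, only the most recently viewed pages stay cached, and anything older than max_age_seconds is dropped on the next read.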


Making the right solution work was as hard as devising it. We faced a wide spectrum of challenges from implementation issues to dev tooling bugs. We are actively collaborating with browser vendors to resolve these challenges.

One such significant challenge that emerged from our use of Service Workers was to build a “kill switch”. It is easy to end up with bugs in Service Workers and stale responses. Having a reliable mechanism to purge all caches has helped us to be proactively ready for any contingencies or surprises.

One more cornerstone of a truly immersive experience is a fullscreen, standalone experience launched right from the home screen. This is what the Add to Home Screen (ATHS) prompt allows us to do. When the user chooses to add to home screen, the browser creates a high-quality icon on the home screen based on the metadata in the Web Manifest. The ATHS prompt is shown automatically to the user based on a heuristic that is specific to each browser. On Chrome, if the user has visited the site twice within a defined period, the prompt will trigger. In the newer Chrome versions, we receive an event once we have matched the heuristic and can show the prompt at a later point in time.

While the heuristic is indispensable to prevent spam on the web platform, we felt it was too conservative and convinced the Chrome team to tweak the heuristic for more commonly occurring scenarios. Based on our feedback, experiments are underway by the Chrome team to shorten the required delay between interactions.

Native apps use a splash screen to hide the slow loading of the home screen. The web never had this luxury, and a blank page stared at the user until the home screen loaded. The good news is that the latest version of Chrome supports generating a splash screen, which radically improves the launch experience and perceived performance of the web app.


Another capability we’re championing is opening external links in the standalone app version rather than in a browser tab. Currently, there is a limitation with Android, but we are working with the Chrome team to enable this use case as soon as possible.  

Engaging: Being able to re-engage with our users on the web has always been a challenge. With the introduction of the Web Push API, we now have the capability to send push notifications to our users even when the browser is closed. This is possible because Service Workers live beyond the lifetime of the browser.


High Performance: A highly performant mobile app is one that requests less data over the network and takes less time to render. With a powerful scriptable proxy and a persistent cache living in the browser, data consumption from the network can be reduced significantly. This also reduces the dependency on network strength and removes network latency for cached content on a repeat visit.

Rendering performance has always been a challenge for the web. We identified significant improvements in performance when the GPU handled rasterization compared to the CPU doing it. Hence we decided to leverage GPU rasterization on Chrome (Project Ganesh) by including the required meta tag in our HTML. We have also carefully balanced the number of GPU-accelerated composited layers by measuring composition vs. paint costs. Finally, we’re using GPU-friendly animations, namely opacity and transform transitions.

Profiling on various mobile devices using the Chrome DevTools Timeline panel and Chrome Tracing helped us identify multiple bottlenecks and optimization paths. This helped us make the best of each frame during an animation. We are continuously striving to achieve 60fps animations and interactions. We use the RAIL model for our performance budgets and strive to match and exceed expectations on each of the metrics.

All of this put together manifested into a stellar experience for our users. It’s been a remarkable journey building this web app, working with browser vendors and pushing the limits of the web platform on mobile. Over the coming weeks, we plan to roll out more detailed posts that deep-dive into the technical architectures, patterns and, most importantly, the lessons learned.

We believe more browser companies and developers will start thinking along these lines and make web apps even better. The web is truly what you make of it, and we have only just begun.



Last but not least, meet the Flipkart Lite team that did the magic: Abhinav Rastogi, Aditya Punjani, Boopathi Raja, Jai Santhosh, Abinash Mohapatra, Nagaraju Epuri, Karan Peri, Bharat KS, Baisampayan Saha, Mohammed Bilal, Ayesha Rana, Suvonil Chatterjee, Akshay Rajwade.


(wish everyone was in the pic)

Gearing up for #TheBigBillionDays

The Big Billion Days are back and we, at Flipkart, are super excited to bring you a bigger, better shopping experience on the App.
We’ve got a host of cool new features that are sure to make retail therapy even more fun this time around. For starters, we wanted to make shopping a collaborative experience and bring some of the real-world feel back into the App. Now, just like a day out at the mall, you can use Ping to shop with your friends and family, no matter where they are.
Last year, we noticed shoppers pre-build carts ahead of the Big Billion Day and then go so far as to share login details and passwords with their friends and family. Now with Ping, you can share your wishlist or cart directly and get opinions from the people who really matter to you.

Ping’s easy drag and drop feature lets you share products and offers with the people who really matter to you. We’ve gone one step ahead and also let you take and share photos within Ping.

For example: Share a picture of a shirt you’re wearing and ask for help in finding a pair of shoes that match. You can even share your screen using Ping, and show your friends all the products you are considering buying.

In sync with our promise of a great shopping experience on the App, we’re thrilled to have on board the “Image Search” feature. Primarily designed for the fashion and furnishings category, this feature replicates the offline shopping experience of “show me more like this” as it helps shoppers find visually similar products to things they like.

For big sale days like the BBD, the feature helps you save time browsing for products if you already have something specific in mind. All you have to do is click a picture or use a photo on your phone to find fashion products of the same colour, pattern or style on Flipkart.

For example: Describing a shoe or dress is often fuzzy and not easy when shopping online. Image search eliminates the need to describe a product, as it shows products that are visual matches to what you have in mind. We tested it on a bunch of tech bloggers earlier this year and found some very happy shoppers!

But that’s not all

The 2015 edition of #TheBigBillionDays will also see revamped cart and checkout features to ensure a smoother, faster checkout experience at the time of purchase.

We’ve recently migrated to a new data centre in a bid to enhance the Flipkart shopping experience. Last year saw reports of huge fluctuations in orders and an inability to deal with traffic. With the biggest server farm in the country, built in-house in just a few months, the Flipkart App now promises a more stable and reliable shopping experience.

We’re thrilled to bring you all these awesome features in time for #TheBigBillionDays. Download the Flipkart App (if you don’t have it already!) and get set to shop till you drop from the 13th to the 17th of October.

Mobile Number Login


At Flipkart, we always endeavor to create a personalised shopping experience for our customers and are constantly looking at ways to make all your interactions with us easy, safe and secure. We understand the importance of creating a verified communication preference in order to keep your shopping history, orders and other account-related information accessible only by you. Our research indicates a paradigm shift to m-commerce, in which a customer’s identity is strongly tied to their mobile number.

With the above tenets in mind, we redesigned the signup and login experience to be simpler and safer so that customers can do their shopping as quickly as possible. You may have noticed this recent change on the Flipkart mobile application, which now allows you to log in to Flipkart with your mobile number. We hope you love the change.


It’s easier

Now you just need your 10-digit mobile number to sign up. We hope you remember your phone number though :) Even if you don’t, our Android app might detect it automatically. It can also auto-fill the One Time Password (OTP) to verify your number. Based on the detected identifiers, we navigate our users to the signup or login screen with a pre-filled identifier to ease the login.

It is easier not only to sign up but also to recover or reset your password with your mobile number. You no longer need to open your email and click on a reset link. Just enter your number and tap on forgot password. Our Android app will auto-detect and verify the OTP sent to your phone number, and you are ready to reset your password. To enable our existing customers to use a mobile number, we solicit their mobile numbers through prompts. Once you verify the number, you can use it in subsequent logins across platforms.

It’s secure

When using your personal mobile number to sign up, you will notice the 2-step verification, which is an extra layer of security. We automatically verify your phone number and password to secure your account.

It’s for everyone

Millions of Indians are joining the internet revolution. A lot of them don’t use email. Now, all of them can savour the Flipkart experience.

We are very happy to see that so many consumers have already used this new signup and login experience and have provided positive feedback. Please continue to provide your feedback so that we can keep working hard to make your experience even better.



Rahul Koul and Raman Arora



Is Data Locality Always Out Of The Box in Hadoop? Not Really!


Hadoop optimizes for data locality: moving compute to data is cheaper than moving data to compute. It can schedule jobs on nodes that are local to the input stream, which results in high performance. But is this really out of the box? Not always! This blog explains a couple of data locality issues that we identified and fixed.

User Insights Team

Flipkart collects terabytes of user data such as browse, search and order events. The User Insights Team uses these signals to derive useful insights about the user. These can be broadly classified as one of the following:

  • Direct Insights: Directly derived from user data, e.g. Location.
  • Descriptive Insights: Describe the user and their preferences, likes/dislikes, habits, etc., e.g. Brand Affinities, Product Category Affinities, Life-stage.
  • Predictive Insights: Predict the next behavior or event of a user, e.g. probability of purchase in a session, cancellation/return.

These insights are used to personalize the user experience, give better recommendations and target relevant advertisements.

User Insights

HBase as the data store for User Insights

The data collected is stored in a distributed fashion and processed on a Hadoop cluster. HBase is used for fast lookup of user signals during the generation of insights.

Multiple mappers hitting the same HBase region server

At times, full HBase table scans are required. During these scans, some mappers finish in less than 10 minutes whereas many take hours to finish for HBase regions of approximately the same size. We picked the regions that were taking hours to finish and started running them one by one; to our surprise, they also finished in under 10 minutes. But when run together with other regions on the same region server, they slowed down. We started profiling the nodes where the jobs were running very slowly and saw that CPU and memory were fine, but the network was choking. We used network profiling tools like nload and iftop to look at the network usage. This showed us that the outbound network traffic was maxing out the bandwidth of the network card and slowing down outbound network transfers. iftop showed us that most of the outbound traffic was from the HBase region server process.

We suspected that the way the input data was split and the maps were launched caused too many mappers to hit the same region server. Digging into the HBase code that creates the TableInputFormat with TableSplits for HBase scan jobs, we found that if the scan range has consecutive regions belonging to the same region server, it returns them in sequential order, and mappers are launched and allocated in the same order. This causes all the mappers to hit the same region server and choke the network bandwidth. We wrote the RoundRobinTableInputFormat below, which tries to order the splits in a round-robin way across the region servers in use, thereby reducing the probability of a lot of mappers hitting the same region server.

Gist Link:
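The gist itself is not reproduced here. As an independent illustration of the round-robin idea only, the Python sketch below reorders a list of (region server, region) pairs so that consecutive splits point at different servers. The function name and the tuple representation are assumptions for this example and do not mirror the HBase TableSplit API.

```python
from collections import OrderedDict, deque


def round_robin_by_server(splits):
    """Reorder (region_server, region) pairs so consecutive splits hit different servers."""
    # Group splits per region server, preserving the original order within each server.
    queues = OrderedDict()
    for server, region in splits:
        queues.setdefault(server, deque()).append((server, region))

    ordered = []
    while queues:
        # Take one split from every server that still has splits left.
        for server in list(queues):
            ordered.append(queues[server].popleft())
            if not queues[server]:
                del queues[server]
    return ordered


splits = [
    ("rs1", "r1"), ("rs1", "r2"), ("rs1", "r3"),
    ("rs2", "r4"), ("rs2", "r5"),
    ("rs3", "r6"),
]
print(round_robin_by_server(splits))
```

When mappers are launched in this order, the first few land on different region servers instead of all reading from rs1 at once.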

YARN Fair Scheduler and Data Locality

Even when the cluster was free, we observed only 5-10% data locality for a job. Something seemed suspicious, so we started looking at how data locality happens in YARN. The client submits a job to the YARN Resource Manager (RM), and the RM allocates a container for running the Application Master (AM) on one of the cluster nodes. The data to be processed is split into an InputSplit for each mapper task. Each InputSplit carries information on where the data for that split is located. The AM asks the RM to provide containers for running the mapper tasks, preferably on the nodes where the data is located. The RM uses the scheduler config to figure out which level of data locality has to be offered to the container request. The RM allocates containers to run on a Node Manager (NM), and the AM executes the mapper tasks in the allocated containers.

YARN Resource Manager

We are using FairScheduler in our YARN setup. FairScheduler has the configs yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack to tweak data locality. On setting these values to maximum data locality (=1), we did not observe any change in the number of data-local tasks. Digging deeper into the scheduler code, we found that these values have no effect without setting the undocumented configs yarn.scheduler.fair.locality-delay-node-ms and yarn.scheduler.fair.locality-delay-rack-ms. Even after setting these values, not a single mapper went to the data-local node. Looking further into the scheduler and container assignment code, we figured out that a data-local request is allocated only if the node name of the available YARN Node Manager matches the node name on which the Application Master requests a container. Looking at the YARN Node Manager console, we saw that hosts were added with non-FQDN names (FQDN: Fully Qualified Domain Name; a non-FQDN example is hadoop01), while HBase region servers were returning region locations with FQDN names. These hosts had non-FQDN names in /etc/hostname and in the kernel (as can be seen with the `hostname` command). After updating the hostnames to FQDNs and running the job again, we got around 97.6% data locality and the job’s runtime was reduced by 37%! When we went to file a bug on YARN, we found that one had already been filed and fixed in YARN version 2.1.0. Anyone using a version of YARN older than 2.1.0 should set the hostname to the FQDN to get better data locality.
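The mismatch is easy to see with a toy model of the name comparison. The sketch below is a deliberate simplification and not YARN scheduler code; the function name and the example.com domain are assumptions made for illustration.

```python
def is_node_local(node_manager_host, preferred_hosts):
    """Exact string comparison, standing in for the node-name matching described above."""
    return node_manager_host in preferred_hosts


# Locations reported for a split use fully qualified names (hypothetical domain).
preferred = {"hadoop01.example.com", "hadoop02.example.com"}

# Node Manager registered with a short hostname: never counted as data-local.
print(is_node_local("hadoop01", preferred))

# Node Manager registered with its FQDN: the names match.
print(is_node_local("hadoop01.example.com", preferred))
```

Because the comparison is on the full string, a node that actually holds the data is treated as remote until both sides agree on the naming scheme.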

The above are two instances that caution us about data locality assumptions in Hadoop and are worth addressing by teams that run large Hadoop-based MR workloads.

Foxtrot – Event Analytics at Scale

Health and monitoring of actors participating in an SOA-based distributed system like the one we have at Flipkart is critical for production stability. When things go south (and they do), we not only need to know about the occurrence instantly, but also need to get down to exactly when and where the problem happened. This points to the need for us to get access to and act on aggregated metrics as well as raw events that can be sliced and diced at will. With server-side apps spanning hundreds of virtual machines and client-side apps running on millions of diverse phones, this need gets amplified. Individual requests need to be tracked across systems, and issues found and reported well before customers get impacted.

Introducing Foxtrot

Foxtrot is a system built to ingest, store and analyze rich events from multiple systems, providing aggregated metrics as well as the ability to slice, dice and drill down into raw events.

Foxtrot in action

Foxtrot provides a configurable console to get a view of systems; enables people to dig into as well as aggregate raw events across hosts and/or client devices; and provides a familiar SQL based language to interact with the system both from the console as well as from the command line.

A case for event analytics at scale

Let’s consider a few real-life situations:
  • Each day, hundreds of thousands of people place orders at Flipkart through the Checkout system. The system interacts with a bunch of downstream services to make this possible. Any of these services failing in such situations degrades the buying experience considerably. We strive to be on top of such situations, proactively monitor service call rates and response times, and quickly stop, debug and re-enable access to the degraded services.
  • Mobile apps need to be fast and efficient in terms of both memory and power. Interestingly, we have very little insight into what is actually happening on the customer’s device as they are interacting with the apps. Getting proper data about how our app affects these devices helps us make our app faster, lighter and more power efficient.
  • Push notifications bring important snippets of information and great offers to millions of users of our apps. It’s of critical importance for us to know what happens to these messages, and control the quality and frequency of these notifications to users.
  • Book publishers push data like book names, authors to us. We process and publish this data to be shown on the apps and the website. We need to monitor these processing systems to ensure things are working properly and quickly catch and fix any problems.

These and many more use cases are currently being satisfied by Foxtrot at Flipkart and Olacabs.

Basic Abstractions

Foxtrot works on the following very simple abstractions:

  • Table – The basic multi-tenancy container used to logically separate data coming in from different systems. So your Foxtrot system might have tables for website, checkout, oms, apps etc. A table in Foxtrot has a TTL, and the data is no longer queryable from FQL, Console and JSON queries (see below) once the TTL expires. Data for the table is saved in HBase and can be used independently by M/R jobs on Hadoop, subject to the expiry time specified for the column family in the table.
  • Document – Most granular data/event representation in Foxtrot. A document is composed of:
    • id – A unique id for the document. This must be provided.
    • timestamp – A timestamp representing the time of generation of this event.
    • data – A json node with fields representing the metadata for the event.
//A sample document
{
    "id": "569b3782-255b-48d7-a53f-7897d605db0b",
    "timestamp": 1401650819000,
    "data": {
        "event": "APP_LOAD",
        "os": "android",
        "device": "XperiaZ",
        "appVersion": {
            "major": "1",
            "minor": "2"
        }
    }
}


Foxtrot has been designed from the ground up for simplicity of use. The following sections detail how events can be pushed to Foxtrot and how they can be summarized, viewed and downloaded.

Pushing data to Foxtrot

Pushing data involves POSTing documents to the single or bulk ingestion API. For more details on ingestion APIs please consult the wiki.
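As a rough illustration of what a client might prepare before POSTing, the Python sketch below wraps event fields in the id / timestamp / data envelope described earlier and serializes a batch. The helper names are hypothetical, and the assumption that the bulk API accepts a JSON array of documents should be checked against the wiki; no endpoint path is assumed and no request is sent.

```python
import json
import time
import uuid


def build_document(data, now_ms=None):
    """Wrap event fields in the id / timestamp / data envelope of a Foxtrot document."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # epoch milliseconds, as in the sample document
    return {"id": str(uuid.uuid4()), "timestamp": now_ms, "data": data}


def to_bulk_payload(documents):
    """Serialize documents as a JSON array (payload shape is an assumption)."""
    return json.dumps(documents)


doc = build_document(
    {"event": "APP_LOAD", "os": "android", "device": "XperiaZ"},
    now_ms=1401650819000,
)
payload = to_bulk_payload([doc])
```

Generating the id on the client keeps each document uniquely identifiable, which the document abstraction requires.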

Accessing data from Foxtrot

Once events are ingested into Foxtrot, they can be accessed through any of the many interfaces available.

Foxtrot Console (Builder)


Foxtrot provides a console system with configurable widgets that can be backed by JSON queries. Once configured and saved, these consoles can be shared by url and accessed from the consoles dropdown.

Currently, the following widgets are supported:

  • Pie charts – Show comparative counts of different types for a given time period
  • Bar charts – Show comparative counts of different types for a given time period
  • Histogram – Show count of events bucketed by mins/hours/days over time period
  • Classified histogram – Show count of multiple type of events bucketed by mins/hours/days over time period
  • Numeric Stats histogram – Show stats like max, min, avg and percentiles over numeric fields  bucketed by mins/hours/days over time period
  • Tables – Auto-refreshing tabular data display based on FQL query (see below)

Each of these widgets comes with customizable parameters like:

  • Time window of query
  • Filters
  • Whether to show a legend or not etc.

Foxtrot Query Language

Analysts are big users of these kinds of systems, and they really love SQL. We therefore support a subset of SQL that we call FQL (obviously!!).

Example: To find the count of all app loads and app crashes in the last one hour grouped by operating system:

select * from test where eventType in ('APP_LOAD', 'APP_CRASH') and last('1h') group by eventType, os

| eventType | os      |   count |
| APP_CRASH | android |   38330 |
| APP_CRASH | ios     |    2888 |
| APP_LOAD  | android | 2749803 |
| APP_LOAD  | ios     |   35380 |

FQL queries can be run from the console as well by accessing the “FQL” tab.

Details about FQL can be found in the wiki.

CSV Downloads

Data can be downloaded from the system by POSTing a FQL query to the /fql/download endpoint.

JSON Analytics Interface

This is the simplest and most fundamental mode of access to data stored in Foxtrot. Foxtrot provides the /analytics endpoint that can be hit with simple JSON queries to access and analyze the data that has been ingested into Foxtrot.

Example: To find the count of all app loads in the last one hour grouped by operating system and version:

{
    "opcode": "group",
    "table": "test",
    "filters": [
        {
            "field": "event",
            "operator": "in",
            "value": "APP_LOAD"
        },
        {
            "operator": "last",
            "duration": "1h"
        }
    ],
    "nesting": [...]
}

{
    "opcode": "group",
    "result": {
        "android": {
            "3.2": 2019,
            "4.3": 299430,
            "4.4": 100000032023
        },
        "ios": {
            "6": 12281,
            "7": 23383773637
        }
    }
}

A list of all analytics and their usage can be found in the wiki.
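A group response nests one level per grouping field, so clients often need to roll the leaf counts back up. The Python sketch below sums the counts under each top-level key, using the figures from the example response; the helper name is an assumption and is not part of Foxtrot.

```python
def totals_by_top_level(result):
    """Sum the leaf counts under each first-level key of a nested group result."""

    def total(node):
        # Inner nodes are dicts keyed by the next grouping field; leaves are counts.
        if isinstance(node, dict):
            return sum(total(child) for child in node.values())
        return node

    return {key: total(child) for key, child in result.items()}


response = {
    "opcode": "group",
    "result": {
        "android": {"3.2": 2019, "4.3": 299430, "4.4": 100000032023},
        "ios": {"6": 12281, "7": 23383773637},
    },
}

per_os = totals_by_top_level(response["result"])
print(per_os)
```

The same helper works for deeper nesting, since it recurses until it reaches numeric leaves.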

Raw access

The raw data is available on HBase for access from map-reduce jobs.

Technical Background

Foxtrot was designed and built for scale, with optimizations in both code and query paths to give the fastest possible experience to the user.


We set out to build Foxtrot with the following basic requirements:
  • Rapid drill down into events with segmentation on arbitrary fields spanning over a reasonable number of days – This is a big one. Analysts and users should be able to slice and dice the data based on arbitrary fields, over data covering a reasonable amount of time.
  • Fast ingestion into the system. We absolutely cannot slow down the client system because it wants to push data to us.
  • Metrics on systems and how they are performing in near real-time. Derived from the above, these are basically aggregations over the events based on arbitrary criteria.
  • Multi-tenancy of clients pushing data to it. Clean abstractions regarding multi-tenancy right down to the storage layers, both for query and raw stores, so that multiple teams can push and view data without having to worry about stability or their data getting corrupted by event streams from other systems.
  • Consoles that are easy to build and share. We needed to have a re-configurable console that teams could customize and share with each other. The console should not buckle under the pressure of millions of events getting ingested and many people having such consoles open to view real-time data.
  • SQL interface to run queries on the data. Our analysts requested us to provide a SQL-ish interface to the system so that they can easily access and analyze the data without getting into the JSON DSL voodoo.
  • CSV downloads of recent data. This, again, was a request from analysts, as they wanted access to the raw events and aggregated results as CSV.
  • Extensibility of the system. We understood that teams will keep on requesting newer analytics and we would not be able to ship everything together at start. We had to plan to support quick development of newer analytics functions without having to touch a lot of the code.

Technology Used

Foxtrot unifies and abstracts a bunch of complex and proven technologies to bring a simple user experience in terms of event aggregation and metrics. Foxtrot uses the following tech stack:

  • HBase – Used as a simple key value store and saves data to be served out for queries and for usage as raw store in long-term map-reduce batch jobs.
  • Elasticsearch – Used as the primary query index. All fields of an event are indexed for the stipulated time, after which they are expired by TTL. It does not store any documents, only row keys. We provide an optimized mapping in the source tree to get the maximum performance out of Elasticsearch.
  • Hazelcast – Used as a distributed caching layer that caches query results for 30 secs. Cache keys are time-dependent and change with the passage of time. Older data gets TTLd out.
  • SQL parser – For parsing FQL and converting them into Foxtrot JSON queries
  • Bootstrap, Jquery, HTML5, CSS3 –  Used to build the console


The Foxtrot system uses a bunch of battle-tested technologies to provide a robust and scalable system that meets the various requirements of teams and allows more types of analytics functions to be developed on top of it.

Foxtrot Architecture

  • During ingestion, data is written to Elasticsearch quorum and HBase tables. Writes succeed only when events are written to both stores. HBase is written to before Elasticsearch. Data written to HBase will not become discoverable till it is indexed in Elasticsearch.
  • During query, the following steps are performed:
    • Translate FQL to Foxtrot JSON/native query format
    • Create cache key from the query.
    • Return results if found for this cache key.
    • Otherwise, figure out the time window for the analytics
    • Translate Foxtrot filters to elasticsearch filters
    • Execute the query on Elasticsearch and HBase and cache the results on Hazelcast if the action is cacheable and of reasonable size.
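The cache-related steps of the query path above can be sketched as follows. This is an illustration in Python rather than Foxtrot's implementation: the plain dict stands in for Hazelcast, the execute callback stands in for the Elasticsearch and HBase lookup, and deriving the key from a 30-second time bucket is an assumed way of making keys time-dependent.

```python
import hashlib
import json
import time


def cache_key(query, now_seconds, bucket_seconds=30):
    """Derive a key from the query plus the current time bucket.

    Folding the bucket into the key makes the key change as time passes,
    so a cached result is only reused within one bucket.
    """
    bucket = int(now_seconds // bucket_seconds)
    canonical = json.dumps(query, sort_keys=True)  # stable text form of the query
    return hashlib.sha256(f"{bucket}:{canonical}".encode("utf-8")).hexdigest()


def run_query(query, cache, execute, now_seconds=None):
    """Return a cached result when present, otherwise execute the query and cache it."""
    if now_seconds is None:
        now_seconds = time.time()
    key = cache_key(query, now_seconds)
    if key in cache:
        return cache[key]
    result = execute(query)
    cache[key] = result
    return result
```

Many consoles refreshing the same widget within one bucket then share a single backend query, which is one way to keep dashboards from overloading the index.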


  • We try to avoid any heavy lifting during document ingestion. We recommend and internally use the /bulk API for event ingestion.
  • We have built-in support for discovery of Foxtrot nodes using the /cluster/members API. We use this to distribute the indexing requests onto these different hosts, rather than sending data through a single load-balancer endpoint. We have a Java Client Library that can be embedded and used in client applications for this purpose. The Java client supports both on-disk queue based and async/direct senders for events.
  • We optimize the queries for every analytics by controlling the type of elasticsearch queries and filters being generated. Newer analytics can be easily added to the system by implementing the necessary abstract classes and with proper annotation. The server scans the classpath to pick up newer annotations, so it’s not mandatory for the analytics to be a part of the Foxtrot source tree.
  • At runtime, the time window is detected and queries are routed to the indexes that are relevant to the query period. For efficiency reasons, always use a time-based filter in your queries. Most analytics will automatically add a filter to limit the query. Details for this can be found in the individual analytics wiki pages.
  • We have added a simple console (Cluster tab of the console) with relevant parameters that can be used to monitor the health of the elasticsearch cluster.


Foxtrot has been in production for the better part of the year now at Flipkart and has ingested and provided analytics over hundreds of millions of events a day ranging over terabytes of data. Quite a few of our teams depend on this system for their application-level metrics, as well as their debugging and tracing requirements.
Foxtrot is being released under the Apache 2 License on GitHub.
The source, wiki and issue tracker for Foxtrot are available at: Foxtrot github repository.
We look forward to you taking a look at the project and actively using and contributing to it. Please drop us a note if you find Foxtrot useful. Use GitHub issues to file issues and feature requests. Use pull requests to send us code.
Special thanks to the Mobile API, checkout and notifications team for the support. Many thanks to Regu (@RegunathB) for providing some very important and much needed feedback on the usability of the system and the documentation as a whole and for guiding us meticulously through the process of open-sourcing this.

Aesop Change Propagation System

We have open sourced Aesop, the Flipkart Change Propagation system following the announcement of intent at slash n – the Flipkart tech conference. This post describes use of Aesop in one of our production systems.


PayZippy is an online payment solution launched by Flipkart. It is a Safe & Easy payment solution by Flipkart for e-commerce users and online merchants.

Need for change event propagation

It is fairly common for products and services to use Relational Databases to store business critical data. These data stores therefore become the source of truth for such data.
However, Relational Databases may not scale well for all kinds of use cases. Use cases such as analytics, reporting and search indexing need this data in secondary data stores. Some of these secondary data stores are non-relational and are efficient at handling such use cases. There is, then, a need for a system to transfer data from the primary data store to these secondary data stores.
A number of tools and products in the ETL (Extract, Transform, and Load) space may be used to transfer data. However these are batched and do not transfer data in real time.
At PayZippy, data from these secondary stores is used to feed more than just business decisions. The data from these secondary data stores feeds into Real-Time use cases like Console, Fraud Detection and Monitoring Systems.
We therefore needed a system that can transfer data across data stores in real time.

Introducing Aesop

Aesop is a keen observer of changes that can also relay change events reliably to interested parties. It provides useful infrastructure for building Eventually Consistent data sources and systems.

Overall Architecture

The main components are the following :
Relay Server

  • Reads changes from Source Data Sources. Converts the changes to a Serializable form by using the schema registry. Stores the events in an internal buffer.
  • Listens to requests from Clients and transport events to the clients. Provides clients with only those events pertaining to sources and partitions for which the client has registered.

Relay Client

  • Single Client : It calls the Relay Server to check for new events. On receipt of events from the server it executes business logic – e.g. writing to a destination data store. It checkpoints the SCN. If it falls off the relay, it connects to the bootstrap for events. It reconnects to the Relay once it catches up.
  • Participant in Cluster Client : It calls the Relay Server for events only for partitions assigned to the client. It processes Cluster Status change events from Helix and acts accordingly. It checkpoints the SCN to shared storage.


  • Bootstrap Producer : Similar to a Client. It checks for new data change events on relays and stores those events in a MySQL database. The MySQL database is used for bootstrap and catchup for clients.
  • Bootstrap Server : Similar to Relay Server. It listens for requests from Databus Clients and returns long look-back data change events for bootstrapping and catch up.
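The interplay between client, relay and bootstrap described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Aesop's API: `ToyRelay`, `poll_once`, the `(SCN, payload)` event shape and the use of consecutive integer SCNs are all made up for illustration.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # (SCN, payload); this shape is an assumption


class ToyRelay:
    """In-memory stand-in for a relay that keeps only the newest events."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # must be >= 1
        self.events: List[Event] = []

    def append(self, event: Event) -> None:
        self.events.append(event)
        self.events = self.events[-self.capacity:]  # older events are dropped

    def events_after(self, scn: int) -> List[Event]:
        """Events newer than `scn`; raises if some were already dropped."""
        if self.events and self.events[0][0] > scn + 1:
            raise LookupError("client fell off the relay")
        return [event for event in self.events if event[0] > scn]


def poll_once(relay: ToyRelay, bootstrap: List[Event], checkpoint: int,
              apply: Callable[[str], None]) -> int:
    """Fetch and apply new events, then return the new checkpoint SCN."""
    try:
        batch = relay.events_after(checkpoint)
    except LookupError:
        # Fell off the relay: catch up from the long look-back store instead.
        batch = [event for event in bootstrap if event[0] > checkpoint]
    for scn, payload in batch:
        apply(payload)     # business logic, e.g. a write to the destination
        checkpoint = scn   # checkpoint only after the event is applied
    return checkpoint
```

A client that starts from checkpoint 0 against a relay that has already dropped its oldest events takes the bootstrap path once, and then returns to the relay on the next poll.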

High Availability, Load Balancing and Scaling

High Availability, Load Balancing and Scaling are built into Aesop. Since PayZippy relies on the data moved via Aesop for real-time use cases, these were essential to the strategy.

Client Clustering

  • High Availability and Load Balancing on the clients is achieved by having partitioned clients participate in a cluster, with each client handling a partition (part of the data set). The partitions are dynamically assigned via Helix. The default partition function used to filter the events is MOD.
  • Partition reassignment when instances join or leave the cluster is automatic and the load (number of partitions assigned) is uniform across instances.
  • The filtering itself is performed at the Relay and Bootstrap. The checkpoint for each partition is stored in Zookeeper. This uses the partitioning functionality available in LinkedIn Databus.
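As a rough illustration of MOD partitioning with an even spread of partitions over instances, consider the Python sketch below. The round-robin assignment is only a stand-in for what Helix does, and the function names and the `key` field are hypothetical.

```python
from typing import Dict, List


def partition_of(key: int, num_partitions: int) -> int:
    """MOD partition function: the partition is the key modulo the count."""
    return key % num_partitions


def assign_partitions(num_partitions: int, instances: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin so instance loads differ by at most one.

    Stand-in for the dynamic assignment that Helix performs in Aesop.
    """
    assignment: Dict[str, List[int]] = {name: [] for name in instances}
    for partition in range(num_partitions):
        assignment[instances[partition % len(instances)]].append(partition)
    return assignment


def events_for(instance: str, assignment: Dict[str, List[int]],
               events: List[dict], num_partitions: int) -> List[dict]:
    """Keep only the events whose partition is owned by `instance`."""
    owned = set(assignment[instance])
    return [e for e in events if partition_of(e["key"], num_partitions) in owned]
```

Calling `assign_partitions` again with a longer or shorter instance list mimics the reassignment that happens when instances join or leave the cluster.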

Relay – HA

High Availability of Relay Server is achieved using one of the following approaches :
Multiple Relay Servers read from the Source Data Sources.

  • The Clients connect to Relay Server via a load balancer.
  • Since the requests from clients are over HTTP one of the Relay Servers or both can be serving the request based on the configuration in the load balancer.
  • When one Relay goes down, the other can still handle requests.

Relay Chaining with HA using Leader Follower model (Not yet implemented)

  • Relay Producer reads from another Relay. This Relay can act as normal Relay for Clients.
  • Relays have both kinds of producer, a Source Producer and a Relay Producer, but only one is active at a time. On the Leader Relay the Source Producer is active. On a Follower Relay the Relay Producer is active.
  • Leader/Follower election is done using Helix.

Refer to the Aesop GitHub wiki for more information on Aesop architecture and design.

Aesop at PayZippy

Blocking Startup Bootstrap

We needed to transfer existing data from MySql to destination data stores. We assumed we could use the same Relay Server and Client mechanism by pointing the Relay to the first bin log.
However we faced the following issues:

  • The throughput of the client was dependent on that of the destination data store. This turned out to be far less than that of the Relay. The Relay was twice as fast as the client.
  • The clients would fall off the Relay. The buffer in the Relay does not help, because the Relay throughput is far higher than that of the Client: the buffer fills up and starts overwriting events before the client has been able to pull them.
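The falling-off behaviour is easy to reason about with a small simulation. The Python sketch below is a simplified model, not Aesop code: it assumes a fixed-size buffer counted in events and constant per-tick rates, and the function name is hypothetical.

```python
def ticks_until_fall_off(buffer_capacity: int, relay_rate: int, client_rate: int) -> int:
    """Ticks until the relay overwrites an event the client has not read yet.

    Each tick the relay appends `relay_rate` events and the client then reads
    up to `client_rate` events. Returns -1 if the client keeps up forever.
    """
    if client_rate >= relay_rate:
        return -1
    written = 0  # events produced so far
    read = 0     # events consumed so far
    tick = 0
    while True:
        tick += 1
        written += relay_rate
        # A backlog larger than the buffer means the oldest unread event
        # has already been overwritten.
        if written - read > buffer_capacity:
            return tick
        read = min(written, read + client_rate)
```

With a relay twice as fast as the client, the backlog grows by one event per tick, so a larger buffer only postpones the fall-off; it never prevents it.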

Solution : Blocking Startup Bootstrap

  • Thin Producer : A producer, similar to the producer in the Relay, that either pulls events from the Source or has events pushed to it by the Source. Processing related to serialization is skipped.
  • Partitioned Consumers : Each consumer consumes only its own partition, in order, and executes business logic. In this case it inserts into the destination data store.
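One way to read "blocking" here is that the producer waits whenever a consumer lags, so nothing is overwritten. The Python sketch below illustrates that reading with bounded queues; it is an assumption about the design rather than a description of Aesop's implementation, and every name in it is hypothetical.

```python
import queue
import threading
from typing import Dict, List

_DONE = object()  # sentinel telling a consumer to stop


def run_blocking_bootstrap(row_ids: List[int], num_partitions: int,
                           queue_size: int = 2) -> Dict[int, List[int]]:
    """Feed rows to per-partition consumers through small bounded queues."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(num_partitions)]
    written: Dict[int, List[int]] = {p: [] for p in range(num_partitions)}

    def consume(partition: int) -> None:
        while True:
            item = queues[partition].get()
            if item is _DONE:
                return
            # Stand-in for the insert into the destination data store.
            written[partition].append(item)

    workers = [threading.Thread(target=consume, args=(p,))
               for p in range(num_partitions)]
    for worker in workers:
        worker.start()
    for row_id in row_ids:
        # put() blocks while the queue is full, so the thin producer can
        # never run ahead of a slow consumer by more than `queue_size` rows.
        queues[row_id % num_partitions].put(row_id)
    for q in queues:
        q.put(_DONE)
    for worker in workers:
        worker.join()
    return written
```

Because each partition has a single consumer and a FIFO queue, rows within a partition are applied in the order they were produced.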


The consumer currently supported in PayZippy writes to a destination data source – for e.g. denormalized MySQL or HBase. The consumer calls Event Mapper and Transformer to get an Event pertaining to the schema of the destination data source. The transformed event is then written to Destination Data Source.

Monitoring and Alerting

Monitoring and alerting was important as the change propagation system is being used for Real time use cases. It is supported using:

  • JMX Monitor (soon to be open sourced) connects to the JVMs running Aesop, fetches metrics and publishes them to StatsD/Graphite. It also raises alerts based on configured thresholds.
  • Skyline (changes by Flipkart are not yet open sourced) is used to raise alerts based on the configured algorithm.
  • Aesop Dashboard – The dashboard provides a real-time view of the change propagation pipeline.


We ran a few benchmarks for Aesop on standard 4 core, 16GB virtual machines running 64-bit Debian Linux. The Aesop components were run on separate VM instances. Performance numbers from the benchmarks:

Relay Server : Handles 19 million events an hour per data source. 18.2 GB per hour.
Relay Client : Can process data at same speed as Server. However it is limited by the capability or speed of end database. For MySql Client we are able to insert/update events at the rate of 10 million events an hour.
Latency : The latency between source data source and destination data source is within 80-90 milliseconds.
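For a feel of what these hourly figures mean per second and per event, the numbers can be converted with a few lines of Python. The only assumption is that 1 GB is taken as 10^9 bytes; the derived values are approximations from the rounded figures above.

```python
# Figures quoted in the benchmark above.
RELAY_EVENTS_PER_HOUR = 19_000_000
RELAY_BYTES_PER_HOUR = 18.2e9          # assumes 1 GB = 10**9 bytes
MYSQL_CLIENT_EVENTS_PER_HOUR = 10_000_000

SECONDS_PER_HOUR = 3600

# Derived rates.
relay_events_per_sec = RELAY_EVENTS_PER_HOUR / SECONDS_PER_HOUR
client_events_per_sec = MYSQL_CLIENT_EVENTS_PER_HOUR / SECONDS_PER_HOUR
avg_event_bytes = RELAY_BYTES_PER_HOUR / RELAY_EVENTS_PER_HOUR
```

That works out to roughly 5,300 events per second on the relay, roughly 2,800 events per second on the MySQL client, and a little under 1 KB per event on average.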

Appendix :  Design Decisions

Dual Writes v/s Log Mining

Of the various ways of having data in multiple stores, two were evident.
Dual Writes : Application writes to destination data stores, synchronously or asynchronously. Application can also write to a Publisher-Subscriber system in which the Subscribers are consumers that eventually write to Destination Data stores

  • Pros : Appears Easy : Application can publish the same event that is being inserted/updated in the Primary Data Source.
  • Cons : Difficult to maintain consistency
    • Updates with a non-primary-key where clause. For bulk updates of this kind, the application would first have to fetch the affected records and then publish the changes.
    • Difficult to ensure all applications use the common data layer which publishes changes as well.
    • Manual changes in Primary Data Store will be missed.

Log Mining : Separate application/service can extract changes from Database commit logs and publish them. This would use the same approach used by database for replication.

  • Pros : Consistency can be guaranteed as changes are being read from commit logs (bin log in case of MySql).
  • Cons
    • Appears tough – But definitely possible.
    • Tied to mechanism used by database for replication. Tied to commit log format, etc … Tightly coupled approach.

Since Consistency across Datastores is of paramount importance to a financial system like PayZippy we chose the Log Mining approach.

Approaches to Log Mining – Bin Log Parser vs Storage Engine

MySql Bin Log Parsing

  • Pros : Familiar approach
    • Open source software that parsed MySql bin logs was available: Open Replicator and Tungsten Replicator.
  • Cons
    • If the format of the bin logs changes, the parser would have to change.
    • Open Replicator supported MySql version 5.5. We would have to modify Open Replicator to support MySql v5.6 and the checksum feature introduced in v5.6.

Custom Storage Engine

  • Pros : Independent of binlog format. Layers above Storage Engine take care of parsing.
  • Cons : Unfamiliar approach. Unknown pitfalls.

We decided to go with known pitfalls and picked the Bin Log Parsing approach.

Palette Image Generation – Technology to the Rescue

In this blog post, I’ll describe a new approach to palette image generation for products sold on Flipkart.

Palette images on

Palette images are the small colored boxes that represent the color of a product. You can see them in small groups below the product image on a browse page, or to the right of the product image on a product page.

The shades of the color in these images are the colors from the actual product.

For verticals (product groups) like apparel, shoes, bags, etc. we need to inform the customer that alternate colors are available for the particular product. Palette images are an efficient way of delivering this information visually. Instead of reading an alternate product color as Fuchsia and wondering what color it could be, the customer sees it in the palette as a shade of purple.

Problem Statement

Given an image for a product on Flipkart, we should be able to generate the palette image for it automatically. This leads to a question – how was it being done and is automating the process even required?

So, how did it work so far?

It was being handled manually. We have a console where the product image is displayed and one has to select a tiny section of the image that would act as a palette. We capture the coordinates of the selection, the product ID and the image URL, and put this information in a queue. The queue consumer would then crop the required section from the image and process it.

The need for automation?

The effort of generating palette images manually increases linearly with the number of products, hence it’s not scalable. On a given day we have about 2500 different products being shot (which converts to around 4k images). Generating the palettes manually from the UI takes significant time.

To top that, we now get images from marketplace sellers. None of them have palette images and we need to generate the palettes for them as well.

Given that lifestyle products are seasonal, these appear in short unexpected bursts, making planning for this activity difficult.

How can this be automated?

We brainstormed through several existing techniques and why they didn’t fit the bill.

  • There are quite a few algorithms which can be used to generate a color palette out of an image; most of them involve some form of color quantization. This paper describes one such way. What these would give us is something like this.


All the quantization-based algorithms generated palettes of the entire image. As it turned out, the background color dominated the palettes in most cases. But, we are not interested in the palette of the entire image, only the product inside the image!

  • That seemed like a solvable problem: remove the background from the image and then use the technique from step 1. This too did not work out. Even if the background is removed, we are still left with an image that contains the model. Also, a product image will usually have the model wearing some other kind of clothing to give a better sense of the product. Now we not only have to recognise and remove the hair, eyes and skin tone from the image, but also need to distinguish the product from the other products in the image! For example, the product here is a scarf. The background, hair, skin tone, earrings, t-shirt, jeans and wrist-band all need to be recognised and discarded for us to even get to the relevant portion of the image!

  • Even if we managed to select the scarf using some combination of AI and image processing, we had other problems.

New Approach

The importance and increased scale of the problem made us take a step back and have a re-look. We eventually decided that this was a difficult problem to be solved just from image processing POV and we really did not need to do it that way.

Apart from the product image itself, is there any other information that can be leveraged? As it turns out, every product in the catalog has a color property that can be used. Given a product ID, the product catalog system’s API returns the colors of the product.

For example, a shirt can have the color property as red and blue. These properties alone are not very useful; combined with the product images, however, they can be very useful. The colors themselves cannot be used directly in the palette, because the color red will be a different shade in different products and the palette needs to show a shade which is in the product image.

The approach we settled on:

To start with, all the colors from the product image have equal probabilities of being the palette color of the product. Then, we get the color properties from the backend.

The probability of any color from the image being the palette color will slide up or down depending on how close it is to a color from the color property. The closer the colors are, the higher the probability of it being the palette color.

What I’ll do here is list the steps along with the commands so that you can run the same commands and check the output yourself. I’ve used ImageMagick and the im4java library for image processing and the Catalano library for calculations.

  • Color quantise an image to reduce the number of unique colors. Before and after color quantisation, an image looks like this:

There are various methods of color quantization and you can use any of them. For this example I’ve used the following command:

convert nonquantized.jpeg -colors 16 -depth 8 quantized.jpeg

  • Generate a histogram of the image and find the RGB values of the top 10 colors from the image. We make two assumptions now:
    • the color(s) in the palette is amongst these top 10 colors.
    • the top colors now have an equal probability of being the palette color.

The Command for this is:

convert quantized.jpeg -colors 16 -depth 8 -format "%c" histogram:info:- | sort -nr | head -n8

and the output is :


  • For the product, get the color properties of the image and convert those colors to their respective RGB values. For the above product, the color properties are grey and pink. The respective RGB values are #808080 and #ffc0cb.
  • Now we need to find the closeness of each of the top 8 colors to each of the color properties. Unfortunately, the Euclidean distance between two RGB colors does not correlate with the human-perceived distance between them. We need to switch to a color space that is perceptually uniform. For our case, we’ve decided to use the CIELAB color space and the DeltaE formula (refer to the first answer on this SO thread for more info).

The below figures illustrate the difference between RGB color space vs CIE Lab color space (images)




  • For each color in the color property, select the color from the top 10 colors which has the least distance. That color has the highest probability of being the palette color.
  • Combine the least-distance colors for all of the color properties and generate a palette image out of them. Output palette for the above example:
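The distance and selection steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the production code: it uses the standard sRGB to CIELAB conversion with a D65 white point and the simplest DeltaE variant (CIE76, plain Euclidean distance in Lab), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Lab = Tuple[float, float, float]


def hex_to_rgb(value: str) -> RGB:
    """Convert '#rrggbb' to an (r, g, b) tuple."""
    value = value.lstrip("#")
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def rgb_to_lab(rgb: RGB) -> Lab:
    """sRGB (0-255 per channel) to CIELAB, using the D65 reference white."""
    def linear(channel: int) -> float:
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(lab1: Lab, lab2: Lab) -> float:
    """CIE76 colour difference: Euclidean distance in Lab space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))


def pick_palette(top_colors: Sequence[RGB], property_hexes: Sequence[str]) -> List[RGB]:
    """For each colour property, pick the closest of the image's top colours."""
    candidates = [(color, rgb_to_lab(color)) for color in top_colors]
    chosen = []
    for hex_value in property_hexes:
        target = rgb_to_lab(hex_to_rgb(hex_value))
        best = min(candidates, key=lambda item: delta_e76(item[1], target))
        chosen.append(best[0])
    return chosen
```

Given the top colors parsed from the histogram output and the property values `#808080` and `#ffc0cb` from the example, `pick_palette` returns one image color per property, which can then be rendered as flat swatches.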


Disadvantages of this approach:

  • Reliance on correctness of the color properties : If the color properties stored in the backend are not accurate, then this solution goes for a toss.
  • Each color property is mapped to a specific RGB value, and in some cases it might need to be manually updated (peacock, navy, skin, etc). Some of them cannot be mapped to RGB color (multicolor etc).  But, this mapping is a one time process that seems good enough.

Advantages of this approach

  • It is several orders of magnitude faster than the manual approach.
  • The size of the palette image generated using this method is smaller than the one generated by cropping the image. This is because it does not have any color gradients and can be efficiently compressed. For this product, the first palette image is live on the site and was manually cropped, and the second is auto generated. The manually generated one is 529 bytes and the auto generated one is 301 bytes. Across all images, this gives an average saving of almost 50%.


  • The auto generated palette is guaranteed to have a 1:1 aspect ratio, ensuring pixel-level consistency across the site, whereas the aspect ratio of a manually generated one depends on the cropping.
  • It generates better palette images for products that have a fine design all over the product, like the images below.

Example1 :


Example 2:


  • It generates better palettes for multicolored products, which would otherwise require cropping the image at the intersection of colors. That can prove tricky, as in the example below.

            (observe carefully to see that the top third is white)

  • Seasonal distribution of products does not impact it at all. Even if 500K new products are generated in a day, we do not have to wait for weeks for someone to sit and generate these images.

The feature is now live in production in a pilot phase and seems to work efficiently. Questions welcome!

slash n: intelligence @ scale

Let me start this blog with a note of thanks – to you, the engineer! Whether it’s an engineer in San Francisco who open sourced a framework, or an engineer in Beijing who came up with a new technique, or an engineer in Bangalore whose commits increased Hadoop’s performance by 10% – it’s your work that allows us to focus on solving our problems rather than being distracted by solving everything under the sun. Flipkart wouldn’t have been here without the support of the open source tech community worldwide.

For us, slash n, our flagship technology event, is a celebration of that liberating engineering philosophy of sharing. Building India’s largest e-commerce platform, we’ve learnt a thing or two, created a few pieces of technology, and figured out which techniques/processes work and which don’t. We are committed to sharing our learnings and to opening up our technologies and platforms to the community. slash n is a forum for us to do so, and also a forum to learn from the experience of others in the industry who share the same philosophy about technology.

On Mar 7th, we had the 2nd edition of slash n, and what a day it was! Over 500 engineers participated in the event, of whom half were external invitees. That’s more than double the number of participants we had last year. Considering it was still an invite-only event, the interest, enthusiasm and participation shown by the tech community inside and outside of Flipkart was beyond expectations. Unlike last year, when most of the talks were by Flipkart engineers, this year more than half of the talks were by external speakers from a diverse set of organizations. Here are some highlights of the day:

  • The day started with Amod, head of technology at Flipkart, reinforcing the importance of sharing in the technology community, and slash n as our means of doing that. He also committed to open sourcing a few key technology elements at Flipkart over the next 2 quarters, in particular RestBus (messaging system for transactionally consistent service interactions), Aesop (change propagation system) and some of our mobile app technologies.
  • In his keynote, Sachin, founder and CEO of Flipkart, outlined the journey of eCommerce in India, the significant problems that got solved, what challenges lie ahead and how technology can address those challenges in future.
  • This was followed by two talks from very different areas – molecular biology and mobile ad networks. Ramesh from Strand Life Sciences talked about how they are using advances in computing to make genetic disease discovery available to everyone at a sustainable cost. Rajiv from InMobi talked about their efforts around mining large amounts of user data to provide better mobile ads. It was interesting to note that uniquely identifying users across devices, apps & browsers remains the holy grail of personalization on the internet.
  • Some of the most popular talks (based on how many people added them to their schedule using the slash n mobile app) included:
    • Art of Promising by Yogi & Vikas from Flipkart, which peeled the layers off how Flipkart makes and keeps the availability and SLA promise to customers.
    • Soothsayer @ Flipkart by Ananda and Mohit, which talked about the internals of Flipkart’s demand forecasting and inventory planning system.
    • Cataloging talk by Utkarsh and Abhishek from Flipkart, which talked about the evolution of our catalog from books to 30+ M items today and how the team addressed the issues around scale, availability and agility along the way.
    • Job Scheduling in Hadoop by Joydeep from Qubole, which provided details on issues around Hadoop job scheduling as well as his experience of building Fair Scheduler and Corona Job Scheduler.
  • Participants loved the newly introduced fire talks – 15-minute quick discussions on a very focused tech topic.
  • Another highlight of the day was a panel discussion on the hope, hype and reality of big data analytics, which saw healthy debate among data scientists from diverse organizations like Flipkart, IBM, Xurmo, UIDAI and Mayin, which are trying to use big data analytics to solve problems in different domains.
  • Twitter was abuzz throughout the day with participants tweeting their learnings, questions and discoveries at #slashn.

The atmosphere in the event was quite electric with interesting talks and engaging debates, which often continued between speakers and participants beyond the talk.

IISc Bangalore was the venue of slash n
Keynote address by Sachin Bansal

Guest talk by Ramesh Hariharan from Strand Life Sciences
Panel discussion on big data analytics

The focus for this year’s event was on ‘Intelligence @ Scale’. The theme encapsulates what we are trying to do from a technology perspective at Flipkart, and the effort was to share our learnings in this direction and to learn from others. We believe that large scale can become a strategic differentiator if we can use it to continuously make the life of users better. This can happen when the large amount of data generated by user activities is used, in real time, to improve the user experience via systems that learn continuously from each user interaction. slash n saw engineers from diverse fields like eCommerce, molecular biology, education, social sciences, mobile ad networks, cloud infrastructure, etc. talk about their approaches to building ‘intelligence @ scale’ in their respective domains.

Tech bonding at scale!
Full room means good engagement

The day re-established our core belief that knowledge and technology are meant to be shared, and that doing so can create a virtuous cycle of innovation and progress not only for us but also for the entire ecosystem. I hope everyone who participated had some key takeaways in terms of learning (and a few more connections on your favorite social network), and those who could not can still watch the recordings of all the talks on the event website.

We would like slash n to evolve into a more democratic and open platform to share knowledge and possibly to collaborate on building technologies for the future. See you all at next year’s event – let’s celebrate the freedom at a grander scale and collaborate more deeply to solve problems that matter.


This article is in two parts. The first is the announcement of HostDB, a new tool to help manage data center inventory and write applications around it. The second part of the article is a bit of a back story about why we needed this tool and all the grimy details you need to know. If you’re the impatient type, you can read the announcement and description of HostDB and skip the rest.

Announcing HostDB

Today, we’re releasing HostDB as an open source project on github. HostDB is our attempt to solve the problem of finding hosts and their purposes in a large environment. HostDB acts as a Single source of truth about all Physical and Virtual servers and is used to define their purpose. It helps us group our servers through tags and all the software written by the operations team revolves around HostDB. HostDB acts as the centralized configuration store for all sorts of information.


Any Host that exists is created inside HostDB upon birth and has information about itself in YAML. This info can be hardware info (amount of CPU/RAM), network info (FQDN, IP address), rack, switch, physical location, or function, e.g. what application software the host needs. The YAML can contain just about anything, can vary across hosts and can keep evolving.


Hosts are grouped together with tags. Similar to a host, a tag also has information about itself in YAML. This information is applied to all hosts which are members of the tag, e.g. a tag called VM can be applied to all virtual machines and can be used to define properties that are shared by all VMs. To be useful, a tag must have member hosts.
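To make the idea of tag information being applied to member hosts concrete, here is a small Python sketch. It is only an illustration: the plain dictionaries stand in for parsed YAML, the function name is hypothetical, and the merge rule (shallow merge, tags applied in name order, host values winning over tag values) is an assumption rather than documented HostDB behaviour.

```python
from typing import Any, Dict, List


def effective_config(host: str,
                     host_configs: Dict[str, Dict[str, Any]],
                     tag_configs: Dict[str, Dict[str, Any]],
                     tag_members: Dict[str, List[str]]) -> Dict[str, Any]:
    """Combine the config of every tag a host belongs to with its own config."""
    merged: Dict[str, Any] = {}
    for tag in sorted(tag_members):
        if host in tag_members[tag]:
            merged.update(tag_configs.get(tag, {}))
    # Assumed precedence: a host's own values override tag-level values.
    merged.update(host_configs.get(host, {}))
    return merged
```

With a `vm` tag that sets `virtual: true` and a host that sets `ram_gb: 24`, the host's effective view would contain both keys.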


HostDB provides versioning and the ability to roll back to a previous version for each and every host or tag.


The above concepts may look simple, but they can be used in extremely powerful ways with an API, and that’s exactly what HostDB provides. HostDB provides a REST API which can be used to create hosts, get members of a particular tag etc. We use this feature to automate just about everything at Flipkart: creating virtual hosts, creating DNS records, automatic monitoring and escalations, and building automated infrastructures. Services can do periodic lookups to HostDB and keep themselves updated of changes.


User Interfaces

HostDB provides a variety of user interfaces which can be used to interact with HostDB by applications and users.

  • Web Application
  • Rest API
  • Command Line Interface
  • HostDB::Client Perl Module

Some example uses of HostDB

DC HostMap

We use HostDB to create a dynamic map of hosts in our datacenters, which is used by the siteops team to easily map the physical location of a machine inside the data center to its hostname or to find which physical machine a virtual machine resides in. The map is a view of the details in HostDB using the API. It changes as hosts are created/destroyed or virtual machines move from one physical machine to another. This also gives visibility to everyone, so devops, developers and siteops can all talk the same language when talking about a host.


DC Map


Kloud is both the name of Flipkart’s private cloud as well as the Virtualization software that drives it.


When a machine is created, Kloud creates a HostDB entry for the host. Many services, like DNS, Monitoring, Puppet and FAI, have HostDB clients that actively listen for new members being added to their specific tags. Once they find a new host, they spring into action and perform the tasks required to create the machine. E.g. the Puppet HostDB client adds a nodes entry for the host, the DNS client creates an entry in the zone file, the monitoring client adds a Nagios entry, and so on.
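The pattern of a service watching its tag for new members can be sketched as a simple polling loop. The Python below is a hypothetical illustration: `TagWatcher` is not part of HostDB, and `fetch_members` stands in for whatever call a real client would make to read a tag's members.

```python
from typing import Callable, Iterable, List, Set


class TagWatcher:
    """Detect hosts newly added to a tag between successive polls."""

    def __init__(self, fetch_members: Callable[[], Iterable[str]],
                 on_added: Callable[[str], None]) -> None:
        self._fetch = fetch_members   # stand-in for a HostDB lookup
        self._on_added = on_added     # e.g. add a DNS or monitoring entry
        self._known: Set[str] = set()

    def poll(self) -> List[str]:
        """Run one lookup and fire the callback for each new member."""
        current = set(self._fetch())
        added = sorted(current - self._known)
        for host in added:
            self._on_added(host)
        self._known = current
        return added
```

A DNS client, for example, would pass a callback that writes a zone file entry and call `poll()` on a timer; the same class could back the Puppet or monitoring clients with different callbacks.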

HostDB is now available on github, so go fork it, make some changes and let us know what you think! We think it’s written in a way that will allow it to be used outside of our specific use case, and it might enable some fun projects.


HostDB: The Details

Sometime in 2011, while Flipkart was growing at an exponential scale, the entire operations team consisted of four people. We constantly struggled to allocate hardware and to make it production ready. Cloning machines using FAI, adding monitoring, adding DNS entries etc. were all routine pre-defined tasks which we felt could be automated very easily. We were tracking all of this with a shared Google spreadsheet which was never kept up to date. Many a time existing machines were allocated twice or, more disastrously, re-cloned by mistake. Surely, there was a better way.

At the same time we were also thinking about a virtualization strategy. The open source options that were available at the time did not make any waves for us, so we decided to write our own, something we call Kloud. It was in these discussions that we thought about the life cycle of a machine and how, instead of a centralised datastore keeping machine info, we really needed an application which could talk to other applications about the purpose of a host and its properties.

We looked at all the available options and were disappointed. We decided to write something which was not just a source of truth about a host, but interacted with the production environment and lived with it. Because we wrote it to automate infrastructure problems, host state and host properties come naturally to HostDB. We kept availability, scalability and reliability as the most important features of HostDB. As it turns out, HostDB scales wonderfully for thousands of clients.

Since we were such a small team, we were constantly involved in firefights and didn’t have any time to manage new services. We didn’t want to write something that depended on external services like zookeeper or mysql etc and decided to keep all the data in text files, “if it didn’t scale, we’ll change it later” was the policy.  We also wanted to future proof it and so stayed away from any complex file formats. The first prototype was written in Python by Abhishek. Both Krishnan and I refuse to read Python and Krishnan rewrote the whole thing in Perl one night. A year later another rewrite done by Jain Jonny incorporated the concept of multiple namespaces and made the code much more modular. We’ve been running this version in production for over a year now.


HostDB: Key/Value Store with a difference

HostDB is a key-value store in which keys are grouped into namespaces based on type. ‘hosts’ is a special namespace which is central to all other namespaces. Keys inside ‘hosts’ are server names, and their values contain mostly information and configuration details needed by the server itself or the applications it runs.

Applications can create unique keys(tags) which are application specific. You can add servers as ‘members’ of these keys which in turn helps you to consider your key(tag) as a group of hosts.

HostDB provides namespaces, e.g. you can create keys (tags) that exist in a specific namespace and are access controlled, so that only member applications can read/write keys of this namespace. One can create access controls for each namespace or even for each key.

HostDB uses plain text files as its storage. The namespaces are represented as directories and keys are files inside the namespace directory. These files contain a key’s config in YAML.

The ‘members’ are also stored in files in a subdirectory in the namespace’s directory. The access permissions on the namespaces are also stored in a subdirectory.
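
To make the layout concrete, here is a minimal shell sketch of what such a tree might look like. The subdirectory names (`members`, `perms`), the host names and the YAML fields are all assumptions made up for illustration; they are not HostDB's actual schema. The point is only that namespaces map to directories and keys map to files, so ordinary Unix tools can query the data.

```shell
#!/bin/sh
# Hypothetical layout sketch: every name and field below is illustrative,
# not HostDB's real schema.
set -eu

root=$(mktemp -d)

# Namespaces are directories; members and permissions live in subdirectories.
mkdir -p "$root/hosts" "$root/tags/members" "$root/tags/perms"

# A key in the 'hosts' namespace: the file name is the server name,
# and the file body is the key's config in YAML.
cat > "$root/hosts/web01.example.com" <<'EOF'
ip: 10.0.0.11
ram_gb: 24
EOF

cat > "$root/hosts/web02.example.com" <<'EOF'
ip: 10.0.0.12
ram_gb: 16
EOF

# A key (tag) in the 'tags' namespace, with its member hosts in a separate file.
printf 'description: search cluster\n' > "$root/tags/search"
printf 'web01.example.com\nweb02.example.com\n' > "$root/tags/members/search"

# Plain files mean plain tools: find members of 'search' with 24 GB of RAM.
matches=$(
  while read -r host; do
    if grep -qx 'ram_gb: 24' "$root/hosts/$host"; then
      echo "$host"
    fi
  done < "$root/tags/members/search"
)
echo "$matches"
```

Keeping one file per key is what makes this kind of ad hoc query possible without a server or a client library.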


The complete file structure of HostDB lives in a git repository, and git handles versioning and transactions for HostDB. Leveraging git means that we have a simple transactional store that offers history as well as versioning. We can go back to a previous version of a host or tag config at any point in time.
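
As a rough illustration of why git is a good fit here, the sketch below versions a single key file and reads back its previous revision. It uses only standard git commands; the repository path, host name, field and commit messages are assumptions for the example, and it does not claim to show how HostDB itself drives git internally.

```shell
#!/bin/sh
# Hypothetical sketch: versioning a file-backed key with git.
# Paths, values and commit messages are illustrative only.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir hosts

# Identity is passed inline so the sketch does not depend on global git config.
commit() {
  git add -A
  git -c user.name="hostdb-sketch" -c user.email="sketch@example.com" \
    commit -q -m "$1"
}

# Each change to a key becomes one commit.
printf 'ram_gb: 16\n' > hosts/web01.example.com
commit "add web01"

printf 'ram_gb: 24\n' > hosts/web01.example.com
commit "upgrade web01 RAM"

# The history of a key is simply the git log of its file.
revisions=$(git log --format=%s -- hosts/web01.example.com | wc -l | tr -d ' ')

# An earlier version can be read back without touching the working tree.
previous=$(git show HEAD~1:hosts/web01.example.com)
echo "revisions=$revisions previous=$previous"
```

Because every write is a commit, history and rollback come for free, which is the property described above.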

Web based Interface

HostDB provides a web-based interface for users to interact with. Here are some screenshots:

Command Line tool

HostDB provides a command line tool that is extremely helpful for writing those small bash one-liners to get information out fast. Want to find all machines with 24 GB of RAM that are part of the search cluster? No problem!

Add an object:
$ hostdb add hosts/
Add host to tag:
$ hostdb add -m "adding new member" tags/nm-prod/members/
Get host IP: 
$ hostdb get hosts/
Get tag members: 
$ hostdb get tags/nm-prod/members



There is a comprehensive list of all the API functions HostDB provides. Look at the GitHub page for details.

HostDB::Client Perl Module

We use Perl extensively and have a Perl module that applications can use to interact with HostDB. This module provides an object-oriented interface over the HostDB REST API.

use HostDB::Client;
my $hdb = HostDB::Client->new(\%options);
my $output = $hdb->get($id[, $revision, $raw]);
my $output = $hdb->revisions($id[, $limit]);

HostDB has been central to almost all software written by the devops team at Flipkart and has allowed us to scale exponentially without a fuss. We hope you find it useful too. HostDB is now available on GitHub, so go fork it.

Hackday 4 – A Retrospective

Flipkart conducted the 4th iteration of its annual Hackday event on Sep 12 and 13, 2013, and it was a grand success, to say the least! The event was unprecedented in terms of participation: we had 246 hackers who created a total of 96 hacks between them. The actual number is probably higher, as I personally know of several folks who were hacking away and didn’t register or even showcase their hacks.

But more than just the numbers, it was the general buzz and excitement on the floor throughout both days, and the sheer quality of every single hack that astounded everyone present – and we’d like to share this with all of you.

The Event


The Flipkart Hackday is a 24-hour event where our Engineering Team is encouraged to “Think Outside the Box” and come up with innovative ideas, and then build a proof-of-concept of their idea. Post this, there is a presentation session where the best ideas are selected by a judging panel and rewarded with small prizes.

The event kicks off with a talk by Kiran Jonnalagadda, Founder of

The talk’s really entertaining and engaging, and everyone listens intently! Well, almost everyone … 

T-Shirts are distributed to all participants … 

… And there’s a mad scramble to get them!

The Organizing Team had come up with the 0x10 Commandments of Hacking …. cool stuff!

There’s plenty of awesome food … 

And Red Bull as well!

Which is consumed in copious amounts … 

And here’s the net result …. converting Red Bull into code!

There’s some serious hacking going on at this point … 

Some folks aren’t quite as serious yet =)

But the best part of the event was the camaraderie on display.

Some of the hacks are pretty hardcore … 

And some of them were completely off the beaten track, like this effort to resurrect the Flipkart Library!

The general atmosphere was so conducive to hacking, that even our guest speaker and judge Kiran got infected and started hacking away!

The sun sets, but the office is still packed with people tinkering away at their hacks.

We had made plenty of arrangements for the overnighters …

And people make best use of said arrangements, and start to get cozy!

Some folks try and catch a quick nap …

… while others have found alternative means to stay awake!

Morning glory, and people start winding up their hacks. The buzz begins to pick up again, as folks walk around and see what everyone else has been up to.

The first demo session begins, it’s complete madness as everyone tries to catch the judges’ attention!

The competition is so tight that some teams decide that some marketing is in order … 

The demo session has concluded, and it’s voting time! It’s especially hard for the judges, as their votes count for more, and there are so many awesome hacks to choose from!

Voting’s done *phew*

Everyone’s eagerly awaiting the results of the vote … 

For some people, it’s all too much!

Results are out! The top 15 hacks are shortlisted for the final presentation session.

Each of the Top 15 teams gets 5 minutes on stage to present their hacks to the assembled audience.

The audience listens intently … 

But the last 24 hours have completely drained some of our hackers!

Presentation session’s done! While the points are being tallied up, MC Sourav gets some alone time with the trophies … 

And the winners are announced! Congratulations!

The Judges

Our judging panel consisted of:


(From L-R)

Saran Chatterjee (Vice President-Products @

Sourav Sachin (Director-Engineering @

Kiran Jonnalagadda (Founder @

These gentlemen had the unenviable task of selecting the best hacks amongst a veritable sea of great hacks. All three judges mentioned just how difficult this was, because there were so many awesome hacks on display! Kiran Jonnalagadda (who has organized several public Hacknights under the umbrella) told us multiple times that he has never seen a Hackathon quite like this, both in terms of size and quality, so a big Thumbs Up to everyone who participated!

While every single hack on display was top-notch, we did want to reward the ones that we felt went the extra step and had that little extra _something_ which allowed it to stand out from the crowd. The judging process was divided into 2 parts: The first part was a crowd-sourced online vote by all the participants, from which we shortlisted the top 15 hacks. These 15 hacks were then presented on-stage, where the judges scored them on criteria such as originality, impact, potential to be productionized and audience appeal.

The Winners

We had several awards to give out: Best Hack (for the hack that scored highest on the judges’ scorecard), Popular Choice (for the hack that got the most votes in the online vote) and several smaller category awards such as “Most Innovative Hack”, “Geekiest Hack”, “Coolest Hack”, “Laziest Hack” and “Most Useful Hack”. The list of winners is as follows:

Best Hack

Sirf Flipkart Karo | Chrome plugin which suggests Flipkart products along with search results
Jagadish Kasi, Navni Bhojwani, Samir Bellare, Sudeep Kumar Moharana, Mayank Mittal


Popular Choice Award

ComicCon | Convert a video into a full blown comic
Jay Hasmukh Chawda, Vijayaraghavan A


Most Innovative Hack

Minority Report | Minority Report Style Analytics
Amod Malviya, Dipanjan Mukherjee



In true Hackathon style, their hack refused to work post event; and as such we had to take some creative liberties in displaying what their hack (supposedly) does =)

Geekiest Hack

Unix Flipkart | Bringing the goodness of the UNIX terminal to
Nikhil Bafna, Yogesh Dahiya, Pratyay Pandey


Coolest Hack

FaceIt | Log in to Flipkart using face recognition
Abhilash Goje, Aditya Punjani, Pavan Srinivas Bysani


Laziest Hack

Chota Minority Report | Smart file transfer
Vishnu H Rao, Aniruddha Gangopadhyay, Chetna Chaudhari S




In keeping with the award they won, the Chota Minority Report folks were too lazy to send in photos of their team and hack.

Most Useful Hack

Hackday Website | Show the Voting results on the website in realtime
Ramesh Perumalsamy, Aakash Bapna



The Feedback

Amod Malviya (Senior Vice President-Engineering; and winner of the Most Innovative Hack award)

Congratulations everyone! This was massively awesome! I was blown away by the creativity of folks. Personally, I was so torn between having to vote only 3 times that I had to upvote, downvote, upvote multiple times.

Saran Chatterjee (Vice President-Products; and one of the judges of the event)

Kudos to the organizing team plus congratulations to all hack teams. In my books you all are winners! This was my first Hackday here in Flipkart and I was really impressed with the quality of hacks. My goal now is to work with you all and to make sure I provide the support (where needed) to get some of these prioritized into the roadmap quickly. Can’t wait for it to happen!

Kiran Jonnalagadda (Founder of; and one of the judges of the event)

I was pleasantly surprised by how deeply integrated the hacks were with Flipkart’s technology. At public hackathons participants almost always build something unrelated to their day job. Most of those hacks aren’t meant to be anything more than an expression of creative energy, abandoned shortly after the event.

The Flipkart hacks were different. Nearly every one was built on top of existing Flipkart tech and was meant to address a very real problem that the participants had been mulling over for a while. They had a clear sense of the solution and a deep understanding of the platform and where to plug in each piece.

I saw five broad categories of hacks:

1. At the DevOps level, (a) better logging, processing of data streams and reporting, making it possible to understand how users are using the site and how the infrastructure is keeping up, and (b) better tooling for developers to try new ideas.

2. Workflow improvements helping fix gaps in Flipkart’s existing procurement and fulfilment.

3. Front-end tweaks, giving users an immersive content-rich pathway through the site to help them make better purchase decisions.

4. Access to Flipkart data beyond the Flipkart website, allowing users to perform comparison shopping, to access Flipkart from alternate interfaces, and to take Flipkart to social networks.

5. Fun hacks, presented as workplace quality improvements, but really just developers blowing off steam.