HostDB

This article is in two parts. The first is the announcement of HostDB, a new tool to help manage data center inventory and write applications around it. The second part of the article is a bit of a back story about why we needed this tool and all the grimy details you need to know. If you’re the impatient type, you can read the announcement and description of HostDB and skip the rest.

Announcing HostDB

Today, we’re releasing HostDB as an open source project on github. HostDB is our attempt to solve the problem of finding hosts and their purposes in a large environment. HostDB acts as a Single source of truth about all Physical and Virtual servers and is used to define their purpose. It helps us group our servers through tags and all the software written by the operations team revolves around HostDB. HostDB acts as the centralized configuration store for all sorts of information.

Hosts

Any Host that exists is created inside HostDB upon birth and has information about itself in YAML. This info can be Hardware info, amount of CPU/RAM or Network FQDN, IP address, Rack, Switch, Physical location or Function, e.g what application software the host needs? The YAML can contain just about anything, it can be varied across hosts and can be ever evolving.

Tags

Hosts are grouped together with tags, similar to a host – a tag also has information about itself in YAML This information is applied to all hosts which are members of a tag. e.g a tag called VM can be applied to all virtual machines and can be used to define properties that are shared by all VMs. To be useful a tag must have member hosts.

Versioning

HostDB provides versioning and the ability to roll back to a previous version for each and every host or tag.

API

The above concepts may look simple, but, can be used in extremely powerful ways with an API. and that’s exactly what HostDB provides. HostDB provides a REST api which can be used to create hosts, get members of a particular tag etc. We use this feature to automate just about everything at flipkart. Creating Virtual hosts, creating DNS records, automatic monitoring and escalations and building automated infrastructures. Services can do a periodic lookups to Hostdb and keep updating themselves of changes.

 

User Interfaces

HostDB provides a variety of user interfaces which can be used to interact with HostDB by applications and users.

  • Web Application
  • Rest API
  • Command Line Interface
  • HostDB::Client Perl Module

Some example use of HostDB

DC HostMap

We use HostDB to create a dynamic map of hosts in our datacenters, which is used by the siteops team to easily map the physical location of a machine inside the data center to it’s hostname or to find which physical machine a virtual machine resides in. The map is a view of the details in hostDB using the API. It changes as hosts are created/destroyed or Virtual machines move from one physical to another. This also gives visibility to everyone and devops, developers and Siteops can all talk the same language. when talking about a host.

 

DC Map
DC Map

Kloud

Kloud is both the name of Flipkart’s private cloud as well as the Virtualization software that drives it.

Kloud
Kloud

When a machine is created, Kloud creates a HostDB entry for a host. Many services, like DNS, Monitoring, Puppet, FAI have clients of HostDB that actively listen for new member addition to their specific tags, once they find a new host, they spring into action and perform tasks that are required to create the machine. e.g The puppet hostDB client adds a nodes entry for the host. The dns client creates an entry in the zone file. The monitoring client adds a nagios entry. so on and so forth.

HostDB is now available on github, so go fork it, make some changes and let us know what you think! We think it’s written in a way that will allow it to be used outside of our specific use case, and it might enable some fun projects.

 

HostDB: The Details

Sometime in 2011, While flipkart was growing at an exponential scale and the entire operations team consisted of four people. We constantly struggled to allocate hardware and to make it production ready. Cloning machines using FAI, adding monitoring, adding dns entries etc were all routine pre-defined tasks which we felt could be automated very easily. We were tracking all of this with a shared google spreadsheet which was never kept up to date. Many a times existing machines were allocated twice, or more disastrously,  re-cloned by mistake. Surely, there was a better way.

At the same time we were also thinking about a virtualization strategy, the open source options that were available at the time did not make any waves for us. So we decided to write our own, something we call Kloud. It was in these discussions that we thought about the life cycle of a machine and how instead of a centralised datastore keeping machine info, we really needed an application which could talk to other applications about the purpose of a host and it’s properties.

We looked at all the available options and were disappointed. We decided to write something which was not just a source of truth about a host, but interacted with the production environment and lived with it. Because we wrote it to automate infrastructure problems, host state and host properties comes naturally to HostDB. We kept availability, scalability and reliability as the most important features of HostDB. As It turns out HostDB scales wonderfully for thousands of clients.

Since we were such a small team, we were constantly involved in firefights and didn’t have any time to manage new services. We didn’t want to write something that depended on external services like zookeeper or mysql etc and decided to keep all the data in text files, “if it didn’t scale, we’ll change it later” was the policy.  We also wanted to future proof it and so stayed away from any complex file formats. The first prototype was written in Python by Abhishek. Both Krishnan and I refuse to read Python and Krishnan rewrote the whole thing in Perl one night. A year later another rewrite done by Jain Jonny incorporated the concept of multiple namespaces and made the code much more modular. We’ve been running this version in production for over a year now.

 

HostDB: Key/Value Store with a difference

HostDB is a key-value store, keys are grouped into namespaces based on type. ‘hosts’ is a special namespace which is central to all other namespaces. Keys inside ‘hosts’ are server names and their values contains mostly information and configuration details needed by server itself or applications it runs.

Applications can create unique keys(tags) which are application specific. You can add servers as ‘members’ of these keys which in turn helps you to consider your key(tag) as a group of hosts.

HostDB provides namespaces e.g you can create  keys(tags) that exist in a specific namespace and are access controlled, only member applications can read/write to keys of this namespace.  One can create access controls for each namespace or even for each key.

HostDB uses plain text files as its storage. The namespaces are represented as directories and keys are files inside the namespace directory. These files contain a key’s config in YAML.

The ‘members’ are also stored in files in a subdirectory in the namespace’s directory. The access permissions on the namespaces are also stored in a subdirectory.

GIT

The complete file structure of HostDB is in a git repository and git handles the versioning and transactions for HostDB. leveraging git means that we have a simple transactional store, which offers history as well as versioning. We can go to a previous version of a host or tag config at any point in time.

Web based Interface

HostDB provides a Web based interface for uses to interact. Here are some screenshots:

Command Line tool

HostDB provides a command line tool that is extremely helpful in writing those small bash one liners to get information out fast. Want to find out all machines with 24 GB of ram which are part of the search cluster. no problem!

Add an object:
$ hostdb add hosts/hostdb.ops.ch.flipkart.com
Add host to tag:
$ hostdb add -m "adding new member" tags/nm-prod/members/new.nm.flipkart.com
Get host IP: 
$ hostdb get hosts/hostdb.ops.ch.flipkart.com/Network/IP
Get tag members: 
$ hostdb get tags/nm-prod/members

 

API

There is a comprehensive list of all API functions HostDB provides. Look at the github page for details.

HostDB::Client PerI Module

We use Perl extensively and have a Perl Module that can be used by applications to interact with HostDB.  This module provides an object oriented interface over HostDB REST API.

use HostDB::Client;
my $hdb = HostDB::Client->new(\%options);
my $output = $hdb->get($id[, $revision, $raw]);
my $output = $hdb->revisions($id[, $limit]);

HostDB has been central to almost all software written by the devops at flipkart and has allowed us to scale exponentially without a fuss. We hope you find it useful too. HostDB is now available on github, so go fork it.

Hackday 4 – A Retrospective

Flipkart conducted the 4th iteration of its annual Hackday event on Sep12/13 2013 – and it was a grand success, to say the least! The event was unprecedented in terms of participation : we had 246 hackers who created a total of 96 hacks between them – and the actual number is probably higher – I personally know of several folks who were hacking away, and didn’t register or even showcase their hacks.

But more than just the numbers, it was the general buzz and excitement on the floor throughout both days, and the sheer quality of every single hack that astounded everyone present – and we’d like to share this with all of you.


The Event

mailer-360

The Flipkart Hackday is a 24-hour event where our Engineering Team is encouraged to “Think Outside the Box” and come up with innovative ideas, and then build a proof-of-concept of their idea. Post this, there is a presentation session where the best ideas are selected by a judging panel and rewarded with small prizes.

The event kicks off with a talk by Kiran Jonnalagadda, Founder of Hasgeek.com

The talk’s really entertaining and engaging, and everyone listens intently! Well, almost everyone … 

T-Shirts are distributed to all participants … 

… And there’s a mad scramble to get them!

The Organizing Team had come up with the 0x10 Commandments of Hacking …. cool stuff!

There’s plenty of awesome food … 

And Red Bull as well!

Which is consumed in copious amounts … 

And here’s the net result …. converting Red Bull into code!

There’s some serious hacking going on at this point … 

Some folks aren’t quite as serious yet =)

But the best part of the event was the camaraderie on display.

Some of the hacks are pretty hardcore … 

And some of them were completely off the beaten track, like this effort to resurrect the Flipkart Library!

The general atmosphere was so conducive to hacking, that even our guest speaker and judge Kiran got infected and started hacking away!

The sun sets, but the office is still packed with people tinkering away at their hacks.

We had made plenty of arrangements for the overnighters …

And people make best use of said arrangements, and start to get cozy!

Some folks try and catch a quick nap …

… while others have found alternative means to stay awake!

Morning glory, and people start winding up their hacks. The buzz begins to pick up again, as folks walk around and see what everyone else has been up to.

The first demo session begins, it’s complete madness as everyone tries to catch the judges’ attention!

The competition is so tight that some teams decide that some marketing is in order … 

The demo session has concluded, and it’s voting time! It’s especially hard for the judges, as their votes count for more, and there are so many awesome hacks to choose from!

Voting’s done *phew*

Everyone’s eagerly awaiting the results of the vote … 

For some people, it’s all too much!

Results are out! The top 15 hacks are shortlisted for the final presentation session.

Each of the Top 15 teams gets 5 minutes on stage to present their hacks to the assembled audience.

The audience listens intently … 

But the last 24 hours have completely drained some of our hackers!

Presentation session’s done! While the points are being tallied up, MC Sourav gets some alone time with the trophies … 

And the winners are announced! Congratulations!


The Judges

Our judging panel consisted of  :

saran-1sourav-1kiran-1

(From L-R)

Saran Chatterjee (Vice President-Products @ Flipkart.com)

Sourav Sachin (Director-Engineering @ Flipkart.com)

Kiran Jonnalagadda (Founder @ Hasgeek.com)

These gentlemen had the unenviable task of selecting the best hacks amongst a veritable sea of great hacks – all three judges mentioned just how difficult this was, because there were so many awesome hacks on display! Kiran Jonalagadda (who has organized several public Hacknights under the Hasgeek.com umbrella) told us multiple times that he has never seen a Hackathon quite like this, both in terms of size and quality – so a big Thumbs Up to everyone who participated!

While every single hack on display was top-notch, we did want to reward the ones that we felt went the extra step and had that little extra _something_ which allowed it to stand out from the crowd. The judging process was divided into 2 parts: The first part was a crowd-sourced online vote by all the participants, from which we shortlisted the top 15 hacks. These 15 hacks were then presented on-stage, where the judges scored them on criteria such as originality, impact, potential to be productionized and audience appeal.


The Winners

We had several awards to give out : Best Hack (for the hack that scored maximum on the judges scorecard), Popular Choice (for the hack that got the maximum number of votes in the online vote) and several smaller category awards such as “Most Innovative Hack”, “Geekiest Hack”, “Coolest Hack”, “Laziest Hack” and “Most Useful Hack”. List of winners is as follows :

Best Hack

Sirf Flipkart Karo | Chrome Plugin which suggest Flipkart products along with search results
Jagadish Kasi, Navni Bhojwani, Samir Bellare, Sudeep Kumar Moharana, Mayank Mittal

 

 
Popular Choice Award

ComicCon | Convert a video into a full blown comic
Jay Hasmukh Chawda, Vijayaraghavan A

 

 
Most Innovative Hack

Minority Report | Minority Report Style Analytics
Amod Malviya, Dipanjan Mukherjee

 

 

In true Hackathon style, their hack refused to work post event; and as such we had to take some creative liberties in displaying what their hack (supposedly) does =)

 
Geekiest Hack

Unix Flipkart | Bringing the goodness of the UNIX terminal to flipkart.com
Nikhil Bafna, Yogesh Dahiya, Pratyay Pandey

 

 
Coolest Hack

FaceIt | Login to flipkart using face recognition
Abhilash Goje, Aditya Punjani, Pavan Srinivas Bysani

 

 
Laziest Hack

Chota Minority Report | Smart file transfer
Vishnu H Rao, Aniruddha Gangopadhyay, Chetna Chaudhari S

 

plaeholder-teamplaceholder-hack

 

Keeping in line with the award they won, the Chota Minority Report folks were too lazy to send in photos of their team and hack

Most Useful Hack

Hackday Website | Show the Voting results on the website in realtime
Ramesh Perumalsamy, Aakash Bapna

 

 


The Feedback

Amod Malviya (Senior Vice President-Engineering; and winner of the Most Innovative Hack award)

Congratulations everyone! This was massively awesome! I was blown away by the creativity of folks. Personally, I was so torn between having to vote only 3 times that I had to upvote, downvote, upvote multiple times.

Saran Chatterjee (Vice President-Products; and one of the judges of the event)

Kudos to the organizing team plus congratulations to all hack teams . In my books you all are winners! This was my first Hackday here in Flipkart and I was really impressed with the quality of hacks. My goal now is to work with you all and to make sure I provide the support (where needed) to get some of these prioritized into the roadmap quickly . Cant wait for it to happen!

Kiran Jonnalagadda (Founder of Hasgeek.com; and one of the judges of the event)

I was pleasantly surprised by how deeply integrated the hacks were with Flipkart’s technology. At public hackathons participants almost always build something unrelated to their day job. Most of those hacks aren’t meant to be anything more than an expression of creative energy, abandoned shortly after the event.

The Flipkart hacks were different. Nearly every one was built on top of existing Flipkart tech and was meant to address a very real problem that the participants had been mulling over for a while. They had a clear sense of the solution and a deep understanding of the platform and where to plug in each piece.

I saw five broad categories of hacks:

1. At the DevOps level, (a) better logging, processing of data streams and reporting, making it possible to understand how users are using the site and how the infrastructure is keeping up, and (b) better tooling for developers to try new ideas.

2. Workflow improvements helping fix gaps in Flipkart’s existing procurement and fulfilment.

3. Front-end tweaks, giving users an immersive content-rich pathway through the site to help them make better purchase decisions.

4. Access to Flipkart data beyond the Flipkart website, allowing users to perform comparison shopping, to access Flipkart from alternate interfaces, and to take Flipkart to social networks.

5. Fun hacks, presented as workplace quality improvements, but really just developers blowing off steam.