Looking back while moving forward

A little over two years of my life have been devoted to building, funding, and growing SimpleGeo. The experience was, without a doubt, life changing in many ways. It was a crash course in a bunch of disciplines that I knew little to nothing about. Managing clients, iterating on your product, sales, business development, raising money from investors, etc. were all wide-open holes in need of my attention and input. It goes without saying that this was not an easy endeavor.

Along with being constantly challenged and learning new things, I was fortunate enough to work with some of the brightest engineers in Silicon Valley. While at SimpleGeo I watched them build a patent-pending distributed graph database on top of theory that had not yet been implemented in the real world. At scale, no less. The operations team built an infrastructure that was scarily automated and resistant to failure; so much so that SimpleGeo engineers get regular calls to talk about how we did it all on top of AWS.

On October 31st, we announced that we’d been acquired by Urban Airship. I’m extremely excited to see what the combined teams cook up in the coming months. Location-aware push notifications are going to allow businesses to engage with their customers in ways they’ve never dreamed of. Additionally, I know SimpleGeo’s world-class engineering team will be able to help the new company build features at scale that the competition won’t be able to match. The best is truly yet to come from this company, and I’m sure Scott Kveton will be a great shepherd moving forward.

As for me, I’ve decided to move on post-acquisition. I need to step away from the echo chamber and spend time focusing on what is important to me in general, not just professionally. To that end, my lovely lady and I have bought an RV and plan on touring the Southwest this winter. I’d like to visit as many incubators and coworking spaces as possible. So if you’re in the Southwest and want me to swing by and say hello, please drop me a line on Twitter or via email.

Onward and upwards.

Dear Yahoo!, hire me as your next CEO

News is going around that Yahoo! is looking for a new CEO. I have no idea if this is true or not, but if it is, I would like to announce that I’m ready, willing, and insane enough to go long and go big with Yahoo! as your new CEO. Yahoo! showed glimmers of hope when it bought Flickr and Delicious. It’s been a bastion of some of the most impressive technology of the last 15 years. I believe it can be great again.

Sounds great, but how the hell am I going to do it? I’m going to take the $20bn in market cap and build an empire of product and design talent that will be beyond reproach. Then I will give them the support and freedom to do what they do best: innovate.

  1. I’d buy Instagram and put them in charge of both Instagram and Flickr. They would have 100% autonomy over the entire “Yahoo! Photo” division.
  2. I’d buy Soft Facade and run them as an internal design and branding agency for all of our products.
  3. I’d figure out a way to wrestle The Barbarian Group into the fold and put them in charge of all PR and marketing initiatives.
  4. I would buy Twitter and Square in order to bring Jack Dorsey on full-time to run a new division called “Yahoo! Mobile”. He would have 100% autonomy over the entire mobile strategy.
  5. I’d buy Path and With for the sole reason of bringing Dave and his team on to lead the new “Yahoo! Social” division.
  6. I’d buy the NYT (for a mere $1.5bn!) and recruit John Gruber to be Editor in Chief of the “Yahoo! News” division.

Just think of what we could accomplish if we just let amazing people do what they do best.

HOWTO: Spend your investors’ money

I’ve invested in two startups and advise, officially and unofficially, a dozen or so other startups. Recently, a company that I’m involved with, attachments.me, raised $500,000 from Foundry Group. Since their raise, the two cofounders, Ben and Jesse, have been on a tear adding features, solidifying the infrastructure, and ramping things up to a public beta.

Attachments.me is a unique consumer service in that a single user could have gigabytes of data to crawl across multiple accounts. As a result of this unique challenge, Ben has been spending a great deal of time working out how the underlying infrastructure is going to scale. This, of course, involves spinning up a decent number of servers on AWS. In doing so, Ben was extremely worried about keeping costs down. I had to laugh, as the numbers he was worrying about were less than 1% of the total amount raised or, as Chris Lea said, “Your investors didn’t give you the money so you could look at the large balance in your bank account.”

But it’s a good question, and I get asked it often. How should you spend all that money your investors just gave you? How should you spend that 15% employee option pool? So I set up a Google Form and asked a dozen or so of my favorite investors what they thought. I got six responses: four from seed-stage investors and two from Series A investors. Here’s what they had to say…

  • If you took total monthly burn and divided it by the total number of employees, how much would you expect the per-employee burn to be? The average response was $12,000 per employee, with the majority saying $15,000 was expected. This means if you have 10 employees you should be comfortable with a total burn of between $120,000 and $150,000 per month.
  • What percent of a given round of funding do you expect will be spent on personnel? The average response was 73% of the total round, with the majority saying 80%. This would indicate that my friends at attachments.me should feel comfortable spending $400,000 on building out the team.
  • What percent of a given round of funding do you expect will be spent on servers and infrastructure? The average response was 18% with the vast majority saying they expect a company to spend 15% of their total raise on servers and infrastructure. If you’re burning $150,000 a month, you should try to keep your AWS bills below $22,500 per month.
  • How much should rent be, roughly, per employee per month? The average response was $666, with the vast majority of investors saying $500. So a team of 10 shouldn’t be spending more than $5,000/mo. on rent.
  • How much equity, on average, should early engineers get? Two investors recommended less than 0.5%, which seems extremely low for your first couple of engineers. One said 1.5% to 3%, which I think is fair for your first engineer, but on the high end for your third and fourth engineers. The other three said 0.5% to 1.5%, which seems to be the universal standard when I talk about this topic with other founders in Silicon Valley.
  • How much equity, on average, should an early executive hire get? The consensus, with four investors agreeing, was between 2% and 5%. The other two investors thought 1% to 2% was appropriate. My personal recommendation, pre-Series A, would be 1% for a Director, 2-3% for VPs, and 6-9% for CEOs. This, of course, depends greatly on salary and other benefits offered. What I tell people is that I have two dials: salary and equity. Dial one up and the other gets dialed down.
  • How much, if any, of a premium would you expect there to be on burn for SaaS and PaaS companies? One investor said there should be no premium, one said 10%, one said 30%, and three said 20%. The 20% number resonates with me, as that’s about the premium SimpleGeo has spent on our per-employee burn. In other words, if an investor expects you to spend $15,000 per employee per month, they most likely will be okay with a platform company spending $18,000 per employee per month in total burn. The thinking here is that SaaS/PaaS companies require more infrastructure, better/higher quality infrastructure, more bandwidth, and more senior/seasoned engineers.

These are, of course, rules of thumb, but they should give you a good feeling of where you and your company stand. Your investors put money into your company under the expectation that it’s going to be spent, so you shouldn’t feel bad about spending that money.
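
To put those numbers together, here’s a rough back-of-the-envelope sketch in Python. The headcount, round size, and percentages below are hypothetical placeholders based on the survey averages above; swap in your own figures.

    # Back-of-the-envelope sketch applying the rules of thumb above.
    # Every input is a hypothetical placeholder -- use your own numbers.
    headcount = 10               # employees
    round_size = 500000          # dollars raised
    burn_per_employee = 12000    # $12k-$15k/mo per employee, per the survey
    personnel_share = 0.80       # ~73-80% of the round goes to people
    infra_share = 0.15           # ~15-18% of burn goes to servers/infrastructure
    rent_per_employee = 500      # $500-$666/mo per employee

    monthly_burn = headcount * burn_per_employee
    print("Expected total monthly burn:    $%s" % format(monthly_burn, ","))
    print("Personnel budget for the round: $%s" % format(int(round_size * personnel_share), ","))
    print("Monthly infrastructure ceiling: $%s" % format(int(monthly_burn * infra_share), ","))
    print("Monthly rent ceiling:           $%s" % format(headcount * rent_per_employee, ","))

For a ten-person team that works out to roughly $120,000 a month in total burn, a $400,000 personnel budget, an $18,000 monthly infrastructure ceiling, and $5,000 a month in rent, which lines up with the guidance above.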

UPDATE: A lot of people have questioned the $12,000 per month, per employee number. Keep in mind that’s 100% of all burn for the entire company and not just their salary. This includes server costs, travel, rent, office supplies, etc. On average, an engineer in Silicon Valley will have a base salary of $100,000 a year. Add roughly 20% for benefits (healthcare, vacation, payroll taxes, etc.), $3k every two years for hardware, rent ($500/mo.), etc. and you’re at $10,800 per month just to pay for them to walk in the door. I doubt it’s too hard to imagine spending $1,200 per employee on travel, servers, office supplies, etc.

The Cloud is not a Silver Bullet

There has been much brouhaha over the recent EBS outage. I say EBS to be specific because, despite the sensationalistic headlines, this was not an AWS or EC2 outage. RDS was also affected, as it is built on top of the EBS product. As has been reported just about everywhere, the outage affected many large websites that are built on top of AWS. Much of the banter was around how the “AWS outage” had “brought down” these sites, which couldn’t be further from the truth. What brought down these services was poor architectural choices combined with a lack of understanding of the unique characteristics of cloud infrastructure.

Our friends at Twilio and SmugMug, along with SimpleGeo, were able to happily weather the storm without significant issues. Why were we able to survive while others were not? It’s simple: anyone who’s spent a lot of time talking with the fine folks at AWS or researching the EBS product would know that it has a number of drawbacks:

  • EBS volumes are not anywhere near as fault tolerant as S3. S3 has 11 9’s of durability. EBS volumes are no more or less durable than a simple RAID1.
  • EBS volumes increase network IO to and from your instance to some degree and can degrade or interfere with other network services on a given instance.
  • EBS volumes lack the IO characteristics that many IO-bound services require. I’ve heard insane stories of people doing RAID0 across dozens of EBS volumes to get the IO performance they’re looking for. At SimpleGeo, we RAID0 the ephemeral drives and split data across multiple ephemeral RAID0 volumes to get the IO we’re looking for.

This doesn’t mean that you should avoid EBS or that EBS is a bad product. Quite the contrary. I think EBS is a pretty fucking cool product when you consider its features and simplicity. What it does mean is that you need to know its weaknesses in order to properly build a solution on top of it.

I think the largest surprise of the EBS outage, and one that no doubt will be reviewed in depth by the engineers at AWS, was that an EBS problem in one AZ was able to degrade services in other AZs. My general understanding is that the issue arose when EBS in one AZ got confused about possible network outages and started replicating degraded EBS volumes around the downed AZ. In a very degenerate case, when EBS volumes couldn’t find any peers in their home AZ, they attempted to reach across to other AZs to replicate data to other peers. This led to EBS being totally unavailable in the originally problematic AZ and to degraded network-based services in other AZs as this degenerate case was triggered.

All this being said, what is so shocking about this banter is that startups around the globe were essentially blaming a hard drive manufacturer for taking down their sites. I don’t believe I’ve ever heard of a startup blaming NetApp or Seagate for an outage in their hosted environments. People building on the cloud shouldn’t get a pass for poor architectural decisions that put too much emphasis on, essentially, network-attached RAID1 storage saving their asses in an outage.

Which brings me to my main point: the cloud is not a silver bullet. S3, EC2, EBS, ELB, et al. have their strengths and weaknesses, just like any piece of hardware you’d buy from a traditional enterprise vendor. Additionally, the cloud has wholly unique architectural characteristics. What I mean by this is that the tools AWS provides need to be assembled and leveraged in unique ways that differ, sometimes greatly, from how you’d build a system if you were building out your own datacenter.

The #1 characteristic that people fail over and over to recognize is that the cloud is entirely ephemeral. Everything from EC2 instances to EBS volumes could vaporize in an instant. If you are not building your system with this in mind, you’ve already lost. When you’re dealing with an environment like this, which by the way most large hosted services deal with, your architecture needs to exhibit a few key characteristics:

  • Everything needs to be automated. Spinning up new instances, expanding your clusters, backups, restoring from backups, metrics, monitoring, configurations, deployments, etc. should all be automated.
  • You must build share-nothing services that span AZs at a minimum. Preferably your services should span regions as well, which is technically more difficult to implement, but will increase your availability by an order of magnitude (a rough sketch of spanning AZs follows this list).
  • Avoid relying on ACID services. It’s not that you can’t run MySQL, PostgreSQL, etc. on the cloud, but the ephemeral and distributed nature of the cloud makes this a much more difficult feature to sustain.
  • Data must be replicated across multiple types of storage. If you run MySQL on top of RDS, you should be replicating to slaves on EBS, RDS multi-AZ slaves, ephemeral drives, etc. Additionally, snapshots and backups should span regions. This allows entire components to disappear while you either continue to operate or restore quickly, even if a major AWS service is down.
  • Application-level replication strategies. To truly go multi-region, or to span across cloud services, you’ll very likely have to build replication strategies into your application rather than relying on those inherent in your storage systems.
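
As a rough illustration of the first two points, here’s a minimal sketch of automated, multi-AZ provisioning using the boto library. The region, zones, AMI, key pair, and security group are hypothetical placeholders, not SimpleGeo’s actual setup.

    # Minimal sketch: launch one instance per Availability Zone so that a
    # single AZ failure can't take out the whole tier. All identifiers
    # below (region, zones, AMI, key pair, security group) are hypothetical.
    import boto.ec2

    REGION = "us-east-1"
    ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]

    def launch_web_tier():
        conn = boto.ec2.connect_to_region(REGION)
        instances = []
        for zone in ZONES:
            reservation = conn.run_instances(
                image_id="ami-00000000",        # placeholder AMI from your build pipeline
                instance_type="m1.large",
                key_name="example-keypair",
                security_groups=["example-web"],
                placement=zone,                 # pin this instance to a specific AZ
            )
            instances.extend(reservation.instances)
        return instances

    if __name__ == "__main__":
        for instance in launch_web_tier():
            print("%s launching in %s" % (instance.id, instance.placement))

The library doesn’t matter; the point is that provisioning, like everything else on this list, should be a repeatable script rather than a manual step.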

Many people point out that Amazon’s SLA for its AWS products is “only” 99.9%. What they fail to recognize is that those are per-AZ numbers. You can compound your 9’s in a positive manner by spanning multiple AZs. This means that the simple act of spanning two AZs takes you from 99.9% uptime on EC2 to 99.9999%. If you went to three AZs, with an appropriate share-nothing architecture, you’d be looking at a theoretical 99.9999999% uptime. That’s 0.03 seconds of downtime a year.
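
The arithmetic behind that claim is simple enough to work out in a few lines, assuming AZ failures are independent (the idealized case):

    # Compound availability across N independent AZs: the service is only
    # down when every AZ is down at the same time.
    SECONDS_PER_YEAR = 365 * 24 * 60 * 60

    def compound_availability(per_az_availability, zones):
        downtime_fraction = (1 - per_az_availability) ** zones
        return 1 - downtime_fraction

    for zones in (1, 2, 3):
        availability = compound_availability(0.999, zones)
        downtime = (1 - availability) * SECONDS_PER_YEAR
        print("%d AZ(s): %.7f%% uptime, ~%.2f seconds of downtime per year"
              % (zones, availability * 100, downtime))

Of course, the independence assumption is exactly what the cross-AZ degradation described above violated, which is part of why spanning regions buys you even more headroom.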

This all being said, I think it is fair to give Amazon a hard time for a couple of things. Personally, I think the majority of the backlash this outage caused was driven by a few factors:

  1. Amazon and the AWS team need to more clearly communicate, and reinforce that communication, about the fallibility of each of their services. For too long, Amazon has let the myth run rampant that the cloud is a magic silver bullet, and it’s time to pull back on that messaging. I don’t think Amazon has done this with malice; rather, they’re a victim of their own success. The reality is that people are building fantastic systems with outrageously impressive redundancy on top of AWS, but it’s not being clearly articulated that these systems are the result of a combination of great tools and great architecture – not magic cloud pixie dust.
  2. Amazon’s community outreach and education needs to be much stronger than it is. They should have a small army of AWS specialists preaching proper cloud architecture at every hackathon, developer day, and conference.
  3. A lot more effort needs to go into documenting proper cloud architecture. The cloud has changed the game. There are new tools and, as a result, new ways of building systems in the cloud. Case studies, diagrams, approved tools, etc. should all be highlighted, documented, and preached about accordingly.

So what have we learned? The cloud isn’t a silver bullet. You still have to build proper redundancy into your systems and applications. And, most importantly, you, not Amazon, are ultimately responsible for your system’s uptime.

Netflix is hiring social data/integration nerds

I’ve previously used my blog to talk about a lot of things in technology, rant about politics, and ramble on about meaningless trivia. Today I’m trying something new: talking about one of the many amazing job opportunities I’ve recently heard about in Silicon Valley. I’ve built up a somewhat random reputation for helping to connect companies with technical talent. I love helping people, I love nerds, and I’m enthusiastic about helping companies succeed, so it comes pretty naturally to me.

Which brings me to a rad opportunity I recently found out about …

I’m an avid fan of Netflix and use the service daily on my Apple TV. I’ve met a few people who work there, and I recently ran into Mike Hart from Netflix, who mentioned that he’s working on forming a team to focus entirely on building out Netflix’s social strategy. He said that the company views this as a big opportunity. You can’t help but hear Mike out on these kinds of initiatives. After all, he was the guy who created, incubated, and led the Netflix API team. You may recall that this small team now powers about 200 retail devices on the market and runs the 5th largest API in the world, with 20 billion calls a month.

What’s not to like about Netflix? They have a CEO that Fortune named Business Person of the Year. They spent $1m on the Netflix Prize to improve their algorithms. They’re the fastest growing public company in Silicon Valley. They have a cultural thesis that is widely emulated around the world. Hell, they’re even licensing original TV shows now!

So what, exactly, would you be doing if you’re lucky enough to land this gig? Well, there are two opportunities open: a Senior Facebook Integration Engineer and a Lead Test Engineer for Social Systems. The team you’d be joining would be small, maybe four to six people in total, and you’d be, basically, figuring out ways to integrate Netflix into the social services you love and use every day. Long term, you’ll be helping to figure out how the social graph can be leveraged to help you, and your friends, discover great TV and movie content. Your daily hacking sessions would include writing Java, wrangling servers in AWS’s cloud, and munging data using Apache’s Cassandra project. You test engineers can expect lots of automation and debugging. The best part is that those pesky UI/UX tests are handled by a separate team! You’d be focused solely on continuous integration, testing, and triaging the underlying social infrastructure fabric at Netflix.

If working for an amazing leader, at a company with an outstanding culture, on complex problems with millions of users and tons of data wasn’t enticing enough, I’ve been assured by Mike that you’ll be highly compensated. How highly? Well, Netflix, being the awesome company it is, gives you a total compensation number and then allows you the flexibility to choose how you want it portioned out (e.g., if you want more options, they’ll lower the salary, or if you want better healthcare/401(k)/benefits, they can move around options and salary). Total compensation? Let’s just say I’ve confirmed it will definitely be at the top of the market (in fact, Mike caught me off guard with their compensation; it’s seriously the best I’ve seen in Silicon Valley).

Engineers interested in learning more about this opportunity should contact Mike Hart directly. If you know of anyone interested, please pass this along. Additionally, if you tweet out the link to this blog post, I’ll randomly select two people to receive free 6-month streaming subscriptions from Netflix.

Disclaimer: I have not, in any way, been compensated to write this post. If, in the future, people do compensate me for writing up job opportunities, I’ll be sure to let you all know. Mike gave me a few talking points and made sure this post wouldn’t anger Netflix, but otherwise the words and details outlined are my own.