A discussion on languages and frameworks

I start all of my conference talks on architecture and scaling by describing the distinct differences between scaling and performance. I define scaling in one word: specialization. I, somewhat jokingly, answer the question of what performance is with “Who cares?”.

The reason for not caring (much) about actual performance is that whether or not you use single quotes, double quotes, objects, functions, Python, Ruby, PHP, foreach, etc. has nothing to do with whether or not your application and site will scale.

Scaling is entirely about IO. Fundamentally, it’s about whether your data is stored in a manner that makes it possible to retrieve it at the rate today’s high-traffic websites demand. In other words, Ruby isn’t the reason you can’t store 250,000,000 records in MySQL and do range scans. It’s because MySQL (and most RDBMSs) suck.

I’ve been playing with Django a lot lately for a side project. Despite being a pragmatic coder to a fault, I’ve decided to truly learn another language and I felt Django would ease the shock a bit. So far I love it. Django’s patterns make a lot of sense and I’m loving the goodies a true OO language like Python gives me. Since starting down this path I’ve been getting two questions over and over:

  1. Does this mean Digg is going to be using Django?
  2. Why didn’t you choose Ruby on Rails?

Digg will not be using Django or any other framework anytime soon. We deal with traffic that most Django developers will probably never see. Our stack receives billions of requests a month. That kind of traffic, as I stated earlier, requires specialization. Django is the exact opposite of this. It’s a generic web framework made to answer the majority of web programmers’ basic needs. The majority of web programmers don’t deal with the problems we deal with. I’m sure if you ripped out a lot of what makes Django so great (e.g. the models, admin, etc.) Django would be fine (e.g. if we used it only for mapping requests to views and templates), but then it wouldn’t really be Django, would it?

I chose Python and Django over Ruby and Ruby on Rails for a number of reasons. First and foremost is that we use Python here at Digg. Learning Python will only enhance my ability to perform my duties at Digg. I, personally, dislike the Perlisms in Ruby. Additionally, Ruby seems to skew towards implicitness in the language, while Python skews toward explicitness. I like explicitness.

To sum things up, I have nothing technically against Ruby as a language, I love frameworks for regular development work and Perl’s syntax kills small children.

Choose what you love and be happy coders.

How I got started programming

  1. How old were you when you started programming? By the time I really got into computers around age 12 or 13 my parents’ old Tandy 3000 wasn’t quite up to date compared to the 386s and 486s most of my friends had. I really truly started coding in TI-BASIC on my TI-85. I created all sorts of games, programs and such, which I’d then trade and sell to other kids at school. Around age 16 I bought a custom built machine from a local computer shop. A Cyrix 133 with 16MB of RAM, which I soon upgraded to a Cyrix 200 with 40MB of RAM. It wasn’t long after this that I started coding IRC bots for mIRC and HTML on my Geocities website. My first true coding experience didn’t really come until I started college where I was introduced to PHP by my friend Paul Barton. It was love at first sight and the rest is, as they say, history.
  2. How did you get started in programming? I really started in college, but I’m sure there was some BASIC and VB stuff for office here and there before that. My first programs were written in TI-BASIC and ASM for the TI-85.
  3. What was your first language? TI-BASIC for the TI-85 calculator is the first programming language I really sank my teeth into. What a nightmare.
  4. What was the first real program you wrote? Depends on how you define this I suppose. The first program I wrote that had any use to me was one that would solve math equations for my algebra and statistics classes and show each step of work as it solved the equation. That’s also the first program I wrote that I made money from, as there were quite a few students interested in it.
  5. What languages have you used since you started programming? I guess that depends on what you mean by “used”. I’ve written substantial lines of code in C/C++, PHP, Python, Perl, ASM, BASIC, TI-BASIC, JavaScript and BASH. I’ve also done quite a bit of work in COBOL and MFC.
  6. What was your first professional programming gig? My first paid gig was working on the website for Affordable Computers in Ann Arbor, MI. I’d say my first run at the “big show” was for Care2.com in 2000.
  7. If there is one thing you learned along the way that you would tell new developers, what would it be? Break stuff. Break everything. Poke, prod and explore. Don’t listen to people who tell you that you can’t do something or that you’re wasting your time. More practical advice is that you should learn to know and love design patterns and avoid GUIs. I have a real problem with people who say they know SQL because they’re well versed with an ORM or a DB’s GUI. Go back and read up on relational algebra and SQL92 before you say you know SQL, okay? I’ll probably get flamed for this, but I think people should learn a single environment in and out and stick with it. This might mean you learn Microsoft’s technologies in and out or Cocoa or LAMP. You simply can’t be an expert in an area of computers without picking a single environment and sticking with it. Dabble, sure, but pick a horse and learn everything you can about it. If you choose UNIX read one man page a day until you’ve read all of the GNU utilities’ man pages. You’re not a true UNIX geek unless you’ve typed man man at one point in your life.
  8. What’s the most fun you’ve ever had programming? Oh, I don’t even know where to start. Hacking on PHP3 back in my dorm room, working with Jeremy and Seth on Care2 late into the night, building eNotes’ infrastructure from the ground up and building large scale systems with Ron and Matt at Digg to name just a few.

This absurd entry was spurred on by Erik Kastner. I’m going to give him a noogie next time I see him for this. Because I hate chain posts like this I won’t be tagging anyone as “it” after this, but if you do carry this on please trackback this post so I can read and reminisce with you.

New iPhone version of Digg

Like many other interesting projects the Digg iPhone project started with a conversation over a few beers and a challenge: code it in 48 hours and Kevin would give me a free iPhone. Being the unabashed Apple fanboy I am and, also, being a self-respecting coder I set out to create Digg for the iPhone. After we spent a short time whiteboarding the application, Daniel mocked up the design and I set off to code it.

Technically speaking, it’s no revolutionary application and I didn’t spend the entire 48 hours working on the application. I did, of course, find time to go see Transformers (awesome) over the weekend. The JavaScript was borrowed from Joe Hewitt and adapted a bit using jQuery. The application itself is based on our API using the Services_Digg PEAR package I maintain. I’ve been talking with the jQuery team about some limitations in the animations and plan on packaging up a more robust iPhone JavaScript library based on jQuery sometime in the near future (hopefully).

Yesterday Kevin announced the iPhone application and Daniel Burka, our designer at Digg, has covered the details about designing for the iPhone. So far the response to the application has been positive. It was fun and I’m glad everyone is enjoying it. I know I did while riding the bus to work today.

And to answer everyone’s questions: I got an 8GB version last night, the keyboard is interesting, it’s breathtakingly gorgeous and I’ll write YAiPR (Yet Another iPhone Review) soon.

Intelligent image thumbnails

A recent project I’ve been working on required 4:3 thumbnails of all images no matter the original image’s orientation. This required me to do a little math and cropping before actually making the thumbnail. I use the package Image_Transform to figure out which orientation the image is and then crop it appropriately.


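// Fragment from inside a function: $ds1 is the source image path and
// $ds2 is the path the cropped copy is saved to.
require_once 'Image/Transform.php';

// Use the ImageMagick ('IM') driver.
$o = Image_Transform::factory('IM');
if (PEAR::isError($o)) {
    return $o;
}

$result = $o->load($ds1);
if (PEAR::isError($result)) {
    return $result;
}

$w = $o->getImageWidth();
$h = $o->getImageHeight();

if ($h > $w) {
    // Portrait: keep the full width and start 15% down from the top.
    $newWidth = $w;
    $newHeight = ($newWidth * .75);
    $newY = ($h * .15);
    $newX = 0;
} else {
    // Landscape: take half the width, centered on the image.
    $newWidth = ($w / 2);
    $newHeight = ($newWidth * .75);
    $newY = ($h / 2) - ($newHeight / 2);
    $newX = ($w / 2) - ($newWidth / 2);
}

$o->crop($newWidth, $newHeight, $newX, $newY);
$o->save($ds2);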

The first check determines whether the image is in landscape or portrait (in portrait the height will be greater than the width). With portraits I use the width, multiply that by 0.75 to get my 4:3 ratio and finish by going 15% down from the top of the portrait (assuming that you’re focusing your portrait in the upper portion of the image). With landscape images I take half the width and simply go outwards from the center of the image (again, assuming you’re focusing your landscape in the center of the image).
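The arithmetic is easier to check when it is separated from the image library. The following is a minimal sketch, not the code from the project: `cropBox43()` is a hypothetical helper name, and rounding to whole pixels is an assumption the original snippet does not make.

```php
<?php
// Hypothetical helper: work out the crop rectangle for a 4:3 thumbnail
// from the image dimensions alone, rounded to whole pixels.
function cropBox43(int $w, int $h): array
{
    if ($h > $w) {
        // Portrait: full width, 4:3 height, starting 15% down from the top.
        $cropW = $w;
        $cropH = (int) round($cropW * 0.75);
        $x = 0;
        $y = (int) round($h * 0.15);
    } else {
        // Landscape: half the width, 4:3 height, centered both ways.
        $cropW = intdiv($w, 2);
        $cropH = (int) round($cropW * 0.75);
        $x = intdiv($w - $cropW, 2);
        $y = intdiv($h - $cropH, 2);
    }

    return ['width' => $cropW, 'height' => $cropH, 'x' => $x, 'y' => $y];
}

print_r(cropBox43(600, 800)); // portrait:  600x450 at (0, 120)
print_r(cropBox43(800, 600)); // landscape: 400x300 at (200, 150)
```

The four values can be passed straight to `crop()` in the order width, height, x, y. Note that for a very wide landscape image (height less than 0.375 times the width) the y offset goes negative, so a real version would probably want to clamp it.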

A Quick Bitmask HOWTO for Programmers

Warning: Highly Geeky material follows.

I’m currently working on a large database for a customer. I’m actually redoing a crappy version of the database into a normalized screaming machine. I ran into a problem recently in that some of the values are stored as bitmasks. I knew what a bitmask was, but generally regarded them as voodoo magic left to crazy C hackers. Until now. I contacted my voodoo crazy C hacker mentor, Jeremy Brand, and asked him how they worked.

Here’s a quick tutorial:

Instead of counting 0, 1, 2, 3, 4, 5… you count with powers of 2
instead: 2^0, 2^1, 2^2, 2^3, 2^4, 2^5… (1, 2, 4, 8, 16, 32…).

When you label something with one of these powers of two you can
add them to other powers of 2 and then later on divide out what
you started with again.

For example:

apple = 4
orange = 2
banana = 1
-----------
sum = 7

The sum of your identifiers is 7 (let’s call this $sum).

So, later on you can check 7 to see what is in your basket:

does floor($sum / 4) mod 2 equal 0? (then there is no apple in the cart)
does floor($sum / 2) mod 2 equal 0? (then there is no orange in the cart)
does floor($sum / 1) mod 2 equal 0? (then there is no banana in the cart)

(For 7 the three answers are 1, 1 and 1, so all three are in the cart.)

Using bitmasks isn’t necessarily easier, but it is fast.
It’s fast because computers already think in bits. This example
is using 3 bits of memory. Typically you’ll have 16 options (like I
have 3 here) because of the size of an integer. If you’re lucky
you’ll be using C or even MySQL, which can access all 32 bits of an
integer, and then you’ll have 32 options.

The reason computers use base-2 to begin with is that
with storage hardware there are really only two states: on and off.
The more single units that can store ons and offs, the more
storage the device can have.

FYI, you can use google for your calculator:
eg. http://www.google.com/search?q=7+mod+4.
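The basket check can be sketched in PHP. This is only an illustration: the constants and function names are made up for the example. An item is in the basket when the sum, divided by the item's value and rounded down, is odd, and that gives the same answer as the bitwise & test.

```php
<?php
// Hypothetical labels, each a distinct power of two.
const BANANA = 1;
const ORANGE = 2;
const APPLE  = 4;

// "Divide out" form: shift the item's digit to the ones place, then test odd/even.
function hasItemByDivision(int $sum, int $item): bool
{
    return intdiv($sum, $item) % 2 === 1;
}

// Bitwise form: keep only the item's digit and see whether anything is left.
function hasItemByAnd(int $sum, int $item): bool
{
    return ($sum & $item) !== 0;
}

$basket = APPLE + BANANA; // 5: an apple and a banana, no orange

foreach (['apple' => APPLE, 'orange' => ORANGE, 'banana' => BANANA] as $name => $item) {
    printf("%s: %s\n", $name, hasItemByDivision($basket, $item) ? 'yes' : 'no');
}
// apple: yes
// orange: no
// banana: yes
```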

Finally, that makes sense. I’m still completely lost on the actual math that goes into making this work (which irks me), but I did manage to get a proof-of-concept program working, which may help other bitmask deficient programmers out there:

<?php

  // Copyright 2004 Joe Stump <joe@joestump.net>
  // Public Domain
  // Usage: php -q bitmask.php 1 4 32
  //        (any numeric argument will be evaluated - change to whatever)
  echo "Valid bitmask values (first 16 powers of 2): \n";
  for ($i = 0 ; $i < 16 ; ++$i) {
      echo "2 ^ $i = ".pow(2,$i)."\n";
  }
  echo "\n\n";

  // Combine the command line arguments into one bitmask.
  $bitmask = 0;
  $values = array();
  for ($i = 1 ; $i < count($argv) ; ++$i) {
      if (is_numeric($argv[$i])) {
          // OR instead of add, so a repeated value can't carry into another bit.
          $bitmask |= (int)$argv[$i];
          $values[] = $argv[$i];
      }
  }

  echo "Bitmask Contains: ".implode(', ',$values)."\n";
  echo "Bitmask Total: ".$bitmask."\n";

  // A value is present when ANDing it with the bitmask is non-zero.
  echo "\nResults:\n";
  $arr = array(1,2,4,8,16,32);
  for ($i = 0 ; $i < count($arr) ; ++$i) {
      echo $arr[$i].': '.((($bitmask & $arr[$i]) == 0) ? 'FALSE' : 'TRUE')."\n";
  }

?>

The trick is the &. If the result is 0 (zero) then the single bitmask value is not present in the sum of the bitmask. I’m sure this has something to do with the fact that you cannot add two different lists of single bitmask values and get the same sum twice (i.e. 1 + 2 + 4 = 7, 2 + 4 + 8 = 14). Again, I just know “it works[tm]”. I hope this helps someone else out there.
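One way to see why the & works is to print the numbers in binary: each power of two owns exactly one binary digit, so a sum of distinct values just switches on their digits, and & keeps only the digits that are on in both numbers. A small sketch, with helper names (`bits`, `hasFlag`, `setFlag`, `clearFlag`) that are made up for illustration:

```php
<?php
// Show a mask as zero-padded binary so each value's digit is visible.
function bits(int $mask, int $width = 6): string
{
    return sprintf("%0{$width}b", $mask);
}

// True when the flag's digit is switched on in the mask.
function hasFlag(int $mask, int $flag): bool
{
    return ($mask & $flag) !== 0;
}

// Switch a flag on (harmless if it is already on).
function setFlag(int $mask, int $flag): int
{
    return $mask | $flag;
}

// Switch a flag off by ANDing with every digit except the flag's.
function clearFlag(int $mask, int $flag): int
{
    return $mask & ~$flag;
}

echo bits(1 + 2 + 4), "\n";     // 000111
echo bits(2 + 4 + 8), "\n";     // 001110
echo bits(7 & 4), "\n";         // 000100 -> 4 is in 7
echo bits(14 & 1), "\n";        // 000000 -> 1 is not in 14
echo bits(clearFlag(7, 2)), "\n"; // 000101
```

Because every value has its own digit, two different sets of values can never add up to the same total, which is the uniqueness noted above.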

I’m mainly caching this here so the next time my mind can’t wrap itself around bitmasks I can check back to my own site. If you have questions concerning this don’t email me because I still think little green men make this mathematical “trick” work. Damn, I wish I had gotten a Computer Science degree instead of a Computer Info. Sys. degree.

Psychology of Programming

A Slashdot thread about the Psychology of Programming really hit home today. After having my almost Zen-like coding state shattered I left work in a huff.

Much like authors and artists, programmers suffer from “writer’s block”, and when they are in the mental state to program it’s a fragile state that should be respected. I now wonder how much productivity has been lost because of such interruptions to my “flow” through the years …

Fuckus!

Psychic: Fuckus!
Brodie: That’s what I’m talking about!
TS: She said “focus”
Brodie: Whatever.

On Saturday I did something I had only heard about in fairy tales: I conducted a focus group for our new website. To put it lightly, it was mind blowing. It was interesting to see how their clicking habits compared to how we had designed the page flow. Not to mention the things that were totally overlooked.

I had them do a few tasks that I needed users to be able to do with little problem (adding stuff to their cart, signing up for an account, checking out, searching, etc.). It was interesting to see where their eyes focused on the page, what confused them, and what path they followed to find products.

A friend of mine turned me onto what appears to be a wealth of information concerning web user interfaces. I found a few articles particularly interesting, such as the Top Ten Web-Design Mistakes of 2002 and Writing for the Web.

WAP, WML

I’ve been doing some research on WAP and WML. While searching on how to create WBMP image files I ran across a story, written in 2000, that predicted more than half of all web content would be viewed from wireless devices. Here we are in the beginning of 2003 and I’ve yet to use WAP for anything more than looking up movie times.

Since I recently purchased a chatboard for my T68i I have become interested in programming my own WAP applications, mainly so I can read my IMAP email and post to this blog from the road at any time. While figuring out how to do this I found this superb primer on building interactive WAP applications. When I get things working I’ll be sure to let everyone know. I’d have to believe the blog community would be very interested in being able to blog from their cell phones.

Look Ma I modified my source!

In true Open Source tradition I found a program that did *almost* what I wanted and modified the source to do what I wanted. Once I had it working I submitted the patch to the maintainer.

The program is called BlackHole and does AV and SPAM checking. I use SpamAssassin for SPAM protection so my main focus was AV. BlackHole was made to work with Qmail, but because of my setup I needed it to output to STDOUT. At the time it only wrote to Maildir and mbox. With a little modification I soon had it working. Hopefully, someone actually finds it useful.

Google bucks the trend

Everyone knows the coolest way to interface with web services is SOAP. Evidently no one told Google this. They use SOAP for their Google Search API, which is very cool, but decided to use flat tab-delimited files and FTP as the interface for Froogle. If you work for a retail shop I highly recommend posting your products there. I can see this becoming a major source of traffic for web stores around the world.
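For anyone who has not built one, a tab-delimited feed is about as simple as an interface gets. The sketch below only illustrates the file format: the column names and products are hypothetical and are not Froogle's actual feed specification, and `feedLine()` is a made-up helper.

```php
<?php
// Hypothetical helper: turn one row of values into a tab-delimited line.
function feedLine(array $fields): string
{
    $clean = array_map(function ($value) {
        // Tabs and newlines inside a value would break the columns, so flatten them.
        return trim(preg_replace('/[\t\r\n]+/', ' ', (string) $value));
    }, $fields);

    return implode("\t", $clean) . "\n";
}

// Made-up products; a real feed would use whatever columns the service asks for.
$products = [
    ['sku' => 'W-100', 'name' => "Red\tWidget", 'price' => '9.99'],
    ['sku' => 'W-200', 'name' => 'Blue Widget', 'price' => '12.50'],
];

$feed = feedLine(array_keys($products[0])); // header row
foreach ($products as $product) {
    $feed .= feedLine($product);
}

echo $feed;
```

Uploading is left out here. PHP's FTP extension (`ftp_connect()`, `ftp_login()`, `ftp_put()`) would be one way to send the finished file.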