<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>stu.mp &#187; architecture</title>
	<atom:link href="http://stu.mp/category/architecture/feed" rel="self" type="application/rss+xml" />
	<link>http://stu.mp</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Thu, 01 Apr 2010 20:18:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Distributed vs. Fault Tolerant Systems</title>
		<link>http://stu.mp/2009/12/distributed-vs-fault-tolerant-systems.html</link>
		<comments>http://stu.mp/2009/12/distributed-vs-fault-tolerant-systems.html#comments</comments>
		<pubDate>Sun, 27 Dec 2009 17:27:25 +0000</pubDate>
		<dc:creator>joestump</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[coding]]></category>

		<guid isPermaLink="false">http://stu.mp/?p=4230</guid>
		<description><![CDATA[I&#8217;ve been researching implementations of distributed search technology for some things we want to do at SimpleGeo. It was about this time that I ran across a project called Katta, which is a &#8220;distributed&#8221; implementation of Lucene indexes. While perusing the documentation I ran across a diagram detailing the architecture of Katta. What struck me [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been researching implementations of distributed search technology for some things we want to do at SimpleGeo. It was about this time that I ran across a project called <a href="http://katta.sourceforge.net/">Katta</a>, which is a &#8220;distributed&#8221; implementation of <a href="http://lucene.apache.org/">Lucene</a> indexes. While perusing the documentation I ran across <a href="http://katta.sourceforge.net/wp-content/uploads/kattaoverview.jpg">a diagram detailing the architecture of Katta</a>. What struck me as odd was that Katta was purporting to be a distributed system, yet had a master/slave setup managing it. This led me to tweet out:</p>
<blockquote><p><span style="padding: 0px; margin: 0px;">Dear Engineers, It&#8217;s not a &#8220;distributed&#8221; system if you have a master/slave setup. That&#8217;s simply a &#8220;redundant&#8221; system. <a href="http://twitter.com/joestump/status/7093473122">#</a></span></p></blockquote>
<p>Not long after this I got a few inquiries, plus some great input from the ever-wise Jan Lehnardt of the CouchDB project and from Christopher Brown:</p>
<blockquote><p>RT @joestump: “…It&#8217;s not a ‘distributed’ system if you have a master/slave setup. That’s simply ‘redundant’” — Same: distributed != sharding <a href="http://twitter.com/janl/status/7093712636">#</a></p>
<p>And it&#8217;s not &#8220;distributed&#8221; if it&#8217;s all hub-n-spokes.  Might as well have a central server with some extra storage. <a href="http://twitter.com/skeptomai/status/7094755736">#</a></p></blockquote>
<p>Basically, Jan and I are making the argument that redundancy or sharding/partitioning doesn&#8217;t really add up to a truly distributed system. Well, not in the sense I have in mind when I say distributed. Going back to my old friend, the CAP Theorem, I think we can see why.</p>
<p>Redundancy could be argued to be the A in CAP: always available. I&#8217;d even argue that partitioning, in the sense that most people think of (e.g. partitioning 300 million users across 100 MySQL servers), is the A in CAP. The P in CAP, by contrast, means that a system is highly tolerant of <em>network</em> partitioning. Because of the master/slave setup of Katta, it really only implements the A in CAP.</p>
<p>Of course, CAP says that you can only have two of the three in any distributed system. I&#8217;d make the argument that no system is truly distributed unless it fully implements two of the three properties in CAP. I&#8217;d further argue that even a highly available (A), highly consistent (C) system wouldn&#8217;t be distributed, because it lacks the ability to tolerate network partitioning.</p>
<p>The problem that I have with Katta&#8217;s definition of distributed is that I can&#8217;t truly distribute Katta in the raw sense of the word. For instance, I can&#8217;t spread Katta across three data centers and not worry about a data center going down. If I lose access to the master then Katta is worthless to me.</p>
<p>To have a truly distributed system I&#8217;d argue you need the following characteristics in your software:</p>
<ul>
<li>There can be no master (hub).</li>
<li>There must be multiple copies of data spread across multiple nodes in multiple data centers.</li>
<li>Software and systems must be tolerant of data center failures.</li>
</ul>
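<p>To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not Katta&#8217;s actual API: the <code>Node</code> class, the data center names, and both read functions are hypothetical, and the sketch ignores writes and consistency entirely. It only shows what happens to reads when the data center hosting the hub goes away.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node: a name, the data center it lives in, and its copy of the data.
    name: str
    data_center: str
    data: dict = field(default_factory=dict)


def master_slave_read(master, down_data_centers, key):
    """Every request is routed through the master (the hub)."""
    if master.data_center in down_data_centers:
        raise RuntimeError("master unreachable: cluster unavailable")
    return master.data[key]


def masterless_read(replicas, down_data_centers, key):
    """Any reachable replica may answer; there is no hub to lose."""
    for node in replicas:
        if node.data_center not in down_data_centers and key in node.data:
            return node.data[key]
    raise RuntimeError("no reachable replica holds this key")


# One copy of the data in each of three made-up data centers.
replicas = [
    Node("a", "dc-east", {"doc": "hello"}),
    Node("b", "dc-west", {"doc": "hello"}),
    Node("c", "dc-central", {"doc": "hello"}),
]
master = replicas[0]
outage = {"dc-east"}  # the data center hosting the master goes dark

# The masterless read is still served by a surviving replica.
print(masterless_read(replicas, outage, "doc"))

# The hub-based read fails even though two healthy copies still exist.
try:
    master_slave_read(master, outage, "doc")
except RuntimeError as exc:
    print(exc)
```

<p>Both setups hold three copies of the data, so both are redundant. Only the one without a hub keeps answering when a whole data center disappears.</p>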
<p>My point is, I think the word &#8220;distributed&#8221; is applied too freely to systems that are, more appropriately, called &#8220;redundant&#8221; or &#8220;fault tolerant&#8221;.</p>
<p><strong>UPDATE:</strong> Seems I confused some people about my master/slave argument. You <em>can</em> build distributed systems that have master/slave pieces in the puzzle. Large distributed systems are rarely <em>one</em> cohesive piece of software, rather they tend to be a <em>series</em> of systems glued together. What I am saying is that a simple master/slave system is not distributed. Through a series of bolts, glue, add-ons, and duct tape you can make a master/slave system sufficiently distributed, but don&#8217;t confuse the totality of that system with the base master/slave system.</p>
]]></content:encoded>
			<wfw:commentRss>http://stu.mp/2009/12/distributed-vs-fault-tolerant-systems.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It&#039;s not the language stupid</title>
		<link>http://stu.mp/2008/04/its-not-the-language-stupid.html</link>
		<comments>http://stu.mp/2008/04/its-not-the-language-stupid.html#comments</comments>
		<pubDate>Tue, 15 Apr 2008 19:49:13 +0000</pubDate>
		<dc:creator>joestump</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[db]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://www.joestump.net/2008/04/its-not-the-language-stupid.html</guid>
		<description><![CDATA[I&#8217;ve said it once, I&#8217;ve said it twice, I&#8217;ve screamed it from the top of mountains and yet nobody listens. I&#8217;m sitting in a session at the MySQL Conference and the person presenting just said, &#8220;You have to have well written code to avoid bottlenecks.&#8221; This is, put bluntly, stupid and patently false. Let me [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve said it once, I&#8217;ve said it twice, I&#8217;ve screamed it from the top of mountains and yet nobody listens. I&#8217;m sitting in a session at the MySQL Conference and the person presenting just said, &#8220;You have to have well written code to avoid bottlenecks.&#8221; This is, put bluntly, stupid and patently false. Let me explain.</p>
<ul>
<li>Your true bottlenecks when scaling are very rarely, if ever, <a href="http://valleywag.com/366630/true-confessions-of-the-worlds-busiest-websites">because of your language</a>. Sure, Ruby is slower than PHP or Perl or Python, but only incrementally so, and it&#8217;s only going to get faster. Even if your language is your problem, it&#8217;s the easiest part of your architecture to scale: add more hardware.</li>
<li>Just because your code is well written doesn&#8217;t mean it will perform well and, conversely, just because you write shitty code doesn&#8217;t mean your code will not perform well. I&#8217;ve seen some seriously shitty PHP code that&#8217;s blazing fast because it&#8217;s so simple.</li>
<li>Depending on your application, as you grow you&#8217;ll find that your scaling issues come down to one fundamental problem: I/O. DB I/O, file system / disk I/O, network traffic, etc. Ask anyone who&#8217;s written a large-scale application where their growing pains were and I&#8217;ll bet my last dollar it wasn&#8217;t &#8220;PHP/Python/Ruby/Perl/Java/COBOL is slow&#8221;. I&#8217;m betting they&#8217;ll say something along the lines of &#8220;MySQL took a crap on us after we hit 200,000,000 records and had to do date range scans.&#8221; Or they&#8217;ll say, &#8220;I was storing user generated content and NFS couldn&#8217;t scale to the number of requests for that content.&#8221;</li>
</ul>
<p>I&#8217;m sick and tired of the language zealots who say PHP is slower than Perl, or Ruby is slower than PHP, or Java sucks. Which language you&#8217;re using has zero to do with that missing index on your table or the fact that you can&#8217;t store all of that user-generated content.</p>
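<p>The missing-index case is easy to see for yourself. The sketch below is an illustration under stated assumptions: it uses Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for MySQL, and the <code>events</code> table, its columns, and the index name are made up. It asks the database for its query plan on a date range scan before and after the index exists.</p>

```python
import sqlite3

# In-memory SQLite database standing in for a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO events (created_at, body) VALUES (?, ?)",
    [(f"2008-04-{day:02d}", "x") for day in range(1, 29)],
)

# A date range scan, the kind of query that hurts on a large unindexed table.
QUERY = "SELECT id FROM events WHERE created_at BETWEEN '2008-04-10' AND '2008-04-15'"


def query_plan(connection, sql):
    """Return SQLite's human-readable plan text for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn, QUERY)
conn.execute("CREATE INDEX idx_events_created_at ON events (created_at)")
plan_with_index = query_plan(conn, QUERY)

print(plan_without_index)  # reports a scan of the whole table
print(plan_with_index)     # reports a search using idx_events_created_at
```

<p>The application code that issues the query is identical in both cases, and it would be identical in PHP, Ruby, or Java. The only thing that changed is the schema.</p>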
<p>It comes down to your architecture and, despite what the zealots would have you believe, the language you choose is only one component of your overall architecture. Choose <a href="http://www.blogmaverick.com/2008/03/09/my-rules-for-startups/">what you know</a> and run with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://stu.mp/2008/04/its-not-the-language-stupid.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
