A Quick Bitmask HOWTO for Programmers

Warning: Highly Geeky material follows.

I’m currently working on a large database for a customer. I’m actually redoing a crappy version of the database into a normalized screaming machine. I ran into a problem recently in that some of the values are stored as bitmasks. I knew what a bitmask was, but generally regarded them as voodoo magic left to crazy C hackers. Until now. I contacted my voodoo crazy C hacker mentor, Jeremy Brand, and asked him how they worked.

Here’s a quick tutorial:

instead of counting 0, 1, 2, 3, 4, 5… you could with powers of 2
instead 2^0, 2^1, 2^2, 2^3, 2^4, 2^5… (1, 2, 4, 8, 16, 32…).

When you lable something with one of these powers of two you can
add them to other powers of 2 and then later on divide out what
you started with again.

For example:

apple = 4
orange = 2
banana = 1
———–
sum = 7

The sum of your identifiers is 7 (lets call this $sum).

So, later on you can check 7 to see what is in your basket:

does $sum mod 4 equal 0? (then there is no apple in the cart)
does $sum mod 2 equal 0? (then there is no orange in the cart)
does $sum mod 1 email 0? (then there is no banana in the cart)

Using bitmasks isn’t neccesarily easier to use, but it is fast.
It’s fast because computers already thing in bits. This example
is using 3 bit memory. Typically you’ll have 16 options (like I
have 3 here) because of the size of an integer. If you’re lucky
you’ll be using C or even mysql that can access all 32 bits of an
integer then you’ll have 32 options.

The reason why computers use base-2 to to begin with are because
with storage hardware there is really only two states: on and off.
The more single units that you can store ons and offs the more
storage the device can have.

FYI, you can use google for your calculator:
eg. http://www.google.com/search?q=7+mod+4.

Finally, that makes sense. I’m still completely lost on the actual math that goes into making this work (which irks me), but I did manage to get a proof-of-concept program working, which may help other bitmask deficient programmers out there:

<?php

  // Copyright 2004 Joe Stump <joe@joestump.net>
  // Public Domain
  // Usage: php -q bitmask.php 1 4 32
  //        (any numeric argument will be evaluated - change to whatever)
  echo "Valid bitmask values (up to 16): n";
  for ($i = 1 ; $i < 16 ; ++$i) {
      echo "2 ^ $i = ".pow(2,$i)."n";
  }
  echo "nn";

  $bitmask = 0;
  $values = array();
  for ($i = 1 ; $i < count($argv) ; ++$i) {
      if (is_numeric($argv[$i])) {
          $bitmask += $argv[$i];
          $values[] = $argv[$i];
      }
  }

  echo "Bitmask Contains: ".implode(', ',$values)."n";
  echo "Bitmask Total: ".$bitmask."n";

  echo "nResults:n";
  $arr = array(1,2,4,8,16,32);
  for ($i = 0 ; $i < count($arr) ; ++$i) {
      echo $arr[$i].': '.((($bitmask & $arr[$i]) == 0) ? 'FALSE' : 'TRUE')."n";
  }

?>

The trick is the &. If the result is 0 (zero) then the single bitmask value is not present in the sum of the bitmask. I’m sure this has something with the fact that you cannot add any separate list of single bitmask values and get the same sum twice (ie. 1 + 2 + 4 = 7, 2 + 4 + 8 = 14). Again, I just know “it works[tm]“. I hope this helps someone else out there.

I’m mainly caching this here so the next time my retarded mind can’t wrap itself around bitmasks I can check back to my own site. If you have questions concerning this don’t email me because I still think little green men make this mathematic “trick” work. Damn, I wish I had gotten a Computer Science degree instead of a Computer Info. Sys. degree.

16 thoughts on “A Quick Bitmask HOWTO for Programmers

  1. Finally! I saw it a lot of times but couldn’t figure out it’s name, thanks

    You helped at least one bitmask deficient coder :)

  2. I can’t believe the types of people that call themselves programmers these days. And managers don’t know; to them, anyone who can throw some neat buzzwords around must know what they’re doing. And hence the overwhelming amount of crappy, substandard, amateur software systems out there, with poor documentation and even worse design. You sir, are a moron, and I hate your kind.

  3. You have no IDEA what kind of programmer I am. Before you shoot your mouth off you should probably take the time to get to know me. Just because I didn’t know what a bitmask was (or how to effectively use it) doesn’t mean that I’m not good with SQL, OOP, etc. At least I took it upon myself to learn how to use them and can recognize the fact that they are a more elegant solution to many problems.

    How about you come down on someone for trying to learn how to be a better programmer? You, sir, are a moron, and I hate you.

  4. I think the idea of bitmasks is something not everyone has to know. You can develop applications for years and never stumble across a need for them, especially when not using DBs.

    Bitmasks are usefull, but where I start to get nervous are bitshifts, e.g. in Java. They are even stranger (and I think the code tends to get very unreadable).

    ArenĀ“t Integer comparisons allways transformed into bitmasks by a decent compiler?

  5. So rather than dissing someone, how ’bout I just be helpful ;) The math you’re looking for behind all this is blindingly simple. Problem is, for some, they don’t speak binary. Suppose we have four bits. Not bytes, bits. So, line them up, and they look like this:
    0000 = 0
    0001 = 1
    0010 = 2
    0100 = 4
    1000 = 8
    Looking at that pattern, you’ll notice that the 1 is moving one space to the left (hence bit shift) for each new value.
    Not bad so far?

    OK, now lets talk about “and”-ing them. Bitwise “and”. What does that mean? Well, if you compare two *binary* numbers like:

    0010 = 4 decimal
    0011 = 5 decimal

    Look at the right-most number of each set of bits. The top one is 0 and the bottom one is 1. There are only two rules to remember:

    0 & 1 = 0 binary
    1 & 1 = 1 binary

    Remember this binary (base 2) math, not decimal math (base-10), and we don’t use “bitwise and (&)” in decimal math, only operators like (+,-,/, x, % among others).

    So “and” each column of bits and you get the result:

    0011 = 5 decimal
    0010 = 4 decimal
    0010 = 4 decimal = Result

    What good does that do? Well, just like your coffee maker, you just created a filter. Imagine if it looked like this:

    0011 = 5 decimal = A user’s “access rights”.
    0010 = 4 decimal = The permission needed to access this page.

    $60 000 question: If the user arrives at this page, do they see the page?
    But yes!! The second bit is set on their “access rights”.

    Hope that helps.

    Btw. I’ve worked in software a long time including some world reknown projects (NFS: Underground for example). I’ve learned, if someone’s got attitude, it goes along with insecurity. The best and the brightest are easy-going, approachable, and always ready to learn. No one knows everything about everything.

  6. Great tutorial!
    Still, I have two suggestions:

    a) You should bitwise OR (|) the values to compose the bitmask, instead of adding (just for historical reasons ;));
    b) To test, you should bitwise AND the bitmask with a value, and then compare to the same value being tested (instead of 0);

    Suppose you have the following bitmask:

    1010 –> bitmask (10 in decimal)

    And you want to test for:

    0110 –> value (6 in decimal)

    If you bitwise AND (&) the two, you’ll get:

    0010 –> bitmask & value (2 in decimal)

    Which is greater than zero, yelding TRUE. Yet the test value is not within the bitmask..

    So, your script works if the values being tested are powers of 2, but fails otherwise.

    You could easily make it fullproof by testing against the same value, like so:

    function testbitmask ($value, $bitmask){
    if (($value & $bitmask)==$value) return TRUE;
    }

    This function works whether your bitmask is composed of powers of 2, or any other combination of numbers.

    Note that there’s a reason for bitmasks to be made of powers of two. In the given example, the bitmask 1010 is supposed to represent the values 2 (0010) and 8 (1000), OR’ed together.

    If you add 10 (which is 1010 in binary) to the bitmask, you would get… 1010! How then could you know if the bitmask contains 2, 8 or 10? Well, you couldn’t.

    To prevent this problem, you’re only supposed to use powers of two on your bitmask, since these numbers shift one bit to the left sequentially, like so:

    2^0 = 1 = 0001
    2^1 = 2 = 0010
    2^2 = 4 = 0100
    2^3 = 8 = 1000

    But since in some situations it’s useful to use other numbers (you can, if you’re careful), your scripts better be ready.. :)

    Hope this helps!

    As for “Mr.” John’s comment, I guess he must have been born taught or something… :P

  7. so i still have questions in regards to this…

    Scenario:

    91 | 92 = 95 (1011011 | 1011100 = 1011111)

    Test:
    95 & 91 = 91 CORRECT
    95 & 92 = 92 CORRECT
    95 & 93 = 93 ??
    95 & 5 = 5 ??
    95 & 6 = 4 ??

    I recognize that powers of 2 should be used to avoid just this complication. However I have a dev manager that is convinced that it can be done with any set of numbers. Could someone illuminate for me how i can use non powers of 2 in a bitmask and still be able to avoid false positives?

    Thank you!!

    Rusty

  8. so i still have questions in regards to this…

    Scenario:

    91 | 92 = 95 (1011011 | 1011100 = 1011111)

    Test:
    95 & 91 = 91 CORRECT
    95 & 92 = 92 CORRECT
    95 & 93 = 93 ??
    95 & 5 = 5 ??
    95 & 6 = 4 ??

    I recognize that powers of 2 should be used to avoid just this complication. However I have a dev manager that is convinced that it can be done with any set of numbers. Could someone illuminate for me how i can use non powers of 2 in a bitmask and still be able to avoid false positives?

    Thank you!!

    Rusty

  9. You need to use bitmasks.

    If you want more options, use a larger bit mask field (eg 0000000 instead of just 0000).

  10. One thing I will add, the main problem with these (and I dont like it much) is that you can run out of bits/flags you can assign quickly.

    For example, a 32 bit number means you can only have up to 32 different permission flags, or items in your cart etc.

    You wouldnt use a bitmask to handle what items a user has in their shopping cart, as there would be 1000’s of items (and to have 1000s of items supported, you would need a 1000bit+ bitmask to handle each item).

  11. Thank you. Very nice tutorial and great responses! Except for the one guy who should die and clear out the gene pool a bit….

  12. Thanks! I have been tasked with working with a db that uses bit masks and almost drew a blank. I have completely forgotten how it all came together. This article totally recalled the bit thought pattern for me!

    Thank again!

  13. I think John has a point, although he is a moron. A true programmer understands the underlying concepts, this should be learnt when you first start programming. Calling yourself (in the literal sense, not the author) a professional when you don’t know the basic principles is like calling yourself a mechanic and not knowing that cars use combustion engines but happily repairing engines because they have using parts. I wouldn’t want that mechanic to work on my car. Note that in my contrived analogy the mechanic does not need to know thermodynamics, simply how in general it works, likewise a programmer needn’t be expected to use microcode or assembler.

  14. Im a junior comp sci student, and i had never heard of bitmask till today. It came about at my job @ IT when i asked one of our “geniuses” for a tutorial on something. Just like it was mentioned in a previous post, the best in the field are the most helpful and humble. Ive learned that insecure people walking around telling illiterates that they are hackers and programmers have John’s attitude and end up failing because of it. Team work is the best learning environment in my opinion. It is impossible to know everything.

  15. Hello, I stumbled upon this looking for a refresher (I saw this once before in a class and this would be my first time applying it).

    I may be confused but I think there may be a problem with your laymen algorithm:

    Our cart can have the following items in it (with the values provided in your example.)
    Lets assume only an orange and apple is in the cart

    Sum = 6
    sum mod 4 0 (an apple is in the cart)
    sum mod 2 = 0 (an orange is NOT in the cart) -> ERROR

    But what if we only had an apple in the cart

    Is this not working because we are dealing with integers in the example? (I was under the impression that the compiler and the speed of bitshifting within a processor would take care of the conversion).

    If a bitmask can be applied with integers, I suspect I need to apply more than a mod to obtain the result. Can you tell me what I’m missing?

    Thanks.

    PS: Sorry for writing this about 3 years late ; >)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>