Checking large sites for PHP parse errors

I now work for a large site. It’s not unfamiliar territory, but I’m definitely getting back into the swing of running a large site (we’ll do over 10 million requests during our busy months). We’re in the middle of migrating our websites in-house onto larger servers that we’ll manage ourselves. It’s a daunting task, because our company runs lots of smaller sites we’ve either picked up or somewhat abandoned along the way. These sites were written in older versions of PHP (probably PHP3) by the owner of the company as he was learning the language. As a result, I’m having to go back and fix include paths and the like as we migrate to the new server. But how exactly do you automate the task of checking a site with hundreds to tens of thousands of pages for parse errors?

A friend of mine, Ian, whom I do a lot of “bouncing ideas off of,” had a great idea. Basically, you turn on log_errors in your php.ini file, run tail -f /path/to/error.log | grep PHP, and then point a garden-variety link checker at your site. As the link checker crawls your site, it executes every page it can reach, and any parse errors it triggers will show up in your error log.
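Something like this, as a rough sketch (the paths are placeholders, and I’m using wget’s spider mode to stand in for whatever link checker you prefer):

    # In one terminal, watch the error log for PHP errors as they appear
    tail -f /path/to/error.log | grep PHP

    # In another terminal, crawl the site so every page actually gets executed
    wget --spider --recursive http://www.example.com/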

There you have it: a simple and easy way to check a large site for parse errors. Set your error_reporting to E_ALL to catch notices and other problems in your code as well.
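For reference, the relevant php.ini directives look roughly like this (the log path is just an example):

    ; php.ini
    error_reporting = E_ALL
    log_errors = On
    error_log = /path/to/error.log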

4 thoughts on “Checking large sites for PHP parse errors”

  1. Wouldn’t it be wiser to write a recursive bash script that simply ran php -l $filename against all the files (see the sketch after this thread)? A link checker is fine, but what about the areas of a website that are behind some kind of security? The only thing I think is good about the link checker idea is that you can find all the possible E_NOTICEs and such, but for parse errors, I’d recommend php -l against your PHP files.

    Your loving one-man show,
    Mr. Bill

  2. It would be a good idea, and I did think about that; however, many of my scripts (and I can’t possibly be alone here) depend on GET and POST arguments. You can’t adequately replicate /path/to/script.php?var=val with php -l.

    For instance, what if $var determines include paths, template paths, etc.? Hence this workaround rather than php -l.

  3. I suppose if you have include paths coming from GET and POST, you have more to worry about than parse errors. I might suggest to you, being the PHP guru I’ve known and loved, creating an open-source tool that combines php -l with a makeshift link tester and log analyzer.

    Good luck,
    Mr. Bill


  4. Well, let me clarify. I don’t use those GET variables for include paths; I don’t use GET vars for that at all. But I *DO* use $_SERVER['SERVER_NAME'] to run multiple sites from a single code base (the server name determines the site-specific templates and configs), and that isn’t available when running php -l from the shell.

    You are right: opening files and includes based on GET args is a dangerous and bad habit.
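For completeness, here’s a minimal sketch of the recursive php -l approach Mr. Bill describes (assuming the PHP CLI is installed and your code lives under /path/to/site):

    # Lint every .php file under the site root and show only the failures
    find /path/to/site -name '*.php' -print0 | xargs -0 -n1 php -l 2>&1 | grep -v 'No syntax errors'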
