Formatting User-Submitted Text
March 2, 2008 – 7:51 pm
If you are a PHP and/or PBBG developer, then I’m sure you know about the problems associated with user-submitted text. If it is not sanitized properly, your website could fall victim to XSS attacks and SQL injections. In this article I will discuss the methods I use to protect my sites.
PHP already has several useful functions that can be used to sanitize strings.
nl2br()
The first function I want to talk about is nl2br(). It doesn’t really help with security, but it is great for readability when you need to display text. It inserts a br tag before each newline in the string.
When it should be used: This function should only be used when data is being displayed in a non-editable form, such as in a forum or a user profile. When the data is displayed in an editable form such as a textarea, or when it is being added to the database, you don’t want to use this function because you want to preserve the original text. If the text is being edited by a user, they might wonder why there are suddenly HTML tags all over their text.
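As a minimal sketch of that display-only rule (the variable names and the sample text are made up for illustration), the stored text stays untouched and nl2br() is applied only on the way out:

```php
<?php
// Hypothetical user input containing a line break, stored exactly as submitted.
$stored = "First line\nSecond line";

// Convert newlines to <br /> only at display time; $stored is not modified.
$forDisplay = nl2br($stored);
echo $forDisplay, "\n";

// In an editable form, output the original text so the user sees no <br /> tags.
// htmlspecialchars() keeps the textarea markup intact if the text contains HTML.
echo '<textarea>', htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), '</textarea>', "\n";
```

Here $forDisplay is "First line<br />\nSecond line", while $stored still holds the text as the user typed it.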
strip_tags()
If you want to get rid of HTML in text, you can use this function. strip_tags() will attempt to remove all HTML and PHP tags. You can also choose which tags to allow! However, this function is not reliable and can have unwanted side effects. Even if you allow only ‘safe’ tags, the attributes of those tags are left untouched, so an allowed tag can still carry a dangerous attribute such as ‘onmouseover’.
When it should be used: If you really are not picky about security, you could use this function as a very primitive way of removing HTML when the text is being inserted into the database. There is no need to hold it back until the text is displayed, since the HTML tags are meant to be removed permanently, not preserved. However, this function still cannot protect you from SQL injections, and on its own it does not reliably stop XSS attacks.
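The attribute problem is easy to reproduce. The following is only a sketch with a made-up input string; it shows that an allowed tag keeps its attributes and that the text inside a removed tag is left behind:

```php
<?php
// Hypothetical malicious input: a "safe" tag carrying an event-handler attribute,
// followed by a script element.
$input = '<b onmouseover="alert(1)">bold</b><script>alert(2)</script>';

// No allowed tags: every tag is removed, but the text between the tags stays.
$plain = strip_tags($input);
echo $plain, "\n";

// Allow <b>: the tag survives together with its onmouseover attribute.
$withBold = strip_tags($input, '<b>');
echo $withBold, "\n";
```

The second result still contains the onmouseover handler, which is why allowing tags through strip_tags() is not enough to make user HTML safe.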
htmlentities() and htmlspecialchars()
If you don’t want to remove the HTML tags, but instead display them, you can use either of these functions. They convert characters into their corresponding HTML entities, so the browser shows the markup as text instead of interpreting it. The difference between the two functions is that htmlspecialchars() converts only a limited set of characters (&, <, >, and, depending on the flags, double and single quotes; see the PHP manual), while htmlentities() converts every character that has a named HTML entity.
When it should be used: These functions should be used when the text is being displayed, not processed, for the same reasons as for nl2br(): you’ll probably want to preserve the original text. Remember to pass ENT_QUOTES as the flags parameter so that both double and single quotes are converted; ENT_NOQUOTES does the opposite and leaves both unconverted.
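To make the effect of the quote flags concrete, here is a small sketch with made-up input strings. ENT_QUOTES is the flag that converts both quote styles, and ENT_NOQUOTES leaves them alone:

```php
<?php
// Hypothetical input mixing tags, both quote styles, and an ampersand.
$input = '<a href="x" title=\'y\'>Tom & Jerry</a>';

// ENT_QUOTES converts double and single quotes as well as &, < and >.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $quotes, "\n";

// ENT_NOQUOTES converts &, < and > but leaves both kinds of quotes untouched.
$noQuotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');
echo $noQuotes, "\n";

// htmlentities() additionally converts characters that have a named entity.
$word = 'café';
$special = htmlspecialchars($word, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($word, ENT_QUOTES, 'UTF-8');
echo $special, ' / ', $entities, "\n";
```

Passing the character set explicitly (UTF-8 here, an assumption about the site) avoids depending on the default, which differs between PHP versions.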
addslashes()
This function adds backslashes to text to escape single quotes, double quotes, backslashes, and NUL bytes.
When it should be used: addslashes() should be used on GET/POST/REQUEST/COOKIE data if magic_quotes_gpc is off. If it is on, backslashes will be added automatically. addslashes() should be used when you are inserting data into the database. This function is useful because it escapes quotes, which could otherwise let the original data break out of any SQL query you build with it.
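A short sketch of what the escaping looks like in practice. The table name, column name, and input are hypothetical, and the query is only built as a string, never executed. Note that magic_quotes_gpc was removed in PHP 5.4, so on current PHP versions no backslashes are added automatically:

```php
<?php
// Hypothetical input that tries to close the string literal in a query.
$input = "x' OR '1'='1";

// Unescaped, the quote ends the literal early and changes the query's meaning.
$unsafeQuery = "SELECT * FROM users WHERE name = '$input'";
echo $unsafeQuery, "\n";

// addslashes() puts a backslash in front of each quote.
$escaped = addslashes($input);
$escapedQuery = "SELECT * FROM users WHERE name = '$escaped'";
echo $escapedQuery, "\n";
```

This only illustrates the string transformation; whether backslash escaping is honored depends on the database and its settings.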
stripslashes()
stripslashes() removes the escaping backslashes from a string. Double backslashes become a single backslash.
When it should be used: stripslashes() should be used on all GET/POST/REQUEST/COOKIE data if and only if magic_quotes_gpc is on and you want to display that data immediately. Otherwise, stripslashes() is only needed when you are displaying data from the database that still contains escaping backslashes, for example because it was escaped twice (by both magic quotes and addslashes()). Text that was escaped once for the query is stored without the extra backslashes.
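The following sketch, using made-up strings, shows the round trip and also why stripslashes() should not be applied to text that was never escaped:

```php
<?php
// Original text containing a quote and a single backslash.
$original = "It's a \\ test";

// Escaping and then unescaping returns the original text.
$escaped = addslashes($original);
$roundTrip = stripslashes($escaped);
echo $escaped, "\n", $roundTrip, "\n";

// Applying stripslashes() to text that was never escaped eats real backslashes.
$neverEscaped = 'C:\temp';
$damaged = stripslashes($neverEscaped);
echo $damaged, "\n";
```

The last call turns C:\temp into C:temp, which is the kind of silent damage that happens when the function is used on data that carries no escaping.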
This function is supposed to take care of SQL injections. It will escape all special characters in any value that you pass as a parameter.
When it should be used: This function should be used when you are inserting user-submitted text into the database. This function should not be used in conjunction with addslashes(). Any quote escaping will be done automatically by this function. I haven’t personally used this function before so I don’t know how effective it is.
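The name of this function does not appear in the text above, so the sketch below does not guess at it. Instead it swaps in a different, related technique: a prepared statement with PDO, where the database driver handles the quoting and no manual escaping is needed. The table, the column, and the input are hypothetical, and the example assumes the pdo_sqlite extension is available so that it can run against an in-memory database:

```php
<?php
// In-memory SQLite database so the sketch runs without a database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical hostile input that would break a naively concatenated query.
$body = "Robert'); DROP TABLE comments; --";

// The value is bound as a parameter, so it is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $body]);

// The text comes back exactly as submitted, and the table still exists.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
$count = (int) $pdo->query('SELECT COUNT(*) FROM comments')->fetchColumn();
echo $stored, "\n", $count, "\n";
```

Because the value travels separately from the query text, it should not be passed through addslashes() first; doing so would store the extra backslashes.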
HTMLPurifier
HTMLPurifier is a library for cleaning up HTML. You choose which tags to allow, or none at all, and the library will take care of the rest. I like to think of this as an advanced and more useful strip_tags() function.
When it should be used: HTMLPurifier’s functions should be used whenever data is being processed and added into the database, so that when the text is displayed, there won’t be any faulty code or hidden HTML tags. HTMLPurifier is very useful for protection against XSS attacks, and is also very flexible, allowing your users to use HTML tags safely.
Of course, these aren’t the only solutions available! There are plenty of other functions, and you could also make your own functions to sanitize strings.
If you have your own methods of sanitizing user-submitted text, please leave a comment and share your methods with us!

2 Responses to “Formatting User-Submitted Text”
Great Post! Thanks!
By mobeamer on Mar 3, 2008
I see that most problems that come from SQL attacks are because developers leave too much freedom to users. If you can make it so that the only type of data accepted is numbers and letters, then filtering is quite easy, right? By allowing many features on your site, like uploading avatar images or accepting anything but numbers or letters, you will sooner or later run into problems. Don’t do it!
By overklokan on Mar 30, 2008