Escape Output

Another cornerstone of web application security is the practice of escaping output: escaping or encoding special characters so that their original meaning is preserved. For example, O'Reilly is represented as O\'Reilly when being sent to a MySQL database. The backslash before the apostrophe is there to preserve it; the apostrophe is part of the data and not meant to be interpreted by the database.

As with filtering input, when I refer to escaping output, I am really describing three different steps:

  • Identifying output

  • Escaping output

  • Distinguishing between escaped and unescaped data

To escape output, you must first identify output. In general, this is much easier than identifying input because it relies on an action that you take. For example, to identify output being sent to the client, you can search for strings such as the following in your code:

  • echo

  • print

  • printf

  • <?=

As the developer of an application, you should be aware of every case in which you send data to a remote system. These cases all constitute output.
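Searching for these constructs can also be automated. The following is only a rough sketch: find_output_lines( ) is a hypothetical helper name, and it relies on PHP's built-in tokenizer (token_get_all( )) rather than a plain text search, so that the word echo inside a string or comment is not reported. It treats any identifier named printf as output, which may also match methods with that name.

```php
<?php
// Hypothetical helper: return the line numbers in a PHP source string that
// contain an output construct (echo, print, printf, or the short echo tag).
function find_output_lines(string $source): array
{
    $lines = array();

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ";" carry no line number
        }

        list($id, $text, $line) = $token;

        $is_output = in_array($id, array(T_ECHO, T_PRINT, T_OPEN_TAG_WITH_ECHO), true)
            || ($id === T_STRING && strtolower($text) === 'printf');

        if ($is_output) {
            $lines[] = $line;
        }
    }

    return array_values(array_unique($lines));
}

// Illustrative input only; in practice you would pass file_get_contents($path).
$sample = "<?php\necho 'a';\nprint 'b';\nprintf('%s', 'c');\n?>\n<?= \$d ?>\n";
print_r(find_output_lines($sample));
```

A scan like this only finds output sent to the client. Data sent to a database or another remote system still has to be identified by reviewing the calls that send it.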

Like filtering, escaping is a process that is unique for each situation. Whereas filtering is unique according to the type of data you're filtering, escaping is unique according to the type of system to which you're sending data.

For most common destinations (including the client, databases, and URLs), there is a native escaping function that you can use. If you must write your own, it is important to be exhaustive. Find a reliable and complete list of every special character in the remote system and the proper way to represent each character so that it is preserved rather than interpreted.
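As a small illustration of how the escaping function depends on the destination, the sketch below escapes the same value once for a URL and once for HTML. The value and the search.php path are made up for the example. Database escaping is left out because the native functions for it (for instance mysqli_real_escape_string( )) need an open connection.

```php
<?php
// Example value containing characters that are special in URLs and in HTML.
$name = "O'Reilly & Sons";

// Destination: a URL query string. rawurlencode() percent-encodes the
// characters that have special meaning in a URL.
$url = 'search.php?q=' . rawurlencode($name);

// Destination: the client (HTML). The URL is itself data being placed into
// HTML, so it is escaped again for that context, as is the link text.
$link = '<a href="' . htmlentities($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlentities($name, ENT_QUOTES, 'UTF-8') . '</a>';

echo $link, "\n";
```

The same data ends up with two different representations because each destination interprets a different set of characters as special.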

The most common destination is the client, and htmlentities( ) is the best function for escaping data to be sent to the client. Like most string functions, it takes a string and returns the modified version of the string. However, the best way to use htmlentities( ) is to specify the two optional arguments: the quote style (the second argument) and the character set (the third argument). The quote style should always be ENT_QUOTES in order for the escaping to be most exhaustive, and the character set should match the character set indicated in the Content-Type header that your application includes in each response.
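To make the effect of the quote style concrete, here is a minimal sketch that assumes the application serves UTF-8 HTML. It compares ENT_COMPAT, which leaves single quotes untouched, with ENT_QUOTES, and it reuses one $charset variable (a name chosen for this example) for both the header and the escaping call so the two cannot drift apart.

```php
<?php
// Assumption: responses are UTF-8 HTML. One variable feeds both the header
// and htmlentities() so the character sets always match.
$charset = 'UTF-8';

if (!headers_sent()) {
    header("Content-Type: text/html; charset=$charset");
}

$raw = "It's <b>\"bold\"</b>";

// ENT_COMPAT escapes double quotes but leaves the apostrophe as-is.
$compat = htmlentities($raw, ENT_COMPAT, $charset);

// ENT_QUOTES escapes both kinds of quote, so the result is also safe inside
// an attribute delimited by single quotes.
$quotes = htmlentities($raw, ENT_QUOTES, $charset);

echo $compat, "\n";
echo $quotes, "\n";
```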

To distinguish between escaped and unescaped data, I advocate the use of a naming convention. For data to be sent to the client, the convention I use is to store all data escaped with htmlentities( ) in $html, an array that is initialized to an empty array and contains only data that has been both filtered and escaped:

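<?php

// $html holds only data that has been both filtered and escaped.
$html = array();

// $clean['username'] is assumed to have been filtered already.
$html['username'] = htmlentities($clean['username'],
                                 ENT_QUOTES, 'UTF-8');

echo "<p>Welcome back, {$html['username']}.</p>";

?>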
