Untrusted data comes from many sources (users, third-party sites, even your own database) and all of it needs to be validated both on input and output.
The method of data sanitization depends on the type of data and the context in which it is used. Below are some common tasks in WordPress and how they should be sanitized.
Tip: It's best to do the output validation as late as possible, ideally at the moment the data is output, rather than further up in your script. This way you can always be sure that your data is properly validated/escaped, and you don't need to remember whether the variable has been previously validated.
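A minimal sketch of escaping at the point of output. esc_html() is the WordPress function listed further down this page; the fallback definition (built on PHP's htmlspecialchars()) is only a stand-in so the snippet runs outside WordPress, and render_greeting() and $name are hypothetical names used for illustration.

```php
<?php
// Stand-in so this sketch runs outside WordPress; inside WordPress the real
// esc_html() is already defined and this block is skipped.
if ( ! function_exists( 'esc_html' ) ) {
	function esc_html( $text ) {
		return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
	}
}

// Hypothetical helper: the value travels through the script unescaped and is
// escaped only in the statement that builds the HTML.
function render_greeting( $name ) {
	return '<p>Hello, ' . esc_html( $name ) . '</p>';
}

$name = '<script>alert(1)</script>'; // untrusted input
echo render_greeting( $name ), "\n";
// prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Because the escaping sits next to the echo, there is no need to track whether $name was cleaned earlier in the request.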
Note that many types of XML documents (as opposed to HTML documents) understand only a few named character references: apos, amp, gt, lt, quot. When outputting text to such an XML document, be sure to filter any text containing illegal named entities through WordPress's ent2ncr( $text ) function.
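wp_kses( (string) $fragment, (array) $allowed_html, (array) $protocols = null )
All untrusted HTML (post text, comment text, etc.) should be run through wp_kses().
To avoid having to pass an array of allowed HTML tags, you can use wp_kses_post( (string) $fragment ) for tags that are allowed in posts/pages or wp_kses_data( (string) $fragment ) for the small list of tags allowed in comments.
wp_rel_nofollow( (string) $html )
Adds a rel="nofollow" attribute to any <a> link.
wp_kses_allowed_html( (string) $context )
Returns the array of allowed HTML tags for a given context.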
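esc_html( $text ) (since 2.8)
Encodes < > & " ' (less than, greater than, ampersand, double quote, single quote). Very similar to esc_attr, except it applies the esc_html filter to the output.
esc_html__ (since 2.8)
Translates and encodes.
esc_html_e (since 2.8)
Translates, encodes, and echoes.
esc_textarea (since 3.1)
Encodes text for use inside a textarea element.
sanitize_text_field (since 2.9.0)
Sanitizes a string from user input or from the database.
esc_attr( $text ) (since 2.8)
Encodes < > & " ' (less than, greater than, ampersand, double quote, single quote). Very similar to esc_html, except it applies the attribute_escape filter to the output.
esc_attr__()
Translates and encodes.
esc_attr_e()
Translates, encodes, and echoes.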
esc_js( $text ) (since 2.8)
esc_url( $url, (array) $protocols = null ) (since 2.8)
Always use esc_url when sanitizing URLs (in text nodes, attribute nodes or anywhere else). Rejects URLs that do not have one of the provided whitelisted protocols (defaulting to http, https, ftp, ftps, mailto, news, irc, gopher, nntp, feed, and telnet), eliminates invalid characters, and removes dangerous characters. Replaces clean_url(), which was deprecated in 3.0.
esc_url_raw( $url, (array) $protocols = null ) (since 2.8)
For inserting a URL into the database. This function does not encode characters as HTML entities; use esc_url() for that. The same behavior was available in the old clean_url function by setting $context to db.
urlencode( $scalar )
Encodes a value for use in a URL (as a query parameter, for example).
urlencode_deep( $array )
URL-encodes all elements of an array.
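A small sketch of urlencode() applied to a query parameter, in plain PHP. The host example.com and the $search value are placeholders chosen for illustration.

```php
<?php
$search = 'cats & dogs?'; // untrusted value destined for a query string

// Encode only the parameter value, not the whole URL.
$url = 'https://example.com/?s=' . urlencode( $search );

echo $url, "\n"; // prints: https://example.com/?s=cats+%26+dogs%3F
```

Without the encoding, the & in the value would be read as the start of a second parameter.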
$wpdb->insert( $table, (array) $data )
$data should be unescaped (the function will escape them for you). Keys are columns, values are values.
$wpdb->update( $table, (array) $data, (array) $where )
$data should be unescaped. Keys are columns, values are values. $where should be unescaped. Multiple WHERE conditions are ANDed together.
$wpdb->update(
	'my_table',
	array( 'status' => $untrusted_status, 'title' => $untrusted_title ),
	array( 'id' => 123 )
);
$wpdb->prepare( $format, (scalar) $value1, (scalar) $value2, ... )
$format is a sprintf()-like format string. It only understands %s, %d and %f, none of which need to be enclosed in quotation marks.
$wpdb->get_var( $wpdb->prepare(
	"SELECT something FROM table WHERE foo = %s and status = %d",
	$name,  // an unescaped string (function will do the sanitization for you)
	$status // an untrusted integer (function will do the sanitization for you)
) );
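esc_sql( $sql )
Escapes a single string or array of strings for use in a SQL query. Glorified addslashes(). $wpdb->prepare is generally preferred because it corrects a few common formatting errors.
$wpdb->escape( $text )
Deprecated since 3.6. Use esc_sql() or $wpdb->prepare() instead.
$wpdb->escape_by_ref( &$text )
No return value, since the parameter is passed by reference.
$wpdb->esc_like( $text )
Sanitizes $text for use in a LIKE expression of a SQL query. Will still need to be SQL escaped (with one of the above functions).
like_escape( $string )
Deprecated since 4.0. Use $wpdb->esc_like() instead.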
validate_file( (string) $filename, (array) $allowed_files = "" )
Used to prevent directory traversal attacks, or to test a filename against a whitelist. Returns 0 if $filename represents a valid relative path. After validating, you must treat $filename as a relative path (i.e. you must prepend it with an absolute path), since something like /etc/hosts will validate with this function. Returns an integer greater than zero if the given path contains .., ./, or :, or is not in the $allowed_files whitelist. Be careful making boolean interpretations of the result, since false (0) indicates the filename has passed validation, whereas true (> 0) indicates failure.
Header splitting attacks are annoying since they are dependent on the HTTP client. WordPress has little need to include user generated content in HTTP headers, but when it does, WordPress typically uses whitelisting for most of its HTTP headers.
WordPress does use user generated content in HTTP Location headers, and provides sanitization for those.
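wp_redirect($location, $status = 302)
A safe way to redirect to any URL. Ensures the resulting HTTP Location header is legitimate.
wp_safe_redirect($location, $status = 302)
Even safer. Only allows redirects to whitelisted hosts.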
Many of the output sanitization functions above are also useful for input validation. In addition, WordPress uses the following functions.
sanitize_title( $title )
sanitize_user( $username, $strict = false )
Use $strict when creating a new user (though you should use the API for that).
balanceTags( $html ) or force_balance_tags( $html )
tag_escape( $html_tag_name )
sanitize_html_class( $class, $fallback )
is_email( $email_address )
array_map( 'absint', $array )
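A short sketch of the array_map( 'absint', $array ) idiom. absint() is a WordPress function; the fallback definition is a stand-in so the snippet runs outside WordPress, and $untrusted_ids is a made-up example value.

```php
<?php
// Stand-in so the sketch runs outside WordPress; WordPress defines absint() itself.
if ( ! function_exists( 'absint' ) ) {
	function absint( $maybeint ) {
		return abs( (int) $maybeint );
	}
}

$untrusted_ids = array( '7', '-3', '12abc', 'abc' ); // e.g. values taken from a request
$ids           = array_map( 'absint', $untrusted_ids );

var_dump( $ids === array( 7, 3, 12, 0 ) ); // bool(true)
```

Every element comes back as a non-negative integer. Values with no leading digits become 0, so code that treats 0 as "no ID" may still want to filter those out afterwards.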
Some other functions that may be useful to sanitize data input:
There are several different philosophies about how validation should be done. Each is appropriate for different scenarios.
Whitelist: accept data only from a finite list of known and trusted values.
When comparing untrusted data against the whitelist, it's important to make sure that strict type checking is used. Otherwise an attacker could craft input in a way that will pass the whitelist but still have a malicious effect.
$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons (in PHP versions before 8)
if ( 1 === $untrusted_input ) { // == would have evaluated to true, but === evaluates to false
	echo '<p>Valid data</p>';
} else {
	wp_die( 'Invalid data' );
}

$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons (in PHP versions before 8)
$safe_values = array( 1, 5, 7 );
if ( in_array( $untrusted_input, $safe_values, true ) ) { // `true` enables strict type checking
	echo '<p>Valid data</p>';
} else {
	wp_die( 'Invalid data' );
}

$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons (in PHP versions before 8)
switch ( true ) {
	case 1 === $untrusted_input: // do your own strict comparison instead of relying on switch()'s loose comparison
		echo '<p>Valid data</p>';
		break;
	default:
		wp_die( 'Invalid data' );
}
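The same strict check applies to lists of strings. This is a minimal sketch in plain PHP; pick_orderby(), the column names, and the fallback value are hypothetical and only illustrate the pattern.

```php
<?php
// Hypothetical helper: return $untrusted only if it is exactly one of the
// allowed values, otherwise fall back to a known-safe default.
function pick_orderby( $untrusted, array $allowed = array( 'date', 'title', 'author' ), $default = 'date' ) {
	// The third argument makes in_array() compare with ===, so no type juggling occurs.
	return in_array( $untrusted, $allowed, true ) ? $untrusted : $default;
}

var_dump( pick_orderby( 'title' ) );               // string(5) "title"
var_dump( pick_orderby( 'title; DROP TABLE x' ) ); // string(4) "date"
var_dump( pick_orderby( 0 ) );                     // string(4) "date"
```

The last call shows why the strict flag matters: in PHP versions before 8, a loose comparison treats the integer 0 as equal to any non-numeric string, so 0 would have matched the list.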
Blacklist: reject data from a finite list of known untrusted values. This is very rarely a good idea.
Format detection: test to see if the data is of the correct format. Only accept it if it is.
if ( ! ctype_alnum( $data ) ) {
	wp_die( "Invalid format" );
}

if ( preg_match( "/[^0-9.-]/", $data ) ) {
	wp_die( "Invalid format" );
}
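As a further sketch of format detection, the check below accepts only strings shaped like YYYY-MM-DD and then confirms that the date exists. is_iso_date() is a hypothetical helper, not a WordPress function.

```php
<?php
// Hypothetical helper: true only for a real calendar date written as YYYY-MM-DD.
function is_iso_date( $data ) {
	// \A and \z anchor the whole string, so trailing junk or a trailing newline is rejected.
	if ( ! is_string( $data ) || ! preg_match( '/\A(\d{4})-(\d{2})-(\d{2})\z/', $data, $m ) ) {
		return false;
	}
	// checkdate( month, day, year ) rejects impossible dates such as 2023-02-30.
	return checkdate( (int) $m[2], (int) $m[3], (int) $m[1] );
}

var_dump( is_iso_date( '2024-02-29' ) );   // bool(true)
var_dump( is_iso_date( '2023-02-30' ) );   // bool(false)
var_dump( is_iso_date( "2024-02-29\n" ) ); // bool(false)
```

The pattern describes what is allowed rather than what is forbidden, and the data is rejected outright instead of being repaired.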
Format correction: accept almost any data, but remove or alter the dangerous pieces.
$trusted_integer = (int) $untrusted_integer;
$trusted_alpha = preg_replace( '/[^a-z]/i', "", $untrusted_alpha );
$trusted_slug = sanitize_title( $untrusted_slug );
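To make the effect concrete, here is a small plain-PHP sketch of the first two corrections applied to sample values. The input strings are invented for illustration.

```php
<?php
$untrusted_integer = '42; DROP TABLE users';
$untrusted_alpha   = 'abc<script>123';

$trusted_integer = (int) $untrusted_integer; // leading digits are kept, the rest is discarded
$trusted_alpha   = preg_replace( '/[^a-z]/i', '', $untrusted_alpha ); // everything except letters is removed

var_dump( $trusted_integer ); // int(42)
var_dump( $trusted_alpha );   // string(9) "abcscript"
```

The results are safe to use, but they are not necessarily what the user intended, which is the trade-off of correcting data instead of rejecting it.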
3.1: Introduced esc_textarea. (#15454)
3.0: Deprecated clean_url() in favor of esc_url() and esc_url_raw(). (#12309)
2.8: Deprecated the following functions:
sanitize_url() -> esc_url_raw()
wp_specialchars() -> esc_html() (also: esc_html__() and esc_html_e())
attribute_escape() -> esc_attr() (also: esc_attr__() and esc_attr_e())