TYPO3
7.6
|
Public Member Functions | |
splitIntoBlock ($tag, $content, $eliminateExtraEndTags=false) | |
splitIntoBlockRecursiveProc ($tag, $content, &$procObj, $callBackContent, $callBackTags, $level=0) | |
splitTags ($tag, $content) | |
getAllParts ($parts, $tag_parts=true, $include_tag=true) | |
removeFirstAndLastTag ($str) | |
getFirstTag ($str) | |
getFirstTagName ($str, $preserveCase=false) | |
get_tag_attributes ($tag, $deHSC=0) | |
split_tag_attributes ($tag) | |
checkTagTypeCounts ($content, $blockTags= 'a, b, blockquote, body, div, em, font, form, h1, h2, h3, h4, h5, h6, i, li, map, ol, option, p, pre, select, span, strong, table, td, textarea, tr, u, ul', $soloTags= 'br, hr, img, input, area') | |
bidir_htmlspecialchars ($value, $dir) | |
prefixResourcePath ($main_prefix, $content, $alternatives=array(), $suffix= '') | |
prefixRelPath ($prefix, $srcVal, $suffix= '') | |
cleanFontTags ($value, $keepFace=0, $keepSize=0, $keepColor=0) | |
mapTags ($value, $tags=array(), $ltChar= '<', $ltChar2= '<') | |
unprotectTags ($content, $tagList= '') | |
caseShift ($str, $flag, $cacheKey= '') | |
compileTagAttribs ($tagAttrib, $meta=array(), $xhtmlClean=0) | |
get_tag_attributes_classic ($tag, $deHSC=0) | |
indentLines ($content, $number=1, $indentChar=TAB) | |
HTMLparserConfig ($TSconfig, $keepTags=array()) | |
XHTML_clean ($content) | |
processTag ($value, $conf, $endTag, $protected=0) | |
processContent ($value, $dir, $conf) | |
stripEmptyTags ($content, $tagList=null, $treatNonBreakingSpaceAsEmpty=false) | |
Static Public Member Functions | |
static | getSubpart ($content, $marker) |
static | substituteSubpart ($content, $marker, $subpartContent, $recursive=true, $keepMarker=false) |
static | substituteSubpartArray ($content, array $subpartsContent) |
static | substituteMarker ($content, $marker, $markContent) |
static | substituteMarkerArray ($content, $markContentArray, $wrap= '', $uppercase=false, $deleteUnused=false) |
static | substituteMarkerAndSubpartArrayRecursive ($content, array $markersAndSubparts, $wrap= '', $uppercase=false, $deleteUnused=false) |
Public Attributes | |
const | VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr' |
Protected Member Functions | |
stripEmptyTagsIfConfigured ($value, $configuration) | |
Protected Attributes | |
$caseShift_cache = array() | |
Functions for parsing HTML. You are encouraged to use this class in your own applications
Definition at line 24 of file HtmlParser.php.
bidir_htmlspecialchars | ( | $value, | |
$dir | |||
) |
Converts htmlspecialchars forth ($dir=1) AND back ($dir=-1)
string | $value | Input value |
int | $dir | Direction: forth ($dir=1, or $dir=2 to preserve existing entities) and back ($dir=-1) |
Definition at line 882 of file HtmlParser.php.
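The three directions can be illustrated with PHP's own functions. This is a minimal sketch, not the code from HtmlParser.php: the function name sketchBidirHsc is hypothetical, and returning the input unchanged for any other $dir value is an assumption.

```php
<?php
/**
 * Hypothetical stand-in for the direction switch described above.
 * Built only on PHP's htmlspecialchars() family.
 */
function sketchBidirHsc(string $value, int $dir): string
{
    if ($dir === 1) {
        // Forth: encode everything, including already existing entities.
        return htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
    }
    if ($dir === 2) {
        // Forth, but leave existing entities such as &amp; untouched.
        return htmlspecialchars($value, ENT_COMPAT, 'UTF-8', false);
    }
    if ($dir === -1) {
        // Back: decode the special characters again.
        return htmlspecialchars_decode($value, ENT_COMPAT);
    }
    // Assumption: any other direction leaves the value as it is.
    return $value;
}
```

With $dir=1 an existing "&amp;" is encoded a second time, which is the case $dir=2 is meant to avoid.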
caseShift | ( | $str, | |
$flag, | |||
$cacheKey = '' |
|||
) |
Internal function for case shifting of a string or whole array
mixed | $str | Input string/array |
bool | $flag | If $str is a string AND this boolean(caseSensitive) is FALSE, the string is returned in uppercase |
string | $cacheKey | Key string used for internal caching of the results. Could be an MD5 hash of the serialized version of the input $str if that is an array. |
Definition at line 1116 of file HtmlParser.php.
checkTagTypeCounts | ( | $content, | |
$blockTags = 'a, b, blockquote, body, div, em, font, form, h1, h2, h3, h4, h5, h6, i, li, map, ol, option, p, pre, select, span, strong, table, td, textarea, tr, u, ul' , |
|||
$soloTags = 'br, hr, img, input, area' |
|||
) |
Checks whether block/solo tags are found in the correct amounts in HTML content. Block tags are tags which are required to have an equal number of start and end tags, eg. "<table>...</table>". Solo tags are tags which are required to have ONLY start tags (possibly with an XHTML ending like ".../>").
NOTICE: Correct XHTML might actually fail since "<br></br>" is allowed as well as "<br/>". However, only the LATTER is accepted by this function (with "br" in the "solo-tag" list); the first example will result in a warning.
NOTICE: Correct XHTML might actually fail since "<p/>" is allowed as well as "<p></p>". However, only the LATTER is accepted by this function (with "p" in the "block-tag" list); the first example will result in an ERROR!
NOTICE: Correct HTML version "something" allows eg. <p> and <li> to be NON-ended (implicitly ended by other tags). However, this is NOT accepted by this function (with "p" and "li" in the block-tag list) and it will result in an ERROR!
string | $content | HTML content to analyze |
string | $blockTags | Tag names for block tags (eg. table or div or p) in lowercase, commalist (eg. "table,div,p") |
string | $soloTags | Tag names for solo tags (eg. img, br or input) in lowercase, commalist ("img,br,input") |
Definition at line 490 of file HtmlParser.php.
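The block-tag rule (equal numbers of start and end tags) can be sketched as a simple count. This is an illustrative simplification, not the routine from HtmlParser.php: the function name sketchCountTagMismatches and the shape of the returned array are assumptions, and solo tags and the "<p/>" case are not handled.

```php
<?php
/**
 * Hypothetical helper: reports block tags whose start and end tag
 * counts differ. Returns an empty array when everything is balanced.
 */
function sketchCountTagMismatches(string $content, string $blockTags = 'p,div,table'): array
{
    $errors = [];
    foreach (array_filter(array_map('trim', explode(',', $blockTags))) as $tag) {
        $name = preg_quote($tag, '/');
        // Start tags: "<p>" or "<p attr=...>", but not "<pre>".
        $starts = preg_match_all('/<' . $name . '(\s[^>]*)?>/i', $content);
        // End tags: "</p>".
        $ends = preg_match_all('/<\/' . $name . '\s*>/i', $content);
        if ($starts !== $ends) {
            $errors[$tag] = ['start' => $starts, 'end' => $ends];
        }
    }
    return $errors;
}
```

A non-ended "<p>", which plain HTML tolerates, shows up here as a mismatch, in line with the third NOTICE above.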
cleanFontTags | ( | $value, | |
$keepFace = 0 , |
|||
$keepSize = 0 , |
|||
$keepColor = 0 |
|||
) |
Cleans up font tags in the input $value. If $keepFace, $keepSize or $keepColor is set, font tags with an allowed attribute are kept; otherwise they are deleted.
string | $value | HTML content with font-tags inside to clean up. |
bool | $keepFace | If set, keep "face" attribute |
bool | $keepSize | If set, keep "size" attribute |
bool | $keepColor | If set, keep "color" attribute |
Definition at line 1023 of file HtmlParser.php.
compileTagAttribs | ( | $tagAttrib, | |
$meta = array() , |
|||
$xhtmlClean = 0 |
|||
) |
Compiling an array with tag attributes into a string
array | $tagAttrib | Tag attributes |
array | $meta | Meta information about these attributes (like if they were quoted) |
bool | $xhtmlClean | If set, then the attribute names will be set in lower case, value quotes in double-quotes and the value will be htmlspecialchar()'ed |
Definition at line 1148 of file HtmlParser.php.
Referenced by RteHtmlParser\divideIntoLines().
get_tag_attributes | ( | $tag, | |
$deHSC = 0 |
|||
) |
Returns an array with all attributes as keys. Attributes are only lowercase a-z. If an attribute is empty (shorthand), then the value for the key is empty. You can check if it existed with isset().
string | $tag | Tag: $tag is either a whole tag (eg '<TAG option="" attrib="VALUE">') or the parameterlist (ex ' OPTION ATTRIB=VALUE>') |
bool | $deHSC | If set, the attribute values are de-htmlspecialchar'ed. Should actually always be set! |
Definition at line 408 of file HtmlParser.php.
Referenced by RteHtmlParser\divideIntoLines().
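The return shape described above (lowercase keys, empty string for shorthand attributes, optional de-htmlspecialchar'ing) can be sketched with a single regular expression. The function name sketchTagAttributes is hypothetical and the parsing is deliberately simpler than whatever HtmlParser.php does internally.

```php
<?php
/**
 * Hypothetical attribute parser returning array(name => value).
 * Accepts a whole tag or just the parameter list.
 */
function sketchTagAttributes(string $tag, bool $deHSC = false): array
{
    // Drop "<" plus the element name, then a trailing ">" or "/>".
    $list = preg_replace('/^<\s*[^\s>]+/', '', trim($tag));
    $list = rtrim($list, "/> \t\r\n");

    // Name, optionally followed by a double-quoted, single-quoted or bare value.
    $pattern = '/([a-zA-Z][\w:-]*)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+)))?/';
    preg_match_all($pattern, $list, $matches, PREG_SET_ORDER);

    $attributes = [];
    foreach ($matches as $match) {
        // Only one of the three value groups can be non-empty.
        $value = ($match[2] ?? '') . ($match[3] ?? '') . ($match[4] ?? '');
        $attributes[strtolower($match[1])] = $deHSC ? htmlspecialchars_decode($value) : $value;
    }
    return $attributes;
}
```

For a shorthand attribute such as "option" the key exists with an empty value, so isset($attributes['option']) is the way to detect it.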
get_tag_attributes_classic | ( | $tag, | |
$deHSC = 0 |
|||
) |
Get tag attributes, the classic version (which had some limitations?)
string | $tag | The tag |
bool | $deHSC | De-htmlspecialchar flag. |
Definition at line 1177 of file HtmlParser.php.
Referenced by RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_images_rte(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_preserve_db(), RteHtmlParser\TS_reglinks(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
getAllParts | ( | $parts, | |
$tag_parts = true , |
|||
$include_tag = true |
|||
) |
Returns an array with either tag or non-tag content of the result from ->splitIntoBlock()/->splitTags()
array | $parts | Parts generated by ->splitIntoBlock() or ->splitTags() |
bool | $tag_parts | Whether to return the tag-parts (default,TRUE) or what was outside the tags. |
bool | $include_tag | Whether to include the tags in the tag-parts (most useful for input made by ->splitIntoBlock()) |
Definition at line 335 of file HtmlParser.php.
Referenced by RteHtmlParser\removeTables().
getFirstTag | ( | $str | ) |
Returns the first tag in $str. Actually, everything from the beginning of $str is returned, so you had better make sure the tag is the first thing...
string | $str | HTML string with tags |
Definition at line 373 of file HtmlParser.php.
Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_preserve_db(), RteHtmlParser\TS_preserve_rte(), RteHtmlParser\TS_reglinks(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
getFirstTagName | ( | $str, | |
$preserveCase = false |
|||
) |
Returns the NAME of the first tag in $str
string | $str | HTML tag (The element name MUST be separated from the attributes by a space character! Just whitespace will not do) |
bool | $preserveCase | If set, then the tag is NOT converted to uppercase, but case is preserved. |
Definition at line 388 of file HtmlParser.php.
Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\TS_preserve_db(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
static getSubpart | ( | $content, | |
$marker | |||
) |
Returns the first subpart encapsulated in the marker, $marker (possibly present in $content as a HTML comment)
string | $content | Content with subpart wrapped in fx. "###CONTENT_PART###" inside. |
string | $marker | Marker string, eg. "###CONTENT_PART###" |
Definition at line 44 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
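The idea of a subpart, content enclosed by two occurrences of the same marker that may sit inside HTML comments, can be sketched as follows. The function name sketchGetSubpart is hypothetical, and the comment stripping is a simplification that assumes each marker sits in its own comment.

```php
<?php
/**
 * Hypothetical helper: returns the text between the first two
 * occurrences of $marker, or '' if the marker is not found twice.
 */
function sketchGetSubpart(string $content, string $marker): string
{
    $start = strpos($content, $marker);
    if ($start === false) {
        return '';
    }
    $start += strlen($marker);
    $stop = strpos($content, $marker, $start);
    if ($stop === false) {
        return '';
    }
    $inner = substr($content, $start, $stop - $start);
    // Markers wrapped in HTML comments leave remnants on both sides.
    $inner = preg_replace('/^[^<]*-->/', '', $inner);  // tail of the opening comment
    $inner = preg_replace('/<!--[^>]*$/', '', $inner); // head of the closing comment
    return $inner;
}
```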
HTMLparserConfig | ( | $TSconfig, | |
$keepTags = array() |
|||
) |
Converts TSconfig into an array for the HTMLcleaner function.
array | $TSconfig | TSconfig for HTMLcleaner |
array | $keepTags | Array of tags to keep (?) |
Definition at line 1210 of file HtmlParser.php.
Referenced by RteHtmlParser\getKeepTags(), and RteHtmlParser\RTE_transform().
indentLines | ( | $content, | |
$number = 1 , |
|||
$indentChar = TAB |
|||
) |
Indents input content with $number instances of $indentChar
string | $content | Content string, multiple lines. |
int | $number | Number of indents |
string | $indentChar | Indent character/string |
Definition at line 1191 of file HtmlParser.php.
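A minimal sketch of the indentation described above, assuming lines are separated by "\n". The function name sketchIndentLines is hypothetical, and a literal tab stands in for the TAB constant of the original signature.

```php
<?php
/**
 * Hypothetical helper: prefixes every line of $content with
 * $number repetitions of $indentChar.
 */
function sketchIndentLines(string $content, int $number = 1, string $indentChar = "\t"): string
{
    $prefix = str_repeat($indentChar, max(0, $number));
    $lines = explode("\n", $content);
    foreach ($lines as $index => $line) {
        $lines[$index] = $prefix . $line;
    }
    return implode("\n", $lines);
}
```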
mapTags | ( | $value, | |
$tags = array() , |
|||
$ltChar = '<' , |
|||
$ltChar2 = '<' |
|||
) |
This is used to map certain tag-names into other names.
string | $value | HTML content |
array | $tags | Array with tag key=>value pairs where key is from-tag and value is to-tag |
string | $ltChar | Alternative less-than char to search for (search regex string) |
string | $ltChar2 | Alternative less-than char to replace with (replace regex string) |
Definition at line 1061 of file HtmlParser.php.
Referenced by RteHtmlParser\defaultTStagMapping().
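The from-tag to to-tag mapping can be sketched with one regular expression per pair. This is an assumption-laden illustration: the function name sketchMapTags is hypothetical, the $ltChar and $ltChar2 parameters are left out, and pairs are applied one after another, so a mapping such as b to i followed by i to b would chain.

```php
<?php
/**
 * Hypothetical helper: renames start and end tags according to
 * $tags (from-tag => to-tag), leaving attributes untouched.
 */
function sketchMapTags(string $value, array $tags): string
{
    foreach ($tags as $from => $to) {
        // Match "<b" or "</b" only when followed by whitespace, "/" or ">",
        // so that "<br>" is not mistaken for "<b>".
        $pattern = '/<(\/?)' . preg_quote((string)$from, '/') . '(?=[\s\/>])/i';
        $value = preg_replace($pattern, '<${1}' . $to, $value);
    }
    return $value;
}
```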
prefixRelPath | ( | $prefix, | |
$srcVal, | |||
$suffix = '' |
|||
) |
Internal sub-function for ->prefixResourcePath()
string | $prefix | Prefix string |
string | $srcVal | Relative path/URL |
string | $suffix | Suffix string |
Definition at line 999 of file HtmlParser.php.
prefixResourcePath | ( | $main_prefix, | |
$content, | |||
$alternatives = array() , |
|||
$suffix = '' |
|||
) |
Prefixes the relative paths of hrefs/src/action in the tags [td,table,body,img,input,form,link,script,a] in the $content with the $main_prefix or an alternative given by $alternatives
string | $main_prefix | Prefix string |
string | $content | HTML content |
array | $alternatives | Array with alternative prefixes for certain of the tags. key=>value pairs where the keys are the tag element names in uppercase |
string | $suffix | Suffix string (put after the resource). |
Definition at line 904 of file HtmlParser.php.
processContent | ( | $value, | |
$dir, | |||
$conf | |||
) |
Processing content between tags for HTML_cleaner
string | $value | The value |
int | $dir | Direction, either -1 or +1. 0 (zero) means no change to input value. |
mixed | $conf | Not used, ignore. |
Definition at line 1434 of file HtmlParser.php.
processTag | ( | $value, | |
$conf, | |||
$endTag, | |||
$protected = 0 |
|||
) |
Processing all tags themselves (Some additions by Sacha Vorbeck)
string | $value | Tag to process |
array | $conf | Configuration array passing instructions for processing. If count()==0, function will return value unprocessed. See source code for details |
bool | $endTag | Set this if the tag is an end tag. |
bool | $protected | If set, just return value straight away |
Definition at line 1377 of file HtmlParser.php.
removeFirstAndLastTag | ( | $str | ) |
Removes the first and last tag in the string. Anything before the first tag and after the last tag is also removed.
string | $str | String to process |
Definition at line 356 of file HtmlParser.php.
Referenced by RteHtmlParser\divideIntoLines(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_preserve_db(), RteHtmlParser\TS_preserve_rte(), RteHtmlParser\TS_reglinks(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
split_tag_attributes | ( | $tag | ) |
Returns an array with the 'components' from an attribute list. The result is normally analyzed by get_tag_attributes. Removes the tag name if found.
string | $tag | The tag or attributes |
Definition at line 452 of file HtmlParser.php.
splitIntoBlock | ( | $tag, | |
$content, | |||
$eliminateExtraEndTags = false |
|||
) |
Returns an array with the $content divided by tag-blocks specified with the list of tags, $tag. Even numbers in the array are outside the blocks, odd numbers are block-content. Use ->getAllParts() and ->removeFirstAndLastTag() to process the content if needed.
string | $tag | List of tags, comma separated. |
string | $content | HTML-content |
bool | $eliminateExtraEndTags | If set, excessive end tags are ignored - you should probably set this in most cases. |
Definition at line 191 of file HtmlParser.php.
References GeneralUtility\trimExplode().
Referenced by RteHtmlParser\divideIntoLines(), RteHtmlParser\removeTables(), HtmlParser\splitIntoBlockRecursiveProc(), RteHtmlParser\transformStyledATags(), RteHtmlParser\TS_AtagToAbs(), RteHtmlParser\TS_links_db(), RteHtmlParser\TS_links_rte(), RteHtmlParser\TS_preserve_db(), RteHtmlParser\TS_preserve_rte(), RteHtmlParser\TS_reglinks(), RteHtmlParser\TS_transform_db(), and RteHtmlParser\TS_transform_rte().
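The even/odd layout of the returned array can be illustrated with preg_split() and a capturing delimiter. This is only a sketch under stated assumptions: the function name sketchSplitIntoBlock is hypothetical, nested blocks of the same tag are not handled, and there is no equivalent of $eliminateExtraEndTags.

```php
<?php
/**
 * Hypothetical helper: splits $content so that even indexes hold the
 * text outside the blocks and odd indexes hold the blocks themselves.
 */
function sketchSplitIntoBlock(string $tag, string $content): array
{
    $alternatives = [];
    foreach (array_filter(array_map('trim', explode(',', $tag))) as $name) {
        $name = preg_quote($name, '/');
        // One alternative per tag: start tag, shortest content, end tag.
        $alternatives[] = '<' . $name . '(?:\s[^>]*)?>.*?<\/' . $name . '\s*>';
    }
    // A single capture group, so each delimiter adds exactly one element.
    $pattern = '/(' . implode('|', $alternatives) . ')/is';
    return preg_split($pattern, $content, -1, PREG_SPLIT_DELIM_CAPTURE);
}
```

When the content starts with a block, index 0 is an empty string, which keeps the blocks on odd indexes.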
splitIntoBlockRecursiveProc | ( | $tag, | |
$content, | |||
& | $procObj, | ||
$callBackContent, | |||
$callBackTags, | |||
$level = 0 |
|||
) |
Splitting content into blocks recursively and processing tags/content with call back functions.
string | $tag | Tag list, see splitIntoBlock() |
string | $content | Content, see splitIntoBlock() |
object | $procObj | Object where call back methods are. |
string | $callBackContent | Name of call back method for content; "function callBackContent($str,$level)" |
string | $callBackTags | Name of call back method for tags; "function callBackTags($tags,$level)" |
int | $level | Indent level |
Definition at line 263 of file HtmlParser.php.
References HtmlParser\getFirstTag(), HtmlParser\getFirstTagName(), HtmlParser\removeFirstAndLastTag(), and HtmlParser\splitIntoBlock().
splitTags | ( | $tag, | |
$content | |||
) |
Returns an array with the $content divided by tag-blocks specified with the list of tags, $tag. Even numbers in the array are outside the blocks, odd numbers are block-content. Use ->getAllParts() and ->removeFirstAndLastTag() to process the content if needed.
string | $tag | List of tags |
string | $content | HTML-content |
Definition at line 298 of file HtmlParser.php.
Referenced by RteHtmlParser\TS_images_rte().
stripEmptyTags | ( | $content, | |
$tagList = null , |
|||
$treatNonBreakingSpaceAsEmpty = false |
|||
) |
Strips empty tags from HTML.
string | $content | The content to be stripped of empty tags |
string | $tagList | The comma separated list of tags to be stripped. If empty, all empty tags will be stripped |
bool | $treatNonBreakingSpaceAsEmpty | If TRUE, tags containing only &nbsp; entities will be treated as empty. |
Definition at line 1451 of file HtmlParser.php.
protected stripEmptyTagsIfConfigured | ( | $value, | |
$configuration | |||
) |
Strips the configured empty tags from the HTML code.
string | $value | |
array | $configuration |
Definition at line 1473 of file HtmlParser.php.
static substituteMarker | ( | $content, | |
$marker, | |||
$markContent | |||
) |
Substitutes a marker string in the input content (by a simple str_replace())
string | $content | The content stream, typically HTML template content. |
string | $marker | The marker string, typically of the form "###[the marker string]###" |
mixed | $markContent | The content to insert instead of the marker string found. |
Definition at line 97 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
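static substituteMarkerAndSubpartArrayRecursive | ( | $content, | |
array | $markersAndSubparts, | ||
$wrap = '' , |
|||
$uppercase = false , |
|||
$deleteUnused = false |
|||
) |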
Replaces all markers and subparts in a template with the content provided in the structured array.
The array is built like the template with its markers and subparts. Keys represent the marker name and the values the content. If the value is not an array the key will be treated as a single marker. If the value is an array the key will be treated as a subpart marker. Repeated subpart contents are of course elements in the array, so every subpart value must contain an array with its markers.
$markersAndSubparts = array (
    '###SINGLEMARKER1###' => 'value 1',
    '###SUBPARTMARKER1###' => array(
        0 => array(
            '###SINGLEMARKER2###' => 'value 2',
        ),
        1 => array(
            '###SINGLEMARKER2###' => 'value 3',
        )
    ),
    '###SUBPARTMARKER2###' => array(
    ),
)
Subparts can be nested, so below the 'SINGLEMARKER2' it is possible to have another subpart marker with an array as the value, which in its turn contains the elements of the sub-subparts. Empty arrays for subparts will cause the subtemplate to be cleared.
string | $content | The content stream, typically HTML template content. |
array | $markersAndSubparts | The array of single markers and subpart contents. |
string | $wrap | A wrap value - [part1] | [part2] - for the markers before substitution. |
bool | $uppercase | If set, all marker string substitution is done with upper-case markers. |
bool | $deleteUnused | If set, all unused single markers are deleted. |
Definition at line 168 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
static substituteMarkerArray | ( | $content, | |
$markContentArray, | |||
$wrap = '' , |
|||
$uppercase = false , |
|||
$deleteUnused = false |
|||
) |
Traverses the input $markContentArray array and for each key the marker by the same name (possibly wrapped and in upper case) will be substituted with the key's value in the array. This is very useful if you have a data-record to substitute in some content, in particular when you use the $wrap and $uppercase values to pre-process the markers. Eg. a key name like "myfield" could effectively be represented by the marker "###MYFIELD###" if the wrap value was "###|###" and the $uppercase boolean TRUE.
string | $content | The content stream, typically HTML template content. |
array | $markContentArray | The array of key/value pairs being marker/content values used in the substitution. For each element in this array the function will substitute a marker in the content stream with the content. |
string | $wrap | A wrap value - [part 1] | [part 2] - for the markers before substitution |
bool | $uppercase | If set, all marker string substitution is done with upper-case markers. |
bool | $deleteUnused | If set, all unused markers are deleted. |
Definition at line 123 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
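The interplay of $wrap, $uppercase and $deleteUnused can be sketched as below. This is a hedged illustration, not the code behind this method: the function name sketchSubstituteMarkerArray is hypothetical, and falling back to "###" as the wrap when deleting unused markers is an assumption.

```php
<?php
/**
 * Hypothetical helper: replaces wrapped (and optionally uppercased)
 * keys of $markContentArray in $content with their values.
 */
function sketchSubstituteMarkerArray(
    string $content,
    array $markContentArray,
    string $wrap = '',
    bool $uppercase = false,
    bool $deleteUnused = false
): string {
    // "###|###" becomes the two wrap parts "###" and "###".
    $wrapParts = $wrap !== '' ? array_map('trim', explode('|', $wrap, 2)) : ['', ''];
    $wrapParts = array_pad($wrapParts, 2, '');

    $search = [];
    $replace = [];
    foreach ($markContentArray as $key => $value) {
        $key = $uppercase ? strtoupper((string)$key) : (string)$key;
        $search[] = $wrapParts[0] . $key . $wrapParts[1];
        $replace[] = (string)$value;
    }
    $content = str_replace($search, $replace, $content);

    if ($deleteUnused) {
        // Assumption: without an explicit wrap, markers look like ###NAME###.
        $open = preg_quote($wrapParts[0] !== '' ? $wrapParts[0] : '###', '/');
        $close = preg_quote($wrapParts[1] !== '' ? $wrapParts[1] : '###', '/');
        $content = preg_replace('/' . $open . '[A-Z0-9_|\-]*' . $close . '/i', '', $content);
    }
    return $content;
}
```

With the wrap "###|###" and $uppercase set, the record key "myfield" addresses the marker "###MYFIELD###", as in the example given in the description.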
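static substituteSubpart | ( | $content, | |
$marker, | |||
$subpartContent, | |||
$recursive = true , |
|||
$keepMarker = false |
|||
) |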
Substitutes a subpart in $content with the content of $subpartContent.
string | $content | Content with subpart wrapped in fx. "###CONTENT_PART###" inside. |
string | $marker | Marker string, eg. "###CONTENT_PART###" |
array | $subpartContent | If $subpartContent happens to be an array, its [0] and [1] elements are wrapped around the content of the subpart (fetched by getSubpart()) |
bool | $recursive | If $recursive is set, the function calls itself with the content set to the remaining part of the content after the second marker. This means that following subparts are ALSO substituted! |
bool | $keepMarker | If set, the marker around the subpart is not removed, but kept in the output |
Definition at line 63 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
static substituteSubpartArray | ( | $content, | |
array | $subpartsContent | ||
) |
Substitutes multiple subparts at once
string | $content | The content stream, typically HTML template content. |
array | $subpartsContent | The array of key/value pairs being subpart/content values used in the substitution. For each element in this array the function will substitute a subpart in the content stream with the content. |
Definition at line 79 of file HtmlParser.php.
References GeneralUtility\logDeprecatedFunction(), and GeneralUtility\makeInstance().
unprotectTags | ( | $content, | |
$tagList = '' |
|||
) |
This converts htmlspecialchar()'ed tags (from $tagList) back to real tags. Eg. '&lt;strong&gt;' would be converted back to '<strong>' if found in $tagList
string | $content | HTML content |
string | $tagList | Tag list, separated by comma. Lowercase! |
Definition at line 1076 of file HtmlParser.php.
XHTML_clean | ( | $content | ) |
Tries to convert the content to be XHTML compliant and other stuff like that. STILL EXPERIMENTAL. See comments below.
What it does NOT do (yet) according to XHTML specs.:
cannot contain img, big,small,sub,sup ...
Wrapping scripts and style element contents in CDATA - or alternatively they should have entities converted.
Setting charsets may put some special requirements on both XML declaration/ meta-http-equiv. (C.9)
UTF-8 encoding is in fact expected by XML!!
stylesheet element and attribute names are NOT converted to lowercase
ampersands (and entities in general I think) MUST be converted to an entity reference! (&amp;). This may mean further conversion of non-tag content before output to page. May be related to the charset issue as a whole.
Minimized values not allowed: Must do this: selected="selected"
What it does at this point:
string | $content | Content to clean up |
Definition at line 1360 of file HtmlParser.php.
$caseShift_cache = array() | protected |
Definition at line 29 of file HtmlParser.php.
const VOID_ELEMENTS = 'area|base|br|col|command|embed|hr|img|input|keygen|meta|param|source|track|wbr' |
Definition at line 32 of file HtmlParser.php.