Crawler
class Crawler implements Countable, IteratorAggregate
Crawler eases navigation of a list of \DOMNode objects.
Properties
protected $uri
Methods
Returns the current URI.
Returns base href.
Removes all the nodes.
Adds HTML/XML content.
Adds HTML content to the list of nodes.
Adds XML content to the list of nodes.
Adds an array of \DOMNode instances to the list of nodes.
Returns the previous sibling nodes of the current selection.
Returns the attribute value of the first node of the list.
Returns the node name of the first node of the list.
Returns the node value of the first node of the list.
Returns the first node of the list as HTML.
Extracts information from the list of nodes.
Filters the list of nodes with an XPath expression.
Selects links by name or alt value for clickable images.
Selects images by alt value.
Selects a button by name or alt value for images.
Returns a Form object for the first node in the list.
Overloads a default namespace prefix to be used with XPath and CSS expressions.
Converts string for XPath expressions.
Details
add(DOMNodeList|DOMNode|array|string|null $node)
Adds a node to the current list of nodes.
This method uses the appropriate specialized add*() method based on the type of the argument.
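The dispatch can be illustrated with a minimal sketch. It assumes the component is installed through Composer and that the autoloader lives at vendor/autoload.php (an assumed path); the sample markup is hypothetical. Separate crawlers are used because the nodes come from different documents.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// A string argument is treated as HTML/XML content.
$fromString = new Crawler();
$fromString->add('<html><body><p>Hello</p></body></html>');

// Hypothetical sample document used for the DOM-based arguments.
$document = new DOMDocument();
$document->loadXML('<root><item>a</item><item>b</item></root>');

// A DOMNodeList argument adds every node it contains.
$fromNodeList = new Crawler();
$fromNodeList->add($document->getElementsByTagName('item'));

// A single DOMNode argument is added as-is.
$fromNode = new Crawler();
$fromNode->add($document->documentElement);

printf("%d %d %d\n", count($fromString), count($fromNodeList), count($fromNode));
```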
addContent(string $content, string|null $type = null)
Adds HTML/XML content.
If the content type does not specify a charset, UTF-8 is assumed, with ISO-8859-1 (the default charset defined by the HTTP 1.1 specification) as a fallback.
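As a rough sketch of how the content type steers parsing, the example below passes an explicit type (with and without a charset) to addContent(). The autoloader path and the sample markup are assumptions made for illustration.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// HTML content with the charset given in the content type.
$html = new Crawler();
$html->addContent('<html><body><p>Hello</p></body></html>', 'text/html; charset=UTF-8');

// XML content; no charset is given here, so the default handling applies.
$xml = new Crawler();
$xml->addContent('<?xml version="1.0"?><feed><entry>First</entry></feed>', 'application/xml');

$paragraph = $html->filterXPath('//p')->text();
$entry = $xml->filterXPath('//entry')->text();

echo $paragraph, PHP_EOL, $entry, PHP_EOL;
```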
addHtmlContent(string $content, string $charset = 'UTF-8')
Adds HTML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
To retrieve parsing errors, enable internal errors with libxml_use_internal_errors(true) before adding the content, then read them with libxml_get_errors(). Clear the errors with libxml_clear_errors() afterward.
addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET)
Adds XML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
To retrieve parsing errors, enable internal errors with libxml_use_internal_errors(true) before adding the content, then read them with libxml_get_errors(). Clear the errors with libxml_clear_errors() afterward.
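The error-handling sequence described above might look like the following sketch. The malformed XML string is a made-up sample and the autoloader path is an assumption; the exact error messages depend on the installed libxml version, so none are shown.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Collect libxml errors internally and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$crawler = new Crawler();
$crawler->addXmlContent('<root><item>unclosed</root>'); // deliberately malformed

// Read the errors gathered while the content was parsed.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the error buffer and restore the previous error mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```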
array
each(Closure $closure)
Calls an anonymous function on each node of the list.
The anonymous function receives the node, wrapped in a Crawler instance, and its position as arguments. The values it returns are collected into the resulting array.
Example:
$crawler->filter('h1')->each(function ($node, $i) {
    return $node->text();
});
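A slightly fuller sketch that keeps the returned array is given below. It uses filterXPath() instead of filter() so that it does not depend on the CssSelector component; the markup and the autoloader path are assumptions.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample page with two headings.
$crawler = new Crawler('<html><body><h1>One</h1><h1>Two</h1></body></html>');

// Each callback result becomes one entry of the returned array.
$titles = $crawler->filterXPath('//h1')->each(function (Crawler $node, $i) {
    return $i.': '.$node->text();
});

print_r($titles);
```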
Crawler
reduce(Closure $closure)
Reduces the list of nodes by calling an anonymous function.
To remove a node from the list, the anonymous function must return false.
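A minimal sketch of this filtering, keeping only the nodes at even positions. The list markup and the autoloader path are assumptions for illustration.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample list with four items.
$crawler = new Crawler('<html><body><ul><li>a</li><li>b</li><li>c</li><li>d</li></ul></body></html>');

// Returning false drops the node; any other return value keeps it.
$even = $crawler->filterXPath('//li')->reduce(function (Crawler $node, $i) {
    return 0 === $i % 2;
});

// Collect the text of the remaining nodes.
$kept = $even->each(function (Crawler $node) {
    return $node->text();
});

print_r($kept);
```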
array|Crawler
evaluate(string $xpath)
Evaluates an XPath expression.
Since an XPath expression might evaluate to either a simple type or a \DOMNodeList, this method will return either an array of simple types or a new Crawler instance.
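The two possible return shapes can be sketched as follows, assuming a crawler that holds a single root node. The markup and the autoloader path are assumptions.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample page; the crawler holds its single root node.
$crawler = new Crawler('<html><body><ul><li>a</li><li>b</li></ul></body></html>');

// count() yields a number, so an array with one entry per crawler node comes back.
$counts = $crawler->evaluate('count(//li)');

// A location path yields a node list, so a new Crawler comes back.
$items = $crawler->evaluate('//li');

var_dump($counts);
var_dump($items instanceof Crawler, count($items));
```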
array
extract(array $attributes)
Extracts information from the list of nodes.
You can extract attributes and/or the node value (_text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
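To show the shape of the result, here is a small sketch run against hypothetical markup; the autoloader path is also an assumption. filterXPath() is used so that the CssSelector component is not required.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample page with two linked headings.
$crawler = new Crawler(
    '<html><body><h1><a href="/a">A</a></h1><h1><a href="/b">B</a></h1></body></html>'
);

// With several attributes, each node yields one row of values in the requested order.
$rows = $crawler->filterXPath('//h1/a')->extract(array('_text', 'href'));

print_r($rows);
```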
Crawler
filterXPath(string $xpath)
Filters the list of nodes with an XPath expression.
The XPath expression is evaluated in the context of the crawler, which is treated as a fake parent of the elements inside it. This means that a child selector such as "div" or "./div" matches only the div elements held by the current crawler, not their children.
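The fake-parent behaviour can be sketched with two nested div elements. The ids, the markup, and the autoloader path are assumptions made for this illustration.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample page with a div nested inside another div.
$crawler = new Crawler(
    '<html><body><div id="outer"><div id="inner"></div></div></body></html>'
);

// A crawler that holds only the outer div.
$outer = $crawler->filterXPath('//div[@id="outer"]');

// "div" is resolved against the fake parent, so it matches the outer div itself.
$selfIds = $outer->filterXPath('div')->extract(array('id'));

// Going one step further down reaches the nested div.
$childIds = $outer->filterXPath('div/div')->extract(array('id'));

print_r($selfIds);
print_r($childIds);
```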
Crawler
filter(string $selector)
Filters the list of nodes with a CSS selector.
This method only works if you have installed the CssSelector Symfony Component.
Form
form(array $values = null, string $method = null)
Returns a Form object for the first node in the list.
setDefaultNamespacePrefix(string $prefix)
Overloads a default namespace prefix to be used with XPath and CSS expressions.
static string
xpathLiteral(string $s)
Converts string for XPath expressions.
The escaped characters are double quotes (") and apostrophes (').
Examples:
echo Crawler::xpathLiteral('foo " bar');
//prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar");
//prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c');
//prints concat('a', "'", 'b"c')
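A typical use is embedding an arbitrary string in an expression passed to filterXPath(). The sketch below assumes a hypothetical data-title attribute, sample markup, and autoloader path.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumed Composer autoloader path

use Symfony\Component\DomCrawler\Crawler;

// A value containing both an apostrophe and double quotes.
$title = 'O\'Reilly "Pocket" Guide';

// Hypothetical markup; the attribute value is HTML-escaped before embedding.
$html = sprintf(
    '<html><body><ul><li data-title="%s">match</li><li data-title="Other">other</li></ul></body></html>',
    htmlspecialchars($title, ENT_QUOTES)
);
$crawler = new Crawler($html);

// xpathLiteral() turns the raw value into a safe XPath string expression.
$xpath = sprintf('//li[@data-title=%s]', Crawler::xpathLiteral($title));
$found = $crawler->filterXPath($xpath);

echo $found->text(), PHP_EOL;
```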