Crawler
class Crawler implements Countable, IteratorAggregate
Crawler eases navigation of a list of \DOMElement objects.
Methods
| __construct(mixed $node = null, string $currentUri = null, string $baseHref = null) | Constructor. |
| clear() | Removes all the nodes. |
| add(DOMNodeList|DOMNode|array|string|null $node) | Adds a node to the current list of nodes. |
| addContent(string $content, null|string $type = null) | Adds HTML/XML content. |
| addHtmlContent(string $content, string $charset = 'UTF-8') | Adds HTML content to the list of nodes. |
| addXmlContent(string $content, string $charset = 'UTF-8') | Adds XML content to the list of nodes. |
| addDocument(DOMDocument $dom) | Adds a \DOMDocument to the list of nodes. |
| addNodeList(DOMNodeList $nodes) | Adds a \DOMNodeList to the list of nodes. |
| addNodes(array $nodes) | Adds an array of \DOMNode instances to the list of nodes. |
| addNode(DOMNode $node) | Adds a \DOMNode instance to the list of nodes. |
Crawler | eq(int $position) | Returns a node given its position in the node list. |
array | each(Closure $closure) | Calls an anonymous function on each node of the list. |
Crawler | slice(int $offset, int $length = null) | Slices the list of nodes by $offset and $length. |
Crawler | reduce(Closure $closure) | Reduces the list of nodes by calling an anonymous function. |
Crawler | first() | Returns the first node of the current selection. |
Crawler | last() | Returns the last node of the current selection. |
Crawler | siblings() | Returns the sibling nodes of the current selection. |
Crawler | nextAll() | Returns the next sibling nodes of the current selection. |
Crawler | previousAll() | Returns the previous sibling nodes of the current selection. |
Crawler | parents() | Returns the parent nodes of the current selection. |
Crawler | children() | Returns the child nodes of the current selection. |
string|null | attr(string $attribute) | Returns the attribute value of the first node of the list. |
string | nodeName() | Returns the node name of the first node of the list. |
string | text() | Returns the node value of the first node of the list. |
string | html() | Returns the inner HTML of the first node of the list. |
array | extract(array $attributes) | Extracts information from the list of nodes. |
Crawler | filterXPath(string $xpath) | Filters the list of nodes with an XPath expression. |
Crawler | filter(string $selector) | Filters the list of nodes with a CSS selector. |
Crawler | selectLink(string $value) | Selects links by name or alt value for clickable images. |
Crawler | selectButton(string $value) | Selects a button by name or alt value for images. |
Link | link(string $method = 'get') | Returns a Link object for the first node in the list. |
Link[] | links() | Returns an array of Link objects for the nodes in the list. |
Form | form(array $values = null, string $method = null) | Returns a Form object for the first node in the list. |
| setDefaultNamespacePrefix(string $prefix) | Overrides the default namespace prefix used with XPath and CSS expressions. |
| registerNamespace(string $prefix, string $namespace) | Registers a namespace prefix for use in XPath expressions. |
static string | xpathLiteral(string $s) | Converts a string into an XPath string literal. |
DOMElement|null | getNode(int $position) | Returns the node at the given position, or null if there is none. |
int | count() | Returns the number of nodes in the list. |
ArrayIterator | getIterator() | Returns an iterator over the nodes in the list. |
Details
__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)
Constructor.
Parameters
mixed | $node | A node to use as the base for crawling |
string | $currentUri | The current URI |
string | $baseHref | The base href value |
clear()
Removes all the nodes.
add(DOMNodeList|DOMNode|array|string|null $node)
Adds a node to the current list of nodes.
This method uses the appropriate specialized add*() method based on the type of the argument.
Parameters
DOMNodeList|DOMNode|array|string|null | $node | A node |
Exceptions
InvalidArgumentException | When $node is not of one of the expected types. |
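The type-based dispatch described for add() can be pictured as a chain of type checks. The sketch below uses a hypothetical standalone function, add_target_for(), written for illustration only (it is not part of Symfony). It returns the name of the specialized method an argument would be routed to. It assumes the checks run in the order node list, node, array, string, and that null is silently ignored.

```php
<?php
// Hypothetical helper, not Symfony code: mirrors the type-based dispatch
// that add() is documented to perform and returns the name of the
// specialized add*() method that would handle $node.
function add_target_for($node): ?string
{
    if ($node instanceof DOMNodeList) {
        return 'addNodeList';
    }
    if ($node instanceof DOMNode) {
        return 'addNode';
    }
    if (is_array($node)) {
        return 'addNodes';
    }
    if (is_string($node)) {
        return 'addContent';
    }
    if ($node === null) {
        return null; // assumption: null adds nothing
    }
    throw new InvalidArgumentException('Expected a DOMNodeList, DOMNode, array, string or null.');
}

$doc = new DOMDocument();
$doc->loadXML('<root><item/><item/></root>');

echo add_target_for($doc->getElementsByTagName('item')), "\n"; // addNodeList
echo add_target_for($doc->documentElement), "\n";              // addNode
echo add_target_for('<p>hi</p>'), "\n";                        // addContent
```

Any other type, such as an integer, falls through to the exception, which corresponds to the InvalidArgumentException listed above.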
addContent(string $content, null|string $type = null)
Adds HTML/XML content.
If the content type does not specify a charset, ISO-8859-1 is assumed, which is the default charset defined by the HTTP/1.1 specification.
Parameters
string | $content | A string to parse as HTML/XML |
null|string | $type | The content type of the string |
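The charset fallback described for addContent() can be illustrated with a small hypothetical helper, charset_from_content_type(), which is not part of Symfony. It looks for a charset= parameter in a Content-Type value and falls back to ISO-8859-1. The regular expression is an assumption made for this sketch and is not necessarily the one the component uses.

```php
<?php
// Hypothetical helper (not Symfony API): extracts the charset parameter
// from a Content-Type value, defaulting to ISO-8859-1 as HTTP/1.1 does.
function charset_from_content_type(?string $type): string
{
    // Matches e.g. "charset=utf-8", "charset = UTF-8" or charset="utf-8".
    if ($type !== null
        && preg_match('/charset\s*=\s*["\']?([A-Za-z0-9_:.+-]+)/i', $type, $matches)) {
        return $matches[1];
    }

    return 'ISO-8859-1';
}

echo charset_from_content_type('text/html; charset=UTF-8'), "\n"; // UTF-8
echo charset_from_content_type('text/html'), "\n";                 // ISO-8859-1
echo charset_from_content_type(null), "\n";                        // ISO-8859-1
```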
addHtmlContent(string $content, string $charset = 'UTF-8')
Adds HTML content to the list of nodes.
libxml errors are disabled while the content is parsed.
To get parsing errors, enable internal errors with libxml_use_internal_errors(true) before parsing, then retrieve them with libxml_get_errors(). Be sure to clear the errors with libxml_clear_errors() afterward.
Parameters
string | $content | The HTML content |
string | $charset | The charset |
addXmlContent(string $content, string $charset = 'UTF-8')
Adds XML content to the list of nodes.
libxml errors are disabled while the content is parsed.
To get parsing errors, enable internal errors with libxml_use_internal_errors(true) before parsing, then retrieve them with libxml_get_errors(). Be sure to clear the errors with libxml_clear_errors() afterward.
Parameters
string | $content | The XML content |
string | $charset | The charset |
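Collecting parser errors as described above follows the standard libxml pattern in PHP. This sketch calls PHP's built-in DOMDocument directly rather than going through the Crawler, so it runs without Symfony installed. The malformed XML string is only example input.

```php
<?php
// Ask libxml to buffer errors instead of emitting PHP warnings,
// remembering the previous setting so it can be restored.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$ok = $doc->loadXML('<root><item></root>'); // <item> is never closed

// Read the buffered errors (LibXMLError objects).
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
$errorCount = count(libxml_get_errors());

// Clear the buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```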
addDocument(DOMDocument $dom)
Adds a \DOMDocument to the list of nodes.
Parameters
DOMDocument | $dom | A \DOMDocument instance |
addNodeList(DOMNodeList $nodes)
Adds a \DOMNodeList to the list of nodes.
Parameters
DOMNodeList | $nodes | A \DOMNodeList instance |
addNodes(array $nodes)
Adds an array of \DOMNode instances to the list of nodes.
Parameters
array | $nodes | An array of \DOMNode instances |
addNode(DOMNode $node)
Adds a \DOMNode instance to the list of nodes.
Parameters
DOMNode | $node | A \DOMNode instance |
Crawler eq(int $position)
Returns a node given its position in the node list.
Parameters
int | $position | The position |
Return Value
Crawler | A new instance of the Crawler with the selected node, or an empty Crawler if it does not exist. |
array each(Closure $closure)
Calls an anonymous function on each node of the list.
The anonymous function receives the node (wrapped in a Crawler instance) and its position as arguments.
Example:
// $titles holds the text of every h1 element
$titles = $crawler->filter('h1')->each(function (Crawler $node, $i) {
    return $node->text();
});
Parameters
Closure | $closure | An anonymous function |
Return Value
array | An array of values returned by the anonymous function |
Crawler slice(int $offset, int $length = null)
Slices the list of nodes by $offset and $length.
Parameters
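int | $offset | The offset of the first node to keep |
int | $length | The maximum number of nodes to keep (all remaining nodes if null) |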
Return Value
Crawler | A Crawler instance with the sliced nodes |
Crawler reduce(Closure $closure)
Reduces the list of nodes by calling an anonymous function.
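The anonymous function receives the node (wrapped in a Crawler instance) and its position as arguments. To remove a node from the list, the anonymous function must return false.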
Parameters
Closure | $closure | An anonymous function |
Return Value
Crawler | A Crawler instance with the selected nodes. |
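The filtering rule can be sketched without Symfony using plain DOM nodes. In this sketch, reduce_nodes() is a hypothetical function, not the Crawler implementation, and it passes raw \DOMNode objects to the callback instead of Crawler instances. It keeps a node unless the callback returns exactly false. That strict comparison is an assumption; under it, returning null or 0 keeps the node.

```php
<?php
// Hypothetical sketch of the reduce() rule using plain DOM nodes:
// a node is dropped only when the callback returns exactly false.
function reduce_nodes(iterable $nodes, Closure $closure): array
{
    $kept = [];
    $i = 0;
    foreach ($nodes as $node) {
        if ($closure($node, $i) !== false) {
            $kept[] = $node;
        }
        $i++;
    }

    return $kept;
}

$doc = new DOMDocument();
$doc->loadXML('<ul><li>keep</li><li>drop</li><li>keep too</li></ul>');

$kept = reduce_nodes($doc->getElementsByTagName('li'), function (DOMNode $node, int $i) {
    // Returning false removes the node; any other value keeps it.
    return $node->textContent !== 'drop';
});

foreach ($kept as $node) {
    echo $node->textContent, "\n"; // "keep", then "keep too"
}
```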
Crawler first()
Returns the first node of the current selection.
Return Value
Crawler | A Crawler instance with the first selected node |
Crawler last()
Returns the last node of the current selection.
Return Value
Crawler | A Crawler instance with the last selected node |
Crawler siblings()
Returns the sibling nodes of the current selection.
Return Value
Crawler | A Crawler instance with the sibling nodes |
Exceptions
InvalidArgumentException | When the current node list is empty |
Crawler nextAll()
Returns the next sibling nodes of the current selection.
Return Value
Crawler | A Crawler instance with the next sibling nodes |
Exceptions
InvalidArgumentException | When the current node list is empty |
Crawler previousAll()
Returns the previous sibling nodes of the current selection.
Return Value
Crawler | A Crawler instance with the previous sibling nodes |
Exceptions
InvalidArgumentException | When the current node list is empty |
Crawler parents()
Returns the parent nodes of the current selection.
Return Value
Crawler | A Crawler instance with the parent nodes of the current selection |
Exceptions
InvalidArgumentException | When the current node list is empty |
Crawler children()
Returns the child nodes of the current selection.
Return Value
Crawler | A Crawler instance with the child nodes |
Exceptions
InvalidArgumentException | When the current node list is empty |
string|null attr(string $attribute)
Returns the attribute value of the first node of the list.
Parameters
string | $attribute | The attribute name |
Return Value
string|null | The attribute value or null if the attribute does not exist |
Exceptions
InvalidArgumentException | When the current node list is empty |
string nodeName()
Returns the node name of the first node of the list.
Return Value
string | The node name |
Exceptions
InvalidArgumentException | When the current node list is empty |
string text()
Returns the node value of the first node of the list.
Return Value
string | The node value |
Exceptions
InvalidArgumentException | When the current node list is empty |
string html()
Returns the HTML content of the first node of the list (its inner HTML, without the node's own tags).
Return Value
string | The inner HTML of the node |
Exceptions
InvalidArgumentException | When the current node list is empty |
array extract(array $attributes)
Extracts information from the list of nodes.
You can extract attributes and/or the node value (using the special name _text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
Parameters
array | $attributes | An array of attribute names, which may include _text |
Return Value
array | An array of extracted values |
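The shape of the returned array can be illustrated with PHP's built-in DOM classes. extract_values() below is a hypothetical helper, not the Crawler's code. It assumes three things: _text maps to the node's text content, any other name is read with getAttribute(), and each node yields a nested array when several names are requested but a single value when only one is.

```php
<?php
// Hypothetical sketch of extract(): one entry per node; "_text" means the
// node's text, any other name is read as an attribute.
function extract_values(iterable $nodes, array $attributes): array
{
    $result = [];
    foreach ($nodes as $node) {
        $data = [];
        foreach ($attributes as $attribute) {
            $data[] = $attribute === '_text'
                ? $node->nodeValue
                : $node->getAttribute($attribute);
        }
        // Assumption: a single requested name yields a flat list of values.
        $result[] = count($attributes) > 1 ? $data : $data[0];
    }

    return $result;
}

$doc = new DOMDocument();
$doc->loadXML('<body><h1><a href="/a">First</a></h1><h1><a href="/b">Second</a></h1></body>');
$links = (new DOMXPath($doc))->query('//h1/a');

// Result: [['First', '/a'], ['Second', '/b']]
print_r(extract_values($links, ['_text', 'href']));
```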
Crawler filterXPath(string $xpath)
Filters the list of nodes with an XPath expression.
The XPath expression is evaluated in the context of the crawler, which is treated as a fake parent of the elements inside it. This means that a child selector "div" or "./div" will match only the div elements of the current crawler, not their children.
Parameters
string | $xpath | An XPath expression |
Return Value
Crawler | A new instance of Crawler with the filtered list of nodes |
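The "fake parent" rule means that a child step such as div is matched against the crawler's own nodes. With plain DOMXPath you can see the difference by comparing two expressions. Evaluating self::div on each node is what the fake-parent rule amounts to for a div child step. Evaluating ./div on the node itself descends into its real children. This is an illustration with built-in DOM classes under that interpretation, not the Crawler's internal rewriting of expressions.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<body><div id="outer"><div id="inner"/></div></body>');
$xpath = new DOMXPath($doc);

// Pretend the crawler currently holds only the outer <div>.
$current = [$xpath->query('//div[@id="outer"]')->item(0)];

// Fake-parent reading of "div": test the crawler's own nodes.
$asFakeChildren = [];
foreach ($current as $node) {
    foreach ($xpath->query('self::div', $node) as $match) {
        $asFakeChildren[] = $match->getAttribute('id');
    }
}

// Plain "./div" evaluated on the node itself: its real children.
$realChildren = [];
foreach ($current as $node) {
    foreach ($xpath->query('./div', $node) as $match) {
        $realChildren[] = $match->getAttribute('id');
    }
}

print_r($asFakeChildren); // contains "outer"
print_r($realChildren);   // contains "inner"
```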
Crawler filter(string $selector)
Filters the list of nodes with a CSS selector.
This method only works if you have installed the CssSelector Symfony Component.
Parameters
string | $selector | A CSS selector |
Return Value
Crawler | A new instance of Crawler with the filtered list of nodes |
Exceptions
RuntimeException | if the CssSelector Component is not available |
Crawler selectLink(string $value)
Selects links by name or alt value for clickable images.
Parameters
string | $value | The link text |
Return Value
Crawler | A new instance of Crawler with the filtered list of nodes |
Crawler selectButton(string $value)
Selects a button by name or alt value for images.
Parameters
string | $value | The button text |
Return Value
Crawler | A new instance of Crawler with the filtered list of nodes |
Link link(string $method = 'get')
Returns a Link object for the first node in the list.
Parameters
string | $method | The method for the link (get by default) |
Return Value
Link | A Link instance |
Exceptions
InvalidArgumentException | If the current node list is empty |
Link[] links()
Returns an array of Link objects for the nodes in the list.
Return Value
Link[] | An array of Link instances |
Form form(array $values = null, string $method = null)
Returns a Form object for the first node in the list.
Parameters
array | $values | An array of values for the form fields |
string | $method | The method for the form |
Return Value
Form | A Form instance |
Exceptions
InvalidArgumentException | If the current node list is empty |
setDefaultNamespacePrefix(string $prefix)
Overrides the default namespace prefix used with XPath and CSS expressions.
Parameters
string | $prefix | The namespace prefix |
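registerNamespace(string $prefix, string $namespace)
Registers a namespace URI under the given prefix for use in XPath expressions.
Parameters
string | $prefix | The namespace prefix |
string | $namespace | The namespace URI |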
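Namespace prefixes matter because XPath 1.0 has no notion of a default namespace. An unprefixed name such as entry only matches elements that are in no namespace. The sketch below shows this with plain DOMXPath on an Atom snippet. The prefix "default" mirrors what is assumed to be the Crawler's default prefix, and the Atom namespace URI is example data.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<entry><title>One</title></entry>'
    . '<entry><title>Two</title></entry>'
    . '</feed>'
);
$xpath = new DOMXPath($doc);

// Unprefixed names match only elements in no namespace, so this finds nothing.
$unprefixed = $xpath->query('//entry')->length;

// Bind a prefix to the feed's namespace URI and use it in the expression.
$xpath->registerNamespace('default', 'http://www.w3.org/2005/Atom');
$prefixed = $xpath->query('//default:entry')->length;

echo $unprefixed, ' ', $prefixed, "\n"; // 0 2
```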
static string xpathLiteral(string $s)
Converts a string into a literal that can be used in XPath expressions.
It handles strings containing double quotes (") and apostrophes ('), falling back to concat() when both occur.
Examples:
echo Crawler::xpathLiteral('foo " bar');
//prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar");
//prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c');
//prints concat('a', "'", 'b"c')
Parameters
string | $s | String to be escaped |
Return Value
string | Converted string |
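The quoting rules shown by the examples above can be reproduced in a few lines. xpath_literal() is a hypothetical standalone re-implementation written to match those three documented examples, not the component's source. The usage part shows why it matters when a search value contains both kinds of quote.

```php
<?php
// Hypothetical re-implementation of the documented quoting rules.
function xpath_literal(string $s): string
{
    // No apostrophe: wrap in apostrophes.
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    // Apostrophe but no double quote: wrap in double quotes.
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Both kinds: split on apostrophes and rejoin them with concat().
    $parts = [];
    foreach (explode("'", $s) as $i => $chunk) {
        if ($i > 0) {
            $parts[] = "\"'\"";
        }
        $parts[] = "'" . $chunk . "'";
    }

    return 'concat(' . implode(', ', $parts) . ')';
}

$doc = new DOMDocument();
$doc->loadXML('<root><a title="it&apos;s &quot;here&quot;">x</a></root>');
$xpath = new DOMXPath($doc);

// The value contains both quote characters, so concat() is required.
$count = $xpath->query('//a[@title=' . xpath_literal('it\'s "here"') . ']')->length;
echo $count, "\n"; // 1
```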
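DOMElement|null getNode(int $position)
Returns the node at the given position.
Parameters
int | $position | The position of the node |
Return Value
DOMElement|null | The node, or null if there is no node at that position |
int count()
Returns the number of nodes in the list.
Return Value
int | The number of nodes |
ArrayIterator getIterator()
Returns an iterator over the nodes in the list.
Return Value
ArrayIterator | An iterator over the nodes |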
© 2004–2017 Fabien Potencier
Licensed under the MIT License.
http://api.symfony.com/3.0/Symfony/Component/DomCrawler/Crawler.html