Crawler

class Crawler extends SplObjectStorage

Crawler eases navigation of a list of \DOMNode objects.

Methods

__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)

clear()

Removes all the nodes.

add(DOMNodeList|DOMNode|array|string|null $node)

Adds a node to the current list of nodes.

addContent(string $content, null|string $type = null)

Adds HTML/XML content.

addHtmlContent(string $content, string $charset = 'UTF-8')

Adds HTML content to the list of nodes.

addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET)

Adds XML content to the list of nodes.

addDocument(DOMDocument $dom)

Adds a \DOMDocument to the list of nodes.

addNodeList(DOMNodeList $nodes)

Adds a \DOMNodeList to the list of nodes.

addNodes(array $nodes)

Adds an array of \DOMNode instances to the list of nodes.

addNode(DOMNode $node)

Adds a \DOMNode instance to the list of nodes.

unserialize($serialized)

serialize()

Crawler eq(int $position)

Returns a node given its position in the node list.

array each(Closure $closure)

Calls an anonymous function on each node of the list.

Crawler slice(int $offset, int $length = -1)

Slices the list of nodes by $offset and $length.

Crawler reduce(Closure $closure)

Reduces the list of nodes by calling an anonymous function.

Crawler first()

Returns the first node of the current selection.

Crawler last()

Returns the last node of the current selection.

Crawler siblings()

Returns the sibling nodes of the current selection.

Crawler nextAll()

Returns the next sibling nodes of the current selection.

Crawler previousAll()

Returns the previous sibling nodes of the current selection.

Crawler parents()

Returns the parent nodes of the current selection.

Crawler children()

Returns the child nodes of the current selection.

string|null attr(string $attribute)

Returns the attribute value of the first node of the list.

string nodeName()

Returns the node name of the first node of the list.

string text()

Returns the node value of the first node of the list.

string html()

Returns the inner HTML of the first node of the list.

array extract(array $attributes)

Extracts information from the list of nodes.

Crawler filterXPath(string $xpath)

Filters the list of nodes with an XPath expression.

Crawler filter(string $selector)

Filters the list of nodes with a CSS selector.

Crawler selectLink(string $value)

Selects links by name or alt value for clickable images.

Crawler selectButton(string $value)

Selects a button by name or alt value for images.

Link link(string $method = 'get')

Returns a Link object for the first node in the list.

Link[] links()

Returns an array of Link objects for the nodes in the list.

Form form(array $values = null, string $method = null)

Returns a Form object for the first node in the list.

setDefaultNamespacePrefix(string $prefix)

Overrides the default namespace prefix used with XPath and CSS expressions.

registerNamespace(string $prefix, string $namespace)

Registers a namespace URI under the given prefix for use in XPath expressions.

static string xpathLiteral(string $s)

Converts string for XPath expressions.

attach($object, $data = null) deprecated
detach($object) deprecated
contains($object) deprecated
addAll($storage) deprecated
removeAll($storage) deprecated
removeAllExcept($storage) deprecated
getInfo() deprecated
setInfo($data) deprecated
offsetExists($object) deprecated
offsetSet($object, $data = null) deprecated
offsetUnset($object) deprecated
offsetGet($object) deprecated

DOMElement|null getNode(int $position)

Returns the node at the given position, or null if there is none.

Details

__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)

Parameters

mixed $node A Node to use as the base for the crawling
string $currentUri The current URI
string $baseHref The base href value

clear()

Removes all the nodes.

add(DOMNodeList|DOMNode|array|string|null $node)

Adds a node to the current list of nodes.

This method uses the appropriate specialized add*() method based on the type of the argument.

Parameters

DOMNodeList|DOMNode|array|string|null $node A node

Exceptions

InvalidArgumentException When the node is not of a supported type
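A minimal sketch of add() dispatching on its argument type, assuming symfony/dom-crawler 2.8 is installed with Composer and autoloaded from vendor/autoload.php (the path and sample markup are illustrative). A \DOMNodeList is routed to addNodeList(), an integer is rejected with InvalidArgumentException, and clear() empties the list.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$dom = new \DOMDocument();
$dom->loadHTML('<html><body><p>One</p><p>Two</p></body></html>');

// Start with an empty crawler; the second argument is the current URI.
$crawler = new Crawler(null, 'http://example.com/');

// A \DOMNodeList is handed to addNodeList(), which adds each node.
$crawler->add($dom->getElementsByTagName('p'));
echo count($crawler), "\n"; // 2

// An integer is not a supported type, so add() throws.
$rejected = false;
try {
    $crawler->add(42);
} catch (\InvalidArgumentException $e) {
    $rejected = true;
}

// clear() removes every node.
$crawler->clear();
echo count($crawler), "\n"; // 0
```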

addContent(string $content, null|string $type = null)

Adds HTML/XML content.

If the charset is not set via the content type, it is assumed to be ISO-8859-1, which is the default charset defined by the HTTP 1.1 specification.

Parameters

string $content A string to parse as HTML/XML
null|string $type The content type of the string
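The content type passed to addContent() picks the parser. In this sketch (assuming symfony/dom-crawler 2.8 installed with Composer; the markup is made up), the same string is loaded once as XML, where element names keep their case, and once as HTML, where libxml's HTML parser lowercases them and wraps the markup in an html root element.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$source = '<Catalog><Item>Book</Item></Catalog>';

// "xml" in the content type routes the string to addXmlContent().
$asXml = new Crawler();
$asXml->addContent($source, 'application/xml; charset=UTF-8');
echo $asXml->nodeName(), "\n"; // Catalog

// "html" routes it to addHtmlContent(); the HTML parser lowercases names.
$asHtml = new Crawler();
$asHtml->addContent($source, 'text/html; charset=UTF-8');
echo $asHtml->nodeName(), "\n"; // html
```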

addHtmlContent(string $content, string $charset = 'UTF-8')

Adds HTML content to the list of nodes.

libxml errors are suppressed while the content is parsed.

To get the parsing errors, enable internal errors with libxml_use_internal_errors(true) before adding the content, then read them with libxml_get_errors(). Clear them afterward with libxml_clear_errors().

Parameters

string $content The HTML content
string $charset The charset
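A sketch of the error-collection procedure described above, assuming symfony/dom-crawler 2.8 installed with Composer. Internal errors are enabled before parsing, the buffer is read and cleared, and the caller's previous setting is restored. The exact messages depend on the libxml version, so the code only prints whatever is reported.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Enable internal errors before parsing so they stay in the buffer.
$previous = libxml_use_internal_errors(true);

$crawler = new Crawler();
$crawler->addHtmlContent('<html><body><p>Unclosed paragraph</body></html>');

$errors = libxml_get_errors();
foreach ($errors as $error) {
    // Each entry is a LibXMLError with a message and a line number.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// Empty the buffer, then restore the caller's setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```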

addXmlContent(string $content, string $charset = 'UTF-8', int $options = LIBXML_NONET)

Adds XML content to the list of nodes.

libxml errors are suppressed while the content is parsed.

To get the parsing errors, enable internal errors with libxml_use_internal_errors(true) before adding the content, then read them with libxml_get_errors(). Clear them afterward with libxml_clear_errors().

Parameters

string $content The XML content
string $charset The charset
int $options Bitwise OR of the libxml option constants. LIBXML_PARSEHUGE is dangerous; see http://symfony.com/blog/security-release-symfony-2-0-17-released

addDocument(DOMDocument $dom)

Adds a \DOMDocument to the list of nodes.

Parameters

DOMDocument $dom A \DOMDocument instance

addNodeList(DOMNodeList $nodes)

Adds a \DOMNodeList to the list of nodes.

Parameters

DOMNodeList $nodes A \DOMNodeList instance

addNodes(array $nodes)

Adds an array of \DOMNode instances to the list of nodes.

Parameters

array $nodes An array of \DOMNode instances

addNode(DOMNode $node)

Adds a \DOMNode instance to the list of nodes.

Parameters

DOMNode $node A \DOMNode instance

unserialize($serialized)

Parameters

$serialized

serialize()

Crawler eq(int $position)

Returns a node given its position in the node list.

Parameters

int $position The position

Return Value

Crawler

array each(Closure $closure)

Calls an anonymous function on each node of the list.

The anonymous function receives the node, wrapped in a Crawler instance, and its position as arguments.

Example:

$crawler->filter('h1')->each(function ($node, $i) {
    return $node->text();
});

Parameters

Closure $closure An anonymous function

Return Value

array An array of values returned by the anonymous function
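A small sketch of each(), assuming symfony/dom-crawler 2.8 installed with Composer. It uses filterXPath() rather than filter() so the CssSelector component is not required; the closure receives the wrapped node first and its position second.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><h1>First</h1><h1>Second</h1></body></html>');

// $node is a Crawler wrapping one <h1>; $i is its position in the list.
$titles = $crawler->filterXPath('//h1')->each(function (Crawler $node, $i) {
    return $i.': '.$node->text();
});

// $titles is array('0: First', '1: Second')
var_export($titles);
```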

Crawler slice(int $offset, int $length = -1)

Slices the list of nodes by $offset and $length.

Parameters

int $offset
int $length

Return Value

Crawler
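A sketch of slice() with an explicit length (assuming symfony/dom-crawler 2.8 installed with Composer), which keeps the nodes at positions 1 and 2 of a four-item list.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul>'
    .'<li>A</li><li>B</li><li>C</li><li>D</li>'
    .'</ul></body></html>');

// Skip one node, then keep the next two.
$middle = $crawler->filterXPath('//li')->slice(1, 2);

$texts = $middle->each(function (Crawler $node) {
    return $node->text();
});
// $texts is array('B', 'C')
```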

Crawler reduce(Closure $closure)

Reduces the list of nodes by calling an anonymous function.

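The anonymous function receives the node, wrapped in a Crawler instance, and its position as arguments. To remove a node from the list, the anonymous function must return false.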

Parameters

Closure $closure An anonymous function

Return Value

Crawler
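A sketch of reduce() that keeps only the list items carrying a keep class, assuming symfony/dom-crawler 2.8 installed with Composer; the class name is illustrative. Only a strict false return value drops a node.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul>'
    .'<li class="keep">A</li><li>B</li><li class="keep">C</li>'
    .'</ul></body></html>');

$kept = $crawler->filterXPath('//li')->reduce(function (Crawler $node, $i) {
    // false removes the node; any other return value keeps it.
    return 'keep' === $node->attr('class');
});

$texts = $kept->extract(array('_text')); // array('A', 'C')
```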

Crawler first()

Returns the first node of the current selection.

Return Value

Crawler

Crawler last()

Returns the last node of the current selection.

Return Value

Crawler
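A sketch of first(), last() and eq() on a three-item list, assuming symfony/dom-crawler 2.8 installed with Composer. Each returns a new Crawler; a position with no node is expected to yield an empty Crawler rather than an exception, which the last lines check.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul><li>A</li><li>B</li><li>C</li></ul></body></html>');
$items = $crawler->filterXPath('//li');

$first = $items->first()->text(); // A
$second = $items->eq(1)->text();  // B
$last = $items->last()->text();   // C

// No node at position 10: an empty Crawler comes back.
$missing = $items->eq(10);
echo count($missing), "\n";       // 0
```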

Crawler siblings()

Returns the sibling nodes of the current selection.

Return Value

Crawler

Exceptions

InvalidArgumentException When the current node list is empty

Crawler nextAll()

Returns the next sibling nodes of the current selection.

Return Value

Crawler

Exceptions

InvalidArgumentException When the current node list is empty

Crawler previousAll()

Returns the previous sibling nodes of the current selection.

Return Value

Crawler

Exceptions

InvalidArgumentException When the current node list is empty

Crawler parents()

Returns the parent nodes of the current selection.

Return Value

Crawler

Exceptions

InvalidArgumentException When the current node list is empty

Crawler children()

Returns the child nodes of the current selection.

Return Value

Crawler

Exceptions

InvalidArgumentException When the current node list is empty
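A sketch of the traversal methods, assuming symfony/dom-crawler 2.8 installed with Composer. Each one starts from the first node of the selection (here the middle list item), and an empty selection raises InvalidArgumentException.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul>'
    .'<li id="a">A</li><li id="b">B</li><li id="c">C</li>'
    .'</ul></body></html>');
$b = $crawler->filterXPath('//li[@id="b"]');

// Helper: read the id attribute of every node in a selection.
$ids = function (Crawler $nodes) {
    return $nodes->extract(array('id'));
};

$siblings = $ids($b->siblings());    // array('a', 'c')
$next = $ids($b->nextAll());         // array('c')
$previous = $ids($b->previousAll()); // array('a')

// Ancestors are listed from the nearest one outward.
$parents = $b->parents()->each(function (Crawler $node) {
    return $node->nodeName();
});                                  // array('ul', 'body', 'html')

$childCount = count($crawler->filterXPath('//ul')->children()); // 3

// Traversal on an empty selection throws.
$threw = false;
try {
    $crawler->filterXPath('//table')->parents();
} catch (\InvalidArgumentException $e) {
    $threw = true;
}
```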

string|null attr(string $attribute)

Returns the attribute value of the first node of the list.

Parameters

string $attribute The attribute name

Return Value

string|null The attribute value or null if the attribute does not exist

Exceptions

InvalidArgumentException When the current node list is empty

string nodeName()

Returns the node name of the first node of the list.

Return Value

string The node name

Exceptions

InvalidArgumentException When the current node list is empty

string text()

Returns the node value of the first node of the list.

Return Value

string The node value

Exceptions

InvalidArgumentException When the current node list is empty

string html()

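Returns the inner HTML of the first node of the list: its child nodes serialized as HTML, without the node's own start and end tags.

Return Value

string The node's inner HTML

Exceptions

InvalidArgumentException When the current node list is empty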
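A sketch of the single-node accessors on a small div, assuming symfony/dom-crawler 2.8 installed with Composer. text() returns the text of the node and its descendants, while html() serializes only the node's children; the html() result is trimmed in case libxml adds surrounding whitespace.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body>'
    .'<div class="note" data-id="7">Hello <b>world</b></div>'
    .'</body></html>');
$div = $crawler->filterXPath('//div');

echo $div->nodeName(), "\n";      // div
echo $div->attr('data-id'), "\n"; // 7
var_dump($div->attr('title'));    // NULL, because the attribute is absent
echo $div->text(), "\n";          // Hello world
echo trim($div->html()), "\n";    // Hello <b>world</b>
```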

array extract(array $attributes)

Extracts information from the list of nodes.

You can extract attributes and/or the node value (_text).

Example:

$crawler->filter('h1 a')->extract(array('_text', 'href'));

Parameters

array $attributes An array of attributes

Return Value

array An array of extracted values
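A sketch of extract() with several attributes and with a single one, assuming symfony/dom-crawler 2.8 installed with Composer. With several names each node yields an array; with one name the result is a flat array. A missing attribute is expected to come back as an empty string, since the value is read with \DOMElement::getAttribute().

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body>'
    .'<a href="/one">One</a><a href="/two" title="Second">Two</a>'
    .'</body></html>');
$links = $crawler->filterXPath('//a');

// Several names: one array per node, in the requested order.
$rows = $links->extract(array('_text', 'href', 'title'));
// array(array('One', '/one', ''), array('Two', '/two', 'Second'))

// A single name: a flat array.
$hrefs = $links->extract(array('href'));
// array('/one', '/two')
```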

Crawler filterXPath(string $xpath)

Filters the list of nodes with an XPath expression.

The XPath expression is evaluated in the context of the crawler, which is considered as a fake parent of the elements inside it. This means that a child selector "div" or "./div" will match only the div elements of the current crawler, not their children.

Parameters

string $xpath An XPath expression

Return Value

Crawler
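A sketch of the fake-parent rule, assuming symfony/dom-crawler 2.8 installed with Composer. After selecting the li elements, a bare li matches them, a bare span matches nothing because the span is not a direct child of the fake parent, and li/span reaches it explicitly.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul>'
    .'<li>One</li><li>Two <span>2</span></li>'
    .'</ul></body></html>');
$items = $crawler->filterXPath('//li');

// "li" is a child of the fake parent, so it matches the selected <li> nodes.
$direct = count($items->filterXPath('li'));        // 2
// "span" is not a child of the fake parent, so nothing matches.
$nested = count($items->filterXPath('span'));      // 0
// Stepping through li reaches the nested <span>.
$explicit = count($items->filterXPath('li/span')); // 1
```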

Crawler filter(string $selector)

Filters the list of nodes with a CSS selector.

This method only works if you have installed the CssSelector Symfony Component.

Parameters

string $selector A CSS selector

Return Value

Crawler

Exceptions

RuntimeException if the CssSelector Component is not available
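A sketch of filter(), assuming both symfony/dom-crawler 2.8 and symfony/css-selector are installed with Composer; without the CssSelector component, filter() throws the RuntimeException documented for this method.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 and symfony/css-selector installed with Composer.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul>'
    .'<li class="keep">A</li><li>B</li><li class="keep">C</li>'
    .'</ul></body></html>');

// The CSS selector is converted to XPath before being applied.
$kept = $crawler->filter('ul > li.keep');
echo count($kept), "\n";           // 2
echo $kept->first()->text(), "\n"; // A
```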

Crawler selectLink(string $value)

Selects links by name or alt value for clickable images.

Parameters

string $value The link text

Return Value

Crawler

Crawler selectButton(string $value)

Selects a button by name or alt value for images.

Parameters

string $value The button text

Return Value

Crawler

Link link(string $method = 'get')

Returns a Link object for the first node in the list.

Parameters

string $method The method for the link (get by default)

Return Value

Link A Link instance

Exceptions

InvalidArgumentException If the current node list is empty or the selected node is not an instance of DOMElement
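A sketch of selectLink() followed by link(), assuming symfony/dom-crawler 2.8 installed with Composer. The Crawler is created with an absolute current URI (example.com is a placeholder) because the Link object resolves relative hrefs against it. The second link is matched through the alt text of its image.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body>'
    .'<a href="/about">About us</a>'
    .'<a href="/home"><img src="logo.png" alt="Home"></a>'
    .'</body></html>';

// Link objects need an absolute current URI to resolve relative hrefs.
$crawler = new Crawler($html, 'http://example.com/index.html');

$about = $crawler->selectLink('About us')->link();
echo $about->getUri(), "\n";    // http://example.com/about
echo $about->getMethod(), "\n"; // GET

// The image link is found through the alt text of its <img>.
$home = $crawler->selectLink('Home')->link();
echo $home->getUri(), "\n";     // http://example.com/home
```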

Link[] links()

Returns an array of Link objects for the nodes in the list.

Return Value

Link[] An array of Link instances

Exceptions

InvalidArgumentException If the current node list contains non-DOMElement instances
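A sketch of links() resolving every anchor of a selection, assuming symfony/dom-crawler 2.8 installed with Composer and a placeholder current URI. The root-relative href resolves against the host, and the relative one against the current directory.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<html><body><a href="/a">A</a><a href="b.html">B</a></body></html>',
    'http://example.com/docs/index.html'
);

$uris = array();
foreach ($crawler->filterXPath('//a')->links() as $link) {
    $uris[] = $link->getUri();
}
// $uris is array('http://example.com/a', 'http://example.com/docs/b.html')
```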

Form form(array $values = null, string $method = null)

Returns a Form object for the first node in the list.

Parameters

array $values An array of values for the form fields
string $method The method for the form

Return Value

Form A Form instance

Exceptions

InvalidArgumentException If the current node list is empty or the selected node is not an instance of DOMElement
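A sketch of selectButton() followed by form(), assuming symfony/dom-crawler 2.8 installed with Composer; the form, its q field and the URI are illustrative. The button is located by its value, form() builds the Form from the enclosing form element, and the $values array overrides the field's default.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body>'
    .'<form action="/search" method="get">'
    .'<input type="text" name="q" value="">'
    .'<input type="submit" value="Search">'
    .'</form>'
    .'</body></html>';

// Form, like Link, expects an absolute current URI.
$crawler = new Crawler($html, 'http://example.com/');

// Locate the submit button by its value, then build the Form around it.
$form = $crawler->selectButton('Search')->form(array('q' => 'symfony'));

$values = $form->getValues(); // array('q' => 'symfony')
echo $form->getMethod(), "\n"; // GET

// getUri() resolves the action against the current URI and, for GET, appends the values.
echo $form->getUri(), "\n";
```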

setDefaultNamespacePrefix(string $prefix)

Overrides the default namespace prefix used with XPath and CSS expressions.

Parameters

string $prefix

registerNamespace(string $prefix, string $namespace)

Registers a namespace URI under the given prefix for use in XPath expressions.

Parameters

string $prefix
string $namespace
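A sketch of namespace handling in XPath, assuming symfony/dom-crawler 2.8 installed with Composer; the Atom and Media RSS namespace URIs are only sample data. The document's default namespace is reachable through the prefix default, setDefaultNamespacePrefix() renames that prefix, and registerNamespace() binds a prefix of your choice to a URI. The sample deliberately declares a second, prefixed namespace, since the Crawler may drop a default namespace that is the only one declared.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$xml = '<?xml version="1.0" encoding="UTF-8"?>'
    .'<entry xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">'
    .'<title>Post</title>'
    .'<media:group><media:title>Clip</media:title></media:group>'
    .'</entry>';

$crawler = new Crawler();
$crawler->addXmlContent($xml);

// The default namespace is bound to the prefix "default".
$viaDefault = $crawler->filterXPath('//default:entry/default:title')->text(); // Post

// Rename that prefix.
$crawler->setDefaultNamespacePrefix('atom');
$viaAtom = $crawler->filterXPath('//atom:entry/atom:title')->text(); // Post

// Bind an extra prefix to a namespace URI of your choice.
$crawler->registerNamespace('m', 'http://search.yahoo.com/mrss/');
$viaM = $crawler->filterXPath('//atom:entry/m:group/m:title')->text(); // Clip
```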

static string xpathLiteral(string $s)

Converts string for XPath expressions.

Escaped characters are double quotes (") and apostrophes (').

Examples:

echo Crawler::xpathLiteral('foo " bar');
//prints 'foo " bar'

echo Crawler::xpathLiteral("foo ' bar");
//prints "foo ' bar"

echo Crawler::xpathLiteral('a\'b"c');
//prints concat('a', "'", 'b"c')

Parameters

string $s String to be escaped

Return Value

string Converted string
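A sketch of using xpathLiteral() to match an attribute value that contains both quote characters, assuming symfony/dom-crawler 2.8 installed with Composer. Pasting such a value between quotes would break the expression; xpathLiteral() turns it into a concat() call instead.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body>'
    .'<a title="Bob\'s &quot;page&quot;" href="/bob">Bob</a>'
    .'</body></html>');

$title = 'Bob\'s "page"';

// The value holds both ' and ", so no single quoting style can wrap it.
$xpath = sprintf('//a[@title=%s]', Crawler::xpathLiteral($title));
echo $xpath, "\n"; // //a[@title=concat('Bob', "'", 's "page"')]

$href = $crawler->filterXPath($xpath)->attr('href');
echo $href, "\n"; // /bob
```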

attach($object, $data = null) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object
$data

detach($object) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object

contains($object) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object

addAll($storage) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$storage

removeAll($storage) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$storage

removeAllExcept($storage) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$storage

getInfo() deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

setInfo($data) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$data

offsetExists($object) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object

offsetSet($object, $data = null) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object
$data

offsetUnset($object) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object

offsetGet($object) deprecated

deprecated

Using the SplObjectStorage API on the Crawler is deprecated as of 2.8 and will be removed in 3.0.

Parameters

$object

DOMElement|null getNode(int $position)

Returns the node at the given position, or null if there is none.

Parameters

int $position

Return Value

DOMElement|null
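A sketch of getNode(), assuming symfony/dom-crawler 2.8 installed with Composer: unlike eq(), it returns the underlying \DOMElement rather than a Crawler, and null when the position is out of range.

```php
<?php
// Assumption: symfony/dom-crawler 2.8 installed with Composer in ./vendor.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><p>A</p><p>B</p></body></html>');
$paragraphs = $crawler->filterXPath('//p');

$node = $paragraphs->getNode(1); // a \DOMElement, not a Crawler
echo get_class($node), "\n";     // DOMElement
echo $node->textContent, "\n";   // B

$none = $paragraphs->getNode(5); // null: no node at that position
var_dump($none);                 // NULL
```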

© 2004–2017 Fabien Potencier
Licensed under the MIT License.
http://api.symfony.com/2.8/Symfony/Component/DomCrawler/Crawler.html