
# Cheat sheet
- [Cache manage](#cache-manage)
  - [Cache base](#cache-base)
- [Delay manage](#delay-manage)
  - [Delay base](#delay-base)
- [Import](#import)
- [UriClient](#uriclient)
- [TxtClient](#txtclient)

## Cache manage
[Cache documentation](methods/

Note: _Most available parameters are set to their default values and are not shown. Refer to the documentation for the specific class for a full overview._

__Example usage:__

    $db = new \vipnytt\RobotsTxtParser\Database($pdo);
    $cacheManage = $db->cache();

- Clean the cache for unused robots.txt files
- Update the cache for any active robots.txt files
- Set an upper limit of bytes to parse
- Set an array of custom cURL options

#### Create a [Cache base](#cache-base)

### Cache base

Note: _Most available parameters are set to their default values and are not shown. Refer to the documentation for the specific class for a full overview._

__Example usage:__

    $db = new \vipnytt\RobotsTxtParser\Database($pdo);
    $cacheBase = $db->cache()->base('');

- Get the raw data from the database
- Invalidate the cache for a specific URI

#### Create a [TxtClient](#txtclient)

Create the TxtClient for parsing purposes:

    $client = $cacheBase->client('');

## Delay manage

Note: _Most available parameters are set to their default values and are not shown. Refer to the documentation for the specific class for a full overview._

__Example usage:__

    $db = new \vipnytt\RobotsTxtParser\Database($pdo);
    $delayManage = $db->delay();

- Clean the delay storage for any outdated records
- Get a list of the hosts with the highest wait time
- Get the raw data from the database

### Delay base

__Example usage:__

    $txtClient = new \vipnytt\RobotsTxtParser\TxtClient('', 200, 'robots.txt');

    $delayInterface = $txtClient->userAgent('myBot')->crawlDelay();
    // or
    $delayInterface = $txtClient->userAgent('myBot')->cacheDelay();
    // or
    $delayInterface = $txtClient->userAgent('myBot')->requestRate();

    $db = new \vipnytt\RobotsTxtParser\Database($pdo);
    $delayBase = $delayInterface->handle($db->delay());

- Check the current request queue; returns the expected delay (sleep time) in seconds
- Get the timestamp (with microseconds) you have to wait until before sending the request
- Reset the global queue for this host
- Sleep until it is your turn to send the request
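
To make the queue semantics above concrete, here is a minimal in-memory sketch that does not use the library. The class `HostQueueSketch`, its method names, and the host `example.com` are hypothetical; the library's own handler works against the storage behind `$db->delay()` and may behave differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for a per-host request queue.
 * Not the library's implementation; names are illustrative only.
 */
final class HostQueueSketch
{
    /** @var array<string, float> host => timestamp at which the next request may be sent */
    private array $nextSlot = [];

    /** Seconds a new request for $host would have to wait at time $now. */
    public function secondsToWait(string $host, float $now): float
    {
        return max(0.0, ($this->nextSlot[$host] ?? $now) - $now);
    }

    /** Reserve the next slot for $host and return the timestamp to wait until. */
    public function reserve(string $host, float $delay, float $now): float
    {
        $sendAt = max($now, $this->nextSlot[$host] ?? $now);
        $this->nextSlot[$host] = $sendAt + $delay;
        return $sendAt;
    }

    /** Forget the queue for $host. */
    public function reset(string $host): void
    {
        unset($this->nextSlot[$host]);
    }
}

$queue = new HostQueueSketch();
$now = 1000.0; // fixed clock so the example is deterministic

$first = $queue->reserve('example.com', 2.5, $now);       // send immediately
$second = $queue->reserve('example.com', 2.5, $now);      // queued behind the first
$wait = $queue->secondsToWait('example.com', $now);       // backlog for a third request
$queue->reset('example.com');
$afterReset = $queue->secondsToWait('example.com', $now); // queue is empty again
```

A real caller would pass `microtime(true)` as `$now` and sleep until the timestamp returned by `reserve()` before sending the request.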

## Import

__Example usage:__

    $client = new \vipnytt\RobotsTxtParser\Import($array);

Get the difference between the imported and the generated export array. Intended for debugging purposes only.

The `Import` class extends the `TxtClient`. See [TxtClient](#txtclient) for the rest of the available methods.

## UriClient

Note: _Most available parameters are set to their default values and are not shown. Refer to the documentation for the specific class for a full overview._

__Example usage:__

    $client = new \vipnytt\RobotsTxtParser\UriClient('');

- Get the base URI
- Get the robots.txt contents
- Get the effective base URI (after any redirects)
- Get the character encoding
- Get the HTTP/FTP status code
- Get the timestamp of the next update
- Get the timestamp the robots.txt is valid until

The `UriClient` extends the `TxtClient`. See [TxtClient](#txtclient) for the rest of the available methods.

## TxtClient

Note: _Most available parameters are set to their default values and are not shown. Refer to the documentation for the specific class for a full overview._

__Example usage:__

    $client = new \vipnytt\RobotsTxtParser\TxtClient('', 200, 'robots.txt');

### `Clean-param` directive

- Array of dynamic URI parameters detected in a URI.
- Array of dynamic URI parameters detected in a URI. This function also includes a list of generic dynamic parameters, as well as any (optional) custom parameters.
- List of dynamic URI parameters
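
As an illustration of what "detecting dynamic parameters in a URI" can mean, the following sketch checks a URI's query string against a set of `Clean-param` style rules. The function `detectDynamicParams`, the rule layout, and the example URI are assumptions made for this sketch; the library's own detection, including its generic parameter list, is not reproduced here.

```php
<?php
declare(strict_types=1);

/**
 * Return the query parameters of $uri that are covered by a Clean-param rule.
 * Simplified: path rules are treated as plain prefixes.
 *
 * @param array<string, string[]> $rules parameter name => path prefixes it applies to
 * @return string[]
 */
function detectDynamicParams(string $uri, array $rules): array
{
    $path = parse_url($uri, PHP_URL_PATH) ?: '/';
    $query = parse_url($uri, PHP_URL_QUERY) ?: '';
    parse_str($query, $params);

    $detected = [];
    foreach (array_keys($params) as $name) {
        foreach ($rules[$name] ?? [] as $prefix) {
            if (strncmp($path, $prefix, strlen($prefix)) === 0) {
                $detected[] = (string) $name;
                break; // one matching rule is enough for this parameter
            }
        }
    }
    return $detected;
}

// Hypothetical rules: `sid` is dynamic everywhere, `ref` only below /shop/.
$rules = [
    'sid' => ['/'],
    'ref' => ['/shop/'],
];

$found = detectDynamicParams('http://example.com/shop/item?id=7&ref=mail&sid=abc', $rules);
$none = detectDynamicParams('http://example.com/blog/post?ref=mail', $rules);
```

Here `id` is kept because no rule names it, and `ref` is ignored under `/blog/` because its rule only covers `/shop/`.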

### Export

Export all rules as an array

### Get user-agents

Get a list of all declared user-agents

### `Host` directive

- Get the main host declared by the Host directive.
- Get the main host declared by the Host directive. Falls back to the host of the effective URI if the directive is not set.
- Find out whether the host of the current URI is also the preferred one
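
The fallback and the "preferred host" check described above can be sketched in a few lines. The function `isPreferredHost` is hypothetical and compares bare host names only; how the library treats schemes, ports, or `www.` prefixes is not covered by this sketch.

```php
<?php
declare(strict_types=1);

/**
 * True when the host of $uri equals the declared preferred host (case-insensitive).
 * $declaredHost is null when no Host directive was found.
 */
function isPreferredHost(string $uri, ?string $declaredHost): bool
{
    $current = parse_url($uri, PHP_URL_HOST);
    if (!is_string($current)) {
        return false; // URI without a host component
    }
    // Fall back to the URI's own host when no Host directive was declared.
    $preferred = $declaredHost ?? $current;
    return strcasecmp($current, $preferred) === 0;
}

$mirror = isPreferredHost('http://www.example.com/', 'example.com');
$main = isPreferredHost('http://example.com/page', 'EXAMPLE.com');
$undeclared = isPreferredHost('http://example.com/page', null);
```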

### Render

- Compatibility mode. Optimized for parsing by custom third-party parsers that do not strictly follow the standards.
- Compressed to an absolute minimum. Optimized for storage, e.g. in databases.
- Normal-looking robots.txt. Optimized for human readability; it is also the easiest to modify.

### `Sitemap` directive

Export a list of sitemaps

### `Allow` directive

- Export an array of the directive's rules
- Check if the specified path is covered by this directive

### `Cache-delay` directive

- Export the value of the directive
- Intended for use by a third-party delay handler
- Intended for use by a third-party delay handler
- Get the request-delay value

### Handling of the `Cache-delay` directive

- Get the size of the current request queue in seconds
- Get the timestamp (with microseconds) you have to wait until before sending the request
- Reset the queue for this host
- Sleep, delaying PHP processing, until it is your turn to send the request

### `Comment` directive

- Export a list of comments/messages/information that exists for the matching user-agent.
- Export a list of comments/messages/information that exists for your user-agent only. Spam-filtered and intended to be read.

### `Crawl-delay` directive

- Export the value of the directive
- Intended for use by a third-party delay handler
- Intended for use by a third-party delay handler
- Get the request-delay value

### Handling of the `Crawl-delay` directive

- Get the size of the current request queue in seconds
- Get the timestamp (with microseconds) you have to wait until before sending the request
- Reset the queue for this host
- Sleep, delaying PHP processing, until it is your turn to send the request

### `Disallow` directive

- Export an array of the directive's rules
- Check if the specified path is covered by this directive

### Export

Export an array of the rules for the selected User-agent

### isAllowed

Check whether crawling a URI is allowed

### isDisallowed

Check whether crawling a URI is disallowed
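
For readers unfamiliar with how `Allow` and `Disallow` rules are usually weighed against each other, the sketch below applies the common longest-match rule. The function `isAllowedSketch` and the example rules are hypothetical, wildcards (`*`, `$`) are deliberately left out, and the library's actual matching logic may differ.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether $path may be crawled, using longest-prefix precedence.
 * Simplified: plain prefixes only, no `*` or `$` wildcards.
 *
 * @param string[] $allow
 * @param string[] $disallow
 */
function isAllowedSketch(string $path, array $allow, array $disallow): bool
{
    $longest = static function (array $rules) use ($path): int {
        $best = -1; // -1 means "no rule matched"
        foreach ($rules as $rule) {
            if ($rule !== '' && strncmp($path, $rule, strlen($rule)) === 0) {
                $best = max($best, strlen($rule));
            }
        }
        return $best;
    };

    // On a tie the Allow rule wins; with no match at all the path is allowed.
    return $longest($allow) >= $longest($disallow);
}

$allow = ['/public/', '/private/press/'];
$disallow = ['/private/'];

$press = isAllowedSketch('/private/press/2024.html', $allow, $disallow);
$notes = isAllowedSketch('/private/notes.txt', $allow, $disallow);
$index = isAllowedSketch('/index.html', $allow, $disallow);
```

The press page is allowed because `/private/press/` is a longer match than `/private/`; the notes file only matches the `Disallow` rule; the index page matches nothing and is allowed by default.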

### `NoIndex` directive

- Export an array of the directive's rules
- Check if the specified path is covered by this directive

### `Request-rate` directive

- Export an array of delays and their corresponding timestamps
- Intended for use by a third-party delay handler
- Intended for use by a third-party delay handler
- Get the request-delay value
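
A request rate is a ratio of requests per time span, so turning it into a per-request delay is a single division. The sketch below assumes values shaped like `1/10s` or `30/1m`, with an optional `s`, `m`, or `h` unit suffix; the function `rateToDelay` is hypothetical and the exact syntax the library accepts is not confirmed here.

```php
<?php
declare(strict_types=1);

/**
 * Convert a rate such as "1/10s" or "30/1m" into a per-request delay in seconds.
 * Returns null when the value cannot be parsed.
 */
function rateToDelay(string $rate): ?float
{
    if (!preg_match('/^(\d+)\s*\/\s*(\d+)([smh]?)$/i', trim($rate), $m)) {
        return null;
    }
    $requests = (int) $m[1];
    if ($requests === 0) {
        return null; // avoid division by zero
    }
    // Seconds per unit; a missing suffix is treated as seconds.
    $unit = ['' => 1, 's' => 1, 'm' => 60, 'h' => 3600][strtolower($m[3])];
    return ((int) $m[2] * $unit) / $requests;
}

$slow = rateToDelay('1/10s');   // one request per ten seconds
$burst = rateToDelay('30/1m');  // thirty requests per minute
$plain = rateToDelay('1/5');    // no unit: seconds assumed
$broken = rateToDelay('fast');  // not a rate
```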

### Handling of the `Request-rate` directive

- Get the size of the current request queue in seconds
- Get the timestamp (with microseconds) you have to wait until before sending the request
- Reset the queue for this host
- Sleep, delaying PHP processing, until it is your turn to send the request

### `Robot-version` directive

Export the value of the directive

### `Visit-time` directive

- Export a list of visit-times in UTC
- Check if it's currently visiting time
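
Checking whether "now" is inside a visit-time window comes down to comparing clock times in UTC. The sketch below assumes windows written as `HHMM-HHMM` (for example `0600-0845`) with inclusive boundaries; the function `isVisitTime` is hypothetical and is not the library's implementation.

```php
<?php
declare(strict_types=1);

/**
 * True when $when, converted to UTC, falls inside a window such as "0600-0845".
 */
function isVisitTime(string $window, DateTimeInterface $when): bool
{
    if (!preg_match('/^(\d{4})-(\d{4})$/', $window, $m)) {
        return false;
    }
    $utc = (new DateTimeImmutable('@' . $when->getTimestamp()))
        ->setTimezone(new DateTimeZone('UTC'));
    $now = $utc->format('Hi'); // e.g. "0730"

    // Windows that wrap past midnight (e.g. "2200-0200") need the OR branch.
    return $m[1] <= $m[2]
        ? ($now >= $m[1] && $now <= $m[2])
        : ($now >= $m[1] || $now <= $m[2]);
}

$utcZone = new DateTimeZone('UTC');
$morning = isVisitTime('0600-0845', new DateTimeImmutable('2024-01-01 07:30:00', $utcZone));
$late = isVisitTime('0600-0845', new DateTimeImmutable('2024-01-01 09:00:00', $utcZone));
$night = isVisitTime('2200-0200', new DateTimeImmutable('2024-01-01 23:15:00', $utcZone));
$noon = isVisitTime('2200-0200', new DateTimeImmutable('2024-01-01 12:00:00', $utcZone));
```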