File RobotsTxtParser.php has 353 lines of code (exceeds 250 allowed). Consider refactoring.
<?php declare(strict_types=1);
namespace t1gor\RobotsTxtParser;
use Psr\Log\LoggerAwareInterface;
The class RobotsTxtParser has an overall complexity of 103 which is very high. The configured complexity threshold is 50.
class RobotsTxtParser implements LoggerAwareInterface {
use LogsIfAvailableTrait;
RobotsTxtParser has 27 functions (exceeds 20 allowed). Consider refactoring.
class RobotsTxtParser implements LoggerAwareInterface {
use LogsIfAvailableTrait;
Function parseURL has a Cognitive Complexity of 15 (exceeds 5 allowed). Consider refactoring.
protected function parseURL($url) {
$parsed = parse_url($url);
if ($parsed === false) {
return false;
} elseif (!isset($parsed['scheme']) || !$this->isValidScheme($parsed['scheme'])) {
Function render has a Cognitive Complexity of 13 (exceeds 5 allowed). Consider refactoring.
public function render($eol = "\r\n") {
$input = $this->getRules();
krsort($input);
$output = [];
foreach ($input as $userAgent => $rules) {
Method render has 30 lines of code (exceeds 25 allowed). Consider refactoring.
public function render($eol = "\r\n") {
$input = $this->getRules();
krsort($input);
$output = [];
foreach ($input as $userAgent => $rules) {
Function getSitemaps has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.
public function getSitemaps(?string $userAgent = null): array {
$this->buildTree();
$maps = [];
if (!is_null($userAgent)) {
Function checkRules has a Cognitive Complexity of 10 (exceeds 5 allowed). Consider refactoring.
protected function checkRules(string $rule, string $path, string $userAgent = '*'): bool {
if ($this->checkHttpStatusCodeRule()) {
return ($rule === Directive::DISALLOW);
}
Method __construct has 5 arguments (exceeds 4 allowed). Consider refactoring.
$content,
string $encoding = self::DEFAULT_ENCODING,
?TreeBuilderInterface $treeBuilder = null,
?ReaderInterface $reader = null,
?UserAgentMatcherInterface $userAgentMatcher = null
Function getHost has a Cognitive Complexity of 7 (exceeds 5 allowed). Consider refactoring.
public function getHost(?string $userAgent = null) {
$this->buildTree();
if (!is_null($userAgent)) {
$userAgent = $this->userAgentMatcher->getMatching($userAgent, array_keys($this->tree));
Avoid too many return statements within this method.
return $parsed;
The method parseURL() has a Cyclomatic Complexity of 10. The configured cyclomatic complexity threshold is 10.
protected function parseURL($url) {
$parsed = parse_url($url);
if ($parsed === false) {
return false;
} elseif (!isset($parsed['scheme']) || !$this->isValidScheme($parsed['scheme'])) {
The class RobotsTxtParser has a coupling between objects value of 15. Consider to reduce the number of dependencies under 13.
class RobotsTxtParser implements LoggerAwareInterface {
use LogsIfAvailableTrait;
Missing class import via use statement (line '236', column '13').
throw new \RuntimeException(WarmingMessages::SET_UA_DEPRECATED);
Avoid using static access to class '\t1gor\RobotsTxtParser\Parser\HostName' in method 'isValidHostName'.
return HostName::isValid($host);
The method parseURL uses an else expression. Else clauses are basically not necessary and you can simplify the code by not using them.
} else {
if (!isset($parsed['host']) || !$this->isValidHostName($parsed['host'])) {
return false;
} else {
if (!isset($parsed['port'])) {
Avoid using static access to class 't1gor\RobotsTxtParser\Directive' in method 'checkRuleSwitch'.
switch (Directive::attemptGetInline($rule)) {
The method parseURL uses an else expression. Else clauses are basically not necessary and you can simplify the code by not using them.
} else {
if (!isset($parsed['port'])) {
$parsed['port'] = getservbyname($parsed['scheme'], 'tcp');
if (!is_int($parsed['port'])) {
return false;
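The nested else branches flagged in parseURL could be flattened with guard clauses. The following is a minimal standalone sketch, not the library's code: parseUrlFlat, isValidSchemeSketch and the list of accepted schemes are hypothetical, and the host check is reduced to a non-empty test in place of the real HostName::isValid call.

```php
<?php

/**
 * Hypothetical stand-in for the scheme check; the real class delegates
 * to Url::isValidScheme(), whose accepted schemes are not shown in this report.
 */
function isValidSchemeSketch(string $scheme): bool
{
    return in_array($scheme, ['http', 'https', 'ftp', 'sftp'], true);
}

/**
 * Guard-clause version of the flagged logic: each failed check returns
 * early, so no else branch is needed.
 *
 * @return array|false
 */
function parseUrlFlat(string $url)
{
    $parsed = parse_url($url);
    if ($parsed === false) {
        return false;
    }
    if (!isset($parsed['scheme']) || !isValidSchemeSketch($parsed['scheme'])) {
        return false;
    }
    // Assumption: a non-empty host is treated as valid in this sketch.
    if (!isset($parsed['host']) || $parsed['host'] === '') {
        return false;
    }
    if (!isset($parsed['port'])) {
        // Look up the default port for the scheme, as the original snippet does.
        $parsed['port'] = getservbyname($parsed['scheme'], 'tcp');
        if (!is_int($parsed['port'])) {
            return false;
        }
    }
    $parsed['custom'] = ($parsed['path'] ?? '/')
        . (isset($parsed['query']) ? '?' . $parsed['query'] : '');

    return $parsed;
}
```

This removes the else expressions and the nesting that drive the cognitive complexity score, but it keeps several return statements, so the separate "too many return statements" finding would likely remain unless the checks are extracted into helper methods.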
Avoid using static access to class 't1gor\RobotsTxtParser\Directive' in method 'checkRuleSwitch'.
if ($this->checkCleanParamRule(Directive::stripInline($rule), $path)) {
Avoid using static access to class 't1gor\RobotsTxtParser\Directive' in method 'checkRuleSwitch'.
if ($this->checkHostRule(Directive::stripInline($rule))) {
Avoid using static access to class '\t1gor\RobotsTxtParser\Stream\GeneratorBasedReader' in method '__construct'.
? GeneratorBasedReader::fromStream($content)
The method getSitemaps uses an else expression. Else clauses are basically not necessary and you can simplify the code by not using them.
} else {
foreach ($this->tree as $userAgentBased) {
if (isset($userAgentBased[Directive::SITEMAP]) && !empty($userAgentBased[Directive::SITEMAP])) {
$maps = array_merge($maps, $userAgentBased[Directive::SITEMAP]);
}
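One way to drop the else branch in getSitemaps is to return early for the user-agent case. The sketch below is illustrative only: collectSitemaps is a hypothetical free function, the 'sitemap' key is an assumed value for Directive::SITEMAP, and the user-agent matching step from the real method is omitted. It also relies on the fact that !empty() already covers the isset() check.

```php
<?php

/**
 * Collect sitemap URLs from a parsed rule tree.
 *
 * @param array       $tree      rules keyed by user agent (assumed shape)
 * @param string|null $userAgent restrict the result to one user agent
 *
 * @return string[]
 */
function collectSitemaps(array $tree, ?string $userAgent = null): array
{
    // Early return replaces the if/else split.
    if ($userAgent !== null) {
        return $tree[$userAgent]['sitemap'] ?? [];
    }

    $maps = [];
    foreach ($tree as $userAgentBased) {
        // empty() is false only for set, non-empty values, so isset() is redundant.
        if (!empty($userAgentBased['sitemap'])) {
            $maps = array_merge($maps, $userAgentBased['sitemap']);
        }
    }

    return $maps;
}
```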
The method render uses an else expression. Else clauses are basically not necessary and you can simplify the code by not using them.
} else {
$output[] = $directive . ': ' . $value;
}
Avoid using static access to class '\t1gor\RobotsTxtParser\Parser\DirectiveProcessorsFactory' in method 'buildTree'.
DirectiveProcessorsFactory::getDefault($this->logger),
Avoid using static access to class '\t1gor\RobotsTxtParser\Stream\GeneratorBasedReader' in method '__construct'.
: GeneratorBasedReader::fromString($content);
Avoid using static access to class '\t1gor\RobotsTxtParser\Parser\Url' in method 'isValidScheme'.
return Url::isValidScheme($scheme);
Avoid unused private fields such as '$userAgent'.
private $userAgent = '*';
Avoid unused private fields such as '$content'.
private $content = '';
Avoid unused parameters such as '$userAgent'.
public function setUserAgent(string $userAgent) {
Avoid variables with short names like $b. Configured minimum length is 3.
usort($value, function ($a, $b) {
Avoid variables with short names like $a. Configured minimum length is 3.
usort($value, function ($a, $b) {
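The short-name findings for $a and $b can be addressed by giving the comparator parameters descriptive names. The report does not show the body of the closure, so the comparison below (longest string first) and the function name sortByLengthDesc are assumptions made purely for illustration.

```php
<?php

/**
 * Sort strings by length, longest first (assumed ordering, for illustration).
 *
 * @param string[] $values
 *
 * @return string[]
 */
function sortByLengthDesc(array $values): array
{
    // $left and $right satisfy a three-character minimum variable name length.
    usort($values, function (string $left, string $right): int {
        return strlen($right) <=> strlen($left);
    });

    return $values;
}
```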
Blank line found at start of control structure
switch (Directive::attemptGetInline($rule)) {
Opening brace of a class must be on the line after the definition
class RobotsTxtParser implements LoggerAwareInterface {
Scope keyword "private" must be followed by a single space
private $content = '';
CASE statements must be defined using a colon
case Directive::HOST;
Only one argument is allowed per line in a multi-line function call
$host, [
Spaces must be used to indent lines; tabs are not allowed
private string $encoding = '';
Spaces must be used to indent lines; tabs are not allowed
?TreeBuilderInterface $treeBuilder = null,
Spaces must be used to indent lines; tabs are not allowed
) {
Spaces must be used to indent lines; tabs are not allowed
if ($this->encoding !== static::DEFAULT_ENCODING) {
Spaces must be used to indent lines; tabs are not allowed
DirectiveProcessorsFactory::getDefault($this->logger),
Spaces must be used to indent lines; tabs are not allowed
return $this->logger;
Spaces must be used to indent lines; tabs are not allowed
if ($this->reader instanceof LoggerAwareInterface) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (!isset($parsed['host']) || !$this->isValidHostName($parsed['host'])) {
Spaces must be used to indent lines; tabs are not allowed
if (!isset($parsed['port'])) {
Spaces must be used to indent lines; tabs are not allowed
return $parsed;
Spaces must be used to indent lines; tabs are not allowed
private function explodeCleanParamRule($rule) {
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
* @param int $code
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
public function __construct(
Spaces must be used to indent lines; tabs are not allowed
$this->treeBuilder = $treeBuilder;
Spaces must be used to indent lines; tabs are not allowed
$this->reader = is_resource($content)
Spaces must be used to indent lines; tabs are not allowed
: GeneratorBasedReader::fromString($content);
Spaces must be used to indent lines; tabs are not allowed
$this->logger
Spaces must be used to indent lines; tabs are not allowed
$this->reader->setLogger($this->logger);
Spaces must be used to indent lines; tabs are not allowed
} elseif (!isset($parsed['scheme']) || !$this->isValidScheme($parsed['scheme'])) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (!is_int($code) || $code < 100 || $code > 599) {
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
$this->userAgentMatcher = new UserAgentMatcher();
Spaces must be used to indent lines; tabs are not allowed
$this->reader->setEncoding($this->encoding);
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
$this->logger = $logger;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if ($parsed === false) {
Spaces must be used to indent lines; tabs are not allowed
* Explode Clean-Param rule
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
public function setHttpStatusCode(int $code): bool {
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
protected function checkRules(string $rule, string $path, string $userAgent = '*'): bool {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$userAgent = $this->userAgentMatcher->getMatching($userAgent, array_keys($this->tree));
Spaces must be used to indent lines; tabs are not allowed
private function checkHttpStatusCodeRule(): bool {
Spaces must be used to indent lines; tabs are not allowed
use LogsIfAvailableTrait;
Spaces must be used to indent lines; tabs are not allowed
private $userAgent = '*';
Spaces must be used to indent lines; tabs are not allowed
return;
Spaces must be used to indent lines; tabs are not allowed
$this->treeBuilder = new TreeBuilder(
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$parsed['port'] = getservbyname($parsed['scheme'], 'tcp');
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$content,
Spaces must be used to indent lines; tabs are not allowed
?ReaderInterface $reader = null,
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
$this->log('UserAgentMatcher is not passed, using a default one...');
Spaces must be used to indent lines; tabs are not allowed
protected function checkRuleSwitch(string $rule, string $path): bool {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
public function setLogger(LoggerInterface $logger): void {
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
private static function isValidHostName(string $host): bool {
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$this->httpStatusCode = $code;
Spaces must be used to indent lines; tabs are not allowed
* @param string $userAgent - which robot to check for
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
protected $rules = [];
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
private $url = null;
Spaces must be used to indent lines; tabs are not allowed
string $encoding = self::DEFAULT_ENCODING,
Spaces must be used to indent lines; tabs are not allowed
?UserAgentMatcherInterface $userAgentMatcher = null
Spaces must be used to indent lines; tabs are not allowed
$this->encoding = $encoding;
Spaces must be used to indent lines; tabs are not allowed
public function getLogger(): ?LoggerInterface {
Spaces must be used to indent lines; tabs are not allowed
return Url::isValidScheme($scheme);
Spaces must be used to indent lines; tabs are not allowed
if (!is_int($parsed['port'])) {
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$cleanParam['path'] = isset($array[1]) ? $this->encode_url(preg_replace('/[^A-Za-z0-9\.-\/\*\_]/', '', $array[1])) : '/*';
Spaces must be used to indent lines; tabs are not allowed
$cleanParam['param'][] = trim($key);
Spaces must be used to indent lines; tabs are not allowed
return $this->checkRules(Directive::ALLOW, $url->getPath(), $userAgent);
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
protected ?int $httpStatusCode;
Spaces must be used to indent lines; tabs are not allowed
$this->reader = $reader;
Spaces must be used to indent lines; tabs are not allowed
);
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
public function isAllowed(string $url, ?string $userAgent = '*'): bool {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @return void
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
return ($rule === Directive::DISALLOW);
Spaces must be used to indent lines; tabs are not allowed
$result = ($rule === Directive::ALLOW);
Spaces must be used to indent lines; tabs are not allowed
foreach ([Directive::DISALLOW, Directive::ALLOW] as $directive) {
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
? GeneratorBasedReader::fromStream($content)
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
private static function isValidScheme($scheme) {
Spaces must be used to indent lines; tabs are not allowed
* Parse URL
Spaces must be used to indent lines; tabs are not allowed
* @param string $url
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $path - path to check
Spaces must be used to indent lines; tabs are not allowed
if ($this->checkCleanParamRule(Directive::stripInline($rule), $path)) {
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
$this->userAgentMatcher = $userAgentMatcher;
Spaces must be used to indent lines; tabs are not allowed
if (!empty($this->tree)) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return HostName::isValid($host);
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
foreach ($param as $key) {
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
* Set UserAgent
Spaces must be used to indent lines; tabs are not allowed
public function setUserAgent(string $userAgent) {
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
continue;
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
$this->log("Disallowed by HTTP status code {$this->httpStatusCode}");
Spaces must be used to indent lines; tabs are not allowed
case Directive::CLEAN_PARAM:
Spaces must be used to indent lines; tabs are not allowed
private ?UserAgentMatcherInterface $userAgentMatcher;
Spaces must be used to indent lines; tabs are not allowed
if (is_null($this->reader)) {
Spaces must be used to indent lines; tabs are not allowed
$this->log('Reader is not passed, using a default one...');
Spaces must be used to indent lines; tabs are not allowed
if (is_null($this->treeBuilder)) {
Spaces must be used to indent lines; tabs are not allowed
* @param string $scheme
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $userAgent
Spaces must be used to indent lines; tabs are not allowed
* @param string $rule - rule to check
Spaces must be used to indent lines; tabs are not allowed
foreach ($this->tree[$userAgent][$directive] as $robotRule) {
Spaces must be used to indent lines; tabs are not allowed
* Check HTTP status code rule
Spaces must be used to indent lines; tabs are not allowed
switch (Directive::attemptGetInline($rule)) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if ($this->userAgentMatcher instanceof LoggerAwareInterface) {
Spaces must be used to indent lines; tabs are not allowed
}
Line exceeds 120 characters; contains 135 characters
$parsed['custom'] = (isset($parsed['path']) ? $parsed['path'] : '/') . (isset($parsed['query']) ? '?' . $parsed['query'] : '');
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
private ?TreeBuilderInterface $treeBuilder;
Spaces must be used to indent lines; tabs are not allowed
if (is_null($this->userAgentMatcher)) {
Spaces must be used to indent lines; tabs are not allowed
* Validate URL scheme
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
* @return array|false
Spaces must be used to indent lines; tabs are not allowed
$rule = preg_replace('/\s+/S', ' ', $rule);
Spaces must be used to indent lines; tabs are not allowed
return $cleanParam;
Spaces must be used to indent lines; tabs are not allowed
* Set the HTTP status code
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
* Check rules
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
return $result;
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
private array $tree = [];
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$url = new Url($url);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
protected $host = null;
Spaces must be used to indent lines; tabs are not allowed
private function buildTree() {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$this->userAgentMatcher->setLogger($this->logger);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$parsed = parse_url($url);
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
} else {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$parsed['custom'] = (isset($parsed['path']) ? $parsed['path'] : '/') . (isset($parsed['query']) ? '?' . $parsed['query'] : '');
Spaces must be used to indent lines; tabs are not allowed
$array = explode(' ', $rule, 2);
Spaces must be used to indent lines; tabs are not allowed
$cleanParam = [];
Line exceeds 120 characters; contains 130 characters
$cleanParam['path'] = isset($array[1]) ? $this->encode_url(preg_replace('/[^A-Za-z0-9\.-\/\*\_]/', '', $array[1])) : '/*';
Spaces must be used to indent lines; tabs are not allowed
$this->log('Invalid HTTP status code, not taken into account.', ['code' => $code], LogLevel::WARNING);
Spaces must be used to indent lines; tabs are not allowed
!is_null($this->logger) && $url->setLogger($this->logger);
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
if (!isset($this->tree[$userAgent][$directive])) {
Spaces must be used to indent lines; tabs are not allowed
$result = ($rule === $directive);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
const DEFAULT_ENCODING = 'UTF-8';
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
$this->treeBuilder->setContent($this->reader->getContentIterated());
Spaces must be used to indent lines; tabs are not allowed
$this->tree = $this->treeBuilder->build();
Spaces must be used to indent lines; tabs are not allowed
protected function parseURL($url) {
Spaces must be used to indent lines; tabs are not allowed
} else {
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$param = explode('&', $array[0]);
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
throw new \RuntimeException(WarmingMessages::SET_UA_DEPRECATED);
Spaces must be used to indent lines; tabs are not allowed
if ($this->checkHttpStatusCodeRule()) {
Spaces must be used to indent lines; tabs are not allowed
if ($this->checkRuleSwitch($robotRule, $path)) {
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
private $content = '';
Spaces must be used to indent lines; tabs are not allowed
private ?ReaderInterface $reader;
Spaces must be used to indent lines; tabs are not allowed
$this->log('Creating a default tree builder as none passed...');
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $rule
Spaces must be used to indent lines; tabs are not allowed
* @return array
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
* @deprecated please check rules for exact user agent instead
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->httpStatusCode) && $this->httpStatusCode >= 500 && $this->httpStatusCode <= 599) {
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
foreach ($cleanParam['param'] as $param) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
if (mb_strrpos($value, '/') == (mb_strlen($value) - 1)
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
if (!isset($this->url)) {
Spaces must be used to indent lines; tabs are not allowed
$url['scheme'] . '://' . $url['host'] . ':' . $url['port'],
Spaces must be used to indent lines; tabs are not allowed
$this->log('Rule match: ' . Directive::HOST . ' directive');
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
* Check url wrapper
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
$this->log("{$directive} directive (unofficial): Not found, fallback to " . Directive::CRAWL_DELAY . " directive");
Spaces must be used to indent lines; tabs are not allowed
return 0;
Spaces must be used to indent lines; tabs are not allowed
public function getCleanParam(): array {
Spaces must be used to indent lines; tabs are not allowed
if (!isset($this->tree[Directive::CLEAN_PARAM]) || empty($this->tree[Directive::CLEAN_PARAM])) {
Spaces must be used to indent lines; tabs are not allowed
break;
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
if (preg_match('@' . $escaped . '@', $path)) {
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
$value = '^' . $value;
Spaces must be used to indent lines; tabs are not allowed
* @param string $rule
Spaces must be used to indent lines; tabs are not allowed
private function checkHostRule($rule) {
Spaces must be used to indent lines; tabs are not allowed
$url['host'],
Spaces must be used to indent lines; tabs are not allowed
$this->log(Directive::CLEAN_PARAM . ' directive: Not found');
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
});
Spaces must be used to indent lines; tabs are not allowed
foreach ($value as $subValue) {
Spaces must be used to indent lines; tabs are not allowed
$this->log(sprintf("Rules not found for the given User-Agent '%s'", $userAgent));
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
* @note NULL is returned to public API compatibility reasons. Will be removed in the future.
Spaces must be used to indent lines; tabs are not allowed
if (!is_null($userAgent)) {
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
if (!strpos($path, "?$param=")
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$this->log('Rule match: ' . Directive::CLEAN_PARAM . ' directive');
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
krsort($input);
Spaces must be used to indent lines; tabs are not allowed
} else {
Spaces must be used to indent lines; tabs are not allowed
$output[] = '';
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$this->log(sprintf("No direct match found for '%s', fallback to *", $userAgent));
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @return string[]|string|null
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
public function getSitemaps(?string $userAgent = null): array {
Spaces must be used to indent lines; tabs are not allowed
return $this->checkBasicRule($rule, $path);
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (in_array(
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[$userAgent];
Spaces must be used to indent lines; tabs are not allowed
public function getHost(?string $userAgent = null) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (!$this->checkBasicRule($cleanParam['path'], $path)) {
Spaces must be used to indent lines; tabs are not allowed
* Check basic rule
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
$escaped = strtr($this->prepareRegexRule($rule), ['@' => '\@']);
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
* @param string|null $userAgent - which robot to check for
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @return array
Spaces must be used to indent lines; tabs are not allowed
* @deprecated
Spaces must be used to indent lines; tabs are not allowed
* @return string
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
* @param ?string $userAgent
Spaces must be used to indent lines; tabs are not allowed
} else {
Spaces must be used to indent lines; tabs are not allowed
foreach ($this->tree as $userAgentBased) {
Spaces must be used to indent lines; tabs are not allowed
* Check Clean-Param rule
Spaces must be used to indent lines; tabs are not allowed
Spaces must be used to indent lines; tabs are not allowed
)) {
Spaces must be used to indent lines; tabs are not allowed
public function getDelay(string $userAgent = "*", string $type = Directive::CRAWL_DELAY) {
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree[$userAgent][Directive::CRAWL_DELAY])) {
Spaces must be used to indent lines; tabs are not allowed
$directive = ucfirst($directive);
Spaces must be used to indent lines; tabs are not allowed
if (is_array($value)) {
Spaces must be used to indent lines; tabs are not allowed
return [];
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$userAgent = $this->userAgentMatcher->getMatching($userAgent, array_keys($this->tree));
Line exceeds 120 characters; contains 124 characters
if (isset($this->tree[$userAgent][Directive::SITEMAP]) && !empty($this->tree[$userAgent][Directive::SITEMAP])) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
case Directive::HOST:
Spaces must be used to indent lines; tabs are not allowed
if (substr($value, 0, 2) != '.*') {
Spaces must be used to indent lines; tabs are not allowed
$url = $this->parseURL($this->url);
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
$url = new Url($url);
Spaces must be used to indent lines; tabs are not allowed
return [];
Spaces must be used to indent lines; tabs are not allowed
* Render
Spaces must be used to indent lines; tabs are not allowed
$output = [];
Spaces must be used to indent lines; tabs are not allowed
return $this->tree['*'];
Spaces must be used to indent lines; tabs are not allowed
return null;
Spaces must be used to indent lines; tabs are not allowed
$maps = array_merge($maps, $userAgentBased[Directive::SITEMAP]);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
$this->log($error_msg, [], LogLevel::ERROR);
Spaces must be used to indent lines; tabs are not allowed
? Directive::CACHE_DELAY
Spaces must be used to indent lines; tabs are not allowed
: Directive::CRAWL_DELAY;
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[Directive::CLEAN_PARAM];
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $eol
Spaces must be used to indent lines; tabs are not allowed
foreach ($rules as $directive => $value) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
return $maps;
Spaces must be used to indent lines; tabs are not allowed
private function checkCleanParamRule($rule, $path) {
Spaces must be used to indent lines; tabs are not allowed
protected function prepareRegexRule(string $value): string {
Spaces must be used to indent lines; tabs are not allowed
if (mb_strlen($value) > 2 && mb_substr($value, -2) == '\$') {
Spaces must be used to indent lines; tabs are not allowed
$value .= '.*';
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
$url['host'] . ':' . $url['port'],
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[$userAgent][$directive];
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[$userAgent][Directive::CRAWL_DELAY];
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
$output[] = $directive . ': ' . $value;
Spaces must be used to indent lines; tabs are not allowed
$output[] = '';
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree['*'])) {
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree[$userAgent][Directive::HOST]) && !empty($this->tree[$userAgent][Directive::HOST])) {
Spaces must be used to indent lines; tabs are not allowed
private function checkBasicRule(string $rule, string $path): bool {
Spaces must be used to indent lines; tabs are not allowed
) {
Spaces must be used to indent lines; tabs are not allowed
}
Line exceeds 120 characters; contains 127 characters
$this->log("{$directive} directive (unofficial): Not found, fallback to " . Directive::CRAWL_DELAY . " directive");
Spaces must be used to indent lines; tabs are not allowed
$output[] = $directive . ': ' . $subValue;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (!is_null($userAgent)) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $path
Spaces must be used to indent lines; tabs are not allowed
$escape = ['$' => '\$', '?' => '\?', '.' => '\.', '*' => '.*', '[' => '\[', ']' => '\]'];
Spaces must be used to indent lines; tabs are not allowed
$url['scheme'] . '://' . $url['host'],
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
return $this->checkRules(Directive::DISALLOW, $url->getPath(), $userAgent);
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
foreach ($sitemaps as $sitemap) {
Spaces must be used to indent lines; tabs are not allowed
public function getRules(?string $userAgent = null) {
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree[$userAgent])) {
Spaces must be used to indent lines; tabs are not allowed
foreach ($this->tree as $userAgentBased) {
Spaces must be used to indent lines; tabs are not allowed
return !empty($hosts) ? $hosts : null;
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
break;
Spaces must be used to indent lines; tabs are not allowed
default:
Spaces must be used to indent lines; tabs are not allowed
$cleanParam = $this->explodeCleanParamRule($rule);
Spaces must be used to indent lines; tabs are not allowed
|| mb_strrpos($value, '=') == (mb_strlen($value) - 1)
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return $value;
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
]
Spaces must be used to indent lines; tabs are not allowed
!is_null($this->logger) && $url->setLogger($this->logger);
Spaces must be used to indent lines; tabs are not allowed
$directive = in_array($type, [Directive::CACHE, Directive::CACHE_DELAY])
Spaces must be used to indent lines; tabs are not allowed
public function getLog(): array {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
$maps = [];
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @param string $rule
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
* @return bool
Spaces must be used to indent lines; tabs are not allowed
$error_msg = WarmingMessages::INLINED_HOST;
Spaces must be used to indent lines; tabs are not allowed
$host, [
Spaces must be used to indent lines; tabs are not allowed
* @deprecated
Spaces must be used to indent lines; tabs are not allowed
foreach ($input as $userAgent => $rules) {
Spaces must be used to indent lines; tabs are not allowed
$host = $this->getHost();
Spaces must be used to indent lines; tabs are not allowed
$output[] = 'Sitemap: ' . $sitemap;
Spaces must be used to indent lines; tabs are not allowed
return $this->tree;
Spaces must be used to indent lines; tabs are not allowed
$this->buildTree();
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
$userAgent = $this->userAgentMatcher->getMatching($userAgent, array_keys($this->tree));
Spaces must be used to indent lines; tabs are not allowed
if ($this->checkHostRule(Directive::stripInline($rule))) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
&& !strpos($path, "&$param=")
Spaces must be used to indent lines; tabs are not allowed
) {
Spaces must be used to indent lines; tabs are not allowed
$this->log('Rule match: Path');
Spaces must be used to indent lines; tabs are not allowed
$value = substr($value, 0, -2) . '$';
Spaces must be used to indent lines; tabs are not allowed
|| mb_strrpos($value, '?') == (mb_strlen($value) - 1)
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
* Check Host rule
Spaces must be used to indent lines; tabs are not allowed
$host = trim(str_ireplace(Directive::HOST . ':', '', mb_strtolower($rule)));
Spaces must be used to indent lines; tabs are not allowed
* @param string $url - url to check
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree[$userAgent][$directive])) {
Spaces must be used to indent lines; tabs are not allowed
return $this->reader->getContentRaw();
Spaces must be used to indent lines; tabs are not allowed
usort($value, function ($a, $b) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if ($userAgent === null) {
Spaces must be used to indent lines; tabs are not allowed
if (isset($userAgentBased[Directive::SITEMAP]) && !empty($userAgentBased[Directive::SITEMAP])) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
return false;
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return true;
Spaces must be used to indent lines; tabs are not allowed
public function isDisallowed(string $url, string $userAgent = '*'): bool {
Spaces must be used to indent lines; tabs are not allowed
$this->log("$directive directive: Not found");
Spaces must be used to indent lines; tabs are not allowed
public function getContent(): string {
Spaces must be used to indent lines; tabs are not allowed
* @see RobotsTxtParser::getLogger()
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$output[] = 'User-agent: ' . $userAgent;
Spaces must be used to indent lines; tabs are not allowed
return mb_strlen($b) <=> mb_strlen($a);
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[$userAgent][Directive::HOST];
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
if (isset($userAgentBased[Directive::HOST]) && !empty($userAgentBased[Directive::HOST])) {
Spaces must be used to indent lines; tabs are not allowed
if (isset($this->tree[$userAgent][Directive::SITEMAP]) && !empty($this->tree[$userAgent][Directive::SITEMAP])) {
Spaces must be used to indent lines; tabs are not allowed
$value = str_replace(array_keys($escape), array_values($escape), $value);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
/**
Spaces must be used to indent lines; tabs are not allowed
*
Spaces must be used to indent lines; tabs are not allowed
$input = $this->getRules();
Spaces must be used to indent lines; tabs are not allowed
$output[] = 'Host: ' . $host;
Spaces must be used to indent lines; tabs are not allowed
$sitemaps = $this->getSitemaps();
Spaces must be used to indent lines; tabs are not allowed
return implode($eol, $output);
Spaces must be used to indent lines; tabs are not allowed
$userAgent = $this->userAgentMatcher->getMatching($userAgent, array_keys($this->tree));
Spaces must be used to indent lines; tabs are not allowed
*/
Spaces must be used to indent lines; tabs are not allowed
$hosts = [];
Spaces must be used to indent lines; tabs are not allowed
array_push($hosts, $userAgentBased[Directive::HOST]);
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
public function render($eol = "\r\n") {
Spaces must be used to indent lines; tabs are not allowed
if ($host !== null) {
Spaces must be used to indent lines; tabs are not allowed
}
Spaces must be used to indent lines; tabs are not allowed
return $this->tree[$userAgent][Directive::SITEMAP];
Opening brace should be on a new line
public function isAllowed(string $url, ?string $userAgent = '*'): bool {
Opening brace should be on a new line
public function setLogger(LoggerInterface $logger): void {
Opening brace should be on a new line
public function setHttpStatusCode(int $code): bool {
Opening brace should be on a new line
private function checkHttpStatusCodeRule(): bool {
Opening brace should be on a new line
protected function checkRules(string $rule, string $path, string $userAgent = '*'): bool {
Opening brace should be on a new line
private function buildTree() {
Opening brace should be on a new line
protected function parseURL($url) {
Opening brace should be on a new line
public function setUserAgent(string $userAgent) {
Opening brace should be on a new line
public function getLogger(): ?LoggerInterface {
Opening brace should be on a new line
private static function isValidHostName(string $host): bool {
Opening brace should be on a new line
private function explodeCleanParamRule($rule) {
Opening brace should be on a new line
protected function checkRuleSwitch(string $rule, string $path): bool {
Opening brace should be on a new line
private static function isValidScheme($scheme) {
Opening brace should be on a new line
public function getCleanParam(): array {
Opening brace should be on a new line
private function checkHostRule($rule) {
Opening brace should be on a new line
public function getHost(?string $userAgent = null) {
Opening brace should be on a new line
public function getSitemaps(?string $userAgent = null): array {
Opening brace should be on a new line
private function checkBasicRule(string $rule, string $path): bool {
Opening brace should be on a new line
public function getDelay(string $userAgent = "*", string $type = Directive::CRAWL_DELAY) {
Opening brace should be on a new line
public function render($eol = "\r\n") {
Opening brace should be on a new line
private function checkCleanParamRule($rule, $path) {
Opening brace should be on a new line
protected function prepareRegexRule(string $value): string {
Opening brace should be on a new line
public function getLog(): array {
Opening brace should be on a new line
public function getRules(?string $userAgent = null) {
Opening brace should be on a new line
public function getContent(): string {
Opening brace should be on a new line
public function isDisallowed(string $url, string $userAgent = '*'): bool {
The variable $error_msg is not named in camelCase.
private function checkHostRule($rule) {
if (!isset($this->url)) {
$error_msg = WarmingMessages::INLINED_HOST;
$this->log($error_msg, [], LogLevel::ERROR);
return false;