TYPO3 7.6
Public Member Functions
crawler_init (&$pObj)
crawler_execute ($params, &$pObj)
crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
cleanUpOldRunningConfigurations ()
checkUrl ($url, $urlLog, $baseUrl)
indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
indexSingleRecord ($r, $cfgRec, $rl=null)
getUidRootLineForClosestTemplate ($id)
generateNextIndexingTime ($cfgRec)
checkDeniedSuburls ($url, $url_deny)
addQueueEntryForHook ($cfgRec, $title)
deleteFromIndex ($id)
processCmdmap_preProcess ($command, $table, $id, $value, $pObj)
processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, $pObj)
Public Attributes
$secondsPerExternalUrl = 3
$instanceCounter = 0
$callBack = CrawlerHook::class
Crawler hook for indexed search. Works with the "crawler" extension.
Definition at line 24 of file Hook/CrawlerHook.php.
addQueueEntryForHook ($cfgRec, $title)
Adds an entry to the queue for the hook.
array | $cfgRec | Configuration record
string | $title | Title/URL
Definition at line 629 of file Hook/CrawlerHook.php.
checkDeniedSuburls ($url, $url_deny)
Checks whether $url begins with any of the URLs in the $url_deny list and, if so, returns TRUE.
string | $url | URL to test
string | $url_deny | String in which URLs are separated by line breaks. If any of these strings is the first part of $url, the function returns TRUE (to indicate that descent is denied)
Definition at line 608 of file Hook/CrawlerHook.php.
References $url, GeneralUtility\isFirstPartOfStr(), and GeneralUtility\trimExplode().
Referenced by CrawlerHook\crawler_execute_type3().
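The prefix test described for checkDeniedSuburls can be illustrated with a minimal stand-alone sketch. It is not the TYPO3 implementation: the function name isDeniedSuburl is hypothetical, and plain PHP string functions stand in for GeneralUtility\trimExplode() and GeneralUtility\isFirstPartOfStr(), which the real method references.

```php
<?php
/**
 * Hypothetical sketch of the deny-list check (not the TYPO3 code).
 *
 * @param string $url     URL to test
 * @param string $urlDeny Denied URL prefixes, one per line
 * @return bool TRUE if $url starts with any denied prefix
 */
function isDeniedSuburl($url, $urlDeny)
{
    // Split the list on line breaks and trim each entry (also removes "\r").
    $prefixes = array_map('trim', explode("\n", $urlDeny));

    foreach ($prefixes as $prefix) {
        // Skip blank lines; an empty prefix would match every URL.
        if ($prefix === '') {
            continue;
        }
        // Compare only the first strlen($prefix) bytes of $url.
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$denyList = "http://example.org/private/\nhttp://example.org/tmp/";
var_dump(isDeniedSuburl('http://example.org/private/a.html', $denyList)); // bool(true)
var_dump(isDeniedSuburl('http://example.org/public/a.html', $denyList));  // bool(false)
```

Skipping empty lines is a choice made in this sketch so that a trailing line break in the list does not deny every URL.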
checkUrl ($url, $urlLog, $baseUrl)
Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.
string | $url | URL string to check
array | $urlLog | Array of already indexed URLs (the input URL is looked up here and must not exist already)
string | $baseUrl | Base URL of the indexing process (the input URL must be "inside" the base URL)
Definition at line 455 of file Hook/CrawlerHook.php.
References $url, and GeneralUtility\isFirstPartOfStr().
Referenced by CrawlerHook\crawler_execute_type3().
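The two conditions listed for checkUrl (the URL lies inside the base URL and is not yet in the log) can be sketched as follows. This is an illustration under assumptions, not the TYPO3 code: the name isUrlIndexable is hypothetical, the boolean return type is assumed because the page does not document a return value, and strncmp() stands in for GeneralUtility\isFirstPartOfStr().

```php
<?php
/**
 * Hypothetical sketch of the URL admission check (not the TYPO3 code).
 *
 * @param string   $url     URL string to check
 * @param string[] $urlLog  URLs that were already indexed
 * @param string   $baseUrl Base URL of the indexing process
 * @return bool TRUE if the URL may be indexed (assumed return type)
 */
function isUrlIndexable($url, array $urlLog, $baseUrl)
{
    // Condition 1: the URL must be "inside" the base URL (prefix match).
    if (strncmp($url, $baseUrl, strlen($baseUrl)) !== 0) {
        return false;
    }
    // Condition 2: the URL must not already be present in the log.
    return !in_array($url, $urlLog, true);
}

$log = array('http://example.org/docs/intro.html');
var_dump(isUrlIndexable('http://example.org/docs/api.html', $log, 'http://example.org/docs/'));   // bool(true)
var_dump(isUrlIndexable('http://example.org/docs/intro.html', $log, 'http://example.org/docs/')); // bool(false)
var_dump(isUrlIndexable('http://other.example/page.html', $log, 'http://example.org/docs/'));     // bool(false)
```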
cleanUpOldRunningConfigurations ()
Looks up all old index configurations that are finished and need to be reset and marked as done.
Definition at line 414 of file Hook/CrawlerHook.php.
References $GLOBALS, and BackendUtility\deleteClause().
Referenced by CrawlerHook\crawler_init().
crawler_execute ($params, &$pObj)
Callback function for executing a log element.
array | $params | Params from log element. Must contain $params['indexConfigUid']
object | $pObj | Parent object (tx_crawler lib)
Definition at line 161 of file Hook/CrawlerHook.php.
References $GLOBALS, CrawlerHook\crawler_execute_type1(), CrawlerHook\crawler_execute_type2(), CrawlerHook\crawler_execute_type3(), CrawlerHook\crawler_execute_type4(), and GeneralUtility\getUserObj().
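As the references show, crawler_execute hands off to one of the four crawler_execute_type* methods. The dispatch pattern, together with session data passed by reference, can be sketched roughly as below. The 'type' key of the configuration record, the handler map, and the function name dispatchIndexing are assumptions made for this illustration; the page itself does not state how the type is selected.

```php
<?php
/**
 * Hypothetical dispatch sketch (not the TYPO3 code).
 *
 * @param array $cfgRec      Configuration record; the 'type' key is assumed
 * @param array $sessionData Session data, passed by reference
 * @param array $handlers    Map of type number => callable (illustrative)
 * @return bool TRUE if a handler for the type existed and was called
 */
function dispatchIndexing(array $cfgRec, array &$sessionData, array $handlers)
{
    $type = isset($cfgRec['type']) ? (int)$cfgRec['type'] : 0;
    if (!isset($handlers[$type])) {
        return false;
    }
    // Fetch the callable, then call it directly so that a handler declared
    // with a reference parameter sees and modifies the caller's array.
    $handler = $handlers[$type];
    $handler($cfgRec, $sessionData);
    return true;
}

// Type numbers follow the method descriptions on this page:
// 1 = records from a table, 2 = files, 3 = external URLs, 4 = page tree.
$handlers = array(
    3 => function (array $cfgRec, array &$sessionData) {
        // Changes made here survive until the next call.
        $sessionData['calls'] = isset($sessionData['calls']) ? $sessionData['calls'] + 1 : 1;
    },
);

$session = array();
dispatchIndexing(array('type' => 3), $session, $handlers);
dispatchIndexing(array('type' => 3), $session, $handlers);
var_dump($session['calls']); // int(2)
```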
crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
Indexing records from a table
array | $cfgRec | Indexing Configuration Record
array | $session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array | $params | Parameters from the log queue.
object | $pObj | Parent object (from the "crawler" extension)
Definition at line 221 of file Hook/CrawlerHook.php.
References $GLOBALS, BackendUtility\BEenableFields(), BackendUtility\deleteClause(), CrawlerHook\getUidRootLineForClosestTemplate(), and CrawlerHook\indexSingleRecord().
Referenced by CrawlerHook\crawler_execute().
crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
Indexing files from fileadmin
array | $cfgRec | Indexing Configuration Record
array | $session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array | $params | Parameters from the log queue.
object | $pObj | Parent object (from the "crawler" extension)
Definition at line 266 of file Hook/CrawlerHook.php.
References $GLOBALS, elseif, GeneralUtility\get_dirs(), GeneralUtility\getAllFilesAndFoldersInPath(), GeneralUtility\getFileAbsFileName(), CrawlerHook\getUidRootLineForClosestTemplate(), GeneralUtility\isAbsPath(), GeneralUtility\isAllowedAbsPath(), GeneralUtility\makeInstance(), GeneralUtility\removePrefixPathFromList(), and GeneralUtility\trimExplode().
Referenced by CrawlerHook\crawler_execute().
crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
Indexing External URLs
array | $cfgRec | Indexing Configuration Record
array | $session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array | $params | Parameters from the log queue.
object | $pObj | Parent object (from the "crawler" extension)
Definition at line 328 of file Hook/CrawlerHook.php.
References $GLOBALS, $url, CrawlerHook\checkDeniedSuburls(), CrawlerHook\checkUrl(), CrawlerHook\getUidRootLineForClosestTemplate(), and CrawlerHook\indexExtUrl().
Referenced by CrawlerHook\crawler_execute().
crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
Page tree indexing type
array | $cfgRec | Indexing Configuration Record
array | $session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes to it are saved for the next call.
array | $params | Parameters from the log queue.
object | $pObj | Parent object (from the "crawler" extension)
Definition at line 369 of file Hook/CrawlerHook.php.
References $GLOBALS, $url, BackendUtility\deleteClause(), and BackendUtility\getRecord().
Referenced by CrawlerHook\crawler_execute().
crawler_init (&$pObj)
Initialization of the crawler hook. This function is called for each instance of the crawler. It checks whether anything is scheduled to happen and, if so, puts one or more entries in the crawler's log to start processing. In practice, it selects indexing configurations and evaluates whether any of them needs to run.
object | $pObj | Parent object (tx_crawler lib)
Definition at line 53 of file Hook/CrawlerHook.php.
References $GLOBALS, CrawlerHook\cleanUpOldRunningConfigurations(), BackendUtility\deleteClause(), CrawlerHook\generateNextIndexingTime(), GeneralUtility\getUserObj(), and GeneralUtility\md5int().
deleteFromIndex ($id)
Deletes all data stored by indexed search for a given page.
int | $id | UID of the page for which all pHash data is deleted
Definition at line 646 of file Hook/CrawlerHook.php.
References $GLOBALS.
Referenced by CrawlerHook\processCmdmap_preProcess(), and CrawlerHook\processDatamap_afterDatabaseOperations().
generateNextIndexingTime ($cfgRec)
Generates the Unix timestamp for the next visit.
array | $cfgRec | Index configuration record
Definition at line 582 of file Hook/CrawlerHook.php.
References $GLOBALS.
Referenced by CrawlerHook\crawler_init().
getUidRootLineForClosestTemplate ($id)
Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.
int | $id | The page ID from which to traverse the rootline back
Definition at line 557 of file Hook/CrawlerHook.php.
References GeneralUtility\makeInstance().
Referenced by CrawlerHook\crawler_execute_type1(), CrawlerHook\crawler_execute_type2(), CrawlerHook\crawler_execute_type3(), and CrawlerHook\indexSingleRecord().
indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
Indexing External URL
string | $url | URL, http://....
int | $pageId | Page id to relate indexing to.
array | $rl | Rootline array to relate indexing to
int | $cfgUid | Configuration UID
int | $setId | Set ID value
Definition at line 478 of file Hook/CrawlerHook.php.
References $list, $url, GeneralUtility\makeInstance(), and GeneralUtility\resolveBackPath().
Referenced by CrawlerHook\crawler_execute_type3().
indexSingleRecord ($r, $cfgRec, $rl = null)
Indexing Single Record
array | $r | Record to index
array | $cfgRec | Configuration Record
array | $rl | Rootline array to relate indexing to
Definition at line 525 of file Hook/CrawlerHook.php.
References $GLOBALS, CrawlerHook\getUidRootLineForClosestTemplate(), GeneralUtility\makeInstance(), and GeneralUtility\trimExplode().
Referenced by CrawlerHook\crawler_execute_type1(), and CrawlerHook\processDatamap_afterDatabaseOperations().
processCmdmap_preProcess ($command, $table, $id, $value, $pObj)
TCEmain hook function for on-the-fly indexing of database records
string | $command | TCEmain command
string | $table | Table name
string | $id | Record ID. For a new record, it is a string pointing to an index inside ::substNEWwithIDs
mixed | $value | Target value (ignored)
FormEngine | $pObj | tcemain calling object
Definition at line 685 of file Hook/CrawlerHook.php.
References CrawlerHook\deleteFromIndex().
processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, $pObj)
TCEmain hook function for on-the-fly indexing of database records
string | $status | Status "new" or "update"
string | $table | Table name
string | $id | Record ID. For a new record, it is a string pointing to an index inside ::substNEWwithIDs
array | $fieldArray | Field array of updated fields in the operation
FormEngine | $pObj | tcemain calling object
Definition at line 703 of file Hook/CrawlerHook.php.
References $GLOBALS, BackendUtility\deleteClause(), CrawlerHook\deleteFromIndex(), elseif, BackendUtility\getRecord(), and CrawlerHook\indexSingleRecord().
$callBack = CrawlerHook::class
Definition at line 43 of file Hook/CrawlerHook.php.
$instanceCounter = 0
Definition at line 38 of file Hook/CrawlerHook.php.
$secondsPerExternalUrl = 3
Definition at line 31 of file Hook/CrawlerHook.php.