Path and URL options
-nofollow
Type: Web crawling only
Syntax: -nofollow "exp"
Specifies that Verity Spider cannot follow any URLs that match the exp expression. If you do not specify an exp value for the -nofollow option, Verity Spider assumes a value of "*", in which case no documents are followed.
You can use wildcard expressions, where the asterisk (*) is for text strings and the question mark
(?) is for single characters. Always encapsulate the exp values in double-quotation marks to ensure
that they are properly interpreted.
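The wildcard rules above can be illustrated with a minimal Python sketch. It is not Verity Spider's implementation: the function names wildcard_to_regex and is_skipped and the example URLs are hypothetical, and it assumes the expression is matched against the whole URL and is case-sensitive.

```python
import re


def wildcard_to_regex(exp: str) -> re.Pattern:
    """Translate a wildcard expression into a compiled regular expression.

    '*' stands for any text string and '?' for exactly one character;
    every other character is matched literally.
    """
    parts = []
    for ch in exp:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_skipped(url: str, exp: str = "*") -> bool:
    """Return True if the URL matches exp and so would not be followed.

    The default of "*" mirrors the documented behavior when no exp
    value is given: every URL matches, so nothing is followed.
    Matching the whole URL is an assumption of this sketch.
    """
    return wildcard_to_regex(exp).fullmatch(url) is not None
```

For example, is_skipped("http://www.example.com/private/a.html", "*/private/*") is true, while the same expression does not match a URL under /public/.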
If you use backslashes, you must double them so that they are properly escaped; for example:
C:\\test\\docs\\path
To use regular expressions, also specify the -regexp option.
Earlier versions of Verity Spider did not allow the use of an expression. This meant that for each
starting point URL, only the first document would be indexed. With the addition of the
expression functionality, you can now selectively skip URLs, even within documents.
See also -regexp.
-norobo
Type: Web crawling only
Specifies that Verity Spider ignores any robots.txt files that it encounters. Many websites use the robots.txt file to specify which parts of the site indexers should avoid. The default is to honor any robots.txt files.
If you are re-indexing a site and the robots.txt file has changed, Verity Spider deletes documents
that have been newly disallowed by the robots.txt file.
Use this option with discretion and extreme care, especially in conjunction with the -cgiok option.
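To make concrete what honoring or ignoring a robots.txt file means, the following sketch uses Python's standard urllib.robotparser module. It illustrates the robots exclusion convention only, not how Verity Spider processes the file; the robots.txt content, the example URLs, and the function name allowed are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, honor_robots: bool = True) -> bool:
    """Return True if a crawler may fetch the URL.

    honor_robots=True models the default behavior; honor_robots=False
    models a crawl that ignores robots.txt, as with -norobo.
    """
    if not honor_robots:
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", url)
```

With the default, a URL under /private/ is refused; with honor_robots=False every URL is allowed, which is why ignoring robots.txt calls for care.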
See also -nodocrobo and http://info.webcrawler.com/mak/projects/robots/norobots.html.
-pathlen
Syntax: -pathlen num_pathsegments
Limits indexing to the specified number of path segments in the URL or file system path. The
path length is determined as follows:
• The host name and drive letter are not included; for example, neither www.spider.com:80/ nor C:\ would be included in determining the path length.
• All elements following the host name are included.
• The actual filename, if present, is included; for example, /world.html would be included in determining the path length.
• Any directory paths between the host and the actual filename are included.
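The counting rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Verity Spider's own logic: the function name path_length is hypothetical, URLs are assumed to carry a scheme such as http://, and empty segments (for example, from a trailing slash) are assumed not to count.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def path_length(location: str) -> int:
    """Count path segments, excluding the host name or drive letter."""
    if "://" in location:
        # URL: drop scheme, host, and port; count non-empty path segments.
        path = urlsplit(location).path
        return len([segment for segment in path.split("/") if segment])
    # File system path: drop the drive anchor (such as C:\) if present.
    windows_path = PureWindowsPath(location)
    parts = windows_path.parts
    if windows_path.anchor:
        parts = parts[1:]
    return len(parts)
```

Under these assumptions, http://www.spider.com:80/world.html has a path length of 1, and C:\test\docs\path has a path length of 3.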