| <body> |
| <h3 id="fsadaptor">Deployment of File System Adaptor</h3> |
| |
| <p>A single instance of the File System Adaptor lets the GSA index |
| a single UNC share. DFS is supported. |
| |
| <h4>Requirements</h4> |
| <ul> |
| <li>GSA 7.2 or higher |
| <li>Java JRE 1.7 update 6 or higher installed on the computer that runs the adaptor |
| <li>File System Adaptor JAR executable |
| <li>A computer running Microsoft Windows to host the adaptor |
| <li>A Windows account with sufficient permissions for the adaptor |
| (see the <b>Permissions needed by the Adaptor</b> section below) |
| </ul> |
| |
| <h4>Permissions needed by the Adaptor</h4> |
| |
| <p>The Windows account that the adaptor is running under must have |
| sufficient permissions to: |
| <ul> |
| <li>List the content of folders</li> |
| <li>Read the content of documents</li> |
| <li>Read attributes of files and folders</li> |
| <li>Read permissions (ACLs) for both files and folders</li> |
| </ul> |
| |
| <p>Membership in one of these groups grants a Windows account |
| sufficient permissions for the Adaptor: |
| <ul> |
| <li>Administrators</li> |
| <li>Power Users</li> |
| <li>Print Operators</li> |
| <li>Server Operators</li> |
| </ul> |
| |
| <p>Note: It is not sufficient for the user to be a member of one of |
| these groups at the domain level. The user must be a member of one of |
| these groups on the local machine that exports the Windows share. For more |
| information, see the Microsoft documentation on the <a |
| href="http://msdn.microsoft.com/en-us/library/bb525388(VS.85).aspx"> |
| NetShareGetInfo function</a>.</p> |
| |
| <h4>Configure GSA for Adaptor</h4> |
| <ol> |
| <li>Add the IP address of the computer that hosts the adaptor to the <b>List |
| of Trusted IP Addresses</b> on the GSA. |
| <p>In the GSA's Admin Console, go to <b>Content Sources &gt; Feeds</b>, |
| and scroll down to <b>List of Trusted IP Addresses</b>. Add the IP address |
| of the adaptor host to the list. |
| |
| <li>Add the URLs provided by the adaptor to the <b>Follow Patterns</b> |
| on the GSA. |
| <p>In the GSA's Admin Console, go to <b>Content Sources &gt; Web Crawl |
| &gt; Start and Block URLs</b>, and scroll down to <b>Follow Patterns</b>. |
| Add an entry like <code>http://adaptor.example.com:5678/doc/</code>, |
| where <code>adaptor.example.com</code> is the hostname of the |
| machine that hosts the adaptor. By default, the adaptor runs on port 5678. |
| </ol> |
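| <p>When several adaptor instances run on one machine, each needs its own |
| port and therefore its own Follow Patterns entry. The following is a minimal |
| sketch of how such an entry is composed; the class and method names are |
| hypothetical and are not part of the adaptor.</p> |

```java
public class FollowPatternSketch {
  /**
   * Builds the Follow Patterns entry for one adaptor instance.
   * host and port are illustrative inputs; 5678 is the documented default port.
   */
  static String followPattern(String host, int port) {
    return "http://" + host + ":" + port + "/doc/";
  }

  public static void main(String[] args) {
    // Default instance, then a hypothetical second instance on another port.
    System.out.println(followPattern("adaptor.example.com", 5678));
    System.out.println(followPattern("adaptor.example.com", 6678));
  }
}
```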
| |
| <h4>Configure Adaptor</h4> |
| <ol> |
| <li>Create a file named <code>adaptor-config.properties</code> in the |
| directory that contains the adaptor binary. |
| <p> |
| Here is an example configuration (bold items are example values to be |
| replaced): |
| <pre> |
| gsa.hostname=<b>yourgsa.hostname.com</b> |
| filesystemadaptor.src=<b>\\\\host\\share</b> |
| </pre> |
| <p> Note: Backslashes must be doubled. To represent a single |
| '\', enter '\\'. |
| <p> Note: DFS links can be given as |
| <code>filesystemadaptor.src=<b>\\\\host\\dfsnamespace\\link</b></code> |
| <p> Note: Unicode (non-ASCII) characters can be used in |
| <code>filesystemadaptor.src</code>. If you include such characters, |
| save the <code>adaptor-config.properties</code> file using |
| UTF-8 encoding. |
| <br> |
| |
| <li> Create a file named <code>logging.properties</code> in the directory |
| that contains the adaptor binary: |
| <pre> |
| .level=INFO |
| handlers=java.util.logging.FileHandler,java.util.logging.ConsoleHandler |
| java.util.logging.FileHandler.formatter=com.google.enterprise.adaptor.CustomFormatter |
| java.util.logging.FileHandler.pattern=logs/adaptor.%g.log |
| java.util.logging.FileHandler.limit=10485760 |
| java.util.logging.FileHandler.count=20 |
| java.util.logging.ConsoleHandler.formatter=com.google.enterprise.adaptor.CustomFormatter |
| </pre> |
| |
| <li><p>Create a directory named <code>logs</code> inside the directory that contains |
| the adaptor binary. |
| |
| <li><p>Run the adaptor using a command line like: |
| <pre>java -Djava.util.logging.config.file=logging.properties -jar adaptor-fs-YYYYMMDD-withlib.jar</pre> |
| </ol> |
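| <p>The double-backslash rule above matches standard Java properties-file |
| escaping. The sketch below assumes the configuration file is parsed with |
| <code>java.util.Properties</code> rules (an assumption, not confirmed here); |
| the class and method names are hypothetical. It shows how the text in the |
| file maps to the UNC path that is actually used.</p> |

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigEscapingSketch {
  /** Parses properties-file text and returns the value stored under key. */
  static String readProperty(String fileText, String key) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(fileText));
    return props.getProperty(key);
  }

  public static void main(String[] args) throws IOException {
    // The Java literal below is the file text: filesystemadaptor.src=\\\\host\\share
    String fileText = "gsa.hostname=yourgsa.hostname.com\n"
        + "filesystemadaptor.src=\\\\\\\\host\\\\share\n";
    // Each doubled backslash in the file collapses to one: prints \\host\share
    System.out.println(readProperty(fileText, "filesystemadaptor.src"));
  }
}
```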
| |
| <h4>Running as a service on Windows</h4> |
| <p>Example of creating the service on Windows with prunsrv (Apache Commons Daemon Procrun): |
| <pre>prunsrv install adaptor-fs --StartPath="%CD%" ^ |
| --Classpath=adaptor-fs-YYYYMMDD-withlib.jar ^ |
| --StartMode=jvm --StartClass=com.google.enterprise.adaptor.Daemon ^ |
| --StartMethod=serviceStart --StartParams=com.google.enterprise.adaptor.fs.FsAdaptor ^ |
| --StopMode=jvm --StopClass=com.google.enterprise.adaptor.Daemon ^ |
| --StopMethod=serviceStop --StdOutput=stdout.log --StdError=stderr.log ^ |
| ++JvmOptions=-Djava.util.logging.config.file=logging.properties</pre> |
| |
| <p> Note: By default, the File System adaptor service runs under the Windows Local System account. |
| This works in most cases, but it can cause problems if access to documents is |
| restricted through ACLs. |
| If the service cannot crawl documents because of ACL restrictions, use the |
| Service Control Manager to run the service as a user that has sufficient |
| access to crawl the documents. |
| |
| <h4>Optional <code>adaptor-config.properties</code> fields</h4> |
| <dl> |
| <dt> |
| <code>server.dashboardPort</code> |
| </dt> |
| <dd> |
| Port on which to view web page showing information |
| and diagnostics. Defaults to "5679". |
| </dd> |
| <br> |
| <dt> |
| <code>filesystemadaptor.supportedAccounts</code> |
| </dt> |
| <dd> |
| Accounts listed in <code>supportedAccounts</code> are |
| included in ACLs regardless of whether they are builtin |
| accounts. |
| By default the value is: |
| <pre> |
| BUILTIN\\Administrators,\\Everyone,BUILTIN\\Users, |
| BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE, |
| NT AUTHORITY\\Authenticated Users |
| </pre> |
| </dd> |
| <dt> |
| <code>filesystemadaptor.builtinGroupPrefix</code> |
| </dt> |
| <dd> |
| Builtin accounts are excluded from the ACLs |
| that are pushed to the GSA. An account whose name starts with |
| this prefix is considered a builtin account and is |
| excluded from the ACLs. |
| By default the value is: |
| <pre> |
| BUILTIN\\ |
| </pre> |
| </dd> |
| <dt> |
| <code>filesystemadaptor.crawlHiddenFiles</code> |
| </dt> |
| <dd> |
| This boolean configuration property allows or disallows indexing |
| of hidden files and folders. The definition of hidden files and |
| folders is platform dependent. On Windows file systems, a file or |
| folder is considered hidden if the DOS <code>hidden</code> |
| attribute is set. |
| <p/> |
| By default, hidden files and the contents of hidden folders are |
| not indexed. Setting |
| <code>filesystemadaptor.crawlHiddenFiles</code> to <code>true</code> |
| allows the Search Appliance to crawl hidden files and folders. |
| By default the value is: |
| <pre> |
| false |
| </pre> |
| </dd> |
| <dt> |
| <code>filesystemadaptor.lastAccessedDate</code> |
| </dt> |
| <dd> |
| This configuration property can be used to disable crawling of files |
| whose time of last access is earlier than a specific date. The cut-off |
| date is specified in <a href="http://www.w3.org/TR/NOTE-datetime"> |
| ISO8601</a> date format, <code>YYYY-MM-DD</code>. |
| <p/> |
| Setting <code>filesystemadaptor.lastAccessedDate</code> to |
| <code>2010-01-01</code> would only crawl content that has been accessed |
| since the beginning of 2010. |
| <p/> |
| By default, filtering content based upon last accessed time is disabled. |
| <br> |
| Only one of <code>filesystemadaptor.lastAccessedDate</code> or |
| <code>filesystemadaptor.lastAccessedDays</code> may be specified. |
| </dd> |
| <dt> |
| <code>filesystemadaptor.lastAccessedDays</code> |
| </dt> |
| <dd> |
| This configuration property can be used to disable crawling of files |
| that have not been accessed within the specified number of days. Unlike the |
| absolute cut-off date used by <code>filesystemadaptor.lastAccessedDate</code>, |
| this property can be used to expire previously indexed content if it |
| has not been accessed in a while. |
| <p/> |
| The expiration window is specified as a positive integer number of days. |
| <br> |
| Setting <code>filesystemadaptor.lastAccessedDays</code> to |
| <code>365</code> would only crawl content that has been accessed |
| in the last year. |
| <p/> |
| By default, filtering content based upon last accessed time is disabled. |
| <br> |
| Only one of <code>filesystemadaptor.lastAccessedDate</code> or |
| <code>filesystemadaptor.lastAccessedDays</code> may be specified. |
| </dd> |
| <dt> |
| <code>filesystemadaptor.lastModifiedDate</code> |
| </dt> |
| <dd> |
| This configuration property can be used to disable crawling of files |
| whose time of last modification is earlier than a specific date. The cut-off |
| date is specified in <a href="http://www.w3.org/TR/NOTE-datetime"> |
| ISO8601</a> date format, <code>YYYY-MM-DD</code>. |
| <p/> |
| Setting <code>filesystemadaptor.lastModifiedDate</code> to |
| <code>2010-01-01</code> would only crawl content that has been modified |
| since the beginning of 2010. |
| <p/> |
| By default, filtering content based upon last modified time is disabled. |
| <br> |
| Only one of <code>filesystemadaptor.lastModifiedDate</code> or |
| <code>filesystemadaptor.lastModifiedDays</code> may be specified. |
| </dd> |
| <dt> |
| <code>filesystemadaptor.lastModifiedDays</code> |
| </dt> |
| <dd> |
| This configuration property can be used to disable crawling of files |
| that have not been modified within the specified number of days. Unlike the |
| absolute cut-off date used by <code>filesystemadaptor.lastModifiedDate</code>, |
| this property can be used to expire previously indexed content if it |
| has not been modified in a while. |
| <p/> |
| The expiration window is specified as a positive integer number of days. |
| <br> |
| Setting <code>filesystemadaptor.lastModifiedDays</code> to |
| <code>365</code> would only crawl content that has been modified |
| in the last year. |
| <p/> |
| By default, filtering content based upon last modified time is disabled. |
| <br> |
| Only one of <code>filesystemadaptor.lastModifiedDate</code> or |
| <code>filesystemadaptor.lastModifiedDays</code> may be specified. |
| </dd> |
| <dt> |
| <code>adaptor.incrementalPollPeriodSecs</code> |
| </dt> |
| <dd> |
| Time between incremental crawls. Default value is 300 seconds. |
| </dd> |
| <br> |
| <dt> |
| <code>adaptor.namespace</code> |
| </dt> |
| <dd> |
| Namespace used for ACLs sent to GSA. Defaults to "Default". |
| </dd> |
| <br> |
| |
| <dt> |
| <code>server.port</code> |
| </dt> |
| <dd> |
| Port from which documents are served. The GSA crawls this port. |
| Each adaptor instance on the same machine requires a unique port. |
| Defaults to 5678. |
| </dd> |
| <br> |
| |
| </dl> |
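| <p>The date-based and day-based filters described above differ only in how |
| the cut-off is computed: a fixed date versus a window that slides with the |
| current day. The following sketch illustrates that logic under stated |
| assumptions. It is not the adaptor's own code, the class and method names are |
| hypothetical, and it uses <code>java.time</code>, which requires Java 8 or |
| later.</p> |

```java
import java.time.LocalDate;

public class CrawlDateFilterSketch {
  /** Cut-off for an absolute date setting such as "2010-01-01" (YYYY-MM-DD). */
  static LocalDate cutoffFromDate(String isoDate) {
    return LocalDate.parse(isoDate);
  }

  /** Cut-off for a sliding window of the given number of days before today. */
  static LocalDate cutoffFromDays(int days, LocalDate today) {
    if (days <= 0) {
      throw new IllegalArgumentException("days must be a positive integer");
    }
    return today.minusDays(days);
  }

  /** A file is crawled unless its timestamp is earlier than the cut-off. */
  static boolean shouldCrawl(LocalDate fileDate, LocalDate cutoff) {
    return !fileDate.isBefore(cutoff);
  }

  public static void main(String[] args) {
    LocalDate fixed = cutoffFromDate("2010-01-01");
    System.out.println(shouldCrawl(LocalDate.of(2009, 12, 31), fixed)); // false
    System.out.println(shouldCrawl(LocalDate.of(2010, 1, 1), fixed));   // true

    // The sliding cut-off moves forward every day, so content that stays
    // untouched eventually falls outside the window.
    LocalDate sliding = cutoffFromDays(365, LocalDate.now());
    System.out.println(shouldCrawl(LocalDate.now().minusDays(400), sliding)); // false
  }
}
```

| <p>Because the sliding cut-off advances each day, a file that passed the |
| filter earlier can fail it later, which is why the day-based properties can |
| expire previously indexed content while the fixed-date properties cannot.</p> |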
| |
| <br> |
| <br> |
| |
| <h3> Advanced Topics </h3> |
| |
| <h4>Not changing 'last access' of the documents on the share</h4> |
| <p>The adaptor attempts to leave the last access date of documents |
| unchanged while reading their content during a crawl. In some cases |
| the last access date is updated anyway. When that happens, the adaptor |
| attempts to restore the last access date to the value it had before |
| the crawl started. To restore the date, the user account that the |
| adaptor runs under needs the write basic attributes permission. |
| If the account has read-only permission and the last access date |
| changes during a crawl, the adaptor cannot restore the original value. |
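| <p>The read-then-restore behavior described above can be sketched with the |
| standard <code>java.nio.file</code> API. This is an illustration only, not the |
| adaptor's implementation; the class and method names are hypothetical.</p> |

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

public class LastAccessSketch {
  /**
   * Reads a file and, if reading changed its last access time, sets the
   * time back to the value recorded before the read.
   */
  static byte[] readPreservingLastAccess(Path file) throws IOException {
    FileTime before =
        Files.readAttributes(file, BasicFileAttributes.class).lastAccessTime();
    byte[] content = Files.readAllBytes(file);
    FileTime after =
        Files.readAttributes(file, BasicFileAttributes.class).lastAccessTime();
    if (!after.equals(before)) {
      // This call needs permission to write basic attributes; with a
      // read-only account it fails and the new access time remains.
      Files.getFileAttributeView(file, BasicFileAttributeView.class)
          .setTimes(null, before, null); // null leaves that timestamp unchanged
    }
    return content;
  }
}
```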
| |
| <br> |
| <br> |
| |
| |
| <h3> Developer Topics </h3> |
| |
| <h4>File System Adaptor ACL Overview</h4> |
| |
| <p>ACLs for documents and folders are read, preserved and pushed to the Google |
| Search Appliance by the File System Adaptor for UNC and DFS UNC paths. |
| </p> |
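| <p>Before ACLs are pushed, the <code>filesystemadaptor.supportedAccounts</code> |
| and <code>filesystemadaptor.builtinGroupPrefix</code> settings decide which |
| accounts are kept. The sketch below shows one way those two rules could |
| combine. It is an assumption-laden illustration: the class and method names |
| are hypothetical, and exact, case-sensitive matching is assumed, which may |
| differ from the adaptor's actual behavior.</p> |

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class BuiltinFilterSketch {
  /**
   * Returns true if the account should appear in ACLs sent to the GSA:
   * supported accounts are always kept; other accounts are dropped
   * when their name starts with the builtin prefix.
   */
  static boolean isIncluded(String account, Set<String> supported,
      String builtinPrefix) {
    if (supported.contains(account)) {
      return true;
    }
    return !account.startsWith(builtinPrefix);
  }

  public static void main(String[] args) {
    Set<String> supported = new HashSet<String>(
        Arrays.asList("BUILTIN\\Administrators", "BUILTIN\\Users"));
    String prefix = "BUILTIN\\";
    System.out.println(isIncluded("BUILTIN\\Administrators", supported, prefix)); // true
    System.out.println(isIncluded("BUILTIN\\Backup Operators", supported, prefix)); // false
    System.out.println(isIncluded("EXAMPLE\\alice", supported, prefix)); // true
  }
}
```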
| |
| <p>The following images show the ACL inheritance used by the File System Adaptor. |
| The green and pink arrows signify inheritance. The dotted arrows show |
| optional inheritance, which depends on whether the item inherits permissions from |
| its parent or breaks inheritance and defines its own set of permissions. |
| </p> |
| |
| <h4>non-DFS ACL inheritance</h4> |
| <img src="non_dfs_acls.jpg" /> |
| |
| <h4>DFS ACL inheritance</h4> |
| <img src="dfs_acls.jpg" /> |
| |
| </body> |