<body>
<h3 id="fsadaptor">Deployment of File System Adaptor</h3>
<p>A single instance of the File System Adaptor lets the GSA
index a single UNC share. DFS is supported.
<h4>Requirements</h4>
<ul>
<li>GSA 7.2 or higher
<li>Java JRE 1.7 update 6 or higher installed on the computer that runs the adaptor
<li>File System Adaptor JAR executable
<li>Microsoft Windows as the operating system of the computer that runs the adaptor
<li>A Windows account with sufficient permissions for the adaptor
(see the <b>Permissions needed by the Adaptor</b> section below)
</ul>
<h4>Permissions needed by the Adaptor</h4>
<p>The Windows user account that the adaptor is running under must have sufficient permissions to
list the content of folders, read the content of documents, read attributes of files and folders
and read ACLs for both files and folders. You can accomplish this by using an
account that is a member of one of the following groups:
<ul>
<li>Administrators</li>
<li>Power Users</li>
<li>Print Operators</li>
<li>Server Operators</li>
</ul>
<p>It is not sufficient for the account to be a member of one of these groups only at the domain level.
For more information, read the following Microsoft document:
<a href="http://msdn.microsoft.com/en-us/library/bb525388(VS.85).aspx">http://msdn.microsoft.com/en-us/library/bb525388(VS.85).aspx</a></p>
<p>For the adaptor to restore the last access date of documents after reading their contents, the adaptor
needs write permission. Membership in one of the above groups grants this permission.
<h4>Configure GSA for Adaptor</h4>
<ol>
<li>Add the IP address of the computer that hosts the adaptor to the <b>List
of Trusted IP Addresses</b> on the GSA.
<p>In the GSA's Admin Console, go to <b>Content Sources &gt; Feeds</b>,
and scroll down to <b>List of Trusted IP Addresses</b>. Add the IP address
for the adaptor to the list.
<li>Add the URLs provided by the adaptor to the <b>Follow Patterns</b>
on the GSA.
<p>In the GSA's Admin Console, go to <b>Content Sources &gt; Web Crawl
&gt; Start and Block URLs</b>, and
scroll down to <b>Follow Patterns</b>.
Add an entry like <code>http://adaptor.example.com:5678/doc/</code>,
where <code>adaptor.example.com</code> is the hostname of the
machine that hosts the adaptor. By default, the adaptor runs on port 5678.
</ol>
<h4>Configure Adaptor</h4>
<ol>
<li>Create a file named <code>adaptor-config.properties</code> in the
directory that contains the adaptor binary.
<p>
Here is an example configuration (bold items are example values to be
replaced):
<pre>
gsa.hostname=<b>yourgsa.hostname.com</b>
filesystemadaptor.src=<b>\\\\host\\share</b>
</pre>
<p> Note: Backslashes must be escaped. To represent
a single '\', enter '\\'.
<p> Note: DFS links can be given as
<code>filesystemadaptor.src=<b>\\\\host\\dfsnamespace\\link</b></code>
<p> Note: Unicode characters, including non-ASCII characters, can be used in
<code>filesystemadaptor.src</code>. If the value includes non-ASCII characters,
save the <code>adaptor-config.properties</code> file
using UTF-8 encoding.
<br>
<li> Create a file named <code>logging.properties</code> in the directory
that contains the adaptor binary:
<pre>
.level=INFO
handlers=java.util.logging.FileHandler,java.util.logging.ConsoleHandler
java.util.logging.FileHandler.formatter=com.google.enterprise.adaptor.CustomFormatter
java.util.logging.FileHandler.pattern=logs/adaptor.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=20
java.util.logging.ConsoleHandler.formatter=com.google.enterprise.adaptor.CustomFormatter
</pre>
<li><p>Create a directory named <code>logs</code> inside the directory that contains
the adaptor binary.
<li><p>Run the adaptor using a command line like:
<pre>java -Djava.util.logging.config.file=logging.properties -jar adaptor-fs-YYYYMMDD-withlib.jar</pre>
</ol>
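<p>The backslash note in step 1 follows from how Java properties files are parsed. The following is a minimal sketch, assuming <code>adaptor-config.properties</code> is read with the standard <code>java.util.Properties</code> rules. The class name <code>ConfigEscapingDemo</code> is hypothetical and is not part of the adaptor.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical demo class: shows how escaped backslashes are decoded. */
final class ConfigEscapingDemo {

  /** Parses properties-format text and returns the decoded value of key. */
  static String parse(String configText, String key) {
    Properties props = new Properties();
    try {
      // load(Reader) takes characters as the Reader delivers them, so a
      // Reader opened with UTF-8 would keep non-ASCII characters intact.
      props.load(new StringReader(configText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty(key);
  }

  public static void main(String[] args) {
    // Java source doubles each backslash again. The text parsed here is
    // exactly: filesystemadaptor.src=\\\\host\\share
    String fileText = "filesystemadaptor.src=\\\\\\\\host\\\\share\n";
    System.out.println(parse(fileText, "filesystemadaptor.src"));
  }
}
```

<p>Running the sketch prints <code>\\host\share</code>, which is the UNC path that the escaped value stands for.</p>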
<h4>Running as a service on Windows</h4>
<p>Example service creation on Windows with prunsrv (part of Apache Commons Daemon):
<pre>prunsrv install adaptor-fs --StartPath="%CD%" ^
--Classpath=adaptor-fs-YYYYMMDD-withlib.jar ^
--StartMode=jvm --StartClass=com.google.enterprise.adaptor.Daemon ^
--StartMethod=serviceStart --StartParams=com.google.enterprise.adaptor.fs.FsAdaptor ^
--StopMode=jvm --StopClass=com.google.enterprise.adaptor.Daemon ^
--StopMethod=serviceStop --StdOutput=stdout.log --StdError=stderr.log ^
++JvmOptions=-Djava.util.logging.config.file=logging.properties</pre>
<p> Note: By default, the File System Adaptor service runs under the Windows Local System account.
This works in most cases, but it can cause problems when access to documents is
restricted through ACLs.
If the service cannot crawl documents because of ACL restrictions,
use the Service Control Manager to run the service as a user
that has sufficient access to crawl the documents.
<h4>Optional <code>adaptor-config.properties</code> fields</h4>
<dl>
<dt>
<code>server.dashboardPort</code>
</dt>
<dd>
Port on which the adaptor serves a web page showing information
and diagnostics. Defaults to "5679".
</dd>
<br>
<dt>
<code>filesystemadaptor.supportedAccounts</code>
</dt>
<dd>
Accounts listed in <code>supportedAccounts</code> are
included in ACLs whether or not they are builtin
accounts.
By default the value is:
<pre>
BUILTIN\\Administrators,\\Everyone,BUILTIN\\Users,
BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE,
NT AUTHORITY\\Authenticated Users
</pre>
</dd>
<dt>
<code>filesystemadaptor.builtinGroupPrefix</code>
</dt>
<dd>
Builtin accounts are excluded from the ACLs
that are pushed to the GSA. An account whose name starts with
this prefix is considered a builtin account and is
excluded from the ACLs, unless it is listed in
<code>filesystemadaptor.supportedAccounts</code>.
By default the value is:
<pre>
BUILTIN\\
</pre>
</dd>
<dt>
<code>filesystemadaptor.crawlHiddenFiles</code>
</dt>
<dd>
This boolean property controls whether hidden files and folders
are indexed. The definition of hidden files and
folders is platform dependent. On Windows file systems, a file or
folder is considered hidden if the DOS <code>hidden</code>
attribute is set.
<br>
By default, hidden files and the contents of
hidden folders are not indexed. Setting
<code>filesystemadaptor.crawlHiddenFiles</code> to <code>true</code>
allows hidden files and folders to be crawled by the Search
Appliance. By default the value is:
<pre>
false
</pre>
</dd>
<br>
<dt>
<code>adaptor.incrementalPollPeriodSecs</code>
</dt>
<dd>
Time between incremental crawls. Default value is 300 seconds.
</dd>
<br>
<dt>
<code>adaptor.namespace</code>
</dt>
<dd>
Namespace used for ACLs sent to GSA. Defaults to "Default".
</dd>
</dl>
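<p>The way <code>filesystemadaptor.supportedAccounts</code> and <code>filesystemadaptor.builtinGroupPrefix</code> interact can be sketched as follows. This is an illustration of the rule as described above, not the adaptor's actual code: the class <code>AccountFilterSketch</code> and its methods are hypothetical, and the exact, case-sensitive string comparison is an assumption. The values are shown as they look after properties decoding, with single backslashes.</p>

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the account filtering rule for ACLs. */
final class AccountFilterSketch {
  private final Set<String> supportedAccounts = new HashSet<>();
  private final String builtinPrefix;

  AccountFilterSketch(String supportedAccountsCsv, String builtinPrefix) {
    // Split the comma-separated list and ignore surrounding whitespace.
    for (String name : supportedAccountsCsv.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        supportedAccounts.add(trimmed);
      }
    }
    this.builtinPrefix = builtinPrefix;
  }

  /** Returns true if the account would be kept in the ACL sent to the GSA. */
  boolean isIncluded(String account) {
    if (supportedAccounts.contains(account)) {
      return true; // explicitly supported, builtin or not
    }
    return !account.startsWith(builtinPrefix); // other builtin accounts are dropped
  }

  public static void main(String[] args) {
    AccountFilterSketch filter = new AccountFilterSketch(
        "BUILTIN\\Administrators,\\Everyone,BUILTIN\\Users,BUILTIN\\Guest,"
            + "NT AUTHORITY\\INTERACTIVE,NT AUTHORITY\\Authenticated Users",
        "BUILTIN\\");
    // EXAMPLE\alice and BUILTIN\Backup Operators are made-up test accounts.
    System.out.println(filter.isIncluded("BUILTIN\\Administrators"));
    System.out.println(filter.isIncluded("BUILTIN\\Backup Operators"));
    System.out.println(filter.isIncluded("EXAMPLE\\alice"));
  }
}
```

<p>With the default values, a supported builtin account such as <code>BUILTIN\Administrators</code> is kept, a builtin account that is not listed is dropped, and an ordinary domain account is kept.</p>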
<br>
<br>
<h3> Advanced Topics </h3>
<h4>Not changing 'last access' of the documents on the share</h4>
<p>The adaptor attempts to restore the last access date of each document after
it reads the document content during a crawl. To restore the
date to the value it had before the content was read,
the user account that the adaptor is running under needs write permission.
If the account has only read permission for documents,
the last access date changes each time the adaptor reads
document content during a crawl.
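<p>The read-then-restore behavior described above can be sketched with the standard <code>java.nio.file</code> API. This is a minimal illustration, not the adaptor's implementation: the class <code>LastAccessSketch</code> and both of its methods are hypothetical, and the adaptor may use a different mechanism on Windows. Setting the timestamp is the step that requires write permission.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

/** Hypothetical sketch: read a file, then restore its last access time. */
final class LastAccessSketch {

  /** Reads all bytes of file and puts back the last access time seen before the read. */
  static byte[] readPreservingLastAccess(Path file) throws IOException {
    BasicFileAttributeView view =
        Files.getFileAttributeView(file, BasicFileAttributeView.class);
    FileTime before = view.readAttributes().lastAccessTime();
    try {
      return Files.readAllBytes(file);
    } finally {
      // Arguments are (lastModified, lastAccess, create); null leaves a value unchanged.
      view.setTimes(null, before, null);
    }
  }

  /** Self-check on a temporary file; returns true if the access time survived the read. */
  static boolean demo() {
    try {
      Path file = Files.createTempFile("last-access", ".txt");
      try {
        Files.write(file, "sample".getBytes(StandardCharsets.UTF_8));
        // Give the file a known last access time (whole seconds, in the past).
        FileTime known = FileTime.fromMillis(1_000_000_000_000L);
        Files.getFileAttributeView(file, BasicFileAttributeView.class)
            .setTimes(null, known, null);
        readPreservingLastAccess(file);
        FileTime after =
            Files.readAttributes(file, BasicFileAttributes.class).lastAccessTime();
        return after.toMillis() == known.toMillis();
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("last access preserved: " + demo());
  }
}
```

<p>If the account lacks write permission, the <code>setTimes</code> call in such a sketch would fail, which corresponds to the case where the last access date changes during a crawl.</p>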
<br>
<br>
<h3> Developer Topics </h3>
<h4>File System Adaptor ACL Overview</h4>
<p>ACLs for documents and folders are read, preserved and pushed to the Google
Search Appliance by the File System Adaptor for UNC and DFS UNC paths.
</p>
<p>The following images show the ACL inheritance used by the File System Adaptor.
The green and pink arrows signify inheritance. The dotted arrows show
optional inheritance, which applies when the item inherits permissions from
its parent and does not apply when the item breaks inheritance and defines its own set of permissions.
</p>
<h4>Non-DFS ACL inheritance</h4>
<img src="non_dfs_acls.jpg" />
<h4>DFS ACL inheritance</h4>
<img src="dfs_acls.jpg" />
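<p>The optional inheritance shown by the dotted arrows can be modeled in a few lines. The following is a deliberately simplified sketch under stated assumptions: the class <code>AclInheritanceSketch</code>, the <code>Item</code> type, and the account names are hypothetical, only permitted accounts are modeled, and the share and DFS link levels from the images are not reproduced. It is not how the adaptor or the GSA represents ACLs internally.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of parent-to-child ACL inheritance. */
final class AclInheritanceSketch {

  static final class Item {
    final String name;
    final Item parent;                 // null for the top-level item
    final boolean inheritsFromParent;  // false means inheritance is broken
    final List<String> ownPermits;     // accounts granted on this item directly

    Item(String name, Item parent, boolean inheritsFromParent, String... permits) {
      this.name = name;
      this.parent = parent;
      this.inheritsFromParent = inheritsFromParent;
      this.ownPermits = Arrays.asList(permits);
    }

    /** Own permits first, then permits inherited along the unbroken parent chain. */
    List<String> effectivePermits() {
      List<String> result = new ArrayList<>(ownPermits);
      if (inheritsFromParent && parent != null) {
        for (String account : parent.effectivePermits()) {
          if (!result.contains(account)) {
            result.add(account);
          }
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Item root = new Item("share", null, false, "EXAMPLE\\staff");
    Item folder = new Item("folder", root, true, "EXAMPLE\\alice");
    Item secret = new Item("private.txt", folder, false, "EXAMPLE\\bob");
    System.out.println(folder.name + " -> " + folder.effectivePermits());
    System.out.println(secret.name + " -> " + secret.effectivePermits());
  }
}
```

<p>In this model, <code>folder</code> inherits from the share and so permits both accounts, while <code>private.txt</code> breaks inheritance and permits only the account defined on the file itself.</p>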
</body>