<body>
<h3 id="fsadaptor">Deployment of File System Adaptor</h3>

<p>A single instance of the File System Adaptor lets the
GSA index a single UNC share.  DFS is supported.</p>

<h4>Requirements</h4>
<ul>
  <li>GSA 7.2 or higher
  <li>Java JRE 1.7 update 6 or higher installed on the computer that runs the adaptor
  <li>File System Adaptor JAR executable
  <li>Microsoft Windows on the computer that runs the adaptor
  <li>A Windows account with sufficient permissions for the adaptor
      (see the <b>Permissions needed by the Adaptor</b> section below)
</ul>

<h4>Permissions needed by the Adaptor</h4>
  <p>The Windows user account that the adaptor is running under must
  have the following permissions: 

  <ul>
    <li>list the content of folders,</li> 
    <li>read the content of documents,</li> 
    <li>read attributes of files and folders.</li>
  </ul>

<h5>Special permissions needed to read the ACLs</h5>
  Additionally, the GSA must be able to:
  <ul>
    <li>read ACLs for both files and folders,</li>
    <li>reset last access dates to the original value prior to the GSA access.</li>
  </ul>

  To obtain these permissions, the account must be a member of one of
  the following groups:
  <ul>
    <li>Administrators</li>
    <li>Power Users</li>
    <li>Print Operators</li>
    <li>Server Operators</li>
  </ul>
  <p>Note that it is not sufficient for the user to be a member
  of one of these groups at the domain level: the user must be a member
  of one of these groups at the local level.</p>

  <p>For more information, see the Microsoft documentation for the
  <a href="http://msdn.microsoft.com/en-us/library/bb525388(VS.85).aspx">
  NetShareGetInfo function</a>.</p>

<h4>Configure GSA for Adaptor</h4>
<ol>
  <li>Add the IP address of the computer that hosts the adaptor to the <b>List
    of Trusted IP Addresses</b> on the GSA.
    <p>In the GSA's Admin Console, go to <b>Content Sources &gt; Feeds</b>,
    and scroll down to <b>List of Trusted IP Addresses</b>. Add the IP address
    for the adaptor to the list.

  <li>Add the URLs provided by the adaptor to the <b>Follow Patterns</b>
    on the GSA.
    <p>In the Admin Console, go to <b>Content Sources &gt; Web Crawl 
    &gt; Start and Block URLs</b>, and
    scroll down to <b>Follow Patterns</b>.
    Add an entry like <code>http://adaptor.example.com:5678/doc/
    </code> where <code>adaptor.example.com</code> is the hostname of the
    machine that hosts the adaptor. By default the adaptor runs on port 5678.
</ol>

<h4>Configure Adaptor</h4>
<ol>
  <li>Create a file named <code>adaptor-config.properties</code> in the
  directory that contains the adaptor binary.
  <p>
  Here is an example configuration (bold items are example values to be
  replaced):
<pre>
gsa.hostname=<b>yourgsa.hostname.com</b>
filesystemadaptor.src=<b>\\\\host\\share</b>
</pre>
  <p> Note: Backslashes are entered as double backslashes. To
      represent a single '\', enter '\\'.
  <p> Note: DFS links can be given as
      <code>filesystemadaptor.src=</code><b>\\\\host\\dfsnamespace\\link</b>
  <p> Note: Unicode characters, including non-ASCII characters, can be
      used in <code>filesystemadaptor.src</code>. If you use such
      characters, save the <code>adaptor-config.properties</code> file
      with UTF-8 encoding.
  <br>

  <li> Create a file named <code>logging.properties</code> in the directory
  that contains the adaptor binary:
  <pre>
.level=INFO
handlers=java.util.logging.FileHandler,java.util.logging.ConsoleHandler
java.util.logging.FileHandler.formatter=com.google.enterprise.adaptor.CustomFormatter
java.util.logging.FileHandler.pattern=logs/adaptor.%g.log
java.util.logging.FileHandler.limit=10485760
java.util.logging.FileHandler.count=20
java.util.logging.ConsoleHandler.formatter=com.google.enterprise.adaptor.CustomFormatter
</pre>

  <li><p>Create a directory named <code>logs</code> inside the directory that contains
    the adaptor binary.

  <li><p>Run the adaptor using a command line like:
  <pre>java -Djava.util.logging.config.file=logging.properties -jar adaptor-fs-YYYYMMDD-withlib.jar</pre>
</ol>
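<p>The backslash note above can be checked with a small Java sketch. It
assumes the configuration file follows the standard Java properties escaping
rules, which the double-backslash requirement suggests; the class and method
names (<code>ConfigEscapeDemo</code>, <code>parseSrc</code>) are hypothetical
and are not part of the adaptor.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigEscapeDemo {
  /**
   * Parses properties-file text and returns the value of
   * filesystemadaptor.src after escape processing.
   * Assumption: standard java.util.Properties escaping applies.
   */
  static String parseSrc(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      // Not expected when reading from an in-memory string.
      throw new IllegalStateException(e);
    }
    return props.getProperty("filesystemadaptor.src");
  }

  public static void main(String[] args) {
    // Java string literals double each backslash once more. The file
    // content represented here is: filesystemadaptor.src=\\\\host\\share
    String fileContent = "filesystemadaptor.src=\\\\\\\\host\\\\share\n";
    // Prints the UNC path with single escaping: \\host\share
    System.out.println(parseSrc(fileContent));
  }
}
```

<p>Under that assumption, a value written with single backslashes would lose
them during parsing, which is why each backslash in the share path is
doubled in the file.</p>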

<h4>Running as a service on Windows</h4>
  <p>Example service creation on Windows with prunsrv:
  <pre>prunsrv install adaptor-fs --StartPath="%CD%" ^
  --Classpath=adaptor-fs-YYYYMMDD-withlib.jar ^
  --StartMode=jvm --StartClass=com.google.enterprise.adaptor.Daemon ^
  --StartMethod=serviceStart --StartParams=com.google.enterprise.adaptor.fs.FsAdaptor ^
  --StopMode=jvm --StopClass=com.google.enterprise.adaptor.Daemon ^
  --StopMethod=serviceStop --StdOutput=stdout.log --StdError=stderr.log ^
  ++JvmOptions=-Djava.util.logging.config.file=logging.properties</pre>

  <p> Note: By default, the File System Adaptor service runs under the Windows Local System account.
      This is sufficient in most cases, but it can cause problems if access to documents is
      restricted through ACLs.
      If the File System Adaptor service cannot crawl documents because of
      ACL restrictions, use the Service Control Manager to run the service as
      a user that has sufficient access to crawl the documents.

<h4>Optional <code>adaptor-config.properties</code> fields</h4>
<dl>
  <dt>
  <code>server.dashboardPort</code>
  </dt>
  <dd>
  Port of the dashboard web page that shows adaptor information
  and diagnostics.  Defaults to "5679".
  </dd>
  <br>
  <dt>
  <code>filesystemadaptor.supportedAccounts</code>
  </dt>
  <dd>
  Accounts listed in <code>supportedAccounts</code> are
  included in ACLs regardless of whether they are builtin
  accounts.
  By default the value is:
  <pre>
  BUILTIN\\Administrators,\\Everyone,BUILTIN\\Users,
  BUILTIN\\Guest,NT AUTHORITY\\INTERACTIVE,
  NT AUTHORITY\\Authenticated Users
  </pre>
  </dd>
  <dt>
  <code>filesystemadaptor.builtinGroupPrefix</code>
  </dt>
  <dd>
  Builtin accounts are excluded from the ACLs
  that are pushed to the GSA. An account whose name starts with
  this prefix is considered a builtin account and is
  excluded from the ACLs.
  By default the value is:
  <pre>
  BUILTIN\\
  </pre>
  </dd>
  <dt>
  <code>filesystemadaptor.crawlHiddenFiles</code>
  </dt>
  <dd>
  This boolean configuration property allows or disallows indexing
  of hidden files and folders. The definition of hidden files and
  folders is platform dependent. On Windows file systems a file or
  folder is considered hidden if the DOS <code>hidden</code>
  attribute is set.
  <p/>
  By default, hidden files are not indexed and the contents of
  hidden folders are not indexed. Setting
  <code>filesystemadaptor.crawlHiddenFiles</code> to <code>true</code>
  will allow hidden files and folders to be crawled by the Search
  Appliance. By default the value is:
  <pre>
  false
  </pre>
  </dd>
  <dt>
  <code>filesystemadaptor.lastAccessedDate</code>
  </dt>
  <dd>
  This configuration property can be used to disable crawling of files
  whose time of last access is earlier than a specific date.  The cut-off
  date is specified in <a href="http://www.w3.org/TR/NOTE-datetime">
  ISO8601</a> date format, <code>YYYY-MM-DD</code>.
  <p/>
  Setting <code>filesystemadaptor.lastAccessedDate</code> to
  <code>2010-01-01</code> would only crawl content that has been accessed
  since the beginning of 2010.
  <p/>
  By default, filtering content based upon last accessed time is disabled.
  <br>
  Only one of <code>filesystemadaptor.lastAccessedDate</code> or
  <code>filesystemadaptor.lastAccessedDays</code> may be specified.
  </dd>
  <dt>
  <code>filesystemadaptor.lastAccessedDays</code>
  </dt>
  <dd>
  This configuration property can be used to disable crawling of files
  that have not been accessed within the specified number of days. Unlike the
  absolute cut-off date used by <code>filesystemadaptor.lastAccessedDate</code>,
  this property can be used to expire previously indexed content if it
  has not been accessed in a while.
  <p/>
  The expiration window is specified as a positive integer number of days.
  <br>
  Setting <code>filesystemadaptor.lastAccessedDays</code> to
  <code>365</code> would only crawl content that has been accessed
  in the last year.
  <p/>
  By default, filtering content based upon last accessed time is disabled.
  <br>
  Only one of <code>filesystemadaptor.lastAccessedDate</code> or
  <code>filesystemadaptor.lastAccessedDays</code> may be specified.
  </dd>
  <dt>
  <code>filesystemadaptor.lastModifiedDate</code>
  </dt>
  <dd>
  This configuration property can be used to disable crawling of files
  whose time of last modification is earlier than a specific date.  The cut-off
  date is specified in <a href="http://www.w3.org/TR/NOTE-datetime">
  ISO8601</a> date format, <code>YYYY-MM-DD</code>.
  <p/>
  Setting <code>filesystemadaptor.lastModifiedDate</code> to
  <code>2010-01-01</code> would only crawl content that has been modified
  since the beginning of 2010.
  <p/>
  By default, filtering content based upon last modified time is disabled.
  <br>
  Only one of <code>filesystemadaptor.lastModifiedDate</code> or
  <code>filesystemadaptor.lastModifiedDays</code> may be specified.
  </dd>
  <dt>
  <code>filesystemadaptor.lastModifiedDays</code>
  </dt>
  <dd>
  This configuration property can be used to disable crawling of files
  that have not been modified within the specified number of days. Unlike the
  absolute cut-off date used by <code>filesystemadaptor.lastModifiedDate</code>,
  this property can be used to expire previously indexed content if it
  has not been modified in a while.
  <p/>
  The expiration window is specified as a positive integer number of days.
  <br>
  Setting <code>filesystemadaptor.lastModifiedDays</code> to
  <code>365</code> would only crawl content that has been modified
  in the last year.
  <p/>
  By default, filtering content based upon last modified time is disabled.
  <br>
  Only one of <code>filesystemadaptor.lastModifiedDate</code> or
  <code>filesystemadaptor.lastModifiedDays</code> may be specified.
  </dd>
  <dt>
  <code>adaptor.incrementalPollPeriodSecs</code>
  </dt>
  <dd>
  Time between incremental crawls. Default value is 300 seconds.
  </dd>
  <br>
  <dt>
  <code>adaptor.namespace</code>
  </dt>
  <dd>
  Namespace used for ACLs sent to GSA.  Defaults to "Default".
  </dd>
</dl>
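<p>The difference between the absolute cut-off properties
(<code>lastAccessedDate</code>, <code>lastModifiedDate</code>) and the
rolling-window properties (<code>lastAccessedDays</code>,
<code>lastModifiedDays</code>) can be illustrated with a minimal Java sketch.
This is not the adaptor's implementation: the class and method names are
hypothetical, and interpreting the date as midnight UTC is an assumption,
because the text above does not state which time zone applies.</p>

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class TimeFilterSketch {
  /** Absolute cut-off, as in lastAccessedDate=2010-01-01 (UTC assumed). */
  static long cutoffFromDate(String isoDate) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    format.setLenient(false);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    try {
      return format.parse(isoDate).getTime();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Expected YYYY-MM-DD: " + isoDate, e);
    }
  }

  /** Rolling cut-off, as in lastAccessedDays=365; moves forward with "now". */
  static long cutoffFromDays(int days, long nowMillis) {
    if (days <= 0) {
      throw new IllegalArgumentException("days must be a positive integer");
    }
    return nowMillis - TimeUnit.DAYS.toMillis(days);
  }

  /** A file is crawled only if its timestamp is not earlier than the cut-off. */
  static boolean shouldCrawl(long fileTimeMillis, long cutoffMillis) {
    return fileTimeMillis >= cutoffMillis;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long fixed = cutoffFromDate("2010-01-01");
    long rolling = cutoffFromDays(365, now);
    long fileTime = now - TimeUnit.DAYS.toMillis(400);
    System.out.println("fixed cut-off crawls file: " + shouldCrawl(fileTime, fixed));
    System.out.println("rolling cut-off crawls file: " + shouldCrawl(fileTime, rolling));
  }
}
```

<p>Because the rolling cut-off is recomputed relative to the current time, a
file that passed the filter earlier can later fall outside the window, which
is how the <code>Days</code> properties can expire previously indexed
content.</p>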

<br>
<br>

<h3> Advanced Topics </h3>

<h4>Preserving the 'last access' date of documents on the share</h4>
<p>The adaptor attempts to restore the last access date of each document after
it reads the document content during a crawl. To restore the date to the value
it had before the content was read, the user account that the adaptor runs
under needs write permission for the document.
If the account has only read permission for documents,
the last access date changes as the adaptor reads
document content during a crawl.</p>
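<p>The read-then-restore behavior described above can be sketched with the
standard Java 7 file attribute API. This is an illustration under stated
assumptions, not the adaptor's actual code; the class and method names are
hypothetical.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.FileTime;

public class LastAccessRestoreSketch {
  /**
   * Reads a file's content and then puts the last access time back to
   * the value it had before the read.
   */
  static byte[] readPreservingLastAccess(Path file) throws IOException {
    BasicFileAttributeView view =
        Files.getFileAttributeView(file, BasicFileAttributeView.class);
    // Remember the last access time before touching the content.
    FileTime originalAccess = view.readAttributes().lastAccessTime();
    try {
      return Files.readAllBytes(file);
    } finally {
      // Passing null leaves the last modified and creation times unchanged.
      // This call needs permission to write file attributes; with a
      // read-only account it fails with an IOException and the new last
      // access time stays in place.
      view.setTimes(null, originalAccess, null);
    }
  }
}
```

<p>The restore step is the only part that requires more than read access,
which matches the write permission requirement stated above.</p>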

<br>
<br>


<h3> Developer Topics </h3>

<h4>File System Adaptor ACL Overview</h4>

<p>ACLs for documents and folders are read, preserved and pushed to the Google 
Search Appliance by the File System Adaptor for UNC and DFS UNC paths.
</p>
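<p>As a rough illustration of reading ACL principals and applying the
<code>filesystemadaptor.supportedAccounts</code> and
<code>filesystemadaptor.builtinGroupPrefix</code> rules, here is a minimal
Java sketch. It is not the adaptor's implementation: the class and method
names are hypothetical, and the exact-case string comparison is a
simplifying assumption.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclFileAttributeView;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class AclAccountSketch {
  /**
   * Returns the principal names found in a file's ACL, or an empty list
   * when the file system does not expose an ACL view.
   */
  static List<String> readAclPrincipals(Path path) throws IOException {
    AclFileAttributeView view =
        Files.getFileAttributeView(path, AclFileAttributeView.class);
    if (view == null) {
      return Collections.<String>emptyList();
    }
    List<String> names = new ArrayList<String>();
    for (AclEntry entry : view.getAcl()) {
      names.add(entry.principal().getName());
    }
    return names;
  }

  /**
   * Keeps an account if it is listed in supportedAccounts, or if its name
   * does not start with the builtin prefix. Comparison is exact-case here,
   * which is an assumption made for brevity.
   */
  static List<String> dropBuiltinAccounts(
      List<String> accounts, Set<String> supportedAccounts, String builtinPrefix) {
    List<String> kept = new ArrayList<String>();
    for (String account : accounts) {
      boolean builtin = account.startsWith(builtinPrefix);
      if (supportedAccounts.contains(account) || !builtin) {
        kept.add(account);
      }
    }
    return kept;
  }
}
```

<p>With the prefix <code>BUILTIN\</code> and a supported set that contains
only <code>BUILTIN\Administrators</code>, this sketch keeps
<code>BUILTIN\Administrators</code> and domain accounts but drops other
<code>BUILTIN\</code> accounts.</p>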

<p>The following images show the ACL inheritance used by the File System Adaptor.
The green and pink arrows signify inheritance, while the dotted arrows show
optional inheritance, which depends on whether the item inherits permissions from
its parent or breaks inheritance and defines its own set of permissions.
</p>

<h4>Non-DFS ACL inheritance</h4>
<img src="non_dfs_acls.jpg" />

<h4>DFS ACL inheritance</h4>
<img src="dfs_acls.jpg" />

</body>
