<body>
<p>Easily provide repository data to a Google Search Appliance (GSA).
<p> If you'd like to use a language other than Java or if you have command
line programs that can provide repository access, see if {@link
com.google.enterprise.adaptor.prebuilt.CommandLineAdaptor} fits your needs.
</p>
<h3>Basic GSA Setup</h3>
<ol>
<li>Add the IP address of the computer that hosts the adaptor to the <b>List
of Trusted IP Addresses</b> on the GSA.
<p>In the GSA's Admin Console, go to <b>Crawl and Index &gt; Feeds</b>,
and scroll down to <b>List of Trusted IP Addresses</b>. Add the IP address
for the adaptor to the list.</p>
<li>Add the URLs provided by the adaptor to the <b>Follow and Crawl Only
URLs with the Following Patterns</b> on the GSA.
<p>In the Admin Console, go to <b>Crawl and Index &gt; Crawl URLs</b>, and
scroll down to <b>Follow and Crawl Only URLs with the Following
Patterns</b>. Add an entry like {@code hostname:port/} where {@code
hostname} is the hostname of the machine that hosts the adaptor and {@code
port} defaults to 5678 (read on to change the port number).</p>
</ol>
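<p>As a small illustration of the pattern entry described above, the
following sketch builds the {@code hostname:port/} string. The class and
method names are hypothetical and are not part of the adaptor library; only
the default port of 5678 comes from the text above.</p>

```java
class CrawlPattern {
  /** Default adaptor port, as stated in the setup steps. */
  static final int DEFAULT_PORT = 5678;

  /** Builds a "hostname:port/" entry for the GSA's follow patterns. */
  public static String followPattern(String hostname, int port) {
    if (hostname == null || hostname.isEmpty()) {
      throw new IllegalArgumentException("hostname is required");
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("port out of range: " + port);
    }
    return hostname + ":" + port + "/";
  }

  public static void main(String[] args) {
    // "adaptorhost.example.com" is a placeholder hostname.
    System.out.println(followPattern("adaptorhost.example.com", DEFAULT_PORT));
  }
}
```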
<h3>Running the Adaptor Template, as an initial test</h3>
<ol>
<li>You should have already installed JDK 6 or higher and downloaded a plexi
release (from https://code.google.com/p/plexi/). From the
downloaded release zip file, use the extracted adaptor jar
(e.g., {@code adaptor-20130612-withlib.jar}) and the extracted adaptor
examples jar (e.g., {@code examples/adaptor-20130612-examples.jar}).
If you are working from source code instead of a release,
you can build the required jars by running:
<pre>ant dist
cd dist</pre>
<p>The needed jars will be in a zip file
within the current directory (e.g., {@code adaptor-20130612-bin.zip} will contain
{@code adaptor-20130612-withlib.jar} and {@code examples/adaptor-20130612-examples.jar}).
</p>
<li>Create an <code>adaptor-config.properties</code> text file in the
current directory that looks like:
<pre>gsa.hostname=mygsahostname</pre>
<p>Replace <code>mygsahostname</code> with the hostname or IP address
of your GSA. The same file holds other adaptor library settings, such as
the server port and feed name:
<pre>gsa.hostname=mygsahostname
server.port=6677
feed.name=mydocfeedtogsa</pre>
<p>Later, if you have trouble with the adaptor library incorrectly
auto-detecting your computer's hostname, then you may need to add a line
like:
<pre>server.hostname=yourcomputershostname</pre>
<p>For a list and explanation of the available configuration options, see
{@link com.google.enterprise.adaptor.Config}.
<li>Start the Adaptor Template. Note that the jar files you have may have a
different date in their names. For Windows:
<pre>java -cp adaptor-20130612-withlib.jar;examples/adaptor-20130612-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate</pre>
For all other OSes:
<pre>java -cp adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate</pre>
<li> Ensure crawling is enabled on your GSA.
<p>
Go to <b>Status and Reports</b> and click <b>Resume Crawl</b> if
the crawling system is currently paused.
<li> Confirm things ran successfully.
<p>
In the GSA, go to <b>Crawl and Index &gt; Feeds</b>.
In the <b>Current Feeds</b> section, you should see an entry named
"adaptor_HOSTNAME_PORT" (which can be changed by setting the
<code>feed.name</code> configuration variable).
<p>
In the adaptor log, look for document ids being pushed and
requests for document contents being served.
</ol>
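<p>To make the configuration file format concrete, here is a minimal sketch
that parses text in the <code>adaptor-config.properties</code> format with
{@code java.util.Properties} and falls back to the documented default port of
5678. The class {@code ConfigPreview} and its methods are hypothetical helpers
for illustration; the adaptor library does its own configuration loading (see
{@link com.google.enterprise.adaptor.Config}), which this sketch does not
reproduce.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ConfigPreview {
  /** Parses text written in the adaptor-config.properties format. */
  public static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      // Not expected when reading from an in-memory string.
      throw new IllegalStateException(e);
    }
    return props;
  }

  /** Returns server.port, or 5678 (the documented default) when unset. */
  public static int serverPort(Properties props) {
    return Integer.parseInt(props.getProperty("server.port", "5678").trim());
  }

  public static void main(String[] args) {
    // Same sample values as the configuration example in the steps above.
    String sample = "gsa.hostname=mygsahostname\n"
        + "server.port=6677\n"
        + "feed.name=mydocfeedtogsa\n";
    Properties props = parse(sample);
    System.out.println("gsa.hostname = " + props.getProperty("gsa.hostname"));
    System.out.println("server.port  = " + serverPort(props));
    System.out.println("feed.name    = " + props.getProperty("feed.name"));
  }
}
```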
<h3>Creating your own Adaptor</h3>
<ol>
<li>Review JavaDoc for {@link com.google.enterprise.adaptor.Adaptor}
and {@link com.google.enterprise.adaptor.AbstractAdaptor}.
<li>From the source zip file (e.g., {@code adaptor-20130612-src.zip}),
copy {@code
src/com/google/enterprise/adaptor/examples/AdaptorTemplate.java}
into your own package under a name of your choosing. You will need to
update the package declaration and class name to match.
<li>Compile, run, and verify the copied adaptor using your favorite IDE. You
will only need {@code adaptor-20130612-withlib.jar} in your classpath.
Note that the date may be different.
<li>Modify it further for your own repository.
<li>Declare success for getting content from your custom repository to the
GSA.
</ol>
<h3>Testing Tip</h3>
<p>An adaptor, by default, will deny all document accesses, except from the
GSA. To allow debugging and testing an adaptor without a GSA, you can add a
hostname to the <code>server.fullAccessHosts</code> config key to allow that
computer full access to all adaptor content. In addition, this setting
allows that computer to see metadata and other GSA-specific information as
HTTP headers. This can be very useful when combined with Firebug or the Web
Inspector in your browser to observe an Adaptor's behavior.
<h3>Advanced</h3>
<p>You can set configuration variables on the command line instead of in
<code>adaptor-config.properties</code>. You are allowed multiple arguments
of the form "-Dconfigkey=configvalue". When providing a value on the command
line, it overrides the default value and the value (if any) in the
configuration file. For example:
<pre>java -cp adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar \
com.google.enterprise.adaptor.examples.AdaptorTemplate -Dgsa.hostname=mygsahostname \
-Dserver.port=6677</pre>
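<p>The precedence described above (command line over configuration file over
defaults) can be sketched with layered {@code java.util.Properties} objects.
This is only an illustration of the ordering; {@code ConfigPrecedence} and
{@code resolve} are hypothetical names, and the adaptor library's actual
argument handling may differ in detail.</p>

```java
import java.util.Properties;

class ConfigPrecedence {
  /**
   * Layers "-Dkey=value" arguments over file values, which in turn sit
   * over defaults. Arguments in any other form are ignored in this sketch.
   */
  public static Properties resolve(Properties defaults, Properties fromFile,
      String[] args) {
    // File values shadow defaults.
    Properties fileLayer = new Properties(defaults);
    for (String name : fromFile.stringPropertyNames()) {
      fileLayer.setProperty(name, fromFile.getProperty(name));
    }
    // Command-line values shadow both.
    Properties effective = new Properties(fileLayer);
    for (String arg : args) {
      if (!arg.startsWith("-D")) {
        continue;
      }
      int eq = arg.indexOf('=');
      if (eq <= 2) {
        continue; // missing key or missing '='
      }
      effective.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
    }
    return effective;
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    defaults.setProperty("server.port", "5678");
    Properties fromFile = new Properties();
    fromFile.setProperty("gsa.hostname", "mygsahostname");
    Properties effective = resolve(defaults, fromFile,
        new String[] {"-Dserver.port=6677"});
    System.out.println("gsa.hostname = " + effective.getProperty("gsa.hostname"));
    System.out.println("server.port  = " + effective.getProperty("server.port"));
  }
}
```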
<h3>Enabling Security</h3>
<p>Security is not enabled by default because it requires a reasonable amount
of setup, on both the GSA and adaptor. The GSA needs a valid certificate for
the hostname you are accessing it with (<code>gsa.hostname</code>). Thus,
the default one it ships with cannot be valid and you need to generate a new
one. Setting up security is required before users can access non-public
documents directly from the adaptor.
<h4>Creating Self-Signed Certificates</h4>
<p>In the GSA's Admin Console, go to <b>Administration &gt; SSL Settings</b>.
Under the <b>Create a New SSL Certificate</b> heading, change <b>Host
Name</b> to the GSA's hostname, written exactly as the adaptor will use it.
Then click <b>Create
Self-Signed Certificate</b> and wait for the operation to complete.
Then click <b>Install SSL Certificate</b> and wait for that operation
to complete (about 1 minute).
You now have a valid self-signed certificate, but it is not available to be
trusted by the adaptor.
<p>You need to get the GSA's freshly-created certificate to add it as a
trusted host for the adaptor:
<ul>
<li><b>Using Firefox:</b> Navigate to the GSA's secure search:
https://gsahostname/. You should see a warning page that says, "This
Connection is Untrusted." This message is because the certificate is
self-signed and not signed by a trusted Certificate Authority. Click "I
Understand the Risks" and "Add Exception." Wait until the "View..."
button is clickable, then click it. Change to the "Details" tab and
click "Export...". Save the certificate in your adaptor's directory with
the name "gsa.crt". You can then hit "Close" and "Cancel" to close the
dialog windows.
<li><b>Using Chrome:</b> Navigate to the GSA's secure search:
https://gsahostname/. You should see a warning page that says, "The
site's security certificate is not trusted!" In the location bar, there
should be a padlock with a red 'x' on it. Click the padlock and then
click "Certificate Information." Change to the "Details" tab and click
"Export...". Save the certificate in your adaptor's directory with the
name "gsa.crt". You can then hit "Close" and "Cancel" to close the
dialog windows.
<li><b>Using OpenSSL:</b> Execute:
<pre>openssl s_client -connect gsahostname:443 &lt; /dev/null</pre>
Copy the section that begins with <code>-----BEGIN CERTIFICATE-----</code>
and ends with <code>-----END CERTIFICATE-----</code> (including the BEGIN
and END CERTIFICATE portions) into a new file. Save the file in your
adaptor's directory with the name "gsa.crt".
</ul>
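<p>For the OpenSSL route, copying the certificate block by hand can be
replaced by a few lines of code. The sketch below pulls the first
<code>-----BEGIN CERTIFICATE-----</code> to
<code>-----END CERTIFICATE-----</code> block out of a string, such as the
captured output of the <code>openssl s_client</code> command. The class
{@code PemExtractor} is a hypothetical helper, and the sample body in
{@code main} is a placeholder, not a real certificate.</p>

```java
class PemExtractor {
  static final String BEGIN = "-----BEGIN CERTIFICATE-----";
  static final String END = "-----END CERTIFICATE-----";

  /** Returns the first PEM certificate block in the text, or null if none. */
  public static String firstCertificate(String text) {
    int start = text.indexOf(BEGIN);
    if (start < 0) {
      return null;
    }
    int end = text.indexOf(END, start);
    if (end < 0) {
      return null;
    }
    // Include both marker lines, as the instructions above require.
    return text.substring(start, end + END.length()) + "\n";
  }

  public static void main(String[] args) {
    // Placeholder text standing in for real s_client output.
    String output = "CONNECTED(00000003)\n"
        + BEGIN + "\n"
        + "PLACEHOLDERBASE64\n"
        + END + "\n"
        + "---\n";
    System.out.print(firstCertificate(output));
  }
}
```

<p>The returned text could then be saved as "gsa.crt" in the adaptor's
directory.</p>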
<p>Now you should generate a self-signed certificate for the adaptor and
export the newly created certificate. Within the adaptor's directory, you
should run:
<pre>keytool -genkeypair -keystore keys.jks -storepass changeit -keypass changeit -alias adaptor -keyalg RSA -validity 365</pre>
<p>For "What is your first and last name?", you should enter the hostname of
the adaptor's computer. You are free to answer the other questions however
you wish (including not answering them). When you are happy with your
answers, answer "yes" to "Is CN=yourcomputershostname, OU=... correct?"
<p>Then, still in the adaptor's directory, you should run:
<pre>keytool -exportcert -alias adaptor -keystore keys.jks -storepass changeit -keypass changeit -rfc -file adaptor.crt</pre>
<p>Copy cacerts from Java to the adaptor's directory. For Windows:
<pre>copy PATH\TO\JRE\lib\security\cacerts cacerts.jks</pre>
<p>For all other OSes:
<pre>cp PATH/TO/JRE/lib/security/cacerts cacerts.jks</pre>
<p>To allow the adaptor to trust itself, execute:
<pre>keytool -importcert -keystore cacerts.jks -storepass changeit -file adaptor.crt -alias adaptor</pre>
<p>Answer "yes" to "Trust this certificate?"
<h4>Exchanging Certificates</h4>
<p>To allow the adaptor to trust the GSA, execute:
<pre>keytool -importcert -keystore cacerts.jks -storepass changeit -file gsa.crt -alias gsa</pre>
<p>Answer "yes" to "Trust this certificate?"
<p>To allow the GSA to trust the adaptor, within the GSA's Admin Console, go
to <b>Administration &gt; Certificate Authorities</b>. Click the <b>Choose
File</b> button (this button could be called "Browse...") under the
<b>Add more Certificate Authorities</b> heading.
Choose "adaptor.crt" in the adaptor's directory and click <b>Save
Settings</b>.
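<p>Before enabling security, it can help to confirm that both imports into
<code>cacerts.jks</code> succeeded. The following sketch uses
{@code java.security.KeyStore} to report which of the aliases used above
("adaptor" and "gsa") are absent. {@code TrustStoreCheck} is a hypothetical
helper, and the file name and password are the example values from the
commands above.</p>

```java
import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.util.ArrayList;
import java.util.List;

class TrustStoreCheck {
  /** Returns the aliases that are not present in the loaded keystore. */
  public static List<String> missingAliases(KeyStore store, String... aliases) {
    List<String> missing = new ArrayList<String>();
    for (String alias : aliases) {
      try {
        if (!store.containsAlias(alias)) {
          missing.add(alias);
        }
      } catch (KeyStoreException e) {
        // Thrown only when the keystore has not been loaded.
        throw new IllegalStateException("keystore not loaded", e);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws Exception {
    KeyStore store = KeyStore.getInstance("JKS");
    FileInputStream in = new FileInputStream("cacerts.jks");
    try {
      store.load(in, "changeit".toCharArray());
    } finally {
      in.close();
    }
    System.out.println("Missing aliases: "
        + missingAliases(store, "adaptor", "gsa"));
  }
}
```

<p>An empty list suggests both certificates were imported; it does not check
that the certificates themselves are the intended ones.</p>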
<h4>Flipping the Switch</h4>
<p>Now that everything is prepared, you can flip the security switch with the
adaptor by adding a line to your <code>adaptor-config.properties</code>:
<pre>server.secure=true</pre>
<p>The adaptor can now use the GSA's authentication configuration and will use
HTTPS for all communication.</p>
<p>Example command line for running in secure mode:
<pre>
java \
-Djava.util.logging.config.file=src/logging.properties \
-Djavax.net.ssl.keyStore=keys.jks \
-Djavax.net.ssl.keyStoreType=jks \
-Djavax.net.ssl.keyStorePassword=changeit \
-Djavax.net.ssl.trustStore=cacerts.jks \
-Djavax.net.ssl.trustStoreType=jks \
-Djavax.net.ssl.trustStorePassword=changeit \
-classpath 'adaptor-20130612-withlib.jar:examples/adaptor-20130612-examples.jar' \
com.google.enterprise.adaptor.examples.AdaptorWithCrawlTimeMetadataTemplate
</pre>
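<p>The command above passes six <code>javax.net.ssl</code> system properties
to the JVM. As a rough aid for catching a forgotten flag, the sketch below
lists which of those properties are unset in a given {@code Properties}
object. {@code SslPropertyCheck} is a hypothetical helper; it checks only the
properties named in the example command and says nothing about whether the
referenced keystore files are valid.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class SslPropertyCheck {
  /** The system properties set by the example secure command line. */
  static final String[] EXPECTED = {
    "javax.net.ssl.keyStore",
    "javax.net.ssl.keyStoreType",
    "javax.net.ssl.keyStorePassword",
    "javax.net.ssl.trustStore",
    "javax.net.ssl.trustStoreType",
    "javax.net.ssl.trustStorePassword"
  };

  /** Returns the expected property names that are unset or empty. */
  public static List<String> missing(Properties props) {
    List<String> absent = new ArrayList<String>();
    for (String name : EXPECTED) {
      String value = props.getProperty(name);
      if (value == null || value.isEmpty()) {
        absent.add(name);
      }
    }
    return absent;
  }

  public static void main(String[] args) {
    System.out.println("Missing SSL properties: "
        + missing(System.getProperties()));
  }
}
```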
<h4>Enable Stricter Security (optional)</h4>
<p>There are additional security options you can control on the GSA.
You may want to try running an adaptor with <code>server.secure=true</code>
before enabling these stricter features.
Within the GSA's Admin Console, go to <b>Administration &gt; SSL
Settings</b>. There you can:<ul>
<li> uncheck <b>Enable HTTP (non-SSL) access for Feedergate</b>. With this
field unchecked only HTTPS communications will be accepted by feedergate.
Adaptors send document ids to feedergate.
<li> check <b>Enable Client Certificate Authentication for Feedergate</b>.
<li> check <b>Enable Server Certificate Authentication</b>. Note: Does not
work at this time (Oct 4 2011).
</ul>
<p>
Click <b>Save Setup</b> to save your changes.
<p>
Note: By using these settings you improve security, but also require
all adaptors to be configured for security and have
<code>server.secure=true</code> in their configuration.
</body>