| <body> |
| <p>Easily provide repository data to a Google Search Appliance (GSA). |
| |
| <p> If you'd like to use a language other than Java or if you have command |
| line programs that can provide repository access, see if {@link |
| com.google.enterprise.adaptor.prebuilt.CommandLineAdaptor} fits your needs. |
| </p> |
| |
| <h3>Basic GSA Setup</h3> |
| <ol> |
| <li>Add the IP address of the computer that hosts the adaptor to the <b>List |
| of Trusted IP Addresses</b> on the GSA. |
| <p>In the GSA's Admin Console, go to <b>Crawl and Index > Feeds</b>, |
| and scroll down to <b>List of Trusted IP Addresses</b>. Add the IP address |
| for the adaptor to the list.</p> |
| <li>Add the URLs provided by the adaptor to the <b>Follow and Crawl Only |
| URLs with the Following Patterns</b> on the GSA. |
| <p>In the Admin Console, go to <b>Crawl and Index > Crawl URLs</b>, and |
| scroll down to <b>Follow and Crawl Only URLs with the Following |
| Patterns</b>. Add an entry like {@code hostname:port/} where {@code |
| hostname} is the hostname of the machine that hosts the adaptor and {@code |
| port} defaults to 5678 (read on to change the port number).</p> |
| </ol> |
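<p>As a minimal sketch of the pattern entry described above, the following
builds the {@code hostname:port/} string, falling back to the default port
5678 when none is given. The class name, method name, and example hostname
are hypothetical and not part of the adaptor library.

```java
/** Illustrative only: builds the crawl pattern entry to paste into the GSA. */
final class CrawlPatternSketch {
  /** Default adaptor port, as stated in the setup steps. */
  static final int DEFAULT_PORT = 5678;

  /**
   * Returns "hostname:port/". A null port means "use the default".
   * This helper is hypothetical; the adaptor library does not provide it.
   */
  static String crawlPattern(String hostname, Integer port) {
    if (hostname == null || hostname.trim().isEmpty()) {
      throw new IllegalArgumentException("hostname is required");
    }
    int effectivePort = (port == null) ? DEFAULT_PORT : port.intValue();
    return hostname.trim() + ":" + effectivePort + "/";
  }

  public static void main(String[] args) {
    // Hypothetical hostname of the machine running the adaptor.
    System.out.println(crawlPattern("adaptorhost.example.com", null));
    System.out.println(crawlPattern("adaptorhost.example.com", 6677));
  }
}
```

<p>If you later change <code>server.port</code>, the pattern on the GSA has
to be updated to match.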
| |
| <h3>Running the Adaptor Template as an Initial Test</h3> |
| <ol> |
| <li>You should have already installed JDK 6 or higher and obtained a plexi |
| release. Specifically, you will need {@code adaptor.jar} and {@code |
| adaptor-examples.jar}. If you are working from source code instead of a |
| release, you can build the required jars by running: |
| <pre>ant dist |
| cd dist</pre> |
| The needed {@code adaptor.jar} and {@code adaptor-examples.jar} will be |
| within the current directory. |
| <li>Create an <code>adaptor-config.properties</code> text file in the |
| current directory that looks like: |
| <pre>gsa.hostname=mygsahostname</pre> |
| <p>You should replace <code>mygsahostname</code> with the hostname or IP |
| of your GSA. This file allows you to do other configuration of the adaptor |
| library like changing the server port and feed name: |
| <pre>gsa.hostname=mygsahostname |
| server.port=6677 |
| feed.name=mydocfeedtogsa</pre> |
| <p>Later, if you have trouble with the adaptor library incorrectly |
| auto-detecting your computer's hostname, then you may need to add a line |
| like: |
| <pre>server.hostname=yourcomputershostname</pre> |
| <p>For a list and explanation of the available configuration options, see |
| {@link com.google.enterprise.adaptor.Config}. |
| <li>Start the Adaptor Template. For Windows: |
| <pre>java -cp adaptor.jar;adaptor-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate</pre> |
| For all other OSes: |
| <pre>java -cp adaptor.jar:adaptor-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate</pre> |
| <li> Ensure crawling is enabled on your GSA. |
| <p> |
| Go to <b>Status and Reports</b> and click <b>Resume Crawl</b> if |
| crawling is currently paused. |
| |
| <li> Confirm things ran successfully. |
| <p> |
| In the GSA, go to <b>Crawl and Index > Feeds</b>. |
| In the <b>Current Feeds</b> section, you should see an entry named |
| "adaptor_HOSTNAME_PORT" (the name can be changed by setting the |
| <code>feed.name</code> configuration variable). |
| <p> |
| In the adaptor log, look for document ids being pushed and |
| requests for document contents being served. |
| </ol> |
| |
| <h3>Creating your own Adaptor</h3> |
| <ol> |
| <li>Review the Javadoc for {@link com.google.enterprise.adaptor.Adaptor} |
| and {@link com.google.enterprise.adaptor.AbstractAdaptor}. |
| <li>From {@code adaptor-src.zip}, copy {@code |
| src/com/google/enterprise/adaptor/examples/AdaptorTemplate.java} |
| into your own package under your own class name. You will need to modify |
| the contents appropriately for the new package and name. |
| <li>Compile, run, and verify the copied adaptor using your favorite IDE. You |
| will only need {@code adaptor.jar} in your classpath. |
| <li>Modify it further for your own repository. |
| <li>Declare success for getting content from your custom repository to the |
| GSA. |
| </ol> |
| |
| <h3>Advanced</h3> |
| <p>You can set configuration variables on the command line instead of in |
| <code>adaptor-config.properties</code>. You may pass multiple arguments |
| of the form "-Dconfigkey=configvalue". A value provided on the command |
| line overrides both the default value and the value (if any) in the |
| configuration file. For example: |
| <pre>java -cp adaptor.jar:adaptor-examples.jar com.google.enterprise.adaptor.examples.AdaptorTemplate -Dgsa.hostname=mygsahostname -Dserver.port=6677</pre> |
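<p>The precedence described above (built-in default, then configuration
file, then command line) can be sketched as follows. This is an illustration
under stated assumptions, not the adaptor library's actual implementation:
the class and method names are hypothetical, and the sample values are made
up. Note that in the command above the <code>-D</code> arguments follow the
class name, so they reach the program as ordinary arguments, which is how
this sketch treats them.

```java
import java.util.Properties;

/** Illustrative only: merges defaults, file values, and -Dkey=value arguments. */
final class ConfigPrecedenceSketch {
  /**
   * Later sources win: defaults are applied first, then values from the
   * configuration file, then -Dkey=value arguments from the command line.
   */
  static Properties resolve(Properties defaults, Properties fileValues, String[] args) {
    Properties effective = new Properties();
    effective.putAll(defaults);
    effective.putAll(fileValues);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // Only accept arguments shaped like -Dkey=value with a non-empty key.
      if (arg.startsWith("-D") && eq > 2) {
        effective.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return effective;
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    defaults.setProperty("server.port", "5678");

    // Stand-in for the contents of adaptor-config.properties.
    Properties fileValues = new Properties();
    fileValues.setProperty("gsa.hostname", "mygsahostname");
    fileValues.setProperty("server.port", "6677");

    Properties effective =
        resolve(defaults, fileValues, new String[] {"-Dserver.port=7788"});
    System.out.println("gsa.hostname=" + effective.getProperty("gsa.hostname"));
    System.out.println("server.port=" + effective.getProperty("server.port"));
  }
}
```

<p>With these sample inputs the command-line value 7788 replaces both the
file value 6677 and the default 5678, while <code>gsa.hostname</code> is
still taken from the file.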
| |
| <h3>Enabling Security</h3> |
| <p>Security is not enabled by default because it requires a reasonable amount |
| of setup, on both the GSA and adaptor. The GSA needs a valid certificate for |
| the hostname you are accessing it with (<code>gsa.hostname</code>). Thus, |
| the default one it ships with cannot be valid and you need to generate a new |
| one. Setting up security is required before users can access non-public |
| documents directly from the adaptor. |
| |
| <h4>Creating Self-Signed Certificates</h4> |
| <p>In the GSA's Admin Console, go to <b>Administration > SSL Settings</b>. |
| Under the <b>Create a New SSL Certificate</b> heading, change <b>Host |
| Name</b> to the GSA's hostname, written exactly as the adaptor will use it. |
| Then click <b>Create |
| Self-Signed Certificate</b> and wait for the operation to complete. |
| Then click <b>Install SSL Certificate</b> and wait for that operation |
| to complete (about 1 minute). |
| You now have a valid self-signed certificate, but it is not available to be |
| trusted by the adaptor. |
| |
| <p>You need to get the GSA's freshly-created certificate to add it as a |
| trusted host for the adaptor: |
| <ul> |
| <li><b>Using Firefox:</b> Navigate to the GSA's secure search: |
| https://gsahostname/. You should see a warning page that says, "This |
| Connection is Untrusted." This message appears because the certificate is |
| self-signed and not signed by a trusted Certificate Authority. Click "I |
| Understand the Risks" and "Add Exception." Wait until the "View..." |
| button is clickable, then click it. Change to the "Details" tab and |
| click "Export...". Save the certificate in your adaptor's directory with |
| the name "gsa.crt". You can then hit "Close" and "Cancel" to close the |
| dialog windows. |
| <li><b>Using Chrome:</b> Navigate to the GSA's secure search: |
| https://gsahostname/. You should see a warning page that says, "The |
| site's security certificate is not trusted!" In the location bar, there |
| should be a padlock with a red 'x' on it. Click the padlock and then |
| click "Certificate Information." Change to the "Details" tab and click |
| "Export...". Save the certificate in your adaptor's directory with the |
| name "gsa.crt". You can then hit "Close" and "Cancel" to close the |
| dialog windows. |
| <li><b>Using OpenSSL:</b> Execute: |
| <pre>openssl s_client -connect gsahostname:443 &lt; /dev/null</pre> |
| Copy the section that begins with <code>-----BEGIN CERTIFICATE-----</code> |
| and ends with <code>-----END CERTIFICATE-----</code> (including the BEGIN |
| and END CERTIFICATE portions) into a new file. Save the file in your |
| adaptor's directory with the name "gsa.crt". |
| </ul> |
| |
| <p>Now you should generate a self-signed certificate for the adaptor and |
| export the newly created certificate. Within the adaptor's directory, you |
| should run: |
| <pre>keytool -genkeypair -keystore keys.jks -storepass changeit -keypass changeit -alias adaptor -keyalg RSA -validity 365</pre> |
| <p>For "What is your first and last name?", you should enter the hostname of |
| the adaptor's computer. You are free to answer the other questions however |
| you wish (including not answering them). When you are happy with your |
| answers, answer "yes" to "Is CN=yourcomputershostname, OU=... correct?" |
| <p>Then, still in the adaptor's directory, you should run: |
| <pre>keytool -exportcert -alias adaptor -keystore keys.jks -storepass changeit -keypass changeit -rfc -file adaptor.crt</pre> |
| |
| <p>Copy cacerts from Java to the adaptor's directory. For Windows: |
| <pre>copy PATH\TO\JRE\lib\security\cacerts cacerts.jks</pre> |
| <p>For all other OSes: |
| <pre>cp PATH/TO/JRE/lib/security/cacerts cacerts.jks</pre> |
| |
| <p>To allow the adaptor to trust itself, execute: |
| <pre>keytool -importcert -keystore cacerts.jks -storepass changeit -file adaptor.crt -alias adaptor</pre> |
| <p>Answer "yes" to "Trust this certificate?" |
| |
| <h4>Exchanging Certificates</h4> |
| <p>To allow the adaptor to trust the GSA, execute: |
| <pre>keytool -importcert -keystore cacerts.jks -storepass changeit -file gsa.crt -alias gsa</pre> |
| <p>Answer "yes" to "Trust this certificate?" |
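<p>To confirm that both imports succeeded, you could list the trust store
with <code>keytool -list</code>, or check it programmatically. The sketch
below uses the standard <code>java.security.KeyStore</code> API to report
which of the expected aliases are missing. It assumes the file name
<code>cacerts.jks</code>, the password <code>changeit</code>, and the aliases
<code>adaptor</code> and <code>gsa</code> used in the commands above; the
class and method names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: checks that expected aliases exist in a trust store. */
final class TrustStoreCheckSketch {
  /** Returns the aliases from {@code expected} that the store does not contain. */
  static List<String> missingAliases(KeyStore store, String... expected)
      throws KeyStoreException {
    List<String> missing = new ArrayList<String>();
    for (String alias : expected) {
      if (!store.containsAlias(alias)) {
        missing.add(alias);
      }
    }
    return missing;
  }

  /** Loads a JKS key store from disk using the given password. */
  static KeyStore load(String path, char[] password)
      throws IOException, GeneralSecurityException {
    KeyStore store = KeyStore.getInstance("JKS");
    InputStream in = new FileInputStream(path);
    try {
      store.load(in, password);
    } finally {
      in.close();
    }
    return store;
  }

  public static void main(String[] args) throws Exception {
    // File name, password, and aliases follow the keytool commands above.
    KeyStore store = load("cacerts.jks", "changeit".toCharArray());
    List<String> missing = missingAliases(store, "adaptor", "gsa");
    System.out.println(missing.isEmpty()
        ? "Both certificates are present."
        : "Missing aliases: " + missing);
  }
}
```

<p>Run it from the adaptor's directory so that the relative path to
<code>cacerts.jks</code> resolves.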
| |
| <p>To allow the GSA to trust the adaptor, within the GSA's Admin Console, go |
| to <b>Administration > Certificate Authorities</b>. Click the <b>Choose |
| File</b> button (it may be labeled "Browse..." instead) under the |
| <b>Add more Certificate Authorities</b> heading. |
| Choose "adaptor.crt" in the adaptor's directory and click <b>Save |
| Settings</b>. |
| |
| <h4>Flipping the Switch</h4> |
| <p>Now that everything is prepared, you can flip the security switch with the |
| adaptor by adding a line to your <code>adaptor-config.properties</code>: |
| <pre>server.secure=true</pre> |
| <p>The adaptor can now use the GSA's authentication configuration and will use |
| HTTPS for all communication.</p> |
| <p> Example command line for running securely: |
| <pre> |
| java \ |
| -Djava.util.logging.config.file=src/logging.properties \ |
| -Djavax.net.ssl.keyStore=keys.jks \ |
| -Djavax.net.ssl.keyStoreType=jks \ |
| -Djavax.net.ssl.keyStorePassword=changeit \ |
| -Djavax.net.ssl.trustStore=cacerts.jks \ |
| -Djavax.net.ssl.trustStoreType=jks \ |
| -Djavax.net.ssl.trustStorePassword=changeit \ |
| -classpath 'adaptor.jar:adaptor-examples.jar' \ |
| com.google.enterprise.adaptor.examples.AdaptorWithCrawlTimeMetadataTemplate |
| </pre> |
| |
| <h4>Enable Stricter Security (optional)</h4> |
| <p>There are additional security options you can control on the GSA. |
| You may want to try running an adaptor with <code>server.secure=true</code> before |
| enabling these stricter features. |
| Within the GSA's Admin Console, go to <b>Administration > SSL |
| Settings</b>. There you can:<ul> |
| <li> uncheck <b>Enable HTTP (non-SSL) access for Feedergate</b>. With this |
| field unchecked only HTTPS communications will be accepted by feedergate. |
| Adaptors send document ids to feedergate. |
| <li> check <b>Enable Client Certificate Authentication for Feedergate</b>. |
| <li> check <b>Enable Server Certificate Authentication</b>. Note: Does not |
| work at this time (Oct 4 2011). |
| </ul> |
| <p> |
| Click <b>Save Setup</b> to save your changes. |
| <p> |
| Note: By using these settings you improve security, but also require |
| all adaptors to be configured for security and have |
| <code>server.secure=true</code> in their configuration. |
| |
| |
| </body> |