Utilize surrounding text information to achieve a more efficient and intelligent text input experience.
The Mozc converter internally maintains history segments, mainly for users who enter a Japanese sentence in several fragments. Imagine that a user enters the example sentence “今日は良い天気です” as three segments, as follows.
At step 3, the Mozc converter takes the results of steps 1 and 2 into consideration when converting “tennkidesu”. However, this approach may not work well when the caret is moved without the Mozc converter noticing. To work around this, the Mozc converter can read the preceding text and check whether its internal history information is consistent with that text. If they are inconsistent, the history segments should be invalidated.
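The consistency check described above can be sketched as follows. This is a minimal illustration, not Mozc's actual implementation: the `Segment` type, the function names, and the rule that the concatenated history values must form a suffix of the preceding text are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical history segment: the typed reading and the committed result."""
    key: str    # reading, e.g. "きょうは"
    value: str  # committed text, e.g. "今日は"


def history_is_consistent(history: List[Segment], preceding_text: str) -> bool:
    """Assumed rule: the committed history must be a suffix of the preceding text."""
    committed = "".join(segment.value for segment in history)
    return preceding_text.endswith(committed)


def reconcile_history(history: List[Segment], preceding_text: str) -> List[Segment]:
    """Keep the history when it matches the preceding text; otherwise invalidate it."""
    if history_is_consistent(history, preceding_text):
        return history
    return []  # inconsistent, e.g. the caret was moved elsewhere
```

For example, with a history of “今日は” and “良い”, a preceding text of “今日は良い” keeps the history, while a preceding text of “明日は” causes it to be dropped.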
To improve conversion quality when the preceding text and the history segments are mismatched, it would be nice if we could reconstruct (or emulate) history segments from the preceding text.
As a first step, this project reconstructs only segments that consist solely of numbers or alphabetic characters. Reconstructing a wider variety of tokens is left as future work.
The following table describes the mapping from preceding text to key, value, and POS (part-of-speech) ID.
|Preceding text||Key||Value||POS|
|"1 10 "||“10”||“10”||Number|
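The mapping described above could be sketched as below. This is an illustrative sketch under stated assumptions, not the actual Mozc code: the `ReconstructedSegment` type, the function name, the choice to skip trailing whitespace and take the last run of ASCII digits or letters, and the "Alphabet" POS label are all hypothetical. Only the "Number" case is taken from the table.

```python
import re
from typing import NamedTuple, Optional


class ReconstructedSegment(NamedTuple):
    """Hypothetical result type: key, value, and a symbolic POS label."""
    key: str
    value: str
    pos: str


# Assumption: trailing whitespace is skipped and the last run of ASCII
# digits or ASCII letters becomes the reconstructed token.
_TRAILING_NUMBER = re.compile(r"([0-9]+)\s*$")
_TRAILING_ALPHABET = re.compile(r"([A-Za-z]+)\s*$")


def reconstruct_history(preceding_text: str) -> Optional[ReconstructedSegment]:
    """Emulate one history segment from the tail of the preceding text."""
    candidates = (
        (_TRAILING_NUMBER, "Number"),
        (_TRAILING_ALPHABET, "Alphabet"),  # placeholder label, not from the source
    )
    for pattern, pos in candidates:
        match = pattern.search(preceding_text)
        if match:
            token = match.group(1)
            # For numbers and alphabetic tokens, key and value are identical.
            return ReconstructedSegment(key=token, value=token, pos=pos)
    return None  # other token types are out of scope for the first step
```

With this sketch, a preceding text of `"1 10 "` yields key “10”, value “10”, and POS “Number”, matching the row above, while text ending in Japanese characters yields no reconstructed segment.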
Here is a list of typical cases in which the preceding text and the history segments are mismatched.
Surrounding text has been available in the following OSes and frameworks:
Here is the list of other possible usages of surrounding text in future projects.
Some buggy applications that handle surrounding text events incorrectly may become unstable. There should basically be no privacy risk, because applications are expected to hide sensitive text such as passwords from the IME.
Available on Windows, Apple OS X, Chromium OS, and Linux desktop. No impact on the Android platform.