Handle emojis that have multiple readings correctly
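Some emojis in emoji_data.ts have multiple readings, but gen_emoji_rewriter_data.py did not handle them correctly: it split the readings with a regex containing a capturing group, so re.split also returned the matched separators as if they were readings. As a result, some emojis were registered as if they could be read as " ". This CL switches the pattern to a non-capturing group to fix the issue.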
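For reference, a minimal sketch of the behavior behind this bug. The helper names and the sample reading string are hypothetical and not taken from the script, and the sketch uses Python 3 str patterns with U+3000, whereas the script matches the UTF-8 bytes \xe3\x80\x80. It only shows that re.split returns the text of capturing groups alongside the split pieces, while a non-capturing group does not.

```python
import re

# Hypothetical helper names and sample readings, for illustration only.
# '\u3000' is the full-width space that the script writes as \xe3\x80\x80.
CAPTURING = re.compile('( |\u3000)+')
NON_CAPTURING = re.compile('(?: |\u3000)+')


def split_readings_old(readings):
    """Mimics the old pattern: captured separators leak into the result."""
    return CAPTURING.split(readings.strip())


def split_readings_new(readings):
    """Mimics the fixed pattern: separators are dropped."""
    return NON_CAPTURING.split(readings.strip())


if __name__ == '__main__':
    sample = 'えがお わらい'
    print(split_readings_old(sample))  # ['えがお', ' ', 'わらい']
    print(split_readings_new(sample))  # ['えがお', 'わらい']
```

With the old pattern, the " " element would be added to token_dict as a reading, which matches the symptom described above.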
Closes Issue 266.
BUG=Issue mozc:266
TEST=manually done with Nexus 5 / Android 5.0.1 (LRX22C)
git-svn-id: https://mozc.googlecode.com/svn/trunk@478 a6090854-d499-a067-5803-1114d4e51264
diff --git a/src/mozc_version_template.txt b/src/mozc_version_template.txt
index 239cfcd..b6b4a66 100644
--- a/src/mozc_version_template.txt
+++ b/src/mozc_version_template.txt
@@ -1,6 +1,6 @@
MAJOR=2
MINOR=16
-BUILD=2011
+BUILD=2012
REVISION=102
# NACL_DICTIONARY_VERSION is the target version of the system dictionary to be
# downloaded by NaCl Mozc.
diff --git a/src/rewriter/gen_emoji_rewriter_data.py b/src/rewriter/gen_emoji_rewriter_data.py
index 49a59bc..262f503 100644
--- a/src/rewriter/gen_emoji_rewriter_data.py
+++ b/src/rewriter/gen_emoji_rewriter_data.py
@@ -126,7 +126,7 @@
kddi_description))
# \xe3\x80\x80 is a full-width space
- for reading in re.split(r'( |\xe3\x80\x80)+', readings.strip()):
+ for reading in re.split(r'(?: |\xe3\x80\x80)+', readings.strip()):
token_dict[reading].append(index)
return (emoji_data_list, token_dict)