From: Mathieu B. <mbl...@ru...> - 2007-06-29 17:47:53

Hi,

I have good news! Using tomoe_query_set_max_n_strokes gives very good
results. With a patch like the following, candidates get displayed very
fast!

--- module/recognizer/tomoe-recognizer-simple-logic.c  (revision 1543)
+++ module/recognizer/tomoe-recognizer-simple-logic.c  (working copy)
@@ -90,6 +90,17 @@
     query = tomoe_query_new ();
     tomoe_query_set_min_n_strokes (query, input_stroke_num);
+
+    /* Statistics show that characters with less than 6 strokes
+       represent less than 10% of characters and characters with
+       between 7 and 13 strokes represent more than 60% of characters */
+    if (input_stroke_num <= 6) {
+        tomoe_query_set_max_n_strokes (query, input_stroke_num + 5);
+    }
+    else if (input_stroke_num <= 13) {
+        tomoe_query_set_max_n_strokes (query, input_stroke_num + 3);
+    }
+
     target_chars = tomoe_dict_search (dict, query);
     g_object_unref (query);
     if (!target_chars) return NULL;

I think we can add this by default even for platforms which don't have
performance issues, because IMHO comparing, for example, a one-stroke input
with a character of more than, say, 10 strokes doesn't make sense!

Here are some statistics I have made using handwriting-ja.xml
(handwriting-zh_CN.xml gives similar results):

N_strokes  N_characters  Cumulative  Percent
        1            26          26   0.40 %
        2            56          82   0.87 %
        3            77         159   1.19 %
        4           128         287   1.98 %
        5           158         445   2.45 %
        6           201         646   3.11 %
        7           318         964   4.92 %
        8           440        1404   6.81 %
        9           476        1880   7.37 %
       10           550        2430   8.51 %
       11           577        3007   8.93 %
       12           570        3577   8.82 %
       13           534        4111   8.27 %
       14           434        4545   6.72 %
       15           428        4973   6.63 %
       16           367        5340   5.68 %
       17           297        5637   4.60 %
       18           207        5844   3.20 %
       19           166        6010   2.57 %
       20           138        6148   2.14 %
       21           107        6255   1.66 %
       22            68        6323   1.05 %
       23            50        6373   0.77 %
       24            37        6410   0.57 %
       25            19        6429   0.29 %
       26            11        6440   0.17 %
       27             9        6449   0.14 %
       28             6        6455   0.09 %
       29             2        6457   0.03 %
       30             3        6460   0.05 %

Cheers,
Mathieu
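
A minimal sketch of how the same stroke-count window could be applied from
calling code. It assumes only the functions that already appear in the patch
above (tomoe_query_new, tomoe_query_set_min_n_strokes,
tomoe_query_set_max_n_strokes, tomoe_dict_search, g_object_unref); the wrapper
function, the GList return type, and the header path are illustrative guesses,
not part of TOMOE's documented API.

    #include <glib-object.h>
    #include <tomoe/tomoe.h>   /* assumed header; adjust to the real install path */

    /* Hypothetical helper: search `dict` for characters whose stroke count
     * lies in a window around the number of strokes drawn so far. */
    static GList *
    search_by_stroke_count (TomoeDict *dict, gint input_stroke_num)
    {
        TomoeQuery *query = tomoe_query_new ();
        GList *candidates;

        tomoe_query_set_min_n_strokes (query, input_stroke_num);

        /* Same heuristic as the patch: cap the maximum stroke count so a
         * one-stroke input is never compared against, say, a 20-stroke
         * character. */
        if (input_stroke_num <= 6)
            tomoe_query_set_max_n_strokes (query, input_stroke_num + 5);
        else if (input_stroke_num <= 13)
            tomoe_query_set_max_n_strokes (query, input_stroke_num + 3);

        candidates = tomoe_dict_search (dict, query);
        g_object_unref (query);

        return candidates;   /* NULL when nothing matches */
    }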
From: Hu Z. <zh...@re...> - 2007-06-29 05:21:33

Very nice. Congratulations!

On Fri, 2007-06-29 at 12:03 +0900, Takuro Ashie wrote:
> Hi.
>
> I have released tomoe-0.6.0, tomoe-gtk-0.6.0, scim-tomoe-0.6.0
> and uim-tomoe-gtk-0.6.0.
>
> Tomoe is a handwriting recognition engine:
>
>   http://tomoe.sourceforge.net/
>
> Changes from 0.5.x:
>
> * Simplified Chinese dictionary.
>   (Thanks Red Hat engineers!)
>
> * Enhanced Japanese dictionary (supports JIS X 0208 level 2).
>
> * Choose the default dictionary automatically according to the current locale.
>   However, currently no dictionary will be enabled with most locales except ja
>   and zh_CN, and on-demand language switching is not implemented yet.
>   Please use the tomoe applications with the ja or zh_CN locale like this:
>
>   $ LANG=zh_CN uim-tomoe-gtk
>   $ LANG=ja scim-tomoe
>   ...
>
> * Rename the package name of libtomoe-gtk to tomoe-gtk.
>
> * Add tomoe_gtk_init() and tomoe_gtk_quit().
>   Although tomoe_window_new() calls tomoe_gtk_init() internally for
>   compatibility reasons, it is recommended to call tomoe_gtk_init() manually
>   in your code.
>
> * Add a --with-gucharmap option to tomoe-gtk.
>
> * Python binding.
>
> * Some minor fixes.
>
> Download:
>
> * http://sourceforge.net/project/showfiles.php?group_id=193138
>
> In addition, you can find a stroke-editor for the tomoe handwriting
> dictionary in our Subversion repository:
>
> * http://tomoe.svn.sourceforge.net/viewvc/tomoe/
>
> This is also a Red Hat engineer's work. Thanks a lot.
>
> Regards,
> --
> Takuro Ashie <as...@ho...>
From: Takuro A. <as...@ho...> - 2007-06-29 03:03:46

Hi.

I have released tomoe-0.6.0, tomoe-gtk-0.6.0, scim-tomoe-0.6.0
and uim-tomoe-gtk-0.6.0.

Tomoe is a handwriting recognition engine:

  http://tomoe.sourceforge.net/

Changes from 0.5.x:

* Simplified Chinese dictionary.
  (Thanks Red Hat engineers!)

* Enhanced Japanese dictionary (supports JIS X 0208 level 2).

* Choose the default dictionary automatically according to the current locale.
  However, currently no dictionary will be enabled with most locales except ja
  and zh_CN, and on-demand language switching is not implemented yet.
  Please use the tomoe applications with the ja or zh_CN locale like this:

  $ LANG=zh_CN uim-tomoe-gtk
  $ LANG=ja scim-tomoe
  ...

* Rename the package name of libtomoe-gtk to tomoe-gtk.

* Add tomoe_gtk_init() and tomoe_gtk_quit().
  Although tomoe_window_new() calls tomoe_gtk_init() internally for
  compatibility reasons, it is recommended to call tomoe_gtk_init() manually
  in your code.

* Add a --with-gucharmap option to tomoe-gtk.

* Python binding.

* Some minor fixes.

Download:

* http://sourceforge.net/project/showfiles.php?group_id=193138

In addition, you can find a stroke-editor for the tomoe handwriting
dictionary in our Subversion repository:

* http://tomoe.svn.sourceforge.net/viewvc/tomoe/

This is also a Red Hat engineer's work. Thanks a lot.

Regards,
--
Takuro Ashie <as...@ho...>
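
The tomoe_gtk_init()/tomoe_gtk_quit() recommendation above can be illustrated
with a short sketch. Only the names tomoe_gtk_init, tomoe_gtk_quit and
tomoe_window_new come from the announcement; the header path, the exact
signatures and return types, and the surrounding GTK+ boilerplate are
assumptions, so treat this as an outline rather than documented tomoe-gtk API.

    #include <gtk/gtk.h>
    #include <tomoe-gtk/tomoe-gtk.h>   /* assumed header name */

    int
    main (int argc, char *argv[])
    {
        GtkWidget *window;

        gtk_init (&argc, &argv);
        tomoe_gtk_init ();              /* recommended explicit initialization */

        /* tomoe_window_new() would call tomoe_gtk_init() itself for
         * compatibility, but calling it explicitly (above) is preferred;
         * the GtkWidget return type here is an assumption. */
        window = tomoe_window_new ();
        gtk_widget_show_all (window);

        gtk_main ();

        tomoe_gtk_quit ();              /* release tomoe-gtk resources on exit */
        return 0;
    }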
From: Mathieu B. <mbl...@ru...> - 2007-06-20 12:05:15

> I do not understand the HMM *like* model which is referred to by Hu. What
> is it? As far as I read some thesis about HMM for handwriting recognition
> of Japanese characters, it seems faster than TOMOE's current logic.

Even though we can improve accuracy and maybe performance with HMM, it will
take some time. I would like to get a working version of tomoe on Maemo as
soon as possible...

Mathieu
From: Hiroyuki I. <poi...@ik...> - 2007-06-20 11:43:18

On Wed, 2007-06-20 at 13:24 +0200, Mathieu Blondel wrote:
> On Wed, June 20, 2007 13:08, Hiroyuki Ikezoe wrote:
> > But I can not understand yet.
> > You mention the performance of recognizer, don't you?
>
> I mean that even though we use HMM, even though we can train the model,
> data still need to be stored in a file. Therefore, it does not solve the
> problem of long dictionary loading.

Well, the data which is used for the trainer and the data which is needed
for the recognizer are not the same. The recognizer loads only the latter.

> For that, I think my idea of binary
> file can provide significant improvements. I'll experiment with this idea
> as soon as possible...

I do not disagree with a binary format. I recommend you do whatever is best
for TOMOE. If HMM comes, it will have a binary format file.

> Furthermore, as Hu Zheng pointed out, it is not sure whether HMM will
> improve performances or not, it may even worsen them...

I do not understand the HMM *like* model which is referred to by Hu. What
is it? As far as I read some thesis about HMM for handwriting recognition
of Japanese characters, it seems faster than TOMOE's current logic.
From: Mathieu B. <mbl...@ru...> - 2007-06-20 11:24:31

On Wed, June 20, 2007 13:08, Hiroyuki Ikezoe wrote:
> But I can not understand yet.
> You mention the performance of recognizer, don't you?

I mean that even though we use HMM, even though we can train the model,
data still need to be stored in a file. Therefore, it does not solve the
problem of long dictionary loading. For that, I think my idea of binary
file can provide significant improvements. I'll experiment with this idea
as soon as possible...

Furthermore, as Hu Zheng pointed out, it is not sure whether HMM will
improve performances or not, it may even worsen them...

Mathieu
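
The binary-file idea is not spelled out further on the list, but a rough
sketch of what such a cache could look like follows. Every name in it (the
packed point layout, the record format) is hypothetical and only illustrates
the general approach: dump the parsed dictionary to disk once, then read it
back with plain fread instead of re-parsing the XML on every start-up.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical on-disk record; the real TOMOE types differ. */
    typedef struct {
        uint16_t x, y;
    } PackedPoint;

    typedef struct {
        uint32_t unicode;      /* code point of the character         */
        uint16_t n_strokes;    /* number of strokes                   */
        uint16_t n_points;     /* total points across all strokes;
                                  followed by n_points PackedPoint
                                  records in the file                 */
    } PackedChar;

    /* Dump one character record; called once per dictionary entry after
     * the XML has been parsed for the first time.  Loading is the
     * symmetric pair of fread calls. */
    static int
    dump_char (FILE *out, const PackedChar *ch, const PackedPoint *pts)
    {
        if (fwrite (ch, sizeof *ch, 1, out) != 1)
            return -1;
        if (fwrite (pts, sizeof *pts, ch->n_points, out) != ch->n_points)
            return -1;
        return 0;
    }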
From: Hiroyuki I. <poi...@ik...> - 2007-06-20 11:10:11

On Tue, 2007-06-19 at 19:29 +0200, Mathieu Blondel wrote:
> Hiroyuki Ikezoe wrote:
> >> When you train the model, the data still need to be stored somewhere
> >> afterhand...
> >
> > I am sorry I can not understand what you mean.
> > What is the meaning of "afterhand"?
>
> This word does not really exist. This is a neologism based on beforehand
> :) So afterhand simply means after...

But I can not understand yet.
You mention the performance of recognizer, don't you?
From: Mathieu B. <mbl...@ru...> - 2007-06-20 07:35:21

On Wed, June 20, 2007 09:22, Hu Zheng wrote:
> Two or three month?

Cool

> It should mainly improve the recognition accuracy. The performance may
> even be slower as it is more complex, but we can optimize it then.
>

That's what I thought. We definitely need to improve the performance of
the current recognizer.

Mathieu
From: Hu Z. <zh...@re...> - 2007-06-20 07:21:26

Two or three month?
It should mainly improve the recognition accuracy. The performance may
even be slower as it is more complex, but we can optimize it then.

On Wed, 2007-06-20 at 08:50 +0200, Mathieu Blondel wrote:
> On Wed, June 20, 2007 03:30, Hu Zheng wrote:
> > One of our RedHat engineer is doing this work. He will introduce a new
> > recognition algorithm to tomoe, which has learn ability and HMM like. It
> > is still in the early prototype stage. We will notify you after we get a
> > working beta version :)
>
> When do you think the beta version will be ready ? And do you think it
> will improve the performances or only the recognition accuracy ?
>
> Mathieu
From: Mathieu B. <mbl...@ru...> - 2007-06-20 06:50:35

On Wed, June 20, 2007 03:30, Hu Zheng wrote:
> One of our RedHat engineer is doing this work. He will introduce a new
> recognition algorithm to tomoe, which has learn ability and HMM like. It
> is still in the early prototype stage. We will notify you after we get a
> working beta version :)

When do you think the beta version will be ready ? And do you think it
will improve the performances or only the recognition accuracy ?

Mathieu
From: Hu Z. <zh...@re...> - 2007-06-20 03:08:46

Here is a good thesis that you can read:

http://reciteword.cosoft.org.cn/redhat/thesis.pdf
From: Hu Z. <zh...@re...> - 2007-06-20 01:30:02

One of our RedHat engineer is doing this work. He will introduce a new
recognition algorithm to tomoe, which has learn ability and HMM like. It
is still in the early prototype stage. We will notify you after we get a
working beta version :)

On Tue, 2007-06-19 at 20:08 +0900, Hiroyuki Ikezoe wrote:
> On Tue, 2007-06-19 at 08:15 +0200, Mathieu Blondel wrote:
> > > He said he wants to also improve loading performance because the first
> > > stroke takes about 4 times longer than each following stroke, and it's
> > > also hard to be patient for most people.
> >
> > Well yes, I know that we don't parse the XML file every time... But
> > still, XML is not the best backend solution to load the dictionary into
> > memory...
>
> The best way you can do is implement HMM.
> It will be low memory consumption and high performance. :-) Of course it
> depends on the model.
From: Hu Z. <zh...@re...> - 2007-06-20 01:24:49

Sure! Nice work :)
I think you can maintain the new version of stroke-editor from now on.

On Tue, 2007-06-19 at 19:56 +0900, Hiroyuki Ikezoe wrote:
> On Tue, 2007-06-19 at 10:15 +0800, Hu Zheng wrote:
> > I think it is better to keep the two branches in svn, the old
> > independent version and the tomoe version(as trunk). We may still have
> > some small changes to the old version, and it can still be useful.
> > Yes, it was my fault, I thought there should have no need for more than
> > one branch ago, but now it comes, so I gained this experience :)
> > Can you do a "svn mv" to create the "trunk branches tags" structure? And
> > add the new codes as trunk.
>
> I've committed my patch now. By the way, can I change the ChangeLog format
> to GNU style?
From: Mathieu B. <mbl...@ru...> - 2007-06-19 17:29:16

Hiroyuki Ikezoe wrote:
>> When you train the model, the data still need to be stored somewhere
>> afterhand...
>>
> I am sorry I can not understand what you mean.
> What is the meaning of "afterhand"?
>

This word does not really exist. This is a neologism based on beforehand
:) So afterhand simply means after...
From: Hiroyuki I. <poi...@ik...> - 2007-06-19 12:12:55

On Tue, 2007-06-19 at 13:57 +0200, Mathieu Blondel wrote:
> > With HMM, learning data, which is used by recognizer, does not need raw
> > stroke data. The size will be smaller than the current TOMOE's stroke
> > data even though it depends on the model.
>
> When you train the model, the data still need to be stored somewhere
> afterhand...

I am sorry I can not understand what you mean.
What is the meaning of "afterhand"?
From: Mathieu B. <mbl...@ru...> - 2007-06-19 11:57:49

> With HMM, learning data, which is used by recognizer, does not need raw
> stroke data. The size will be smaller than the current TOMOE's stroke
> data even though it depends on the model.

When you train the model, the data still need to be stored somewhere
afterhand...

Mathieu
From: Hiroyuki I. <poi...@ik...> - 2007-06-19 11:40:15

On Tue, 2007-06-19 at 13:23 +0200, Mathieu Blondel wrote:
> On Tue, June 19, 2007 13:08, Hiroyuki Ikezoe wrote:
> > The best way you can do is implement HMM.
> > It will be low memory consumption and high performance. :-) Of course it
> > depends on the model.
>
> Can you elaborate ? I think HMM (Hidden Markov Model) could improve
> recognition accuracy, but I am not sure it would improve performances...

With HMM, learning data, which is used by recognizer, does not need raw
stroke data. The size will be smaller than the current TOMOE's stroke
data even though it depends on the model.
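
The size argument above can be made concrete with a rough sketch. None of the
types below exist in TOMOE; the structure of an HMM-style model is heavily
simplified and would depend entirely on the chosen model, so this only
illustrates why trained parameters can be much smaller than raw stroke data.

    #include <stddef.h>

    /* Raw template data: every reference sample keeps every recorded point,
     * so the size grows with the number of writing samples collected. */
    typedef struct {
        double x, y;
    } StrokePoint;

    typedef struct {
        size_t       n_points;
        StrokePoint *points;
    } RawStrokeTemplate;

    /* HMM-style model data: only the estimated parameters are kept, with a
     * fixed size per character; the raw training strokes can be discarded
     * once the parameters have been fitted. */
    #define N_STATES 8   /* illustrative; depends on the model */

    typedef struct {
        double transition[N_STATES][N_STATES];  /* state transition probs  */
        double mean[N_STATES][2];               /* per-state feature means */
        double variance[N_STATES][2];           /* per-state feature vars  */
    } CharacterHmm;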
From: Mathieu B. <mbl...@ru...> - 2007-06-19 11:23:30

On Tue, June 19, 2007 13:08, Hiroyuki Ikezoe wrote:
> The best way you can do is implement HMM.
> It will be low memory consumption and high performance. :-) Of course it
> depends on the model.

Can you elaborate ? I think HMM (Hidden Markov Model) could improve
recognition accuracy, but I am not sure it would improve performances...

Anyway, as I said, the very first stroke takes about 15 sec. Knowing that
the following strokes take 4 sec, this means that dictionary loading takes
about 11 sec.

Mathieu
From: Hiroyuki I. <poi...@ik...> - 2007-06-19 11:10:02

On Tue, 2007-06-19 at 08:15 +0200, Mathieu Blondel wrote:
> > He said he wants to also improve loading performance because the first
> > stroke takes about 4 times longer than each following stroke, and it's
> > also hard to be patient for most people.
>
> Well yes, I know that we don't parse the XML file every time... But
> still, XML is not the best backend solution to load the dictionary into
> memory...

The best way you can do is implement HMM.
It will be low memory consumption and high performance. :-) Of course it
depends on the model.
From: Hiroyuki I. <poi...@ik...> - 2007-06-19 10:58:41

On Tue, 2007-06-19 at 10:15 +0800, Hu Zheng wrote:
> I think it is better to keep the two branches in svn, the old
> independent version and the tomoe version(as trunk). We may still have
> some small changes to the old version, and it can still be useful.
> Yes, it was my fault, I thought there should have no need for more than
> one branch ago, but now it comes, so I gained this experience :)
> Can you do a "svn mv" to create the "trunk branches tags" structure? And
> add the new codes as trunk.

I've committed my patch now. By the way, can I change the ChangeLog format
to GNU style?
From: Hiroyuki I. <poi...@ik...> - 2007-06-19 10:29:27

On Tue, 2007-06-19 at 10:15 +0800, Hu Zheng wrote:
> Use python to develop stroke-editor is much faster, and I would like to
> learn a new programming language :)
>
> I think it is better to keep the two branches in svn, the old
> independent version and the tomoe version(as trunk). We may still have
> some small changes to the old version, and it can still be useful.
> Yes, it was my fault, I thought there should have no need for more than
> one branch ago, but now it comes, so I gained this experience :)
> Can you do a "svn mv" to create the "trunk branches tags" structure? And
> add the new codes as trunk.

Done. The URL changed from

  https://tomoe.svn.sourceforge.net/svnroot/tomoe/tools/stroke-editor

to

  https://tomoe.svn.sourceforge.net/svnroot/tomoe/stroke-editor
From: Mathieu B. <mbl...@ru...> - 2007-06-19 06:12:06

Hi,

> He seems to understand this issue correctly.
>
> He said he wants to also improve loading performance because the first
> stroke takes about 4 times longer than each following stroke, and it's
> also hard to be patient for most people.
>

Well yes, I know that we don't parse the XML file every time... But
still, XML is not the best backend solution to load the dictionary into
memory...

Mathieu
From: Hu Z. <zh...@re...> - 2007-06-19 02:14:30

Use python to develop stroke-editor is much faster, and I would like to
learn a new programming language :)

I think it is better to keep the two branches in svn, the old
independent version and the tomoe version(as trunk). We may still have
some small changes to the old version, and it can still be useful.
Yes, it was my fault, I thought there should have no need for more than
one branch ago, but now it comes, so I gained this experience :)
Can you do a "svn mv" to create the "trunk branches tags" structure? And
add the new codes as trunk.

On Mon, 2007-06-18 at 19:40 +0900, Hiroyuki Ikezoe wrote:
> Hello,
>
> On Mon, 2007-06-18 at 10:28 +0800, Hu Zheng wrote:
> > The old implementation has the advantage of don't depend on tomoe, but
> > use tomoe python binding should be the way. I think your version can be
> > 1.5 or 2.0 some thing like.
> > Well, stroke-editor was my first python project, so it is a little c
> > style :)
>
> Yo-ho-ho! I had never written python code before TOMOE's python tests
> too! Python is not easy to use for me. If I were you, I wrote
> stroke-editor in C. :-)
>
> > I think you can create a branch in svn and add your patch into it
> > directly.
>
> Are you willing to go on developing stroke-editor without TOMOE's python
> binding? I think you do not have the will yet since you did not replay
> [tomoe-devel 39].
>
> If you do not have the will as ever, the current stroke-editor code
> should be in branches, shouldn't it?
>
> Thank you,
From: Takuro A. <as...@ho...> - 2007-06-19 01:04:39

On Tue, 19 Jun 2007 07:19:22 +0900
Hiroyuki Ikezoe <poi...@ik...> wrote:

> > tomoe_recognizer_simple_get_candidates) but we will need to improve the
> > dictionary backend too because actually the very first stroke takes
> > about 15 seconds. Then the following strokes take about 4 seconds each.
> > An XML file is quite long to parse... I think we can reach better
> > performances with a binary file.
>
> You misunderstand. TomoeRecognizer does not parse XML while searching.

He seems to understand this issue correctly.

He said he wants to also improve loading performance because the first
stroke takes about 4 times longer than each following stroke, and it's
also hard to be patient for most people.

Regards,
--
Takuro Ashie <as...@ho...>
From: Hiroyuki I. <poi...@ik...> - 2007-06-18 22:19:38

Hello,

On Mon, 2007-06-18 at 22:30 +0200, Mathieu Blondel wrote:
> I am going to study the recognizer first (get_candidates,
> dist_tomoe_points, tomoe_char_compare,
> tomoe_recognizer_simple_get_candidates) but we will need to improve the
> dictionary backend too because actually the very first stroke takes
> about 15 seconds. Then the following strokes take about 4 seconds each.
> An XML file is quite long to parse... I think we can reach better
> performances with a binary file.

You misunderstand. TomoeRecognizer does not parse XML while searching.
The XML file is loaded and parsed on initialization of TomoeRecognizer.
See constructor() in tomoe-recognizer-simple.c.

Thank you,
--
Hiroyuki Ikezoe <poi...@ik...>