KR102033327B1 - Apparatus and method for building sorting corpus by user participation - Google Patents
Apparatus and method for building sorting corpus by user participation
- Publication number
- KR102033327B1
- Authority
- KR
- South Korea
- Prior art keywords
- sorting
- user
- sentence
- word
- module
- Prior art date
Classifications
- G06F17/2845
- G06F17/2705
- G06F17/2818
Landscapes
- Machine Translation (AREA)
Abstract
A user-participating sorting corpus building method includes: a sentence sorting step in which a sorting module receives the type of a game and the purpose of the game, presents to the user a sorting method according to that type and purpose, and receives the user's alignment of sentences or words; a user evaluation step in which the sorting module determines the level of the user by monitoring the progress of the sorting; a sorted sentence evaluation step in which a sentence recommendation module evaluates the sorted sentence using the level of the user; and a sentence recommendation step in which the sentence recommendation module recommends a sentence that improves the performance of statistical machine translation (SMT) using the result of the sorted sentence evaluation and the level of the user.
Description
The present invention relates to a user-participating sorting corpus building apparatus and method for building a sorting corpus that is a source of various knowledge extraction for automatic translation.
At present, the dominant approach in automatic translation technology is statistical machine translation (SMT). Statistical machine translation performs word-by-word sorting (that is, word alignment) by applying a machine learning method based on a statistical model to a corpus aligned in sentence units. It therefore requires a sorted corpus from which word mapping information can be correctly extracted.
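To make the terms concrete, a word-level sorting (alignment) of one sentence pair can be represented as a set of index links between source and target words. The sketch below is only an illustration: the class name `AlignedPair`, its fields, and the helper functions are assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class AlignedPair:
    """One sentence pair with word-level links (hypothetical structure)."""
    source: List[str]
    target: List[str]
    # Each link (i, j) says source[i] corresponds to target[j].
    links: FrozenSet[Tuple[int, int]]


def unaligned_source(pair: AlignedPair) -> List[str]:
    """Return source words that have no link to any target word."""
    linked = {i for i, _ in pair.links}
    return [tok for i, tok in enumerate(pair.source) if i not in linked]


def unaligned_target(pair: AlignedPair) -> List[str]:
    """Return target words that have no link to any source word."""
    linked = {j for _, j in pair.links}
    return [tok for j, tok in enumerate(pair.target) if j not in linked]


pair = AlignedPair(
    source=["I", "like", "apples"],
    target=["나는", "사과를", "좋아한다"],
    links=frozenset({(0, 0), (1, 2)}),
)
print(unaligned_source(pair))  # ['apples']
print(unaligned_target(pair))  # ['사과를']
```

Words left without a link are one simple signal that a pair has not been fully sorted.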
However, few language pairs have a sentence-aligned corpus at the scale required, which is a critical obstacle, especially in the development of multilingual automatic translation technology. In such a situation, constructing a sentence-aligned corpus through individual language experts requires a lot of time and money, and constructing a word-aligned corpus is also difficult.
Given a sentence-aligned corpus, statistical machine translation obtains word-by-word sorting results automatically using an automatic sorting method, but these results include a large number of errors, which lowers translation performance. There are two main types of error in the automatic sorting results.
The first is an error caused by the automatic alignment method itself: it is not easy to generate a perfect sorting result automatically.
The second is an error caused by unsuitable sentences: sentences that are not easy to sort automatically produce sorting errors, which degrade the performance of statistical machine translation. For this reason, the performance of statistical machine translation does not improve simply because the parallel corpus is large; it can be improved by selecting sentences that are suitable for sorting.
Therefore, there is a need for a method of constructing a correct word-aligned corpus and of selecting sentences that can improve statistical machine translation performance.
An object of the present invention is to solve the above problems: based on a user-participation environment in the form of a game, users can easily construct a sentence-unit or word-unit sorted corpus by playing the game, and sentences that help improve statistical machine translation performance can be selected.
Furthermore, an object of the present invention is to let a user with low language ability improve his or her foreign language ability, depending on the user's purpose, and to store the user's level so that the user's quality can be evaluated.
In order to achieve the above object, a user-participating sorting corpus building method according to an embodiment of the present invention includes: a sentence sorting step in which the sorting module receives the type of the game and the purpose of the game, presents to the user a sorting method according to that type and purpose, and receives the alignment of sentences or words; a user evaluation step in which the sorting module monitors the user's sorting progress to determine the user's level; a sorted sentence evaluation step in which the sentence recommendation module evaluates the sorted sentences using the user's level; and a sentence recommendation step in which the sentence recommendation module recommends sentences that improve the performance of statistical machine translation (SMT) using the result of the sorted sentence evaluation and the user's level.
The sentence sorting step includes: receiving, by the sorting module, a selection of the type of game and the purpose of the game; presenting, by the sorting module, a sorting method to the user and having the user sort sentences or words using it; and searching for, by the sorting module, example sentences and providing them to the user.
In the step of arranging a sentence or word, the sorting module generates a word unit by chunking a plurality of words until the sorting is completed.
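The chunking described here can be pictured as repeatedly merging adjacent words into one larger unit. The following is a minimal sketch under that reading; the function name `merge_units` and the tuple-based representation of a unit are assumptions made for illustration, not details given in the patent.

```python
from typing import List, Tuple

Unit = Tuple[str, ...]  # a word unit is a tuple of one or more words (assumed)


def merge_units(units: List[Unit], start: int, end: int) -> List[Unit]:
    """Merge units[start:end] into a single chunk and return a new list."""
    if not 0 <= start < end <= len(units):
        raise ValueError("invalid span")
    merged = tuple(tok for unit in units[start:end] for tok in unit)
    return units[:start] + [merged] + units[end:]


# Start with one unit per word, then chunk an idiom into a single unit.
units = [(w,) for w in "he kicked the bucket".split()]
units = merge_units(units, 1, 4)
print(units)  # [('he',), ('kicked', 'the', 'bucket')]
```

Returning a new list rather than mutating the input keeps each intermediate state available, which may be useful if the user's sorting progress is monitored step by step.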
In providing an example sentence, the sorting module may provide a word unit dictionary, a phrase unit dictionary, and a corpus example sentence to the user.
The user evaluation step is characterized by evaluating the user by comparing the difficulty of the sentence, the time spent on the sentence alignment, and the sentence alignment result with other users' alignment results or existing alignment information.
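One way to combine the three signals named above (sentence difficulty, time spent, and agreement with other alignment results) is sketched below. The patent does not give a formula, so the F1 agreement measure, the speed factor, the weights, and all function names here are assumptions chosen only to illustrate the idea.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]


def link_f1(user_links: Iterable[Link], reference_links: Iterable[Link]) -> float:
    """Agreement between a user's links and reference links, as an F1 score."""
    user, ref = set(user_links), set(reference_links)
    overlap = len(user & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(user)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def round_score(difficulty: float, seconds_spent: float, expected_seconds: float,
                user_links: Iterable[Link], reference_links: Iterable[Link]) -> float:
    """Score one sorted sentence in [0, 1] (hypothetical weighting).

    difficulty is assumed to lie in [0, 1]; harder sentences earn more.
    """
    agreement = link_f1(user_links, reference_links)
    # Speed factor: 1.0 when at or under the expected time, lower when slower.
    speed = min(1.0, expected_seconds / max(seconds_spent, 1e-9))
    return difficulty * agreement * (0.5 + 0.5 * speed)


print(link_f1({(0, 0), (1, 1)}, {(0, 0), (1, 2)}))      # 0.5
print(round_score(0.5, 20.0, 10.0, {(0, 0)}, {(0, 0)}))  # 0.375
```

A user's level could then be tracked as a running average of such scores; that aggregation is likewise an assumption.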
In order to achieve the above object, a user participatory sorting corpus building apparatus according to an embodiment of the present invention includes: a sorting module that receives the type of a game and the purpose of the game, presents to the user a sorting method according to that type and purpose, receives the sorting of sentences or words, and monitors the user's sorting progress to determine the level of the user; a sentence recommendation module that evaluates the sorted sentences using the user's level and recommends a sentence that improves the performance of statistical machine translation (SMT) using the result of the sorted sentence evaluation and the user's level; and a database for storing the user's level and the sorted sentences.
The sorting module retrieves an example sentence while receiving an alignment of a sentence or a word and provides an example sentence to the user.
The sorting module may generate a word unit by chunking a plurality of words until the sorting is completed.
The example sentence is characterized by being a word unit dictionary, a phrase unit dictionary, and a corpus example sentence.
The sentence recommendation module is characterized by evaluating the user by comparing the difficulty of the sentence, the time spent in the sentence alignment, and the sentence alignment result with other users' alignment results or existing alignment information.
According to the present invention, a user-participation environment in the form of a game lets users with various levels of language ability easily build sentence-unit or word-unit corpora in large quantities, thereby increasing the accuracy of the alignment.
In addition, according to the present invention, it is possible to select only sentences that contribute to the improvement of statistics-based automatic translation performance by evaluating user level, thereby improving the performance of statistics-based automatic translation.
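The selection idea can be sketched as a simple filter over sorted sentences. The patent states only that the user's level and the sentence evaluation are used; the record fields, the 0 to 1 scales, and the thresholds below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SortedSentence:
    sentence_id: str
    user_level: float     # level of the user who sorted it (assumed 0..1)
    aligned_ratio: float  # fraction of units that received a link (assumed 0..1)


def recommend(sentences: List[SortedSentence],
              min_level: float = 0.6,
              min_aligned: float = 0.9) -> List[str]:
    """Keep sentences sorted by a sufficiently skilled user and mostly aligned."""
    return [
        s.sentence_id
        for s in sentences
        if s.user_level >= min_level and s.aligned_ratio >= min_aligned
    ]


candidates = [
    SortedSentence("s1", user_level=0.8, aligned_ratio=1.0),
    SortedSentence("s2", user_level=0.3, aligned_ratio=1.0),
    SortedSentence("s3", user_level=0.9, aligned_ratio=0.5),
]
print(recommend(candidates))  # ['s1']
```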
Furthermore, according to the present invention, a user with low language ability can improve it, and information about users with high language ability can be obtained by storing user level evaluations in a database.
FIG. 1 is a view for explaining the configuration of the user participatory alignment corpus building apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a user participatory sorting corpus building method according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a sentence alignment step of aligning a sentence or a word according to an exemplary embodiment of the present invention.
FIG. 4 is a flowchart illustrating an alignment method according to an exemplary embodiment of the present invention and a user's alignment using the alignment method.
FIG. 5 is a view for explaining a step-by-word alignment method according to an embodiment of the present invention.
Hereinafter, the preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the technical idea of the present invention. First of all, in adding reference numerals to the components of each drawing, it should be noted that the same components are given the same reference numerals as far as possible, even when they appear in different drawings. In addition, in describing the present invention, when it is determined that a detailed description of a related well-known configuration or function may obscure the gist of the present invention, the detailed description thereof will be omitted.
Hereinafter, a user participatory alignment corpus building apparatus and method according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a view for explaining the configuration of a user participatory alignment corpus building apparatus. FIG. 2 is a flowchart illustrating a user participatory sorting corpus building method. FIG. 3 is a flowchart illustrating a sentence sorting step of sorting a sentence or a word. FIG. 4 is a flowchart illustrating the presentation of a sorting method and the user's sorting using that method. FIG. 5 is a diagram for describing a word-by-word alignment method.
As shown in FIG. 1, the user participatory sorting corpus constructing apparatus includes an
At this time, the
The
Database 300 stores the user-level database for storing the level of each user who progressed the game through the
The
As shown in FIG. 2, in the user-participating sorting corpus building method, a sentence sorting step (S100) of sorting a sentence or a word using a game selected by the user through the
Here, in the sentence sorting step (S100) of sorting a sentence or a word, the user selects the type of game and the purpose of the game, and the
In the user evaluation step (S200) in which the
That is, the user may fail to produce a fully correct sorting result because the sentence pair is not a correct translation, because the sentence pair is a paraphrase that is not easy to sort, or because the user's level is insufficient to produce the correct sorting result. The
More specifically, while the user is playing the alignment game, the
At this time, the
Furthermore, the level of the user can serve as a kind of certification of foreign language ability.
Furthermore, the user participatory sorting corpus building method according to an embodiment of the present invention may further include, after the user evaluation step (S200) of determining the level of the user, a user feedback step of feeding back information about errors in the user's alignment result to the user. In this case, the user may receive the feedback about an error and correct the alignment result of the sentence, or may judge the error feedback to be wrong and ignore it.
The
In this case, in order to evaluate the sorting sentence (S300), the
More specifically, a case in which the original sentence is not aligned with the unit of the translated sentence may occur in the aligned sentence. If there is a unit that has not yet been sorted even though the sentence has been sorted, the
For example, if the user's level is low, the
Furthermore, in the sentence recommendation step (S400) for recommending a sentence that improves the performance of statistics-based automatic translation, the
For example, the
As shown in FIG. 3, in the step S100 of sorting a sentence or a word, a step in which a user selects a type of a game and a purpose of the game in step S110, and the
In more detail, in step S110, when the user selects the type of game and the purpose of the game, the user may select whether to execute a game in sentence unit alignment or a game in word unit alignment. Here, the text to be sorted by the user may be provided by the
In this case, when the user runs the sentence-unit sorting game, the game is document-based: the user sorts the sentences of the original document against those of the translated document. When the user runs the word-unit sorting game, the game is sentence-based: the user sorts the words of the original sentence against those of the translated sentence.
When the user selects the type of game to proceed, the
In this case, when the user executes a sorting game for learning a word, the
On the other hand, when the user executes the alignment game for building the alignment corpus, the
Here, the primary alignment result may be a dictionary-based translation result or a machine translation result. In other words, when a user runs a sorting game for constructing a sorting corpus, the user does not need to translate and sort every sentence directly.
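A dictionary-based primary alignment of the kind mentioned here can be sketched as a lookup that links a source word to every target word listed as one of its translations. This is only an illustration under stated assumptions: the function name `primary_alignment`, the lowercase matching, and the toy dictionary are hypothetical, and a real system would also need morphological handling.

```python
from typing import Dict, List, Set, Tuple


def primary_alignment(source: List[str], target: List[str],
                      dictionary: Dict[str, Set[str]]) -> Set[Tuple[int, int]]:
    """Propose links (i, j) where target[j] is a listed translation of source[i]."""
    links = set()
    for i, src in enumerate(source):
        candidates = dictionary.get(src.lower(), set())
        for j, tgt in enumerate(target):
            if tgt.lower() in candidates:
                links.add((i, j))
    return links


# Toy bilingual dictionary (assumed data, English to French).
toy_dict = {"i": {"je"}, "eat": {"mange"}, "apples": {"pommes"}}
links = primary_alignment(
    ["I", "eat", "apples"], ["Je", "mange", "des", "pommes"], toy_dict
)
print(sorted(links))  # [(0, 0), (1, 1), (2, 3)]
```

In this example the target word "des" receives no link, so it is the kind of unit a user would still have to sort by hand.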
Thereafter, the
At this time, the user performs the sorting by using the sorting unit for each word unit, which will be described later, through the
As the user progresses the sorting, the
For example, the
In this case, when the
As shown in FIG. 4, when the user executes the word-based sorting game, the
Here, the
Referring to FIG. 5, the sorting method according to each word unit will be described in more detail. The
By using the word-by-word sorting method as described above, the
Although a preferred embodiment according to the present invention has been described above, it can be modified in various forms, and it is understood that those skilled in the art may practice various changes and modifications without departing from the claims of the present invention.
100: alignment module 200: control unit
300: database 400: sentence recommendation module
Claims (10)
A user evaluation step of determining a level of the user by monitoring the progress of the user's alignment in the alignment module;
A sorted sentence evaluation step of evaluating a sorted sentence using the level of the user in a sentence recommendation module; And
A sentence recommendation step of recommending, in the sentence recommendation module, a sentence that improves performance of statistical machine translation (SMT) using the result of the sorted sentence evaluation and the level of the user; a method for constructing a user-participating sorting corpus comprising the above steps.
The sentence sorting step,
Receiving, by the sorting module, a selection about a type of game and a purpose of the game;
Presenting a sorting method to the user by the sorting module, arranging sentences or words of the user to correspond to the sorting method;
And the sorting module searches for an example sentence and provides the searched example sentence to the user.
Arranging the sentence or word,
And the sorting module generates a word unit by chunking a plurality of words until sorting is completed.
In providing the example sentence,
And the sorting module provides a word unit dictionary, a phrase unit dictionary, and a corpus example.
The user evaluation step,
A method for constructing a user-participating sorting corpus, which evaluates the user by comparing a sentence difficulty, time spent sorting a sentence, and a sentence sorting result with other sorting results or existing sorting information.
A sentence recommendation module that evaluates the sorted sentence using the level of the user, and recommends a sentence that improves the performance of statistical machine translation (SMT) using the result of the sorted sentence evaluation and the level of the user; and
And a database for storing the user's level and the sorted sentences.
The alignment module,
A user participatory sorting corpus constructing device for searching for example sentences while receiving the alignment of the sentence or word and providing the example sentences to the user.
The alignment module
A user-participating sorting corpus building device that generates word units by chunking a plurality of words until sorting is complete.
The example sentence,
A user participatory sorting corpus building apparatus, characterized in that it is a word unit dictionary, a phrase unit dictionary, and a corpus example sentence.
The sentence recommendation module,
A user participatory sorting corpus building apparatus that evaluates the user by comparing sentence difficulty, time spent sorting sentences, and the sentence sorting result with other users' sorting results or existing sorting information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150084039A KR102033327B1 (en) | 2015-06-15 | 2015-06-15 | Apparatus and method for building sorting corpus by user participation |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20160147375A KR20160147375A (en) | 2016-12-23 |
KR102033327B1 true KR102033327B1 (en) | 2019-10-17 |
Family
ID=57736098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150084039A KR102033327B1 (en) | 2015-06-15 | 2015-06-15 | Apparatus and method for building sorting corpus by user participation |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR102033327B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010020675A (en) * | 2008-07-14 | 2010-01-28 | Brother Ind Ltd | Translator selection support method, translator selection support server, and translator selection support program |
KR101478282B1 (en) * | 2013-06-24 | 2015-01-02 | 인제대학교 산학협력단 | Method for providing web-based translation service using a collective intelligence |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101721536B1 (en) | 2010-08-23 | 2017-04-10 | 에스케이플래닛 주식회사 | statistical WORD ALIGNMENT METHOD FOR APPLYING ALIGNMENT TENDENCY BETWEEN WORD CLASS AND machine translation APPARATUS USING THE SAME |
KR20130014730A (en) * | 2011-08-01 | 2013-02-12 | 한국전자통신연구원 | Tuning system of automatic translator by a plurality of users |
KR20140049150A (en) * | 2012-10-16 | 2014-04-25 | 한국전자통신연구원 | Automatic translation postprocessing system based on user participating |
- 2015-06-15: KR application KR1020150084039A, patent KR102033327B1, active (IP Right Grant)
Also Published As
Publication number | Publication date |
---|---|
KR20160147375A (en) | 2016-12-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E701 | Decision to grant or registration of patent right |