WO2016208941A1 - Text preprocessing method and preprocessing system for performing the same - Google Patents

Info

Publication number
WO2016208941A1
Authority
WO
WIPO (PCT)
Prior art keywords
term
text
substitute
preprocessing
alternative
Prior art date
Application number
PCT/KR2016/006576
Other languages
English (en)
Korean (ko)
Inventor
문연국
이동현
채승훈
윤희화
Original Assignee
전자부품연구원 (Korea Electronics Technology Institute)
Priority date
2015-06-22
Filing date
2016-06-21
Publication date
2016-12-29
Application filed by 전자부품연구원
Priority to CN201680001271.6A (published as CN107148624A)
Publication of WO2016208941A1 (fr)

Classifications

    G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis > G06F40/205 Parsing > G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/20 Natural language analysis > G06F40/268 Morphological analysis
    • G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation
    • G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/49 Data-driven translation using very large corpora, e.g. the web
    • G06F40/40 Processing or translation of natural language > G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/40 Processing or translation of natural language

Definitions

  • The present invention relates to text translation technology and, more particularly, to a text preprocessing method, and a preprocessing system for performing the same, that can improve the accuracy of machine translation by replacing terms contained in the text to be translated with standard words.
  • Webtoon is a compound of "web" and "cartoon" and refers collectively to comics published on web platforms. Combined with mobile device technology, webtoons have emerged as one of the world's most popular forms of media content, and with this worldwide popularity they are being translated into various languages.
  • Conventional machine translation engines translate non-standard language, such as newly coined words, deliberately misspelled ("broken") Hangul, colloquialisms, onomatopoeia, mimetic words and dialects, with significantly lower accuracy, and their accuracy depends heavily on the translation method and on the terms built into each engine's own database, which is a problem.
  • Korean Patent No. 10-1099177 relates to a method and system for training a machine translator, and discloses a machine translator trained with text inputs generated by other machine translators.
  • In that system, a text input in a first language is provided by a user or another source and is translated by a first machine translator to produce a translated version of the text input in a second language.
  • The text input and the translated version are parsed and passed through a training architecture to develop transfer mappings and bilingual dictionaries, which a second machine translator then uses when translating other text inputs.
  • Korean Patent No. 10-0961717 relates to a method and apparatus for detecting machine translation errors using a parallel corpus.
  • That method automatically detects errors that occur in a rule-based machine translation system by using a parallel corpus. Errors found in machine translation can be corrected using the target-language sentences of the parallel corpus (that is, the correct answer sentences), and error types can be classified so that information on errors occurring at or above a certain frequency is provided. This drastically reduces the time and effort needed to detect machine translation errors, and system engineers can use the detected and tracked error information to improve the machine translation system, maximizing the efficiency of performance improvement.
  • An embodiment of the present invention provides a text preprocessing method that performs preprocessing to replace terms contained in the text to be translated with standard language, and a preprocessing system for performing the same.
  • An embodiment of the present invention provides a text preprocessing method, and a preprocessing system for performing the same, that can improve translation accuracy by preprocessing the text to be translated prior to machine translation.
  • An embodiment of the present invention provides a text preprocessing method and a preprocessing system that can improve the translation accuracy of webtoon text containing non-standard language, such as newly coined words, broken Hangul, colloquialisms, onomatopoeia, mimetic words and dialects.
  • According to an embodiment, the text preprocessing system includes a substitute term database that stores substitute terms, and a processor that executes a preprocessing engine, which preprocesses input text and outputs text in the same language as the input text. The preprocessing engine identifies a substitute target term in the input text and outputs text in which the identified substitute target term is replaced with a substitute term.
  • The preprocessing engine includes a morpheme analysis unit that separates the input text into morphemes and determines the part of speech of each separated morpheme; a term identification unit that identifies whether each separated morpheme corresponds to a substitute target term; and a substitute term search unit that, when a separated morpheme corresponds to a substitute target term, searches the substitute term database for the substitute term corresponding to that target term based on whether the terms match.
  • The preprocessing engine may further include a text generation unit that generates text in which the substitute target term is replaced with the retrieved substitute term.
  • The preprocessing engine may further include a syntax analysis unit that analyzes the syntax of the input text to estimate the meaning of a term for which no substitute term exists.
  • In that case, the substitute term search unit may search for a substitute term corresponding to the meaning estimated from the syntax analysis result.
  • The syntax analysis unit may estimate the meaning of the term by analyzing the separated morphemes according to grammar to generate a syntax tree.
  • The preprocessing engine may further include a substitute term register that associates the term with a substitute term corresponding to its estimated meaning and stores them in the substitute term database.
  • The text preprocessing system may further include a substitute target term database that stores substitute target terms, and the preprocessing engine may identify substitute target terms in the input text based on whether they are stored in the substitute target term database.
  • Alternatively, the text preprocessing system may further include a translation term database that stores machine translation terms, and the preprocessing engine may identify substitute target terms in the input text based on whether each term is included in the translation term database.
  • The processor may also execute a machine translation engine that translates input text into text of another language; the machine translation engine machine-translates the text output from the preprocessing engine into a set target language.
  • According to an embodiment, the text preprocessing method includes (a) separating the input text into morphemes and determining the part of speech of each separated morpheme; (b) identifying whether the separated morphemes correspond to substitute target terms; (c) when a separated morpheme corresponds to a substitute target term, searching the substitute term database for a substitute term corresponding to the substitute target term based on whether the terms match; and (d) generating text in which the substitute target term is replaced with the retrieved substitute term.
  • According to an embodiment of the present invention, the text preprocessing method and the preprocessing system for performing the same can preprocess the text to be translated by replacing the terms it contains with standard words.
  • The text preprocessing method and the preprocessing system for performing the same according to an embodiment of the present invention can improve translation accuracy by preprocessing the text to be translated before machine translation.
  • The text preprocessing method and the preprocessing system for performing the same can improve the translation accuracy of webtoon text containing non-standard language, such as newly coined words, broken Hangul, colloquialisms, onomatopoeia, mimetic words and dialects.
  • FIG. 1 is a diagram illustrating a text preprocessing system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating the text preprocessing server of FIG. 1.
  • FIG. 3 is a block diagram illustrating the preprocessing engine of FIG. 2.
  • FIG. 4 is a block diagram illustrating a parsing process.
  • FIG. 5 is a flowchart illustrating a text translation method performed in the text preprocessing system of FIG. 1.
  • Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms.
  • For example, a first component may be named a second component and, similarly, a second component may be named a first component.
  • For each step, identification codes (e.g., a, b, c) are used for convenience of description and do not indicate the order of the steps. Unless the context clearly indicates a specific order, the steps may occur in an order different from the one stated; that is, they may be performed in the stated order, substantially simultaneously, or in the reverse order.
  • The present invention can be embodied as computer-readable code on a computer-readable recording medium.
  • The computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system.
  • Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks and optical data storage devices, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet).
  • The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
  • FIG. 1 is a diagram illustrating a text preprocessing system according to an embodiment of the present invention.
  • The text preprocessing system 100 includes a user terminal 110, a text preprocessing server 120, a first database 130 and a second database 140, which may be connected through a network.
  • The user terminal 110 may be a computing device connected to the text preprocessing server 120 and may be implemented as, for example, a desktop, a notebook, a tablet PC or a smartphone. In one embodiment, the user terminal 110 is a desktop connected to the text preprocessing server 120 via a LAN.
  • The text preprocessing server 120 may preprocess input text before the text is translated into text of another language.
  • The text preprocessing server 120 may be associated with a text translation server (not shown) that includes a preprocessing engine for preprocessing the input text and a machine translation engine for machine-translating the preprocessed text.
  • The preprocessing engine may identify a substitute target term in the input text and output text in which the identified substitute target term is replaced with a substitute term.
  • The machine translation engine machine-translates the text output from the preprocessing engine into a set language and outputs the result.
  • The text preprocessing server 120 may also be associated with a crowdsourcing server (not shown) that includes a crowdsourcing translation engine.
  • The crowdsourcing translation engine can use a database built through crowdsourcing to revise machine-translated text so that it suits the expressions of the target language.
  • The text preprocessing server, the translation server and the crowdsourcing server may be implemented together in one system, or they may be implemented in separate systems connected in processing order.
  • In one embodiment, the first database 130 may be a substitute target term database that stores substitute target terms.
  • In this case, the preprocessing engine may identify substitute target terms included in the text by determining whether the substitute target terms stored in the first database 130 appear in the text.
  • In another embodiment, the first database 130 may be a translation term database that stores translation terms.
  • In this case, the preprocessing engine may identify substitute target terms by determining whether each term included in the text is included in the first database 130. For example, if a term included in the text is not included in the first database 130, the preprocessing engine may identify it as a term to be replaced.
  • The second database 140 may be a substitute term database that stores substitute terms.
  • The preprocessing engine may search the second database 140 for a substitute term corresponding to the substitute target term.
  • FIG. 2 is a block diagram illustrating the text preprocessing server of FIG. 1.
  • The text preprocessing server 120 may include a processor 210, a memory 220, a storage device 230, a network interface 240, a user interface input device 250 and a user interface output device 260.
  • The processor 210 executes a preprocessing engine 212 and a memory manager 216.
  • The preprocessing engine 212 identifies a substitute target term in the input text and outputs text in which the identified substitute target term is replaced with a substitute term.
  • The memory manager 216 manages data in the memory 220 that is read or written by the preprocessing engine 212.
  • The memory 220 may be implemented as volatile or nonvolatile memory.
  • The storage device 230 may be implemented as nonvolatile storage such as a solid state disk (SSD) or a hard disk drive (HDD) and is used to store data needed by the text preprocessing server 120.
  • The network interface 240 may include a device for connecting to a network, for example an adapter for local area network (LAN) communication.
  • The user interface input device 250 includes a device for receiving user input and may include, for example, a mouse, trackball, touch pad, graphics tablet, scanner, touch screen, keyboard or pointing device, or an adapter for such a device.
  • The user interface output device 260 includes a device for outputting specific information (e.g., translated text) to a user and may include, for example, a monitor or a touch screen, or an adapter for such a device.
  • FIG. 3 is a block diagram illustrating the preprocessing engine of FIG. 2.
  • The preprocessing engine 212 includes a morpheme analysis unit 310, a term identification unit 320, a substitute term search unit 330, a text generation unit 340 and a syntax analysis unit 350.
  • The preprocessing engine 212 receives text contained in a webtoon (hereinafter, webtoon text) through the morpheme analysis unit 310. For example, all text contained in the webtoon, such as speech bubbles, commentary, onomatopoeia and idioms, may be input to the preprocessing engine 212. In one embodiment, the preprocessing engine 212 may receive webtoon text from a text recognition engine that recognizes text contained in images, or may receive webtoon text that a person has read and transcribed.
  • The morpheme analysis unit 310 separates the input text into morphemes and determines the part of speech of each separated morpheme.
  • The morpheme analysis unit 310 may restore each separated morpheme to its base (dictionary) form and determine its part of speech based on the restored form.
  • A morpheme is the smallest unit of language that carries a meaning, that is, the smallest meaningful unit. For example, the sentence 하늘이 맑다 ("the sky is clear") can be divided into four morphemes, 'sky' (하늘), the subject particle 'yi' (이), 'clear-' (맑-) and '-da' (-다); if these are split any further, their meanings change or disappear.
  • For example, the morpheme analysis unit 310 may divide a sentence into the morphemes 'ni' (NP, pronoun), 's' (XSN, noun-derivational suffix), 'where' (NP, pronoun), 'school' (NNG, common noun) and 'ya' (JKV, vocative particle) and determine the part of speech of each.
  • The term identification unit 320 identifies whether each morpheme separated by the morpheme analysis unit 310 corresponds to a substitute target term.
  • In one embodiment, the term identification unit 320 may identify webtoon terms (substitute target terms) contained in the text based on whether a morpheme matches a webtoon term stored in a webtoon term database (substitute target term database).
  • The webtoon term database (substitute target term database) may be built in advance.
  • In another embodiment, the term identification unit 320 may identify webtoon terms (substitute target terms) contained in the text based on whether a morpheme is included among the machine translation terms (translation terms) stored in a machine translation term database (translation term database). For example, when a separated morpheme is not included among the machine translation terms, the term identification unit 320 may identify it as a webtoon term (substitute target term).
  • The substitute term search unit 330 searches the substitute term database 140 for a substitute term based on whether the terms match. For example, the substitute term search unit 330 may compare the substitute target term with the terms stored in the substitute term database 140 and retrieve a substitute term whose match rate is equal to or greater than a preset value. In one embodiment, the substitute term may be a standard word.
  • For example, for 'ni', the substitute term search unit 330 may retrieve 'you' as the substitute term.
  • When a plurality of substitute terms are found, the substitute term search unit 330 may determine one of them based on the part of speech of the substitute target term or on its relationship with the morphemes surrounding it. For example, when there are several substitute terms corresponding to 'ni', the substitute term search unit 330 may refer to the combination with the neighboring morpheme 's' (e.g., indicating plurality) and determine 'you' as the substitute term.
  • The substitute term search unit 330 may search for a substitute term for each substitute target term.
  • The text generation unit 340 replaces each substitute target term with the substitute term found by the substitute term search unit 330, generating text in which webtoon terms are replaced with standard language.
  • The preprocessing engine 212 may output the original webtoon text together with the text generated by the text generation unit 340, in which webtoon terms have been replaced with standard language.
  • The syntax analysis unit 350 analyzes the syntax of the input text to estimate the meaning of a term for which no substitute term exists.
  • The syntax analysis unit 350 analyzes the morphemes separated by the morpheme analysis unit 310 according to grammar, generating a syntax tree from which it estimates the meanings of terms.
  • FIG. 4 is a block diagram illustrating a parsing process.
  • The syntax analysis unit 350 may analyze the separated morphemes in parallel according to grammar to generate a syntax tree. For example, when the webtoon text shown in FIG. 4(a) is input, the syntax analysis unit 350 generates the syntax tree shown in FIG. 4(b).
  • According to Korean grammar (for example, the form of the sentence, the rule that the subject comes at the beginning of a sentence and the rule that the verb comes at the end), the webtoon text of FIG. 4(a) can be classified into a first object 'ex generation', a second object 'fantage', a verb 'morsam' and a modifier.
  • The syntax analysis unit 350 may generate the syntax tree of FIG. 4(b) by adding the subject 'you', indicating a third party.
  • The syntax tree of FIG. 4(b) is an example in which the verb is placed at the upper node and the subject, objects, modifier and so on are placed at the same lower level.
  • The syntax analysis unit 350 estimates the meaning of a term based on the generated syntax tree. For example, 'morsam' is a verb located at the end of the sentence, immediately before the question mark (?), so it can be inferred that the verb is used in question form. In addition, since 'morsam' is most similar to 'mordan', the syntax analysis unit 350 may infer that 'morsam' is the question form of 'mordan'.
  • The substitute term search unit 330 searches the substitute term database 140 for a substitute term corresponding to the meaning estimated from the syntax analysis result of the syntax analysis unit 350.
  • For example, the substitute term search unit 330 may retrieve 'don't know' or 'morni', which correspond to the question form of 'not know'.
  • The text generation unit 340 then replaces the substitute target terms with the substitute terms found in this way, generating text in which the webtoon terms are replaced with standard language.
  • The substitute term search unit 330 finds, for the term 'X', the substitute terms 'X' and 'fantasy', and the substitute terms 'Fantasy', …
  • Based on the substitute terms found, the text generation unit 340 generates the text "Do you not know X-Gen Fantasy? Hahahaha."
  • The preprocessing engine 212 may further include a substitute term register (not shown) that stores, in the substitute term database 140, the substitute target term in association with the substitute term corresponding to its estimated meaning.
  • The preprocessing engine 212 may also receive a substitute term for the term from an operator.
  • FIG. 5 is a flowchart illustrating a text preprocessing method performed in the text preprocessing system of FIG. 1.
  • The text preprocessing server 120 preprocesses input text before it is translated into text of another language.
  • The text preprocessing server 120 may output the preprocessed text in the same language as the input text.
  • The preprocessing engine 212 receives the text to be translated (step S510).
  • The preprocessing engine 212 may receive webtoon text from a text recognition engine that recognizes text contained in images, or may receive webtoon text that a person has read and transcribed.
  • The preprocessing engine 212 separates the input text into morphemes and determines the part of speech of each separated morpheme (step S520). The preprocessing engine 212 then identifies whether the separated morphemes correspond to substitute target terms (step S530).
  • In one embodiment, the preprocessing engine 212 may identify substitute target terms based on whether they are stored in the substitute target term database. In another embodiment, the preprocessing engine 212 may identify substitute target terms based on whether each term is included in the translation term database.
  • The preprocessing engine 212 determines whether the identified substitute target term matches a substitute term stored in the substitute term database 140 (step S540) and searches for the substitute term corresponding to the substitute target term.
  • When comparing the substitute target term with the terms stored in the substitute term database 140 yields a match rate equal to or greater than a preset value, the preprocessing engine 212 generates text in which the substitute target term is replaced with the substitute term (step S570).
  • Otherwise, when no substitute term exists for the term, the preprocessing engine 212 analyzes the syntax of the input text to estimate the meaning of the term (step S550). In one embodiment, the preprocessing engine 212 may parse the input text to generate a syntax tree.
  • The preprocessing engine 212 searches for a substitute term corresponding to the estimated meaning based on the parsing result (step S560) and generates text in which the substitute target term is replaced with that substitute term (step S570).
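
The two term-identification strategies described above can be sketched in Python. The first uses a dedicated substitute target term database. The second flags a morpheme when the machine translation term database lacks it. This is a minimal illustration, not the patent's implementation. The function names, the modeling of a morpheme as a (form, tag) pair and the database contents are all hypothetical. The romanized forms come from the 'ni'/'school' example above.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical model: a morpheme is (surface form, Sejong-style POS tag).
Morpheme = Tuple[str, str]


def identify_by_target_db(morphemes: Iterable[Morpheme], target_db: Set[str]) -> List[str]:
    """Strategy 1: a morpheme is a substitute target term if the
    substitute target term (webtoon term) database contains it."""
    return [form for form, _tag in morphemes if form in target_db]


def identify_by_translation_db(morphemes: Iterable[Morpheme], translation_db: Set[str]) -> List[str]:
    """Strategy 2: a morpheme is a substitute target term if the machine
    translation term database does NOT contain it."""
    return [form for form, _tag in morphemes if form not in translation_db]


# Romanized example sentence from the description.
SENTENCE = [("ni", "NP"), ("s", "XSN"), ("where", "NP"), ("school", "NNG"), ("ya", "JKV")]

# Hypothetical database contents, for illustration only.
WEBTOON_TERMS = {"ni", "morsam"}
TRANSLATION_TERMS = {"s", "where", "school", "ya"}

print(identify_by_target_db(SENTENCE, WEBTOON_TERMS))           # ['ni']
print(identify_by_translation_db(SENTENCE, TRANSLATION_TERMS))  # ['ni']
```

Strategy 2 flags every out-of-vocabulary morpheme. A curated webtoon term list is therefore likely to be more selective, at the cost of maintaining it in advance.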
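
The match-rate search performed by the substitute term search unit 330 (step S540) compares a target term with the database entries. It accepts the best entry only if its match rate reaches a preset value. The patent does not define how the match rate is computed. The sketch below assumes Python's `difflib.SequenceMatcher.ratio()` as a stand-in measure with a hypothetical threshold of 0.8, and the database contents are illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def match_rate(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in for the
    unspecified match rate."""
    return SequenceMatcher(None, a, b).ratio()


def search_substitute(target: str, substitute_db: Dict[str, str],
                      threshold: float = 0.8) -> Optional[str]:
    """Return the substitute stored under the best-matching key, or None if
    no key reaches the (hypothetical) preset threshold."""
    best_key = max(substitute_db, key=lambda key: match_rate(target, key), default=None)
    if best_key is None or match_rate(target, best_key) < threshold:
        return None  # no sufficiently close entry: fall back to syntax analysis
    return substitute_db[best_key]


# Hypothetical substitute term database: webtoon term -> standard term.
SUBSTITUTES = {"ni": "you", "mordan": "not know"}

print(search_substitute("ni", SUBSTITUTES))      # you
print(search_substitute("morsam", SUBSTITUTES))  # None
```

The second call is the case that leads to step S550. 'morsam' resembles 'mordan' (ratio about 0.67), but not closely enough under the assumed threshold.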
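
When several substitute terms exist for one target term, the description selects one by looking at the surrounding morphemes, as in the 'ni' + 's' example. A minimal rule-based sketch follows. The candidate format pairs a substitute with an optional required tag for the following morpheme. That format and the 'your'/'you' candidates are hypothetical illustrations, not data from the patent.

```python
from typing import List, Optional, Sequence, Tuple

Morpheme = Tuple[str, str]             # (surface form, POS tag)
Candidate = Tuple[str, Optional[str]]  # (substitute, required tag of next morpheme, or None)


def choose_substitute(morphemes: Sequence[Morpheme], index: int,
                      candidates: List[Candidate]) -> Optional[str]:
    """Pick one substitute for morphemes[index] using the next morpheme's tag."""
    next_tag = morphemes[index + 1][1] if index + 1 < len(morphemes) else None
    # Prefer a candidate whose context condition the neighbor satisfies.
    for substitute, required_next in candidates:
        if required_next is not None and required_next == next_tag:
            return substitute
    # Otherwise fall back to the first unconditional candidate.
    for substitute, required_next in candidates:
        if required_next is None:
            return substitute
    return None


# Hypothetical candidates for 'ni': "you" when followed by the plural suffix (XSN).
NI_CANDIDATES = [("your", None), ("you", "XSN")]

print(choose_substitute([("ni", "NP"), ("s", "XSN"), ("where", "NP")], 0, NI_CANDIDATES))  # you
print(choose_substitute([("ni", "NP"), ("school", "NNG")], 0, NI_CANDIDATES))             # your
```

A production system would probably weigh richer context, such as the target's own part of speech or several neighbors. The sketch uses a single following-tag rule only to stay small.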
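
The syntax-based fallback of FIG. 4 can be illustrated with a toy tree builder and meaning estimator. The sketch makes the following assumptions:

- Words arrive already labeled with grammatical roles. The patent derives the roles from Korean grammar rules, which are not reproduced here.
- The example omits the modifier of FIG. 4(a).
- `difflib` similarity is an assumed proxy for "most similar" when matching 'morsam' to a known lemma.

The code builds the flat tree of FIG. 4(b) with the verb at the root and inserts the implied subject 'you'. All function names and the lemma list are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple

Word = Tuple[str, str]  # (surface form, grammatical role)


@dataclass
class Node:
    form: str
    role: str
    children: List["Node"] = field(default_factory=list)


def _root_index(words: Sequence[Word]) -> int:
    """Index of the last verb (the verb is expected at the end of the sentence)."""
    verbs = [i for i, (_, role) in enumerate(words) if role == "verb"]
    if not verbs:
        raise ValueError("no verb to place at the root node")
    return verbs[-1]


def build_tree(words: Sequence[Word]) -> Node:
    """Verb at the upper node; subject, objects and modifiers at one lower level."""
    root = Node(*words[_root_index(words)])
    root.children = [Node(form, role) for form, role in words
                     if role not in ("verb", "punct")]
    if not any(child.role == "subject" for child in root.children):
        root.children.insert(0, Node("you", "subject"))  # implied subject, as in FIG. 4(b)
    return root


def estimate_meaning(words: Sequence[Word], known_lemmas: Sequence[str],
                     min_similarity: float = 0.6) -> Optional[Tuple[str, str]]:
    """Return (closest known lemma, 'question' or 'statement') for the root verb."""
    idx = _root_index(words)
    verb = build_tree(words).form
    # A verb directly before '?' is read as a question form.
    is_question = idx + 1 < len(words) and words[idx + 1][0] == "?"

    def similarity(lemma: str) -> float:
        return SequenceMatcher(None, verb, lemma).ratio()

    best = max(known_lemmas, key=similarity, default=None)
    if best is None or similarity(best) < min_similarity:
        return None
    return best, ("question" if is_question else "statement")


FIG4A = [("ex generation", "object"), ("fantage", "object"), ("morsam", "verb"), ("?", "punct")]

tree = build_tree(FIG4A)
print(tree.form, [c.form for c in tree.children])  # morsam ['you', 'ex generation', 'fantage']
print(estimate_meaning(FIG4A, ["mordan", "hada"]))  # ('mordan', 'question')
```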
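
Steps S530 to S570, together with the substitute term register, can be combined into one loop. This is a minimal sketch that rests on several assumptions:

- Morpheme analysis (steps S510 and S520) has already been done.
- Target terms are identified with the substitute target term database strategy.
- The match rate is `difflib.SequenceMatcher.ratio()` with a hypothetical 0.8 threshold.
- The syntax-based estimator is passed in as a callable. A stub is used here.
- Morphemes are re-joined with spaces, which real Korean text generation would not do.

The function returns the original and the preprocessed text together, which mirrors the output of the preprocessing engine 212 described above. `preprocess` and `question_stub` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Set, Tuple

Morpheme = Tuple[str, str]  # (surface form, POS tag)
Estimator = Callable[[List[Morpheme], int], Optional[str]]


def preprocess(morphemes: List[Morpheme], target_db: Set[str],
               substitute_db: Dict[str, str], estimate: Estimator,
               threshold: float = 0.8) -> Tuple[str, str]:
    """Return (original text, preprocessed text); may register new substitutes."""
    original = " ".join(form for form, _ in morphemes)
    output: List[str] = []
    for i, (form, _tag) in enumerate(morphemes):
        if form not in target_db:                  # S530: not a substitute target term
            output.append(form)
            continue

        def rate(key: str) -> float:
            return SequenceMatcher(None, form, key).ratio()

        best = max(substitute_db, key=rate, default=None)  # S540: database search
        if best is not None and rate(best) >= threshold:
            output.append(substitute_db[best])     # S570: replace with the match
            continue
        estimated = estimate(morphemes, i)         # S550-S560: syntax-based estimate
        if estimated is None:
            output.append(form)                    # nothing found: keep the term
        else:
            substitute_db[form] = estimated        # substitute term register
            output.append(estimated)               # S570
    return original, " ".join(output)


def question_stub(morphemes: List[Morpheme], i: int) -> Optional[str]:
    """Hypothetical estimator: a term directly before '?' is read as 'morni'."""
    if i + 1 < len(morphemes) and morphemes[i + 1][0] == "?":
        return "morni"
    return None


TARGETS = {"ni", "morsam"}  # hypothetical substitute target term database
db = {"ni": "you"}          # hypothetical substitute term database

print(preprocess([("ni", "NP"), ("s", "XSN"), ("school", "NNG")], TARGETS, db, question_stub))
# ('ni s school', 'you s school')
print(preprocess([("morsam", "VV"), ("?", "SF")], TARGETS, db, question_stub))
# ('morsam ?', 'morni ?')
print(db)  # {'ni': 'you', 'morsam': 'morni'}
```

Because the register writes estimated terms back into the database, a later occurrence of 'morsam' is resolved directly in step S540 and skips the syntax analysis.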

Abstract

The invention relates to a text preprocessing method capable of increasing the accuracy of machine translation through a preprocessing process that replaces words included in a text to be translated with standard words, and to a preprocessing system for implementing the method. The text preprocessing system comprises: an alternative word database that stores alternative words; and a processor for running a preprocessing engine that preprocesses input text and then generates text in the same language as the input text, the preprocessing engine identifying a word to be replaced in the input text and generating text in which the identified word is replaced with an alternative word.
PCT/KR2016/006576 2015-06-22 2016-06-21 Text preprocessing method and preprocessing system for performing the same WO2016208941A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201680001271.6A CN107148624A (zh) 2015-06-22 2016-06-21 Method for preprocessing text and preprocessing system for performing the method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2015-0088695 2015-06-22
KR1020150088695A KR101664258B1 (ko) 2015-06-22 2015-06-22 Text preprocessing method and preprocessing system for performing the same

Publications (1)

Publication Number Publication Date
WO2016208941A1 true WO2016208941A1 (fr) 2016-12-29

Family

ID=57162178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2016/006576 WO2016208941A1 (fr) 2015-06-22 2016-06-21 Text preprocessing method and preprocessing system for performing the same

Country Status (3)

Country Link
KR (1) KR101664258B1 (fr)
CN (1) CN107148624A (fr)
WO (1) WO2016208941A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019147804A1 (fr) * 2018-01-26 2019-08-01 Ge Inspection Technologies, Lp Generation of natural language recommendations based on an artificial language model

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
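CN107038160A (zh) * 2017-03-30 2017-08-11 唐亮 Preprocessing module of a multilingual intelligent-preprocessing real-time statistical machine translation system
KR102516364B1 2018-02-12 2023-03-31 삼성전자주식회사 Machine translation method and apparatus
KR102041935B1 (ko) * 2018-07-18 2019-12-02 주식회사 토리웍스 Method for providing a webtoon-style dictionary service
CN112597779A (zh) * 2020-12-24 2021-04-02 语联网(武汉)信息技术有限公司 Document translation method and apparatus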

Citations (5)

Publication number Priority date Publication date Assignee Title
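KR100322743B1 (ko) * 1999-09-28 2002-02-07 윤종용 Morphological analysis method and apparatus used in the document analyzer of a speech synthesizer
KR100837358B1 (ko) * 2006-08-25 2008-06-12 한국전자통신연구원 Domain-adaptive portable apparatus and method for machine translation of broadcast subtitles using dynamic translation resources
KR100911372B1 (ko) * 2006-12-05 2009-08-10 한국전자통신연구원 Apparatus and method for autonomously learning translation relationships between words and phrases in a statistical machine translation system
US20100088085A1 (en) * 2008-10-02 2010-04-08 Jae-Hun Jeon Statistical machine translation apparatus and method
KR20120035077A (ko) * 2010-10-04 2012-04-13 한국전자통신연구원 Hybrid automatic translation method and apparatus for performing the same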

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US20040030540A1 (en) * 2002-08-07 2004-02-12 Joel Ovil Method and apparatus for language processing
US7319949B2 (en) 2003-05-27 2008-01-15 Microsoft Corporation Unilingual translator
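KR100961717B1 (ko) 2008-09-16 2010-06-10 한국전자통신연구원 Method and apparatus for detecting machine translation errors using a parallel corpus
KR101777421B1 (ko) * 2010-04-06 2017-09-11 삼성전자주식회사 Machine translation system and method based on syntactic analysis and a hierarchical phrase model
KR20120122894A (ko) * 2011-04-30 2012-11-07 삼성전자주식회사 Revenue distribution method and revenue distribution system using the same
KR20130047471A (ko) * 2011-10-31 2013-05-08 한국전자통신연구원 Method for building paraphrasing data for an automatic translation system
CN103914444B (zh) * 2012-12-29 2018-07-24 高德软件有限公司 Error correction method and apparatus
CN104484374B (zh) * 2014-12-08 2018-11-16 百度在线网络技术(北京)有限公司 Method and apparatus for creating online encyclopedia entries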

Also Published As

Publication number Publication date
KR101664258B1 (ko) 2016-10-11
CN107148624A (zh) 2017-09-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16814654

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16814654

Country of ref document: EP

Kind code of ref document: A1