WO2011136426A1 - Method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules - Google Patents

Method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules

Info

Publication number
WO2011136426A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity name
terms
term
context
entity
Prior art date
Application number
PCT/KR2010/003079
Other languages
English (en)
Korean (ko)
Inventor
정한민
김평
이승우
이미경
서동민
성원경
Original Assignee
한국과학기술정보연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술정보연구원
Publication of WO2011136426A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 - Indexing; Data structures therefor; Storage structures
    • G06F16/313 - Selection or weighting of terms for indexing

Definitions

  • The present invention relates to a method and system for constructing an entity name dictionary and registering rules by extracting entity names from a context. More particularly, when a text document contains an entity name of a specific classification registered in a pre-built named entity dictionary, the invention extracts the context before or after the entity name (as a string or as a Lexico-Semantic Pattern, LSP) and searches for other terms (including terminology) within the extracted context. Sorting those terms by frequency or alphabetically makes it easy to find entity names belonging to the classification and thus to expand the entity name dictionary. In addition, from the other contexts in which the other terms are used, the contexts excluding the other terms can be registered as rules.
  • the present invention relates to an entity name dictionary construction and rule registration method and system using entity name extraction from a context.
  • A named entity is a noun or numeric expression that has a unique meaning in the document.
  • The semantic categories of entity names can be broadly divided into name expressions such as person names, place names, and institution names; time expressions such as dates and times; and numerical expressions such as amounts or percentages.
  • The rule-based method manually constructs rules for entity name recognition and recognizes entity names using various dictionaries, such as proper noun dictionaries, dictionaries of words that cue entity name recognition, and dictionaries of words that appear in the context of entity names.
  • This method relies heavily on human intuition and requires a lot of time and money, because rules and dictionaries must be changed when it is applied to a new domain.
  • The statistics-based method automatically learns the knowledge necessary for recognizing entity names from training data.
  • The statistics-based method learns rules for recognizing entity names using information obtained from spelling, parts of speech, and morphemes.
  • This method requires a large amount of tagged text for learning, has difficulty reflecting various characteristics such as omission or abbreviation, and requires re-learning when the training data increases.
  • The hybrid method combines rule-based and statistics-based methods to obtain better results. It integrates various kinds of knowledge, such as rules, vocabulary, and dictionaries, into statistics-based models. However, this method inherits the problems of both the rule-based and the statistics-based methods.
  • Therefore, a way is needed to expand the existing entity names, either by adding a new term to an existing entity name or by adding a new entity name for a new term, and to generate new rules accordingly.
  • An object of the present invention, devised to solve the above-described problems, is to make it easier to find entity names belonging to a classification and thus to expand the entity name dictionary: when an entity name of a specific classification registered in a previously built entity name dictionary is included in a text document, the context before or after the entity name (a string or LSP format context) is extracted, and the other terms (including terminology) within the extracted context are sorted in order of frequency or alphabetically.
  • Another object of the present invention is to provide a method and system for constructing an entity name dictionary and registering rules using entity name extraction from a context, in which, from the other contexts that use the other terms, the contexts excluding the other terms can be registered as rules.
  • An entity name dictionary construction method for solving the above-mentioned problems is a method for a system having an entity name dictionary in which one or more entity names and one or more terms corresponding to the entity names are registered, the method comprising: (a) searching whether a term having an entity name registered in the entity name dictionary exists in a text document; (b) if the term exists, extracting the context containing the term from the text document; (c) re-searching the text document for the context obtained by excluding the term from the context containing the term; (d) separating, sorting, and displaying the other terms that exist at the term position in the re-searched context from which the term is excluded; (e) adding the other terms to the entity name, or adding and setting a second entity name for the other terms; and (f) registering the entity name or the second entity name in the entity name dictionary.
  • In step (e), one or more of the other terms may be selected by a user and added to the entity name.
  • The second entity name may be input by a user and set.
  • A rule registration method using entity name extraction for solving the above-described problem is a method for a system having an entity name dictionary in which at least one entity name and one or more terms corresponding to the entity name are registered, the method comprising: (a) searching whether a term having an entity name registered in the entity name dictionary exists in a text document; (b) if the term exists, extracting a context containing the term from the text document; (c) re-searching the text document for the context obtained by excluding the term from the context containing the term; (d) separating, sorting, and displaying the other terms that exist at the term position in the re-searched context from which the term is excluded; (e) extracting from the text document the contexts containing the other terms that exist at the term position in the re-searched context from which the term is excluded; (f) receiving a selection of the other terms from the user, adding them to the entity name, and then registering the other contexts, from which the other terms are excluded, as rules.
  • Step (f) registers the contexts selected by the user among the other contexts as positive rules.
  • Step (f) registers the contexts not selected by the user among the other contexts as negative rules.
  • Among the other contexts, the contexts for which a second entity name is set by user input are registered as positive rules of the second entity name.
  • An entity name dictionary construction system for solving the above-described problem includes: a dictionary database for storing an entity name dictionary in which at least one entity name and one or more terms corresponding to the entity name are registered;
  • a display unit for displaying the one or more entity names, or the one or more terms and the entity name dictionary, on a screen;
  • a context extraction unit for extracting a context including the term from a text document when a term having an entity name registered in the entity name dictionary exists in the text document;
  • a context search unit for searching the text document for the context obtained by excluding the term from the context including the term;
  • a term separation unit for separating the other terms existing at the term position in the found context from which the term is excluded;
  • an entity name setting unit for adding the other terms to the entity name, or adding and setting a second entity name for the other terms; and a controller that performs control so that the other terms existing at the term position in the context found through the context search unit, from which the term is excluded, are separated and sorted through the term separation unit and displayed on the display unit, so that a context including those other terms is extracted from the text document, and so that the entity name to which the other terms are added, or the second entity name of the other terms, is registered in the entity name dictionary.
  • The entity name setting unit may receive a user's selection of one or more of the other terms and add them to the entity name.
  • The entity name setting unit may receive the second entity name from a user and add and set it.
  • The controller may receive a user's selection of the other terms, add them to the entity name, and then register the other contexts, from which the other terms are excluded, as rules.
  • The controller may receive the other terms from the user, add and set the second entity name, and then register the other contexts, from which the other terms are excluded, as rules.
  • The controller registers the contexts selected by the user among the other contexts as positive rules.
  • The controller registers the contexts not selected by the user among the other contexts as negative rules.
  • The controller registers, among the other contexts, the contexts for which the second entity name is set by user input as positive rules of the second entity name.
  • According to the present invention, new entity names and rules for recognizing entity names can be extracted and generated from a text document, covering both new terms to be added to an existing entity name and new terms that do not belong to any existing entity name.
  • FIG. 1 is a block diagram showing a schematic configuration of an entity name dictionary construction system according to an embodiment of the present invention.
  • The entity name dictionary construction system 100 includes a dictionary database 110, a display unit 120, a context search unit 130, a term separation unit 140, an entity name setting unit 150, and a controller 160.
  • the dictionary database 110 stores an entity name dictionary in which one or more entity names and one or more terms corresponding to the entity names are registered.
  • The display unit 120 displays one or more entity names, or one or more terms and the entity name dictionary, on the screen.
  • The context search unit 130 extracts the context including the term from the text document, or re-searches the text document for the context obtained by excluding the term from the context in which the term is included.
  • The term separation unit 140 separates the other terms that exist at the term position in the re-searched context from which the term is excluded.
  • The entity name setting unit 150 adds the separated other terms to an existing entity name, or adds and sets a second entity name for the other terms.
  • The entity name setting unit 150 may receive a user's selection of one or more of the above-described other terms and add them to the entity name.
  • The entity name setting unit 150 may receive from a user a second entity name for the above-described other terms and add and set it.
  • The controller 160 performs control so that the other terms existing at the term position in the context re-searched through the context search unit 130, from which the term is excluded, are separated and sorted through the term separation unit 140 and displayed on the display unit 120; so that a context including those other terms is extracted from the text document; and so that the entity name to which the other terms are added through the entity name setting unit 150, or the second entity name that the other terms have, is registered in the entity name dictionary.
  • The controller 160 receives a user's selection of other terms, adds them to the entity name, and then registers the other contexts, from which the other terms are excluded, as rules.
  • The controller 160 receives other terms from the user, adds and sets a second entity name, and then registers the other contexts, from which the other terms are excluded, as rules.
  • The controller 160 registers the contexts selected by the user among the other contexts as positive rules.
  • The controller 160 registers the contexts not selected by the user among the other contexts as negative rules.
  • The controller 160 registers, among the other contexts, the contexts for which the second entity name is set by user input as positive rules of the second entity name.
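As a rough illustration, the entity name dictionary held by the dictionary database 110 can be pictured as a mapping from each entity name to the terms registered under it. The sketch below is only one possible layout under that assumption; the class name `EntityNameDictionary` and its methods are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class EntityNameDictionary:
    """Hypothetical in-memory stand-in for the dictionary database 110."""

    def __init__(self):
        # entity name (e.g. "@group") -> set of terms registered under it
        self._terms = defaultdict(set)

    def register(self, entity_name, term):
        """Register a term under an entity name."""
        self._terms[entity_name].add(term)

    def terms_of(self, entity_name):
        """Return the terms registered under an entity name, sorted."""
        return sorted(self._terms.get(entity_name, set()))

    def entity_names_of(self, term):
        """Return every entity name under which the term is registered."""
        return sorted(n for n, terms in self._terms.items() if term in terms)


dictionary = EntityNameDictionary()
dictionary.register("@group", "Girls' Generation")
```

A set per entity name keeps repeated registrations of the same term harmless, which suits a dictionary that is extended incrementally.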
  • FIG. 2 is a flowchart illustrating a method for constructing an entity name dictionary according to an embodiment of the present invention.
  • The entity name dictionary construction system 100 includes an entity name dictionary in which one or more entity names and one or more terms corresponding to the entity names are registered, as shown in FIG. 3. FIG. 3 is a diagram illustrating an example of an entity name dictionary stored in a dictionary database according to an embodiment of the present invention.
  • The system searches the text document for a term having an entity name (e.g., a group) registered in the entity name dictionary, for example 'Girls' Generation', as shown in (a) of FIG. 4 (S210). FIG. 4 is a diagram illustrating an example of extracting, from a text document, a context that includes a term of an entity name in the entity name dictionary according to an embodiment of the present invention.
  • The entity name dictionary construction system 100 extracts a context including the term 'Girls' Generation' from the text document, as shown in (b) of FIG. 4 (S220).
  • For example, the entity name dictionary construction system 100 extracts from the text document the context "another member of Girls' Generation", consisting of the term 'Girls' Generation' and its subsequent context 'another member of' (the context follows the term in the original Korean word order), as shown in (b) of FIG. 4.
  • The entity name dictionary construction system 100 re-searches the text document for the context obtained by excluding the term 'Girls' Generation' from the context including that term, that is, the context 'another member of' (S230).
  • The entity name dictionary construction system 100 may obtain from the text document one or more other contexts that include the context 'another member of', as shown in FIG. 5. FIG. 5 is a diagram illustrating a result of searching for other contexts including the context from which the term of the entity name is excluded, according to an embodiment of the present invention.
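Steps S210 to S230 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the context is taken to be the few whitespace-separated words that follow the term, the helper names `following_contexts` and `research_context` are hypothetical, and the sample sentences and band names are invented for illustration.

```python
def following_contexts(sentences, term, n_words=3):
    """S210-S220: find the term and extract the words that follow it."""
    contexts = []
    for sentence in sentences:
        start = sentence.find(term)
        if start == -1:
            continue  # the term does not occur in this sentence
        tail = sentence[start + len(term):].split()
        if tail:
            contexts.append(" ".join(tail[:n_words]))
    return contexts


def research_context(sentences, context):
    """S230: re-search the text for the bare context, without the term."""
    return [s for s in sentences if context in s]


# Invented sample text; only "Girls' Generation" is assumed to be registered.
sentences = [
    "Girls' Generation released a new album today",
    "Blue Owls released a new album last week",
    "Red Foxes released a new album in May",
    "The city council met on Monday",
]
contexts = following_contexts(sentences, "Girls' Generation")
hits = research_context(sentences, contexts[0])
```

Here `hits` plays the role of the "other contexts": sentences that share the extracted context but may carry a different term at the term position.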
  • The entity name dictionary construction system 100 separates and sorts the other terms existing at the position of the term 'Girls' Generation' in the re-searched context from which that term is excluded, and displays them as shown in FIG. 6 (S240). FIG. 6 is a diagram illustrating an example of separating the other terms existing at the term position in a context from which a term of an entity name is excluded, sorting them, and displaying them in frequency order, according to an embodiment of the present invention.
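Step S240 amounts to collecting whatever occupies the term position in each matching sentence and ordering the candidates by frequency or alphabetically. The sketch below assumes, for simplicity, that everything before the context in a sentence is the candidate term; the function names and the sample sentences are hypothetical.

```python
from collections import Counter


def terms_at_position(sentences, context, known_term):
    """Count the candidate terms that precede the context in each sentence."""
    counts = Counter()
    for sentence in sentences:
        idx = sentence.find(context)
        if idx <= 0:
            continue  # context absent, or nothing stands before it
        candidate = sentence[:idx].strip()
        if candidate and candidate != known_term:
            counts[candidate] += 1
    return counts


def sort_terms(counts, by="frequency"):
    """Sort candidates by descending frequency (ties alphabetical) or by name."""
    if by == "alphabet":
        return sorted(counts)
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Invented sample text for illustration only.
sentences = [
    "Girls' Generation released a new album today",
    "Blue Owls released a new album last week",
    "Blue Owls released a new single",
    "Red Foxes released a new album in May",
    "Amber Crows released a new video",
]
counts = terms_at_position(sentences, "released a new", "Girls' Generation")
by_frequency = sort_terms(counts)
by_alphabet = sort_terms(counts, by="alphabet")
```

Frequency order puts the most likely members of the classification first, while alphabetical order is easier to scan when the list is long.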
  • The entity name dictionary construction system 100 adds the other terms displayed on the screen as shown in FIG. 6 to the entity name (@group) as shown in FIG. 7, or adds and sets a second entity name that the other terms have (S250). FIG. 7 is a diagram illustrating an example in which other terms found in a text document are added to an entity name, or a second entity name is added and set for them, according to an embodiment of the present invention.
  • The entity name dictionary construction system 100 may set an entity name by receiving a user's selection of one or more of the other terms, as shown in FIG. 6.
  • The entity name dictionary construction system 100 may receive and set a second entity name such as "@programming_element" from the user, as illustrated in FIG. 7.
  • The entity name dictionary construction system 100 registers the entity name or the second entity name in the entity name dictionary (S260).
  • The entity name dictionary construction system 100 may register the terms selected by the user, together with the corresponding entity name, as positive entity names as shown in FIG. 6, and may register the terms not selected by the user, together with the corresponding entity name, as negative entity names.
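Steps S250 and S260 can be sketched as a single registration routine that files each displayed candidate as positive or negative for an entity name, depending on the user's selection. The nested-dictionary layout and the function name `register_terms` are assumptions made for this sketch, and the candidate terms are invented.

```python
def register_terms(dictionary, entity_name, candidates, selected):
    """Register user-selected candidates as positive, the rest as negative.

    dictionary maps entity name -> {"positive": set, "negative": set}.
    A new entry is created when entity_name is a second (new) entity name.
    """
    entry = dictionary.setdefault(
        entity_name, {"positive": set(), "negative": set()}
    )
    for term in candidates:
        bucket = "positive" if term in selected else "negative"
        entry[bucket].add(term)
    return dictionary


# Existing dictionary with one registered term under @group.
dictionary = {"@group": {"positive": {"Girls' Generation"}, "negative": set()}}

# The user selects two of the three displayed candidates for @group ...
register_terms(
    dictionary,
    "@group",
    candidates=["Blue Owls", "Red Foxes", "for loop"],
    selected={"Blue Owls", "Red Foxes"},
)
# ... and enters a second entity name for the remaining one.
register_terms(
    dictionary, "@programming_element", candidates=["for loop"], selected={"for loop"}
)
```

Keeping the rejected candidates as negative entries records the user's decision, so the same term need not be offered again for that entity name.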
  • FIG. 8 is a flowchart illustrating a rule registration method using entity name extraction according to another embodiment of the present invention.
  • The entity name dictionary construction system 100 includes an entity name dictionary in which one or more entity names and one or more terms corresponding to the entity names are registered, as shown in FIG. 3.
  • the entity name dictionary construction system 100 extracts the context including the term from the text document as shown in (b) of FIG. 4 (S820).
  • The entity name dictionary construction system 100 re-searches the text document for the context obtained by excluding the term ('Girls' Generation') from the context including the term, for example the context 'another member of' (S830).
  • the entity name building system 100 may obtain one or more other contexts, including, for example, the context 'another member of' from a text document as shown in FIG.
  • The entity name dictionary construction system 100 separates and sorts the other terms existing at the corresponding term position in the re-searched context from which the term is excluded, and displays them on the screen as shown in FIG. 6 (S840).
  • the entity name dictionary building system 100 extracts the context including the other terms existing at the term position in the context from which the term is re-searched from the text document (S850).
  • the entity name dictionary construction system 100 receives other terms from the user, adds and sets the entity name, and registers other contexts in which the other terms are excluded as rules (S860).
  • That is, with the other terms arranged as shown in FIG. 6, the entity name dictionary construction system 100 registers as rules, as shown in FIG. 9, the other contexts separated from the contexts of the terms selected by the user.
  • The entity name dictionary construction system 100 registers the contexts selected by the user among the other contexts as positive rules, and registers the contexts not selected by the user among the other contexts as negative rules.
  • The entity name dictionary construction system 100 may classify a positive context for the entity name '@group' into string format or LSP (Lexico-Semantic Pattern) format and register it as a positive rule.
  • The entity name dictionary construction system 100 may likewise classify a negative context for the entity name '@group' into string format or LSP format and register it as a negative rule. FIG. 9 is a diagram illustrating an example of registering, as rules, other contexts from which a term is excluded, according to another embodiment of the present invention.
  • the entity name dictionary construction system 100 receives other terms from the user, sets the second entity name, and then registers other contexts in which the other terms are excluded as rules (S870).
  • Among the other contexts, the entity name dictionary construction system 100 may register the contexts for which the second entity name is set by user input as positive rules of the second entity name (for example, the positive rule for @programming_element in string form, as shown in FIG. 9) or as negative rules of the second entity name (for example, a negative rule for @programming_element in LSP format).
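The rule registration in steps S860 and S870 can be sketched as storing each context with its entity name, its polarity, and its format. The `Rule` record and `register_rules` function below are hypothetical, and the `form` field is only a label: the patent does not spell out the LSP syntax here, so the sketch stores the pattern text unchanged and does not attempt to model LSP matching. The sample contexts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    entity_name: str  # e.g. "@group" or a second entity name
    polarity: str     # "positive" or "negative"
    form: str         # "string" or "lsp" (label only; LSP syntax not modelled)
    pattern: str      # the context with the term removed


def register_rules(rules, entity_name, contexts, selected, form="string"):
    """Register selected contexts as positive rules, the rest as negative."""
    for context in contexts:
        polarity = "positive" if context in selected else "negative"
        rules.add(Rule(entity_name, polarity, form, context))
    return rules


rules = set()
# Contexts the user accepted or rejected for the existing entity name.
register_rules(
    rules,
    "@group",
    contexts=["released a new album", "released a new phone"],
    selected={"released a new album"},
)
# A context assigned to a second entity name entered by the user.
register_rules(
    rules,
    "@programming_element",
    contexts=["iterates over the list"],
    selected={"iterates over the list"},
)
```

Storing the polarity with each rule lets a later recognizer treat positive contexts as evidence for an entity name and negative contexts as evidence against it.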
  • The front context or the rear context (string or LSP format context) of the entity name is extracted, and other terms (including terminology) are searched for within the extracted context.
  • the present invention can be applied to a system or service for extracting a term from a text document and setting an entity name.
  • the present invention can be applied to a system or service for providing a semantic web service or a search service.
  • the present invention can be applied to systems and services that extend the entity name dictionary by extracting terms from text documents, setting entity names, and registering them in the entity name dictionary.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules. According to the method and system, when a named entity that is registered in a previously constructed named entity dictionary and classified in a particular category is included in a text document, the context (a character string or an LSP (lexico-semantic pattern) type context) before or after the named entity is extracted, and other terms (including technical terms) within the extracted context are arranged in order of frequency or in alphabetical order. Consequently, named entities belonging to the category are easily found and the named entity dictionary is thereby expanded; and, from other contexts using the other terms, contexts excluding the other terms can be registered as rules. According to the present invention, the method for constructing the named entity dictionary, for a system having a named entity dictionary in which one or more named entities and one or more terms corresponding thereto are registered, comprises: (a) a step of searching to detect whether a term having a named entity registered in the named entity dictionary exists in a text document; (b) a step of extracting a context including the term from the text document when the term exists; (c) a step of re-searching the text document for the context obtained by excluding the term from the context that includes the term; (d) a step of separating other terms existing at the position of the term excluded from the context, and arranging and displaying the other terms; (e) a step of adding the other terms to the named entities, or adding and setting second named entities for the other terms; and (f) a step of registering the named entities or the second named entities in the named entity dictionary.
PCT/KR2010/003079 2010-04-28 2010-05-17 Method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules WO2011136426A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2010-0039254 2010-04-28
KR20100039254 2010-04-28

Publications (1)

Publication Number Publication Date
WO2011136426A1 (fr) 2011-11-03

Family

ID=44861699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2010/003079 WO2011136426A1 (fr) Method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules

Country Status (1)

Country Link
WO (1) WO2011136426A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108604236A (zh) * 2015-10-30 2018-09-28 康维达无线有限责任公司 RESTful operations for the semantic Internet of Things

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
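JPH1011466A (ja) * 1996-06-27 1998-01-16 Toshiba Corp Document creation device and dictionary information acquisition method
JP2005309706A (ja) * 2004-04-21 2005-11-04 Fuji Xerox Co Ltd Information processing system, information processing method, and computer program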

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LEE, KYUNG HEE ET AL.: "Study on named Entity Recognition in Korean Text", JOURNAL OF THE INSTITUTE OF LANGUAGE ENGINEERING, 31 October 2000 (2000-10-31), pages 294 - 296 *

Similar Documents

Publication Publication Date Title
WO2017010652A1 (fr) Method for automatic question answering and device therefor
WO2018016673A1 (fr) Device and method for automatically extracting alternative words, and recording medium for implementing the method
WO2013172534A1 (fr) System and method for managing dialogues
WO2014025135A1 (fr) Method for detecting grammatical errors, corresponding error detection apparatus, and computer-readable recording medium on which the method is recorded
WO2011162446A1 (fr) Module and method for determining a named entity of a term using a named entity dictionary combined with an ontology schema and a mining rule
WO2020085663A1 (fr) Artificial intelligence-based automatic logo generation system and logo generation service method using same
WO2020111314A1 (fr) Concept graph-based question answering apparatus and method
WO2014030834A1 (fr) Method for detecting grammatical errors, error detection device therefor, and computer-readable recording medium on which the method is recorded
WO2013062302A1 (fr) Example-based error detection system for automatic writing evaluation, corresponding method, and corresponding error detection apparatus
WO2018088664A1 (fr) Device for automatically detecting errors in a part-of-speech tagging corpus using rough sets, and associated method
WO2020111395A1 (fr) Device and method for clustering terms of unstructured text data for big data analysis
WO2014142422A1 (fr) Method for processing a dialogue based on a processing instruction expression, and associated apparatus
WO2020111827A1 (fr) Server and method for automatic profile generation
WO2018147543A1 (fr) Concept graph-based question answering system and context search method using same
WO2021107445A1 (fr) Method for providing a newly coined word information service based on a knowledge graph and country-specific transliteration conversion, and associated apparatus
WO2011136426A1 (fr) Method and system for constructing a named entity dictionary by extracting named entities from a context and for registering rules
WO2017138752A1 (fr) Apparatus and method for displaying intonation color
WO2018143490A1 (fr) System for predicting a user's mood using web content, and associated method
CN110008314B (zh) Intent parsing method and device
WO2022177372A1 (fr) System for providing a tutoring service using artificial intelligence, and method therefor
WO2019112223A1 (fr) Electronic document retrieval method and associated server
WO2020242086A1 (fr) Server, method, and computer program for inferring the comparative advantage of multi-knowledge
WO2020111374A1 (fr) System for converting a lecture voice file into text based on lecture-related keywords
WO2016068514A1 (fr) Method and device for analyzing industrial structures for respective products by means of natural language processing
WO2010093101A1 (fr) Method and system for transforming a ticket into ontology-based information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10850793

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10850793

Country of ref document: EP

Kind code of ref document: A1