EA202092862A1 - METHOD AND SYSTEM FOR EXTRACTION OF NAMED ENTITIES

METHOD AND SYSTEM FOR EXTRACTION OF NAMED ENTITIES

Info

Publication number
EA202092862A1
Authority
EA
Eurasian Patent Office
Prior art keywords
named entities
tokens
sequence
vectors
vector representation
Prior art date
Application number
EA202092862A
Other languages
Russian (ru)
Inventor
Антон Александрович ЕМЕЛЬЯНОВ
Original Assignee
Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date
Filing date
Publication date
Application filed by Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Publication of EA202092862A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

The present invention relates generally to the field of computing, and in particular to a method and system for extracting named entities. The technical result is an increase in the accuracy of named entity prediction. This result is achieved by a method for extracting named entities from text information, performed by at least one computing device, comprising the steps of: obtaining text information; splitting the text into words; tokenizing the text to obtain a sequence of tokens; forming, by means of a neural network, a set of vectors for the obtained sequence of tokens; forming, on the basis of the obtained set of vectors, a vector representation of the sequence of tokens; predicting named entities for the vector representation of the sequence of tokens by comparing the values of that representation with predetermined vector values obtained by training the neural network; and recognizing the named entities obtained in the previous step by selecting a word label.
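
The steps listed in the abstract can be illustrated with a small, dependency-free sketch. It is not the patented implementation: the abstract does not specify the tokenizer, the network, or the comparison rule, so everything below is an assumption made for illustration. In particular, `embed` is a hash-based stand-in for the neural network, `PROTOTYPES` stands in for the vector values obtained by training, fixed-length pieces stand in for a real subword tokenizer, and the names `split_words`, `tokenize`, `word_vectors`, and `extract_entities` are hypothetical.

```python
import hashlib
import math
import re

DIM = 8  # size of the toy vectors (assumption, not from the patent)


def split_words(text):
    """Step 2: split raw text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize(words, piece_len=4):
    """Step 3: cut each word into fixed-length pieces, keeping the word index."""
    tokens = []
    for idx, word in enumerate(words):
        for start in range(0, len(word), piece_len):
            tokens.append((idx, word[start:start + piece_len]))
    return tokens


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def embed(piece):
    """Step 4: deterministic stand-in for the neural network (hash -> unit vector)."""
    digest = hashlib.sha256(piece.encode("utf-8")).digest()
    return normalize([b / 255.0 - 0.5 for b in digest[:DIM]])


def word_vectors(tokens, n_words):
    """Step 5: pool token vectors into one unit vector per source word."""
    sums = [[0.0] * DIM for _ in range(n_words)]
    for idx, piece in tokens:
        for k, value in enumerate(embed(piece)):
            sums[idx][k] += value
    return [normalize(vec) for vec in sums]


def vector_for(word):
    """Helper: vector of a single word, used to build the toy prototypes."""
    return word_vectors(tokenize([word]), 1)[0]


# Stand-in for vector values obtained by training (hypothetical labels and words).
PROTOTYPES = {"PER": vector_for("Anton"), "ORG": vector_for("Sberbank")}


def extract_entities(text, threshold=0.999):
    """Steps 6-7: compare each word vector with the prototypes and pick a label."""
    words = split_words(text)
    vectors = word_vectors(tokenize(words), len(words))
    labelled = []
    for word, vec in zip(words, vectors):
        scores = {
            label: sum(a * b for a, b in zip(vec, proto))
            for label, proto in PROTOTYPES.items()
        }
        best = max(scores, key=scores.get)
        # Words that match no prototype closely enough get the "outside" label.
        labelled.append((word, best if scores[best] >= threshold else "O"))
    return labelled


if __name__ == "__main__":
    print(extract_entities("Anton works at Sberbank."))
```

Because the stand-in vectors come from a hash rather than a trained model, only words identical to a prototype word receive an entity label here; a trained network would be expected to place unseen names near the appropriate learned vectors.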

EA202092862A 2020-08-31 2020-12-23 METHOD AND SYSTEM FOR EXTRACTION OF NAMED ENTITIES EA202092862A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
RU2020128709A RU2760637C1 (en) 2020-08-31 2020-08-31 Method and system for retrieving named entities

Publications (1)

Publication Number Publication Date
EA202092862A1 (en) 2022-03-31

Family

ID=79174140

Family Applications (1)

Application Number Title Priority Date Filing Date
EA202092862A EA202092862A1 (en) 2020-08-31 2020-12-23 METHOD AND SYSTEM FOR EXTRACTION OF NAMED ENTITIES

Country Status (3)

Country Link
EA (1) EA202092862A1 (en)
RU (1) RU2760637C1 (en)
WO (1) WO2022045920A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
CN107203511B (en) * 2017-05-27 2020-07-17 中国矿业大学 Network text named entity identification method based on neural network probability disambiguation
CN111310471B (en) * 2020-01-19 2023-03-10 陕西师范大学 Travel named entity identification method based on BBLC model
CN111353310B (en) * 2020-02-28 2023-08-11 腾讯科技(深圳)有限公司 Named entity identification method and device based on artificial intelligence and electronic equipment

Also Published As

Publication number Publication date
WO2022045920A1 (en) 2022-03-03
RU2760637C1 (en) 2021-11-29
