CN109522553A - Method and device for recognizing named entities - Google Patents
Method and device for recognizing named entities
- Publication number
- CN109522553A (application CN201811332914.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- vector
- name entity
- character image
- name
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention discloses a method and device for recognizing named entities. The method comprises: performing information extraction on a text image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image; concatenating the font vector with the text vector corresponding to the text, and obtaining a feature vector from the resulting concatenated vector; obtaining a named entity set from the feature vector, wherein the named entity set includes multiple named entities; and constructing a posed question corresponding to the text image and locating, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set. The invention solves the technical problem in the related art that information recognized from some documents by traditional information extraction is not usable information.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a method and device for recognizing named entities.
Background
Traditional nationally certified certificates, including CET-4 and CET-6 certificates, graduation diplomas, and degree certificates, all have a fixed layout, fixed positions, and specific content. For certificate recognition it is therefore enough to extract the text at the relevant positions: the text can be matched directly to the corresponding information, so recognition yields usable results.
As the state has relaxed its requirements on certificate form and content, universities and research institutions have begun designing their own distinctive certificates, especially graduation diplomas and degree certificates. Different schools use different forms and content, and even different certificates from the same school differ in content and form. This creates a problem for traditional certificate recognition: even after the text on a certificate has been extracted, it still cannot be matched to information. In other words, the text is recognized, but the result is not usable information.
No effective solution has yet been proposed for the above problem in the related art, namely that information recognized from some documents by traditional information extraction is not usable information.
Summary of the invention
Embodiments of the present invention provide a method and device for recognizing named entities, to at least solve the technical problem in the related art that information recognized from some documents by traditional information extraction is not usable information.
According to one aspect of the embodiments of the present invention, a method for recognizing named entities is provided, comprising: performing information extraction on a text image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image; concatenating the font vector with the text vector corresponding to the text, and obtaining a feature vector from the resulting concatenated vector; obtaining a named entity set from the feature vector, wherein the named entity set includes multiple named entities; and constructing a posed question corresponding to the text image and locating, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
Optionally, the font vector is an N*1-dimensional vector and the text vector is an M*1-dimensional vector, where N is the number of font attributes of the text corresponding to the font vector and M is the number of word attributes of the text in the text vector.
Optionally, concatenating the font vector with the text vector corresponding to the text and obtaining the feature vector from the resulting concatenated vector comprises: concatenating the N*1-dimensional font vector with the M*1-dimensional text vector to obtain an (N+M)*1-dimensional concatenated vector; using the (N+M)*1-dimensional concatenated vector as the input of a bidirectional long short-term memory (Bi-LSTM) network model; obtaining the output of the Bi-LSTM model; and obtaining the feature vector from the output, wherein the feature vector is a 2(N+M)*1-dimensional vector.
Optionally, obtaining the named entity set from the feature vector comprises: using the feature vector as the input of a conditional random field (CRF) model; obtaining the output of the CRF model; and obtaining the named entity set from the output of the CRF model.
Optionally, constructing the posed question corresponding to the text image comprises: extracting key information from the text corresponding to the text image, wherein the key information consists of feature words associated with the named entity; and using the key information as the posed question.
Optionally, locating the named entity to be obtained based on the posed question comprises: determining, by a matching neural network model, the identifier of the text segment corresponding to the posed question, wherein the matching neural network model is obtained by machine learning training on multiple groups of data, and each group of data includes a posed question and the identifier of the text segment corresponding to that question; and extracting the named entity to be obtained according to the identifier of the text segment.
Optionally, before the named entity to be obtained is located based on the posed question, the method further comprises: identifying the text corresponding to the text image to obtain multiple text segments; and adding identifiers to the multiple text segments based on a predetermined rule; wherein identifying the text corresponding to the text image to obtain multiple text segments comprises: identifying predetermined punctuation marks in the text; and identifying the text corresponding to the text image according to the predetermined punctuation marks to obtain the multiple text segments.
According to another aspect of the embodiments of the present invention, a device for recognizing named entities is also provided, comprising: an extraction unit, configured to perform information extraction on a text image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image; a first acquisition unit, configured to concatenate the font vector with the text vector corresponding to the text and to obtain a feature vector from the resulting concatenated vector; a second acquisition unit, configured to obtain a named entity set from the feature vector, wherein the named entity set includes multiple named entities; and a third acquisition unit, configured to construct a posed question corresponding to the text image and to locate, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
Optionally, the font vector is an N*1-dimensional vector and the text vector is an M*1-dimensional vector, where N is the number of font attributes of the text corresponding to the font vector and M is the number of word attributes of the text in the text vector.
Optionally, the first acquisition unit comprises: a splicing module, configured to concatenate the N*1-dimensional font vector with the M*1-dimensional text vector to obtain an (N+M)*1-dimensional concatenated vector; a first determining module, configured to use the (N+M)*1-dimensional concatenated vector as the input of a bidirectional long short-term memory (Bi-LSTM) network model; a first obtaining module, configured to obtain the output of the Bi-LSTM model; and a second obtaining module, configured to obtain the feature vector from the output, wherein the feature vector is a 2(N+M)*1-dimensional vector.
Optionally, the second acquisition unit comprises: a second determining module, configured to use the feature vector as the input of a conditional random field (CRF) model; a third obtaining module, configured to obtain the output of the CRF model; and a fourth obtaining module, configured to obtain the named entity set from the output of the CRF model.
Optionally, the third acquisition unit comprises: a key information extraction module, configured to extract key information from the text corresponding to the text image, wherein the key information consists of feature words associated with the named entity; and a third determining module, configured to use the key information as the posed question.
Optionally, the third acquisition unit comprises: a fourth determining module, configured to determine, by a matching neural network model, the identifier of the text segment corresponding to the posed question, wherein the matching neural network model is obtained by machine learning training on multiple groups of data, and each group of data includes a posed question and the identifier of the text segment corresponding to that question; and an extraction module, configured to extract the named entity to be obtained according to the identifier of the text segment.
Optionally, the device for recognizing named entities further comprises: a fourth acquisition unit, configured to identify the text corresponding to the text image to obtain multiple text segments before the named entity to be obtained is located based on the posed question; and an adding unit, configured to add identifiers to the multiple text segments based on a predetermined rule; wherein the fourth acquisition unit comprises: an identification module, configured to identify predetermined punctuation marks in the text; and a fifth obtaining module, configured to identify the text corresponding to the text image according to the predetermined punctuation marks to obtain the multiple text segments.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, wherein the program performs the method for recognizing named entities according to any one of the above.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is configured to run a program, wherein the program, when run, performs the method for recognizing named entities according to any one of the above.
In the embodiments of the present invention, named entity recognition is performed as follows: information extraction is performed on a text image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image; the font vector is concatenated with the text vector corresponding to the text, and a feature vector is obtained from the resulting concatenated vector; a named entity set, which includes multiple named entities, is obtained from the feature vector; and a posed question corresponding to the text image is constructed and used to locate the named entity to be obtained, which belongs to the named entity set. With the method provided by the embodiments of the present invention, the font vector carrying the extracted font information can be concatenated with the text vector carrying the text information, and the named entity set can be obtained from the concatenated vector. Both the spatial information and the contextual information of the text are thus taken into account, which improves the efficiency of recognizing useful information and in turn solves the technical problem in the related art that information recognized from some documents by traditional information extraction is not usable information.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for recognizing named entities according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the device for recognizing named entities according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than those illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a method embodiment of a method for recognizing named entities is provided. It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order.
Fig. 1 is a flowchart of the method for recognizing named entities according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: perform information extraction on the text image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image.
A convolutional neural network (CNN) is a deep feedforward artificial neural network whose artificial neurons respond to surrounding units, and it is suited to large-scale image processing. It comprises convolutional layers, pooling layers, activation layers, dropout layers, and so on. CNNs come in one-dimensional, two-dimensional, and three-dimensional variants. One-dimensional CNNs are commonly used for sequence data; two-dimensional CNNs are commonly used for recognizing text in images; three-dimensional CNNs are mainly used for medical images and video data.
In the embodiment of the present invention, the CNN model can be used to extract the font information from the text image and to output the font vector corresponding to each character in the image. For example, in CET-4 and CET-6 certificates, graduation diplomas, and degree certificates, the text uses different fonts for different types of content: information such as name, time, and institution differs from ordinary text in typeface, font size, font weight, and so on. Such text usually forms part, or even all, of the key information in the certificate, so the font information of the text needs to be extracted first. CNN models are commonly used to extract spatial information from images; in practice, CNN models of different complexity can be used depending on the application scenario and requirements.
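As a minimal sketch of how a convolutional model can turn a character crop into an N*1 font vector, the following NumPy code applies N small kernels, a ReLU, and global average pooling. The function name `font_vector`, the 16x16 crop size, the kernel count N = 8, and the random untrained kernels are all assumptions made for illustration; the patent does not specify the CNN architecture.

```python
import numpy as np

def font_vector(glyph: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Map one glyph image (H x W) to an N x 1 font vector using N kernels."""
    n, kh, kw = kernels.shape
    h, w = glyph.shape
    out = np.empty(n)
    for k in range(n):
        # "Valid" 2-D cross-correlation of the glyph with kernel k.
        fmap = np.array([
            [np.sum(glyph[r:r + kh, c:c + kw] * kernels[k])
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)
        ])
        # ReLU followed by global average pooling: one scalar per kernel.
        out[k] = np.maximum(fmap, 0.0).mean()
    return out.reshape(n, 1)

rng = np.random.default_rng(0)
glyph = rng.random((16, 16))              # hypothetical 16x16 character crop
kernels = rng.standard_normal((8, 3, 3))  # N = 8 untrained kernels (assumption)
vec = font_vector(glyph, kernels)
print(vec.shape)  # (8, 1)
```

In a trained model the kernels would be learned so that each output component responds to a font property such as weight or size; here they only demonstrate the shape of the computation.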
Step S104: concatenate the font vector with the text vector corresponding to the text, and obtain the feature vector from the resulting concatenated vector.
In step S104, the font information extracted in step S102 can be fed, as part of the text vector, into a Bi-LSTM+CRF model for named entity recognition.
Bi-LSTM, the bidirectional LSTM model, is a variant of the recurrent neural network (RNN). LSTM modifies the memory cell of the basic RNN model by adding an input gate, a forget gate, and an output gate, which enables more effective learning of sequential information. Bi-LSTM adds backward sequence learning on top of the original forward LSTM; at the output stage, the forward and backward vectors are usually concatenated to obtain the final output vector.
The input of the Bi-LSTM is the vector of each word or character. This can be a simple one-hot vector or a pre-trained word vector (Word2vec, GloVe). Because the embodiment of the present invention adds the font information of each character, it uses pre-trained M*1-dimensional character/word vectors and concatenates each with the font information vector to obtain an (N+M)*1-dimensional input vector. The output vector obtained after the Bi-LSTM has dimension 2(N+M)*1; this is the feature vector.
Preferably, the font vector is an N*1-dimensional vector and the text vector is an M*1-dimensional vector, where N is the number of font attributes of the text corresponding to the font vector and M is the number of word attributes of the text in the text vector. Font attributes are attributes that characterize the font, such as the typeface and font size of the text. Word attributes indicate whether the text is a verb, noun, predicate, subject, personal name, place name, and so on.
As an optional embodiment, concatenating the font vector with the text vector corresponding to the text and obtaining the feature vector from the resulting concatenated vector includes: concatenating the N*1-dimensional font vector with the M*1-dimensional text vector to obtain an (N+M)*1-dimensional concatenated vector; using the (N+M)*1-dimensional concatenated vector as the input of the bidirectional long short-term memory (Bi-LSTM) network model; obtaining the output of the Bi-LSTM model; and obtaining the feature vector from the output, wherein the feature vector is a 2(N+M)*1-dimensional vector.
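The dimension bookkeeping in this step can be sketched as follows. To stay short and dependency-free, the code uses a plain tanh recurrence as a stand-in for each LSTM direction, so it is not an LSTM implementation; the hidden size is assumed equal to N+M so that the joined forward and backward states have 2(N+M) components, as the text states. All names, sizes, and random weights are illustrative assumptions.

```python
import numpy as np

def run_direction(inputs, w_in, w_rec):
    """Simple tanh recurrence, a stand-in for one LSTM direction."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(w_in @ x + w_rec @ h)
        states.append(h)
    return states

def bidirectional_features(font_vecs, text_vecs, seed=0):
    # Concatenate each N-dim font vector with its M-dim text vector -> (N+M).
    spliced = [np.concatenate([f, t]) for f, t in zip(font_vecs, text_vecs)]
    d = spliced[0].shape[0]
    rng = np.random.default_rng(seed)
    params = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
    fwd = run_direction(spliced, params[0], params[1])
    # The backward pass reads the sequence in reverse, then is re-aligned.
    bwd = run_direction(spliced[::-1], params[2], params[3])[::-1]
    # Join forward and backward states per character -> 2(N+M).
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

N, M, T = 4, 6, 5  # font attributes, word attributes, characters (assumed)
rng = np.random.default_rng(1)
font_vecs = [rng.random(N) for _ in range(T)]
text_vecs = [rng.random(M) for _ in range(T)]
feats = bidirectional_features(font_vecs, text_vecs)
print(len(feats), feats[0].shape)  # 5 (20,)
```

With N = 4 and M = 6, each character ends up with a 20-component feature vector, matching 2(N+M).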
Step S106: obtain the named entity set from the feature vector, wherein the named entity set includes multiple named entities.
A conditional random field (CRF) is a probabilistic undirected graphical model. It models the conditional probability distribution of one set of output random variables given another set of input random variables, and it assumes that the output random variables form a Markov random field. In contrast to the HMM, it is a discriminative model that predicts hidden variables from an observation sequence, and it is commonly used for syntactic analysis, named entity recognition, part-of-speech tagging, and similar tasks. Here the CRF is used as the layer after the Bi-LSTM: its input is the 2(N+M)*1-dimensional feature vectors output by the Bi-LSTM, and its output is the corresponding label sequence, that is, the various named entities.
In step S106, obtaining the named entity set from the feature vector may include: using the feature vector as the input of the conditional random field (CRF) model; obtaining the output of the CRF model; and obtaining the named entity set from the output of the CRF model.
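How a linear-chain CRF turns per-character scores into a label sequence can be sketched with Viterbi decoding. The label set, the emission scores (which in the pipeline above would be derived from the Bi-LSTM feature vectors), and the transition scores below are invented for illustration and are not taken from the patent.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence for a linear-chain CRF."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in label i at t-1, then label j at t.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-NAME", "I-NAME"]          # hypothetical tag set
emissions = np.array([[2.0, 0.5, 0.0],      # one row of label scores per character
                      [0.2, 2.0, 0.1],
                      [0.1, 0.3, 2.0]])
transitions = np.array([[0.0, 0.0, -5.0],   # O -> I-NAME is penalised
                        [0.0, 0.0, 1.0],    # B-NAME -> I-NAME is encouraged
                        [0.0, 0.0, 0.5]])
decoded = [labels[i] for i in viterbi(emissions, transitions)]
print(decoded)  # ['O', 'B-NAME', 'I-NAME']
```

The transition matrix is what lets the CRF enforce consistency between neighbouring labels, which a per-character classifier alone would not do.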
Step S108: construct a posed question corresponding to the text image, and locate, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
In this embodiment, information extraction is performed on the text image using the convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the text image; the font vector is then concatenated with the text vector corresponding to the text, and the feature vector is obtained from the resulting concatenated vector; the named entity set, which includes multiple named entities, is obtained from the feature vector; and a posed question corresponding to the text image is constructed and used to locate the named entity to be obtained, which belongs to the named entity set. In the related art, certificates come in many varieties: certificates issued by different organizations differ in form and content, and even certificates issued by different departments of the same organization at different times differ in content and form. This creates a problem for traditional certificate recognition, in that even after the text of a certificate has been extracted, it still cannot be matched to information. By contrast, with the method for recognizing named entities provided by the embodiments of the present invention, the font vector carrying the extracted font information can be concatenated with the text vector carrying the text information, and the named entity set can be obtained from the concatenated vector. Both the spatial information and the contextual information of the text are thus taken into account, which improves the efficiency of recognizing useful information and in turn solves the technical problem in the related art that information recognized from some documents by traditional information extraction is not usable information.
In step S108, constructing the posed question corresponding to the text image may include: extracting key information from the text corresponding to the text image, wherein the key information consists of feature words associated with the named entity; and using the key information as the posed question. That is, questions are posed about the key information. The purpose of this step is to treat the information to be extracted like a reading comprehension problem: through the posed question, the text segment relevant to the question is found in the original text, which locates the position of the answer.
Taking a graduation diploma as an example, the key information to extract should include: name, graduation time, graduating institution, level of education at graduation, date of birth, length of schooling, and so on. Questions can then be posed accordingly:
A: What is the student's name?
B: What is the student's graduating institution?
……
In addition, in step S108, locating the named entity to be obtained based on the posed question may include: determining, by a matching neural network model, the identifier of the text segment corresponding to the posed question, wherein the matching neural network model is obtained by machine learning training on multiple groups of data, and each group of data includes a posed question and the identifier of the text segment corresponding to that question; and extracting the named entity to be obtained according to the identifier of the text segment.
For example, a model similar to Match-LSTM can be used to understand the text and locate the segment relevant to the question. Certificate text is extremely terse, with each item of content forming one segment and the segments separated by commas. Accordingly, the text segments are numbered in text order, and the final output is the number of the segment relevant to the question.
The training process of the matching neural network model is similar to that of Match-LSTM and is likewise divided into four steps. First, the question and the original text are embedded to generate word vectors. Second, the question and the source text are encoded using a bidirectional LSTM. Third, the attention distribution of each word of the original text over the question is computed and used to summarize the question representation; the representation of the original-text word and the corresponding question representation are then fed into another LSTM layer, which encodes them to obtain a query-aware representation of that word. Fourth, another attention layer is added to obtain the vector representation of the text. Finally, a Softmax layer computes the probability Pi of each word. The optimization objective is to maximize the product of the probabilities of the words in the target segment, that is, [formula not reproduced in the source], where l denotes the loss function, k denotes the number of the text segment, and i denotes the i-th word in the segment. The loss function is mainly used to optimize the parameters of the functions in the network layers of the matching neural network model. It should be noted that because certificate text is relatively short and the named entities are fairly obvious, there is no need to locate a start position. That is, the training process of the matching neural network model is similar to Match-LSTM but differs in the final output: finding the corresponding position is enough to find the target directly, and no start position needs to be located.
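The exact loss formula does not survive in this text, so the following is only one form consistent with the stated objective: it assumes the loss for the target segment k is the negative logarithm of the product of its word probabilities Pi, which is minimized exactly when that product is maximized. The function names and the example probabilities are hypothetical.

```python
import math

def segment_log_prob(word_probs):
    """Log of the product of the per-word probabilities in one segment."""
    return sum(math.log(p) for p in word_probs)

def segment_loss(word_probs):
    # Assumed loss form: negative log of the product, so minimising it
    # maximises the product of the target segment's word probabilities.
    return -segment_log_prob(word_probs)

def best_segment(segments):
    """Return the number k of the segment whose product is largest."""
    return max(segments, key=lambda k: segment_log_prob(segments[k]))

# Hypothetical Softmax outputs, grouped by segment number k.
segments = {1: [0.2, 0.1, 0.3], 2: [0.9, 0.8], 3: [0.5, 0.4, 0.6]}
print(best_segment(segments))  # 2
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is a common reason to prefer a sum of logs over a raw product.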
Here, Embedding is the embedding layer of the network structure, which mainly converts positive integers into vectors of fixed size. The reasons for using an embedding layer are as follows: 1. Vectors encoded with the one-hot method are very high-dimensional and very sparse. Suppose a dictionary of 2000 words is encountered in natural language processing; with one-hot encoding, each word is represented by a vector of 2000 integers, 1999 of which are 0. 2. During training of the neural network, every embedded vector is updated.
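The contrast between one-hot encoding and an embedding lookup can be sketched as follows. This is an illustration only: the embedding size of 8, the random initial values, and the names `one_hot`, `embed`, and `embedding_table` are assumptions made for the example, and no training is performed.

```python
import random

VOCAB_SIZE = 2000   # dictionary size from the example above
EMBED_DIM = 8       # hypothetical embedding size, chosen for illustration

def one_hot(index, size=VOCAB_SIZE):
    """Return a one-hot list: a single 1 at `index`, 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec

# A toy embedding table: one small dense row per word index.
# In a real network these rows are parameters updated during training.
rng = random.Random(0)
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def embed(index):
    """Look up the dense vector for a word index."""
    return embedding_table[index]

word_index = 42
sparse = one_hot(word_index)
dense = embed(word_index)
print(len(sparse), sparse.count(0))  # 2000 1999
print(len(dense))                    # 8
```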
The Softmax function, also known as the normalized exponential function, is, in mathematics and especially in probability theory and related fields, the gradient of the log-normalizer of a finite discrete probability distribution. It "compresses" a K-dimensional vector Z of arbitrary real numbers into another K-dimensional real vector such that every element lies in the range (0, 1) and all elements sum to 1.
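The two properties stated above (each element in (0, 1), all elements summing to 1) can be checked with a short sketch. Subtracting the maximum before exponentiating is a standard numerical-stability step and is not mentioned in the text; the input values are hypothetical.

```python
import math

def softmax(z):
    """softmax(z)_j = exp(z_j) / sum_k exp(z_k) for a list of real numbers."""
    m = max(z)                            # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(all(0.0 < p < 1.0 for p in probs))   # every element lies in (0, 1)
print(abs(sum(probs) - 1.0) < 1e-9)        # the elements sum to 1
```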
As an optional embodiment, before the named entity to be obtained is located based on the posed question, the named entity recognition method may further include: recognizing the text corresponding to the character image to obtain multiple text segments; and adding identifiers to the multiple text segments based on a predetermined rule. Recognizing the text corresponding to the character image to obtain multiple text segments includes: identifying predetermined punctuation marks in the text; and recognizing the text corresponding to the character image according to the predetermined punctuation marks to obtain the multiple text segments.
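The segmentation and numbering step can be sketched as below. The set of punctuation marks, the rule of numbering segments from 1 in text order, the function name `split_into_segments`, and the sample certificate text are all assumptions made for illustration; the text only states that predetermined punctuation marks and a predetermined rule are used.

```python
import re

# Assumed set of predetermined punctuation marks (ASCII and full-width forms).
PUNCTUATION = r"[,，;；。]"

def split_into_segments(text):
    """Split text at the punctuation marks and number the segments from 1."""
    parts = re.split(PUNCTUATION, text)
    segments = [p.strip() for p in parts if p.strip()]
    return {number: seg for number, seg in enumerate(segments, start=1)}

# Hypothetical certificate text.
text = "Certificate of Honor, awarded to Zhang San, issued by Example University"
for number, seg in split_into_segments(text).items():
    print(number, seg)
```

The returned numbers play the role of the segment identifiers that the matching model is later asked to predict.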
In addition, because certificate text is relatively terse and the named entities to be extracted are the target content, locating the text segment in the text content based on the posed question as described above is enough to find the core answer to the corresponding question. That is, the position of the answer targeted by the posed question is located first, and the named entity at that position is then extracted.
The named entity recognition method provided by the embodiment of the present invention can extract the font information of the character image and, in combination with that font information, perform named entity recognition with a Bi-LSTM+CRF model to extract named entities such as times, person names, organization names, and place names from the text; set up a "question" that has the key information as its answer; then use a Bi-LSTM+Attention model to understand the text and predict the sentence relevant to the question; and match the named entity in the relevant sentence as the answer. To address the problem of extracting text information after recognizing certificates whose content varies widely, the method combines the font information of the text with currently popular deep learning methods to perform named entity recognition, so that both the spatial information of the text and its contextual information are taken into account. Text extraction is then converted into a simple reading-comprehension problem of answering "what is", and a model construction method similar to Match-LSTM is proposed that no longer predicts the start point of the answer or the answer words, but instead locates the position of the answer segment after the text has been segmented by punctuation marks. Information is extracted by combining text position with named entity recognition.
Embodiment 2
According to an embodiment of the present invention, a named entity recognition device is further provided. It should be noted that the named entity recognition device of the embodiment of the present invention can be used to execute the named entity recognition method provided by the embodiment of the present invention. The named entity recognition device provided by the embodiment of the present invention is introduced below.
Fig. 2 is a schematic diagram of a named entity recognition device according to an embodiment of the present invention. As shown in Fig. 2, the named entity recognition device may include: an extracting unit 21, a first acquisition unit 23, a second acquisition unit 25, and a third acquisition unit 27. The named entity recognition device is described in detail below.
The extracting unit 21 is configured to perform information extraction on the character image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the character image.
The first acquisition unit 23 is connected to the extracting unit 21 and is configured to splice the font vector with the text vector corresponding to the text and to obtain a feature vector according to the spliced vector obtained by the splicing.
The second acquisition unit 25 is connected to the first acquisition unit 23 and is configured to obtain a named entity set according to the feature vector, wherein the named entity set includes multiple named entities.
The third acquisition unit 27 is connected to the second acquisition unit 25 and is configured to construct a posed question corresponding to the character image and to locate, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
It should be noted that the extracting unit 21 in this embodiment can be used to execute step S102 in the embodiment of the present invention, the first acquisition unit 23 can be used to execute step S104, the second acquisition unit 25 can be used to execute step S106, and the third acquisition unit 27 can be used to execute step S108. The above modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in the above embodiments.
In this embodiment, the extracting unit 21 can be used to perform information extraction on the character image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the character image; the first acquisition unit 23 is then used to splice the font vector with the text vector corresponding to the text and to obtain a feature vector according to the spliced vector; the second acquisition unit 25 is then used to obtain a named entity set according to the feature vector, wherein the named entity set includes multiple named entities; and the third acquisition unit 27 is used to construct a posed question corresponding to the character image and to locate, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set. In the related art, certificates come in many varieties: certificates issued by different organizations differ in form and content, and even certificates issued by the same organization at different times or by different departments are not entirely alike in content and form. This poses a problem for traditional certificate recognition, which cannot match the information even after the certificate text has been extracted. By contrast, the named entity recognition device provided by the embodiment of the present invention splices the font vector of the extracted font information with the text vector of the text information to obtain a spliced vector and obtains the named entity set according to the spliced vector, so that both the spatial information and the contextual information of the text are taken into account. This improves the efficiency of recognizing valid information and thereby solves the technical problem in the related art that the information identified when some documents are processed with traditional information extraction methods is not usable information.
As an optional embodiment, the font vector is an N*1-dimensional vector and the text vector is an M*1-dimensional vector, wherein N denotes the number of font attributes of the text corresponding to the font vector, and M denotes the number of character attributes of the text in the text vector.
As an optional embodiment, the first acquisition unit includes: a splicing module, configured to splice the font vector of dimension N*1 with the text vector of dimension M*1 to obtain a spliced vector of dimension (N+M)*1; a first determining module, configured to use the (N+M)*1-dimensional spliced vector as the input of a bidirectional long short-term memory (Bi-LSTM) network model; a first obtaining module, configured to obtain the output of the Bi-LSTM network model; and a second obtaining module, configured to obtain the feature vector according to the output, wherein the feature vector is a 2(N+M)*1-dimensional vector.
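The dimensions in this embodiment can be traced with a small sketch. It is not an LSTM: a gate-free toy recurrence stands in for each direction purely to show shapes, and the hidden size is assumed to equal N+M so that concatenating the forward and backward states gives 2(N+M) values per character. The values N = 3 and M = 5 and all function names are hypothetical.

```python
import math

def splice(font_vec, text_vec):
    """Concatenate an N-dim font vector and an M-dim text vector."""
    return list(font_vec) + list(text_vec)

def run_direction(inputs, hidden_size):
    """Toy recurrent pass: h_t = tanh(mean(x_t) + h_{t-1}), element-wise.

    Stands in for one direction of an LSTM to show shapes only;
    it has no gates and no trained weights.
    """
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        drive = sum(x) / len(x)
        h = [math.tanh(drive + v) for v in h]
        states.append(h)
    return states

def bidirectional_features(inputs):
    """Concatenate forward and backward states for every position."""
    hidden = len(inputs[0])  # assumption: hidden size equals N + M
    fwd = run_direction(inputs, hidden)
    bwd = run_direction(inputs[::-1], hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# Hypothetical values: N = 3 font attributes, M = 5 text attributes.
font_vecs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
text_vecs = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
spliced = [splice(f, t) for f, t in zip(font_vecs, text_vecs)]
features = bidirectional_features(spliced)
print(len(spliced[0]), len(features[0]))  # 8 16
```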
As an optional embodiment, the second acquisition unit includes: a second determining module, configured to use the feature vector as the input of a conditional random field (CRF) model; a third obtaining module, configured to obtain the output of the CRF model; and a fourth obtaining module, configured to obtain the named entity set according to the output of the CRF model.
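How a CRF layer turns per-character scores into a tag sequence can be sketched with Viterbi decoding, the usual decoding procedure for linear-chain CRFs. The text does not describe the decoding procedure, so this is an assumption, as are the B/I/O tag set and every score below; training of the CRF is not shown.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions: one dict {tag: score} per token.
    transitions: dict {(prev_tag, tag): score}; missing pairs score 0.
    """
    # best[t][tag] = (score of the best path ending in tag at step t, previous tag)
    best = [{tag: (emissions[0][tag], None) for tag in tags}]
    for t in range(1, len(emissions)):
        column = {}
        for tag in tags:
            score, prev = max(
                (best[t - 1][p][0] + transitions.get((p, tag), 0.0), p)
                for p in tags
            )
            column[tag] = (score + emissions[t][tag], prev)
        best.append(column)
    # Trace back from the best final tag.
    tag = max(tags, key=lambda k: best[-1][k][0])
    path = [tag]
    for t in range(len(emissions) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical tag set and scores: B = entity start, I = inside, O = outside.
tags = ["B", "I", "O"]
transitions = {("O", "I"): -10.0}  # an entity cannot continue after O
emissions = [
    {"B": 2.0, "I": 0.0, "O": 1.0},
    {"B": 0.0, "I": 2.0, "O": 1.0},
    {"B": 0.0, "I": 0.5, "O": 2.0},
]
print(viterbi(emissions, transitions, tags))  # ['B', 'I', 'O']
```

The transition scores are what distinguish a CRF from tagging each character independently: an individually attractive tag is rejected when it would follow an incompatible predecessor.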
As an optional embodiment, the third acquisition unit includes: an extracting module, configured to extract the key information of the text corresponding to the character image, wherein the key information is a feature word that has an association relationship with the named entity; and a third determining module, configured to use the key information as the posed question.
As an optional embodiment, the third acquisition unit includes: a fourth determining module, configured to determine, through a matching neural network model, the identifier of the text fragment corresponding to the posed question, wherein the matching neural network model is obtained through machine learning training using multiple groups of data, and each group of data in the multiple groups of data includes: a posed question and the identifier of the text fragment corresponding to that posed question; and an extraction module, configured to extract the named entity to be obtained according to the identifier of the text fragment.
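The final extraction step can be sketched as a lookup followed by a filter. The matching neural network model is not implemented here: its output is replaced by a hard-coded identifier, and matching entities to the located fragment by substring containment is an assumption made for illustration. The segments, entity set, and function name are hypothetical.

```python
def extract_answer(segments, entity_set, segment_id):
    """Return the entities from entity_set that occur in the located fragment."""
    segment_text = segments[segment_id]
    return [entity for entity in entity_set if entity in segment_text]

# Hypothetical numbered text fragments and recognized named entity set.
segments = {
    1: "Certificate of Honor",
    2: "awarded to Zhang San",
    3: "issued by Example University",
}
entity_set = ["Zhang San", "Example University"]

# Stand-in for the identifier the matching model would output
# for a posed question about the issuing organization.
located_id = 3
print(extract_answer(segments, entity_set, located_id))  # ['Example University']
```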
As an optional embodiment, the named entity recognition device further includes: a fourth acquisition unit, configured to recognize the text corresponding to the character image to obtain multiple text segments before the named entity to be obtained is located based on the posed question; and an adding unit, configured to add identifiers to the multiple text segments based on a predetermined rule. The fourth acquisition unit includes: an identification module, configured to identify predetermined punctuation marks in the text; and a fifth obtaining module, configured to recognize the text corresponding to the character image according to the predetermined punctuation marks to obtain the multiple text segments.
The above named entity recognition device includes a processor and a memory. The extracting unit 21, the first acquisition unit 23, the second acquisition unit 25, and the third acquisition unit 27 are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, a posed question corresponding to the character image is constructed and the named entity to be obtained is located based on the posed question, wherein the named entity to be obtained belongs to the named entity set.
The above memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein the program executes the named entity recognition method of any one of the above.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein the named entity recognition method of any one of the above is executed when the program runs.
An embodiment of the present invention further provides a device including a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: performing information extraction on a character image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the character image; splicing the font vector with the text vector corresponding to the text, and obtaining a feature vector according to the spliced vector obtained by the splicing; obtaining a named entity set according to the feature vector, wherein the named entity set includes multiple named entities; and constructing a posed question corresponding to the character image, and locating, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
An embodiment of the present invention further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: performing information extraction on a character image using a convolutional neural network (CNN) model to obtain the font vector corresponding to the text in the character image; splicing the font vector with the text vector corresponding to the text, and obtaining a feature vector according to the spliced vector obtained by the splicing; obtaining a named entity set according to the feature vector, wherein the named entity set includes multiple named entities; and constructing a posed question corresponding to the character image, and locating, based on the posed question, the named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A named entity recognition method, characterized by comprising:
performing information extraction on a character image using a convolutional neural network (CNN) model to obtain a font vector corresponding to the text in the character image;
splicing the font vector with a text vector corresponding to the text, and obtaining a feature vector according to the spliced vector obtained by the splicing;
obtaining a named entity set according to the feature vector, wherein the named entity set includes multiple named entities;
constructing a posed question corresponding to the character image, and locating, based on the posed question, a named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
2. The method according to claim 1, characterized in that the font vector is an N*1-dimensional vector and the text vector is an M*1-dimensional vector, wherein N denotes the number of font attributes of the text corresponding to the font vector, and M denotes the number of character attributes of the text in the text vector.
3. The method according to claim 2, characterized in that splicing the font vector with the text vector corresponding to the text and obtaining the feature vector according to the spliced vector obtained by the splicing comprises:
splicing the font vector of dimension N*1 with the text vector of dimension M*1 to obtain a spliced vector of dimension (N+M)*1;
using the (N+M)*1-dimensional spliced vector as the input of a bidirectional long short-term memory (Bi-LSTM) network model;
obtaining the output of the Bi-LSTM network model;
obtaining the feature vector according to the output, wherein the feature vector is a 2(N+M)*1-dimensional vector.
4. The method according to claim 1, characterized in that obtaining the named entity set according to the feature vector comprises:
using the feature vector as the input of a conditional random field (CRF) model;
obtaining the output of the CRF model;
obtaining the named entity set according to the output of the CRF model.
5. The method according to claim 1, characterized in that constructing the posed question corresponding to the character image comprises:
extracting the key information of the text corresponding to the character image, wherein the key information is a feature word that has an association relationship with the named entity;
using the key information as the posed question.
6. The method according to claim 1, characterized in that locating, based on the posed question, the named entity to be obtained comprises:
determining, through a matching neural network model, the identifier of the text fragment corresponding to the posed question, wherein the matching neural network model is obtained through machine learning training using multiple groups of data, and each group of data in the multiple groups of data includes: a posed question and the identifier of the text fragment corresponding to that posed question;
extracting the named entity to be obtained according to the identifier of the text fragment.
7. The method according to claim 6, characterized in that, before locating, based on the posed question, the named entity to be obtained, the method further comprises:
recognizing the text corresponding to the character image to obtain multiple text segments;
adding identifiers to the multiple text segments based on a predetermined rule;
wherein recognizing the text corresponding to the character image to obtain multiple text segments comprises:
identifying predetermined punctuation marks in the text;
recognizing the text corresponding to the character image according to the predetermined punctuation marks to obtain the multiple text segments.
8. A named entity recognition device, characterized by comprising:
an extracting unit, configured to perform information extraction on a character image using a convolutional neural network (CNN) model to obtain a font vector corresponding to the text in the character image;
a first acquisition unit, configured to splice the font vector with a text vector corresponding to the text and to obtain a feature vector according to the spliced vector obtained by the splicing;
a second acquisition unit, configured to obtain a named entity set according to the feature vector, wherein the named entity set includes multiple named entities;
a third acquisition unit, configured to construct a posed question corresponding to the character image and to locate, based on the posed question, a named entity to be obtained, wherein the named entity to be obtained belongs to the named entity set.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the named entity recognition method according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the named entity recognition method according to any one of claims 1 to 7 is executed when the program runs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811332914.2A CN109522553B (en) | 2018-11-09 | 2018-11-09 | Named entity identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811332914.2A CN109522553B (en) | 2018-11-09 | 2018-11-09 | Named entity identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109522553A (en) | 2019-03-26 |
CN109522553B (en) | 2020-02-11 |
Family
ID=65776277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811332914.2A Active CN109522553B (en) | 2018-11-09 | 2018-11-09 | Named entity identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522553B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101246550A (en) * | 2008-03-11 | 2008-08-20 | 深圳华为通信技术有限公司 | Image character recognition method and device |
CN102314417A (en) * | 2011-09-22 | 2012-01-11 | 西安电子科技大学 | Method for identifying Web named entity based on statistical model |
CN106228157A (en) * | 2016-07-26 | 2016-12-14 | 江苏鸿信系统集成有限公司 | Coloured image word paragraph segmentation based on image recognition technology and recognition methods |
CN107644014A (en) * | 2017-09-25 | 2018-01-30 | 南京安链数据科技有限公司 | A kind of name entity recognition method based on two-way LSTM and CRF |
-
2018
- 2018-11-09 CN CN201811332914.2A patent/CN109522553B/en active Active
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119694B (en) * | 2019-04-24 | 2021-03-12 | 北京百炼智能科技有限公司 | Picture processing method and device and computer readable storage medium |
CN110119694A (en) * | 2019-04-24 | 2019-08-13 | 北京百炼智能科技有限公司 | A kind of image processing method, device and computer readable storage medium |
WO2020232864A1 (en) * | 2019-05-20 | 2020-11-26 | 平安科技(深圳)有限公司 | Data processing method and related apparatus |
CN112069792A (en) * | 2019-05-24 | 2020-12-11 | 阿里巴巴集团控股有限公司 | Named entity identification method, device and equipment |
CN110209721A (en) * | 2019-06-04 | 2019-09-06 | 南方科技大学 | Method and device for calling judgment document, server and storage medium |
CN110334357A (en) * | 2019-07-18 | 2019-10-15 | 北京香侬慧语科技有限责任公司 | A kind of method, apparatus, storage medium and electronic equipment for naming Entity recognition |
CN110348023A (en) * | 2019-07-18 | 2019-10-18 | 北京香侬慧语科技有限责任公司 | A kind of method, apparatus, storage medium and the electronic equipment of Chinese text participle |
CN110348025A (en) * | 2019-07-18 | 2019-10-18 | 北京香侬慧语科技有限责任公司 | A kind of interpretation method based on font, device, storage medium and electronic equipment |
CN110348022A (en) * | 2019-07-18 | 2019-10-18 | 北京香侬慧语科技有限责任公司 | A kind of method, apparatus of similarity analysis, storage medium and electronic equipment |
CN110705272A (en) * | 2019-08-28 | 2020-01-17 | 昆明理工大学 | Named entity identification method for automobile engine fault diagnosis |
US11681875B2 (en) | 2019-09-16 | 2023-06-20 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for image text recognition, apparatus, device and storage medium |
CN110619124A (en) * | 2019-09-19 | 2019-12-27 | 成都数之联科技有限公司 | Named entity identification method and system combining attention mechanism and bidirectional LSTM |
CN110781646A (en) * | 2019-10-15 | 2020-02-11 | 泰康保险集团股份有限公司 | Name standardization method, device, medium and electronic equipment |
CN110781646B (en) * | 2019-10-15 | 2023-08-22 | 泰康保险集团股份有限公司 | Name standardization method, device, medium and electronic equipment |
CN111126069B (en) * | 2019-12-30 | 2022-03-29 | 华南理工大学 | Social media short text named entity identification method based on visual object guidance |
CN111126069A (en) * | 2019-12-30 | 2020-05-08 | 华南理工大学 | Social media short text named entity identification method based on visual object guidance |
WO2021135193A1 (en) * | 2019-12-30 | 2021-07-08 | 华南理工大学 | Visual object guidance-based social media short text named entity identification method |
CN111241839A (en) * | 2020-01-16 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Entity identification method, entity identification device, computer readable storage medium and computer equipment |
CN111241839B (en) * | 2020-01-16 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Entity identification method, entity identification device, computer readable storage medium and computer equipment |
CN113283241A (en) * | 2020-02-20 | 2021-08-20 | 阿里巴巴集团控股有限公司 | Text recognition method and device, electronic equipment and computer readable storage medium |
CN113283241B (en) * | 2020-02-20 | 2022-04-29 | 阿里巴巴集团控股有限公司 | Text recognition method and device, electronic equipment and computer readable storage medium |
CN111488739A (en) * | 2020-03-17 | 2020-08-04 | 天津大学 | Implicit discourse relation identification method based on multi-granularity generated image enhancement representation |
CN111488739B (en) * | 2020-03-17 | 2023-07-18 | 天津大学 | Implicit chapter relation identification method for generating image enhancement representation based on multiple granularities |
CN111767732A (en) * | 2020-06-09 | 2020-10-13 | 上海交通大学 | Document content understanding method and system based on graph attention model |
CN111767732B (en) * | 2020-06-09 | 2024-01-26 | 上海交通大学 | Document content understanding method and system based on graph attention model |
WO2023130688A1 (en) * | 2022-01-05 | 2023-07-13 | 苏州浪潮智能科技有限公司 | Natural language processing method and apparatus, device, and readable storage medium |
CN117252202A (en) * | 2023-11-20 | 2023-12-19 | 江西风向标智能科技有限公司 | Construction method, identification method and system for named entities in high school mathematics topics |
CN117252202B (en) * | 2023-11-20 | 2024-03-19 | 江西风向标智能科技有限公司 | Construction method, identification method and system for named entities in high school mathematics topics |
Also Published As
Publication number | Publication date |
---|---|
CN109522553B (en) | 2020-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522553A (en) | Name recognition methods and the device of entity | |
CN108733837B (en) | Natural language structuring method and device for medical history text | |
CN111078836B (en) | Machine reading understanding method, system and device based on external knowledge enhancement | |
CN110134954B (en) | Named entity recognition method based on Attention mechanism | |
CN110019843A (en) | Knowledge graph processing method and device | |
CN113569001A (en) | Text processing method and device, computer equipment and computer readable storage medium | |
CN110990555B (en) | End-to-end retrieval type dialogue method and system and computer equipment | |
CN112819023A (en) | Sample set acquisition method and device, computer equipment and storage medium | |
CN109214562A (en) | RNN-based power grid scientific research hotspot prediction and pushing method | |
CN112749556B (en) | Multi-language model training method and device, storage medium and electronic equipment | |
CN113761868B (en) | Text processing method, text processing device, electronic equipment and readable storage medium | |
CN115329200A (en) | Teaching resource recommendation method based on knowledge graph and user similarity | |
CN113392179A (en) | Text labeling method and device, electronic equipment and storage medium | |
CN114282528A (en) | Keyword extraction method, device, equipment and storage medium | |
CN112651324A (en) | Method and device for extracting semantic information of video frame and computer equipment | |
CN116561272A (en) | Open domain visual language question-answering method and device, electronic equipment and storage medium | |
CN114239730B (en) | Cross-modal retrieval method based on neighbor ordering relation | |
CN115203388A (en) | Machine reading understanding method and device, computer equipment and storage medium | |
CN114648032A (en) | Training method and device of semantic understanding model and computer equipment | |
CN114510561A (en) | Answer selection method, device, equipment and storage medium | |
CN113342944A (en) | Corpus generalization method, apparatus, device and storage medium | |
CN112132075A (en) | Method and medium for processing image-text content | |
CN113657092B (en) | Method, device, equipment and medium for identifying tag | |
CN112989801B (en) | Sequence labeling method, device and equipment | |
CN114625986A (en) | Method, device and equipment for sorting search results and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | Address after: 519031 office 1316, No. 1, lianao Road, Hengqin new area, Zhuhai, Guangdong; Patentee after: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd.; Address before: 519000 room 417, building 20, creative Valley, Hengqin new area, Xiangzhou, Zhuhai, Guangdong; Patentee before: LONGMA ZHIXIN (ZHUHAI HENGQIN) TECHNOLOGY Co.,Ltd. |
PP01 | Preservation of patent right | Effective date of registration: 20240718; Granted publication date: 20200211 |