AU2020103654A4 - Method for intelligent construction of place name annotated corpus based on interactive and iterative learning - Google Patents


Info

Publication number
AU2020103654A4
Authority
AU
Australia
Prior art keywords
place name
model
character
sentence
interactive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020103654A
Inventor
Shuhui Chen
Yubing CHEN
Chen Wang
Chunju ZHANG
Xueying ZHANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Normal University
Original Assignee
Nanjing Normal University
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Normal University and Nanjing Tech University
Application granted
Publication of AU2020103654A4
Ceased (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present invention discloses a method for intelligent construction of a place name annotated corpus based on interactive and iterative learning. The method includes: generating a word vector matrix and a disambiguation matrix for each character of a sentence in an initial corpus, splicing the two matrices, and inputting the result, for training, into a model in which Bi-LSTM and CRF are integrated, to generate a place name identification model; embedding the place name identification model into a human-machine interactive place name annotation platform and performing human-machine interactive correction; and merging the initial training linguistic data with the annotated place name linguistic data, optimizing the parameters of the place name identification model, and repeating the iterative training and learning until the constructed corpus meets the requirement, thereby intelligently constructing and optimizing the place name corpus. The present invention can effectively resolve the current problems that place name linguistic data is scarce and slowly updated and that manual construction of such data is time-consuming, laborious, and inefficient, and can effectively implement intelligent updating of the place name annotated corpus for multi-source, dynamic, heterogeneous, and exponentially growing Internet texts.

Description

[Fig. 1]
METHOD FOR INTELLIGENT CONSTRUCTION OF PLACE NAME ANNOTATED CORPUS BASED ON INTERACTIVE AND ITERATIVE LEARNING
FIELD OF TECHNOLOGY
The present invention belongs to the field of geographic information processing
technologies, and specifically relates to a method for intelligent construction of a
place name annotated corpus based on interactive and iterative learning, which fully
optimizes the parameters of a deep learning model, improves place name
identification, and achieves intelligent construction and optimization of the place
name annotated corpus.
BACKGROUND
With the rapid development of the Internet and the advent of big data and artificial intelligence, the world is entering a ubiquitous information society and the era of big data (Chenghu Zhou, 2011; Deren Li, 2012; Goodchild, 2017). Big location data is an important part of big data, and 80% of the world's information is related to location (Williams, 1987; Jingnan Liu, 2014). Place names are the proper names that people assign to specific geographic entities; they are an important part of location information and indispensable information for digital surveying and mapping products. As one of the most commonly used kinds of public social information, place names are the positioning method most readily accepted by ordinary people, and they also provide indispensable basic information resources for national administration, economic construction, and domestic and foreign exchanges.
Texts are a typical source of ubiquitous geographic big data. The volume of text data keeps growing, and the texts cover an increasing number of increasingly complex fields. Chinese text is unstructured, vague, and random, has complex composition, and has no obvious separator between words. Descriptions of place name entities in Chinese texts have the following characteristics: (1) The internal composition of Chinese place name entities is complex and diverse, including both simple place names and a large number of compound place names; that is, several place name entities may overlap, as in "Jiangning Nanjing Jiangsu". (2) Chinese place names and entity names of other categories often contain each other. For example, "Zuchong Road" contains a person's name. (3) The lengths of Chinese place names vary greatly, covering both abbreviations and full names: some Chinese place names contain only one Chinese character, such as "Ying", "Mei", and "Hu", while others contain a dozen or more Chinese characters, such as "Hong Kong Special Administrative Region of the People's Republic of China". (4) A Chinese sentence is a sequence of Chinese characters, and a place name entity is a segment of that sequence; because there is no separator between Chinese characters, the boundaries of place name entities are hard to identify. (5) Compared with common nouns, a Chinese place name entity has no obvious distinguishing features such as capitalization or changes in word form. (6) Chinese linguistic data resources are small in scale and slowly updated. In particular, with the rapid development of Internet+ and big data, a large number of new and unregistered place names have emerged. Together, these factors prevent the identification of Chinese place name entities from meeting the requirements of ubiquitous location information services.
At present, methods for identifying Chinese place names are mainly classified into rule- and dictionary-based methods, statistics-based methods, and hybrid methods that combine the two. Rule- and dictionary-based methods mostly use rule templates constructed manually by linguistic experts. These rules often depend on specific languages, domains, and text styles, are time-consuming to compile, have difficulty covering all language phenomena, port poorly between systems, and are costly. Statistics-based methods set up complex feature templates for different linguistic data, extract features, input the features into a classification model, and thereby convert Chinese place name identification into a sentence sequence tagging problem. These methods have the following disadvantages: (1) They depend heavily on corpora, and there are currently few large-scale general corpora that can be used to construct and evaluate a place name entity identification system. (2) Manually designed features require repeated experiments for modification, adjustment, and selection; the process is time-consuming and laborious and requires researchers to have extensive linguistic knowledge. (3) Sparse data representations lead to an excessively large model parameter space and excessive consumption of computation and storage.
In recent years, deep learning methods have provided new ideas and methods for extracting information from natural language. With deep learning, feature templates no longer need to be formulated manually; instead, the final output is optimized by effectively learning features and contextual representations of the input linguistic data. Deep neural networks commonly used for Chinese named entity recognition include feedforward neural network models, recurrent neural networks (RNN), and the like. Feedforward neural networks generally select input information through fixed-length windows, so for sentences longer than the window, the context of a word outside the window is ignored. A recurrent neural network (RNN) model is a sequence model whose structure contains directed loops; it can make full use of sequence information and has a memory function. The RNN therefore handles short-distance dependencies well, but problems such as vanishing gradients occur when it deals with long-distance dependencies. To overcome these shortcomings, a variety of more complex RNN models have been proposed, such as the bidirectional recurrent neural network (Bi-RNN) and the long short-term memory model (LSTM). Because LSTM can handle long-distance dependencies, it is effective in natural language processing tasks.
Both traditional methods and deep learning-based Chinese place name identification methods rely heavily on corpora: the scale and coverage of the training linguistic data directly affect how well Chinese place names are identified. Existing public place name linguistic data include: (1) the People's Daily annotated corpus, which covers a wide range of content, including finance, military affairs, sports, and entertainment, but in which place name information is sparsely and unevenly distributed; (2) the linguistic data of "Encyclopedia of China and Geography of China" (referred to as geographic encyclopedia linguistic data, http://www.geoip.com.cn:9004/ITIS/corpus.html), special place name linguistic data whose independent intellectual property rights belong to Nanjing Normal University, in which place name entities are described in a standardized way, are evenly distributed, and include rich information on the spatial semantic relationships between place names; and (3) the Microsoft MSRA linguistic data, which better matches the descriptive characteristics of free text but contains relatively few place name entities, sparsely and unevenly distributed. At present, large-scale general corpora that can be used to construct and evaluate place name entity identification are relatively scarce and slowly updated. Manual construction of place name linguistic data is time-consuming, laborious, and inefficient, so model parameters cannot be fully optimized during deep learning training, which degrades place name identification. In addition, in the era of ubiquitous geographic information, the large number of new and unregistered place names in exponentially growing multi-source, dynamic, and heterogeneous Internet texts cannot be effectively identified.
SUMMARY
Invention objective: In view of the current problems that large-scale general corpora
for place name identification are scarce and slowly updated, that manual construction
of place name linguistic data is time-consuming, laborious, and inefficient, and that
place name entity identification cannot meet the requirements of ubiquitous location
information services, the objective of the present invention is to provide a method for
intelligent construction of a place name annotated corpus based on interactive and
iterative learning, which fully optimizes the parameters of a deep learning model,
improves place name identification, and achieves intelligent construction and
optimization of the place name annotated corpus.
Technical solutions: To implement the foregoing invention objective, the present
invention uses the following technical solutions:
A method for intelligent construction of a place name annotated corpus based on interactive and iterative learning is provided, including the following steps:
step 1: reading initial place name annotated corpus data, including geographic encyclopedia linguistic data and Microsoft MSRA linguistic data;
step 2: preprocessing the place name annotated corpus data, including segmenting sentences at blank lines, removing duplicate sentences, and deleting stop words;
step 3: mixing the geographic encyclopedia linguistic data with the Microsoft MSRA linguistic data and training on the mixture with the tool Word2vec, to obtain a character-level word vector model;
step 4: representing each character in the place name annotated corpus with the word vector model, to generate a 1×100 word vector matrix for each character;
step 5: performing word segmentation and part-of-speech annotation on each sentence with the tool Jieba, and, based on the word segmentation result, generating a 1×20 vector matrix for each character in the sentence as the disambiguation matrix of that character;
step 6: splicing the word vector matrix of each character in the sentence with the disambiguation matrix of the same character, to obtain the word vector matrix of the sentence; inputting this matrix, for training, into a place name identification model in which Bi-LSTM and CRF are integrated; and selecting an optimal place name identification model by using three evaluation indicators from the natural language processing field: precision P, recall R, and the comprehensive value F;
step 7: developing an interactive Chinese place name annotation platform, and embedding the place name identification model of step 6 into the platform;
step 8: performing place name identification on new Internet text on the interactive place name annotation platform, performing human-machine interactive correction on the identification result, and visually displaying, in a corresponding window, the place names finally identified in the Internet text, the added place name tags, and the deleted (wrongly tagged) place name tags;
step 9: when the scale of the place name text linguistic data annotated in step 8 reaches a specified threshold, automatically merging, by the interactive place name annotation platform, the initial place name annotation linguistic data with the place name linguistic data corrected through human-machine interaction, to update the place name corpus;
step 10: continuing to train the place name identification model of step 6 (its training code and model parameters) by using the place name linguistic data generated in step 9 as training linguistic data, to optimize the model parameters and improve identification; and displaying the model training progress and the final precision, recall, and F value on the interactive annotation platform; and
step 11: iteratively looping from step 2 to step 10 for new Internet text, to intelligently update and optimize the place name annotated corpus, and ending the iterative training and learning once the place name identification effect and the scale of the place name annotated corpus meet the user requirement.
Further, step 6 specifically includes:
step 1: splicing the word vector matrix of each character in the sentence with the
disambiguation matrix of the same character, to obtain the word vector matrix of the
sentence as an input layer, and inputting this matrix into the Bi-LSTM for training;
step 2: applying dropout regularization, to prevent model overfitting;
step 3: using the sentence sequence (x_1, x_2, ..., x_n) of the input layer as the input
of the time steps of the Bi-LSTM, where n indicates the number of characters in the
sentence and x_i indicates the i-th character; and then splicing, position by position,
the forward LSTM hidden output sequence (f_1, f_2, ..., f_n) and the backward LSTM
hidden output sequence (b_1, b_2, ..., b_n), to obtain the complete hidden output
sequence ([f_1; b_1], [f_2; b_2], ..., [f_n; b_n]), in which the context both before and
after each character is fully considered, to achieve deep learning and representation
of features;
step 4: after dropout, connecting a linear layer that maps the spliced hidden vector at each position to k dimensions; the result is denoted as a matrix $P \in \mathbb{R}^{n \times k}$, where k is the number of tag categories in the annotation set. There are four tag categories in total: B, I, E, and O, where B indicates the beginning character of a place name, I indicates a middle character of the place name, E indicates the end character of the place name, and O indicates a non-place-name character. In this way, the features of the sentence are extracted automatically;
step 5: applying dropout to the output layer matrix of the Bi-LSTM model of step 4 to prevent model overfitting, and inputting this matrix into a CRF model for sentence sequence annotation, that is, predicting a tag for each character; and
step 6: selecting the optimal place name identification model by using the three evaluation indicators of the natural language processing field: precision P, recall R, and the comprehensive value F.
Further, the sentence sequence annotation based on the CRF model in step 5 is specifically as follows: for a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length, the model scores the sentence x with tag sequence y as
$$s(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n+1} A_{y_{i-1}, y_i}$$
where $P_{i,y_i}$ is the score of outputting tag $y_i$ at the i-th position, and $A_{y_{i-1},y_i}$ is the score of a transition from tag $y_{i-1}$ to tag $y_i$ ($y_0$ and $y_{n+1}$ denote the start and end of the sentence). The score of the entire sequence is the sum of the scores at the individual positions, and the score at each position has two parts: one determined by $P_{i,y_i}$, output by the LSTM, and the other determined by the transition matrix A of the CRF. The normalized probability obtained with Softmax is
$$P(y \mid x) = \frac{\exp(s(x, y))}{\sum_{\tilde{y} \in Y_x} \exp(s(x, \tilde{y}))}$$
where the numerator is the exponentiated score that the model assigns to sentence x with tag sequence y, and the denominator is the sum of the exponentiated scores over all possible tag sequences $Y_x$ of sentence x. Candidate tag sequences are ranked by this normalized probability, and the most probable sequence is used to identify place names.
Further, the interactive Chinese place name annotation platform is implemented
with Tkinter, the standard Python GUI toolkit.
Further, in step 10, the model is optimized either on a local server or by uploading
the training code and model parameters of the place name identification model to
Google Colaboratory in the cloud.
Beneficial effects: The present invention can effectively resolve the current
problems that place name linguistic data is scarce and slowly updated and that manual
construction of such data is time-consuming, laborious, and inefficient, and can
effectively implement intelligent updating of the place name annotated corpus for
multi-source, dynamic, heterogeneous, and exponentially growing Internet texts. The
present invention can be widely applied in fields such as ubiquitous geographic
information mining, spatial location services, spatial information retrieval, and
natural language processing.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flowchart of a method for intelligent construction of a place name
annotated corpus based on interactive and iterative learning according to the present
invention;
Fig. 2 is a screenshot of some data of a place name corpus according to an
embodiment of the present invention;
Fig. 3 is a screenshot of a list of some stop words according to an embodiment of
the present invention;
Fig. 4 is a screenshot of a pretrained word vector model according to an
embodiment of the present invention;
Fig. 5 is a screenshot of a result of matching characters in a dictionary and
pretrained word vectors according to an embodiment of the present invention;
Fig. 6 is a structural diagram of a model in which Bi-LSTM and CRF are
integrated according to an embodiment of the present invention;
Fig. 7 is a flowchart of Chinese place name identification in which Bi-LSTM and
CRF are integrated according to an embodiment of the present invention;
Fig. 8 is a screenshot of a CRF feature template according to an embodiment of
the present invention;
Fig. 9 is a screenshot of a training and evaluation result of a model in which
Bi-LSTM and CRF are integrated according to an embodiment of the present
invention;
Fig. 10 is an interface diagram of an interactive Chinese place name identification
and annotation platform according to an embodiment of the present invention;
Fig. 11 is an interface diagram of an identification result of Chinese place names
on an interactive annotation platform according to an embodiment of the present
invention;
Fig. 12 is an interface diagram of a result of human-machine interactive place
name annotation according to an embodiment of the present invention; and
Fig. 13 is an intelligent update interface diagram of an annotated corpus
according to an embodiment of the present invention.
DESCRIPTION OF THE EMBODIMENTS
The method of the present invention is further described in detail below with
reference to specific instances.
As shown in Fig. 1, the method for intelligent construction of a place name annotated corpus based on interactive and iterative learning disclosed in an embodiment of the present invention integrates a bidirectional long short-term memory model (Bi-LSTM) with a conditional random field (CRF) model to identify place name entities in text. On this basis, a human-machine interactive Chinese place name annotation platform is constructed to perform place name identification on Internet text, and human-machine interactive correction is performed on the place name identification result. When the scale of annotated Chinese place name text linguistic data reaches a specified threshold, the initial training linguistic data is merged with the place name annotation linguistic data, and the merged corpus is re-input into the place name identification model for training, thereby optimizing the model parameters, improving the identification effect, and adding new linguistic data to the place name annotated corpus. These steps are looped iteratively, and iterative training and learning end once the constructed corpus meets the requirement, thereby implementing intelligent construction and optimization of the place name corpus.
The method mainly includes three parts: the place name identification model in
which Bi-LSTM and CRF are integrated, a human-machine interactive Chinese place
name annotation method, and intelligent construction of the place name annotated
corpus based on iterative learning. Detailed steps are as follows:
Step 1: Read the initial place name annotated corpus data.
Place name linguistic data in the geographic encyclopedia linguistic data and place
name linguistic data in the Microsoft MSRA linguistic data (Fig. 2) are read.
Step 2: Preprocess the corpus data.
Sentences in the corpus data are segmented at blank lines. Word segmentation is then
performed on the geographic encyclopedia linguistic data and the Microsoft MSRA
place name linguistic data by using the tool Jieba, duplicate sentences are removed,
and stop words are deleted (Fig. 3).
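A minimal sketch of the preprocessing in Step 2 follows. It assumes the corpus is stored as one "character tag" pair per line with blank lines between sentences (a common layout for such corpora; the exact file format behind Fig. 2 is not stated here). Jieba segmentation is omitted, and stop words are treated as single characters for simplicity. The function names `read_sentences` and `preprocess` are hypothetical.

```python
from typing import Iterable, List, Tuple

# One sentence = tuple of (character, tag) pairs; tuples are hashable,
# which makes exact-duplicate detection a simple set lookup.
Sentence = Tuple[Tuple[str, str], ...]


def read_sentences(text: str) -> List[Sentence]:
    """Split a 'character tag' per-line corpus into sentences at blank lines."""
    sentences: List[Sentence] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if current:
                sentences.append(tuple(current))
                current = []
            continue
        char, tag = line.split()
        current.append((char, tag))
    if current:
        sentences.append(tuple(current))
    return sentences


def preprocess(text: str, stop_chars: Iterable[str]) -> List[Sentence]:
    """Segment, deduplicate, and strip stop characters tagged O (assumed policy)."""
    stop = set(stop_chars)
    seen = set()
    result: List[Sentence] = []
    for sent in read_sentences(text):
        if sent in seen:  # keep only the first copy of a duplicate sentence
            continue
        seen.add(sent)
        # Only drop stop characters outside place names, so B/I/E spans stay intact.
        cleaned = tuple((c, t) for c, t in sent if not (c in stop and t == "O"))
        if cleaned:
            result.append(cleaned)
    return result
```

Restricting stop-word deletion to characters tagged O is a design choice of this sketch: it keeps the annotation of every place name unchanged while still shrinking the sentence.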
Step 3: Generate a word vector matrix of the place name linguistic data based on
word2vec.
First, the geographic encyclopedia linguistic data is mixed with the Microsoft
MSRA linguistic data, and training is performed by using the tool Word2vec, to obtain
a character-level word vector model (Fig. 4).
Training parameters are as follows: minimum number of occurrences of a word to be trained: min_count=5; word vector dimension: size=100; number of words passed to a worker thread in each batch: batch_words=10000; training window: window=5; training algorithm: sg=1 (sg=0 selects CBOW and sg=1 selects skip-gram); number of threads: workers=4; number of iterations: iter=50.
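The parameter names above match the gensim library's `Word2Vec` in its 3.x versions; the text names only "Word2vec", so using gensim is an assumption. A minimal sketch with gensim 4, where `size` was renamed `vector_size` and `iter` was renamed `epochs`, treating every character as a token so that the resulting model is character-level. The helper names are hypothetical, and gensim is imported only when training is actually requested.

```python
from typing import Iterable, List

# Parameters from the text, written with gensim 4.x keyword names
# (size -> vector_size, iter -> epochs).
WORD2VEC_PARAMS = {
    "min_count": 5,
    "vector_size": 100,
    "batch_words": 10000,
    "window": 5,
    "sg": 1,        # 1 = skip-gram, 0 = CBOW
    "workers": 4,
    "epochs": 50,
}


def char_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Turn raw sentences into lists of characters for a character-level model."""
    return [[ch for ch in line if not ch.isspace()] for line in lines if line.strip()]


def train_char_vectors(lines: Iterable[str]):
    """Train the character-level word vector model (assumes gensim >= 4 is installed)."""
    from gensim.models import Word2Vec  # optional dependency, imported lazily
    return Word2Vec(sentences=char_sentences(lines), **WORD2VEC_PARAMS)
```

After training on the mixed geographic encyclopedia and MSRA text, `model.wv["南"]` returns the 100-dimensional vector of that character.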
Step 4: Generate the word vector matrix of each character in the place name
linguistic data set.
Each character in the place name annotated corpus is represented by using the word
vector model, to generate a 1×100 word vector matrix for each character (Fig. 5).
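Step 4 amounts to a table lookup: each character of a sentence is replaced by its pretrained vector, giving an n × 100 matrix for a sentence of n characters. The text does not say what happens to characters missing from the pretrained model (the matching shown in Fig. 5); this sketch assumes they receive a small random vector, reused for repeated occurrences within the sentence. `char_matrix` is a hypothetical name, and `vectors` can be any mapping from character to vector, such as a trained model's `wv`.

```python
import random
from typing import Dict, List, Sequence


def char_matrix(sentence: str, vectors: Dict[str, Sequence[float]],
                dim: int = 100, seed: int = 0) -> List[List[float]]:
    """Return an n x dim matrix with one 1 x dim row per character.

    Characters absent from the pretrained vectors get a small random vector
    (an assumed out-of-vocabulary policy); a repeated unknown character gets
    the same vector each time within the sentence.
    """
    rng = random.Random(seed)
    oov: Dict[str, List[float]] = {}
    rows: List[List[float]] = []
    for ch in sentence:
        if ch in vectors:
            rows.append(list(vectors[ch]))
        else:
            if ch not in oov:
                oov[ch] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
            rows.append(list(oov[ch]))
    return rows
```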
Step 5: Generate the disambiguation matrix of each character in the place name
linguistic data set.
Word segmentation and part-of-speech annotation are performed on each sentence
by using the tool Jieba. Based on the word segmentation result, the roles of the
characters in the sentence are classified into 4 categories, represented by the numbers
0, 1, 2, and 3: 0 indicates that the character is a single-character word, 1 indicates that
the character begins a word, 2 indicates that the character is in the middle of a word,
and 3 indicates that the character ends a word. For example, "I am Chinese"
(我是中国人, five characters segmented as 我 / 是 / 中国人) may be expressed as
[0, 0, 1, 2, 3]. Based on the word segmentation result, a 1×20 vector matrix (briefly
referred to as the disambiguation matrix of the character) is generated for each
character in the sentence, to distinguish the different meanings a character can take in
different contexts. For example, "shang" (上) may be an independent positional
preposition or a character in the noun "Shanghai" (上海).
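The 0/1/2/3 labelling can be computed directly from a segmented sentence. The text does not say how the four categories become a 1×20 vector; this sketch assumes a lookup in a 4 × 20 embedding table (trainable in a real model, random here), as is common in Bi-LSTM-CRF implementations, and then splices each character's 100-dimensional vector with its 20-dimensional vector as Step 6 describes. Jieba itself is replaced by a pre-segmented word list (for example, the output of `jieba.lcut`); all function names are hypothetical.

```python
import random
from typing import List, Sequence


def seg_tags(words: Sequence[str]) -> List[int]:
    """Per-character tags: 0 single-char word, 1 word start, 2 word middle, 3 word end."""
    tags: List[int] = []
    for w in words:
        if len(w) == 1:
            tags.append(0)
        else:
            tags.extend([1] + [2] * (len(w) - 2) + [3])
    return tags


def seg_embedding_table(dim: int = 20, seed: int = 0) -> List[List[float]]:
    """Hypothetical 4 x dim table; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(4)]


def disambiguation_matrix(words: Sequence[str], table: List[List[float]]) -> List[List[float]]:
    """One 1 x 20 row per character, looked up by its segmentation tag."""
    return [list(table[t]) for t in seg_tags(words)]


def splice(char_rows: List[List[float]], seg_rows: List[List[float]]) -> List[List[float]]:
    """Concatenate each character's word vector with its disambiguation vector."""
    return [list(c) + list(s) for c, s in zip(char_rows, seg_rows)]
```

For 我是中国人 segmented as 我 / 是 / 中国人, `seg_tags` reproduces the [0, 0, 1, 2, 3] example, and splicing yields 120 values per character.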
Step 6: Place name identification model in which Bi-LSTM and CRF are
integrated.
The word vector matrix of each character in the sentence and the disambiguation
matrix of the corresponding character are spliced, to finally obtain a word vector matrix of the sentence; the word vector matrix of the sentence is input, for training, into the place name identification model in which Bi-LSTM and CRF are integrated.
An optimal place name identification model is selected by using three evaluation
indicators of a natural language processing field: precision P, a recall rate R, and a
comprehensive value F (referring to Fig. 6 and Fig. 7). Details are specifically as
follows:
Step 1: Splice the word vector matrix of each character in the sentence with the
disambiguation matrix of the same character, to obtain the word vector matrix of the
sentence as the input layer (the first layer of the model), and input this matrix into the
Bi-LSTM for training.
Step 2: Apply dropout regularization, to prevent model overfitting.
During training with dropout, part of the input is randomly discarded, and the
parameters corresponding to the discarded part are not updated in that step. Dropout
can therefore be viewed as an ensemble method: each random discard yields a
different sub-network, and the results of all these sub-networks are combined.
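A minimal sketch of dropout on a single vector. It uses "inverted" dropout, the variant most deep learning frameworks implement: survivors are scaled by 1 / (1 - rate) during training so that no rescaling is needed at inference. The text does not state which variant or which rate is used, so both are assumptions here.

```python
import random
from typing import List


def dropout(values: List[float], rate: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout with 0 <= rate < 1.

    During training each unit is zeroed with probability `rate`, and the
    remaining units are scaled by 1 / (1 - rate) so the expected value of
    every unit is unchanged. At inference the input passes through as-is.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Each call with a fresh random draw corresponds to one of the sub-networks mentioned above.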
Step 3: Use the sentence sequence (x_1, x_2, ..., x_n) of the input layer as the input of
the time steps of the Bi-LSTM, where x_i indicates the i-th character in the sentence;
then splice, position by position, the forward LSTM hidden output sequence
(f_1, f_2, ..., f_n) and the backward LSTM hidden output sequence (b_1, b_2, ..., b_n),
to obtain the complete hidden output sequence ([f_1; b_1], [f_2; b_2], ..., [f_n; b_n]),
in which the context both before and after each character is fully considered, to
achieve deep learning and representation of features.
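To show how the forward and backward hidden states are spliced by position, here is a small pure-Python Bi-LSTM forward pass with random, untrained weights (a sketch for illustration, not the trained model of the embodiment). Each direction uses the standard LSTM gates; the backward direction runs over the reversed sentence and its outputs are reversed back, so that b_i lines up with x_i. The class and function names, the weight initialisation, and the zero biases are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(W: List[Vector], v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMDirection:
    """One LSTM direction: input size d, hidden size m, random (untrained) weights."""

    GATES = ("i", "f", "o", "c")  # input, forget, output gates and candidate cell

    def __init__(self, d: int, m: int, rng: random.Random) -> None:
        self.m = m
        # Each gate reads the concatenation [x_t; h_{t-1}] of length d + m.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(d + m)] for _ in range(m)]
                  for g in self.GATES}
        self.b = {g: [0.0] * m for g in self.GATES}

    def run(self, xs: Sequence[Sequence[float]]) -> List[Vector]:
        h, c = [0.0] * self.m, [0.0] * self.m
        outputs = []
        for x in xs:
            z = list(x) + h
            pre = {g: [a + b for a, b in zip(_matvec(self.W[g], z), self.b[g])]
                   for g in self.GATES}
            i = [_sigmoid(v) for v in pre["i"]]
            f = [_sigmoid(v) for v in pre["f"]]
            o = [_sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(xs: Sequence[Sequence[float]], d: int, m: int, seed: int = 0) -> List[Vector]:
    """Return ([f_1; b_1], ..., [f_n; b_n]), each of length 2m."""
    rng = random.Random(seed)
    forward, backward = LSTMDirection(d, m, rng), LSTMDirection(d, m, rng)
    f_seq = forward.run(xs)
    # Backward pass over the reversed sentence, re-reversed so b_i aligns with x_i.
    b_seq = list(reversed(backward.run(list(reversed(xs)))))
    return [f + b for f, b in zip(f_seq, b_seq)]
```

In the embodiment, d would be 120 (100 + 20 from the spliced input), and the weights would be learned; frameworks such as PyTorch provide the same computation as a built-in bidirectional LSTM layer.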
Step 4: After dropout, connect a linear layer that maps the spliced hidden vector at
each position of the complete hidden output sequence (of dimension 2m, where m is
the hidden size of one LSTM direction) to k dimensions, where n indicates the number of characters in the sentence and k is the number of tag categories in the annotation set. There are four tag categories in total in the annotated corpus: B, I, E, and O (B indicates the beginning character of a place name, I indicates a middle character of the place name, E indicates the end character of the place name, and O indicates a non-place-name character). The output is recorded as a matrix $P \in \mathbb{R}^{n \times k}$, so that the features of the sentence are extracted automatically.
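The linear layer of Step 4 turns the n spliced hidden vectors into the n × k emission matrix P consumed by the CRF. A minimal sketch with random weights standing in for trained ones; `emission_scores` is a hypothetical name, and the weight range is an arbitrary illustrative choice.

```python
import random
from typing import List

TAGS = ["B", "I", "E", "O"]  # k = 4 tag categories


def emission_scores(hidden: List[List[float]], k: int = len(TAGS),
                    seed: int = 0) -> List[List[float]]:
    """Map each spliced hidden vector h_i (length 2m) to k tag scores.

    Returns P as a list of n rows; P[i][j] is the score of tag TAGS[j] at
    position i. Weights are random placeholders for trained parameters.
    """
    if not hidden:
        return []
    in_dim = len(hidden[0])
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(k)]  # k x 2m
    b = [0.0] * k
    return [[sum(w * x for w, x in zip(W[j], h)) + b[j] for j in range(k)]
            for h in hidden]
```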
Step 5: Based on an output layer matrix of a Bi-LSTM model, set dropout to
prevent model overfitting; and input the output layer matrix into a CRF model for
sentence sequence annotation, that is, predict a tag for each character.
If a tag sequence y = (y_1, y_2, ..., y_n) whose length equals the sentence length is
given, the model scores the sentence x with tag sequence y as follows:
$$s(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n+1} A_{y_{i-1}, y_i}$$
where $P_{i,y_i}$ is the score of outputting tag $y_i$ at the i-th position, that is, the
initial (emission) score; $A_{y_{i-1},y_i}$ is the score of a transition from tag $y_{i-1}$ to
tag $y_i$, that is, the transition score, with $y_0$ and $y_{n+1}$ denoting the start and end
of the sentence. The score of the entire sequence is the sum of the scores at the
individual positions, and the score at each position has two parts: one determined by
$P_{i,y_i}$, output by the LSTM, and the other determined by the transition matrix A of
the CRF. The normalized probability obtained with Softmax is as follows:
$$P(y \mid x) = \frac{\exp(s(x, y))}{\sum_{\tilde{y} \in Y_x} \exp(s(x, \tilde{y}))}$$
where the numerator is the exponentiated score that the model assigns to sentence x
with tag sequence y, and the denominator is the sum of the exponentiated scores over
all possible tag sequences $Y_x$ of sentence x. Candidate tag sequences are ranked by
this normalized probability, and the most probable sequence is used to identify place
names.
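The scoring and normalization above can be computed without enumerating all k^n tag sequences. The sketch below assumes A is a (k+2) × (k+2) matrix whose two extra indices stand for the start state $y_0$ and end state $y_{n+1}$ (one common convention, not stated in the text). `path_score` implements s(x, y), `log_partition` computes the log of the denominator with the forward algorithm, and `viterbi` returns the highest-scoring sequence, which is how the most probable tagging is usually found in practice instead of sorting every candidate. Training would minimize $-\log P(y \mid x) = \log Z(x) - s(x, y)$.

```python
import math


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(P, A, y, start, end):
    """s(x, y) = sum_i P[i][y_i] + sum_{i=1}^{n+1} A[y_{i-1}][y_i], y_0 = start, y_{n+1} = end."""
    tags = [start] + list(y) + [end]
    emit = sum(P[i][t] for i, t in enumerate(y))
    trans = sum(A[tags[i]][tags[i + 1]] for i in range(len(tags) - 1))
    return emit + trans


def log_partition(P, A, start, end):
    """log sum over all tag sequences of exp(s(x, y)), via the forward algorithm."""
    k = len(P[0])
    alpha = [A[start][j] + P[0][j] for j in range(k)]
    for i in range(1, len(P)):
        alpha = [P[i][j] + _logsumexp([alpha[t] + A[t][j] for t in range(k)])
                 for j in range(k)]
    return _logsumexp([alpha[t] + A[t][end] for t in range(k)])


def viterbi(P, A, start, end):
    """Tag sequence maximizing s(x, y), hence also P(y | x)."""
    k = len(P[0])
    delta = [A[start][j] + P[0][j] for j in range(k)]
    back = []
    for i in range(1, len(P)):
        ptr, new = [], []
        for j in range(k):
            best = max(range(k), key=lambda t: delta[t] + A[t][j])
            ptr.append(best)
            new.append(delta[best] + A[best][j] + P[i][j])
        back.append(ptr)
        delta = new
    last = max(range(k), key=lambda t: delta[t] + A[t][end])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the last position
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With the B, I, E, O tag set, k = 4, so `start = 4` and `end = 5`; $P(y \mid x)$ is `math.exp(path_score(...) - log_partition(...))`.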
Step 6: Select the optimal place name identification model by using the three
evaluation indicators of the natural language processing field: the precision P, the
recall rate R, and the comprehensive value F.
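The three indicators can be computed at the entity level: a predicted place name counts as correct only if its start and end positions both match a gold place name. This sketch assumes exact span matching and that F is the balanced F1 score, F = 2PR / (P + R); the text does not define these details. Since the tag set has no single-character tag, spans are read as B, then any number of I, then E. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def place_name_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Extract (start, end) spans, end inclusive, of B I* E runs."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, t in enumerate(tags):
        if t == "B":
            start = i
        elif t == "E" and start is not None:
            spans.add((start, i))
            start = None
        elif t != "I":
            start = None  # O (or a stray E) closes any open span
    return spans


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision P, recall R, and F = 2PR / (P + R)."""
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = place_name_spans(g), place_name_spans(p)
        correct += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For gold tags B E O B I E and predicted tags B E O B E O, one of the two predicted place names is correct, so P = R = F = 0.5.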
Step 7: Human-machine interactive Chinese place name annotation.
First, an interactive Chinese place name annotation platform is developed with Python GUI programming (Tkinter), the Chinese place name identification model selected in step 6 is embedded into the platform, and place name identification is performed on Internet text. Human-machine interactive correction is then performed on the identification result. Finally, the place names identified in the Internet text, the added place name tags, and the deleted, wrongly assigned place name tags are all displayed in the corresponding window.
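A minimal sketch of how identified place names might be highlighted in a Tkinter `Text` widget follows. It assumes a single-line text and BIEO tags aligned one-to-one with its characters; the helper names, the tag name `place_name`, and the yellow highlight are illustrative choices, not details from the patent. `Text.tag_configure` and `Text.tag_add` are standard tkinter methods; Tk text indices have the form "line.column", with lines counted from 1 and columns from 0, and the end index is exclusive.

```python
from typing import List, Sequence, Tuple


def place_name_ranges(tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Map B I* E runs on a one-line text to Tk Text index pairs."""
    ranges, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            ranges.append((f"1.{start}", f"1.{i + 1}"))  # end index is exclusive
            start = None
        elif tag == "O":
            start = None
    return ranges


def highlight_place_names(text_widget, tags: Sequence[str],
                          tag_name: str = "place_name") -> None:
    """Apply a highlight tag to each identified place name in a tkinter.Text widget."""
    text_widget.tag_configure(tag_name, background="yellow")
    for first, last in place_name_ranges(tags):
        text_widget.tag_add(tag_name, first, last)


def demo() -> None:
    """Open a small window (requires a graphical display); not called automatically."""
    import tkinter as tk

    root = tk.Tk()
    text = tk.Text(root, height=2, width=40)
    text.pack()
    text.insert("1.0", "南京位于江苏")
    highlight_place_names(text, ["B", "E", "O", "O", "B", "E"])
    root.mainloop()
```

Calling `demo()` opens a window in which 南京 and 江苏 are highlighted; because it needs a display, it is kept out of the module's top level.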
Step 8: Update the place name annotated corpus.
When the amount of place-name-annotated Chinese text from step 7 reaches a threshold number of characters, the interactive place name annotation platform automatically merges the initial training data with the newly annotated text to update the place name annotated corpus.
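The merging rule can be sketched as a small buffer. This is a hypothetical helper, not the platform's code: the class and method names are invented, sentences are assumed to be lists of (character, tag) pairs, and the threshold defaults to the 100,000 characters given in Part 2, where the merge is triggered once the saved characters exceed the threshold.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (character, BIEO tag) pairs

THRESHOLD_CHARS = 100_000  # value reported in Part 2, Step 4


class CorpusUpdater:
    """Buffers corrected sentences and merges them into the training corpus
    once more than `threshold` characters have accumulated (hypothetical helper)."""

    def __init__(self, initial_corpus: List[Sentence],
                 threshold: int = THRESHOLD_CHARS) -> None:
        self.corpus: List[Sentence] = list(initial_corpus)
        self.pending: List[Sentence] = []
        self.threshold = threshold

    def pending_chars(self) -> int:
        return sum(len(s) for s in self.pending)

    def save(self, sentence: Sentence) -> bool:
        """Store one corrected sentence; return True if a merge was triggered."""
        self.pending.append(sentence)
        if self.pending_chars() > self.threshold:
            self.corpus.extend(self.pending)
            self.pending = []
            return True
        return False
```

A `True` return value is the natural point at which to start the retraining described in step 9.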
Step 9: Iteratively optimize the Chinese place name identification model.
The training code and model parameters of the place name identification model from step 6 are uploaded to a local server or to Google Colaboratory in the cloud, and training continues with the place name data produced in step 8 as training data, optimizing the model parameters and improving identification performance. The training progress and the final precision, recall rate, and F value are displayed on the interactive annotation platform.
Step 10: Intelligently update the place name annotated corpus.
Steps 2 to 9 are repeated iteratively to optimize the annotated corpus, and iterative training and learning end once the place name identification performance and the scale of the place name corpus meet the user's requirements.
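The outer loop of steps 2 to 9 can be summarized as follows. The two callables stand in for the annotation-and-merge stage and the retraining stage and are hypothetical, as are the target values and the `max_rounds` safety cap; the patent leaves the stopping requirement to the user.

```python
from typing import Callable, Tuple


def iterate_until_satisfied(
    annotate_round: Callable[[], int],
    retrain: Callable[[], Tuple[float, float, float]],
    target_f: float,
    target_chars: int,
    max_rounds: int = 50,
) -> Tuple[int, float, int]:
    """Repeat annotate -> merge -> retrain until both user requirements are met.

    annotate_round returns the corpus size in characters after one round of
    interactive correction and merging; retrain returns (precision, recall, F).
    Returns (rounds used, last F value, last corpus size).
    """
    corpus_chars, f = 0, 0.0
    for round_no in range(1, max_rounds + 1):
        corpus_chars = annotate_round()
        _, _, f = retrain()
        if f >= target_f and corpus_chars >= target_chars:
            return round_no, f, corpus_chars
    return max_rounds, f, corpus_chars
```

Both conditions must hold before the loop stops, mirroring the requirement that identification performance and corpus scale are both satisfactory.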
The main parts of the solutions in the embodiments of the present invention are further described below with reference to specific experimental examples.
Part 1: A Chinese place name identification method in which Bi-LSTM and CRF
are integrated.
This method is evaluated separately on the geographic encyclopedia linguistic data, the Microsoft MSRA linguistic data, and a mixture of the geographic encyclopedia and Microsoft MSRA data (referred to below as the mixed linguistic data).
The geographic encyclopedia linguistic data contains about 1.18 million characters, of which about 82% are in the training set, about 5% in the validation set, and about 13% in the test set. It is a thematic corpus of place names: place name entities are numerous and evenly distributed in the text, and the descriptive text contains rich geographic semantic relations.
The Microsoft MSRA linguistic data contains about 2.36 million characters, of which about 85% are in the training set, about 7% in the validation set, and about 8% in the test set. Place name entities in this corpus are relatively few and are sparsely and unevenly distributed in the text.
The mixed linguistic data contains about 3.57 million characters, of which about 85% are in the training set, about 6% in the validation set, and about 9% in the test set. Place name entities in the mixed linguistic data occur in intermediate numbers and are relatively evenly distributed.
In this example, seven groups of experiments (see Table 1) are set up for comparison to evaluate the effectiveness of this method.
Table 1 Settings of place name identification experiments
Experiment name | Experiment content
Experiment 1 | Geographic encyclopedia linguistic data + CRF-based method
Experiment 2 | Microsoft linguistic data + CRF-based method
Experiment 3 | Mixed linguistic data + CRF-based method
Experiment 4 | Geographic encyclopedia linguistic data + randomly generated word vector matrix as the input layer + dropout + Bi-LSTM + dropout + CRF
Experiment 5 | Geographic encyclopedia linguistic data + disambiguation + pre-trained word vectors + dropout + Bi-LSTM + dropout + CRF
Experiment 6 | Microsoft linguistic data + disambiguation + pre-trained word vectors + dropout + Bi-LSTM + dropout + CRF
Experiment 7 | Mixed linguistic data (geographic encyclopedia corpus + Microsoft corpus) + disambiguation + pre-trained word vectors + dropout + Bi-LSTM + dropout + CRF
(1) Experiments 1, 2, and 3
Experiments 1, 2, and 3 apply a Chinese place name identification method based on the traditional statistical model CRF to the different linguistic data. The same feature template (Fig. 8) is used, and a CRF model is trained on each linguistic data set. The evaluation results are shown in Table 2.
Table 2 Place name evaluation results of experiments 1, 2, and 3
Experiment name | Precision P (%) | Recall rate R (%) | Comprehensive value F (%)
Experiment 1 | 89.82 | 88.61 | 89.21
Experiment 2 | 89.81 | 79.18 | 84.16
Experiment 3 | 88.24 | 83.94 | 86.04
(2) Experiment 4
First, deduplication and stop-word removal are performed on the geographic encyclopedia data set, and a word vector matrix is randomly generated for each character in the data set using the Word2vec tool. The word vector matrix is then input into Bi-LSTM+CRF for training to obtain a model. The training parameters of the Bi-LSTM model are listed in Table 3, and the evaluation results are shown in Table 4.
Table 3 Settings of the training parameters of the Bi-LSTM model
Parameter | Value
Learning rate | 0.001
Dropout | 0.5
Maximum gradient | 5
Number of model iterations | 100
Tag categories | Four categories (BIEO)
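For reference, the Table 3 settings can be written as a configuration object. The table does not define "Maximum gradient"; the sketch assumes it is a gradient-clipping threshold applied to the global L2 norm of the gradients, a common practice in LSTM training, and shows that clipping step. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BiLstmTrainingConfig:
    learning_rate: float = 0.001
    dropout: float = 0.5
    max_gradient: float = 5.0  # assumed: global-norm clipping threshold
    iterations: int = 100
    tags: Tuple[str, ...] = ("B", "I", "E", "O")


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradient blocks so that their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for block in grads for g in block))
    if norm <= max_norm:
        return [list(block) for block in grads]
    scale = max_norm / norm
    return [[g * scale for g in block] for block in grads]
```

Clipping by the joint norm keeps the direction of the update unchanged and only shortens it, which guards against exploding gradients in recurrent networks.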
Table 4 Place name identification and evaluation results of experiment 4
Experiment name | Precision P (%) | Recall rate R (%) | Comprehensive value F (%)
Experiment 4 | 80.73 | 84.44 | 82.54
(3) Experiments 5, 6, and 7
Experiments 5, 6, and 7 apply the same place name identification method, which integrates a bidirectional long short-term memory (Bi-LSTM) model with a CRF model, to different linguistic data, so the experimental steps are identical.
First, the geographic encyclopedia linguistic data is mixed with the Microsoft linguistic data, deduplication and stop-word removal are performed, and the Word2vec tool is trained on the result to obtain a character-level word vector model; each character in the place name annotated corpus is then represented with this model to generate its word vector matrix. Next, the Jieba tool performs word segmentation and part-of-speech annotation on each sentence to generate a disambiguation matrix for each character, and the disambiguation matrix and word vector matrix of each character in the sentence are spliced and input into the Bi-LSTM model for training. Finally, 100 model results are evaluated and compared to obtain the optimal model (Fig. 9).
The evaluation results are shown in Table 5.
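The splicing of the word vector matrix and the disambiguation matrix can be sketched as below. The claims indicate a 100-dimensional character vector and a 1 × 20 disambiguation vector per character but not how the 20 values are derived from the Jieba result, so a hypothetical encoding is used: a 4-way one-hot for the character's position in its segmented word (B, M, E, S) plus a 16-way one-hot over an illustrative list of part-of-speech flags. The segmentation is passed in as (word, flag) pairs, which could be built with `[(p.word, p.flag) for p in jieba.posseg.cut(sentence)]`; characters missing from the vector table get a zero vector.

```python
from typing import Dict, List, Sequence, Tuple

EMBED_DIM = 100  # character vector size indicated in the claims
POSITION_TAGS = ["B", "M", "E", "S"]  # hypothetical: position of the char in its word
POS_FLAGS = ["n", "ns", "nr", "nt", "nz", "v", "vn", "a",
             "d", "m", "q", "r", "p", "c", "u", "x"]  # hypothetical 16 POS slots


def disambiguation_vectors(segmented: Sequence[Tuple[str, str]]) -> List[List[float]]:
    """Build one 20-value vector per character from (word, flag) pairs."""
    rows = []
    for word, flag in segmented:
        pos_onehot = [1.0 if f == flag else 0.0 for f in POS_FLAGS]
        if flag not in POS_FLAGS:
            pos_onehot[-1] = 1.0  # unknown flags fall into the last slot
        for j in range(len(word)):
            if len(word) == 1:
                position = "S"
            elif j == 0:
                position = "B"
            elif j == len(word) - 1:
                position = "E"
            else:
                position = "M"
            position_onehot = [1.0 if p == position else 0.0 for p in POSITION_TAGS]
            rows.append(position_onehot + pos_onehot)
    return rows


def sentence_matrix(sentence: str, segmented: Sequence[Tuple[str, str]],
                    char_vectors: Dict[str, List[float]]) -> List[List[float]]:
    """Splice each character's word vector with its disambiguation vector (n x 120)."""
    disamb = disambiguation_vectors(segmented)
    if len(disamb) != len(sentence):
        raise ValueError("segmentation does not cover the sentence")
    zero = [0.0] * EMBED_DIM
    return [char_vectors.get(ch, zero) + d for ch, d in zip(sentence, disamb)]
```

The resulting n × 120 matrix is what would be fed, row by row, to the Bi-LSTM time steps.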
Table 5 Place name identification and evaluation results of experiments 5, 6, and 7
Experiment name | Precision P (%) | Recall rate R (%) | Comprehensive value F (%)
Experiment 5 | 95.09 | 93.17 | 94.12
Experiment 6 | 92.86 | 89.91 | 91.36
Experiment 7 | 90.87 | 89.53 | 90.65
On the same corpus, precision, recall rate, and the comprehensive value F are all higher with this method than with the traditional CRF-based Chinese place name identification method (see Table 6).
Table 6 Comparison of place name identification and evaluation results for the same linguistic data and different identification models
Experiment | Linguistic data | Change in P (percentage points) | Change in R (percentage points) | Change in F (percentage points)
Experiment 5 vs. experiment 1 | Geographic encyclopedia | 5.27 | 4.56 | 4.91
Experiment 6 vs. experiment 2 | Microsoft linguistic data | 3.05 | 10.73 | 7.2
Experiment 7 vs. experiment 3 | Mixed linguistic data | 2.63 | 5.59 | 4.61
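The changes in Table 6 are simple differences between the Table 5 and Table 2 scores for the same linguistic data, so they are percentage points rather than relative percentages. The snippet below recomputes them from the tabulated values; the variable and function names are illustrative.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float]  # (P, R, F) in percent

CRF: Dict[str, Scores] = {
    "Experiment 1": (89.82, 88.61, 89.21),
    "Experiment 2": (89.81, 79.18, 84.16),
    "Experiment 3": (88.24, 83.94, 86.04),
}
BILSTM_CRF: Dict[str, Scores] = {
    "Experiment 5": (95.09, 93.17, 94.12),
    "Experiment 6": (92.86, 89.91, 91.36),
    "Experiment 7": (90.87, 89.53, 90.65),
}
PAIRS = [("Experiment 5", "Experiment 1"),
         ("Experiment 6", "Experiment 2"),
         ("Experiment 7", "Experiment 3")]


def improvements() -> Dict[str, Scores]:
    """Percentage-point change in P, R, and F for each pair compared in Table 6."""
    return {new: tuple(round(a - b, 2) for a, b in zip(BILSTM_CRF[new], CRF[old]))
            for new, old in PAIRS}
```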
Part 2: A method for intelligent construction of a place name corpus based on
interactive and iterative learning
Step 1: First, develop an interactive Chinese place name annotation platform (see Fig. 10) with Python GUI programming (Tkinter), and embed the Chinese place name identification model integrating Bi-LSTM and CRF into the platform. When the "place name identification" button is clicked, the platform performs place name entity identification on the input Internet text and automatically attaches a place name tag to each identified place name (see Fig. 11).
Step 2: Manually perform interactive correction on the Chinese place name identification result. For a place name that was not identified, right-click it and select the "set as a place name" function to add a place name tag to it; for a wrongly identified place name, right-click the wrong place name tag and select the "cancel setting" function to delete that tag.
Step 3: Visually display, in the corresponding window, the place names finally identified in the Internet text, the added place name tags, and the deleted, wrongly assigned place name tags (see Fig. 12).
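The two correction functions and the comparison shown in the window can be sketched on BIEO tag lists. The helper names are hypothetical, spans are inclusive character positions, and because the BIEO scheme has no single-character tag, this sketch only accepts place names of at least two characters; the patent does not say how single-character names are handled.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # inclusive character positions of a place name


def spans_from_tags(tags: List[str]) -> Set[Span]:
    """Collect (start, end) spans of well-formed B I* E runs."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.add((start, i))
            start = None
        elif tag == "O":
            start = None
    return spans


def set_as_place_name(tags: List[str], start: int, end: int) -> List[str]:
    """'Set as a place name': tag characters start..end as B I* E."""
    if not 0 <= start < end < len(tags):
        raise ValueError("a place name must span at least two characters within the text")
    new = list(tags)
    new[start] = "B"
    for i in range(start + 1, end):
        new[i] = "I"
    new[end] = "E"
    return new


def cancel_setting(tags: List[str], start: int, end: int) -> List[str]:
    """'Cancel setting': reset characters start..end to O."""
    if not 0 <= start <= end < len(tags):
        raise ValueError("span out of range")
    new = list(tags)
    for i in range(start, end + 1):
        new[i] = "O"
    return new


def correction_report(before: List[str], after: List[str]) -> Tuple[List[Span], List[Span]]:
    """Spans added and deleted by the annotator, for display in the window."""
    old, new = spans_from_tags(before), spans_from_tags(after)
    return sorted(new - old), sorted(old - new)
```

Setting a span that overlaps an existing place name can leave an inconsistent tag sequence; a real platform would need to resolve such overlaps before saving.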
Step 4: Save the final tagging result by clicking the "save a place name annotation result" button. When the accumulated number of saved characters of place-name-annotated Internet text exceeds a threshold (set to 100,000 characters in the present invention), the platform automatically merges the initial training corpus with the place-name-tagged text corpus and inputs the merged data into the Part 1 Chinese place name identification model integrating Bi-LSTM and CRF for retraining, thereby optimizing the model parameters and improving identification performance. The training progress and the final precision, recall rate, and F value are displayed on the interface (see Fig. 13).
Step 5: Add the new data to the place name annotated corpus and repeat steps 1 to 4 iteratively; iterative training and learning end once the place name identification performance and the scale of the place name corpus meet the user's requirements.
Claims (5)

  1. A method for intelligent construction of a place name annotated corpus based on interactive and iterative learning, comprising the following steps: step 1: reading initial place name annotated corpus data, comprising geographic encyclopedia linguistic data and Microsoft MSRA linguistic data; step 2: preprocessing the place name annotated corpus data, comprising segmenting sentences by using a blank line, deduplicating sentences, and deleting stop words; step 3: mixing the geographic encyclopedia linguistic data and the Microsoft MSRA linguistic data, and performing training by using the Word2vec tool, to obtain a character-level word vector model; step 4: representing each character in the place name annotated corpus by using the word vector model, to generate a 100-dimensional word vector matrix of each character; step 5: performing word segmentation and part-of-speech annotation on a sentence by using the Jieba tool, and generating, as a disambiguation matrix of each character, a 1 × 20 vector matrix of each character in the sentence based on a word segmentation result; step 6: splicing the word vector matrix of each character in the sentence and the disambiguation matrix of the corresponding character, to obtain a word vector matrix of the sentence; inputting, for training, the word vector matrix into a place name identification model in which Bi-LSTM and CRF are integrated; and selecting an optimal place name identification model by using three evaluation indicators of the natural language processing field: precision P, a recall rate R, and a comprehensive value F; step 7: developing an interactive Chinese place name annotation platform, and embedding the place name identification model in step 6 into the interactive Chinese place name annotation platform; step 8: performing place name identification on a new Internet text on the interactive place name annotation platform, and performing human-machine interactive correction on a place name identification result; and visually displaying, in
    a corresponding window, a place name finally identified in the Internet text, an added
    place name tag, and a deleted place name tag that is wrongly tagged;
    step 9: when a scale of annotated place name text linguistic data in step 8 reaches
    a specified threshold, automatically merging, by the interactive place name annotation
    platform, initial place name annotation linguistic data with place name linguistic data
    on which human-machine interactive correction is performed, to update the place
    name corpus;
    step 10: continuing to train the place name identification model in step 6, based on
    its training code and model parameters, by using, as training linguistic data, the place
    name linguistic data generated in step 9, to optimize the parameters of the model and
    improve the model identification effect; and displaying a model training progress, final
    precision, the recall rate, and the value F on the interactive annotation platform; and
    step 11: performing iterative looping from step 2 to step 10 for the new Internet
    text, to intelligently update and optimize the place name annotated corpus, and ending
    iterative training and learning until the place name identification effect and the scale
    of the place name annotated corpus meet a user requirement.
  2. The method for intelligent construction of the place name annotated corpus
    based on interactive and iterative learning according to claim 1, wherein step 6
    specifically comprises:
    step 1: splicing the word vector matrix of each character in the sentence and the
    disambiguation matrix of the corresponding character, to obtain the word vector
    matrix of the sentence as an input layer, and inputting the word vector matrix into the
    Bi-LSTM for training;
    step 2: setting a dropout regularization method to prevent model overfitting;
    step 3: using a sentence sequence $(x_1, x_2, \ldots, x_n)$ of the input layer as the
    input of the time steps of the Bi-LSTM, wherein n indicates the number of characters in
    a sentence, and $x_i$ indicates the i-th character in the sentence; and then splicing a
    forward LSTM hidden output sequence $(f_1, f_2, \ldots, f_n)$ and a backward LSTM hidden
    output sequence $(b_1, b_2, \ldots, b_n)$ by position, to obtain a complete hidden output
    sequence $(f_1, f_2, \ldots, f_n, b_1, b_2, \ldots, b_n)$, wherein the preceding and
    following context is fully taken into account to achieve deep learning and
    representation of features;
    step 4: after dropout is set, connecting a linear layer to convert the complete hidden
    output sequence from 2n dimensions to k dimensions, the resulting complete hidden output
    sequence being denoted as a matrix $P \in \mathbb{R}^{n \times k}$, wherein k is the
    number of tag categories in an annotation set, comprising four categories of tags: B, I,
    E, and O, wherein B indicates a beginning character of a place name, I indicates a
    middle character of the place name, E indicates an end character of the place name, and
    O indicates a non-place-name character, so that features of the sentence are
    automatically extracted;
    step 5: based on an output layer matrix of a Bi-LSTM model in step 4, setting
    dropout to prevent model overfitting, inputting the output layer matrix into a CRF
    model for sentence sequence annotation, that is, predicting a tag for each character;
    and
    step 6: selecting the optimal place name identification model by using the three
    evaluation indicators of the natural language processing field: the precision P, the
    recall rate R, and the comprehensive value F.
  3. The method for intelligent construction of the place name annotated corpus
    based on interactive and iterative learning according to claim 2, wherein performing
    sentence sequence annotation based on the CRF model in step 5 is specifically as
    follows:
    for a tag sequence $y = (y_1, y_2, \ldots, y_n)$ whose length is equal to the sentence
    length, the model scores the tagging of a sentence x with y as follows:
    $$s(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n+1} A_{y_{i-1}, y_i}$$
    wherein $P_{i, y_i}$ is the score of outputting $y_i$ at the i-th position, $A_{y_{i-1}, y_i}$
    is the score of a transition from $y_{i-1}$ to $y_i$, the score of the entire sequence
    is equal to the sum of the scores at the various positions, and the score at each
    position is obtained from two parts: one part is determined by $P_{i, y_i}$ output by
    the LSTM, and the other part is determined by the transition matrix A of the CRF; and a
    normalized probability obtained by using Softmax is as follows:
    $$P(y \mid x) = \frac{\exp(s(x, y))}{\sum_{y'} \exp(s(x, y'))}$$
    wherein the numerator is the exponential of the score assigned by the model to tagging
    the sentence x with y, and the denominator is the sum of the exponentiated scores over
    all candidate tag sequences y'; according to the obtained normalized probability, the
    candidate tag sequences are ranked to identify a place name.
  4. The method for intelligent construction of the place name annotated corpus
    based on interactive and iterative learning according to claim 1, wherein the
    interactive Chinese place name annotation platform is implemented by using Python
    GUI programming with Tkinter.
  5. The method for intelligent construction of the place name annotated corpus
    based on interactive and iterative learning according to claim 1, wherein the model is
    optimized on a local server or by uploading the training code and the model parameter
    of the place name identification model to the cloud Google Colaboratory in step 10.

Drawings (10 sheets): Figs. 1 to 13
AU2020103654A 2019-10-28 2020-04-21 Method for intelligent construction of place name annotated corpus based on interactive and iterative learning Ceased AU2020103654A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911029958.2 2019-10-28
CN201911029958.2A CN110826331B (en) 2019-10-28 2019-10-28 Intelligent construction method of place name labeling corpus based on interactive and iterative learning

Publications (1)

Publication Number Publication Date
AU2020103654A4 true AU2020103654A4 (en) 2021-01-14

Family

ID=69550890

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020103654A Ceased AU2020103654A4 (en) 2019-10-28 2020-04-21 Method for intelligent construction of place name annotated corpus based on interactive and iterative learning

Country Status (3)

Country Link
CN (1) CN110826331B (en)
AU (1) AU2020103654A4 (en)
WO (1) WO2021082366A1 (en)

Also Published As

Publication number Publication date
WO2021082366A1 (en) 2021-05-06
CN110826331A (en) 2020-02-21
CN110826331B (en) 2023-04-18

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry