CN113033206A - Bridge detection field text entity identification method based on machine reading understanding


Info

Publication number
CN113033206A
CN113033206A (Application No. CN202110357215.9A)
Authority
CN
China
Prior art keywords
embedding
character
text
word
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110357215.9A
Other languages
Chinese (zh)
Other versions
CN113033206B (en)
Inventor
李韧
莫天金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Jiaotong University
Original Assignee
Chongqing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Jiaotong University
Priority to CN202110357215.9A
Publication of CN113033206A
Application granted
Publication of CN113033206B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for recognizing text entities in the field of bridge detection based on machine reading understanding, which comprises the following steps: S1, acquiring a question text and a target text; S2, extracting character embeddings, bigram embeddings and weighted word embeddings from the question text and the target text; S3, concatenating the character embeddings, bigram embeddings and weighted word embeddings to obtain a joint feature representation; S4, inputting the joint feature representation into a neural network to complete entity recognition. Because character embeddings only capture features at the contextual character level, the invention purposefully introduces external dictionary information to enhance the feature representation of the model input, namely a bigram embedding (Bigram Embedding) unit and a weighted word embedding (Weighted Word Embedding) unit trained on a large-scale corpus, so that richer semantic features are extracted and entity recognition is improved.

Description

Bridge detection field text entity identification method based on machine reading understanding
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for recognizing text entities in the field of bridge detection based on machine reading understanding.
Background
For years, natural language processing has been an important research direction and one of the core research tasks in the field of artificial intelligence, and it has advanced greatly with the development of machine learning and deep learning. However, little applied research on intelligent decision support in bridge health management and maintenance has been carried out on the basis of natural language processing technology. During long-term operation, bridges are affected by traffic loads, environmental excitation, emergencies, degradation of structural materials and other internal and external factors, so various defects in structural components are inevitable. Meanwhile, the bridge industry has established an operation-period bridge health management business system consisting of daily inspection and maintenance, frequent inspection, periodic/special inspection, load testing, maintenance and reinforcement, structural health monitoring and the like, and has accumulated massive historical bridge health management data characterized by obvious multi-source heterogeneity and rapidly growing data volume. However, various health management and maintenance records are still stored in relational databases as linked documents; when performing services such as bridge structural condition assessment or management and maintenance decision support, the related documents are still consulted mainly by hand, and massive fine-grained structure and defect information scattered in unstructured text needs to be identified and extracted. Therefore, the problem of intelligent decision support in the field of bridge management and maintenance based on natural language processing technology still needs to be further addressed.
At present, with the rapid development of deep learning, end-to-end deep neural network models have matured and become the main approach and trend for natural language processing problems, overcoming the traditional machine learning models' heavy dependence on feature engineering and their insufficient ability to represent contextual features. Named entity recognition has always been a fundamental research topic in natural language processing; its essence is to extract predefined, valuable information from semi-structured or unstructured text, store it in a semi-structured form, and support intelligent applications such as knowledge graphs and automatic question answering. For the task of named entity recognition in bridge detection texts, only a named entity recognition method for structural conditions and maintenance activities based on a bridge ontology and semi-supervised CRF (Conditional Random Fields) is currently available.
Therefore, related research has not considered how to effectively exploit prior information or how to handle nested entities, and flat (outer-layer) and nested named entity recognition suited to the descriptive characteristics of Chinese bridge detection reports still needs further study.
Disclosure of Invention
In view of the deficiencies in the prior art, the invention discloses a method for recognizing text entities in the bridge detection field based on machine reading understanding. On the basis of character embeddings, the method purposefully introduces external dictionary information to enhance the feature representation of the model input, namely a bigram embedding (Bigram Embedding) unit and a weighted word embedding (Weighted Word Embedding) unit trained on a large-scale corpus, thereby improving entity recognition.
In order to solve the technical problems, the invention adopts the following technical scheme:
a bridge detection field text entity identification method based on machine reading understanding comprises the following steps:
s1, acquiring a question text and a target text;
s2, extracting character embedding, binary character embedding and weighted word embedding from the question text and the target text;
s3, embedding characters, embedding binary characters and embedding and splicing weighted words to obtain joint feature expression;
and S4, inputting the combined feature expression into a neural network to complete entity identification.
Preferably, the method for extracting character embeddings in step S2 comprises:
serializing the question text as Q = [q_1, q_2, ..., q_m], where q_i denotes the i-th character of the question text, and serializing the target text as C = [c_1, c_2, ..., c_n], where c_i denotes the i-th character of the target text;
concatenating Q and C to form X = [x_1, x_2, ..., x_l], where x_i ∈ Q ∪ C and l = m + n;
performing a character-embedding-table lookup to obtain a vector matrix E ∈ R^(l×d) that can be input into the BERT model, whose i-th element is e_i = w_c(x_i), where w_c(x_i) denotes the vector representation of character x_i in the character embedding table w_c and d denotes the dimension of each character vector in the character embedding table;
obtaining the character embeddings from the vector matrix E, where the i-th character embedding is a_i = w_bert(x_i), with w_bert denoting the character embedding table of the BERT model.
Preferably, the method for extracting the weighted word embeddings in step S2 comprises:
constructing the four sets B, M, E, S as follows:
B(x_i) = {w_{i,k} | w_{i,k} ∈ D, i < k ≤ l}
M(x_i) = {w_{j,k} | w_{j,k} ∈ D, 1 ≤ j < i < k ≤ l}
E(x_i) = {w_{j,i} | w_{j,i} ∈ D, 1 ≤ j < i}
S(x_i) = {x_i | x_i ∈ D}
where D denotes the external dictionary and w_{i,k} denotes the subsequence [x_i, x_{i+1}, ..., x_k] of the input sequence X; B(x_i) denotes the subsequences w_{i,k} matched in the external dictionary D in which character x_i is the start character of w_{i,k}; M(x_i) denotes the matched subsequences in which x_i is a middle character; E(x_i) denotes the matched subsequences in which x_i is the end character; S(x_i) denotes the case in which the current character itself is matched in the external dictionary D; if any of the four sets is empty, it is filled with the word NONE;
constructing the weighted word embeddings as follows:
x_i^s = [v_s(B); v_s(M); v_s(E); v_s(S)]
where x_i^s denotes the i-th character in the weighted word embeddings and v_s(B), v_s(M), v_s(E), v_s(S) denote the weighted representations corresponding to B, M, E, S, respectively.
Preferably, the weighted representation v_s(L) of a word set L is calculated as
v_s(L) = (1/Z) Σ_{w∈L} z(w) ω_word(w)
where z(w) denotes the frequency of occurrence of the word w in the external dictionary D, ω_word(w) denotes the word embedding of the word w looked up in the word embedding table, and Z denotes the sum of the word frequencies, Z = Σ_{w∈B∪M∪E∪S} z(w).
Preferably, step S4 comprises:
inputting the joint feature representation into a neural network to extract feature information;
predicting character probabilities and entity spans from the feature information to complete entity recognition.
Preferably, the neural network is a BiLSTM.
In summary, compared with the prior art, the invention has the following technical effects:
(1) Because character embeddings only capture features at the contextual character level, the invention purposefully introduces external dictionary information to enhance the feature representation of the model input, namely a bigram embedding (Bigram Embedding) unit and a weighted word embedding (Weighted Word Embedding) unit trained on a large-scale corpus, so that richer semantic features are extracted and entity recognition is improved.
(2) Because the BERT pre-training model only supports character-level input for Chinese, the initial vector matrix E is used as the input of the BERT pre-training model; the BERT model position-encodes E and, using a multi-head attention mechanism, extracts more accurate context-related semantic information, and with fine-tuning the character-level vector representations output by the BERT pre-training model become better suited to the context of bridge detection report texts.
(3) The method for generating the weighted word embeddings not only introduces the word embeddings from the dictionary well, but also loses no context information, because the matching result can be exactly recovered from the four character sets.
(4) The weighting algorithm does not use a dynamic weighting scheme such as an attention mechanism but uses the static word frequency, which can be computed in advance by counting occurrences; this greatly speeds up computing the weight of each word.
(5) The invention computes the probability of each character being a start or end index as well as the probability of each whole entity span, which has the advantage that flat (outer-layer) entities and nested entities contained in the text can be decoded simultaneously to obtain the final answer.
(6) The invention uses a BiLSTM (Bidirectional Long Short-Term Memory) model for encoding and extracts richer bidirectional contextual feature information.
Drawings
FIG. 1 is a flowchart of an embodiment of the bridge detection field text entity recognition method based on machine reading understanding disclosed by the present invention;
FIG. 2 is an overall architecture diagram of another embodiment of the bridge detection field text entity recognition method based on machine reading understanding;
FIG. 3 is a diagram illustrating an example of the weighted word embedding unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, the invention discloses a method for recognizing text entities in the bridge detection field based on machine reading understanding, comprising the following steps:
S1, acquiring a question text and a target text;
S2, extracting character embeddings, bigram embeddings and weighted word embeddings from the question text and the target text;
S3, concatenating the character embeddings, bigram embeddings and weighted word embeddings to obtain a joint feature representation;
S4, inputting the joint feature representation into a neural network to complete entity recognition.
The question text in the present invention contains prior information. Because character embeddings only capture features at the contextual character level, the invention purposefully introduces external dictionary information to enhance the feature representation of the model input, namely a bigram embedding (Bigram Embedding) unit and a weighted word embedding (Weighted Word Embedding) unit trained on a large-scale corpus, so that richer semantic features are extracted and entity recognition is improved.
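By way of illustration only, the following minimal Python sketch outlines how steps S1 to S4 could fit together; the names recognize_entities, embedder, encoder and predictor are hypothetical stand-ins and are not part of the invention.

```python
import torch

def recognize_entities(question_text, target_text, embedder, encoder, predictor):
    """Minimal sketch of steps S1-S4; all components are assumed stand-ins."""
    # S1: the question text (prior information) and the target text are given.
    # S2: extract the three kinds of embeddings for the concatenated sequence.
    char_emb = embedder.char_embeddings(question_text, target_text)            # (l, d_char)
    bigram_emb = embedder.bigram_embeddings(question_text, target_text)        # (l, d_bi)
    word_emb = embedder.weighted_word_embeddings(question_text, target_text)   # (l, d_word)
    # S3: concatenate the three embeddings into a joint feature representation.
    joint = torch.cat([char_emb, bigram_emb, word_emb], dim=-1)                # (l, d_char+d_bi+d_word)
    # S4: encode with a neural network (e.g. a BiLSTM) and predict entity spans.
    hidden = encoder(joint.unsqueeze(0))                                       # (1, l, 2*h)
    return predictor(hidden)                                                   # start/end/span scores
```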
In specific implementation, the method for extracting character embeddings in step S2 includes:
serializing the question text as Q = [q_1, q_2, ..., q_m], where q_i denotes the i-th character of the question text, and serializing the target text as C = [c_1, c_2, ..., c_n], where c_i denotes the i-th character of the target text;
concatenating Q and C to form X = [x_1, x_2, ..., x_l], where x_i ∈ Q ∪ C and l = m + n;
performing a character-embedding-table lookup to obtain a vector matrix E ∈ R^(l×d) that can be input into the BERT model, whose i-th element is e_i = w_c(x_i), where w_c(x_i) denotes the vector representation of character x_i in the character embedding table w_c and d denotes the dimension of each character vector in the character embedding table;
obtaining the character embeddings from the vector matrix E, where the i-th character embedding is a_i = w_bert(x_i), with w_bert denoting the character embedding table of the BERT model.
Because the BERT pre-training model only supports character-level input for Chinese, the obtained initial vector matrix E is used as the input of the BERT pre-training model. The BERT model position-encodes E and, using a multi-head attention mechanism, extracts more accurate context-related semantic information; with fine-tuning, the character-level vector representations output by the BERT pre-training model become better suited to the context of bridge detection report texts.
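By way of illustration only, a minimal Python sketch of the character-embedding step follows, assuming the HuggingFace transformers library and the bert-base-chinese checkpoint as stand-ins for the BERT pre-training model described above; it serializes Q and C, concatenates them into X and takes the BERT output as the contextual character-level representation.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # example checkpoint, an assumption
bert = BertModel.from_pretrained("bert-base-chinese")

def char_embeddings(question_text, target_text):
    # Serialize Q = [q_1..q_m] and C = [c_1..c_n] at the character level and
    # concatenate them into X = [x_1..x_l] with l = m + n.
    x = list(question_text) + list(target_text)
    inputs = tokenizer(x, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # Contextual character-level vectors (assuming each character maps to one token;
    # the [CLS] and [SEP] positions are included in the output length).
    return outputs.last_hidden_state  # shape (1, l + 2, 768)
```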
The introduction of bigram embeddings copes well with the problem of characterizing different entities composed of the same characters. In the bridge detection field there are many entity expressions composed of two characters, such as "bridge pier", "bridge abutment", "abutment cap" and "abutment body", and it is easy to see that most two-character entities share common characters, such as "bridge", "pier" and "abutment"; although the characters are the same, the semantic information they express differs across entities. Therefore, the input characters are converted into bigram embedding expressions to enhance the semantic representation of the input data for both entities and non-entities, where w_b denotes the bigram embedding table and the i-th bigram embedding is obtained by looking up the character pair (x_i, x_{i+1}) in w_b.
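By way of illustration only, a minimal Python sketch of a bigram-embedding lookup follows; the dictionary-based table w_b, the padding symbol for the last position and the zero-vector fallback for unseen bigrams are assumptions, not details specified by the invention.

```python
import numpy as np

def bigram_embeddings(x, w_b, dim=50):
    """x: list of characters; w_b: dict mapping a two-character string to a vector of size dim."""
    vectors = []
    for i in range(len(x)):
        # The i-th bigram pairs x_i with its right neighbour; the last position has
        # no neighbour, so a padding symbol is used (an assumption of this sketch).
        pair = x[i] + (x[i + 1] if i + 1 < len(x) else "<pad>")
        vectors.append(w_b.get(pair, np.zeros(dim)))  # zero vector for unseen bigrams
    return np.stack(vectors)  # shape (l, dim)
```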
As shown in FIG. 3, in specific implementation, the method for extracting the weighted word embeddings in step S2 includes:
constructing the four sets B, M, E, S as follows:
B(x_i) = {w_{i,k} | w_{i,k} ∈ D, i < k ≤ l}
M(x_i) = {w_{j,k} | w_{j,k} ∈ D, 1 ≤ j < i < k ≤ l}
E(x_i) = {w_{j,i} | w_{j,i} ∈ D, 1 ≤ j < i}
S(x_i) = {x_i | x_i ∈ D}
where D denotes the external dictionary and w_{i,k} denotes the subsequence [x_i, x_{i+1}, ..., x_k] of the input sequence X; B(x_i) denotes the subsequences w_{i,k} matched in the external dictionary D in which character x_i is the start character of w_{i,k}; M(x_i) denotes the matched subsequences in which x_i is a middle character; E(x_i) denotes the matched subsequences in which x_i is the end character; S(x_i) denotes the case in which the current character itself is matched in the external dictionary D; if any of the four sets is empty, it is filled with the word NONE;
constructing the weighted word embeddings as follows:
x_i^s = [v_s(B); v_s(M); v_s(E); v_s(S)]
where x_i^s denotes the i-th character in the weighted word embeddings and v_s(B), v_s(M), v_s(E), v_s(S) denote the weighted representations corresponding to B, M, E, S, respectively.
This method of generating the weighted word embeddings not only introduces the word embeddings from the dictionary well, but also loses no context information, because the matching result can be exactly recovered from the four character sets.
In specific implementation, the weighted representation v_s(L) of a word set L is calculated by the following formula:
v_s(L) = (1/Z) Σ_{w∈L} z(w) ω_word(w)
where z(w) denotes the frequency of occurrence of the word w in the external dictionary D, ω_word(w) denotes the word embedding of the word w looked up in the word embedding table, and Z denotes the sum of the word frequencies, Z = Σ_{w∈B∪M∪E∪S} z(w).
The word set L is regenerated from the original dictionary; the biggest difference from the original dictionary is that the word embeddings are weighted by the above formula.
The weighting algorithm does not use a dynamic weighting scheme such as an attention mechanism but uses the static word frequency, which can be computed in advance by counting occurrences; this greatly speeds up computing the weight of each word.
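By way of illustration only, a minimal Python sketch of the weighted word embedding construction follows; it assumes the external dictionary with precomputed word frequencies z(w) and the word embedding table ω_word are given as Python dicts sharing the same keys, enumerates subsequences up to a maximum length for matching (a trie would normally be used), and falls back to zero vectors when a set is empty (standing in for the word NONE).

```python
import numpy as np

def weighted_word_embeddings(x, freq, word_emb, dim=50, max_len=5):
    """x: character list; freq: dict word -> frequency z(w); word_emb: dict word -> vector of size dim."""
    l = len(x)
    # Collect, for every character position, the matched words in which it is the
    # Begin / Middle / End character, or which it forms on its own (Single).
    B = [set() for _ in range(l)]; M = [set() for _ in range(l)]
    E = [set() for _ in range(l)]; S = [set() for _ in range(l)]
    for i in range(l):
        for k in range(i, min(l, i + max_len)):
            w = "".join(x[i:k + 1])
            if w not in freq:
                continue
            if i == k:
                S[i].add(w)
            else:
                B[i].add(w); E[k].add(w)
                for j in range(i + 1, k):
                    M[j].add(w)

    def weighted(word_set, Z):
        # v_s(L) = (1/Z) * sum_{w in L} z(w) * word_emb(w); empty sets fall back to zeros ("NONE").
        if not word_set or Z == 0:
            return np.zeros(dim)
        return sum(freq[w] * np.asarray(word_emb[w]) for w in word_set) / Z

    rows = []
    for i in range(l):
        Z = sum(freq[w] for w in B[i] | M[i] | E[i] | S[i])
        rows.append(np.concatenate([weighted(B[i], Z), weighted(M[i], Z),
                                    weighted(E[i], Z), weighted(S[i], Z)]))
    return np.stack(rows)  # shape (l, 4 * dim)
```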
In specific implementation, step S4 includes:
inputting the joint feature representation into a neural network to extract feature information;
predicting character probabilities and entity spans from the feature information to complete entity recognition.
In the invention, the probability of each character being a start or end index and the probability of each whole entity span are calculated, which has the advantage that flat (outer-layer) entities and nested entities contained in the text can be decoded simultaneously to obtain the final answer.
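By way of illustration only, a minimal PyTorch sketch of this start/end/span prediction follows; the layer shapes, the sigmoid scoring and the class name SpanPredictor are assumptions rather than details specified by the invention.

```python
import torch
import torch.nn as nn

class SpanPredictor(nn.Module):
    """Predicts start/end probabilities per character and a probability for each candidate span."""
    def __init__(self, hidden_size):
        super().__init__()
        self.start_fc = nn.Linear(hidden_size, 1)
        self.end_fc = nn.Linear(hidden_size, 1)
        self.span_fc = nn.Linear(2 * hidden_size, 1)

    def forward(self, h):                                        # h: (batch, l, hidden_size)
        p_start = torch.sigmoid(self.start_fc(h)).squeeze(-1)    # (batch, l)
        p_end = torch.sigmoid(self.end_fc(h)).squeeze(-1)        # (batch, l)
        l = h.size(1)
        # Pair every candidate start i with every candidate end j and score the span,
        # which allows flat and nested entities to be decoded simultaneously.
        hi = h.unsqueeze(2).expand(-1, l, l, -1)                  # start representations
        hj = h.unsqueeze(1).expand(-1, l, l, -1)                  # end representations
        p_span = torch.sigmoid(self.span_fc(torch.cat([hi, hj], dim=-1))).squeeze(-1)  # (batch, l, l)
        return p_start, p_end, p_span
```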
In specific implementation, the neural network is a BiLSTM.
The invention uses a BiLSTM (Bidirectional Long Short-Term Memory) model for encoding and extracts richer bidirectional contextual feature information.
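By way of illustration only, a minimal PyTorch sketch of BiLSTM encoding of the joint feature representation follows; the input and hidden sizes are placeholder values, not values specified by the invention.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Encodes the joint feature representation bidirectionally."""
    def __init__(self, input_size=1018, hidden_size=256):
        # placeholder input_size: e.g. 768 (BERT) + 50 (bigram) + 4*50 (weighted word embedding)
        super().__init__()
        self.bilstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

    def forward(self, joint_features):             # (batch, l, input_size)
        output, _ = self.bilstm(joint_features)    # (batch, l, 2 * hidden_size)
        return output

# Example usage together with the (hypothetical) SpanPredictor sketched above:
# encoder = BiLSTMEncoder(); predictor = SpanPredictor(hidden_size=512)
# p_start, p_end, p_span = predictor(encoder(joint))
```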
Finally, it is noted that the above-mentioned embodiments illustrate rather than limit the invention, and that, while the invention has been described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A bridge detection field text entity identification method based on machine reading understanding, characterized by comprising the following steps:
S1, acquiring a question text and a target text;
S2, extracting character embeddings, bigram embeddings and weighted word embeddings from the question text and the target text;
S3, concatenating the character embeddings, bigram embeddings and weighted word embeddings to obtain a joint feature representation;
S4, inputting the joint feature representation into a neural network to complete entity recognition.
2. The bridge detection field text entity recognition method based on machine reading understanding of claim 1, wherein the method for extracting character embeddings in step S2 comprises:
serializing the question text as Q = [q_1, q_2, ..., q_m], where q_i denotes the i-th character of the question text, and serializing the target text as C = [c_1, c_2, ..., c_n], where c_i denotes the i-th character of the target text;
concatenating Q and C to form X = [x_1, x_2, ..., x_l], where x_i ∈ Q ∪ C and l = m + n;
performing a character-embedding-table lookup to obtain a vector matrix E ∈ R^(l×d) that can be input into the BERT model, whose i-th element is e_i = w_c(x_i), where w_c(x_i) denotes the vector representation of character x_i in the character embedding table w_c and d denotes the dimension of each character vector in the character embedding table;
obtaining the character embeddings from the vector matrix E, where the i-th character embedding is a_i = w_bert(x_i), with w_bert denoting the character embedding table of the BERT model.
3. The bridge detection field text entity recognition method based on machine reading understanding of claim 2, wherein the method for extracting the weighted word embeddings in step S2 comprises:
constructing the four sets B, M, E, S as follows:
B(x_i) = {w_{i,k} | w_{i,k} ∈ D, i < k ≤ l}
M(x_i) = {w_{j,k} | w_{j,k} ∈ D, 1 ≤ j < i < k ≤ l}
E(x_i) = {w_{j,i} | w_{j,i} ∈ D, 1 ≤ j < i}
S(x_i) = {x_i | x_i ∈ D}
where D denotes the external dictionary and w_{i,k} denotes the subsequence [x_i, x_{i+1}, ..., x_k] of the input sequence X; B(x_i) denotes the subsequences w_{i,k} matched in the external dictionary D in which character x_i is the start character of w_{i,k}; M(x_i) denotes the matched subsequences in which x_i is a middle character; E(x_i) denotes the matched subsequences in which x_i is the end character; S(x_i) denotes the case in which the current character itself is matched in the external dictionary D; if any of the four sets is empty, it is filled with the word NONE;
constructing the weighted word embeddings as follows:
x_i^s = [v_s(B); v_s(M); v_s(E); v_s(S)]
where x_i^s denotes the i-th character in the weighted word embeddings and v_s(B), v_s(M), v_s(E), v_s(S) denote the weighted representations corresponding to B, M, E, S, respectively.
4. The bridge detection field text entity recognition method based on machine reading understanding of claim 3, wherein the weighted representation v_s(L) of a word set L is calculated as follows:
v_s(L) = (1/Z) Σ_{w∈L} z(w) ω_word(w)
where z(w) denotes the frequency of occurrence of the word w in the external dictionary D, ω_word(w) denotes the word embedding of the word w looked up in the word embedding table, and Z denotes the sum of the word frequencies, Z = Σ_{w∈B∪M∪E∪S} z(w).
5. The bridge detection field text entity recognition method based on machine reading understanding of any one of claims 1 to 4, wherein step S4 comprises:
inputting the joint feature representation into a neural network to extract feature information;
predicting character probabilities and entity spans from the feature information to complete entity recognition.
6. The bridge detection field text entity recognition method based on machine reading understanding of claim 5, wherein the neural network is a BiLSTM.
CN202110357215.9A 2021-04-01 2021-04-01 Bridge detection field text entity identification method based on machine reading understanding Active CN113033206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357215.9A CN113033206B (en) 2021-04-01 2021-04-01 Bridge detection field text entity identification method based on machine reading understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110357215.9A CN113033206B (en) 2021-04-01 2021-04-01 Bridge detection field text entity identification method based on machine reading understanding

Publications (2)

Publication Number Publication Date
CN113033206A true CN113033206A (en) 2021-06-25
CN113033206B CN113033206B (en) 2022-04-22

Family

ID=76453874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110357215.9A Active CN113033206B (en) 2021-04-01 2021-04-01 Bridge detection field text entity identification method based on machine reading understanding

Country Status (1)

Country Link
CN (1) CN113033206B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342056A1 (en) * 2019-04-26 2020-10-29 Tencent America LLC Method and apparatus for natural language processing of medical text in chinese
CN110532303A (en) * 2019-09-04 2019-12-03 重庆交通大学 A kind of information retrieval and the potential relationship method of excavation for Bridge Management & Maintenance information
CN111178074A (en) * 2019-12-12 2020-05-19 天津大学 Deep learning-based Chinese named entity recognition method
CN111160031A (en) * 2019-12-13 2020-05-15 华南理工大学 Social media named entity identification method based on affix perception
CN111091000A (en) * 2019-12-24 2020-05-01 深圳视界信息技术有限公司 Processing system and method for extracting user fine-grained typical opinion data
CN112101027A (en) * 2020-07-24 2020-12-18 昆明理工大学 Chinese named entity recognition method based on reading understanding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN GONG: "Hierarchical LSTM with char-subword-word tree-structure representation for Chinese named entity recognition", Science China *
张海楠 et al.: "Chinese named entity recognition based on deep neural networks", 《中文信息学报》 (Journal of Chinese Information Processing) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935324A (en) * 2021-09-13 2022-01-14 昆明理工大学 Cross-border national culture entity identification method and device based on word set feature weighting
CN113935324B (en) * 2021-09-13 2022-10-28 昆明理工大学 Cross-border national culture entity identification method and device based on word set feature weighting
CN115879474A (en) * 2023-02-14 2023-03-31 华东交通大学 Fault nested named entity identification method based on machine reading understanding

Also Published As

Publication number Publication date
CN113033206B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN111209401A (en) System and method for classifying and processing sentiment polarity of online public opinion text information
CN114064918B (en) Multi-modal event knowledge graph construction method
CN111737496A (en) Power equipment fault knowledge map construction method
CN113312501A (en) Construction method and device of safety knowledge self-service query system based on knowledge graph
CN113033206B (en) Bridge detection field text entity identification method based on machine reading understanding
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111143553B (en) Method and system for identifying specific information of real-time text data stream
CN113190656B (en) Chinese named entity extraction method based on multi-annotation frame and fusion features
CN113051929A (en) Entity relationship extraction method based on fine-grained semantic information enhancement
CN115599902B (en) Oil-gas encyclopedia question-answering method and system based on knowledge graph
CN111309918A (en) Multi-label text classification method based on label relevance
CN114491024B (en) Specific field multi-label text classification method based on small sample
CN114153973A (en) Mongolian multi-mode emotion analysis method based on T-M BERT pre-training model
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN111460097B (en) TPN-based small sample text classification method
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN115017879A (en) Text comparison method, computer device and computer storage medium
CN114398900A (en) Long text semantic similarity calculation method based on RoBERTA model
CN113505222A (en) Government affair text classification method and system based on text circulation neural network
CN116522165A (en) Public opinion text matching system and method based on twin structure
Hua et al. A character-level method for text classification
CN113792144B (en) Text classification method of graph convolution neural network based on semi-supervision
CN115795060A (en) Entity alignment method based on knowledge enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant