CN110287483B - Unregistered word recognition method and system utilizing five-stroke character root deep learning - Google Patents
- Publication number
- CN110287483B (publication) · CN201910492347.5A (application)
- Authority
- CN
- China
- Prior art keywords
- character
- deep learning
- neural network
- model
- wubi
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 238000013135 deep learning Methods 0.000 title claims abstract description 23
- 239000013598 vector Substances 0.000 claims abstract description 25
- 238000013528 artificial neural network Methods 0.000 claims abstract description 19
- 238000003062 neural network model Methods 0.000 claims abstract description 12
- 238000012549 training Methods 0.000 claims abstract description 10
- 238000012545 processing Methods 0.000 claims abstract description 4
- 238000010276 construction Methods 0.000 claims description 5
- 238000013507 mapping Methods 0.000 claims description 5
- 238000013527 convolutional neural network Methods 0.000 claims description 2
- 238000004088 simulation Methods 0.000 claims description 2
- 230000011218 segmentation Effects 0.000 description 4
- 230000015654 memory Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000004590 computer program Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 230000007787 long-term memory Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008092 positive effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of natural language data processing and discloses an unregistered word recognition method and system using Wubi (five-stroke) radical deep learning. Each Chinese character is converted into 4 English letters according to a Wubi radical table; a neural network model is then trained with the embedding vectors corresponding to the words of the corpus as the model's input embeddings; finally, the model outputs the closest word vector from the corpus, which serves as an important basis for identifying unregistered words, so that unregistered words can be better recognized. The invention provides a neural network entity recognition method using Wubi radicals: Chinese words with similar radicals mostly share the same part of speech and have similar Wubi codes, which can improve the performance of a neural network model in recognizing unregistered words. Based on deep learning, the invention uses word vectors to represent words, which solves the sparsity problem of high-dimensional vector spaces and is simpler and more effective.
Description
Technical Field
The invention belongs to the technical field of natural language data processing, and particularly relates to an unregistered word recognition method and system using Wubi radical deep learning.
Background
Currently, the state of the art commonly used in the industry is as follows: the concept of a "named entity", now widely used in natural language processing, was first proposed in 1996 at the Sixth Message Understanding Conference (MUC-6). Most MUC-6 research was based on rule methods, such as word-shape or part-of-speech vocabulary rules: character matching rules are formulated from cue words and the context before and after a named entity, mainly for information extraction tasks. Sekine's seven general subclasses of named entities target objects of interest for solving specific problems and do not meet the application requirements of automatic question answering and information retrieval.
In Chinese word segmentation, unregistered words (out-of-vocabulary words, OOV) are a very important factor affecting segmentation quality, and named entities are the most prominent kind of unregistered word, so they are a problem that Chinese automatic word segmentation cannot avoid. Rule-based methods require many rules to be formulated manually, have low feasibility, port poorly when application domains differ greatly, and require the rules to be reformulated. Machine-learning-based methods follow two ideas: one first recognizes all named entity boundaries in a text and then classifies the entities with a model; the other is sequence labeling, in which each word in the corpus can carry several candidate class labels corresponding to positions within various named entities, but it cannot identify unregistered words.
Among existing recognition models, neural network models (such as LSTM and RNN) have shown strong competitiveness in entity recognition. Because a neural network model uses the characters of the training set as its basic input units, in-vocabulary words are easy to recognize, and test results on experimental data sets also verify that such models can recognize in-vocabulary words; however, they cannot recognize unregistered words well.
In summary, the problems of the prior art are:
(1) Rule-based methods require many rules to be formulated manually, have low feasibility, and port poorly when application domains differ greatly, requiring the rules to be reformulated.
(2) Recognition methods based on machine learning and neural network models cannot recognize unregistered words.
The difficulty of solving the technical problems is as follows:
as academic research on named entity recognition progresses, named entity recognition can be performed with different models and algorithms.
Meaning of solving the technical problems:
currently, the terminology of every field is large in scope, general in content, information-rich, and complex in composition. People therefore describe or express terms in many ways, such as aliases, abbreviations, and shorthand words, which cannot do so accurately and completely, and errors such as wrongly written characters, ambiguous words, and near-synonyms are often mixed in. This seriously affects named entity recognition in the field. In summary, using Wubi radicals to recognize unregistered words has important significance and practical application value. The model proposed by the invention exploits the characteristics of Wubi radicals; compared with traditional models that use word vectors, it can well avoid the influence of word segmentation errors.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method and a system for recognizing unregistered words using Wubi radical deep learning.
The invention is realized in such a way that the method for recognizing unregistered words using Wubi radical deep learning specifically comprises the following steps:
step one, embedding and merging the Wubi radicals into the original characters, and constructing a comprehensive character representation for each character in an input sentence;
step two, looking up the letter embedding of each English letter corresponding to the character;
step three, automatically extracting n-gram features of the character information with a CNN, simulating different n-gram features by generating different feature map sets, and dividing each character into strokes to generate an n-gram model containing character representations;
step four, adopting convolutional neural networks with filters of different sizes to simulate a traditional n-gram model;
step five, inputting the character vectors into an LSTM neural network model for training, and modeling the context information of each English letter in the character;
step six, merging the character vectors and feeding the radical-integrated character embedding to the output side of the LSTM neural network, so as to decode and predict the final tag sequence of the input sentence.
Further, in step one, constructing a comprehensive character representation for each character specifically includes:
for each Chinese character, converting it into 4 English letters according to a Wubi radical table, and appending "·" as padding for Chinese characters whose code has fewer than 4 English letters.
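The conversion-and-padding rule above can be sketched as follows. The tiny `WUBI_TABLE` excerpt is an illustrative assumption: the `gwyc`/`qwyc` codes come from the patent's Table 1, while the short code for 厂 is a hedged example added only to show padding; a real table covers the full character set.

```python
# Sketch of the rule described above: each Chinese character maps to at most
# 4 letters from a Wubi radical table, and shorter codes are padded with "·".
WUBI_TABLE = {
    "玲": "gwyc",  # codes taken from the patent's Table 1
    "铃": "qwyc",
    "厂": "dgt",   # illustrative short code that needs one "·" of padding
}

def wubi_encode(char: str, width: int = 4, pad: str = "\u00b7") -> str:
    """Return the fixed-width (4-letter) Wubi code for one character."""
    code = WUBI_TABLE.get(char, "")
    return (code + pad * width)[:width]  # pad and truncate to exactly `width`

def encode_sentence(text: str) -> list[str]:
    """Encode every character of a sentence into its padded Wubi code."""
    return [wubi_encode(c) for c in text]
```

For example, `encode_sentence("玲铃")` yields `["gwyc", "qwyc"]`, and `wubi_encode("厂")` yields `"dgt·"`.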
Another object of the present invention is to provide an unregistered word recognition system using Wubi radical deep learning based on the above method, the system comprising:
The character construction module is used for embedding and merging the Wubi radicals into the original characters and constructing a comprehensive character representation for each character in the input sentence;
the character lookup module is used for looking up the letter embedding of each corresponding English letter;
the model construction module is used for automatically extracting n-gram features of the character information with a CNN, simulating different n-gram features by generating different feature map sets, and dividing each character into strokes to generate an n-gram model containing character representations;
the model simulation module is used for simulating a traditional n-gram model with convolutional neural networks using filters of different sizes;
the training module is used for inputting the character vectors into the LSTM neural network model for training and modeling the context information of each English letter in the character;
and the character embedding module is used for merging the character vectors, feeding the radical-integrated character embedding to the output side of the LSTM neural network, and decoding and predicting the final tag sequence of the input sentence.
It is another object of the present invention to provide a computer program applying the unregistered word recognition method using wubi root deep learning.
Another object of the present invention is to provide an information data processing terminal implementing the unregistered word recognition method using wubi root deep learning.
It is another object of the present invention to provide a computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the unregistered word recognition method using Wubi radical deep learning.
In summary, the invention has the advantages and positive effects that:
the invention provides a neural network entity recognition method by utilizing the radicals of five strokes, which can improve the performance of recognizing unregistered words by a neural network model by utilizing the Chinese character words with similar radicals, most of which have the same part of speech and similar five strokes codes.
The invention uses word vectors to represent words based on deep learning, solves the sparse problem of high latitude vector space, and the word vectors per se contain more semantic information than manually selected features, and can acquire the feature representation of unified vector space from the text fused by multi-source heterogeneous data, thereby being simpler and more effective.
The invention converts word embedding into letter embedding, and converts each Chinese character into 4 English letters by utilizing the principle that five strokes of Chinese characters with the same meaning are similar in coding, thereby improving the performance of the neural network model in identifying the unregistered words.
The invention can replace the strokes, and the strokes of each Chinese character are used as words to be embedded, so that the accuracy of identifying the unregistered words by the model can be improved; meanwhile, the main stream level can be achieved only by the word vector and the character vector, and the effect can be further improved by adding high-quality dictionary features.
The invention combines LSTM and five-stroke character root model for identifying Chinese naming entity. The model encodes the input character sequence and all potential vocabularies matching the wubi root dictionary. In contrast to character-based methods, the present invention explicitly utilizes word and word order information. The gating loop unit enables the model to select the most relevant characters and words from sentences to generate better named entity recognition results.
The invention uses five-stroke character roots to represent Chinese characters, and the representations are combined as character embedding, so that the form and semantic information of exploring characters can be enhanced; according to the invention, the n-gram characteristics are automatically extracted by using the neural network, each character is divided into strokes to provide an n-gram model, each character is represented by 4 English letters, and fuzzy information can be brought to different characters with the same type, so that the performance of identifying the unregistered words by an algorithm is improved.
The invention adopts the five-stroke representation method and embeds the integrated character of the character root to form the final input, and then adopts the convolution neural network of the filters with different sizes to simulate the traditional n-gram model, thereby being beneficial to identifying the unregistered words.
The five-stroke method provided by the invention can distinguish words with similar structures. If the characters are less than four English letters, the blank letters can be used for filling initialization embedding to ensure that each character has four stroke level representations, and stroke input vector values are continuously updated during training of the model, so that the performance of the model can be enhanced.
Drawings
FIG. 1 is a flowchart of an unregistered word recognition method using five-stroke radical deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an unregistered word recognition method using five-stroke radical deep learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings.
As shown in Figs. 1-2, the method for recognizing unregistered words using Wubi radical deep learning provided by the embodiment of the present invention specifically includes:
S101: embedding and merging the Wubi radicals into the original characters, and constructing a comprehensive character representation for each character in the input sentence; for each Chinese character, converting it into 4 English letters according to a Wubi radical table, and adding "·" before or after as padding for Chinese characters whose code has fewer than 4 English letters;
S102: looking up the letter embedding of each English letter corresponding to the character;
S103: automatically extracting n-gram features of the character information with a CNN, simulating different n-gram features by generating different feature map sets, and dividing each character into strokes to generate an n-gram model containing character representations;
S104: adopting convolutional neural networks with filters of different sizes to simulate a traditional n-gram model;
S105: inputting the character vectors into an LSTM neural network model for training, and modeling the context information of each English letter in the character;
S106: merging the character vectors and feeding the radical-integrated character embedding to the output side of the LSTM neural network to decode and predict the final tag sequence of the input sentence.
The technical scheme of the invention is further described below with reference to specific embodiments.
Example 1:
The invention combines an LSTM with the Wubi radical model for Chinese named entity recognition. The invention encodes the input character sequence and all potential words matching the Wubi radical dictionary. In contrast to character-based methods, the invention explicitly uses word and word order information. A gated recurrent unit enables the model to select the most relevant characters and words from sentences to produce better named entity recognition results.
For word input embedding, the embodiment of the invention uses Wubi radicals to represent Chinese characters and combines these representations as character embeddings, which enhances the exploration of the form and semantic information of characters and allows n-gram features to be extracted automatically by a neural network. Each character is divided into strokes to build an n-gram model, with each character represented by 4 English letters. For different characters of the same type this brings shared fuzzy information, improving the algorithm's performance in recognizing unregistered words.
Table 1. Comparison of the Wubi encodings of two characters
Word | Five-stroke representation |
---|---|
玲 ("exquisite") | first radical 王 (key G), code gwyc |
铃 ("bell") | first radical 钅 (key Q), code qwyc |
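The table above shows two characters whose codes differ in only the first letter. A simple Hamming distance over the fixed-width codes makes that structural similarity measurable; this helper is an illustrative sketch, not part of the patent:

```python
def wubi_hamming(a: str, b: str) -> int:
    """Number of positions at which two 4-letter Wubi codes differ."""
    assert len(a) == len(b) == 4
    return sum(x != y for x, y in zip(a, b))

print(wubi_hamming("gwyc", "qwyc"))  # 1: only the first radical differs
```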
The embodiment of the invention adopts the Wubi representation method and embeds the radical-integrated characters to form the final input, then simulates a traditional n-gram model with convolutional neural networks using filters of different sizes, which is beneficial for recognizing unregistered words.
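The letter-level n-grams that the different filter widths emulate can also be listed directly. This small helper is an illustrative sketch under the 4-letter-code assumption:

```python
def letter_ngrams(code: str, n: int) -> list[str]:
    """All contiguous n-grams of a character's fixed-width Wubi code."""
    return [code[i:i + n] for i in range(len(code) - n + 1)]

print(letter_ngrams("gwyc", 2))  # ['gw', 'wy', 'yc']
print(letter_ngrams("gwyc", 3))  # ['gwy', 'wyc']
```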
Named entity recognition is widely applied in many fields, for example recognizing person and place names in a sentence, recognizing drug names in medical texts, recognizing product names in e-commerce search, and so on. The invention combines a long short-term memory recurrent network with the Wubi radical model, performs well for named entity recognition in the financial insurance field, and improves the accuracy of recognizing insurance product names.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor, or by specially designed hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, provided for example on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry (such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices), by software executed by various types of processors, or by a combination of the above hardware circuitry and software, such as firmware.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (4)
1. A method for recognizing unregistered words using Wubi radical deep learning, characterized by comprising the following steps:
step one, embedding and merging the Wubi radicals into the original characters, and constructing a comprehensive character representation for each character in an input sentence;
step two, looking up the letter embedding of each English letter corresponding to the character;
step three, automatically extracting n-gram features of the character information with a CNN, simulating different n-gram features by generating different feature map sets, and dividing each character into strokes to generate an n-gram model containing character representations;
step four, adopting convolutional neural networks with filters of different sizes to simulate a traditional n-gram model;
step five, inputting the character vectors into an LSTM neural network model for training, and modeling the context information of each English letter in the character;
step six, merging the character vectors and feeding the radical-integrated character embedding to the output side of the LSTM neural network, so as to decode and predict the final tag sequence of the input sentence;
wherein in step one, constructing a comprehensive character representation for each character specifically includes: for each Chinese character, converting it into 4 English letters according to a Wubi radical table, and appending "·" as padding for Chinese characters whose code has fewer than 4 English letters.
2. An unregistered word recognition system using Wubi radical deep learning based on the method of claim 1, characterized in that the system comprises:
a character construction module for embedding and merging the Wubi radicals into the original characters and constructing a comprehensive character representation for each character in the input sentence;
a character lookup module for looking up the letter embedding of each corresponding English letter;
a model construction module for automatically extracting n-gram features of the character information with a CNN, simulating different n-gram features by generating different feature map sets, and dividing each character into strokes to generate an n-gram model containing character representations;
a model simulation module for simulating a traditional n-gram model with convolutional neural networks using filters of different sizes;
a training module for inputting the character vectors into the LSTM neural network model for training and modeling the context information of each English letter in the character;
and a character embedding module for merging the character vectors, feeding the radical-integrated character embedding to the output side of the LSTM neural network, and decoding and predicting the final tag sequence of the input sentence.
3. An information data processing terminal implementing the unregistered word recognition method using wubi root deep learning according to claim 1.
4. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the unregistered word recognition method using Wubi radical deep learning of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492347.5A CN110287483B (en) | 2019-06-06 | 2019-06-06 | Unregistered word recognition method and system utilizing five-stroke character root deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492347.5A CN110287483B (en) | 2019-06-06 | 2019-06-06 | Unregistered word recognition method and system utilizing five-stroke character root deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110287483A CN110287483A (en) | 2019-09-27 |
CN110287483B true CN110287483B (en) | 2023-12-05 |
Family
ID=68003508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910492347.5A Active CN110287483B (en) | 2019-06-06 | 2019-06-06 | Unregistered word recognition method and system utilizing five-stroke character root deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287483B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126160B (en) * | 2019-11-28 | 2023-04-07 | 天津瑟威兰斯科技有限公司 | Intelligent Chinese character structure evaluation method and system constructed based on five-stroke input method |
CN111523325A (en) * | 2020-04-20 | 2020-08-11 | 电子科技大学 | Chinese named entity recognition method based on strokes |
CN112507190B (en) * | 2020-12-17 | 2023-04-07 | 新华智云科技有限公司 | Method and system for extracting keywords of financial and economic news |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354701A (en) * | 2016-08-30 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Chinese character processing method and device |
CN107832289A (en) * | 2017-10-12 | 2018-03-23 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM CNN |
CN107977361A (en) * | 2017-12-06 | 2018-05-01 | 哈尔滨工业大学深圳研究生院 | The Chinese clinical treatment entity recognition method represented based on deep semantic information |
CN108595592A (en) * | 2018-04-19 | 2018-09-28 | 成都睿码科技有限责任公司 | A kind of text emotion analysis method based on five-stroke form code character level language model |
CN108829823A (en) * | 2018-06-13 | 2018-11-16 | 北京信息科技大学 | A kind of file classification method |
CN108875021A (en) * | 2017-11-10 | 2018-11-23 | 云南大学 | A kind of sentiment analysis method based on region CNN-LSTM |
CN109033042A (en) * | 2018-06-28 | 2018-12-18 | 中译语通科技股份有限公司 | BPE coding method and system, machine translation system based on the sub- word cell of Chinese |
CN109388807A (en) * | 2018-10-30 | 2019-02-26 | 中山大学 | The method, apparatus and storage medium of electronic health record name Entity recognition |
CN109597891A (en) * | 2018-11-26 | 2019-04-09 | 重庆邮电大学 | Text emotion analysis method based on two-way length Memory Neural Networks in short-term |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3433795A4 (en) * | 2016-03-24 | 2019-11-13 | Ramot at Tel-Aviv University Ltd. | Method and system for converting an image to text |
-
2019
- 2019-06-06 CN CN201910492347.5A patent/CN110287483B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354701A (en) * | 2016-08-30 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Chinese character processing method and device |
CN107832289A (en) * | 2017-10-12 | 2018-03-23 | 北京知道未来信息技术有限公司 | A kind of name entity recognition method based on LSTM CNN |
CN108875021A (en) * | 2017-11-10 | 2018-11-23 | 云南大学 | A kind of sentiment analysis method based on region CNN-LSTM |
CN107977361A (en) * | 2017-12-06 | 2018-05-01 | 哈尔滨工业大学深圳研究生院 | The Chinese clinical treatment entity recognition method represented based on deep semantic information |
CN108595592A (en) * | 2018-04-19 | 2018-09-28 | 成都睿码科技有限责任公司 | A kind of text emotion analysis method based on five-stroke form code character level language model |
CN108829823A (en) * | 2018-06-13 | 2018-11-16 | 北京信息科技大学 | A kind of file classification method |
CN109033042A (en) * | 2018-06-28 | 2018-12-18 | 中译语通科技股份有限公司 | BPE coding method and system, machine translation system based on the sub- word cell of Chinese |
CN109388807A (en) * | 2018-10-30 | 2019-02-26 | 中山大学 | The method, apparatus and storage medium of electronic health record name Entity recognition |
CN109597891A (en) * | 2018-11-26 | 2019-04-09 | 重庆邮电大学 | Text emotion analysis method based on two-way length Memory Neural Networks in short-term |
Non-Patent Citations (1)
Title |
---|
Degraded image restoration with neural networks based on improved simulated annealing; Pan Meisen; Computer Engineering and Design; 2006-12-31; Vol. 27, No. 24; pp. 1-4 *
Also Published As
Publication number | Publication date |
---|---|
CN110287483A (en) | 2019-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738004B (en) | Named entity recognition model training method and named entity recognition method | |
CN108124477B (en) | Improving word segmenters to process natural language based on pseudo data | |
JP2022028887A (en) | Method, apparatus, electronic device and storage medium for correcting text errors | |
CN108804423B (en) | Medical text feature extraction and automatic matching method and system | |
CN110287483B (en) | Unregistered word recognition method and system utilizing five-stroke character root deep learning | |
CN111310441A (en) | Text correction method, device, terminal and medium based on BERT (binary offset transcription) voice recognition | |
CN105512110B (en) | A kind of wrongly written character word construction of knowledge base method based on fuzzy matching with statistics | |
CN111599340A (en) | Polyphone pronunciation prediction method and device and computer readable storage medium | |
CN111666758A (en) | Chinese word segmentation method, training device and computer readable storage medium | |
CN110807335A (en) | Translation method, device, equipment and storage medium based on machine learning | |
CN114036950A (en) | Medical text named entity recognition method and system | |
Wu et al. | A multimodal attention fusion network with a dynamic vocabulary for TextVQA | |
CN111881256B (en) | Text entity relation extraction method and device and computer readable storage medium equipment | |
CN114912450B (en) | Information generation method and device, training method, electronic device and storage medium | |
CN112434520A (en) | Named entity recognition method and device and readable storage medium | |
CN115935959A (en) | Method for labeling low-resource glue word sequence | |
CN111597815A (en) | Multi-embedded named entity identification method, device, equipment and storage medium | |
JP2018206262A (en) | Word linking identification model learning device, word linking detection device, method and program | |
CN112632956A (en) | Text matching method, device, terminal and storage medium | |
CN115906854A (en) | Multi-level confrontation-based cross-language named entity recognition model training method | |
CN116029300A (en) | Language model training method and system for strengthening semantic features of Chinese entities | |
CN115358227A (en) | Open domain relation joint extraction method and system based on phrase enhancement | |
CN114298032A (en) | Text punctuation detection method, computer device and storage medium | |
Yadav et al. | Image Processing-Based Transliteration from Hindi to English | |
CN112966510A (en) | Weapon equipment entity extraction method, system and storage medium based on ALBERT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |