CN107844473B - Word sense disambiguation method based on context similarity calculation - Google Patents

Word sense disambiguation method based on context similarity calculation

Info

Publication number
CN107844473B
CN107844473B
Authority
CN
China
Prior art keywords
speech
word
disambiguated
context
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710876243.5A
Other languages
Chinese (zh)
Other versions
CN107844473A (en)
Inventor
周俏丽
孟禹光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Aerospace University
Original Assignee
Shenyang Aerospace University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Aerospace University filed Critical Shenyang Aerospace University
Priority to CN201710876243.5A priority Critical patent/CN107844473B/en
Publication of CN107844473A publication Critical patent/CN107844473A/en
Application granted granted Critical
Publication of CN107844473B publication Critical patent/CN107844473B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates

Abstract

The invention relates to a word sense disambiguation method based on context similarity calculation, comprising the following steps: processing the training corpus and training a model on the part-of-speech-tagged version of ukWaC; screening parts of speech, keeping only the real words: nouns, adjectives, adverbs and verbs; training a bidirectional LSTM model on the part-of-speech-screened corpus; inputting the example sentences of the word to be disambiguated into the bidirectional LSTM model to obtain their context vectors; inputting the context of the word to be disambiguated into the bidirectional LSTM model to obtain its context vector; computing the cosine similarity between the context vector of the word to be disambiguated and the context vectors of the example sentences, and selecting the sense of the word to be disambiguated with a k-nearest-neighbor method according to the similarity results. The invention models word senses more accurately: by joining each word directly to its part of speech with an underscore, it obtains word vectors that distinguish the different parts of speech of the same word, and it improves disambiguation accuracy by 0.5% over the baseline experiment.

Description

Word sense disambiguation method based on context similarity calculation
Technical Field
The invention relates to natural language translation technology, and in particular to a word sense disambiguation method based on context similarity calculation.
Background
Word sense disambiguation (WSD) is a long-standing problem with wide application. Current approaches fall into three categories: supervised, unsupervised and knowledge-based. Although published supervised word sense disambiguation systems perform well when given large-scale, sense-specific training corpora, the lack of large-scale annotated corpora is a major problem. This problem can be alleviated to some extent with pre-trained word vectors: word vectors trained in advance on a large-scale corpus contain rich semantic and syntactic information, so training a supervised system with them improves performance. To infer the sense of a word in a sentence, both the target word and its context must be represented explicitly. The context is defined as the part of the sentence that remains after the word to be disambiguated is removed. To compute context similarity, the context must also be represented as a vector.
In previous disambiguation work, the context was often represented simply by summing or weighted-averaging the word vectors in a window around the target word. A context representation built this way carries very limited information, because the target word is inherently tied to its sentence as a whole. To infer word senses in a sentence, both the target word vector and the context vector need to carry information about the entire sentence. A common drawback of many current disambiguation systems is that they ignore word order. The LSTM (Long Short-Term Memory network), and especially the bidirectional LSTM, overcomes this defect: it can model all the words around the target word while taking word order into account. However, a bidirectional LSTM that models the different parts of speech of a word as a single point is not very accurate, since the same word has different meanings when its parts of speech differ. A context2vec model trained without part-of-speech information treats words with different parts of speech as the same word, so a word that should have multiple senses is represented by only one vector in the semantic space.
Disclosure of Invention
To address the poor disambiguation accuracy caused by prior-art word sense disambiguation that models the different parts of speech of a word as a single point, the invention aims to provide a word sense disambiguation method based on context similarity calculation that distinguishes the different parts of speech of the same word and improves disambiguation accuracy.
To solve the above technical problem, the invention adopts the following technical scheme:
the invention relates to a word sense disambiguation method based on context similarity calculation, which comprises the following steps:
1) processing training corpora, and training a model by using a part-of-speech tagging version of ukWaC;
2) screening parts of speech, and only keeping real words including nouns, adjectives, adverbs and verbs;
3) training a bidirectional LSTM model by using the corpus with the screened part of speech;
4) inputting example sentences of words to be disambiguated into a bidirectional LSTM model to obtain context vectors;
5) inputting the context of the word to be disambiguated into a bidirectional LSTM model to obtain a context vector of the word to be disambiguated;
6) calculating cosine similarity of the context vector of the word to be disambiguated and the context vector of the example sentence, and further selecting the semantics of the word to be disambiguated by using a k-nearest neighbor method according to the obtained similarity result.
In step 1), training the model with the part-of-speech-tagged version of ukWaC means: the parts of speech are tagged automatically by TreeTagger, each part of speech is joined to its word, and the part-of-speech information is thereby added to the model during training.
In step 3), the bidirectional LSTM model is trained on the part-of-speech-screened corpus, using only sentences of at most 64 words.
In step 4), the example sentences of the word to be disambiguated are input into the bidirectional LSTM model, and their context vectors are compared by similarity with the context vector of the word to be disambiguated, so as to select the example sentence closest to the word to be disambiguated.
In step 1), three different part-of-speech tagging schemes were originally used to train models: fine-grained part-of-speech tags, coarse-grained part-of-speech tags, and part-of-speech tags on real words only. The real-words-only scheme maps the original part-of-speech tags, no longer considers the remaining parts of speech, joins each word to its part of speech in the form word_pos as a new word, and then trains the model.
The real-words-only tagging scheme maps the original part-of-speech tags by selecting the 4 real-word part-of-speech classes, mapping the fine-grained tags of these 4 classes to the corresponding class, and removing the remaining part-of-speech features from the corpus.
The invention has the following beneficial effects and advantages:
1. The method first analyzes the influence of words of different parts of speech on the semantics and selects the parts of speech with the greatest influence for tagging. This models the semantics better: each word is joined directly to its part of speech with an underscore, treated as one word, and input into the context2vec model for training, and the resulting word vectors distinguish the different parts of speech of the same word well.
2. Disambiguating with a model trained on corpora tagged by this method improves disambiguation accuracy by 0.5% over the baseline experiment.
Drawings
FIG. 1 illustrates how points in the semantic space change after parts of speech are added in the present invention;
FIG. 2 is a diagram of the context2vec model with parts of speech added in the present invention.
Detailed Description
The invention is further elucidated with reference to the accompanying drawings.
The invention relates to a word sense disambiguation method based on context similarity calculation, comprising the following steps:
1) processing the training corpus, training a model on the part-of-speech-tagged version of ukWaC;
2) screening parts of speech, keeping only the real words: nouns, adjectives, adverbs and verbs;
3) training a bidirectional LSTM model on the part-of-speech-screened corpus;
4) inputting the example sentences of the word to be disambiguated into the bidirectional LSTM model to obtain their context vectors;
5) inputting the context of the word to be disambiguated into the bidirectional LSTM model to obtain its context vector;
6) computing the cosine similarity between the context vector of the word to be disambiguated and the context vectors of the example sentences, and selecting the sense of the word to be disambiguated with a k-nearest-neighbor method according to the similarity results.
Part of speech is very important semantic and syntactic information. Because the ukWaC corpus used to train the context vector model provides a part-of-speech-tagged version, the invention joins each word directly to its part of speech with an underscore, treats the result as one word, and inputs it into the context2vec model (as shown in FIG. 2) for training, so that the resulting word vectors distinguish the different parts of speech of the same word.
As shown in FIG. 1, before parts of speech are added, the word "plane" has 3 forms: verb base form, singular noun and adjective. Treating these 3 different semantic-syntactic roles as the same point is clearly unreasonable. After parts of speech are added, each part of speech can be modeled separately, and the adjective, singular-noun and verb-base-form readings of "plane" are redistributed in the space to points of their own senses. In this way, semantic and syntactic information is captured better.
In step 1), comparing the disambiguation results of models trained with 3 different part-of-speech tagging schemes, namely fine-grained tags, coarse-grained tags, and tags on real words only, showed that the model trained with the real-words-only scheme disambiguates best. The original tag set subdivides each broad part-of-speech category into several tags. The invention selects only the 4 real-word classes, maps the fine-grained tags of the original tag set to the corresponding class, removes the remaining part-of-speech features from the corpus, then joins each word to its part of speech in the form word_pos as a new word, and trains the model.
In this embodiment, a part-of-speech-tagged version of ukWaC containing 2 billion words is used to train the model. The parts of speech are tagged automatically by TreeTagger, and each part of speech is joined to its word with "_"; for example, "apple" tagged as a noun is written as apple_nn and constitutes a new word. In this way, the part-of-speech information is added to the model during training, and the context vector of the word to be disambiguated obtained from the model includes part-of-speech information.
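The preprocessing just described can be sketched as follows; this is a minimal illustration in Python, and the helper name and the (word, tag) input format are assumptions standing in for the actual TreeTagger pipeline:

    # A minimal sketch of the word_pos preprocessing described above.
    # The (word, tag) input format is an assumption standing in for
    # TreeTagger output; the tagger itself is not invoked here.
    def combine_word_and_pos(tagged_sentence):
        """Join each word to its part-of-speech tag with '_',
        so that e.g. ('apple', 'NN') becomes the new token 'apple_nn'."""
        return [f"{word}_{tag.lower()}" for word, tag in tagged_sentence]

    print(combine_word_and_pos([("apple", "NN"), ("is", "VBZ")]))
    # ['apple_nn', 'is_vbz']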
In step 2), only the real words are kept when screening parts of speech: nouns, adjectives, adverbs and verbs. This step reduces the influence on model quality of words that contribute little to the semantics and whose part-of-speech tags are error-prone.
As the word sense disambiguation data set, the invention adopts the 2004 Senseval-3 lexical sample dataset, which contains 7860 training samples and 3944 test samples. This data set was used for parameter tuning and for testing disambiguation accuracy.
First, the training corpus is part-of-speech tagged; in the tagged corpus only the tags below are kept, and words with other tags are kept in their original form.
TABLE 1 Part-of-speech tags of the real words
    Part of speech   Original tags                  Mapped tag
    Noun             NN, NNS, NP, NPS               NN
    Adjective        JJ, JJR, JJS                   JJ
    Adverb           RB, RBR, RBS                   RB
    Verb             VV, VVD, VVG, VVN, VVP, VVZ    VV
For example: it is the half a late pitch, with bicycle-type handlebars and a squirting lever at the rear, while the which you step on to activate It.
In the corpus originally used to train the bidirectional LSTM, the invention first turns the sentence into its fully tagged form:
It_pp is_vbz quite_rb a_dt hefty_jj spade_nn ,_, with_in bicycle_nn -_: type_nn handlebars_nns and_cc a_dt sprung_vvn lever_nn at_in the_dt rear_nn ,_, which_wdt you_pp step_vvp on_in to_to activate_vv it_pp ._sent
Only the part-of-speech tags appearing in Table 1 are retained; other words keep their original form. After filtering, the above sentence becomes:
It is quite_rb a hefty_jj spade_nn, with bicycle_nn-type_nn handlebars_nn and a sprung_vv lever_nn at the rear_nn, which you step_vv on to activate_vv it.
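This screening step can be sketched as follows; the TAG_MAP dictionary follows Table 1, while the helper name and the (word, tag) input format are assumptions:

    # A minimal sketch of the real-word screening of Table 1.
    # Fine-grained tags of the 4 real-word classes are mapped to their
    # coarse class; all other words keep their original, untagged form.
    TAG_MAP = {
        "NN": "nn", "NNS": "nn", "NP": "nn", "NPS": "nn",   # nouns
        "JJ": "jj", "JJR": "jj", "JJS": "jj",               # adjectives
        "RB": "rb", "RBR": "rb", "RBS": "rb",               # adverbs
        "VV": "vv", "VVD": "vv", "VVG": "vv",               # verbs
        "VVN": "vv", "VVP": "vv", "VVZ": "vv",
    }

    def screen_real_words(tagged_sentence):
        """Keep word_pos units for real words only; other words stay bare."""
        out = []
        for word, tag in tagged_sentence:
            coarse = TAG_MAP.get(tag)
            out.append(f"{word}_{coarse}" if coarse else word)
        return " ".join(out)

    print(screen_real_words([("quite", "RB"), ("a", "DT"), ("hefty", "JJ")]))
    # quite_rb a hefty_jj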
In step 3), the bidirectional LSTM model is trained on the part-of-speech-screened corpus. To speed up training and to facilitate comparison with the baseline experiment, sentences longer than 64 words are not used, which reduces the corpus size by 10%.
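As a sketch, the length filter itself is a one-liner; counting tokens by whitespace is an assumption, since the patent only speaks of sentence length in words:

    # A minimal sketch: drop training sentences longer than 64 tokens.
    MAX_LEN = 64
    corpus_sentences = ["It is quite_rb a hefty_jj spade_nn ..."]  # stand-in data
    train_sentences = [s for s in corpus_sentences if len(s.split()) <= MAX_LEN]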
In step 4), the example sentences of the word to be disambiguated are input into the bidirectional LSTM model to obtain context vectors. These are used to compute similarity with the context vector of the word to be disambiguated, so as to select the example sentence closest to the context of the word to be disambiguated; the sense corresponding to that example sentence is the true sense of the word to be disambiguated.
The sentence containing the word to be disambiguated and the example sentences containing it are processed in the same way as above. For example, take a sentence whose tagged form ends "... to activate_vv the alarm_nn", where activate is the word to be disambiguated. The token activate_vv in the sentence is replaced with "[ ]", and the sentence is input into the previously trained bidirectional LSTM model to obtain its context vector v0.
The word to be disambiguated has example sentences S1, ..., Sn. The word to be disambiguated in each example sentence is replaced with "[ ]", and each sentence is input into the bidirectional LSTM model, yielding n context vectors v1, ..., vn. The cosine similarity between v0 and each of v1, ..., vn is computed, giving n values; the 5 largest are taken, and the corresponding sentences are assumed to be Sx1, Sx2, Sx3, Sx4, Sx5. Each sentence corresponds to a sense, and the sense that occurs most often among these 5 sentences is the sense of the word to be disambiguated.
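The similarity computation and sense selection of steps 5) and 6) can be sketched as follows; context_vector is a hypothetical wrapper around the trained bidirectional LSTM and is not reproduced here, and the remaining names are illustrative:

    # A minimal sketch of the cosine-similarity / k-nearest-neighbor
    # sense selection. `context_vector` is a hypothetical wrapper around
    # the trained bidirectional LSTM: it takes a preprocessed sentence in
    # which the target word has been replaced by "[ ]" and returns the
    # sentence's context vector.
    from collections import Counter
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def disambiguate(target_sentence, examples, context_vector, k=5):
        """examples: (example_sentence, sense) pairs, each sentence already
        preprocessed with the word to be disambiguated replaced by '[ ]'."""
        v0 = context_vector(target_sentence)
        scored = [(cosine(v0, context_vector(s)), sense) for s, sense in examples]
        top_k = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
        # Majority vote over the senses of the k most similar example sentences.
        return Counter(sense for _, sense in top_k).most_common(1)[0][0]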
To find the most effective way of introducing part-of-speech features, the invention trains models with 3 different part-of-speech tagging schemes: fine-grained part-of-speech tags, coarse-grained part-of-speech tags, and part-of-speech tags on real words only.
TABLE 2 Comparison of context-based target word prediction and disambiguation results
Comparing the results in Table 2, although the disambiguation accuracy of number 1 is higher than that of number 2, the target words predicted from its contexts fit neither the semantics nor even the grammar (after parts of speech are added, the part of speech is predicted together with the word; for comparability these are not listed here). After parts of speech are added, number 2 predicts the target words well, which shows that adding parts of speech lets the model play its role better. However, although context-based word prediction works better after parts of speech are added, the accuracy on the word sense disambiguation task does not reach the level achieved before they were added; the reason is probably that the TreeTagger tag set used in this embodiment contains too many part-of-speech tags.
Comparing the results in Table 2 further, number 3, which adds the coarse-grained part-of-speech tags, does not reach the disambiguation performance of number 1, but its context-predicted words fit the semantics and grammar better. Comparing numbers 2 and 3, both the disambiguation accuracy and the context-predicted words are very close. This is because both tagging schemes also tag the function words. Inspecting the corpus shows, for example, that some occurrences of "that" which should be tagged DT are tagged IN, and that "upon", which should be tagged IN, is tagged RP. Although the function words are few in kind, each occurs very frequently in the training corpus, and they generally form the frame of a sentence; if they are tagged incorrectly, they exert a larger influence on the semantic space.
The results in Table 2 also show that number 4 is better than the models trained with the previous 3 schemes, both in predicting the target word and in disambiguation. Comparing the two indicators leads to the conclusion that training the context2vec model on the corpus tagged with the real-words-only scheme improves model performance.
The above results were all obtained with a single k value and are not necessarily representative. To further demonstrate their representativeness, the models of numbers 4 and 1 were compared using different k values (1 to 10); the disambiguation result is essentially improved under every k value, which accords with the conclusion drawn above.
This example is also compared with other systems disambiguating on SE-3, with the results shown in Table 3. The second-best result, S-2, was achieved by Rothe and Schütze (2015), and the previous best result, S-1, by Ando (2006); the present method improves on the best result by 1.2% with a much simpler procedure. S-3 is the result of the context2vec model without parts of speech, using the k-nearest-neighbor method with k = 1. Ours-1 is the result obtained in this example with k = 5 using the context2vec model without parts of speech, and Ours-2 adds the part-of-speech features to Ours-1 with the same k value. It can be seen that disambiguation accuracy improves by 0.5% after the part-of-speech features are added.
In this section, the invention tags the corpus and trains models with 3 different part-of-speech tagging schemes. The first two do not reach the accuracy achieved before parts of speech were added; the analysis attributes this to the function words, which occur very frequently in the training corpus and are the main components of the sentence frame, so that their part-of-speech tagging errors strongly affect the modeling, while the senses expressed by their different parts of speech are often the same. The experimental results show that the real-words-only tagging scheme achieves a better effect than the model without parts of speech, and the results on the same test set show that the accuracy of the patented method improves by nearly 2% over the best previously published result.
TABLE 3 Results of different systems on the SE-3 test set
TABLE 4 Improvement on different systems after adding parts of speech

Claims (3)

1. A word sense disambiguation method based on context similarity calculation, characterized by comprising the following steps:
1) processing the training corpus, training a model on the part-of-speech-tagged version of ukWaC;
2) screening parts of speech, keeping only the real words: nouns, adjectives, adverbs and verbs;
3) training a bidirectional LSTM model on the part-of-speech-screened corpus;
4) inputting the example sentences of the word to be disambiguated into the bidirectional LSTM model to obtain their context vectors;
5) inputting the context of the word to be disambiguated into the bidirectional LSTM model to obtain its context vector;
6) computing the cosine similarity between the context vector of the word to be disambiguated and the context vectors of the example sentences, and selecting the sense of the word to be disambiguated with a k-nearest-neighbor method according to the similarity results;
in step 1), the model is trained with a part-of-speech tagging scheme that uses real words only: the original part-of-speech tags are mapped, the remaining parts of speech are not considered, each word is joined to its part of speech in the form word_pos and treated as a new word, and the model is then trained;
the real-words-only tagging scheme maps the original part-of-speech tags by selecting the 4 real-word part-of-speech classes, mapping the fine-grained tags of these 4 classes to the corresponding class, and removing the remaining part-of-speech features from the corpus;
in step 1), training the model with the part-of-speech-tagged version of ukWaC means: the parts of speech are tagged automatically by TreeTagger, each part of speech is joined to its word, and the part-of-speech information is thereby added to the model during training; the adverb RB, the comparative adverb RBR and the superlative adverb RBS are mapped to the adverb RB; the plural common noun NNS, the singular common noun NN, the singular proper noun NP and the plural proper noun NPS are mapped to the singular common noun NN; the superlative adjective JJS, the comparative adjective JJR and the adjective JJ are mapped to the adjective JJ; and the verb base form VV, the verb past tense VVD, the verb gerund or present participle VVG, the verb past participle VVN, the verb present tense other than third person singular VVP, and the verb present tense third person singular VVZ are mapped to the verb base form VV.
2. The word sense disambiguation method based on context similarity calculation of claim 1, characterized in that in step 3) the bidirectional LSTM model is trained on the part-of-speech-screened corpus, using sentences whose length is at most 64 words.
3. The word sense disambiguation method based on context similarity calculation of claim 1, characterized in that in step 4) the example sentences of the word to be disambiguated are input into the bidirectional LSTM model and their similarity with the context vector of the word to be disambiguated is calculated, so as to select the example sentence closest to the context of the word to be disambiguated.
CN201710876243.5A 2017-09-25 2017-09-25 Word sense disambiguation method based on context similarity calculation Active CN107844473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710876243.5A CN107844473B (en) 2017-09-25 2017-09-25 Word sense disambiguation method based on context similarity calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710876243.5A CN107844473B (en) 2017-09-25 2017-09-25 Word sense disambiguation method based on context similarity calculation

Publications (2)

Publication Number Publication Date
CN107844473A CN107844473A (en) 2018-03-27
CN107844473B true CN107844473B (en) 2020-12-18

Family

ID=61661705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710876243.5A Active CN107844473B (en) 2017-09-25 2017-09-25 Word sense disambiguation method based on context similarity calculation

Country Status (1)

Country Link
CN (1) CN107844473B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622311A (en) * 2017-10-09 2018-01-23 深圳市唯特视科技有限公司 A kind of robot learning by imitation method based on contextual translation
CN109697292B (en) * 2018-12-17 2023-04-21 北京百度网讯科技有限公司 Machine translation method, device, electronic equipment and medium
CN111444676A (en) * 2018-12-28 2020-07-24 北京深知无限人工智能研究院有限公司 Part-of-speech tagging method, device, equipment and storage medium
CN109753569A (en) * 2018-12-29 2019-05-14 上海智臻智能网络科技股份有限公司 A kind of method and device of polysemant discovery
CN110705295B (en) * 2019-09-11 2021-08-24 北京航空航天大学 Entity name disambiguation method based on keyword extraction
CN111259655B (en) * 2019-11-07 2023-07-18 上海大学 Logistics intelligent customer service problem similarity calculation method based on semantics
CN111310475B (en) * 2020-02-04 2023-03-10 支付宝(杭州)信息技术有限公司 Training method and device of word sense disambiguation model
CN112949319B (en) * 2021-03-12 2023-01-06 江南大学 Method, device, processor and storage medium for marking ambiguous words in text

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243129A (en) * 2015-09-30 2016-01-13 清华大学深圳研究生院 Commodity property characteristic word clustering method
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7197457B2 (en) * 2003-04-30 2007-03-27 Robert Bosch Gmbh Method for statistical language modeling in speech recognition

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243129A (en) * 2015-09-30 2016-01-13 清华大学深圳研究生院 Commodity property characteristic word clustering method
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
context2vec: Learning Generic Context Embedding with Bidirectional LSTM; Oren Melamud et al.; Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning; 2016-01-31; pp. 2-12 *
Word sense disambiguation research: resources, methods and evaluation; Wu Yunfang; Contemporary Linguistics; 2009-02-28; vol. 11, no. 2; pp. 113-122 *

Also Published As

Publication number Publication date
CN107844473A (en) 2018-03-27

Similar Documents

Publication Publication Date Title
CN107844473B (en) Word sense disambiguation method based on context similarity calculation
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN103678684B (en) A kind of Chinese word cutting method based on navigation information retrieval
CN107818085B (en) Answer selection method and system for reading understanding of reading robot
WO2019080863A1 (en) Text sentiment classification method, storage medium and computer
Adler et al. An unsupervised morpheme-based HMM for Hebrew morphological disambiguation
CN111125349A (en) Graph model text abstract generation method based on word frequency and semantics
CN107220232A (en) Keyword extracting method and device, equipment and computer-readable recording medium based on artificial intelligence
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN106503192A (en) Name entity recognition method and device based on artificial intelligence
CN109388743B (en) Language model determining method and device
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN106570180A (en) Artificial intelligence based voice searching method and device
CN110704621A (en) Text processing method and device, storage medium and electronic equipment
CN111475622A (en) Text classification method, device, terminal and storage medium
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN111666758A (en) Chinese word segmentation method, training device and computer readable storage medium
CN109086340A (en) Evaluation object recognition methods based on semantic feature
CN109033320A (en) A kind of bilingual news Aggreagation method and system
CN109062904A (en) Logical predicate extracting method and device
CN110633467A (en) Semantic relation extraction method based on improved feature fusion
CN115017903A (en) Method and system for extracting key phrases by combining document hierarchical structure with global local information
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant