CN114896394B - Event trigger word detection and classification method based on multilingual pre-training model - Google Patents
- Publication number
- CN114896394B (application CN202210404007.4A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- vector
- language
- word
- cross
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/49—Data-driven translation using very large corpora, e.g. the web
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of natural language processing, and in particular to an event trigger word detection and classification method based on a multilingual pre-training model. The method reduces the workload of manually annotating events and is easy to extend to other event types. Because a multilingual pre-training model is used for encoding, sentences in multiple languages can be mined and predicted by similarity against a single vector pool, which reduces the work of training a separate model for each language. In the preparation stage the vector pool is enriched with external news corpora in resource-rich languages such as Chinese and English, which avoids the sparse vector pool that would result from relying only on corpora in low-resource languages.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an event trigger word detection and classification method based on a multilingual pre-training model.
Background
With the progress of science and technology, the number of internet users has grown and the volume of news on the internet has grown explosively. How to extract useful information from a large number of complex news events is an active research topic. Researchers have proposed information extraction techniques, which extract structured information from unstructured text. Event extraction is one of the most challenging tasks in information extraction. It studies how to extract the basic information of an event from unstructured text and comprises four subtasks: event trigger word detection, event trigger word classification, event element identification, and event element role identification.
Existing event trigger word detection and classification methods mostly target a single language and rarely handle multiple languages; most study resource-rich languages, and few study low-resource languages. As deep learning develops, event extraction for low-resource languages is becoming increasingly important. Moreover, existing event extraction models often depend heavily on labor-intensive, domain-specific annotation, are effective only for the label set and domain used during training, and cannot be transferred effectively to labels in other domains.
Disclosure of Invention
The invention aims to provide an event trigger word detection and classification method based on a multilingual pre-training model, so as to solve the problems in the prior art that event trigger word detection and classification models are suitable only for a single language and require a large annotation workload.
To achieve the above purpose, the invention provides an event trigger word detection and classification method based on a multilingual pre-training model, comprising the following steps:
using a word vector model to obtain synonyms of the event trigger words and synonyms of the event elements, thereby obtaining a first set and a second set respectively;
defining the first set as first anchor words and the second set as second anchor words;
mining an external news corpus around the first anchor words and the second anchor words respectively to obtain a first sentence set and a second sentence set containing the anchor words;
defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences;
vector-encoding the first anchor sentences and the second anchor sentences respectively to obtain first cross-language sentence vectors and second cross-language sentence vectors;
storing the first cross-language sentence vectors and the second cross-language sentence vectors in a vector pool;
performing semantic annotation on the sentence to be predicted using a semantic role labeling tool;
encoding the words obtained by semantic annotation of the sentence to be predicted using a multilingual pre-training model to obtain cross-language word vectors;
and comparing the cross-language word vectors with the first cross-language sentence vectors and the second cross-language sentence vectors in the vector pool, the entry with the highest similarity being the prediction result.
Using a word vector model to obtain synonyms of the event trigger words and synonyms of the event elements, thereby obtaining a first set and a second set respectively, comprises:
using a Word2Vec word vector model to search for synonyms of the predefined event trigger words and of the predefined event elements;
and manually screening the results to obtain the first set and the second set.
Vector-encoding the first anchor sentences and the second anchor sentences respectively to obtain first cross-language sentence vectors and second cross-language sentence vectors comprises:
when encoding for event trigger words, first segmenting the first anchor sentence into words, then feeding the segmented sentence into the pre-training model to obtain a set of weighted word vectors, and then summing all word vectors and dividing by the number of words in the sentence to obtain the first cross-language sentence vector;
when encoding for event elements, first segmenting the whole second anchor sentence into words, then replacing the event element in the sentence with [MASK], then feeding the result into the pre-training model to obtain a set of weighted word vectors, and then summing the word vectors and dividing by the number of words to obtain the second cross-language sentence vector.
Performing semantic annotation on the sentence to be predicted using a semantic role labeling tool comprises:
applying semantic role labeling appropriate to the language of the input sentence to be predicted.
Comparing the cross-language word vectors with the first cross-language sentence vectors and the second cross-language sentence vectors in the vector pool comprises:
obtaining the cross-language word vector of the event trigger word and the cross-language word vector of the event element in the sentence to be predicted;
calculating the prediction score of the event trigger word vector or the event element word vector of the sentence to be predicted;
ranking the resulting prediction scores by similarity, the word with the highest similarity being the event trigger word or the event element.
The beneficial effects of the invention are as follows. The workload of manually annotating events is reduced, and the method is easy to extend to other event types. Because a multilingual pre-training model is used for encoding, sentences in multiple languages can be mined and predicted by similarity against a single vector pool, which reduces the work of training a separate model for each language. Because the model is cross-language, the vector pool can be enriched in the preparation stage with external news corpora in resource-rich languages such as Chinese and English, which avoids the sparse vector pool that would result from relying only on news corpora in low-resource languages.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for detecting and classifying multilingual event trigger words based on a pre-training model.
FIG. 2 is a schematic diagram of the preparation stage of a multilingual event trigger word detection and classification method based on a pre-training model according to the present invention.
FIG. 3 is a schematic diagram of a prediction stage of a multi-language event trigger word detection and classification method based on a pre-training model according to the present invention.
Detailed Description
Referring to FIGS. 1 to 3, FIG. 1 is a flowchart of an event trigger word detection and classification method based on a multilingual pre-training model according to an embodiment of the present invention. Specifically, as shown in FIG. 1, the method may include the following steps:
S101, using a word vector model to obtain synonyms of the event trigger words and synonyms of the event elements, thereby obtaining a first set and a second set respectively.
Specifically, in the preparation stage, a Word2Vec word vector model is used to search for synonyms of the predefined event trigger words and of the predefined event elements, and the results are manually screened to obtain the first set and the second set.
The word vector model is trained on a large-scale corpus and represents word meanings as real-valued vectors.
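The synonym search in S101 can be pictured as a nearest-neighbor lookup in the word vector space. The following is a minimal sketch under stated assumptions, not the patent's implementation: the toy three-dimensional vectors, the function names `cosine` and `synonym_candidates`, and the `top_n` parameter are all hypothetical, and a real run would load vectors from a trained Word2Vec model instead of a hand-written table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def synonym_candidates(seed, embeddings, top_n=3):
    """Rank every other word in the table by similarity to the seed word."""
    seed_vec = embeddings[seed]
    scored = [(word, cosine(seed_vec, vec))
              for word, vec in embeddings.items() if word != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
toy_embeddings = {
    "attack":  [0.90, 0.10, 0.00],
    "assault": [0.85, 0.15, 0.05],
    "strike":  [0.80, 0.20, 0.10],
    "banana":  [0.00, 0.10, 0.95],
}

# Candidates for the predefined trigger word "attack"; a human reviewer
# would then keep only the true synonyms as anchor words.
candidates = synonym_candidates("attack", toy_embeddings, top_n=2)
print(candidates)
```

The manual screening step stays outside the code on purpose: nearest neighbors in a word vector space often include related words that are not synonyms, which is presumably why the method adds a human filter.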
S102, defining the first set as a first anchor word and defining the second set as a second anchor word.
S103, mining the external news corpus around the first anchor words and the second anchor words respectively to obtain a first sentence set and a second sentence set containing the anchor words, and defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences.
Specifically, an external news corpus is introduced, and news sentences containing the trigger words and event elements are selected from it as anchor sentences in preparation for the subsequent steps.
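As an illustration of the corpus mining step, the sketch below collects, for each anchor word, the corpus sentences that contain it. It is only a simplified sketch: the function name `mine_anchor_sentences`, the sample sentences, and the whitespace tokenization are assumptions for the example, and languages without spaces between words (such as Chinese) would need a proper word segmenter first.

```python
def mine_anchor_sentences(corpus, anchor_words):
    """Map each anchor word to the corpus sentences that contain it."""
    hits = {word: [] for word in anchor_words}
    for sentence in corpus:
        # Whitespace tokenization is an assumption made for this example.
        tokens = set(sentence.lower().split())
        for word in anchor_words:
            if word.lower() in tokens:
                hits[word].append(sentence)
    return hits

# Hypothetical miniature news corpus.
news_corpus = [
    "Rebels attack the northern border town",
    "The company will acquire its rival next year",
    "Police report a second attack near the station",
]
trigger_anchor_words = ["attack", "acquire"]

anchor_sentences = mine_anchor_sentences(news_corpus, trigger_anchor_words)
for word, sentences in anchor_sentences.items():
    print(word, len(sentences))
```

The same function could be run a second time with the event element anchor words to build the second sentence set.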
S104, respectively carrying out vector coding on the first anchor sentence and the second anchor sentence, so as to respectively obtain a first cross-language sentence vector and a second cross-language sentence vector.
Specifically, event trigger words and event elements are encoded with two different methods. For event trigger words, the whole sentence is encoded. For event elements, the element is masked before encoding, as shown in Equation 1. Event trigger words carry important semantics in a sentence, so masking them is not recommended.
When encoding for event trigger words, the first anchor sentence is first segmented into words, the segmented sentence is fed into the pre-training model to obtain a set of weighted word vectors, and all word vectors are then summed and divided by the number of words in the sentence to obtain the first cross-language sentence vector. When encoding for event elements, the whole second anchor sentence is first segmented into words, the event element words in the sentence are replaced with [MASK], the result is fed into the pre-training model to obtain a set of weighted word vectors, and these are summed and divided by the number of words to obtain the second cross-language sentence vector. Equation 2 gives the embedding method for event trigger words and event elements:
where s is the set of word vectors in the sentence. All sentence vectors are stored in the vector pool for use in the prediction stage.
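The two encoding paths described above differ only in whether the event element is replaced by [MASK] before encoding; both end with the same averaging step (sum the word vectors, divide by the word count). The sketch below illustrates this under loud assumptions: `toy_encode` is a hypothetical stand-in that returns one small vector per token and is not a real multilingual pre-training model, and all function names are invented for the example.

```python
def toy_encode(tokens):
    """Hypothetical stand-in for a multilingual encoder: one vector per token."""
    return [[float(len(tok)), float(sum(map(ord, tok)) % 7)] for tok in tokens]

def mean_pool(vectors):
    """Sum the token vectors and divide by the number of tokens."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]

def encode_trigger_sentence(tokens, encode=toy_encode):
    """Trigger path: encode the whole sentence unmasked, then average."""
    return mean_pool(encode(tokens))

def encode_element_sentence(tokens, element_tokens, encode=toy_encode):
    """Element path: replace element tokens with [MASK], encode, then average."""
    masked = ["[MASK]" if tok in element_tokens else tok for tok in tokens]
    return masked, mean_pool(encode(masked))

tokens = ["rebels", "attack", "the", "town"]
trigger_vector = encode_trigger_sentence(tokens)
masked_tokens, element_vector = encode_element_sentence(tokens, {"town"})
print(masked_tokens, trigger_vector, element_vector)
```

With a real contextual encoder, masking changes the vectors of the surrounding tokens as well, so the element sentence vector reflects the context in which the element appears instead of the element's own surface form.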
S105, storing the first cross-language sentence vector and the second cross-language sentence vector into a vector pool.
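One simple way to picture the vector pool is as a mapping from a label to the list of sentence vectors collected for it. The sketch below is an assumption about a convenient data structure, not a layout stated in the source: the class name `VectorPool`, the `kind` values "trigger" and "element", and the labels "Attack" and "Place" are all hypothetical.

```python
from collections import defaultdict

class VectorPool:
    """Stores sentence vectors grouped by (kind, label)."""

    def __init__(self):
        self._vectors = defaultdict(list)

    def add(self, kind, label, vector):
        """Append one sentence vector under the given kind and label."""
        self._vectors[(kind, label)].append(list(vector))

    def get(self, kind, label):
        """Return all vectors stored for a label (empty list if none)."""
        return self._vectors.get((kind, label), [])

    def labels(self, kind):
        """Return the sorted labels that have at least one vector of this kind."""
        return sorted({label for k, label in self._vectors if k == kind})

pool = VectorPool()
pool.add("trigger", "Attack", [0.10, 0.20])   # from a first anchor sentence
pool.add("trigger", "Attack", [0.30, 0.40])
pool.add("element", "Place", [0.50, 0.60])    # from a second anchor sentence
print(pool.labels("trigger"), len(pool.get("trigger", "Attack")))
```

Keeping trigger and element vectors under separate kinds lets the prediction stage compare a candidate word only against the relevant half of the pool.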
S106, performing semantic annotation on the sentence to be predicted using a semantic role labeling tool.
Specifically, in the prediction stage, semantic role labeling appropriate to the language of the input sentence is applied to obtain the labeled sentence.
S107, encoding the words obtained by semantic annotation of the sentence to be predicted using the multilingual pre-training model to obtain cross-language word vectors.
Specifically, the event trigger words and event elements in the sentence to be predicted are encoded by the same method to obtain their word vectors.
S108, comparing the cross-language word vectors with the first cross-language sentence vectors and the second cross-language sentence vectors in the vector pool, the entry with the highest similarity being the prediction result.
The word vector of the event trigger word and the word vector of the event element in the sentence to be predicted are obtained, and the prediction score of the event trigger word vector or the event element word vector is calculated using Equation 3:
where x is the word vector of an event trigger word or event element in the sentence to be predicted, and y is the vector of an event trigger word or event element in the vector pool. Because the vector pool holds a set of vectors for a given event trigger word or event element, y is obtained by Equation 4:
In Equation 4, e is an event trigger word or an event element word. In Equation 3, NN_k denotes the k nearest neighbors. margin is a scoring function defined by the invention; it refines cosine distance and can effectively alleviate the "hubness" phenomenon. margin is defined in Equation 5:
margin(a, b) = a / b    (Equation 5)
The candidate prediction scores are then sorted by similarity score, and the word with the highest similarity is the event trigger word or the event element.
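Equations 3 and 4 are not reproduced in this text, so the following sketch rests on an explicit assumption: that the score takes the commonly used ratio-margin form, in which the cosine similarity of x and y is divided by the average cosine similarity of x and of y to their respective k nearest neighbors. Only margin(a, b) = a / b comes from Equation 5; the function names, the toy vectors, the labels "Attack" and "Acquire", and the use of a single representative vector per label are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_mean_similarity(vec, candidates, k):
    """Mean cosine similarity between vec and its k most similar candidates."""
    sims = sorted((cosine(vec, c) for c in candidates), reverse=True)[:k]
    return sum(sims) / len(sims)

def margin(a, b):
    """Ratio margin, as given in Equation 5."""
    return a / b

def margin_score(x, y, pool_vectors, query_vectors, k=2):
    """Assumed form of the score: cosine(x, y) relative to the neighborhoods of x and y."""
    neighborhood = (knn_mean_similarity(x, pool_vectors, k)
                    + knn_mean_similarity(y, query_vectors, k)) / 2
    return margin(cosine(x, y), neighborhood)

def rank_labels(x, pool, query_vectors, k=2):
    """Score x against each label's representative vector and sort best first."""
    pool_vectors = list(pool.values())
    scores = {label: margin_score(x, y, pool_vectors, query_vectors, k)
              for label, y in pool.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical pool with one representative vector per label.
pool = {"Attack": [0.9, 0.1], "Acquire": [0.1, 0.9]}
query_vectors = [[0.8, 0.2], [0.2, 0.7]]   # candidate words in the sentence to be predicted

ranked = rank_labels(query_vectors[0], pool, query_vectors)
print(ranked[0][0])
```

Dividing by the neighborhood average penalizes "hub" vectors that are close to everything, which is the effect the description attributes to margin. The sketch assumes the neighborhood average is nonzero; production code would need to guard against a zero denominator.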
The invention reduces the workload of manually annotating events and is easy to extend to other event types. Because a cross-language pre-training model is used for encoding, sentences in multiple languages can be mined and predicted by similarity against a single vector pool, which reduces the work of training a separate model for each language. Because the model is cross-language, the vector pool can be enriched in the preparation stage with external news corpora in resource-rich languages such as Chinese and English, which avoids the sparse vector pool that would result from relying only on news corpora in low-resource languages.
Claims (5)
1. An event trigger word detection and classification method based on a multilingual pre-training model, characterized by comprising the following steps:
using a word vector model to obtain synonyms of the event trigger words and synonyms of the event elements, thereby obtaining a first set and a second set respectively;
defining the first set as first anchor words and the second set as second anchor words;
mining an external news corpus around the first anchor words and the second anchor words respectively to obtain a first sentence set and a second sentence set containing the anchor words;
defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences;
performing cross-language vector encoding on the first anchor sentences and the second anchor sentences respectively to obtain first cross-language sentence vectors and second cross-language sentence vectors;
storing the first cross-language sentence vectors and the second cross-language sentence vectors in a vector pool;
performing semantic annotation on the sentence to be predicted using a semantic role labeling tool;
encoding the words obtained by semantic annotation of the sentence to be predicted using a multilingual pre-training model to obtain cross-language word vectors;
and comparing the cross-language word vectors with the first cross-language sentence vectors and the second cross-language sentence vectors in the vector pool, the entry with the highest similarity being the prediction result.
2. The event trigger word detection and classification method based on a multilingual pre-training model according to claim 1, wherein using the word vector model to obtain synonyms of the event trigger words and synonyms of the event elements, thereby obtaining the first set and the second set respectively, comprises:
using a Word2Vec word vector model to search for synonyms of the predefined event trigger words and of the predefined event elements;
and manually screening the results to obtain the first set and the second set.
3. The event trigger word detection and classification method based on a multilingual pre-training model according to claim 2, wherein performing cross-language vector encoding on the first anchor sentences and the second anchor sentences using a multilingual model to obtain first cross-language sentence vectors and second cross-language sentence vectors respectively comprises:
when encoding for event trigger words, first segmenting the first anchor sentence into words, then feeding the segmented sentence into the multilingual model to obtain a set of weighted word vectors, and then summing all word vectors and dividing by the number of words in the sentence to obtain the first cross-language sentence vector;
when encoding for event elements, first segmenting the whole second anchor sentence into words, then replacing the event element in the sentence with [MASK], then feeding the result into the multilingual model to obtain a set of weighted word vectors, and then summing the word vectors and dividing by the number of words to obtain the second cross-language sentence vector.
4. The event trigger word detection and classification method based on a multilingual pre-training model according to claim 3, wherein performing semantic annotation on the sentence to be predicted using a semantic role labeling tool comprises:
applying semantic role labeling appropriate to the language of the input sentence to be predicted.
5. The event trigger word detection and classification method based on a multilingual pre-training model according to claim 4, wherein comparing the cross-language word vectors with the first cross-language sentence vectors and the second cross-language sentence vectors in the vector pool, the entry with the highest similarity being the prediction result, comprises:
obtaining the cross-language word vector of the event trigger word and the cross-language word vector of the event element in the sentence to be predicted;
calculating the prediction score of the event trigger word vector or the event element word vector of the sentence to be predicted;
ranking the resulting prediction scores by similarity, the word with the highest similarity being the event trigger word or the event element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210404007.4A CN114896394B (en) | 2022-04-18 | 2022-04-18 | Event trigger word detection and classification method based on multilingual pre-training model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896394A (en) | 2022-08-12 |
CN114896394B (en) | 2024-04-05 |
Family
ID=82718401
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210404007.4A (Active) CN114896394B (en) | 2022-04-18 | 2022-04-18 | Event trigger word detection and classification method based on multilingual pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896394B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213995A (en) * | 2018-08-02 | 2019-01-15 | Harbin Engineering University | Cross-language text similarity assessment technique based on bilingual word embedding |
CN111680488A (en) * | 2020-06-08 | 2020-09-18 | Zhejiang University | Cross-language entity alignment method based on knowledge graph multi-view information |
CN112287695A (en) * | 2020-09-18 | 2021-01-29 | Kunming University of Science and Technology | Chinese-Vietnamese parallel sentence pair extraction method based on cross-language bilingual pre-training and Bi-LSTM |
CN112580330A (en) * | 2020-10-16 | 2021-03-30 | Kunming University of Science and Technology | Vietnamese news event detection method based on Chinese trigger word guidance |
CN113901209A (en) * | 2021-09-15 | 2022-01-07 | Kunming University of Science and Technology | Chinese-Vietnamese cross-language event detection method based on type perception |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527262B2 (en) * | 2007-06-22 | 2013-09-03 | International Business Machines Corporation | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
Non-Patent Citations (2)
Title |
---|
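Tang Liang; Xi Yaoyi; Peng Bo; Liu Xiangwei; Yi Mianzhu. Research on Vietnamese-Chinese cross-language event retrieval based on word vectors. Journal of Chinese Information Processing. 2018, (No. 03), full text. * |
Peng Xiaoya; Zhou Dong. A survey of cross-language word vector research. Journal of Chinese Information Processing. 2020, (No. 02), full text. * |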
Also Published As
Publication number | Publication date |
---|---|
CN114896394A (en) | 2022-08-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||