CN114896394A - Event trigger detection and classification method based on multi-language pre-training model - Google Patents
Event trigger detection and classification method based on multi-language pre-training model
- Publication number
- CN114896394A
- Authority
- CN
- China
- Prior art keywords
- sentence
- language
- vector
- cross
- event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/49—Data-driven translation using very large corpora, e.g. the web
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of natural language processing, and in particular to an event trigger word detection and classification method based on a multi-language pre-training model. The invention effectively reduces the workload of manual event annotation and is easy to extend to other event types. Because a multi-language pre-training model is used for encoding, similarity mining and prediction can be performed across multiple languages in one vector pool, which avoids the work of training a separate model for each language. Because the abundant external news corpora of languages such as Chinese and English are used in the preparation stage, the method also addresses the problem that low-resource languages lack abundant news corpora with which to expand the vector pool.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to an event trigger word detection and classification method based on a multi-language pre-training model.
Background
With the progress of science and technology and the growth in the number of internet users, the volume of news information on the internet has increased explosively. How to extract useful information from a large number of complex news events has become a research hotspot. Researchers have proposed information extraction techniques, which extract structured information from unstructured text. Event extraction is the most challenging task in information extraction. It studies how to extract the basic information of an event from unstructured text, and comprises event trigger word detection, event trigger word classification, event element identification, and event element role identification.
Existing event trigger word detection and classification methods mainly target a single language and rarely target multiple languages; most research addresses resource-rich languages, and little addresses low-resource languages. However, with the development of deep learning technology, event extraction research on low-resource languages has become increasingly important. Moreover, existing event extraction models usually depend on labor-intensive, domain-specific annotation; they are effective only for the label set and domain used in training and cannot be transferred effectively to labels in other domains.
Disclosure of Invention
The invention aims to provide an event trigger word detection and classification method based on a multi-language pre-training model, in order to solve the problems that prior-art event trigger word detection and classification models are suitable only for a single language and require a large annotation workload.
To achieve this aim, the invention provides an event trigger word detection and classification method based on a multi-language pre-training model, which comprises the following steps:
obtaining synonyms of the event trigger words and synonyms of the event elements, respectively, by using a word vector model, so as to obtain a first set and a second set, respectively;
defining the first set as first anchor words and the second set as second anchor words;
mining external news corpora with the first anchor words and the second anchor words as centers, respectively, to obtain a first sentence set and a second sentence set containing the anchor words;
defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences;
vector-encoding the first anchor sentence and the second anchor sentence, respectively, to obtain a first cross-language sentence vector and a second cross-language sentence vector;
storing the first cross-language sentence vector and the second cross-language sentence vector in a vector pool;
performing semantic annotation on the sentence to be predicted by using a semantic role labeling tool;
encoding the words obtained by semantic annotation in the sentence to be predicted by using a multi-language pre-training model to obtain a cross-language word vector;
and comparing the similarity of the cross-language word vector with the first cross-language sentence vector and the second cross-language sentence vector in the vector pool, the candidate with the highest similarity being the prediction result.
Wherein obtaining the synonyms of the event trigger words and the synonyms of the event elements by using the word vector model, so as to obtain the first set and the second set, comprises:
using a Word2Vec word vector model to find synonyms of the predefined event trigger words and synonyms of the event elements;
and manually screening the results to obtain the first set and the second set.
Wherein vector-encoding the first anchor sentence and the second anchor sentence, respectively, to obtain the first cross-language sentence vector and the second cross-language sentence vector comprises:
when encoding for an event trigger word, first segmenting the first anchor sentence into words and inputting the segmented sentence into the pre-training model to obtain a set of weighted word vectors, then summing all the word vectors and dividing the sum by the number of words in the sentence to obtain the first cross-language sentence vector;
when encoding for an event element, segmenting the whole second anchor sentence into words, replacing the event element in the sentence with [MASK], inputting the result into the pre-training model to obtain a set of weighted word vectors, and then summing the word vectors and dividing the sum by their number to obtain the second cross-language sentence vector.
Wherein performing semantic annotation on the sentence to be predicted by using the semantic role labeling tool comprises:
performing semantic role labeling according to the language of the input sentence to be predicted, with different labeling used for different languages.
Wherein comparing the similarity of the cross-language word vector with the first cross-language sentence vector and the second cross-language sentence vector in the vector pool, the candidate with the highest similarity being the prediction result, comprises:
obtaining the cross-language word vectors of the event trigger words and the cross-language word vectors of the event elements in the sentence to be predicted;
calculating a prediction score for each event trigger word vector or event element word vector of the sentence to be predicted;
and sorting the obtained prediction scores by similarity, the word with the highest similarity being the event trigger word or the event element.
The invention has the following beneficial effects: the workload of manual event annotation is effectively reduced, and the method is easy to extend to other event types; because a multi-language pre-training model is used for encoding, similarity mining and prediction can be performed across multiple languages in one vector pool, which avoids the work of training a separate model for each language; and because the cross-language pre-training model allows the abundant external news corpora of languages such as Chinese and English to be used in the preparation stage, the method addresses the problem that low-resource languages lack abundant news corpora with which to expand the vector pool.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for detecting and classifying multilingual event triggers based on a pre-trained model according to the present invention.
FIG. 2 is a schematic diagram of the preparation stage of a multi-language event trigger detection and classification method based on a pre-trained model according to the present invention.
FIG. 3 is a schematic diagram of the prediction stage of a multi-language event trigger detection and classification method based on a pre-training model according to the present invention.
Detailed Description
Referring to fig. 1 to fig. 3, fig. 1 is a flowchart illustrating an event trigger detection and classification method based on a multi-language pre-training model according to an embodiment of the present invention. Specifically, as shown in fig. 1, the method for detecting and classifying event triggers based on the multi-language pre-training model may include the following steps:
S101, obtaining synonyms of the event trigger words and synonyms of the event elements, respectively, by using a word vector model, thereby obtaining a first set and a second set.
Specifically, in the preparation stage, a Word2Vec word vector model is used to find synonyms of the predefined event trigger words and synonyms of the event elements, and the first set and the second set are obtained through manual screening.
The word vector model is trained on a large-scale corpus and represents word meanings as real-valued vectors.
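The synonym search in S101 can be sketched as a nearest-neighbor lookup in word vector space followed by a manual filter. The sketch below is illustrative only: the three-dimensional toy vectors, the function names, and the `approved` set standing in for the human screening step are all assumptions, and a real system would query a trained Word2Vec model instead.

```python
import math

# Toy word vectors (assumption); a trained Word2Vec model would supply these.
TOY_VECTORS = {
    "attack": [0.9, 0.1, 0.0],
    "assault": [0.85, 0.15, 0.05],
    "strike": [0.8, 0.2, 0.1],
    "election": [0.0, 0.9, 0.3],
    "banana": [0.1, 0.0, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(seed, vectors, top_n=2):
    """Return the top_n words most similar to seed, excluding seed itself."""
    scored = [(cosine(vectors[seed], vec), word)
              for word, vec in vectors.items() if word != seed]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]

# Candidate synonyms for a predefined trigger word.
candidates = nearest_words("attack", TOY_VECTORS)

# Manual screening (hypothetical): a reviewer keeps only acceptable candidates.
approved = {"assault", "strike"}
first_set = ["attack"] + [w for w in candidates if w in approved]
print(first_set)
```

The same lookup would be repeated for event element words to build the second set.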
S102, defining the first set as a first anchor word and defining the second set as a second anchor word.
S103, mining the external news corpus with the first anchor words and the second anchor words as centers, respectively, to obtain a first sentence set and a second sentence set containing the anchor words, and defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences.
Specifically, an external news corpus is introduced, and news sentences containing the trigger words and event elements are selected from it as anchor sentences in preparation for the subsequent steps.
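A minimal sketch of the anchor sentence mining step follows, assuming a corpus that is already split into sentences. The function name and the regex-based tokenization are assumptions for illustration; languages without whitespace word boundaries, such as Chinese, would need a word segmenter in place of the regular expression.

```python
import re

def mine_anchor_sentences(corpus, anchor_words):
    """Map each anchor word to the corpus sentences that contain it."""
    result = {word: [] for word in anchor_words}
    for sentence in corpus:
        # Lowercased word tokens; a simplification that suits
        # whitespace-delimited languages only.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        for word in anchor_words:
            if word in tokens:
                result[word].append(sentence)
    return result

corpus = [
    "Rebels attack the northern town.",
    "The assault began at dawn.",
    "Farmers harvest rice in autumn.",
]
anchor_sentences = mine_anchor_sentences(corpus, ["attack", "assault"])
print(anchor_sentences)
```

Sentences that contain no anchor word, such as the third one here, are simply left out of the anchor sentence sets.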
S104, vector-encoding the first anchor sentence and the second anchor sentence, respectively, thereby obtaining a first cross-language sentence vector and a second cross-language sentence vector.
Specifically, the event trigger words and the event elements are encoded by two different methods. For an event trigger word, the whole sentence is encoded as it is. For an event element, the element is first replaced with a mask and the sentence is then encoded, as shown in formula 1. The event trigger word carries important semantics in the sentence, so masking it is not recommended.
When encoding for an event trigger word, the first anchor sentence is first segmented into words and the segmented sentence is input into the pre-training model to obtain a set of weighted word vectors; all the word vectors are then summed and the sum is divided by the number of words in the sentence to obtain the first cross-language sentence vector. When encoding for an event element, the whole second anchor sentence is segmented, the event element words in the sentence are replaced with [MASK], and the result is input into the pre-training model to obtain a set of weighted word vectors; these are summed and the sum is divided by their number to obtain the second cross-language sentence vector. Formula 2 is the word embedding method for the event trigger words and the event elements:
here s is the set of word vectors in a sentence. All sentence vectors are stored in a vector pool for use in the prediction stage.
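The two encoding paths can be sketched as follows. Based on the prose description, the sentence vector appears to be the mean of the word vectors, that is $\frac{1}{|s|}\sum_{w \in s} w$; this reading is an assumption drawn from the text rather than a reproduction of formula 2. The `toy_encoder` function is a purely hypothetical stand-in for the multi-language pre-training model and exists only so that the sketch runs.

```python
MASK = "[MASK]"

def mask_element(tokens, element):
    """Replace every occurrence of the event element with the mask token."""
    return [MASK if tok == element else tok for tok in tokens]

def toy_encoder(tokens):
    """Hypothetical stand-in for the pre-training model: one vector per token.

    Each vector is [token length, vowel count]; a real model would return
    contextual embeddings instead.
    """
    return [[float(len(tok)), float(sum(ch in "aeiou" for ch in tok.lower()))]
            for tok in tokens]

def mean_pool(word_vectors):
    """Sum the word vectors and divide by their number."""
    n = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) / n for i in range(dim)]

tokens = ["rebels", "attack", "town"]

# Trigger path: encode the whole sentence without masking.
trigger_vec = mean_pool(toy_encoder(tokens))

# Element path: mask the event element, then encode and pool.
masked = mask_element(tokens, "town")
element_vec = mean_pool(toy_encoder(masked))
print(trigger_vec, masked, element_vec)
```

Masking the element means the resulting sentence vector reflects the context in which an element of that kind occurs rather than the specific word.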
S105, storing the first cross-language sentence vector and the second cross-language sentence vector into a vector pool.
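One possible layout for the vector pool is a mapping from an anchor word to the list of sentence vectors mined for it, kept separately for triggers and elements. The class below is a hypothetical illustration of that layout; the source does not specify how the pool is stored.

```python
from collections import defaultdict

class VectorPool:
    """Hypothetical vector pool keyed by (kind, anchor word)."""

    def __init__(self):
        self._store = defaultdict(list)

    def add(self, kind, anchor_word, sentence_vector):
        """Store one sentence vector; kind is 'trigger' or 'element'."""
        self._store[(kind, anchor_word)].append(list(sentence_vector))

    def vectors(self, kind, anchor_word):
        """Return all stored vectors for an anchor word (empty if none)."""
        return list(self._store.get((kind, anchor_word), []))

    def keys(self, kind):
        """Return the anchor words stored under the given kind."""
        return sorted(word for k, word in self._store if k == kind)

pool = VectorPool()
pool.add("trigger", "attack", [0.1, 0.2])
pool.add("trigger", "attack", [0.3, 0.4])
pool.add("element", "town", [0.5, 0.6])
print(pool.keys("trigger"), pool.vectors("trigger", "attack"))
```

Keeping a list per anchor word matches the later remark that the pool holds a set of vectors for each event trigger word or event element.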
S106, performing semantic annotation on the sentence to be predicted by using a semantic role labeling tool.
Specifically, in the prediction stage, semantic role labeling appropriate to the language of the input sentence to be predicted is applied, yielding a labeled sentence.
S107, the words obtained by semantic annotation in the sentence to be predicted are encoded by using a multi-language pre-training model to obtain a cross-language word vector.
Specifically, the event trigger words and the event elements in the sentence to be predicted are encoded by the same method to obtain the word vectors.
S108, carrying out similarity comparison on the cross-language word vector and the first cross-language sentence vector and the second cross-language sentence vector in the vector pool, wherein the highest similarity is a prediction result.
The word vector of the event trigger word and the word vector of the event element in the sentence to be predicted are obtained, and the prediction score of the event trigger word vector or the event element word vector of the sentence to be predicted is calculated using formula 3:
here, x is the word vector of an event trigger word or event element in the sentence to be predicted, and y is the vector of an event trigger word or event element in the vector pool. Since the vectors stored in the vector pool for a given event trigger word or event element form a set, y is obtained by formula 4:
In formula 4, e is an event trigger word or an event element word. In formula 3, $NN_k$ denotes the k nearest neighbors, and margin is a scoring function defined by the invention; it is an optimization of cosine distance that can often effectively reduce the "hubness" phenomenon. The definition of margin in the invention is given by formula 5:
$\mathrm{margin}(a, b) = a / b$ (formula 5)
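Formula 3 is not reproduced in this text. The description (k nearest neighbors, a ratio margin over cosine similarity, and reduced hubness) is consistent with the ratio-margin criterion used in margin-based sentence mining, so the sketch below assumes the following form; treat it as a plausible reading, not as the patent's exact formula:

$$\mathrm{score}(x, y) = \mathrm{margin}\left(\cos(x, y),\ \sum_{z \in NN_k(x)} \frac{\cos(x, z)}{2k} + \sum_{z \in NN_k(y)} \frac{\cos(y, z)}{2k}\right)$$

In the code, the neighbors of x are taken from the vector pool and the neighbors of y from the candidate vectors of the sentence to be predicted; that pairing and all function names are assumptions as well.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_knn_cosine(query, neighbors, k):
    """Mean cosine between query and its k most similar neighbors."""
    sims = sorted((cosine(query, n) for n in neighbors), reverse=True)[:k]
    return sum(sims) / len(sims)

def margin(a, b):
    """Formula 5: ratio margin."""
    return a / b

def margin_score(x, y, pool_vectors, candidate_vectors, k=2):
    """Assumed form of formula 3: cosine scaled by neighborhood similarity."""
    # Average of the two k-NN mean similarities; this equals
    # sum/(2k) + sum/(2k) when each side has at least k neighbors.
    # Assumes the denominator is nonzero.
    denom = (mean_knn_cosine(x, pool_vectors, k)
             + mean_knn_cosine(y, candidate_vectors, k)) / 2
    return margin(cosine(x, y), denom)

pool_vectors = [[1.0, 0.0], [0.0, 1.0]]
candidate_vectors = [[1.0, 0.0], [0.0, 1.0]]
good = margin_score([1.0, 0.0], [1.0, 0.0], pool_vectors, candidate_vectors)
bad = margin_score([0.0, 1.0], [1.0, 0.0], pool_vectors, candidate_vectors)
print(good, bad)
```

Dividing by the average neighborhood similarity penalizes vectors that are close to everything, which is the usual motivation for using a margin instead of raw cosine.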
The resulting candidate scores are then sorted by similarity, and the word with the highest similarity is the event trigger word or the event element.
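The final selection can be sketched as a sort over candidate scores. The scores below are made-up illustrative values and the function name is an assumption.

```python
def rank_candidates(scores):
    """Sort (word, score) pairs from highest to lowest similarity score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative scores for the labeled words of one sentence (made-up values).
scores = {"attack": 1.8, "town": 0.7, "rebels": 0.9}
ranking = rank_candidates(scores)
predicted_word = ranking[0][0]
print(ranking, predicted_word)
```

In a fuller pipeline each score would also carry the pool entry it was matched against, so that the event type can be read off together with the word.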
The invention effectively reduces the workload of manual event annotation and is easy to extend to other event types. Because a cross-language pre-training model is used for encoding, similarity mining and prediction can be performed across multiple languages in one vector pool, which avoids the work of training a separate model for each language. Because the cross-language pre-training model allows the abundant external news corpora of languages such as Chinese and English to be used in the preparation stage, the method addresses the problem that low-resource languages lack abundant news corpora with which to expand the vector pool.
Claims (5)
1. An event trigger word detection and classification method based on a multi-language pre-training model, characterized by comprising the following steps:
obtaining synonyms of the event trigger words and synonyms of the event elements, respectively, by using a word vector model, so as to obtain a first set and a second set, respectively;
defining the first set as first anchor words and the second set as second anchor words;
mining external news corpora with the first anchor words and the second anchor words as centers, respectively, to obtain a first sentence set and a second sentence set containing the anchor words;
defining the first sentence set as first anchor sentences and the second sentence set as second anchor sentences;
performing cross-language vector encoding on the first anchor sentence and the second anchor sentence, respectively, to obtain a first cross-language sentence vector and a second cross-language sentence vector;
storing the first cross-language sentence vector and the second cross-language sentence vector in a vector pool;
performing semantic annotation on the sentence to be predicted by using a semantic role labeling tool;
encoding the words obtained by semantic annotation in the sentence to be predicted by using a multi-language pre-training model to obtain a cross-language word vector;
and comparing the similarity of the cross-language word vector with the first cross-language sentence vector and the second cross-language sentence vector in the vector pool, the candidate with the highest similarity being the prediction result.
2. The event trigger word detection and classification method based on a multi-language pre-training model according to claim 1, wherein obtaining the synonyms of the event trigger words and the synonyms of the event elements by using the word vector model, so as to obtain the first set and the second set, comprises:
using a Word2Vec word vector model to find synonyms of the predefined event trigger words and synonyms of the event elements;
and manually screening the results to obtain the first set and the second set.
3. The event trigger word detection and classification method based on a multi-language pre-training model according to claim 2, wherein performing cross-language vector encoding on the first anchor sentence and the second anchor sentence by using the multi-language model, to obtain the first cross-language sentence vector and the second cross-language sentence vector, respectively, comprises:
when encoding for an event trigger word, first segmenting the first anchor sentence into words and inputting the segmented sentence into the multi-language model to obtain a set of weighted word vectors, then summing all the word vectors and dividing the sum by the number of words in the sentence to obtain the first cross-language sentence vector;
when encoding for an event element, segmenting the whole second anchor sentence into words, replacing the event element in the sentence with [MASK], inputting the result into the multi-language model to obtain a set of weighted word vectors, and then summing the word vectors and dividing the sum by their number to obtain the second cross-language sentence vector.
4. The event trigger word detection and classification method based on a multi-language pre-training model according to claim 3, wherein performing semantic annotation on the sentence to be predicted by using the semantic role labeling tool comprises:
performing semantic role labeling according to the language of the input sentence to be predicted, with different labeling used for different languages.
5. The event trigger word detection and classification method based on a multi-language pre-training model according to claim 4, wherein comparing the similarity of the cross-language word vector with the first cross-language sentence vector and the second cross-language sentence vector in the vector pool, the candidate with the highest similarity being the prediction result, comprises:
obtaining the cross-language word vectors of the event trigger words and the cross-language word vectors of the event elements in the sentence to be predicted;
calculating a prediction score for each event trigger word vector or event element word vector of the sentence to be predicted;
and sorting the obtained prediction scores by similarity, the word with the highest similarity being the event trigger word or the event element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210404007.4A CN114896394B (en) | 2022-04-18 | 2022-04-18 | Event trigger word detection and classification method based on multilingual pre-training model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210404007.4A CN114896394B (en) | 2022-04-18 | 2022-04-18 | Event trigger word detection and classification method based on multilingual pre-training model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896394A (en) | 2022-08-12 |
CN114896394B (en) | 2024-04-05 |
Family
ID=82718401
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210404007.4A Active CN114896394B (en) | 2022-04-18 | 2022-04-18 | Event trigger word detection and classification method based on multilingual pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896394B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080319735A1 (en) * | 2007-06-22 | 2008-12-25 | International Business Machines Corporation | Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications |
CN109213995A (en) * | 2018-08-02 | 2019-01-15 | 哈尔滨工程大学 | A kind of across language text similarity assessment technology based on the insertion of bilingual word |
CN111680488A (en) * | 2020-06-08 | 2020-09-18 | 浙江大学 | Cross-language entity alignment method based on knowledge graph multi-view information |
CN112287695A (en) * | 2020-09-18 | 2021-01-29 | 昆明理工大学 | Cross-language bilingual pre-training and Bi-LSTM-based Chinese-character-cross parallel sentence pair extraction method |
CN112580330A (en) * | 2020-10-16 | 2021-03-30 | 昆明理工大学 | Vietnamese news event detection method based on Chinese trigger word guidance |
CN113901209A (en) * | 2021-09-15 | 2022-01-07 | 昆明理工大学 | Chinese cross-language event detection method based on type perception |
Non-Patent Citations (2)
Title |
---|
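Tang Liang; Xi Yaoyi; Peng Bo; Liu Xiangwei; Yi Mianzhu: "Research on Vietnamese-Chinese Cross-Language Event Retrieval Based on Word Vectors", Journal of Chinese Information Processing (中文信息学报), no. 03, 15 March 2018 (2018-03-15) * |
Peng Xiaoya; Zhou Dong: "A Survey of Cross-Language Word Vector Research", Journal of Chinese Information Processing (中文信息学报), no. 02, 15 February 2020 (2020-02-15) * |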
Also Published As
Publication number | Publication date |
---|---|
CN114896394B (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8131539B2 (en) | Search-based word segmentation method and device for language without word boundary tag | |
CN109614620B (en) | HowNet-based graph model word sense disambiguation method and system | |
CN112380864B (en) | Text triple labeling sample enhancement method based on translation | |
Kawahara et al. | Inducing example-based semantic frames from a massive amount of verb uses | |
CN113312922B (en) | Improved chapter-level triple information extraction method | |
CN111046660B (en) | Method and device for identifying text professional terms | |
CN112101014A (en) | Chinese chemical industry document word segmentation method based on mixed feature fusion | |
Nehar et al. | Rational kernels for Arabic root extraction and text classification | |
CN111444720A (en) | Named entity recognition method for English text | |
Saleh et al. | TxLASM: A novel language agnostic summarization model for text documents | |
CN111368532B (en) | Topic word embedding disambiguation method and system based on LDA | |
Aejas et al. | Named entity recognition for cultural heritage preservation | |
CN114896394B (en) | Event trigger word detection and classification method based on multilingual pre-training model | |
CN112990388B (en) | Text clustering method based on concept words | |
Mohamed et al. | ADPBC: Arabic dependency parsing based corpora for information extraction | |
Neri et al. | Text Mining applied to multilingual corpora | |
CN109960720B (en) | Information extraction method for semi-structured text | |
Mohamed et al. | Arabic-SOS: segmentation, stemming, and orthography standardization for classical and pre-modern standard Arabic | |
Baishya et al. | Present state and future scope of Assamese text processing | |
Seresangtakul et al. | Thai-Isarn dialect parallel corpus construction for machine translation | |
De Pauw et al. | African language technology: The data-driven perspective | |
Abafogi | Normalized Statistical Algorithm for Afaan Oromo Word Sense Disambiguation | |
Mirzanezhad et al. | Using morphological analyzer to statistical POS Tagging on Persian Text | |
Li et al. | Sentence Boundary Disambiguation for Tibetan Based on Attention Mechanism at the Syllable Level | |
Oh et al. | A statistical model for automatic extraction of korean transliterated foreign words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||