CN110377738A - Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks - Google Patents
- Publication number
- CN110377738A (application CN201910635489.2A)
- Authority
- CN
- China
- Prior art keywords
- vietnamese
- event
- information
- neural networks
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a Vietnamese news event detection method that fuses dependency syntactic information with convolutional neural networks, and belongs to the technical field of natural language processing. The invention first collects Chinese-Vietnamese bilingual news text, defines the event types and an annotation scheme for event detection according to the characteristics of events, and builds the training data. A convolutional neural network fused with dependency syntactic information is then used to detect Vietnamese news events at the sentence level. In the encoding stage, word meaning, position information, part-of-speech information, and named-entity information are fused. Features between contiguous words are then encoded with a conventional convolution, features between non-contiguous words are encoded with a convolution that incorporates dependency syntactic information, and the two sets of features are fused into the event encoding used for news event detection. The invention achieves good results in news event detection.
Description
Technical field
The present invention relates to a Vietnamese news event detection method that fuses dependency syntactic information with convolutional neural networks, and belongs to the technical field of natural language processing.
Background art
Event detection is an important information extraction task in natural language processing; it aims to identify events of specified types in text. At present, most event detection research is carried out on Chinese and English. Because Vietnamese is a low-resource language, no work has yet addressed event detection for Vietnamese. Automatically detecting news events in Vietnamese news text with artificial intelligence techniques is therefore both a difficulty and a key technology for this task.
Event detection currently relies mainly on two classes of methods. (1) Machine learning methods. Zhang Xuan et al. proposed an event extraction framework built around the DPEMM model. Pei Donghui et al. proposed automatic recognition of sub-event categories based on a support vector machine model. Gao Yongbing et al. improved TF-IDF for the characteristics of microblogs to obtain event extraction results. (2) Deep learning methods. Nguyen et al., building on existing research, proposed a joint method based on recurrent neural networks for English event extraction. Chen et al. proposed the dynamic multi-pooling convolutional neural network (DMCNN) to handle the recognition of multiple events in one sentence and the matching of shared arguments. Nguyen et al. applied convolutional neural networks to the words in a sentence to obtain the semantic information implicit in the sentence. All of the above detection methods target other languages; the present invention therefore proposes a Vietnamese news event detection method that fuses dependency syntactic information with convolutional neural networks.
Summary of the invention
The present invention provides a Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks, which addresses the Vietnamese news event detection and classification problem and realizes Chinese-Vietnamese bilingual news event type detection.
The technical solution of the invention is as follows. Chinese-Vietnamese bilingual news text is first collected, the event types and the annotation scheme for event detection are defined according to the characteristics of events, and the training data are built. A convolutional neural network fused with dependency syntactic information is then used to detect Vietnamese news events at the sentence level. In the encoding stage, word meaning, position information, part-of-speech information, and named-entity information are fused. Features between contiguous words are then encoded with a conventional convolution, features between non-contiguous words are encoded with a convolution that incorporates dependency syntactic information, and the two sets of features are fused into the event encoding used for news event detection;
The specific steps of the detection method are as follows:
Step1, corpus collection: collect news text for Vietnamese event detection. Scrapy is used as the crawling tool to imitate user operations; a separate template is customized for each Vietnamese news website, written according to the XPath of the page data elements, to obtain detailed data such as the news headline, news time, and body text. The news text is then de-duplicated and screened;
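The de-duplication and screening at the end of Step1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the invention: the record keys (`title`, `time`, `body`), the `min_chars` threshold, the hash-based duplicate test, and the sample records are all hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def dedup_and_screen(articles, min_chars=20):
    """Drop articles with a missing title, a very short body, or a repeated body.

    `articles` is a list of dicts with the hypothetical keys
    "title", "time" and "body"; `min_chars` is an assumed threshold.
    """
    seen = set()
    kept = []
    for art in articles:
        body = normalize(art.get("body", ""))
        if not art.get("title") or len(body) < min_chars:
            continue  # screening: incomplete or too short
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # de-duplication: an identical body was already kept
        seen.add(digest)
        kept.append(art)
    return kept


# Made-up records, used only to exercise the function.
sample = [
    {"title": "A", "time": "2019-07-01", "body": "Placeholder body text of the first article."},
    {"title": "A (copy)", "time": "2019-07-01", "body": "Placeholder  body text of the first article. "},
    {"title": "", "time": "2019-07-02", "body": "An article without a title but long enough."},
    {"title": "B", "time": "2019-07-03", "body": "Too short"},
]
cleaned = dedup_and_screen(sample)
```

Hashing the normalized body keeps the check cheap for a large crawl; near-duplicate detection would need a different technique.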
Step2, corpus construction: following the annotation scheme for Vietnamese event detection, annotate the Vietnamese news text according to the linguistic characteristics of Vietnamese and the requirements of event detection, and divide the annotated Vietnamese news corpus into a training corpus, a test corpus, and a validation set;
As a preferred solution of the invention, in Step2 a news event text consists of a trigger word and arguments. The trigger word clearly expresses the occurrence of an event; it is the main word that triggers the event and is usually a single verb or noun. The arguments describe information such as the time, place, and persons of the event. The annotation scheme organizes the text in XML (Extensible Markup Language) and marks the trigger word, the arguments, and the event category separately; the collected Vietnamese news text is annotated in this way to build the Vietnamese news event detection dataset. The trigger words are listed in Table 1.
Table 1: Trigger word list
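An XML annotation of the kind described in Step2 might be produced as in the sketch below. The element and attribute names (`sentence`, `trigger`, `argument`, `role`, `event_type`), the event type `Visit`, and the placeholder tokens are assumptions made for illustration; the source does not give the actual schema.

```python
import xml.etree.ElementTree as ET


def annotate_sentence(tokens, trigger_index, event_type, arguments):
    """Build one <sentence> element marking trigger, arguments, and event type.

    Tag and attribute names are hypothetical. `arguments` maps a token
    index to a role such as "time", "place" or "person".
    """
    sentence = ET.Element("sentence", {"event_type": event_type})
    for i, token in enumerate(tokens):
        if i == trigger_index:
            node = ET.SubElement(sentence, "trigger")
        elif i in arguments:
            node = ET.SubElement(sentence, "argument", {"role": arguments[i]})
        else:
            node = ET.SubElement(sentence, "token")
        node.text = token
    return sentence


tokens = ["w1", "w2", "w3", "w4"]  # placeholder tokens, not real Vietnamese words
elem = annotate_sentence(tokens, 1, "Visit", {0: "person", 3: "place"})
xml_text = ET.tostring(elem, encoding="unicode")
```

Keeping the trigger, arguments, and event category as separate elements and attributes lets each be read back independently when the training data are assembled.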
Step3, text vectorization: train Vietnamese word vectors, and fuse the word vector, position vector, part-of-speech vector, and entity type vector of the word sequence in each sentence as the model input;
As a preferred solution of the invention, in Step3 the Vietnamese word vectors are trained with the skip-gram language model, and a position embedding table, a part-of-speech embedding table, and an entity type embedding table are built to embed the position information, part-of-speech information, and entity type information into vectors.
Step4, build the convolutional neural network model fused with dependency syntactic information (Dependency Parsing Convolutional Neural Networks, DPCNN): on the basis of Step3, use a convolutional neural network together with a convolutional neural network fused with dependency syntactic information to obtain the encoding of the news event sentence, and train the event detection classification model to realize Chinese-Vietnamese bilingual news event type detection;
As a preferred solution of the invention, in Step4 a conventional multi-kernel convolution encodes the semantic information between contiguous words in the sentence, a convolution fused with dependency syntactic information encodes the semantic information between non-contiguous words in the sentence, and the two parts are fused as the semantic information of the current sentence.
The model proposed by the invention consists of three parts: (1) a sentence encoding layer, (2) a convolutional layer, and (3) a pooling layer. Fig. 2 shows the model for the input event sentence S1. S1: Nam,cóTrung (translation: a million Vietnamese refugees; only China provides relief);
(1) Encoding layer
First, the encoding layer converts the word-level information in the sentence into real-valued vectors that serve as the input to the neural network. Let $X = \{x_1, x_2, x_3, \ldots, x_n\}$ be a sentence of length $n$, where $x_i$ is the $i$-th word in the sentence. In natural language processing tasks, the semantic information of a word is related to its position in the sentence, and part-of-speech and entity type information help with recognizing trigger words and understanding semantics. The word vector, position vector, part-of-speech vector, and entity type vector are therefore fused as the model input.
The word vector is a real-valued vector; the method may also train the Vietnamese word vectors with the word2vec training method. Position encoding is included in the encoding to introduce the semantic structure information of the current word. The position vector encodes the relative position of the current word with respect to the trigger word. For example, in S1 the relative position between the word glossed "appearing" and the word glossed "refugee" is 6. Because part of speech and entity type help capture the semantics of the current word, Vietnamese part-of-speech tagging is performed and a part-of-speech embedding table is defined that embeds 28 part-of-speech tags into part-of-speech vectors. Named entity recognition is performed on the Vietnamese text and an entity type embedding table is defined; named entities such as person names, place names, organization names, and times are recognized in the sentence, and the entity tags are embedded into entity vectors. The table covers ten entity types, divided into three major categories (entity, time, and number) and seven subcategories (person name, organization name, place name, time, date, currency, and percentage).
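The fusion of the four vectors in the encoding layer can be sketched as a per-word concatenation of table lookups. The sketch assumes randomly initialized tables, small vector sizes, and placeholder tokens and labels; in the method itself the word vectors come from skip-gram training, and every name below is hypothetical.

```python
import random

rng = random.Random(0)


def make_table(keys, dim):
    """Randomly initialized embedding table: key -> list of `dim` floats."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def encode(words, pos_tags, ent_tags, trigger_index, tables):
    """Concatenate word, relative-position, POS and entity vectors per word."""
    word_t, dist_t, pos_t, ent_t = tables
    rows = []
    for i, (w, p, e) in enumerate(zip(words, pos_tags, ent_tags)):
        rel = i - trigger_index  # signed distance to the candidate trigger word
        rows.append(word_t[w] + dist_t[rel] + pos_t[p] + ent_t[e])
    return rows


words = ["w1", "w2", "w3", "w4"]     # placeholder tokens
pos_tags = ["N", "V", "N", "N"]      # placeholder part-of-speech labels
ent_tags = ["PER", "O", "O", "LOC"]  # placeholder entity type labels
tables = (
    make_table(set(words), 8),                           # word vectors
    make_table(range(-len(words), len(words) + 1), 2),   # relative positions
    make_table(set(pos_tags), 2),                        # part-of-speech vectors
    make_table(set(ent_tags), 2),                        # entity type vectors
)
X = encode(words, pos_tags, ent_tags, trigger_index=1, tables=tables)
```

Each row of `X` has 8 + 2 + 2 + 2 = 14 values here; the real dimensions are not stated in the source.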
(2) Convolutional layer
The convolutional layer captures the compositional semantic information of the whole sentence and compresses these valuable semantics into feature maps. A filter $w$ in the convolution operation extracts the features among the words inside the convolution window. When the convolution kernel size is $m$, the $m$ words $\{x_i, x_{i+1}, x_{i+2}, \ldots, x_{i+m-1}\}$ in the window are denoted $x_{i:i+m-1}$, and the resulting convolution feature $c_i$ is computed as:
$c_i = f(w \cdot x_{i:i+m-1} + b)$ (1)
where $b$ ($b \in \mathbb{R}$) is the bias term, $f$ is a nonlinear activation function, and $w$ is the feature weight. The filter is applied to every possible window in the sentence, $\{x_{1:m}, x_{2:m+1}, \ldots, x_{n-m+1:n}\}$. Because a sentence contains more than one kind of feature, multiple filters are used in the convolution to obtain different features. When $k$ filters $W = \{w_1, w_2, \ldots, w_k\}$ are used, the convolution operation is:
$c_{ji} = f(w_j \cdot x_{i:i+m-1} + b_j)$ (2)
where $j \in [1, k]$, $w_j$ is the feature weight, and $b_j$ is the bias.
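Formulas (1) and (2) can be traced with a small pure-Python sketch. It assumes tanh for the activation $f$ (the source does not name one) and uses made-up vectors and weights; each filter is stored as a flat list of $m \cdot d$ weights.

```python
import math


def conv1d(X, filters, biases, m):
    """Apply k filters of window size m to a sentence X (n rows of d floats).

    Returns a k x (n - m + 1) list of features c_ji, with tanh as an
    assumed choice for the activation f.
    """
    n = len(X)
    features = []
    for w, b in zip(filters, biases):
        row = []
        for i in range(n - m + 1):
            window = [v for x in X[i:i + m] for v in x]  # x_{i:i+m-1}, flattened
            row.append(math.tanh(sum(wi * vi for wi, vi in zip(w, window)) + b))
        features.append(row)
    return features


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]     # n = 4 words, d = 2
filters = [[0.1] * 4, [-0.2] * 4, [0.3, 0.0, 0.0, 0.3]]  # k = 3 filters, m = 2
C = conv1d(X, filters, [0.0, 0.1, -0.1], m=2)
```

With $n = 4$ and $m = 2$ each filter yields $n - m + 1 = 3$ features, which matches the last window $x_{n-m+1:n}$.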
The Vietnamese dependency syntax tree is constructed as shown in Fig. 3. The analysis shows that the syntactic relation "SBV" (subject-predicate) between the word glossed "relief" and the word glossed "refugee" helps determine that "Ra" (appearing) is not a trigger word for an attend-activity event.
The convolution operation captures the semantic information between contiguous words inside the window but cannot capture features of non-contiguous words outside the window; dependency syntactic analysis is therefore used to bring in information from outside the window. The dependency information is represented by $D = \{N, E\}$, where $N = \{x_1, x_2, x_3, \ldots, x_p\}$ ($p \le n$) is the set of all word nodes in the sentence that take part in a dependency relation, and two word nodes linked by a dependency are written $x_s$ and $x_t$. $E$ is the set of edges between word nodes; each edge $(x_s, x_t)$ points from word node $x_s$ to word node $x_t$ and carries a dependency label $L(x_s, x_t)$. For example, in Fig. 2 the directed edge between the node glossed "relief" and the node glossed "refugee" carries such a label. Following the representation proposed by Kipf et al., and because information does not flow only in the direction indicated by the label, self-loops $(x_s, x_s)$ and reverse edges $(x_t, x_s)$ are added here. A self-loop has the label $L(x_s, x_s)$ and a reverse edge has the label $\hat{L}(x_t, x_s)$. Each dependency label has its own fixed parameters, and the dependency feature is calculated as follows:
(3)
where $j$ ranges from 1 to $k$, $f$ is a nonlinear activation function, $W_{L(x_i, N)}$ is the weight, which takes one of three forms (original edge, reverse edge, or self-loop edge), and $b_{L(x_i, N)}$ is the bias term. Finally, the convolution feature and the convolution feature fused with dependency syntactic information are combined as the feature of the current sentence:
$E_{ji} = W_i c_{ji} + (1 - W_i) h_i$ (4)
where $E \in \mathbb{R}^{k \times (n-m+1)}$ is the result matrix of the convolution, $k$ is the number of filters, $n$ is the sentence length, $m$ is the filter window size, $(1 - W_i) h_i$ is the feature fused with dependency syntactic information, and $W_i c_{ji}$ is the convolution feature.
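The three edge kinds and the weighted combination of formula (4) can be sketched as below. This is an illustration under strong assumptions: one scalar weight and bias per edge kind (a full model would likely use a matrix per kind or per label), tanh as the activation, made-up vectors, and an aggregation rule that is only one plausible reading of the dependency feature, whose exact form is not reproduced in the text.

```python
import math


def dependency_features(X, edges, weights, biases):
    """Aggregate each word's neighbours over original, reverse and self-loop edges.

    X: n rows of d floats. edges: (s, t) index pairs for edges x_s -> x_t.
    weights/biases: one scalar per edge kind (a simplifying assumption).
    """
    n, d = len(X), len(X[0])
    incoming = {i: [(i, "self")] for i in range(n)}  # self-loop (x_s, x_s)
    for s, t in edges:
        incoming[t].append((s, "orig"))              # original edge (x_s, x_t)
        incoming[s].append((t, "rev"))               # reverse edge (x_t, x_s)
    H = []
    for i in range(n):
        acc = [0.0] * d
        for j, kind in incoming[i]:
            for q in range(d):
                acc[q] += weights[kind] * X[j][q] + biases[kind]
        H.append([math.tanh(v) for v in acc])
    return H


def fuse(c, h, gate):
    """Formula (4) element by element: gate * c + (1 - gate) * h."""
    return [gate * ci + (1.0 - gate) * hi for ci, hi in zip(c, h)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 2)]  # word 0 points to word 2, skipping the word between them
weights = {"orig": 0.5, "rev": 0.25, "self": 1.0}
biases = {"orig": 0.0, "rev": 0.0, "self": 0.0}
H = dependency_features(X, edges, weights, biases)
E = fuse([0.2, 0.4], H[1], gate=0.5)
```

The edge `(0, 2)` links two words that a window of size 2 never sees together, which is the kind of non-contiguous information the dependency branch is meant to supply.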
(3) Pooling layer
The pooling layer extracts the most representative features from the convolution features. Max pooling is used here:
$E^* = \text{Each-max}(E_1, E_2, E_3, \ldots, E_k)$ (5)
For the $k$ filters, the most valuable local feature of each filter is extracted and all other feature values are discarded; the $k$ local features are aggregated into one vector $E^*$ that serves as the event encoding. Finally, the event encoding is fed into a fully connected layer, and the softmax activation function classifies $E^*$ to obtain the class probabilities of the event; the event type is predicted from this probability distribution. In the softmax formula, $S_i$ denotes the class probability, $C$ the number of classes, and $i$ the class index, which ranges from 1 to 6 (including the non-event type).
Step5, event type detection: encode the Chinese-Vietnamese bilingual news event sentence to be recognized, then use the extracted feature vector of the news event sentence as the input vector of the classification model, which outputs the final classification result.
The beneficial effects of the invention are as follows: the convolutional neural network fused with dependency syntactic information detects news events at the sentence level. Word meaning, position information, part-of-speech information, and named-entity information are fused in the encoding stage. Features between contiguous words are encoded with a conventional convolution, features between non-contiguous words are encoded with a convolution that incorporates dependency syntactic information, and the two sets of features are fused as the event encoding to realize event detection. Experimental results show that the method performs well in news event detection and classification.
Brief description of the drawings
Fig. 1 is the flow chart of the invention;
Fig. 2 is the modeling flow diagram of the DPCNN method proposed in the invention;
Fig. 3 is the dependency syntactic analysis result for S1 in the invention.
Detailed description of the embodiments
Embodiment 1: as shown in Figs. 1-3, in the Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks, the specific steps of the detection method are as follows:
Step1, corpus collection: using Scrapy as the crawling tool, crawl the following news websites: Vietnam News Agency http://www.vnagency.com.vn, the Vietnamese national English-language newspaper http://vietnamnews.vnagency.com.vn, Vietnam telecommunication network http://www.vnn.vn, and Vietnam Economic Times http://www.vneconomy.com.vn; collect news text for Vietnamese event detection, and de-duplicate and screen the news text;
As a preferred solution of the invention, in Step1 Scrapy is used as the crawling tool to imitate user operations; a separate template is customized for each Vietnamese news website, written according to the XPath of the page data elements, to obtain detailed data such as the news headline, news time, and body text.
Step2, corpus construction: following the annotation scheme for Vietnamese event detection, annotate the Vietnamese news text according to the linguistic characteristics of Vietnamese and the requirements of event detection, and divide the annotated Vietnamese news corpus into a training corpus, a test corpus, and a validation set at a ratio of 8:1:1. After preprocessing, a total of 1233 Vietnamese news texts in the domain of leaders' travel activities were annotated, containing 9576 event sentences;
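An 8:1:1 division of the 9576 event sentences can be sketched as below. The shuffling, the fixed seed, and the rule that the remainder goes to the validation set are assumptions; the source states only the ratio.

```python
import random


def split_811(sentences, seed=0):
    """Shuffle and split into train/test/validation at roughly 8:1:1."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    valid = items[n_train + n_test:]  # the remainder goes to validation
    return train, test, valid


# Integers stand in for the 9576 annotated event sentences.
train, test, valid = split_811(range(9576))
```

Integer arithmetic avoids floating-point rounding when the corpus size is not a multiple of ten.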
As a preferred solution of the invention, in Step2 a news event text consists of a trigger word and arguments. The trigger word clearly expresses the occurrence of an event; it is the main word that triggers the event and is usually a single verb or noun. The arguments describe information such as the time, place, and persons of the event. The annotation scheme organizes the text in XML (Extensible Markup Language) and marks the trigger word, the arguments, and the event category separately; the collected Vietnamese news text is annotated in this way to build the Vietnamese news event detection dataset.
Step3, text vectorization: train Vietnamese word vectors, and fuse the word vector, position vector, part-of-speech vector, and entity type vector of the word sequence in each sentence as the model input;
As a preferred solution of the invention, in Step3 the Vietnamese word vectors are trained with the skip-gram language model, and a position embedding table, a part-of-speech embedding table, and an entity type embedding table are built to embed the position information, part-of-speech information, and entity type information into vectors.
Step4, build the event category detection model: on the basis of Step3, use a convolutional neural network together with a convolutional neural network fused with dependency syntactic information to obtain the encoding of the news event sentence, and train the event detection classification model to realize Chinese-Vietnamese bilingual news event type detection;
As a preferred solution of the invention, in Step4 a conventional multi-kernel convolution encodes the semantic information between contiguous words in the sentence, a convolution fused with dependency syntactic information encodes the semantic information between non-contiguous words in the sentence, and the two parts are fused as the semantic information of the current sentence.
Step5, event type detection: encode the Chinese-Vietnamese bilingual news event sentence to be recognized, then use the extracted feature vector of the news event sentence as the input vector of the classification model, which outputs the final classification result.
To verify the effect of the invention, comparative experiments were set up, with precision (P), recall (R), and F value (F) as the evaluation metrics.
In these metrics, A is the number of correctly identified event types, B is the number of wrongly identified event types, and C is the number of correct event types that were not identified.
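With A, B, and C defined as above, the three metrics can be computed as in the sketch below. It assumes the standard definitions $P = A/(A+B)$, $R = A/(A+C)$, and the balanced $F = 2PR/(P+R)$; the counts in the example are made up.

```python
def prf(a, b, c):
    """Precision, recall and F value from counts A (correct), B (wrong), C (missed)."""
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Made-up counts: 60 correct, 20 wrong, 40 missed.
p, r, f = prf(a=60, b=20, c=40)
```

The guards return 0.0 when a denominator is zero, for example when the model predicts no events at all.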
(1) To explore the influence of the number of model layers on the experimental results, the model of the invention was tested with 1, 2, and 3 convolutional layers to find the optimal number of layers. The results are shown in Table 2:
Table 2: Influence of the number of model layers on the experimental results
Model layers | P (%) | R (%) | F (%)
---|---|---|---
1 | 74.04 | 62.63 | 70.08
2 | 76.78 | 64.25 | 71.45
3 | 75.53 | 59.01 | 68.23
The analysis shows that the best results are obtained with 2 convolutional layers, with precision, recall, and F value of 76.78%, 64.25%, and 71.45% respectively (Table 2). With 3 convolutional layers the performance of the model declines. All subsequent experiments therefore use two convolutional layers.
(2) Study of encoding features
To study the encoding features incorporated in the word embedding layer, one encoding vector at a time was removed and the remaining two encoding vectors were fused with the word vector as the model input, in order to examine the influence of each encoding feature on the performance of the model. The results are shown in Table 3:
Table 3: Influence of encoding features on the experimental results
The analysis shows that after any one encoding vector is removed, the precision, recall, and F value of the model all decline compared with the full model, which demonstrates that using all three encoding vectors together improves event detection performance.
(3) Comparison of different models
To demonstrate the effect of the model on the Vietnamese event detection task, the model is compared with a conventional convolutional neural network without dependency syntactic information and with a graph convolutional neural network fused with dependency syntactic information. The results are shown in Table 4:
Table 4: Performance comparison of different models
Model | P (%) | R (%) | F (%)
---|---|---|---
CNN | 73.23 | 66.14 | 69.23
GCN | 75.00 | 63.92 | 70.24
DPCNN | 76.78 | 64.25 | 71.45
The comparison shows that DPCNN (the convolutional neural network fused with dependency syntactic analysis) and GCN both outperform CNN, so introducing dependency syntactic information captures information that CNN does not. Comparing GCN with DPCNN, the F value of DPCNN is higher by 0.19%, which indicates that most of the information can be captured by GCN, but that using a convolution over contiguous words together with a convolutional neural network fused with dependency syntactic information captures more of the implicit information in the sentence.
The above experiments and the analysis of example data show that the method provides a new neural network model for Vietnamese news event detection. The model fuses word vectors, position vectors, part-of-speech vectors, and named entity vectors to capture word-level semantic information, and obtains semantic information with both a conventional convolutional neural network and a convolutional neural network fused with dependency syntactic information. By comparing the model under different parameter settings and comparing the best model with baseline methods, the method is shown to achieve good results on the Vietnamese news event detection task.
The embodiments of the invention have been explained in detail above with reference to the drawings, but the invention is not limited to the above embodiments; within the knowledge of a person skilled in the art, various changes may also be made without departing from the concept of the invention.
Claims (5)
1. A Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks, characterized in that the specific steps of the detection method are as follows:
Step1, corpus collection: collect news text for Vietnamese event detection, and de-duplicate and screen the news text;
Step2, corpus construction: following the annotation scheme for Vietnamese event detection, annotate the Vietnamese news text according to the linguistic characteristics of Vietnamese and the requirements of event detection, and divide the annotated Vietnamese news corpus into a training corpus, a test corpus, and a validation set;
Step3, text vectorization: train Vietnamese word vectors, and fuse the word vector, position vector, part-of-speech vector, and entity type vector of the word sequence in each sentence as the model input;
Step4, build the event category detection model: on the basis of Step3, use a convolutional neural network together with a convolutional neural network fused with dependency syntactic information to obtain the encoding of the news event sentence, and train the event detection classification model to realize Chinese-Vietnamese bilingual news event type detection;
Step5, event type detection: encode the Chinese-Vietnamese bilingual news event sentence to be recognized, then use the extracted feature vector of the news event sentence as the input vector of the classification model, which outputs the final classification result.
2. The Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks according to claim 1, characterized in that: in Step1, Scrapy is used as the crawling tool to imitate user operations; a separate template is customized for each Vietnamese news website, written according to the XPath of the page data elements, to obtain detailed data such as the news headline, news time, and body text.
3. The Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks according to claim 1, characterized in that: in Step2, a news event text consists of a trigger word and arguments; the trigger word clearly expresses the occurrence of an event, is the main word that triggers the event, and is usually a single verb or noun; the arguments describe information such as the time, place, and persons of the event; the annotation scheme organizes the text in XML (Extensible Markup Language) and marks the trigger word, the arguments, and the event category separately; the collected Vietnamese news text is annotated in this way to build the Vietnamese news event detection dataset.
4. The Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks according to claim 1, characterized in that: in Step3, the Vietnamese word vectors are trained with the skip-gram language model, and a position embedding table, a part-of-speech embedding table, and an entity type embedding table are built to embed the position information, part-of-speech information, and entity type information into vectors.
5. The Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks according to claim 1, characterized in that: in step Step4, conventional multi-kernel convolution encodes the semantic information between consecutive words in a sentence, while convolution fused with dependency syntactic information encodes the semantic information between non-consecutive words in the sentence; the two parts of semantic information are fused as the semantic information of the current sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910635489.2A CN110377738A (en) | 2019-07-15 | 2019-07-15 | Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910635489.2A CN110377738A (en) | 2019-07-15 | 2019-07-15 | Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110377738A (en) | 2019-10-25 |
Family
ID=68253129
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910635489.2A Pending CN110377738A (en) | 2019-07-15 | 2019-07-15 | Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377738A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826313A (en) * | 2019-10-31 | 2020-02-21 | 北京声智科技有限公司 | Information extraction method, electronic equipment and computer readable storage medium |
CN111159336A (en) * | 2019-12-20 | 2020-05-15 | 银江股份有限公司 | Semi-supervised judicial entity and event combined extraction method |
CN111259672A (en) * | 2020-02-12 | 2020-06-09 | 新疆大学 | Chinese tourism field named entity identification method based on graph convolution neural network |
CN111581396A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax |
CN111597811A (en) * | 2020-05-09 | 2020-08-28 | 北京合众鼎成科技有限公司 | Financial chapter-level multi-correlation event extraction method based on graph neural network algorithm |
CN111666373A (en) * | 2020-05-07 | 2020-09-15 | 华东师范大学 | Chinese news classification method based on Transformer |
CN111897908A (en) * | 2020-05-12 | 2020-11-06 | 中国科学院计算技术研究所 | Event extraction method and system fusing dependency information and pre-training language model |
CN111966865A (en) * | 2020-07-21 | 2020-11-20 | 西北大学 | Method for extracting features by utilizing airspace map convolutional layer based on table lookup sub-network |
CN112085104A (en) * | 2020-09-10 | 2020-12-15 | 杭州中奥科技有限公司 | Event feature extraction method and device, storage medium and electronic equipment |
CN112163416A (en) * | 2020-10-09 | 2021-01-01 | 北京理工大学 | Event joint extraction method for merging syntactic and entity relation graph convolution network |
CN112307364A (en) * | 2020-11-25 | 2021-02-02 | 哈尔滨工业大学 | Character representation-oriented news text place extraction method |
CN112580330A (en) * | 2020-10-16 | 2021-03-30 | 昆明理工大学 | Vietnamese news event detection method based on Chinese trigger word guidance |
CN112668319A (en) * | 2020-12-18 | 2021-04-16 | 昆明理工大学 | Vietnamese news event detection method based on Chinese information and Vietnamese statement method guidance |
CN113239142A (en) * | 2021-04-26 | 2021-08-10 | 昆明理工大学 | Trigger-word-free event detection method fused with syntactic information |
CN113627170A (en) * | 2021-07-01 | 2021-11-09 | 昆明理工大学 | Multi-feature fusion Vietnamese keyword generation method |
CN113626577A (en) * | 2021-07-01 | 2021-11-09 | 昆明理工大学 | Chinese cross-language news event element extraction method based on reading understanding |
CN114004236A (en) * | 2021-09-18 | 2022-02-01 | 昆明理工大学 | Chinese cross-language news event retrieval method integrated with event entity knowledge |
CN114444484A (en) * | 2022-01-13 | 2022-05-06 | 重庆邮电大学 | Document-level event extraction method and system based on double-layer graph |
CN116011461A (en) * | 2023-03-02 | 2023-04-25 | 文灵科技(北京)有限公司 | Concept abstraction system and method based on event classification model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239445A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | Method and system for news event extraction based on a neural network |
CN109800413A (en) * | 2018-12-11 | 2019-05-24 | 北京百度网讯科技有限公司 | Method, apparatus, device, and readable storage medium for recognizing news events |
2019-07-15: CN application CN201910635489.2A (patent/CN110377738A/en), status: Pending
Non-Patent Citations (3)
Title |
---|
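XIAO LIU et al.: "Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation", 《ARXIV:1809.09078V2》 * |
HOU Jiaying (侯加英): "Research on Chinese-Vietnamese Bilingual News Topic Discovery", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》 * |
PAN Qingqing (潘清清): "Research on Methods for Extracting Vietnamese News Event Elements", 《China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series》 * |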
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826313A (en) * | 2019-10-31 | 2020-02-21 | 北京声智科技有限公司 | Information extraction method, electronic equipment and computer readable storage medium |
CN111159336B (en) * | 2019-12-20 | 2023-09-12 | 银江技术股份有限公司 | Semi-supervised judicial entity and event combined extraction method |
CN111159336A (en) * | 2019-12-20 | 2020-05-15 | 银江股份有限公司 | Semi-supervised judicial entity and event combined extraction method |
CN111259672A (en) * | 2020-02-12 | 2020-06-09 | 新疆大学 | Chinese tourism field named entity identification method based on graph convolution neural network |
CN111581396A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax |
CN111581396B (en) * | 2020-05-06 | 2023-03-31 | 西安交通大学 | Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax |
CN111666373A (en) * | 2020-05-07 | 2020-09-15 | 华东师范大学 | Chinese news classification method based on Transformer |
CN111597811A (en) * | 2020-05-09 | 2020-08-28 | 北京合众鼎成科技有限公司 | Financial chapter-level multi-correlation event extraction method based on graph neural network algorithm |
CN111597811B (en) * | 2020-05-09 | 2021-11-12 | 北京合众鼎成科技有限公司 | Financial chapter-level multi-correlation event extraction method based on graph neural network algorithm |
CN111897908A (en) * | 2020-05-12 | 2020-11-06 | 中国科学院计算技术研究所 | Event extraction method and system fusing dependency information and pre-training language model |
CN111897908B (en) * | 2020-05-12 | 2023-05-02 | 中国科学院计算技术研究所 | Event extraction method and system integrating dependency information and pre-training language model |
CN111966865B (en) * | 2020-07-21 | 2023-09-22 | 西北大学 | Method for extracting features by using space domain map convolution layer based on table look-up sub-network |
CN111966865A (en) * | 2020-07-21 | 2020-11-20 | 西北大学 | Method for extracting features by utilizing airspace map convolutional layer based on table lookup sub-network |
CN112085104B (en) * | 2020-09-10 | 2024-04-12 | 杭州中奥科技有限公司 | Event feature extraction method and device, storage medium and electronic equipment |
CN112085104A (en) * | 2020-09-10 | 2020-12-15 | 杭州中奥科技有限公司 | Event feature extraction method and device, storage medium and electronic equipment |
CN112163416B (en) * | 2020-10-09 | 2021-11-02 | 北京理工大学 | Event joint extraction method for merging syntactic and entity relation graph convolution network |
CN112163416A (en) * | 2020-10-09 | 2021-01-01 | 北京理工大学 | Event joint extraction method for merging syntactic and entity relation graph convolution network |
CN112580330A (en) * | 2020-10-16 | 2021-03-30 | 昆明理工大学 | Vietnamese news event detection method based on Chinese trigger word guidance |
CN112580330B (en) * | 2020-10-16 | 2023-09-12 | 昆明理工大学 | Vietnam news event detection method based on Chinese trigger word guidance |
CN112307364A (en) * | 2020-11-25 | 2021-02-02 | 哈尔滨工业大学 | Character representation-oriented news text place extraction method |
CN112668319A (en) * | 2020-12-18 | 2021-04-16 | 昆明理工大学 | Vietnamese news event detection method based on Chinese information and Vietnamese statement method guidance |
CN113239142A (en) * | 2021-04-26 | 2021-08-10 | 昆明理工大学 | Trigger-word-free event detection method fused with syntactic information |
CN113626577A (en) * | 2021-07-01 | 2021-11-09 | 昆明理工大学 | Chinese cross-language news event element extraction method based on reading understanding |
CN113627170A (en) * | 2021-07-01 | 2021-11-09 | 昆明理工大学 | Multi-feature fusion Vietnamese keyword generation method |
CN113627170B (en) * | 2021-07-01 | 2024-05-28 | 昆明理工大学 | Multi-feature fusion Vietnam keyword generation method |
CN114004236A (en) * | 2021-09-18 | 2022-02-01 | 昆明理工大学 | Chinese cross-language news event retrieval method integrated with event entity knowledge |
CN114004236B (en) * | 2021-09-18 | 2024-04-30 | 昆明理工大学 | Cross-language news event retrieval method integrating knowledge of event entity |
CN114444484A (en) * | 2022-01-13 | 2022-05-06 | 重庆邮电大学 | Document-level event extraction method and system based on double-layer graph |
CN116011461A (en) * | 2023-03-02 | 2023-04-25 | 文灵科技(北京)有限公司 | Concept abstraction system and method based on event classification model |
CN116011461B (en) * | 2023-03-02 | 2023-07-21 | 文灵科技(北京)有限公司 | Concept abstraction system and method based on event classification model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110377738A (en) | Vietnamese news event detection method fusing dependency syntactic information and convolutional neural networks | |
CN108519890B (en) | Robust code abstract generation method based on self-attention mechanism | |
CN110334213B (en) | Method for identifying time sequence relation of Hanyue news events based on bidirectional cross attention mechanism | |
CN112231472B (en) | Judicial public opinion sensitive information identification method integrated with domain term dictionary | |
CN110609983B (en) | Structured decomposition method for policy file | |
CN104573028A (en) | Intelligent question-answer implementing method and system | |
CN112668319B (en) | Vietnamese news event detection method based on Chinese information and Vietnamese statement method guidance | |
CN106570180A (en) | Artificial intelligence based voice searching method and device | |
CN110413768A (en) | A kind of title of article automatic generation method | |
CN109033166B (en) | Character attribute extraction training data set construction method | |
CN113157860B (en) | Electric power equipment maintenance knowledge graph construction method based on small-scale data | |
CN111444704A (en) | Network security keyword extraction method based on deep neural network | |
CN112580330B (en) | Vietnam news event detection method based on Chinese trigger word guidance | |
CN110119443A (en) | A kind of sentiment analysis method towards recommendation service | |
CN112966097A (en) | NLP-based marketing company financial news-express automatic generation method and system | |
CN117474507A (en) | Intelligent recruitment matching method and system based on big data application technology | |
CN111984782A (en) | Method and system for generating text abstract of Tibetan language | |
CN112287197A (en) | Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases | |
CN110929518B (en) | Text sequence labeling algorithm using overlapping splitting rule | |
Nayan et al. | Named entity recognition for indian languages | |
CN110502759A (en) | The Chinese for incorporating classified dictionary gets over the outer word treatment method of hybrid network nerve machine translation set | |
Buchholz | Distinguishing complements from adjuncts using memory-based learning | |
CN111274354B (en) | Referee document structuring method and referee document structuring device | |
CN107894977A (en) | With reference to the Vietnamese part of speech labeling method of conversion of parts of speech part of speech disambiguation model and dictionary | |
CN116662643A (en) | Legal recommendation method, legal recommendation system, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191025 |