CN113657090A - Military news long text layering event extraction method - Google Patents

Info

Publication number
CN113657090A
Authority
CN
China
Prior art keywords
event
military news
text
military
sentence
Prior art date
Legal status
Pending
Application number
CN202110970577.5A
Other languages
Chinese (zh)
Inventor
张静
胡军
栾瑞鹏
孙悦
Current Assignee
Chinese People's Liberation Army 32801
Original Assignee
Chinese People's Liberation Army 32801
Priority date
Filing date
Publication date
Application filed by Chinese People's Liberation Army 32801
Priority to CN202110970577.5A
Publication of CN113657090A
Current legal status: Pending

Classifications

All entries fall under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING.
    • G06F40/205 Parsing (G06F40/00 Handling natural language data > G06F40/20 Natural language analysis)
    • G06F16/35 Clustering; Classification (G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data)
    • G06F16/951 Indexing; Web crawling techniques (G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/90 Details of database functions independent of the retrieved data types > G06F16/95 Retrieval from the web)
    • G06F18/22 Matching criteria, e.g. proximity measures (G06F18/00 Pattern recognition > G06F18/20 Analysing)
    • G06F40/258 Heading extraction; Automatic titling; Numbering (G06F40/00 Handling natural language data > G06F40/20 Natural language analysis)
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities)

Abstract

The invention discloses a hierarchical event extraction method for long military news texts, comprising the following steps: obtaining the content of military news webpages and extracting the military news text data from them; preprocessing the text, which includes word segmentation and part-of-speech tagging of the news body, and segmenting the news title and constructing trigger words to obtain a classification result for the title; identifying event sentences in the military news text; keeping the event sentences whose similarity to the trigger words exceeds a threshold; extracting event elements from the event sentences and labeling their roles; and generating event descriptions from the role labeling results. The method takes the words that contribute most to expressing a sentence's topic as the trigger words of the military news, uses the trigger words to classify event categories, and links the event elements into a document-level event chain for ultra-long military news texts.

Description

Military news long text layering event extraction method
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a hierarchical event extraction method for long military news texts.
Background
In natural language processing, an event is a series of activities carried out by certain participants around a particular topic at a particular time and place. Event extraction technology extracts the content and keywords a user is interested in from unstructured text data and presents them to the user in structured form.
Military engineering construction is a focus of national defense secrecy in every country, and military technology always develops at the leading edge, so effective analysis of military news has become a means of tracking the technological frontier. The common current approaches to event extraction are:
1. Traditional matching and statistical methods, such as static template matching and word-frequency statistics. Natural language varies in a great many ways, and a slight change in wording can drastically change the meaning, so these methods often fail and extract inefficiently. For military news, the template designer also needs deep military knowledge, which sets a high barrier to entry.
2. Neural network and machine learning methods, such as long short-term memory (LSTM) networks and bag-of-words models, which encode words as vectors, and pre-trained language models applied through transfer learning. However, because of the limited cache of parallel processors, these methods still struggle to process ultra-long, document-level military news as a whole.
Disclosure of Invention
To address the problems that current military news event extraction patterns are difficult to design and that document-level long texts are hard to handle, the invention provides a hierarchical event extraction method for long military news texts that can efficiently extract events from document-level long military news.
The disclosed hierarchical event extraction method for long military news texts comprises the following steps:
S1, acquiring the content of military news information webpages and extracting the military news text data in the webpages;
Step S1 includes: using a web crawler tool to collect military news links from each military information website; constructing HTTP requests to access the military news pages and fetch their webpage data; parsing the fetched webpage data and extracting the military news text data; and searching the parsed webpage data for new military news hyperlinks, then fetching and parsing the webpage data of those new links in the same way.
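The crawl-and-parse loop of step S1 can be sketched as a breadth-first crawler. This is a minimal illustration under stated assumptions, not the patent's implementation: the page fetcher is injected as a function (in practice it might wrap `urllib.request.urlopen`), body text is assumed to sit in `<p>` elements, and `is_news_link` is a hypothetical filter for military news URLs.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set
from urllib.parse import urljoin


class NewsPageParser(HTMLParser):
    """Collects <p> text and <a href> links from one page (assumed page layout)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def crawl(seeds: List[str], fetch: Callable[[str], str],
          is_news_link: Callable[[str], bool], max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl: fetch a page, keep its body text, queue unseen news links."""
    queue = deque(seeds)
    seen: Set[str] = set(seeds)
    texts: Dict[str, str] = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        parser = NewsPageParser()
        parser.feed(fetch(url))
        texts[url] = "\n".join(parser.paragraphs)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative hyperlinks
            if is_news_link(link) and link not in seen:
                seen.add(link)
                queue.append(link)
    return texts
```

Injecting `fetch` keeps the sketch testable offline; a real crawler would also need request delays, robots.txt handling, and error handling, which are omitted here.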
S2, performing text preprocessing on the extracted military news text data, specifically including:
S21, extracting the title and the release date of the military news text;
S22, performing word segmentation and part-of-speech tagging on the body of the military news text;
S23, segmenting the titles of the military news texts, constructing trigger words, defining event types, and classifying the military news events and the military news text titles to obtain their respective classification results;
Specifically, a natural language processing tool segments the title of the military news text, keywords of the military news subject are extracted from the segmentation result, and trigger words are constructed for the event category that corresponds to each keyword. First, it is determined whether the title or body of the military news text contains a keyword of the military news subject. If it does, the military news event is classified into the event category corresponding to that keyword, which gives the category to which the event belongs. If neither the title nor the body contains such a keyword, or the words obtained by segmenting the title do not fully cover the subject keywords, the similarity between each word of the segmented title and the trigger words of each event category is calculated using word-to-word similarity; if the similarity exceeds a threshold, the military news text title is assigned to the event category of that trigger word;
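The two-stage title classification described above (direct keyword match first, similarity fallback second) might look like the sketch below. The keyword table, trigger lists, threshold value, and `word_sim` function are all hypothetical placeholders; the sketch also simplifies the fallback condition to "no keyword found" and does not model the partial-coverage case.

```python
from typing import Callable, Dict, List, Optional, Tuple


def classify_title(title_words: List[str], body_words: List[str],
                   keyword_to_category: Dict[str, str],
                   triggers: Dict[str, List[str]],
                   word_sim: Callable[[str, str], float],
                   threshold: float = 0.6) -> Optional[str]:
    """Return an event category for a news item, or None if nothing qualifies."""
    # Stage 1: a subject keyword in the title or body decides the category directly.
    for word in title_words + body_words:
        if word in keyword_to_category:
            return keyword_to_category[word]
    # Stage 2 (fallback): best similarity between a title word and any trigger word.
    best: Tuple[float, Optional[str]] = (0.0, None)
    for category, trigger_words in triggers.items():
        for w in title_words:
            for t in trigger_words:
                score = word_sim(w, t)
                if score > best[0]:
                    best = (score, category)
    return best[1] if best[0] > threshold else None
```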
To calculate the similarity between a word of the military news text title and a trigger word of an event category, each word is represented by its semantic description expression. Let $w_1$ and $w_2$ denote the semantic description expressions of two different words. The path length $d$ between the two words in the semantic hierarchy is calculated, a suitable adjustment parameter $\alpha$ is chosen, and the similarity between $w_1$ and $w_2$ is computed with the HowNet word similarity formula:
$$\mathrm{Sim}(w_1, w_2) = \frac{\alpha}{d + \alpha}$$
where $\mathrm{Sim}(w_1, w_2)$ denotes the similarity between words $w_1$ and $w_2$.
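Assuming the word similarity takes the common HowNet sememe-similarity form Sim(w1, w2) = α / (d + α), where d is the number of edges between the two nodes in the sememe hierarchy, the computation can be sketched as follows. The toy hierarchy, the treatment of each word as a single sememe node, and the default α = 1.6 are illustrative assumptions.

```python
from typing import Dict, List


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Nodes from `node` up to the root of the hierarchy tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def sememe_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Path length d: edges from a up to the lowest common ancestor, then down to b."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same hierarchy")


def word_similarity(a: str, b: str, parent: Dict[str, str], alpha: float = 1.6) -> float:
    d = sememe_distance(a, b, parent)
    return alpha / (d + alpha)  # 1.0 for identical nodes, decreasing as d grows
```

For example, with a hierarchy where `attack` sits under `fight`, and both `fight` and `train` sit under `act`, the distance between `attack` and `train` is 3.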
S24, sorting the military news events and the title classification results obtained in step S23 by the release date and order of the military news.
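Step S24 amounts to a sort on the release date, with the release order as a tie-breaker. A minimal sketch, assuming chronological (oldest-first) ordering and a hypothetical `ClassifiedNews` record:

```python
from datetime import date
from typing import List, NamedTuple


class ClassifiedNews(NamedTuple):
    release_date: date
    order: int       # position in the release sequence (assumed field)
    title: str
    category: str    # event category from step S23


def sort_by_release(items: List[ClassifiedNews]) -> List[ClassifiedNews]:
    """Oldest first; items released on the same day keep their release order."""
    return sorted(items, key=lambda n: (n.release_date, n.order))
```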
S3, performing sentence segmentation and word segmentation on the military news text, calculating the similarity between the sentences of the military news text and the trigger words, and identifying the event sentences in the military news text;
Part-of-speech analysis is performed on the word segmentation result of the military news text to obtain its content words. For each content word and each trigger word, four sememe descriptions are derived: the first independent sememe description, the other independent sememe descriptions, the relation sememe description, and the relation symbol description. Under each of these four descriptions, the similarity between the content word and the trigger word is calculated with the HowNet word similarity formula of step S23. For each description type, the similarities of all content words in the text to the trigger word are averaged, and the average is taken as the similarity result for that description type. This yields four similarity results, denoted $\mathrm{Sim}_i$, $i = 1, 2, 3, 4$. Their mean is taken as the final similarity $\mathrm{Sim}(s, w_0)$ between the military news text $s$ and the trigger word $w_0$:
$$\mathrm{Sim}(s, w_0) = \frac{1}{4}\sum_{i=1}^{4} \mathrm{Sim}_i$$
where $\mathrm{Sim}(s, w_0)$ denotes the final similarity between the military news text $s$ and the trigger word $w_0$.
The four descriptions are defined as follows. The first independent sememe description treats a content word as a feature structure whose primary feature is the first independent sememe. The other independent sememe descriptions are the remaining features of that structure apart from the primary one; the value of each is a set whose elements are basic sememes. The relation sememe description represents all relation sememe expressions in the semantic expression as a feature structure, in which each feature's attribute is a relation sememe and its value is a basic sememe or a specific word. The relation symbol description represents all relation symbol expressions in the semantic expression as a feature structure, in which the value of each feature is a set whose elements are basic sememes or specific words.
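The two-level averaging in step S3 (first over all content words for each of the four sememe descriptions, then over the four descriptions) can be sketched as below. The `SememeDescription` fields mirror the four descriptions just defined; the per-part similarity functions are hypothetical parameters, because the text does not say how each part is scored beyond applying the word similarity formula.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Set


@dataclass
class SememeDescription:
    first: str                                                   # first independent sememe
    others: Set[str] = field(default_factory=set)                # other independent sememes
    relations: Dict[str, str] = field(default_factory=dict)      # relation sememe -> sememe or word
    symbols: Dict[str, Set[str]] = field(default_factory=dict)   # relation symbol -> set


PartSim = Callable[[SememeDescription, SememeDescription], float]


def text_trigger_similarity(content_words: List[SememeDescription],
                            trigger: SememeDescription,
                            part_sims: List[PartSim]) -> float:
    """Sim(s, w0): mean over the four parts of the mean over all content words."""
    if len(part_sims) != 4:
        raise ValueError("expected one similarity function per sememe description")
    if not content_words:
        return 0.0
    per_part = [mean(f(w, trigger) for w in content_words) for f in part_sims]
    return mean(per_part)
```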
S4, screening out the event sentences of the military news text whose similarity to the trigger words exceeds a threshold, and keeping them in the military news text;
The similarity between each event sentence of the military news text and the trigger words is calculated, and the event sentence with the highest similarity determines the event type label of the military news. Event sentences whose similarity to the trigger words exceeds the threshold are kept, and those below the threshold are removed from the military news text.
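Step S4's threshold filter, together with choosing the event type label from the best-scoring sentence, might be sketched like this. The sentence-level similarity function and the 0.5 threshold are assumptions; in this sketch the label is drawn only from sentences that pass the threshold.

```python
from typing import Callable, List, Optional, Tuple


def screen_event_sentences(sentences: List[str], triggers: List[str],
                           sent_sim: Callable[[str, str], float],
                           threshold: float = 0.5) -> Tuple[List[str], Optional[str]]:
    """Keep sentences whose best trigger similarity exceeds the threshold.

    Also returns the trigger of the highest-scoring kept sentence,
    used here as the article's event type label.
    """
    kept: List[str] = []
    best_score, label = float("-inf"), None
    if not triggers:
        return kept, label
    for sentence in sentences:
        scores = {t: sent_sim(sentence, t) for t in triggers}
        top = max(scores, key=scores.get)  # first trigger wins on ties
        if scores[top] > threshold:
            kept.append(sentence)
            if scores[top] > best_score:
                best_score, label = scores[top], top
    return kept, label
```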
S5, extracting event elements from the event sentences and labeling their roles;
Step S5 specifically includes:
S51, segmenting the event sentence, extracting the subject of the event, and analyzing the subject and the segmentation result of the sentence by part of speech;
Part-of-speech analysis of the segmentation result yields the verbs of the military news text, and the similarity between every verb and every trigger word is calculated with the HowNet word similarity formula. If the similarity between a verb in a sentence of the military news text and a trigger word exceeds a threshold, the sentence containing that verb is kept as an event sentence and labeled with that trigger word as its category. If several trigger words appear in one event sentence, the sentence is treated as a multi-event sentence, labeled with multiple categories, and counted accordingly.
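The verb-level check in step S51, including multi-event sentences, could look like the following. Tokens are assumed to be (word, POS) pairs in which verbs are tagged `v`, as in LTP's tag set; `word_sim` and the 0.6 threshold are hypothetical stand-ins for the HowNet-based similarity.

```python
from collections import Counter
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def label_event_sentences(sentences: List[List[Token]], triggers: List[str],
                          word_sim: Callable[[str, str], float],
                          threshold: float = 0.6) -> Tuple[list, Counter]:
    """Label sentences whose verbs match trigger words; count multi-event sentences."""
    labeled: List[Tuple[List[Token], List[str]]] = []
    multi_event: Counter = Counter()
    for tokens in sentences:
        verbs = [w for w, pos in tokens if pos == "v"]
        labels = sorted({t for v in verbs for t in triggers if word_sim(v, t) > threshold})
        if labels:
            labeled.append((tokens, labels))
            if len(labels) > 1:  # several trigger words: a multi-event sentence
                multi_event[tuple(labels)] += 1
    return labeled, multi_event
```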
S52, screening event elements from the event sentence according to the trigger word, the syntactic dependency tree, and the context of the trigger word in the military news text;
After the category of the event sentence is labeled, event elements are searched for and extracted from the event sentence according to the event template, or the dependency relations of the event sentence are extracted with a syntactic dependency tree to obtain its event elements. Event elements include the time and place of the event, the subjects involved, the action, and so on. Event templates are defined by the user according to the description rules of the event.
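For the dependency-tree branch of step S52, one possible sketch reads the trigger's subject (`SBV`) and object (`VOB`) dependents and picks time (`nt`) and place (`ns`) nouns by part of speech. The relation and POS labels follow LTP conventions, but the arcs are hand-written here rather than produced by a parser, and the element names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    idx: int    # 1-based token index
    word: str
    pos: str    # LTP-style POS tag, e.g. "nt" time noun, "ns" place name
    head: int   # index of the governing token, 0 for the root
    rel: str    # LTP-style dependency relation, e.g. "SBV", "VOB", "HED"


def extract_elements(arcs: List[Arc], trigger: str) -> Dict[str, str]:
    """Collect action, subject, object, time and place around the trigger word."""
    elements: Dict[str, str] = {}
    trig = next((a for a in arcs if a.word == trigger), None)
    if trig is None:
        return elements
    elements["action"] = trig.word
    for a in arcs:
        if a.head == trig.idx and a.rel == "SBV":
            elements["subject"] = a.word
        elif a.head == trig.idx and a.rel == "VOB":
            elements["object"] = a.word
        elif a.pos == "nt":
            elements.setdefault("time", a.word)
        elif a.pos == "ns":
            elements.setdefault("place", a.word)
    return elements
```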
S53, describing the event elements in the event sentence;
After the event elements are obtained, their roles are described using description rules, which assign a role to each event element according to its part of speech and its position in the event sentence.
S6, generating the event description according to the role description results of the event elements.
Under the different role category labels, the event elements extracted from the event sentence are integrated with a syntax parse tree into an event description that consists of the event elements of the sentence and expresses a complete meaning. The verb-object and subject-predicate relationships of the event description, among others, are analyzed, and the event description is filled into an event template to obtain the hierarchical event extraction result for the long military news text.
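Filling an event template from the labeled roles (step S6) and chaining the resulting descriptions can be sketched with Python's `string.Template`. The template text, role names, and the ` -> ` chain separator are illustrative assumptions; `safe_substitute` leaves a placeholder in place when a role was not extracted.

```python
from string import Template
from typing import Dict, List

# Hypothetical event template; real templates would be user-defined per event category.
EVENT_TEMPLATE = Template("$time, $subject $action $object at $place")


def describe_event(roles: Dict[str, str], template: Template = EVENT_TEMPLATE) -> str:
    """Fill the template with role-labeled event elements."""
    return template.safe_substitute(roles)


def event_chain(events: List[Dict[str, str]], sep: str = " -> ") -> str:
    """Join per-sentence event descriptions, in document order, into one chain."""
    return sep.join(describe_event(e) for e in events)
```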
The invention has the beneficial effects that:
1. By extracting events based on a pre-trained language model and fusing event semantics and linguistics for ultra-long texts, the invention can analyze various sentence patterns, explore the expression rules corresponding to them, and provide methods for constructing the corresponding events.
2. A method for extracting event elements from long military news texts is constructed through in-depth study and comparison of word segmentation and part-of-speech tagging in text preprocessing.
3. The method obtains structured data and, on that basis, counts the words in each sentence that are strongly associated with preset keywords to determine how much each word contributes to expressing the sentence's topic. The words with the largest contribution are taken as the trigger words of the military news and used to link event elements into a document-level event chain for ultra-long military news texts.
Drawings
Fig. 1 is a flowchart of an implementation of the military news long text hierarchical event extraction method and device of the present invention.
Detailed Description
For a better understanding of the present disclosure, an example is given here. Fig. 1 is a flowchart of an implementation of the military news long text hierarchical event extraction method and device of the present invention.
The disclosed hierarchical event extraction method for long military news texts comprises the following steps:
S1, acquiring the content of military news information webpages and extracting the military news text data in the webpages;
Step S1 includes: using a web crawler tool to collect military news links from each military information website; constructing HTTP requests to access the military news pages and fetch their webpage data; parsing the fetched webpage data and extracting the military news text data; and searching the parsed webpage data for new military news hyperlinks, then fetching and parsing the webpage data of those new links in the same way.
S2, performing text preprocessing on the extracted military news text data, specifically including:
S21, extracting the title and the release date of the military news text;
S22, performing word segmentation and part-of-speech tagging on the body of the military news text;
A word segmentation tool, such as the Language Technology Platform (LTP) of Harbin Institute of Technology, is used to perform word segmentation and part-of-speech tagging on the military news text in preparation for the subsequent word-frequency statistics.
S23, segmenting the titles of the military news texts, constructing trigger words, defining event types, and classifying the military news events and the military news text titles to obtain their respective classification results;
Specifically, a natural language processing tool segments the title of the military news, keywords of the military news subject are extracted from the segmentation result, and trigger words are constructed for the event category that corresponds to each keyword; the construction of the trigger words plays a decisive role in event identification. First, it is determined whether the title or body of the military news text contains a keyword of the military news subject. If it does, the military news event is classified into the event category corresponding to that keyword, which gives the category to which the event belongs. If neither the title nor the body contains such a keyword, or the words obtained by segmenting the title do not fully cover the subject keywords, the similarity between each word of the segmented title and the trigger words of each event category is calculated using word-to-word similarity; if the similarity exceeds a threshold, the military news text title is assigned to the event category of that trigger word;
To calculate the similarity between a word of the military news text title and a trigger word of an event category, each word is represented by its semantic description expression. Let $w_1$ and $w_2$ denote the semantic description expressions of two different words. The path length $d$ between the two words in the semantic hierarchy is calculated, a suitable adjustment parameter $\alpha$ is chosen, and the similarity between $w_1$ and $w_2$ is computed with the HowNet word similarity formula:
$$\mathrm{Sim}(w_1, w_2) = \frac{\alpha}{d + \alpha}$$
where $\mathrm{Sim}(w_1, w_2)$ denotes the similarity between words $w_1$ and $w_2$. A military news entity refers to a key person or thing in the news that performs an event action.
S24, sorting the military news events and the title classification results obtained in step S23 by the release date and order of the military news.
S3, performing sentence segmentation and word segmentation on the military news text, calculating the similarity between the sentences of the military news text and the trigger words, and identifying the event sentences in the military news text;
Part-of-speech analysis is performed on the word segmentation result of the military news text to obtain its content words. For each content word and each trigger word, four sememe descriptions are derived: the first independent sememe description, the other independent sememe descriptions, the relation sememe description, and the relation symbol description. Under each of these four descriptions, the similarity between the content word and the trigger word is calculated with the HowNet word similarity formula of step S23. For each description type, the similarities of all content words in the text to the trigger word are averaged, and the average is taken as the similarity result for that description type. This yields four similarity results, denoted $\mathrm{Sim}_i$, $i = 1, 2, 3, 4$. Their mean is taken as the final similarity $\mathrm{Sim}(s, w_0)$ between the military news text $s$ and the trigger word $w_0$:
$$\mathrm{Sim}(s, w_0) = \frac{1}{4}\sum_{i=1}^{4} \mathrm{Sim}_i$$
The four descriptions are defined as follows. The first independent sememe description treats a content word as a feature structure whose primary feature is the first independent sememe. The other independent sememe descriptions are the remaining features of that structure apart from the primary one; the value of each is a set whose elements are basic sememes. The relation sememe description represents all relation sememe expressions in the semantic expression as a feature structure, in which each feature's attribute is a relation sememe and its value is a basic sememe or a specific word. The relation symbol description represents all relation symbol expressions in the semantic expression as a feature structure, in which the value of each feature is a set whose elements are basic sememes or specific words.
S4, screening out the event sentences of the military news text whose similarity to the trigger words exceeds a threshold, and keeping them in the military news text;
The similarity between each event sentence of the military news text and the trigger words is calculated, and the event sentence with the highest similarity determines the event type label of the military news. Event sentences whose similarity to the trigger words exceeds the threshold are kept, and those below the threshold are removed from the military news text.
S5, extracting event elements from the event sentences and labeling their roles;
Trigger words are expanded using similar sememes: words in the text that are highly similar to a trigger word are marked as category labels of the news. The most frequently repeated event words, taken as verbs, are compared with the trigger words, and if an event sentence contains a trigger-word action, that trigger word is designated as the category of the event sentence. The defined military news trigger words include "commissioning", "confrontation", "stealth", "combat", "penetration", "training", "collision", "countermeasures", "counterattack", and so on.
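Trigger-word expansion by similarity can be sketched as mapping each seed trigger to itself plus the vocabulary words whose similarity to it exceeds a threshold. The `word_sim` function and the 0.8 threshold are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def expand_triggers(seeds: List[str], vocabulary: Iterable[str],
                    word_sim: Callable[[str, str], float],
                    threshold: float = 0.8) -> Dict[str, List[str]]:
    """Map each seed trigger to itself plus sufficiently similar vocabulary words."""
    vocab = list(vocabulary)  # allow a one-shot iterable to be scanned per seed
    return {
        seed: [seed] + [w for w in vocab if w != seed and word_sim(w, seed) > threshold]
        for seed in seeds
    }
```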
Step S5 specifically includes:
S51, segmenting the event sentence, extracting the subject of the event, and analyzing the subject and the segmentation result of the sentence by part of speech;
Part-of-speech analysis of the segmentation result yields the verbs of the military news text, and the similarity between every verb and every trigger word is calculated with the HowNet word similarity formula. If the similarity between a verb in a sentence of the military news text and a trigger word exceeds a threshold, the sentence containing that verb is kept as an event sentence and labeled with that trigger word as its category. If several trigger words appear in one event sentence, the sentence is treated as a multi-event sentence, labeled with multiple categories, and counted accordingly.
S52, screening event elements from the event sentence according to the trigger word, the syntactic dependency tree, and the context of the trigger word in the military news text;
After the category of the event sentence is labeled, event elements are searched for and extracted from the event sentence according to the event template, or the dependency relations of the event sentence are extracted with a syntactic dependency tree to obtain its event elements. Event elements include the time and place of the event, the subjects involved, the action, and so on. Event templates are defined by the user according to the description rules of the event.
S53, describing the event elements in the event sentence;
After the event elements are obtained, their roles are described using description rules that assign roles according to the part of speech and position of each event element extracted from the event sentence. For example, the subject noun of a clause is generally the event subject; the predicate verb of a clause is generally the event action, which is also defined as the trigger; the object noun of a clause is generally the event object; and so on.
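The positional rules above (subject noun before the predicate verb, object noun after it) can be sketched for a single clause as follows. This is a simplified illustration: it assumes LTP-style POS tags in which nouns start with `n` and verbs are `v`, skips time nouns (`nt`) as subject or object candidates, and takes the first verb as the predicate.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, LTP-style POS tag)


def is_candidate_noun(pos: str) -> bool:
    # Nouns start with "n"; time nouns ("nt") are left for the time role.
    return pos.startswith("n") and pos != "nt"


def describe_roles(clause: List[Token]) -> Dict[str, str]:
    """Assign subject / action / object roles from POS tags and word positions."""
    roles: Dict[str, str] = {}
    verb_idx = next((i for i, (_, pos) in enumerate(clause) if pos == "v"), None)
    if verb_idx is None:
        return roles
    roles["action"] = clause[verb_idx][0]   # predicate verb -> event action (trigger)
    before = [w for w, pos in clause[:verb_idx] if is_candidate_noun(pos)]
    after = [w for w, pos in clause[verb_idx + 1:] if is_candidate_noun(pos)]
    if before:
        roles["subject"] = before[-1]       # noun closest before the verb
    if after:
        roles["object"] = after[0]          # first noun after the verb
    return roles
```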
S6, generating the event description according to the role description results of the event elements.
Under the different role category labels, the event elements extracted from the event sentence are integrated with a syntax parse tree into an event description that consists of the event elements of the sentence and expresses a complete meaning. The verb-object and subject-predicate relationships of the event description, among others, are analyzed, and the event description is filled into an event template to obtain the hierarchical event extraction result for the long military news text.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (7)

1. A hierarchical event extraction method for long military news texts, characterized by comprising the following steps:
S1, acquiring the content of military news information webpages and extracting the military news text data in the webpages;
S2, performing text preprocessing on the extracted military news text data, specifically including:
S21, extracting the title and the release date of the military news text;
S22, performing word segmentation and part-of-speech tagging on the body of the military news text;
S23, segmenting the titles of the military news texts, constructing trigger words, defining event types, and classifying the military news events and the military news text titles to obtain their respective classification results;
S24, sorting the military news events and the title classification results obtained in step S23 by the release date and order of the military news;
S3, performing sentence segmentation and word segmentation on the military news text, calculating the similarity between the military news text and the trigger words, and identifying the event sentences in the military news text;
S4, screening out the event sentences of the military news text whose similarity to the trigger words exceeds a threshold, and keeping them in the military news text;
S5, extracting event elements from the event sentences and labeling their roles;
S6, generating the event description according to the role description results of the event elements;
wherein, under the different role category labels, the event elements extracted from the event sentence are integrated with a syntax parse tree into an event description that consists of the event elements of the sentence and expresses a complete meaning, the verb-object and subject-predicate relationships of the event description are analyzed, and the event description is filled into an event template to obtain the hierarchical event extraction result for the long military news text.
2. The military news long-text hierarchical event extraction method of claim 1,
the step S1 includes: using a web crawler tool to obtain the links of military news from each military information website; constructing an HTTP request to access the body page of each military news item and retrieve its webpage data; after the webpage data is obtained, parsing it and extracting the military news text data in the webpage; and searching the parsed webpage data for new military news hyperlinks, then acquiring and parsing the webpage data of those new hyperlinks in turn.
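A minimal breadth-first crawler in the spirit of step S1 is sketched below using only the Python standard library. The names `PageParser`, `crawl`, and `http_fetch`, the `<p>`-only text extraction, the `max_pages` cap, and the `allow` link filter are assumptions for illustration; real news sites generally need site-specific extraction rules.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class PageParser(HTMLParser):
    """Collect paragraph text and absolute hyperlinks from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "p":
            self._in_p, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buf).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def http_fetch(url: str) -> str:
    """Send an HTTP GET request and decode the body (UTF-8 assumed)."""
    req = Request(url, headers={"User-Agent": "news-crawler-sketch"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds: list[str], fetch: Callable[[str], str] = http_fetch,
          max_pages: int = 50,
          allow: Callable[[str], bool] = lambda url: True) -> dict[str, str]:
    """Breadth-first crawl: extract text from each page, then queue unseen links."""
    queue, seen = deque(seeds), set(seeds)
    texts: dict[str, str] = {}
    while queue and len(texts) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # network errors; URLError is a subclass of OSError
            continue
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        texts[url] = "\n".join(parser.paragraphs)
        for link in parser.links:
            if link not in seen and allow(link):
                seen.add(link)
                queue.append(link)
    return texts
```

Passing `fetch` as a parameter keeps network access separate from parsing, so the link-following logic can be exercised against canned HTML without touching a real site.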
3. The military news long-text hierarchical event extraction method of claim 1,
the step S23 uses a natural language processing tool to segment the title of the military news text into words, extracts keywords of the military news subject from the segmentation result, and constructs trigger words for the corresponding event categories according to the military news event categories associated with those keywords. First, it is determined whether the title or the body of the military news text contains a keyword of the military news subject. If such a keyword exists, the military news event is classified into the event category corresponding to that keyword, giving the category to which the event belongs. If neither the title nor the body contains a keyword of the military news subject, or the segmentation result of the title does not fully cover the keyword, the word similarity is used to calculate the similarity between the words obtained by segmenting the title and the trigger words of each event category; if the similarity is greater than a set threshold, the military news text title is assigned to the event category corresponding to that trigger word;
to calculate the similarity between a word of the military news text title and a trigger word of an event category, each word is represented by its sememe description, where $w_1$ and $w_2$ denote the sememe descriptions of two different words; the path length $d$ between the two words in the sememe hierarchy is calculated, a suitable adjustment parameter $\alpha$ is selected, and the similarity between $w_1$ and $w_2$ is calculated with the HowNet word similarity formula:
$$\mathrm{Sim}(w_1, w_2) = \frac{\alpha}{d + \alpha}$$
where $\mathrm{Sim}(w_1, w_2)$ denotes the similarity between the words $w_1$ and $w_2$.
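The two-stage classification of step S23, together with the similarity $\alpha/(d+\alpha)$, can be sketched as follows. Assumptions: the sememe hierarchy is given as a hypothetical child-to-parent dictionary; each title word and trigger word is treated directly as a single sememe (real HowNet entries map a word to one or more concepts, each described by several sememes); `alpha=1.6` is only an assumed default for the adjustable parameter; and choosing the best-scoring category above the threshold is one reading of the claim.

```python
from typing import Callable, Optional


def _ancestors(node: str, parent: dict[str, str]) -> list[str]:
    """Path from a sememe up to the root of its hierarchy (node first)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def sememe_distance(a: str, b: str, parent: dict[str, str]) -> Optional[int]:
    """Number of edges between a and b through their lowest common ancestor."""
    depth_a = {n: i for i, n in enumerate(_ancestors(a, parent))}
    for j, n in enumerate(_ancestors(b, parent)):
        if n in depth_a:
            return depth_a[n] + j
    return None  # no common ancestor: the sememes lie in different hierarchies


def word_similarity(a: str, b: str, parent: dict[str, str], alpha: float = 1.6) -> float:
    """Sim = alpha / (d + alpha); 0.0 when the two sememes are unrelated."""
    d = sememe_distance(a, b, parent)
    return 0.0 if d is None else alpha / (d + alpha)


def classify_title(title_words: list[str], body_text: str,
                   keyword_to_category: dict[str, str],
                   triggers_by_category: dict[str, list[str]],
                   similarity: Callable[[str, str], float],
                   threshold: float = 0.5) -> Optional[str]:
    # Stage 1: a subject keyword in the title words or body decides directly.
    for keyword, category in keyword_to_category.items():
        if keyword in title_words or keyword in body_text:
            return category
    # Stage 2: otherwise take the category whose trigger word is most similar
    # to some title word, provided that similarity exceeds the threshold.
    best_category, best_score = None, threshold
    for category, triggers in triggers_by_category.items():
        for word in title_words:
            for trigger in triggers:
                score = similarity(word, trigger)
                if score > best_score:
                    best_category, best_score = category, score
    return best_category
```

With `d = 0` the formula gives 1 (identical sememes), and the similarity falls towards 0 as the path length grows; a larger α flattens that decay.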
4. The military news long-text hierarchical event extraction method of claim 3,
the step S3 performs part-of-speech analysis on the segmentation result of the military news text to obtain the content words (real words) in the text. For each content word and the trigger word, four sememe descriptions are obtained: the first independent sememe description, the other independent sememe descriptions, the relation sememe descriptions, and the relation symbol descriptions. Under each of these four descriptions, the similarity between every content word and the trigger word is calculated with the HowNet word similarity formula of step S23, and the similarities of all content words under the same description are averaged to give the similarity result for that description. This yields four similarity results, denoted $\mathrm{Sim}_i$, $i = 1, 2, 3, 4$, whose mean is taken as the final similarity $\mathrm{Sim}(s, w_0)$ between the military news text $s$ and the trigger word $w_0$:
$$\mathrm{Sim}(s, w_0) = \frac{1}{4}\sum_{i=1}^{4} \mathrm{Sim}_i$$
where $\mathrm{Sim}(s, w_0)$ denotes the final similarity between the military news text $s$ and the trigger word $w_0$.
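The two-level averaging behind $\mathrm{Sim}(s, w_0)$ can be sketched as follows, assuming the four per-description word similarity functions (first independent sememe, other independent sememes, relation sememes, relation symbols) are supplied by the caller. `text_trigger_similarity` and `PartSimilarity` are hypothetical names, and returning 0.0 for a text with no content words is an added convention.

```python
from statistics import mean
from typing import Callable, Sequence

# A per-description word similarity, e.g. one that compares only the first
# independent sememes of the two words (hypothetical signature).
PartSimilarity = Callable[[str, str], float]


def text_trigger_similarity(content_words: Sequence[str], trigger: str,
                            parts: Sequence[PartSimilarity]) -> float:
    """Average over content words per description, then over the descriptions."""
    if not content_words or not parts:
        return 0.0
    # Sim_i: mean similarity of all content words to the trigger under description i.
    per_part = [mean(part(w, trigger) for w in content_words) for part in parts]
    # Final similarity: mean of Sim_1..Sim_4 (the claim uses exactly four parts).
    return mean(per_part)
```

For two content words and four toy parts scoring (1, 1), (0.5, 0.5), (0, 0) and (0.5, 0), the per-description results are 1, 0.5, 0 and 0.25, so the final similarity is 1.75 / 4 = 0.4375.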
5. The military news long-text hierarchical event extraction method of claim 4,
the first independent sememe description: a content word is described as a feature structure, and the basic attribute of this feature structure is the first independent sememe description; the other independent sememe descriptions: the features of that feature structure other than the basic attribute are the other independent sememe descriptions, whose value is a set whose elements are basic sememes; the relation sememe descriptions: all relation sememe descriptions in the semantic expression are described as a feature structure, each feature of which is a relation sememe description whose value is a basic sememe or a specific word; the relation symbol descriptions: all relation symbol descriptions in the semantic expression are described as a feature structure, each feature of which is a relation symbol description whose value is a set whose elements are basic sememes or specific words.
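The four kinds of description can be held in a small data structure, sketched below with a parser for a simplified, comma-separated definition string. The string format, the `RELATION_SYMBOLS` set, and the `ConceptDescription` and `parse_def` names are assumptions loosely modeled on HowNet-style definitions, not the actual HowNet file format: the first item is taken as the first independent sememe, `relation=value` items as relation sememe descriptions, items prefixed by a relation symbol as relation symbol descriptions, and everything else as other independent sememes.

```python
from dataclasses import dataclass, field

# Assumed set of single-character relation symbols (illustrative only).
RELATION_SYMBOLS = set("#%$*+&@?!~^")


@dataclass
class ConceptDescription:
    first_sememe: str                                                # basic attribute
    other_sememes: set[str] = field(default_factory=set)             # set of basic sememes
    relation_sememes: dict[str, str] = field(default_factory=dict)   # relation -> sememe/word
    relation_symbols: dict[str, set[str]] = field(default_factory=dict)  # symbol -> set


def parse_def(definition: str) -> ConceptDescription:
    """Split a simplified definition string into the four description parts."""
    items = [x.strip() for x in definition.split(",") if x.strip()]
    if not items:
        raise ValueError("empty definition")
    desc = ConceptDescription(first_sememe=items[0])
    for item in items[1:]:
        if "=" in item:
            relation, value = item.split("=", 1)
            desc.relation_sememes[relation] = value
        elif item[0] in RELATION_SYMBOLS:
            desc.relation_symbols.setdefault(item[0], set()).add(item[1:])
        else:
            desc.other_sememes.add(item)
    return desc
```

Storing the relation symbol values as sets matches the claim's statement that each such feature takes a set of sememes or words, whereas each relation sememe maps to a single value.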
6. The military news long-text hierarchical event extraction method of claim 3,
the step S5 specifically includes:
S51, segmenting the event sentence into words, extracting the subject of the event, and analyzing the subject and the segmentation result of the sentence by part of speech;
the military news text is segmented into words, part-of-speech analysis is performed on the segmentation result to obtain the verbs in the text, and the similarity between each verb and each trigger word is calculated with the HowNet word similarity formula; if the similarity between a verb in a sentence of the military news text and a trigger word is greater than a set threshold, the sentence containing that verb is retained as an event sentence and labeled with the category of that trigger word; if several trigger words occur in one event sentence, the sentence is treated as a multi-event sentence, labeled with multiple categories, and counted;
S52, screening the event elements from the event sentence according to the trigger word, the syntactic dependency tree, and the context of the trigger word in the military news text;
after the category of the event sentence is labeled, the event elements are searched for and extracted from the event sentence according to the event template, or the dependency relations of the event sentence are extracted with a syntactic dependency tree to obtain its event elements; the event elements comprise the time, place, participating subjects, and actions of the event; the event template is defined by the user according to the description rules of events;
S53, describing the event elements in the event sentence;
after the event elements are obtained, their roles are described with description rules, which assign a role to each event element according to its part of speech and its position in the event sentence.
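Steps S51 to S53 can be sketched together over one token format. This is an illustration under stated assumptions, not the patented implementation: tokens are hypothetical `(word, pos, head, relation)` tuples with 1-based heads (0 for the root); the POS tags (`v` for verbs, `nt` for time nouns, `ns` for place names) and dependency labels (`SBV` subject, `VOB` object) follow the conventions of LTP-style Chinese parsers; the word similarity function and threshold are supplied by the caller; and the template-based alternative in S52 is not shown.

```python
from typing import Callable

Token = tuple[str, str, int, str]  # (word, pos, head, relation); head is 1-based, 0 = root


def label_event_sentence(tokens: list[Token], triggers_by_category: dict[str, list[str]],
                         similarity: Callable[[str, str], float],
                         threshold: float = 0.8) -> list[str]:
    """S51: categories whose trigger words match some verb above the threshold.
    An empty result means the sentence is not an event sentence; more than one
    label marks a multi-event sentence."""
    labels = set()
    for word, pos, _head, _rel in tokens:
        if not pos.startswith("v"):
            continue
        for category, triggers in triggers_by_category.items():
            if any(similarity(word, t) > threshold for t in triggers):
                labels.add(category)
    return sorted(labels)


def extract_elements(tokens: list[Token], trigger_index: int) -> list[int]:
    """S52: 1-based indices of time/place words and the trigger's SBV/VOB dependents."""
    picked = []
    for i, (_word, pos, head, rel) in enumerate(tokens, start=1):
        if pos in ("nt", "ns") or (head == trigger_index and rel in ("SBV", "VOB")):
            picked.append(i)
    return picked


def describe_roles(tokens: list[Token], element_indices: list[int],
                   trigger_index: int) -> dict[str, str]:
    """S53: assign roles by part of speech and position relative to the trigger."""
    roles = {}
    for i in element_indices:
        word, pos, _head, _rel = tokens[i - 1]
        if pos == "nt":
            roles[word] = "time"
        elif pos == "ns":
            roles[word] = "place"
        else:
            # Illustrative position rule: before the trigger = agent, after = patient.
            roles[word] = "agent" if i < trigger_index else "patient"
    roles[tokens[trigger_index - 1][0]] = "action"
    return roles
```

For a parsed sentence such as "In August the navy held an exercise in the South China Sea", with "held" as the trigger, the sketch labels "August" as time, "South China Sea" as place, "navy" as agent, and "exercise" as patient.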
7. The military news long-text hierarchical event extraction method of claim 1,
in step S4, the similarity between each event sentence of the military news text and the trigger words is calculated; the category of the event sentence with the highest similarity is taken as the event category label of the military news; event sentences whose similarity to the trigger words is greater than a set threshold are retained, and those whose similarity is below that threshold are removed from the military news text.
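Step S4 can be sketched as a threshold filter followed by an arg-max. The `(sentence, category, similarity)` triples and the `screen_sentences` name are hypothetical; the scores would come from a sentence-to-trigger similarity such as the one in claim 4, and returning `None` when no sentence passes the threshold is an added convention.

```python
from typing import Optional

Scored = tuple[str, str, float]  # (sentence, category, similarity to its trigger word)


def screen_sentences(scored: list[Scored], threshold: float) -> tuple[Optional[str], list[Scored]]:
    """Keep sentences above the threshold; label the news with the top sentence's category."""
    kept = [item for item in scored if item[2] > threshold]
    if not kept:
        return None, []
    best = max(kept, key=lambda item: item[2])
    return best[1], kept
```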

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110970577.5A CN113657090A (en) 2021-08-23 2021-08-23 Military news long text layering event extraction method

Publications (1)

Publication Number Publication Date
CN113657090A true CN113657090A (en) 2021-11-16

Family

ID=78481609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110970577.5A Pending CN113657090A (en) 2021-08-23 2021-08-23 Military news long text layering event extraction method

Country Status (1)

Country Link
CN (1) CN113657090A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
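CN105302794A (en) * 2015-10-30 2016-02-03 Soochow University (苏州大学) Chinese coreferent event recognition method and system
CN110941692A (en) * 2019-09-28 2020-03-31 Southwest Institute of Electronic Technology (10th Research Institute of China Electronics Technology Group Corporation) (西南电子技术研究所(中国电子科技集团公司第十研究所)) Method for extracting Internet news events in the political and foreign affairs category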

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
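Cui Ying (崔莹): "Event extraction method for the political and foreign affairs domain based on similar sememes and dependency syntax" (基于相似义原和依存句法的政外领域事件抽取方法), Computer Engineering & Science (《计算机工程与科学》) *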

Similar Documents

Publication Publication Date Title
CN109189942B (en) Construction method and device of patent data knowledge graph
CN106874378B (en) Method for constructing knowledge graph based on entity extraction and relation mining of rule model
CN106503055A (en) A kind of generation method from structured text to image description
CN106372061A (en) Short text similarity calculation method based on semantics
Rashid et al. Feature level opinion mining of educational student feedback data using sequential pattern mining and association rule mining
CN108681574A (en) A kind of non-true class quiz answers selection method and system based on text snippet
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
US11170169B2 (en) System and method for language-independent contextual embedding
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN110134934A (en) Text emotion analysis method and device
CN108763192B (en) Entity relation extraction method and device for text processing
CN111859961A (en) Text keyword extraction method based on improved TopicRank algorithm
CN107451116B (en) Statistical analysis method for mobile application endogenous big data
CN111444713B (en) Method and device for extracting entity relationship in news event
CN113590810B (en) Abstract generation model training method, abstract generation device and electronic equipment
CN114881043A (en) Deep learning model-based legal document semantic similarity evaluation method and system
CN112711666B (en) Futures label extraction method and device
CN111815426B (en) Data processing method and terminal related to financial investment and research
Jha et al. Hsas: Hindi subjectivity analysis system
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
Tang et al. Text semantic understanding based on knowledge enhancement and multi-granular feature extraction
CN112507115B (en) Method and device for classifying emotion words in barrage text and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211116