CN112464646A

CN112464646A - Text emotion analysis method for defense intelligence library in national defense field

Info

Publication number: CN112464646A
Application number: CN202011318544.4A
Authority: CN
Inventors: 董文轩; 晏裕生; 江洋; 李斌; 李兴亚; 苏慧超; 孙孟阳; 姚晗
Original assignee: China Institute Of Marine Technology & Economy
Current assignee: China Institute Of Marine Technology & Economy
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2021-03-09

Abstract

The invention relates to a text emotion analysis method and system for a defense intelligence library in the national defense field. The method comprises the following steps: acquiring a text from a defense intelligence library in the national defense field; segmenting it to obtain a sentence set; preprocessing the sentences and segmenting them into words with a conditional random field algorithm; screening each sentence with a CHI statistical method based on a subjective 2-POS model to obtain a subjective sentence set; grading the emotion expression words by degree; judging symbolic sentences; performing emotional tendency statistics on each word in the subjective sentences, calculating the final score of each subjective sentence according to an emotion calculation model, and calculating the final emotion score of the text; and calculating the emotional tendency value of the text. With this method, text reports from defense intelligence libraries in the national defense field can be analyzed autonomously, the accuracy and timeliness of the analysis are improved, and a fast, accurate reference is provided for scientific and technical personnel in the national defense field.

Description

Text emotion analysis method for defense intelligence library in national defense field

Technical Field

The invention relates to the field of text classification emotion analysis, in particular to a text emotion analysis method and system for a defense intelligence library in the field of national defense.

Background

With the rapid development of the internet, more and more internet users have shifted from simply consuming online information to creating it. Blogs, forums and discussion groups on the internet now host a large amount of subjective text published by users. These subjective texts may be user comments on a product or service, or public opinions on a news event or national policy. Potential consumers can consult relevant comments as a decision reference before purchasing a product or service, and government departments can browse public comments on news events or national policies to understand public opinion. Such subjective texts grow exponentially every day, and analyzing them manually consumes a great deal of manpower and time. Using computers to automatically analyze the emotion expressed in subjective text has therefore become a hot spot of current academic research, and this research direction is known as text emotion analysis.

Text emotion analysis (sentiment analysis) refers to the process of analyzing, processing and extracting subjective, emotionally colored text using natural language processing and text mining technologies. By analysis granularity, text sentiment analysis can be divided into four levels: word level, phrase level, sentence level and chapter level. At each level, the analyzed object is assigned a sentiment result (positive, negative or neutral). Text emotion analysis research currently spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning and artificial intelligence, and its results are of great significance for optimizing the decisions of governments, enterprises and consumers, so the technology has attracted wide attention from scholars and research institutions.

A defense intelligence library refers specifically to an intelligence library that mainly studies national security, national defense strategy, military strategy, strategy evaluation, operational concepts and the like, and directly or indirectly provides decision-support services for the military and the defense industry. Such libraries produce a large volume of research output each year, mostly in the form of text reports. These research results usually carry an emotional tendency toward relevant affairs in the national defense field, and analyzing their emotion can provide an effective reference for national defense security, national defense construction and the like.

The application of text sentiment analysis in the national defense science and technology field, and in particular to defense intelligence libraries in the national defense field, is still limited. The main reason is that defense intelligence library reports differ from microblogs, forum comments, user reviews and the like: the research results carry more authoritative, instructive significance, so the requirements on the timeliness and accuracy of sentiment analysis are especially high. On the one hand, these reports contain many national defense terms, which greatly increases word pre-training time, makes a background knowledge ontology difficult to construct, and makes the timeliness requirement hard to meet. On the other hand, Chinese reports are usually organized into chapters and paragraphs containing a large number of sentences, and the sentences may have complicated relations such as adversative and sequential relations. This makes analysis difficult, and existing chapter-level text emotion analysis models, such as the LSTM model or the CRF model, find it hard to guarantee high accuracy.

Disclosure of Invention

The invention provides a text emotion analysis method and system for a defense intelligence library in the national defense field to solve the above problems in the prior art. The method and system split chapter texts layer by layer from top to bottom, first into sentences and then into words; improve on the conventional CRF algorithm; combine it with a self-improved CHI statistical method; weight the Hownet dictionary by dividing it according to emotion degree; and aggregate the results from bottom to top into the final emotion analysis result. This improves the accuracy and timeliness of text emotion analysis for defense intelligence libraries in the national defense field.

In order to achieve the purpose, the invention provides the following technical scheme:

A text emotion analysis method for a defense intelligence library in the national defense field comprises the following steps:

acquiring Text from a defense intelligence library in the national defense field;

segmenting the chapters in Text according to a preset segmentation model to obtain a sentence set T = {t₁, t₂, ……, t_n}, where n is a natural number;

processing the sentence set T = {t₁, t₂, ……, t_n} obtained in the above step in a preset manner, and segmenting each sentence t_i in the sentence set T into words with a conditional random field algorithm, where i = 1, 2, ……, n, to obtain word-segmented text data;

based on the word-segmented text data obtained in the above step, screening each sentence t_i with a CHI statistical method based on a subjective 2-POS model, assigning a subjective/objective emotion weight value to each sentence t_i and judging that weight value, to obtain a subjective sentence set T′ = {t′₁, t′₂, ……, t′_s}, where s is a natural number less than or equal to n;

importing a pre-established emotion dictionary, grading the emotion expression words by degree, and assigning corresponding word weight values according to their degree levels;

performing symbolic sentence judgment on each subjective sentence t′_l obtained in the above step, where l = 1, 2, ……, s, and assigning each subjective sentence t′_l a feature weight value according to the judgment result;

performing, according to the emotion dictionary, emotional tendency statistics on each word in the subjective sentence t′_l, calculating the final score of each subjective sentence t′_l according to an emotion calculation model, and calculating the final emotion score of Text;

and calculating an emotional tendency value O of the Text.

Preferably, the preset segmentation model in the above step is based on common punctuation marks, namely the comma, period, question mark and exclamation mark.

Preferably, processing the obtained sentence set T = {t₁, t₂, ……, t_n} in a preset manner specifically comprises:

removing, by a preset elimination rule, characters and/or words with preset attributes from each sentence t_i, the characters and/or words with preset attributes including at least special symbols, null values and stop words;

and segmenting each sentence t_i in the sentence set T = {t₁, t₂, ……, t_n} into words with the conditional random field algorithm specifically comprises:

taking each sentence t_i processed in the preset manner as an observation sequence, and taking the sequence output by the conditional random field operation for that input as a state sequence, the state sequence forming a Markov random field; during the conditional random field operation, the state sequence with the maximum probability is found for each sentence t_i and taken as its final word segmentation result set t_i = {w_i1, w_i2, ……, w_ij, ……, w_im}, where w_ij denotes the j-th segmented word, with its part-of-speech attribute, in sentence t_i, i = 1, 2, ……, n, j = 1, 2, ……, m, and m is a natural number.

Preferably, screening each sentence t_i with the CHI statistical method based on the subjective 2-POS model specifically comprises:

classifying the words in each sentence t_i by part of speech, taking each ordered combination of 2 consecutive parts of speech in the sentence as one item representing the text, and computing statistics with the following formula:

where χ² is the emotion statistical score, pat_ti denotes a particular 2-POS item, c_k denotes the subjective/objective emotion class (objective when k = 0, subjective when k = 1), N is the number of all sentences in the sentence set T, A is the number of sentences that contain the feature word pat_ti and belong to class c_k, B is the number of sentences that contain pat_ti but do not belong to c_k, C is the number of sentences that belong to c_k but do not contain pat_ti, and D is the number of sentences that neither belong to c_k nor contain pat_ti;

selecting, according to the emotion statistical scores, the 2-POS items whose χ² scores rank in the top ten, adding 1 to the weight value w_ti of each sentence t_i that contains such a 2-POS item, and judging each sentence t_i whose weight value w_ti is greater than 0 as a subjective sentence t′_l, thereby obtaining the subjective sentence set T′ = {t′₁, t′₂, ……, t′_s}.

Preferably, the pre-established emotion dictionary is the Hownet emotion dictionary; the degree levels comprise at least three levels whose degree of emotional expression decreases in sequence, and the word weight values corresponding to the three levels are 1.5, 1.0 and 0.5, respectively.

Preferably, symbolic sentences include sentences containing summarizing and/or adversative (turning) words, and sentences at the beginning or end of a paragraph in the text;

in the symbolic sentence judgment, a sentence that is a symbolic sentence is given the feature weight value weight_sp = 1.25, and a sentence that is not a symbolic sentence is given the feature weight value weight_sp = 1.0.

Preferably, performing emotional tendency statistics specifically comprises:

performing, according to the emotion dictionary, emotional tendency statistics on each word w_lk in the subjective sentence t′_l: if w_lk is a positive emotion word, its flag value flag_k = 1; if w_lk is a negative emotion word, its flag value flag_k = -1; and calculating the final score of the subjective sentence t′_l;

calculating the final emotion score of Text;

where l = 1, 2, ……, s, k = 1, 2, ……, m, and s and m are natural numbers.

Preferably, the emotional tendency value of Text is calculated as O = sign(Ori_T), where sign is the sign function: when Ori_T is greater than 0, O = 1, indicating a positive viewpoint; when Ori_T equals 0, O = 0, indicating a neutral viewpoint; when Ori_T is less than 0, O = -1, indicating a negative viewpoint.

A text emotion analysis system for a defense intelligence library in the national defense field comprises:

the defense intelligence library Text acquisition module is used for acquiring Text of a defense intelligence library in the national defense field;

the Text segmentation module is used for segmenting the chapters in Text according to a preset segmentation model to obtain a sentence set;

the preprocessing and word segmentation module is used for preprocessing the sentence set and segmenting words of the preprocessed sentences by adopting a preset model to obtain text data after word segmentation;

the screening and judging module is used for screening the word-segmented text data, accumulating sentence weights and judging them to obtain a subjective sentence set;

the emotion degree grading module is used for grading the degree of the emotion expression words and endowing corresponding word weight values;

the symbolic sentence judgment module is used for judging the symbolic sentences and giving characteristic weight values according to the judgment results;

the emotion score calculation module is used for calculating the final emotion score of the text;

and the emotion tendency judgment module is used for judging the emotion tendency of the text.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the text emotion analysis method for a defense intelligence library in the national defense field.

According to the specific embodiment provided by the invention, the technical scheme of the invention can obtain the following technical effects:

(1) The CHI statistical method based on the subjective 2-POS model automatically removes noise words irrelevant to the category. On the one hand, this effectively improves computation speed and model-construction efficiency and ensures timely analysis; on the other hand, it removes or reduces the influence of noisy data on the analysis result and improves analysis accuracy.

(2) The Hownet emotion dictionary is divided into 3 weighted levels according to the degree of the emotion expression words, instead of only the positive/negative split of the traditional Hownet emotion dictionary, which improves the accuracy of the analysis result.

(3) The defense intelligence library text is split step by step from top to bottom, from chapters into sentences; the analysis is carried out mainly at sentence level, and the sentence results are aggregated from bottom to top into the emotion analysis result for the whole chapter. This improves the granularity of the analysis while avoiding word-by-word analysis, and thus helps ensure both high accuracy and high timeliness.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of the text emotion analysis method for a defense intelligence library in the national defense field;

FIG. 2 is a schematic structural diagram of the text emotion analysis system for a defense intelligence library in the national defense field.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative and is in no way intended to limit the disclosure, its application, or uses. The present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that: unless otherwise indicated, the relative arrangement of parts and steps, the composition of materials, numerical expressions and values, etc., set forth in these embodiments should be construed as merely illustrative, and not a limitation.

All terms (including technical or scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs unless specifically defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

FIG. 1 is a schematic flow chart of a text emotion analysis method for a defense intelligence library in the field of national defense. As shown in fig. 1, the text emotion analysis method includes the following steps:

step S1: acquiring Text of a defense intelligence library in the national defense field;

step S2: the chapters in Text are segmented according to a preset segmentation model, which is specifically based on common punctuation marks: at least the comma "，", the period "。", the question mark "？" and the exclamation mark "！", and optionally also the semicolon "；", the ellipsis "……" and the like. Segmenting the text chapters yields the sentence set T = {t₁, t₂, ……, t_n}, where n is a natural number;
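Step S2 can be sketched in Python as below. This is a minimal illustration, not the patent's implementation: it assumes the text uses full-width Chinese punctuation, and the names `DELIMITERS` and `split_sentences` are hypothetical.

```python
import re

# Delimiters from step S2: comma, period, question mark and exclamation mark,
# plus the optional semicolon and ellipsis. Only full-width (Chinese) forms are
# handled; that restriction is an assumption of this sketch.
DELIMITERS = re.compile(r"[，。？！；]|…+")


def split_sentences(text: str) -> list[str]:
    """Split a chapter into the sentence set T = {t1, ..., tn}."""
    parts = DELIMITERS.split(text)
    # Consecutive or trailing delimiters leave empty strings behind; drop them.
    return [p.strip() for p in parts if p.strip()]
```

For example, `split_sentences("总之……成果显著，但仍有不足。")` yields three sentences; the ellipsis is matched as a single delimiter run so it does not create empty fragments.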

step S3: the sentence set T = {t₁, t₂, ……, t_n} obtained in step S2 is processed in the preset manner, which includes removing special symbols (such as "#") and null values from the sentences, and removing stop words (words that carry no real meaning, such as the Chinese words "o", "kayi" and the like);
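A minimal sketch of this preprocessing, under stated assumptions: "special symbol" is taken to mean any character that is not a CJK ideograph, ASCII letter or digit, and the stop-word list is hypothetical. As a design choice, the sketch filters stop words from the token list after segmentation, because deleting single characters from raw Chinese strings can cut real words apart (for example, removing 的 would break 目的).

```python
import re

# Anything that is not a CJK ideograph, ASCII letter or digit counts as a
# "special symbol" here; this character range is an assumption of the sketch.
SPECIAL_SYMBOLS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]")

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"哦", "啊", "吧"}


def strip_symbols(sentences: list) -> list[str]:
    """Remove null values and special symbols from each sentence t_i."""
    cleaned = []
    for s in sentences:
        if s is None:          # null value
            continue
        s = SPECIAL_SYMBOLS.sub("", s)
        if s:                  # skip sentences that became empty
            cleaned.append(s)
    return cleaned


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop stop words from an already segmented sentence."""
    return [w for w in tokens if w not in STOP_WORDS]
```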

then, each sentence t_i in the preprocessed sentence set T = {t₁, t₂, ……, t_n} is segmented into words with the conditional random field (CRF) algorithm, where i = 1, 2, ……, n.

The principle of the CRF algorithm is as follows. Taking the sentence "I love Tiananmen" (我爱天安门) as an example, suppose the input is the segmentation X = {I, love, Tiananmen}; then the output Y = {noun, verb, noun} should have the maximum probability. The input sequence X is called the observation sequence and the output sequence Y the state sequence, and the state sequence forms a Markov random field. The process of deriving the probability of a state sequence from an observation sequence therefore involves the probability of moving from the previous state to the next state (the transition probability) and the probability of a state variable producing an observation variable (the emission probability).

The CRF word segmentation process specifically comprises the following steps:

(1) CRF uses the following letters to represent the position of each character:

B for the beginning of a word;

M for the middle of a word;

E for the end of a word;

S for a single-character word;

(2) During the CRF operation, the output sequence Y with the maximum probability for the sentence is searched for and used as the final word segmentation result. Once every character has been labeled with its position, the characters from a B through the following E form one word, and each S character forms a single-character word. For example, CRF labeling of "I love Tiananmen" (我爱天安门) gives 我 (I)/S 爱 (love)/S 天/B 安/M 门/E, so the word segmentation result of the sentence is: I (noun)/love (verb)/Tiananmen (noun).
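The grouping rule just described (B through E forms one word, S stands alone) can be sketched as follows. The function name `bmes_to_words` is hypothetical, and tolerating malformed tag strings by flushing unfinished words is a design choice of this sketch, not something the text specifies.

```python
def bmes_to_words(chars: str, tags: str) -> list[str]:
    """Group characters into words from their B/M/E/S position tags."""
    if len(chars) != len(tags):
        raise ValueError("each character needs exactly one tag")
    words: list[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":              # single-character word
            if current:             # flush an unterminated word (malformed input)
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":            # beginning of a word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":            # middle of a word
            current += ch
        elif tag == "E":            # end of a word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:                     # flush a trailing unterminated word
        words.append(current)
    return words
```

With the labeling above, `bmes_to_words("我爱天安门", "SSBME")` groups the characters into 我, 爱 and 天安门.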

As another example, CRF labeling of the sentence "I like studying biology" (我喜欢研究生物) can produce several word segmentation results. Two of them are taken as examples: (1) 我/喜欢/研究生/物 ("I / like / graduate student / thing"), labeled SBEBMES; and (2) 我/喜欢/研究/生物 ("I / like / study / biology"), labeled SBEBEBE.

Then, for the candidate segmentations, the probability of each appearing in the whole corpus is calculated. For the character string 研究生物, the words 研究 ("study") and 生物 ("biology") occur with higher probability than 研究生 ("graduate student") and 物 ("thing"), so the first segmentation is judged wrong, and the output sequence with the maximum probability is the second one, i.e., SBEBEBE.
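The corpus-probability comparison above can be illustrated with a simple unigram language model. This is a stand-in for illustration only: a trained CRF scores whole tag sequences rather than comparing word frequencies, and the word counts below are hypothetical, invented just to reproduce the ranking described in the text.

```python
import math

# Hypothetical corpus word counts, invented purely for illustration.
WORD_COUNTS = {"我": 500, "喜欢": 120, "研究": 80, "生物": 60, "研究生": 30, "物": 10}
TOTAL = sum(WORD_COUNTS.values())
VOCAB = len(WORD_COUNTS)


def log_prob(words: list[str]) -> float:
    """Unigram log-probability of a segmentation, with add-one smoothing."""
    return sum(math.log((WORD_COUNTS.get(w, 0) + 1) / (TOTAL + VOCAB)) for w in words)


def pick_segmentation(candidates: list[list[str]]) -> list[str]:
    """Return the candidate segmentation with the highest corpus probability."""
    return max(candidates, key=log_prob)
```

Both candidates share 我 and 喜欢, so the choice comes down to 研究 + 生物 versus 研究生 + 物; with these toy counts the first pair is more probable and the second candidate wins.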

Then, when the CRF operation is finished, the maximum-probability output sequence of each sentence has been found, and the word-segmented text data set t_i = {w_i1, w_i2, ……, w_im} is finally obtained.
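In a linear-chain CRF, the maximum-probability tag sequence is usually found with the Viterbi algorithm, which combines per-character (emission) scores with tag-to-tag (transition) scores. The sketch below is illustrative and rests on assumptions: the emission and transition scores are supplied by hand instead of coming from a trained CRF's learned feature weights, and the allowed transitions simply encode the BMES scheme (for example, B must be followed by M or E).

```python
# Position tags and the BMES transitions that are structurally allowed.
TAGS = ("B", "M", "E", "S")
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
START_TAGS = ("B", "S")   # a sentence must start a new word
END_TAGS = ("E", "S")     # and must end on a complete word
NEG_INF = float("-inf")


def viterbi(emission: list[dict[str, float]],
            transition: dict[tuple[str, str], float]) -> str:
    """Return the BMES tag string with the highest total score.

    emission[i][tag] scores `tag` for the i-th character; transition[(p, c)]
    scores moving from tag p to tag c (missing entries count as 0).
    """
    if not emission:
        return ""
    best = [{t: emission[0][t] if t in START_TAGS else NEG_INF for t in TAGS}]
    back: list[dict[str, str]] = [{}]
    for i in range(1, len(emission)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev_best, prev_tag = NEG_INF, TAGS[0]
            for prev in TAGS:
                if (prev, cur) not in ALLOWED:
                    continue
                s = best[i - 1][prev] + transition.get((prev, cur), 0.0)
                if s > prev_best:
                    prev_best, prev_tag = s, prev
            scores[cur] = prev_best + emission[i][cur]
            pointers[cur] = prev_tag
        best.append(scores)
        back.append(pointers)
    # Trace back from the best admissible final tag.
    tag = max(END_TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(emission) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return "".join(reversed(path))
```

If the emission scores alone favor an invalid sequence such as "BB", the transition constraints force a valid one instead ("BE"), which is why decoding is done over whole sequences rather than character by character.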

Step S4: based on the word-segmented text data set t_i = {w_i1, w_i2, ……, w_im} obtained in the above step, each sentence t_i is screened with the CHI statistical method based on the subjective 2-POS model. The 2-POS items whose emotion statistical scores rank among the top few positions (for example, the top 100) are selected, the weight value w_ti of each sentence t_i containing such a 2-POS item is increased by 1, and the weight value w_ti is then judged. The weight value w_ti is also called the subjective/objective emotion weight value. Each sentence t_i with w_ti greater than 0 is judged as a subjective sentence t′_l, which gives the subjective sentence set T′ = {t′₁, t′₂, ……, t′_s}, where l = 1, 2, ……, s and s is a natural number less than or equal to n;
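The screening rule of step S4 can be sketched as below, assuming the χ² scores of the 2-POS items have already been computed. Two assumptions are made that the text leaves open: each sentence is represented by the set of its 2-POS items, and a sentence gains +1 per distinct selected item it contains (repeated occurrences are not counted again). The function name `screen_subjective` is hypothetical.

```python
def screen_subjective(sentence_items: list[set[str]],
                      item_scores: dict[str, float],
                      top_k: int) -> tuple[list[int], list[int]]:
    """Return (weights w_ti, indices of sentences judged subjective)."""
    # Keep the top_k 2-POS items by chi-square score.
    ranked = sorted(item_scores, key=item_scores.get, reverse=True)
    top_items = set(ranked[:top_k])
    # w_ti: +1 for every selected 2-POS item the sentence contains.
    weights = [len(items & top_items) for items in sentence_items]
    # Sentences with w_ti > 0 form the subjective sentence set T'.
    subjective = [i for i, w in enumerate(weights) if w > 0]
    return weights, subjective
```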

In the 2-POS model, the words in a sentence are classified by part of speech, and each combination of n consecutive parts of speech in the sentence is taken as one item representing the text; when n = 2, the language model is called a 2-POS model. For example, the word segmentation and part-of-speech tagging of "I love Tiananmen" is "I (noun)/love (verb)/Tiananmen (noun)", so the 2-POS items of the sentence are "noun-verb" and "verb-noun", each of which is one 2-POS item. 2-POS items that reflect subjective emotion are called 2-POS subjective patterns, and those that reflect objective emotion are called 2-POS objective patterns.
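Extracting the n-POS items of a tagged sentence is a sliding window over its part-of-speech tags. A minimal sketch, with a hypothetical function name and the "-" joining convention borrowed from the example above:

```python
def n_pos_items(tagged: list[tuple[str, str]], n: int = 2) -> list[str]:
    """Return the n-POS items (runs of n consecutive POS tags) of a sentence."""
    tags = [pos for _, pos in tagged]
    # A sentence with fewer than n words has no n-POS items.
    return ["-".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]
```

For "I (noun)/love (verb)/Tiananmen (noun)", `n_pos_items([("我", "noun"), ("爱", "verb"), ("天安门", "noun")])` gives "noun-verb" and "verb-noun".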

The CHI statistical method based on the subjective 2-POS model is as follows:

where χ² is the emotion statistical score, pat_ti denotes a particular 2-POS item, and c_k denotes the subjective/objective emotion class (objective when k = 0, subjective when k = 1); N is the number of all sentences in the sentence set T; A is the number of sentences that contain the feature word pat_ti and belong to class c_k; B is the number of sentences that contain pat_ti but do not belong to c_k; C is the number of sentences that belong to c_k but do not contain pat_ti; and D is the number of sentences that neither belong to c_k nor contain pat_ti. The higher the χ² score of pat_ti for c_k, the stronger the association between the 2-POS item and the class, and the more likely it is that a sentence containing the 2-POS item belongs to that class.

Next, taking the feature word pat_ti = "combat vehicle" and the category c_k = "army" as an example, the reason why the calculation formula includes the term A/(A+C) is explained in detail.

According to the preceding definitions, A is the number of documents that contain "combat vehicle" and belong to the "army" category; B is the number of documents that contain "combat vehicle" but do not belong to the "army" category; C is the number of documents that do not contain "combat vehicle" but belong to the "army" category; and D is the number of documents that neither contain "combat vehicle" nor belong to the "army" category.

The value χ²(combat vehicle, army) can therefore be obtained from the formula. In the same way, χ²(combat vehicle, navy), χ²(warship, army), χ²(warship, navy) and so on can also be obtained.

When analyzing the statistical results, if the feature word "warship" appears rarely in the "army" category and frequently in the "navy" category, it contributes little to the "army" category and should be excluded as noise for that category.

The conventional CHI statistical method has difficulty eliminating such noise. Because the term (AD - BC) is squared, the conventional statistic is high for a strong negative association as well as for a strong positive one. Consequently, if "warship" occurs in the "navy" documents more often than "combat vehicle" occurs in the "army" documents, "warship" will rank above "combat vehicle" in the statistics for "army", so the noise is retained and the accuracy of the result suffers.

In the present invention, the formula also includes the term A/(A+C). Therefore, for feature words that rarely appear in the "army" category (such as "warship"), the A/(A+C) term is very small and the word can be eliminated as noise. Conversely, for feature words that appear frequently in the "army" category (such as "combat vehicle"), the A/(A+C) term is large and the word is retained as a valid result.
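The argument above can be checked numerically. The conventional CHI statistic is N(AD - BC)² / ((A+B)(C+D)(A+C)(B+D)). The text states only that the invention's formula also includes an A/(A+C) term; multiplying the conventional statistic by that factor, as `weighted_chi_square` does, is an assumption made for illustration. The toy corpus is hypothetical: 10 "army" and 10 "navy" documents, with "combat vehicle" in 6 army documents and "warship" in 9 navy documents.

```python
def contingency(docs: list[tuple[set[str], str]], term: str,
                category: str) -> tuple[int, int, int, int]:
    """Count A, B, C, D for a feature term and a category."""
    a = b = c = d = 0
    for terms, label in docs:
        has_term, in_cat = term in terms, label == category
        if has_term and in_cat:
            a += 1          # contains the term, belongs to the category
        elif has_term:
            b += 1          # contains the term, other category
        elif in_cat:
            c += 1          # belongs to the category, lacks the term
        else:
            d += 1          # neither
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Conventional CHI statistic N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D))."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def weighted_chi_square(a: int, b: int, c: int, d: int) -> float:
    """ASSUMED variant: the conventional statistic multiplied by A/(A+C)."""
    return chi_square(a, b, c, d) * a / (a + c) if a + c else 0.0


# Hypothetical toy corpus of (terms in document, category) pairs.
DOCS = ([({"combat vehicle"}, "army")] * 6 + [(set(), "army")] * 4
        + [({"warship"}, "navy")] * 9 + [(set(), "navy")] * 1)
```

With these counts, the conventional statistic for "army" is about 16.36 for "warship" but only about 8.57 for "combat vehicle", reproducing the ranking problem described above; the weighted variant gives about 5.14 for "combat vehicle" and 0 for "warship", because A = 0 for "warship".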

Step S5: a pre-established emotion dictionary, which may be the Hownet emotion dictionary, is imported, and the emotion expression words are graded by degree. Specifically, there are at least three degree levels, lev1, lev2 and lev3, whose degree of emotional expression decreases in sequence; the corresponding weights are denoted weight_k. lev1 indicates very strong (corresponding expressions such as "super", "very", "extremely", "special"; non-exhaustive), lev2 indicates strong (such as "very", "especially", "really"; non-exhaustive), and lev3 indicates mild (such as "some", "slightly"; non-exhaustive). Corresponding word weight values are assigned according to the degree level: the word weight values weight_k for lev1, lev2 and lev3 are 1.5, 1.0 and 0.5, respectively.
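The three-level weighting can be sketched as a simple lookup. The level weights 1.5, 1.0 and 0.5 come from the text; the Chinese degree words assigned to each level are hypothetical examples, and returning 1.0 for a word that appears in no degree list is an assumption the text does not state.

```python
# Word weights per degree level, as given in step S5.
LEVEL_WEIGHTS = {"lev1": 1.5, "lev2": 1.0, "lev3": 0.5}

# Hypothetical assignment of degree words to levels (illustrative only).
DEGREE_LEVELS = {
    "极其": "lev1", "非常": "lev1",   # very strong
    "很": "lev2", "尤其": "lev2",     # strong
    "有些": "lev3", "稍微": "lev3",   # mild
}


def degree_weight(word: str, default: float = 1.0) -> float:
    """Return weight_k for a word; `default` is used for non-degree words (assumed)."""
    level = DEGREE_LEVELS.get(word)
    return LEVEL_WEIGHTS[level] if level else default
```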

Step S6: symbolic sentence judgment is performed on each subjective sentence t′_l obtained in the above step, using the word segmentation result t′_l = {w′_l1, w′_l2, ……, w′_lm} obtained in step S4, where

symbolic sentences include at least sentences containing summarizing and/or adversative words such as "in summary", "difficult to follow" and "but", because such sentences often express the author's true sentiment, as well as sentences at the beginning and/or end of a paragraph in the text.

According to the judgment result, each subjective sentence t′_l is assigned a feature weight value weight_sp: if the sentence is a symbolic sentence, weight_sp = 1.25; otherwise, weight_sp = 1.0.
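A minimal sketch of the symbolic-sentence rule. The weights 1.25 and 1.0 are from the text; the marker list is hypothetical, using 总之 ("in summary"), 综上所述 ("in summary") and 但是 ("but") as assumed Chinese forms of the words mentioned, and paragraph position is passed in as flags because the sketch does not model paragraph structure.

```python
# Hypothetical summarizing/adversative marker words.
MARKERS = ("总之", "综上所述", "但是")


def feature_weight(sentence: str, is_paragraph_start: bool = False,
                   is_paragraph_end: bool = False) -> float:
    """Return weight_sp: 1.25 for a symbolic sentence, otherwise 1.0."""
    symbolic = (is_paragraph_start or is_paragraph_end
                or any(m in sentence for m in MARKERS))
    return 1.25 if symbolic else 1.0
```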

Step S7: according to the emotion dictionary, emotional tendency statistics are performed on each word w_lk in the subjective sentence t′_l: if w_lk is a positive emotion word, its flag value flag_k = 1; if w_lk is a negative emotion word, its flag value flag_k = -1. The final score of the subjective sentence t′_l is then calculated:

Then, from the scores of the subjective sentences t′_l, the final emotion score of Text is calculated:
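The formulas for the sentence score and for the text score Ori_T are not reproduced in this text. The sketch below shows one plausible reading, which is an assumption and not confirmed by the source: each word contributes flag_k × weight_k, the sum over the sentence is scaled by the sentence's weight_sp, and Ori_T is the sum of the subjective-sentence scores. The polarity lexicon is a hypothetical stand-in for the Hownet emotion dictionary, and words with no polarity are given flag 0, another assumption.

```python
# Hypothetical polarity lexicon standing in for the Hownet emotion dictionary.
POSITIVE = {"突出", "成就"}
NEGATIVE = {"不足"}


def flag(word: str) -> int:
    """flag_k: +1 for a positive word, -1 for a negative word, 0 otherwise (assumed)."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def sentence_score(words: list[str], weights: list[float], weight_sp: float) -> float:
    """ASSUMED model: weight_sp * sum over k of flag_k * weight_k."""
    return weight_sp * sum(flag(w) * wk for w, wk in zip(words, weights))


def text_score(scores: list[float]) -> float:
    """ASSUMED model: Ori_T is the sum of the subjective-sentence scores."""
    return sum(scores)
```

Here `weights` carries each word's weight_k from step S5, so a positive word intensified to lev1 would contribute 1.5 rather than 1.0.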

Step S8: calculating an emotional tendency value O of the Text,

O = sign(Ori_T)

where sign is the sign function: when Ori_T is greater than 0, O = 1, indicating a positive viewpoint; when Ori_T equals 0, O = 0, indicating a neutral viewpoint; when Ori_T is less than 0, O = -1, indicating a negative viewpoint.
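Step S8 reduces to the sign function. A minimal sketch (the names `tendency` and `LABELS` are hypothetical); the boolean difference yields 1, 0 or -1 without branching.

```python
def tendency(ori_t: float) -> int:
    """O = sign(Ori_T): 1 positive, 0 neutral, -1 negative viewpoint."""
    return (ori_t > 0) - (ori_t < 0)


LABELS = {1: "positive", 0: "neutral", -1: "negative"}
```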

FIG. 2 is a schematic structural diagram of the text emotion analysis system for a defense intelligence library in the national defense field. As shown in FIG. 2, the text emotion analysis system 10 includes:

the defense intelligence library Text acquisition module 101 is used for acquiring a Text of a defense intelligence library in the national defense field;

the Text segmentation module 102 is configured to segment the chapters in Text according to a preset segmentation model to obtain a sentence set;

the preprocessing and word segmentation module 103 is configured to preprocess the sentence set, and segment words of the preprocessed sentences by using a preset model to obtain word-segmented text data;

the screening and judging module 104 is configured to screen the word-segmented text data, accumulate sentence weights and judge them to obtain a subjective sentence set;

the emotion degree grading module 105 is used for grading the degree of the emotion expression words and giving corresponding word weight values;

a symbolic sentence judgment module 106, configured to judge a symbolic sentence, and assign a feature weight value according to a judgment result;

an emotion score calculation module 107, configured to calculate a final emotion score of the text;

and the emotional tendency judging module 108 is used for judging the emotional tendency of the text.

It is clear to a person skilled in the art that the solution according to the embodiments of the invention can be implemented by means of software and/or hardware. The term "module" in the present specification refers to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, an FPGA (Field-Programmable Gate Array), an IC (Integrated Circuit), or the like.

The various modules of the embodiments of the present invention may be implemented by analog circuits that implement the functions described in the embodiments of the present invention, or by software that executes the functions described in the embodiments of the present invention.

The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the program is executed by a processor to realize the steps of the text emotion analysis method facing to the defense intelligence library in the national defense field. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

It should be noted that, although the invention focuses on text report data in the defense intelligence library of the national defense field, the improved document-level emotion analysis algorithm can also be applied to text reports in other professional fields.

(Example 1)

An embodiment of the present invention is described below, taking a specific text as an example. In this embodiment, text emotion analysis is performed for use in generating a report on "new infrastructure".

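First, the Text is acquired from the defense intelligence library in the national defense field. In this embodiment, the Text reads: "China has achieved outstanding achievements in capital construction and is carrying out the deployment of the next stage, but there are still slight shortcomings."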

Then, the Text is segmented according to the preset segmentation model, which here consists of common punctuation marks. This yields the sentence set T = {t₁, t₂, t₃}.
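The punctuation-based segmentation can be sketched as a regular-expression split. This is a minimal sketch, assuming the delimiter set is the four marks named in claim 2 (comma, period, question mark, exclamation mark) in both ASCII and full-width Chinese forms; the constant name `SENTENCE_DELIMITERS` is hypothetical.

```python
import re

# Assumed delimiter set: comma, period, question mark, exclamation mark,
# in ASCII and full-width Chinese forms. Note that an ASCII "." also
# splits decimal numbers such as "1.5"; a production splitter would guard this.
SENTENCE_DELIMITERS = r"[,.?!，。？！]"


def split_sentences(text: str) -> list[str]:
    """Split a document into the sentence set T = {t_1, ..., t_n}."""
    parts = re.split(SENTENCE_DELIMITERS, text)
    # Drop the empty fragments left by trailing or consecutive delimiters.
    return [p.strip() for p in parts if p.strip()]


doc = ("China has made outstanding achievements, "
       "and is deploying the next stage, but there are slight shortcomings.")
print(split_sentences(doc))
# -> ['China has made outstanding achievements', 'and is deploying the next stage', 'but there are slight shortcomings']
```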

Then, a Conditional Random Field (CRF) algorithm is applied to the sentence set T obtained in the previous step to perform word segmentation on each sentence t_i, giving the word segmentation results.

That is, the above word segmentation yields, for each sentence t_i, a text data set t_i = {w_i1, w_i2, ..., w_im}.
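The source does not state the CRF feature set or tagging scheme. A common arrangement for CRF-based Chinese word segmentation is for the trained model to label each character B/M/E/S (Begin, Middle, End, Single); the sketch below assumes that scheme and shows only the decoding step that turns predicted labels into the word list t_i = {w_i1, ..., w_im}. The CRF model itself is not shown, and `bmes_to_words` is a hypothetical helper.

```python
def bmes_to_words(chars: str, tags: list[str]) -> list[str]:
    """Rebuild words from per-character B/M/E/S labels predicted by a CRF."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: list[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:           # flush an unfinished word (malformed labels)
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # start a new multi-character word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # continue the current word
            current += ch
        elif tag == "E":          # close the current word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag {tag!r}")
    if current:                   # tolerate a sequence that ends on B or M
        words.append(current)
    return words


print(bmes_to_words("基建成就突出", ["B", "E", "B", "E", "B", "E"]))
# -> ['基建', '成就', '突出']
```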

Then, the CHI statistical method based on the subjective 2-POS model is applied to each sentence t_i for conditional screening, selecting the 2-POS items whose emotion statistical scores for the category "new infrastructure" rank highest, for example the top 3. Note that in practice a larger number of 2-POS items should be selected to ensure the accuracy of the text emotion analysis; in this embodiment only the top 3 are selected for simplicity. For sentences t₁ to t₃, the selected 2-POS items are as follows:

"capital construction (noun)-achievement (noun)", "outstanding (adjective)-achievement (noun)", "slight (adjective)-shortcoming (noun)"
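Forming 2-POS items and keeping the top-ranked ones can be sketched as follows. This is a minimal sketch, assuming an item is one pair of consecutive (word, part-of-speech) tokens and that the emotion statistical scores have already been computed (for example by the CHI statistic); the POS codes "a"/"n", the item string format, and the helper names are hypothetical.

```python
def two_pos_items(tagged: list[tuple[str, str]]) -> list[str]:
    """One item per pair of consecutive (word, POS) tokens, e.g. 'outstanding/a-achievement/n'."""
    return [f"{w1}/{p1}-{w2}/{p2}" for (w1, p1), (w2, p2) in zip(tagged, tagged[1:])]


def select_top_k(scores: dict[str, float], k: int = 3) -> list[str]:
    """Keep the k items with the highest score (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


tagged = [("outstanding", "a"), ("achievement", "n"), ("made", "v")]
print(two_pos_items(tagged))
# -> ['outstanding/a-achievement/n', 'achievement/n-made/v']
```

With k = 3, `select_top_k` reproduces the embodiment's choice of the three highest-scoring items; the text above recommends a larger k in practice.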

Next, based on the screened 2-POS items, the weight value w_ti of each sentence t_i is increased by 1 for every screened 2-POS item that the sentence contains.

For example, sentence t₁ contains the 2-POS items "capital construction (noun)-achievement (noun)" and "outstanding (adjective)-achievement (noun)", so it is assigned a weight value of 2. Sentence t₂ contains no screened 2-POS item, so it is assigned a weight value of 0. Sentence t₃ contains the 2-POS item "slight (adjective)-shortcoming (noun)", so it is assigned a weight value of 1.

Then, the weight values w_ti are judged: every sentence with w_ti greater than 0 is judged to be a subjective sentence t'_l. In this example, sentences t₁ and t₃ are judged to be subjective sentences, giving the subjective sentence set T' = {t₁, t₃}.
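The weighting and subjective-sentence screening above can be reproduced with a few lines. This is a sketch, assuming each sentence has already been reduced to its list of 2-POS items and that an item repeated within one sentence counts once (the source does not say); the item strings and helper names are hypothetical.

```python
def sentence_weights(items_per_sentence: dict[str, list[str]],
                     selected: set[str]) -> dict[str, int]:
    # w_ti gains 1 for every distinct screened 2-POS item found in sentence t_i.
    return {sid: len(set(items) & selected) for sid, items in items_per_sentence.items()}


def subjective_sentences(weights: dict[str, int]) -> list[str]:
    # Sentences with w_ti > 0 form the subjective sentence set T'.
    return [sid for sid, w in weights.items() if w > 0]


selected = {"capital construction/n-achievement/n",
            "outstanding/a-achievement/n",
            "slight/a-shortcoming/n"}
items = {
    "t1": ["capital construction/n-achievement/n", "outstanding/a-achievement/n"],
    "t2": ["next/a-stage/n"],
    "t3": ["slight/a-shortcoming/n"],
}
weights = sentence_weights(items, selected)
print(weights)                        # {'t1': 2, 't2': 0, 't3': 1}
print(subjective_sentences(weights))  # ['t1', 't3']
```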

Next, a pre-established emotion dictionary is imported, and the emotion expression words are graded by degree. Specifically, there are at least three predefined degree levels, lev1, lev2, and lev3, whose degree of emotional expression decreases in that order: lev1 indicates very strong (corresponding emotion expression words include "super", "very", "extremely", and "special"; this list is non-exhaustive), lev2 indicates strong (e.g. "very", "especially", and "really"; non-exhaustive), and lev3 indicates weak (e.g. "some" and "slightly"; non-exhaustive). Corresponding word weight values are assigned according to the degree level: the word weight values weight_k for lev1, lev2, and lev3 are 1.5, 1.0, and 0.5, respectively.

For example, the emotion expression word "outstanding" has weight_k = 1.5, and the emotion expression word "slight" has weight_k = 0.5.
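The degree grading amounts to a lookup from word to level to weight. A minimal sketch, assuming an in-memory word-to-level table; the entries below are illustrative only (the source's word lists are non-exhaustive), and the default of 1.0 for ungraded words is an assumption not stated in the source.

```python
# Level weights from the embodiment: lev1 > lev2 > lev3.
LEVEL_WEIGHTS = {"lev1": 1.5, "lev2": 1.0, "lev3": 0.5}

# Hypothetical, illustrative word-to-level table.
DEGREE_LEVELS = {
    "outstanding": "lev1",
    "extremely": "lev1",
    "especially": "lev2",
    "slight": "lev3",
    "slightly": "lev3",
}


def word_weight(word: str, default: float = 1.0) -> float:
    """Return weight_k for an emotion expression word (assumed default if ungraded)."""
    level = DEGREE_LEVELS.get(word)
    return LEVEL_WEIGHTS[level] if level else default


print(word_weight("outstanding"), word_weight("slight"))  # 1.5 0.5
```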

Then, symbolic sentence judgment is performed on sentences t₁ and t₃. Specifically, symbolic sentences include at least sentences containing summarizing and/or transitional words such as "in summary" and "but", since such sentences often express the author's true sentiment, as well as sentences at the beginning and/or end of a paragraph in the text. If a sentence is a symbolic sentence, it is assigned the feature weight value weight_sp = 1.25; otherwise, it is assigned weight_sp = 1.0. The summarizing and/or transitional words are either predefined or obtained from a corpus.

That is, sentence t₁ contains no summarizing or transitional word and is therefore assigned weight_sp = 1.0, whereas sentence t₃ contains the transitional word "but" and is therefore assigned weight_sp = 1.25. Thus, the weight value of sentence t₁ is 1.5 × 1.0 = 1.5, and the weight value of sentence t₃ is 0.5 × 1.25 = 0.625.
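The symbolic-sentence rule can be sketched as a cue-word match plus a paragraph-position flag. This assumes English text with a hypothetical cue list; word-boundary matching keeps "but" from firing inside words such as "attribute". For Chinese input one would instead match cue words against the segmented tokens.

```python
import re

# Hypothetical cue list; the source says these words are predefined or mined from a corpus.
CUE_PATTERN = re.compile(r"\b(?:in summary|but|however)\b", re.IGNORECASE)


def feature_weight(sentence: str, is_paragraph_edge: bool = False) -> float:
    """weight_sp = 1.25 for a symbolic sentence, otherwise 1.0."""
    if is_paragraph_edge or CUE_PATTERN.search(sentence):
        return 1.25
    return 1.0


print(feature_weight("but there are still slight shortcomings"))  # 1.25
```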

Then, emotional tendency statistics are computed for each word w_lk in the subjective sentences according to the emotion dictionary: if w_lk is a positive emotion word, its flag value is flag_k = 1; if w_lk is a negative emotion word, its flag value is flag_k = -1. The final score of each subjective sentence t'_l is then calculated.

Specifically, subjective sentence t₁ contains the positive emotion word "outstanding", so the flag value of that word is flag_k = 1, while subjective sentence t₃ contains the negative emotion word "slight", so the flag value of that word is flag_k = -1. Thus, the final score of sentence t₁ is Ori_t1 = 1.5 × 1 × 1.0 = 1.5, and the final score of sentence t₃ is Ori_t3 = 0.5 × (-1) × 1.25 = -0.625. Note that, for simplicity, each sentence in this embodiment contains only one emotion word; when a sentence contains several emotion words, its final score Ori_t is the weighted sum over all of its emotion words, i.e. Ori_t = Σ_k (weight_k × flag_k × weight_sp).
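The sentence score Ori_t = Σ_k (weight_k × flag_k × weight_sp) can be checked directly against the two worked values. A sketch, assuming each emotion word has already been reduced to its (weight_k, flag_k) pair; `sentence_score` is a hypothetical name.

```python
def sentence_score(emotion_words: list[tuple[float, int]], weight_sp: float) -> float:
    """Ori_t: sum of weight_k * flag_k * weight_sp over the sentence's emotion words."""
    return sum(weight_k * flag_k * weight_sp for weight_k, flag_k in emotion_words)


print(sentence_score([(1.5, 1)], 1.0))    # 1.5     (sentence t1)
print(sentence_score([(0.5, -1)], 1.25))  # -0.625  (sentence t3)
```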

Then, the final scores of all subjective sentences of the Text are summed to obtain the final emotion score of the Text. In this example, Ori_T = 1.5 + (-0.625) = 0.875.

Then, the final emotion score Ori_T of the Text is compared with 0. When Ori_T > 0, the emotional tendency is judged to be "positive"; when Ori_T = 0, it is judged to be "neutral"; and when Ori_T < 0, it is judged to be "negative". In this embodiment, Ori_T > 0, so the emotional tendency of the text "China has achieved outstanding achievements in capital construction and is carrying out the deployment of the next stage, but there are still slight shortcomings." is "positive".
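Summing the sentence scores and taking the sign gives Ori_T and the tendency value O = sign(Ori_T) used in claim 8. A minimal sketch with a hypothetical function name; it compares against exact zero as the text does, whereas a real system might prefer a small tolerance band for "neutral".

```python
def text_tendency(sentence_scores: list[float]) -> tuple[float, int, str]:
    ori_T = sum(sentence_scores)            # Ori_T: sum of subjective-sentence scores
    O = (ori_T > 0) - (ori_T < 0)           # sign(Ori_T) as 1, 0 or -1
    label = {1: "positive", 0: "neutral", -1: "negative"}[O]
    return ori_T, O, label


print(text_tendency([1.5, -0.625]))  # (0.875, 1, 'positive')
```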

It should be understood that the above embodiments are only intended to illustrate the present invention, and the protection scope of the present invention is not limited thereto. Any substitution or modification of the technical solution and inventive concept of the present invention made by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims

1. A text emotion analysis method for a defense intelligence library in the national defense field, characterized by comprising the following steps:

Step S1: acquiring a Text from the defense intelligence library in the national defense field;

Step S2: segmenting the document-level Text according to a preset segmentation model to obtain a sentence set T = {t₁, t₂, ..., t_n}, where n is a natural number;

Step S3: processing the sentence set T = {t₁, t₂, ..., t_n} obtained in step S2 in a preset manner, and performing word segmentation on each sentence t_i in the sentence set T using a conditional random field algorithm, where i = 1, 2, ..., n, to obtain segmented text data;

Step S4: based on the segmented text data obtained in step S3, performing conditional screening on each sentence t_i using a CHI statistical method based on a subjective 2-POS model, by assigning a subjective/objective emotion weight value to each sentence t_i and judging that weight value, to obtain a subjective sentence set T' = {t'₁, t'₂, ..., t'_s}, where s is a natural number less than or equal to n;

Step S5: importing a pre-established emotion dictionary, grading the emotion expression words by degree, and assigning corresponding word weight values according to the degree level;

Step S6: for each subjective sentence t'_l obtained in step S4, where l = 1, 2, ..., s, performing symbolic sentence judgment, and assigning a different feature weight value to each subjective sentence t'_l according to the judgment result;

Step S7: performing emotional tendency statistics on each word in the subjective sentences t'_l according to the emotion dictionary, calculating the final score of each subjective sentence t'_l according to an emotion calculation model, and calculating the final emotion score of the Text;

Step S8: calculating an emotional tendency value O of the Text.

2. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

in step S2, the preset segmentation model is common punctuation marks, the common punctuation marks being the comma, period, question mark, and exclamation mark.

3. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

the processing in a preset manner of the sentence set T = {t₁, t₂, ..., t_n} obtained in step S2 specifically comprises:

4. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

the conditional screening of each sentence t_i using the CHI statistical method based on the subjective 2-POS model specifically comprises:

classifying the words in each sentence t_i by part of speech, taking each ordered combination of 2 consecutive parts of speech in the sentence as one item for identifying the text, and performing statistics using the following formula:
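The claim's own formula is not reproduced in this text. For orientation only, the sketch below implements the standard chi-square (CHI) statistic for a term and a category from a 2×2 contingency table, χ² = N(AD − BC)² / ((A+B)(C+D)(A+C)(B+D)); whether the invention uses exactly this form is an assumption. Here a "term" is a 2-POS item and the counts are over sentences.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard CHI statistic for one 2-POS item and one category.

    a: sentences in the category that contain the item
    b: sentences outside the category that contain the item
    c: sentences in the category without the item
    d: sentences outside the category without the item
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:          # degenerate table: no evidence either way
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


print(chi_square(10, 0, 0, 10))  # 20.0: item occurs exactly in the category's sentences
```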

5. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

the pre-established emotion dictionary is the HowNet emotion dictionary, the degree levels comprise at least three levels whose degrees of emotional expression decrease in order, and the word weight values corresponding to the three levels are 1.5, 1.0, and 0.5, respectively.

6. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

in step S6, the symbolic sentences include sentences containing summarizing and/or transitional words, or sentences at the beginning or end of a paragraph in the text;

in the symbolic sentence judgment, if a sentence is a symbolic sentence, it is assigned the feature weight value weight_sp = 1.25; if it is not a symbolic sentence, it is assigned the feature weight value weight_sp = 1.0.

7. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

the emotional tendency statistics in step S7 specifically comprise:

Calculating the final emotion score of the Text

8. The text emotion analysis method for a defense intelligence library in the national defense field according to claim 1, wherein

the emotional tendency value of the Text is calculated as O = sign(Ori_T), where sign is the sign function: when Ori_T > 0, O = 1, representing a positive viewpoint; when Ori_T = 0, O = 0, representing a neutral viewpoint; and when Ori_T < 0, O = -1, representing a negative viewpoint.

9. A text emotion analysis system for a defense intelligence library in the national defense field, characterized in that the text emotion analysis system comprises:

10. A computer-readable storage medium having stored thereon a computer program, characterized in that

the computer program, when executed by a processor, implements the steps of the text emotion analysis method for a defense intelligence library in the national defense field according to any one of claims 1 to 8.