CN112464646A - Text emotion analysis method for defense intelligence library in national defense field - Google Patents

Text emotion analysis method for defense intelligence library in national defense field

Info

Publication number
CN112464646A
Authority
CN
China
Prior art keywords
sentence
text
emotion
subjective
defense
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011318544.4A
Other languages
Chinese (zh)
Inventor
董文轩
晏裕生
江洋
李斌
李兴亚
苏慧超
孙孟阳
姚晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Institute Of Marine Technology & Economy
Original Assignee
China Institute Of Marine Technology & Economy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Institute Of Marine Technology & Economy
Priority to CN202011318544.4A
Publication of CN112464646A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to a text emotion analysis method and system for a defense intelligence library in the field of national defense, wherein the method comprises the following steps: acquiring a text of a defense intelligence library in the national defense field; segmenting the text to obtain a sentence set; preprocessing the sentences and performing word segmentation with a conditional random field algorithm; screening each sentence with a CHI statistical method based on a subjective 2-POS model to obtain a subjective sentence set; grading emotion expression words by degree; judging symbolic sentences; collecting emotional tendency statistics for each word in the subjective sentences, calculating the final score of each subjective sentence according to an emotion calculation model, and calculating the final emotion score of the text; and calculating the emotional tendency value of the text. With this text emotion analysis method, text reports of defense intelligence libraries in the national defense field are analyzed automatically, the accuracy and timeliness of the analysis are improved, and a quick and accurate reference is provided for scientific and technical personnel in the national defense field.

Description

Text emotion analysis method for defense intelligence library in national defense field
Technical Field
The invention relates to the field of text classification emotion analysis, in particular to a text emotion analysis method and system for a defense intelligence library in the field of national defense.
Background
With the rapid development of the internet, more and more internet users have changed from simply acquiring internet information to creating it. Blogs, forums and discussion groups on the internet now hold a large amount of subjective text published by users. These subjective texts may be user comments about a certain product or service, or public opinions about a certain news event or national policy, etc. Potential consumers can consult relevant comments as a decision reference when purchasing a product or service, and government departments can browse opinions about news events or national policies to understand public sentiment. These subjective texts grow exponentially every day, and manual analysis alone consumes a great deal of manpower and time. Using computers to automatically analyze the emotion expressed in subjective text has therefore become a hot spot of current academic research, and this research direction is text emotion analysis.
Text emotion analysis (sentiment analysis) refers to the process of analyzing, processing and extracting subjective text with emotional color by using natural language processing and text mining technologies. By analysis granularity, text sentiment analysis can be divided into four levels: word level, phrase level, sentence level and chapter level. Each level of analysis object corresponds to its own sentiment analysis result (positive, negative or neutral). Text emotion analysis research currently spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning and artificial intelligence, and its results are of great significance for optimizing the decisions of governments, enterprises and consumers, so the technology has drawn wide attention from scholars and research institutions.
A defense intelligence library refers specifically to an intelligence library (think tank) that mainly researches national security, national defense strategy, military strategy, strategy evaluation, operational concepts and the like, and that directly or indirectly provides decision support services for the military and the military industry. It produces a great deal of research output each year, mostly in the form of text reports. The research results of a defense intelligence library usually carry emotional tendencies toward relevant affairs in the national defense field, and analyzing that emotion can provide an effective reference for national defense security, national defense construction and the like.
The application of text sentiment analysis in the national defense science and technology field, and to defense intelligence libraries in particular, is limited to a certain extent. The main reason is that defense intelligence library reports differ from microblogs, forum comments, user reviews and the like: their research results carry more authoritative guiding significance, so the demands on the timeliness and accuracy of text sentiment analysis are especially high. On the one hand, defense intelligence library reports contain many national defense terms, which greatly increases word pre-training time, making the background knowledge ontology hard to build and the timeliness requirement hard to meet. On the other hand, a Chinese report is usually organized in chapters and paragraphs and contains a large number of sentences, which may be linked by complicated relations such as turning (contrast) and sequence; analysis is therefore difficult, and existing chapter-level text emotion analysis models, such as the LSTM model or the CRF model, struggle to ensure high accuracy.
Disclosure of Invention
The invention provides a text emotion analysis method and system for a defense intelligence library in the national defense field, to solve the above problems in the prior art. The method and system divide chapter text layer by layer, from top to bottom, at the sentence level and the word level; improve on the conventional CRF algorithm; combine it with a self-improved CHI statistical method; divide the Hownet dictionary into weighted levels by emotion degree; and summarize from bottom to top to form the final emotion analysis result, thereby improving the accuracy and timeliness of text emotion analysis for defense intelligence libraries in the national defense field.
In order to achieve the purpose, the invention provides the following technical scheme:
a text emotion analysis method for defense intelligence base in the field of national defense comprises the following steps:
acquiring Text of a defense intelligence library in the national defense field;
segmenting chapters in the Text according to a preset word segmentation model to obtain a sentence set $T = \{t_1, t_2, \ldots, t_n\}$, where n is a natural number;
processing the sentence set $T = \{t_1, t_2, \ldots, t_n\}$ obtained in the above step in a preset manner, and performing word segmentation on each sentence $t_i$ in the sentence set T with a conditional random field algorithm, where $i = 1, 2, \ldots, n$, to obtain word-segmented text data;
based on the word-segmented text data obtained in the above step, performing condition screening on each sentence $t_i$ with a CHI statistical method based on a subjective 2-POS model, assigning each sentence $t_i$ a subjective-objective emotion weight value and judging that weight value, to obtain a subjective sentence set $T' = \{t'_1, t'_2, \ldots, t'_s\}$, where s is a natural number less than or equal to n;
importing a pre-established emotion dictionary, carrying out degree grade division on emotion expression words, and giving corresponding word weight values according to the difference of the degree grades;
based on each subjective sentence $t'_l$ obtained in the above step, where $l = 1, 2, \ldots, s$, making a symbolic sentence judgment, and giving each subjective sentence $t'_l$ a different characteristic weight value according to the judgment result;
according to the emotion dictionary, performing emotional tendency statistics on each word in the subjective sentence $t'_l$, calculating the final score of each subjective sentence $t'_l$ according to an emotion calculation model, and calculating the final emotion score of the Text;
and calculating an emotional tendency value O of the Text.
Preferably, in the above step, the preset word segmentation model is a common punctuation mark, wherein the common punctuation mark is set as comma, period, question mark and exclamation mark.
Preferably, processing the obtained sentence set $T = \{t_1, t_2, \ldots, t_n\}$ in the preset manner specifically comprises:
removing from each sentence $t_i$, by a preset elimination rule, the characters and/or words with preset attributes that it contains, where the characters and/or words with preset attributes include at least special symbols, null values and stop words;
performing word segmentation on each sentence $t_i$ in the sentence set $T = \{t_1, t_2, \ldots, t_n\}$ with the conditional random field algorithm specifically comprises:
setting each sentence $t_i$ processed in the preset manner as the observation sequence, and setting the sequence output by the conditional random field operation on that observation sequence as the state sequence, where the state sequence forms a Markov random field; and, during the conditional random field operation, searching for the maximum-probability state sequence of each sentence $t_i$ and taking it as the final word segmentation result set $t_i = \{w_{i1}, w_{i2}, \ldots, w_{ij}, \ldots, w_{im}\}$, where $w_{ij}$ denotes the j-th segmented word, with its part-of-speech attribute, in sentence $t_i$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, and m is a natural number.
Preferably, performing condition screening on each sentence $t_i$ with the CHI statistical method based on the subjective 2-POS model specifically comprises:
each sentence tiThe words in the sentence are classified according to the parts of speech, the sequence combination of 2 continuous parts of speech in the sentence is used as one item for identifying the text, and the statistics is carried out by using the following formula:
Figure BDA0002792069540000031
where $\chi^2$ is the emotion statistical score; $pat_{t_i}$ denotes a given 2-POS item; $c_k$ denotes the subjectivity class, objective when k = 0 and subjective when k = 1; N is the number of all sentences in the sentence set T; A is the number of sentences that contain the feature word $pat_{t_i}$ and belong to class $c_k$; B is the number of sentences that contain the feature word $pat_{t_i}$ but do not belong to class $c_k$; C is the number of sentences that belong to class $c_k$ but do not contain the feature word $pat_{t_i}$; and D is the number of sentences that neither belong to class $c_k$ nor contain the feature word $pat_{t_i}$;
according to the emotion statistical scores, the 2-POS items whose $\chi^2$ scores rank in the top ten are screened out, the weight value $w_{t_i}$ of each sentence $t_i$ containing such a 2-POS item is increased by 1, and each sentence $t_i$ whose weight value $w_{t_i}$ is greater than 0 is judged to be a subjective sentence $t'_l$, giving the subjective sentence set $T' = \{t'_1, t'_2, \ldots, t'_s\}$.
Preferably, the pre-established emotion dictionary is a Hownet emotion dictionary, the degree grades comprise at least three grades, and the emotion expression degrees among the at least three grades decrease in order; the word weight values corresponding to the three levels are 1.5, 1.0 and 0.5, respectively.
Preferably, the symbolic sentences include sentences containing summarized and/or turning words, or sentences of segment heads and segment tails in the text;
judging the symbolic sentence: if the sentence is a symbolic sentence, it is given the characteristic weight value $\mathrm{weight}_{sp} = 1.25$; if the sentence is not a symbolic sentence, it is given the characteristic weight value $\mathrm{weight}_{sp} = 1.0$.
Preferably, the performing emotional tendency statistics specifically includes:
according to the emotion dictionary, performing emotional tendency statistics on each word $w_{lk}$ in the subjective sentence $t'_l$: if the word $w_{lk}$ is a positive emotion word, its flag value is $\mathrm{flag}_k = 1$; if the word $w_{lk}$ is a negative emotion word, its flag value is $\mathrm{flag}_k = -1$; and calculating the final score of the subjective sentence $t'_l$
Figure BDA0002792069540000041
Calculating the final emotion score of the Text
Figure BDA0002792069540000042
where $l = 1, 2, \ldots, s$, $k = 1, 2, \ldots, m$, and s and m are natural numbers.
Preferably, the emotional tendency value O of the Text is calculated as $O = \mathrm{sign}(Ori_T)$, where sign is the sign function: when $Ori_T$ is greater than 0, O = 1, representing a positive viewpoint; when $Ori_T$ is equal to 0, O = 0, representing a neutral viewpoint; when $Ori_T$ is less than 0, O = -1, representing a negative viewpoint.
A text emotion analysis system for defense field defense intelligence base, the text emotion analysis system comprises:
the defense intelligence library Text acquisition module is used for acquiring Text of a defense intelligence library in the national defense field;
the Text segmentation module is used for segmenting chapters in the Text according to a preset word segmentation model to obtain a sentence set;
the preprocessing and word segmentation module is used for preprocessing the sentence set and segmenting words of the preprocessed sentences by adopting a preset model to obtain text data after word segmentation;
the screening and judging module is used for carrying out condition screening on the text data after word segmentation and carrying out weight adding judgment to obtain a subjective sentence set;
the emotion degree grading module is used for grading the degree of the emotion expression words and endowing corresponding word weight values;
the symbolic sentence judgment module is used for judging the symbolic sentences and giving characteristic weight values according to the judgment results;
the emotion score calculation module is used for calculating the final emotion score of the text;
and the emotion tendency judgment module is used for judging the emotion tendency of the text.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the text emotion analysis method for a defense intelligence library in the national defense field.
According to the specific embodiment provided by the invention, the technical scheme of the invention can obtain the following technical effects:
(1) by the CHI statistical method based on the subjective 2-POS model, noise words irrelevant to the category can be automatically removed, on one hand, the operation speed and the model construction efficiency can be effectively improved, the analysis timeliness is guaranteed, on the other hand, the influence of noise data on an analysis result can be removed or reduced, and the analysis accuracy is improved.
(2) According to the degree of the emotion expression words, 3-level weight division is carried out on the Hownet emotion dictionary, the situation that the traditional Hownet emotion dictionary is only divided into a positive part and a negative part is changed, and the accuracy of an analysis result is improved.
(3) The text body of a defense intelligence library report is split step by step from top to bottom, first from chapters into sentences and then from sentences into words. Sentence-level analysis is taken as the main part, and the emotion analysis result of the whole chapter is formed by summarizing from bottom to top over the sentences. This improves the granularity of the analysis while avoiding word-by-word analysis at the word level, which to a certain extent guarantees both high accuracy and high timeliness.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a text emotion analysis method for defense domain defense intelligence base;
FIG. 2 is a schematic structural diagram of a text emotion analysis system facing defense domain defense intelligence.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. The description of the exemplary embodiments is merely illustrative and is in no way intended to limit the disclosure, its application, or uses. The present disclosure may be embodied in many different forms and is not limited to the embodiments described herein. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. It should be noted that: unless otherwise indicated, the relative arrangement of parts and steps, the composition of materials, numerical expressions and values, etc., set forth in these embodiments should be construed as merely illustrative, and not a limitation.
All terms (including technical or scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs unless specifically defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
FIG. 1 is a schematic flow chart of a text emotion analysis method for a defense intelligence library in the field of national defense. As shown in fig. 1, the text emotion analysis method includes the following steps:
step S1: acquiring Text of a defense intelligence library in the national defense field;
step S2: segmenting chapters in the Text according to a preset segmentation model, where the preset segmentation model is specifically a set of common punctuation marks that includes at least the comma ",", the period ".", the question mark "?" and the exclamation mark "!", and may also include the semicolon ";", the ellipsis "……", etc.; from the segmented text chapters, the sentence set $T = \{t_1, t_2, \ldots, t_n\}$ is obtained, where n is a natural number;
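As an illustration of step S2, the following minimal Python sketch splits a chapter at the punctuation marks listed above. The function name `split_into_sentences`, the inclusion of full-width Chinese punctuation, and the sample text are assumptions made for this example and are not taken from the patent.

```python
import re

# Delimiters named in step S2. The full-width variants are an assumption
# added here because the source reports are Chinese text.
SENTENCE_DELIMITERS = ",.?!;，。？！；"


def split_into_sentences(text: str) -> list[str]:
    """Split a chapter into the sentence set T = {t_1, ..., t_n}."""
    # One or more delimiters, or a run of ellipsis characters, ends a sentence.
    pattern = "[" + re.escape(SENTENCE_DELIMITERS) + "]+|…+"
    parts = re.split(pattern, text)
    # Drop empty fragments left over at the ends or between adjacent marks.
    return [part.strip() for part in parts if part.strip()]


sentences = split_into_sentences("The report is long, but clear. Is it useful? Yes!")
print(sentences)
```

Note that splitting on "." would also break decimal numbers and abbreviations in Western-language text; the sketch ignores that case.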
step S3: processing the sentence set $T = \{t_1, t_2, \ldots, t_n\}$ obtained in step S2 in a preset manner, where the preset processing comprises removing special symbols (such as "#"), null values (null) and the like from the sentences, and removing stop words (words that carry no real meaning, such as the Chinese words "o", "kayi", and the like);
then, a Conditional Random Field (CRF) algorithm is used to perform word segmentation on each sentence $t_i$ in the sentence set $T = \{t_1, t_2, \ldots, t_n\}$ processed in the preset manner, where $i = 1, 2, \ldots, n$.
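The preset processing of step S3 can be sketched as follows. This is only an illustration: the stop-word list `STOP_WORDS` is a hypothetical English stand-in, the rule that "anything other than word characters and whitespace is a special symbol" is an assumption, and stop words are filtered here on already separated tokens for simplicity.

```python
import re

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"the", "a", "of", "oh"}

# Assumption: every character that is not a word character (letters, digits,
# CJK characters, underscore) or whitespace is treated as a special symbol.
SPECIAL_SYMBOLS = re.compile(r"[^\w\s]")


def clean_sentences(sentences):
    """Remove null values, special symbols and sentences that end up empty."""
    cleaned = []
    for sentence in sentences:
        if sentence is None:  # null value
            continue
        sentence = SPECIAL_SYMBOLS.sub("", sentence).strip()
        if sentence:
            cleaned.append(sentence)
    return cleaned


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


print(clean_sentences(["# budget rises", None, "  ", "@@"]))
print(remove_stop_words("oh the fleet of ships".split()))
```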
The principle of the CRF algorithm is as follows. Taking the sentence "I love Tiananmen" as an example, and assuming that X = {I, love, Tiananmen} is the word segmentation result given as input, the probability of Y = {noun, verb, noun} should be the maximum. The input sequence X is also called the observation sequence, and the output sequence Y is also called the state sequence. The state sequence constitutes a Markov random field, so deriving the probability of a state sequence from an observation sequence involves the probability of moving from the previous state to the next state (the transition probability) and the probability of going from a state variable to an observation variable (the emission probability).
The CRF word segmentation process specifically comprises the following steps:
(1) CRF uses the following letters to represent the position of each character within a word:
word beginning, represented by B;
word middle, represented by M;
word end, represented by E;
single-character word, represented by S;
(2) During the CRF operation, the output sequence Y with the maximum probability for each sentence is searched for and taken as the final word segmentation result. In effect, once the character positions are labeled, the characters running from each B to the following E, and each S single character, form the segmented words. For example, after CRF labeling, "I love Tiananmen" becomes: I/S love/S Tian/B an/M men/E, so the word segmentation result of the sentence is: I (noun) / love (verb) / Tiananmen (noun).
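The conversion from B/M/E/S labels to segmented words described above can be sketched in a few lines of Python. The function name `tags_to_words` is an assumption for this example, and the romanized syllables stand in for the individual Chinese characters.

```python
def tags_to_words(chars, tags):
    """Merge labeled characters into words: B..E spans and S singletons."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:           # flush an unfinished word defensively
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # a new word begins
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # inside the current word
            current += ch
        elif tag == "E":          # the current word ends here
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:                   # trailing word without an E label
        words.append(current)
    return words


print(tags_to_words(["I", "love", "Tian", "an", "men"], "SSBME"))
```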
As another example, after CRF labeling of a sentence meaning "I like researching biology", several word segmentation results are possible. Two of them are taken as examples below.
Figure BDA0002792069540000071
Then, for the multiple word segmentation results, the probability of each result appearing in the whole corpus is calculated. For the character string covering "research biology", the split into "research" and "biology" has a higher probability of occurrence than the alternative split, so the first segmentation result is judged to be wrong, and the maximum-probability output sequence is the second segmentation result, i.e. SBEBEBE.
Then, after the CRF operation is finished, the maximum-probability output sequence of each sentence has been found, and the word-segmented text data set $t_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$ is finally obtained.
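The search for the maximum-probability label sequence is commonly carried out with Viterbi decoding. The patent does not name the decoder, so the sketch below is an assumption, and it is simplified: transitions are only checked for legality (for example, M may not follow S) instead of carrying learned weights, and the per-character log scores in `toy_emissions` are invented for the "I love Tiananmen" example.

```python
TAGS = ["B", "M", "E", "S"]
# Legal successor labels for each label in the B/M/E/S scheme.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
NEG_INF = float("-inf")


def viterbi(emissions):
    """Return the best legal label sequence.

    emissions: one dict per character, mapping label -> log score
    (labels missing from a dict are treated as impossible).
    """
    n = len(emissions)
    if n == 0:
        return []
    # A sentence can only start with B or S.
    best = [{t: emissions[0].get(t, NEG_INF) if t in ("B", "S") else NEG_INF
             for t in TAGS}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in TAGS:
            # Best predecessor among the labels that may legally precede t.
            score, prev = max((best[i - 1][p], p) for p in TAGS if t in ALLOWED[p])
            best[i][t] = score + emissions[i].get(t, NEG_INF)
            back[i][t] = prev
    # A sentence can only end with E or S.
    last = max(("E", "S"), key=lambda t: best[n - 1][t])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Invented log scores for the characters I / love / Tian / an / men.
toy_emissions = [
    {"S": -0.1, "B": -2.5},
    {"S": -0.2, "B": -2.0},
    {"B": -0.1, "S": -2.5},
    {"M": -0.2, "E": -1.8},
    {"E": -0.1, "S": -2.4},
]
print(viterbi(toy_emissions))
```

A trained CRF would add learned transition scores to each step; the dynamic-programming structure stays the same.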
Step S4: based on the segmented text data set t obtained in the stepi={wi1,wi2,……,wimApplication ofCHI statistical method based on subjective 2-POS model and aiming at each sentence tiPerforming condition screening to screen out 2-POS items with emotion statistical scores positioned at the first few digits, such as the first 100 digits, and selecting a sentence t containing the 2-POS itemsiAdding 1 to the weight value, and the weight value wtiAnd (4) judging. The weight value wtiAlso called subjective and objective emotional weight values. Will wtiSentence t greater than 0iIs judged as a subjective sentence t'lFrom this, a subjective sentence set T ' ═ T ' is obtained '1,t′2,……,t′s1,2, … …, s, s is a natural number less than or equal to n;
the 2-POS model is a language model in which the words in a sentence are classified by part of speech, and each combination of n consecutive parts of speech in the sentence is used as one item to represent the text; when n is 2, the language model is called a 2-POS model. For example, for "I love Tiananmen", the word segmentation and part-of-speech tagging give "I (noun)/love (verb)/Tiananmen (noun)", so the 2-POS model of the sentence is "noun-verb, verb-noun", where "noun-verb" is one 2-POS item. The 2-POS items reflecting subjective emotion are called 2-POS subjective modes, and the 2-POS items reflecting objective emotion are called 2-POS objective modes.
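The 2-POS items and the weight-based screening of step S4 can be sketched as follows. The function names and the `top_patterns` set are assumptions for this example; in particular, the sketch adds 1 for every matching 2-POS item in a sentence, whereas the text could also be read as adding 1 once per sentence. Both readings give the same subjective/objective judgment, since only "weight greater than 0" matters.

```python
def two_pos_items(pos_tags):
    """Return the 2-POS items (consecutive part-of-speech pairs) of a sentence."""
    return [f"{a}-{b}" for a, b in zip(pos_tags, pos_tags[1:])]


def subjective_weight(pos_tags, top_patterns):
    """Weight w of a sentence: +1 for each 2-POS item among the top-ranked ones."""
    return sum(1 for item in two_pos_items(pos_tags) if item in top_patterns)


def screen_subjective(tagged_sentences, top_patterns):
    """Keep the sentences whose weight is greater than 0."""
    return [tags for tags in tagged_sentences
            if subjective_weight(tags, top_patterns) > 0]


print(two_pos_items(["noun", "verb", "noun"]))
print(screen_subjective([["noun", "verb", "noun"], ["noun", "noun"]], {"verb-noun"}))
```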
The CHI statistical method based on the subjective 2-POS model is as follows:
Figure BDA0002792069540000081
where $\chi^2$ is the emotion statistical score; $pat_{t_i}$ denotes a given 2-POS item; $c_k$ denotes the subjectivity class, objective when k = 0 and subjective when k = 1; N is the number of all sentences in the sentence set T; A is the number of sentences that contain the feature word $pat_{t_i}$ and belong to class $c_k$; B is the number of sentences that contain the feature word $pat_{t_i}$ but do not belong to class $c_k$; C is the number of sentences that belong to class $c_k$ but do not contain the feature word $pat_{t_i}$; and D is the number of sentences that neither belong to class $c_k$ nor contain the feature word $pat_{t_i}$. The higher the $\chi^2$ statistical score of $pat_{t_i}$ for $c_k$, the greater the relevance of the 2-POS item to the category, and the higher the probability that a sentence containing the 2-POS item belongs to the category.
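To make the counts A, B, C and D concrete, the sketch below tallies them from labeled sentences and evaluates the conventional CHI statistic, $\chi^2 = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D))$. The patent's own formula is an image that is not reproduced in this text and, as the description explains, additionally contains an A/(A+C) term, so this block shows only the conventional baseline. The function names and sample data are assumptions.

```python
def contingency_counts(sentences, labels, pattern, category):
    """Count A, B, C, D for one 2-POS pattern and one category.

    sentences: list of sets of 2-POS items; labels: class label per sentence.
    """
    A = B = C = D = 0
    for items, label in zip(sentences, labels):
        has_pattern = pattern in items
        in_category = label == category
        if has_pattern and in_category:
            A += 1
        elif has_pattern:
            B += 1
        elif in_category:
            C += 1
        else:
            D += 1
    return A, B, C, D


def chi_square(A, B, C, D):
    """Conventional CHI statistic computed from the four counts."""
    N = A + B + C + D
    denominator = (A + C) * (B + D) * (A + B) * (C + D)
    if denominator == 0:
        return 0.0
    return N * (A * D - B * C) ** 2 / denominator


sample = [{"adv-adj"}, {"adv-adj"}, {"noun-verb"}, {"noun-verb"}]
sample_labels = [1, 1, 0, 0]  # 1 = subjective, 0 = objective
counts = contingency_counts(sample, sample_labels, "adv-adj", 1)
print(counts, chi_square(*counts))
```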
Next, taking the feature word $pat_{t_i}$ to be "combat vehicle" and the category $c_k$ to be "army" as an example, the reason why the calculation formula includes the term A/(A + C) is explained in detail.
According to the preceding definitions, A is the number of documents that contain "combat vehicle" and belong to the "army" category; B is the number of documents that contain "combat vehicle" but do not belong to the "army" category; C is the number of documents that do not contain "combat vehicle" but belong to the "army" category; and D is the number of documents that neither contain "combat vehicle" nor belong to the "army" category.
The value $\chi^2$(combat vehicle, army) can therefore be obtained from the formula. In the same way, $\chi^2$(combat vehicle, navy), $\chi^2$(warship, army), $\chi^2$(warship, navy), etc. can also be obtained.
In analyzing the statistical results, if the feature word "warship" appears rarely in the "army" category and often in the "navy" category, the feature word contributes little to the "army" category and should be excluded as noise for that category.
The conventional CHI statistical method has difficulty eliminating such noise. If "warship" occurs in the "navy" category more often than "combat vehicle" occurs in the "army" category, "warship" will rank higher than "combat vehicle" in the statistics, so the noise is retained and affects the accuracy of the result.
In the present invention, the formula also includes the A/(A + C) term. For feature words that occur rarely in the "army" category (such as "warship"), the A/(A + C) term is extremely small, so they can be eliminated as noise. Conversely, for feature words that occur frequently in the "army" category (such as "combat vehicle"), the A/(A + C) term is larger, so they are retained as valid results.
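The effect of the A/(A + C) term can be illustrated numerically. Since the patent's formula image is not reproduced in this text, the sketch below assumes that the term simply multiplies the conventional CHI statistic; that combination, the function names, and the document counts (100 documents, 50 "army" and 50 "navy") are all assumptions made for illustration.

```python
def chi_square(A, B, C, D):
    """Conventional CHI statistic computed from the four counts."""
    N = A + B + C + D
    denominator = (A + C) * (B + D) * (A + B) * (C + D)
    if denominator == 0:
        return 0.0
    return N * (A * D - B * C) ** 2 / denominator


def weighted_chi_square(A, B, C, D):
    """ASSUMED variant: conventional CHI statistic scaled by A / (A + C)."""
    if A + C == 0:
        return 0.0
    return chi_square(A, B, C, D) * A / (A + C)


# Counts relative to the "army" category (invented for this example).
warship = (2, 48, 48, 2)          # rare in "army" documents, common in "navy"
combat_vehicle = (40, 5, 10, 45)  # common in "army" documents

for name, counts in (("warship", warship), ("combat vehicle", combat_vehicle)):
    print(name, round(chi_square(*counts), 2), round(weighted_chi_square(*counts), 2))
```

With these counts, the conventional statistic ranks "warship" above "combat vehicle" for the "army" category, because a strong negative association also produces a large value; the assumed A/(A + C) scaling reverses that ranking.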
Step S5: A pre-established emotion dictionary is imported; the emotion dictionary may be the Hownet emotion dictionary, and its emotion expression words are graded by degree. Specifically, the degree levels include at least three levels, lev1, lev2 and lev3, whose degree of emotional expression decreases in order: lev1 indicates very strong (corresponding emotional expressions such as "super", "very", "extremely", "special", etc., not exhaustive), lev2 indicates strong (corresponding emotional expressions such as "very", "especially", "real", etc., not exhaustive), and lev3 indicates weak (corresponding emotional expressions such as "some", "slightly", etc., not exhaustive). A corresponding word weight value, denoted $\mathrm{weight}_k$, is assigned according to the degree level; the word weight values $\mathrm{weight}_k$ corresponding to the three levels lev1, lev2 and lev3 are 1.5, 1.0 and 0.5 respectively.
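The three-level weighting of step S5 amounts to a two-stage lookup, sketched below. The level weights 1.5, 1.0 and 0.5 come from the text; the small `DEGREE_LEVEL` table of English stand-in words and the default weight of 1.0 for words outside the table are assumptions for this example.

```python
# Word weight for each degree level, as stated in step S5.
DEGREE_WEIGHTS = {"lev1": 1.5, "lev2": 1.0, "lev3": 0.5}

# Hypothetical excerpt of the graded dictionary (English stand-ins).
DEGREE_LEVEL = {
    "super": "lev1",
    "extremely": "lev1",
    "especially": "lev2",
    "slightly": "lev3",
}


def word_weight(word, default=1.0):
    """Return the degree weight of a word; `default` is an assumed fallback."""
    level = DEGREE_LEVEL.get(word)
    return DEGREE_WEIGHTS[level] if level is not None else default


print(word_weight("extremely"), word_weight("slightly"), word_weight("report"))
```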
Step S6: based on each subjective sentence t 'obtained in the step'lAnd the participle result t 'obtained according to the above step S4'l={w′l1,w′l2,……,w′lmMaking a symbolic sentence judgment, wherein,
a tokenized sentence includes at least sentences containing the summarized and/or turning vocabulary of "in summary", "difficult to follow", "but", etc., since such sentences often represent the true sentiment of the author, as well as sentences at the beginning and/or end of the paragraph in the text.
Each subjective sentence t 'according to the judgment result'lEndowing different characteristic weight valuesspIf the sentence belongs to the symbolic sentence, the characteristic weight value is given to the sentencesp1.25; if the sentence does not belong to the symbolic sentence, the characteristic weight value is given to the sentencesp=1.0。
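A minimal sketch of the symbolic sentence judgment of step S6 is given below. The weights 1.25 and 1.0 come from the text; the function name `feature_weight`, the short `MARKER_WORDS` tuple, and the use of a sentence's index within its paragraph to detect paragraph-initial and paragraph-final sentences are assumptions for this example.

```python
import re

# Summarizing / turning words; a hypothetical excerpt in English.
MARKER_WORDS = ("in summary", "but")


def feature_weight(sentence, index_in_paragraph, paragraph_length):
    """Return 1.25 for a symbolic sentence and 1.0 otherwise."""
    # Sentences at the beginning or end of a paragraph are symbolic.
    is_edge = index_in_paragraph in (0, paragraph_length - 1)
    # Whole-word match, so that "but" does not match "button".
    text = sentence.lower()
    has_marker = any(re.search(r"\b" + re.escape(marker) + r"\b", text)
                     for marker in MARKER_WORDS)
    return 1.25 if (is_edge or has_marker) else 1.0


print(feature_weight("In summary, the plan works", 2, 5))
print(feature_weight("The fleet sailed", 2, 5))
```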
Step S7: according to the emotion dictionary, the subjective sentence t'lEach word w in (1)lkMaking emotional tendency statistics if the vocabulary wlkIf the words belong to positive emotion words, the word wlkFlag value ofkIf the term w is 1lkBelonging to negative emotion words, the word wlkFlag value ofk-1, then calculating the subjective sentence t'lThe final score of (a):
Figure BDA0002792069540000101
then, from the final scores of the subjective sentences $t'_l$, the final emotion score of the Text is calculated:
Figure BDA0002792069540000102
where $l = 1, 2, \ldots, s$, $k = 1, 2, \ldots, m$, and s and m are natural numbers.
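The two score formulas of step S7 are images that are not reproduced in this text, so the following sketch rests on an assumed reading: the score of a subjective sentence is its characteristic weight multiplied by the sum of flag × word weight over its words, and the text score is the sum of the sentence scores. The function names and the sample values are likewise assumptions.

```python
def sentence_score(words, feature_weight):
    """ASSUMED form: weight_sp * sum over words of flag_k * weight_k.

    words: list of (flag, weight) pairs, with flag = 1 for a positive
    emotion word and flag = -1 for a negative emotion word.
    """
    return feature_weight * sum(flag * weight for flag, weight in words)


def text_score(sentences):
    """ASSUMED form: sum of the subjective sentence scores.

    sentences: list of (words, feature_weight) pairs.
    """
    return sum(sentence_score(words, fw) for words, fw in sentences)


subjective_sentences = [
    ([(1, 1.5), (-1, 0.5)], 1.25),  # symbolic sentence, net positive
    ([(-1, 1.0)], 1.0),             # ordinary sentence, negative
]
print(text_score(subjective_sentences))
```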
Step S8: calculating an emotional tendency value O of the Text,
O=sign(OriT)
wherein sign is a sign function, when OriTIf greater than 0, O is 1, representing an aggressive view; when OriTWhen equal to 0, O is 0, representing a neutral point of view; when OriTWhen the ratio is less than 0, O is-1, which means negative viewpoint.
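Step S8 reduces to a sign function over the final emotion score. A minimal sketch follows; the function name `emotional_tendency` is an assumption.

```python
def emotional_tendency(ori_t: float) -> int:
    """Return 1 (positive), 0 (neutral) or -1 (negative) for the text score."""
    # In Python, True and False behave as 1 and 0 in arithmetic.
    return (ori_t > 0) - (ori_t < 0)


for score in (0.25, 0.0, -3.5):
    print(score, emotional_tendency(score))
```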
FIG. 2 is a schematic structural diagram of a text emotion analysis system for defense intelligence in the field of national defense. As shown in fig. 2, the text emotion analysis system 10 includes:
the defense intelligence library Text acquisition module 101 is used for acquiring a Text of a defense intelligence library in the national defense field;
the Text segmentation module 102 is configured to segment chapters in the Text according to a preset word segmentation model to obtain a sentence set;
the preprocessing and word segmentation module 103 is configured to preprocess the sentence set, and segment words of the preprocessed sentences by using a preset model to obtain word-segmented text data;
the screening and judging module 104 is used for performing condition screening on the text data after word segmentation, and performing weight adding judgment to obtain a subjective sentence set;
the emotion degree grading module 105 is used for grading the degree of the emotion expression words and giving corresponding word weight values;
a symbolic sentence judgment module 106, configured to judge a symbolic sentence, and assign a feature weight value according to a judgment result;
an emotion score calculation module 107, configured to calculate a final emotion score of the text;
and the emotional tendency judging module 108 is used for judging the emotional tendency of the text.
It is clear to a person skilled in the art that the solution according to the embodiments of the invention can be implemented by means of software and/or hardware. The term "module" in the present specification refers to software and/or hardware that can perform a specific function independently or in cooperation with other components, where the hardware may be, for example, an FPGA (Field-Programmable Gate Array), an IC (Integrated Circuit), or the like.
The various modules of the embodiments of the present invention may be implemented by analog circuits that implement the functions described in the embodiments of the present invention, or by software that executes the functions described in the embodiments of the present invention.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the program is executed by a processor to realize the steps of the text emotion analysis method facing to the defense intelligence library in the national defense field. The computer-readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, although the invention focuses on text report data from defense intelligence libraries in the national defense field, the improved chapter-level emotion analysis algorithm can also be applied to text reports in other professional fields.
(example 1)
An embodiment of the present invention is described below. In the present embodiment, a specific text is taken as an example for explanation. In this embodiment, textual sentiment analysis is performed for use in report generation for "new infrastructure".
Firstly, acquiring Text of defense intelligence library in the national defense field:
Figure BDA0002792069540000111
Then, the text is segmented according to a preset word segmentation model, which here is a set of common punctuation marks. Thus, the following sentence set is obtained.
Figure BDA0002792069540000121
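Splitting on common punctuation can be sketched with a regular expression. This sketch assumes the punctuation set named in claim 2 (comma, period, question mark, exclamation mark) in both full-width and ASCII forms; `split_sentences` is a hypothetical name.

```python
import re
from typing import List

# Comma, period, question mark and exclamation mark, ASCII and full-width.
_SPLIT_PATTERN = re.compile(r"[,.?!，。？！]")


def split_sentences(text: str) -> List[str]:
    """Split text at common punctuation marks and drop empty fragments."""
    return [part.strip() for part in _SPLIT_PATTERN.split(text) if part.strip()]
```

A caveat of this simple rule is that an ASCII period inside a number such as "3.5" would also be treated as a boundary.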
Then, for the sentence set T obtained in the previous step, a Conditional Random Field (CRF) algorithm is used to segment each sentence t_i into words. Thus, the following word segmentation results are obtained.
Figure BDA0002792069540000122
That is, through the above word segmentation processing, a text data set t_i = {w_i1, w_i2, ……, w_im} is obtained for each sentence t_i.
Then, a CHI statistical method based on the subjective 2-POS model is applied to each sentence t_i for conditional screening, so as to screen out the 2-POS items whose emotion statistical scores rank in the first few places, for example the top 3, for the category "new infrastructure". It should be noted that when screening on actual emotion statistical scores, a larger number of 2-POS items should be selected to ensure the accuracy of the text emotion analysis; in the present embodiment, only the top 3 2-POS items are selected for simplicity. For example, the 2-POS items screened out for t_1 to t_3 are as follows:
"capital construction (noun)-achievement (noun)", "outstanding (adjective)-achievement (noun)", "slight (adjective)-insufficient (noun)"
Next, based on the screened 2-POS items, the weight value w_ti of a sentence t_i is increased by 1 for each screened 2-POS item that the sentence contains.
For example, sentence t_1 contains the 2-POS items "capital construction (noun)-achievement (noun)" and "outstanding (adjective)-achievement (noun)", so sentence t_1 is assigned a weight value of 2. Sentence t_2 does not contain any screened 2-POS item, so sentence t_2 is assigned a weight value of 0. Sentence t_3 contains the 2-POS item "slight (adjective)-insufficient (noun)", so sentence t_3 is assigned a weight value of 1.
Then, the weight value w_ti is judged, and each sentence whose w_ti is greater than 0 is judged to be a subjective sentence t'_l. In this example, sentences t_1 and t_3 are judged to be subjective sentences, and the subjective sentence set T' = (t_1, t_3) is obtained.
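The weighting and filtering just described can be sketched as set operations. This is an illustration only: how 2-POS items are extracted from a sentence is left out, the items are represented as plain strings, and the names `subjective_weights` and `subjective_sentences` are hypothetical.

```python
from typing import Dict, List, Set


def subjective_weights(sentence_items: Dict[str, Set[str]],
                       selected: Set[str]) -> Dict[str, int]:
    """w_ti per sentence: the number of screened 2-POS items it contains."""
    return {sid: len(items & selected) for sid, items in sentence_items.items()}


def subjective_sentences(sentence_items: Dict[str, Set[str]],
                         selected: Set[str]) -> List[str]:
    """Keep the sentences whose weight value is greater than 0."""
    weights = subjective_weights(sentence_items, selected)
    return [sid for sid, weight in weights.items() if weight > 0]
```

Fed with the three sentences of this example, the weights come out as 2, 0 and 1, and only t1 and t3 are kept.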
Then, a pre-established emotion dictionary is imported, and the emotion expression words are graded by degree. Specifically, the degree levels include at least three levels, lev1, lev2 and lev3, and the degree of emotional expression decreases in order across the levels: lev1 indicates very strong (corresponding emotion expression words include "super", "very", "extremely", "special", etc.; the list is not exhaustive), lev2 indicates strong (e.g. "very", "especially", "real", etc.; not exhaustive), and lev3 indicates mild (e.g. "some", "slightly", etc.; not exhaustive). A word weight value is assigned according to the degree level: the word weight values weight_k corresponding to the three levels lev1, lev2 and lev3 are 1.5, 1.0 and 0.5, respectively. The at least three levels referred to here are predefined.
Here, for example, the weight_k of the emotion expression word "outstanding" is 1.5, and the weight_k of the emotion expression word "slight" is 0.5.
Then, the symbolic sentence judgment is made for sentences t_1 and t_3. Specifically, a symbolic sentence includes at least a sentence containing summarizing and/or turning vocabulary such as "in summary", "difficult to follow", "but", etc., since such sentences often represent the true sentiment of the author, as well as a sentence at the beginning and/or end of a paragraph in the text. If the sentence is a symbolic sentence, it is given the feature weight value weight_sp = 1.25; if the sentence is not a symbolic sentence, it is given the feature weight value weight_sp = 1.0. The summarizing and/or turning vocabulary is either predefined or obtained from a corpus.
That is, sentence t_1 does not contain any summarizing and/or turning vocabulary, and is therefore given the weight weight_sp = 1.0. Sentence t_3 contains the turning word "but", and is therefore given the weight weight_sp = 1.25. Thus, the weight value of sentence t_1 is calculated as 1.5 × 1.0 = 1.5, and the weight value of sentence t_3 is calculated as 0.5 × 1.25 = 0.625.
Then, according to the emotion dictionary, emotional tendency statistics are made for each word w_lk in the subjective sentence: if the word w_lk is a positive emotion word, its flag value flag_k = 1; if the word w_lk is a negative emotion word, its flag value flag_k = -1. The final score of the subjective sentence t'_l is then calculated.
Specifically, the subjective sentence t_1 contains the positive emotion word "outstanding", so the flag value of that word is flag_k = 1. The subjective sentence t_3 contains the negative emotion word "slight", so the flag value of that word is flag_k = -1. Thus, the final score of sentence t_1 is Ori_t1 = 1.5 × 1 × 1.0 = 1.5, and the final score of sentence t_3 is Ori_t3 = 0.5 × (-1) × 1.25 = -0.625. It is to be noted that, in the present embodiment, each sentence contains only one emotion word for the sake of simplicity; when a sentence contains multiple emotion words, the final score Ori_t of the sentence should be the weighted result over all of its emotion words, i.e., the following equation.
Figure BDA0002792069540000141
Then, the final scores of the subjective sentences of the whole text T are added to obtain the final emotion score of the text T. In this example, the final score of the text T is Ori_T = 1.5 + (-0.625) = 0.875.
Then, the final emotion score Ori_T of the text is compared with 0. When Ori_T > 0, the emotional tendency is judged to be "positive"; when Ori_T = 0, the emotional tendency is judged to be "neutral"; and when Ori_T < 0, the emotional tendency is judged to be "negative". In this embodiment, Ori_T > 0, so the emotional tendency of the text "China has achieved outstanding achievement in the aspect of capital construction and carries out the deployment of the next stage, but still has considerable defects." is "positive".
It should be understood that the above-mentioned embodiments are only for illustrating the present invention, and the protection scope of the present invention is not limited thereto, and any person skilled in the art can substitute or change the technical solution of the present invention and its inventive concept within the technical scope of the present invention, and shall be covered by the protection scope of the present invention.

Claims (10)

1. A text emotion analysis method for defense intelligence base in the field of national defense is characterized by comprising the following steps:
step S1: acquiring Text of a defense intelligence library in the national defense field;
step S2: segmenting chapters in the Text according to a preset word segmentation model to obtain a sentence set T = {t_1, t_2, ……, t_n}, where n is a natural number;
step S3: processing the sentence set T = {t_1, t_2, ……, t_n} obtained in the step S2 in a preset mode, and using a conditional random field algorithm to segment each sentence t_i in the sentence set T into words, wherein i = 1, 2, ……, n, to obtain word-segmented text data;
step S4: based on the word-segmented text data obtained in the step S3, applying a CHI statistical method based on a subjective 2-POS model to perform conditional screening on each sentence t_i, by assigning each sentence t_i a subjective/objective emotion weight value and judging that weight value, to obtain a subjective sentence set T′ = {t′_1, t′_2, ……, t′_s}, where s is a natural number less than or equal to n;
step S5: importing a pre-established emotion dictionary, carrying out degree grade division on emotion expression words, and giving corresponding word weight values according to the difference of the degree grades;
step S6: making a symbolic sentence judgment on each subjective sentence t′_l obtained in the step S4, wherein l = 1, 2, ……, s, and giving each subjective sentence t′_l a different feature weight value according to the judgment result;
step S7: according to the emotion dictionary, performing emotional tendency statistics on each word in the subjective sentence t′_l, calculating the final score of each subjective sentence t′_l according to an emotion calculation model, and calculating the final emotion score of the Text;
step S8: and calculating an emotional tendency value O of the Text.
2. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
in step S2, the predetermined word segmentation model is a common punctuation mark, wherein the common punctuation mark is set as comma, period, question mark and exclamation mark.
3. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
processing the sentence set T = {t_1, t_2, ……, t_n} obtained in the step S2 in a preset mode specifically comprises:
removing, by a preset elimination rule, the characters and/or words with preset attributes contained in each sentence t_i, wherein the characters and/or words with the preset attributes at least comprise special symbols, null values and stop words;
using the conditional random field algorithm to segment each sentence t_i in the sentence set T = {t_1, t_2, ……, t_n} into words specifically comprises:
taking each sentence t_i processed in the preset mode as an observation sequence, and taking the sequence output by the conditional random field operation on the input observation sequence as a state sequence, wherein the state sequence forms a Markov random field; and, in the conditional random field operation, finding for each sentence t_i the state sequence of maximum probability as the final word segmentation result set t_i = {w_i1, w_i2, ……, w_ij, ……, w_im}, wherein w_ij denotes the j-th segmented word, with its part-of-speech attribute, in sentence t_i, i = 1, 2, ……, n, j = 1, 2, ……, m, and m is a natural number.
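Finding the state sequence of maximum probability is commonly done with Viterbi decoding. The sketch below is not the patent's CRF: it only decodes, taking per-character emission scores and tag transition scores as given (a trained CRF would supply them from learned feature weights), and it assumes the B/M/E/S character tagging scheme often used for Chinese word segmentation. All names are hypothetical, and the input is assumed to be non-empty.

```python
from typing import Dict, List

TAGS = ["B", "M", "E", "S"]  # begin, middle, end of a word; single-character word


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence; scores are additive (log-space)."""
    best = [dict(emissions[0])]          # best score of a path ending in each tag
    back: List[Dict[str, str]] = []      # back-pointers, one dict per later position
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions[p][tag])
            scores[tag] = best[-1][prev] + transitions[prev][tag] + emission[tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):      # follow back-pointers to the start
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Cut a character string into words at every E or S tag."""
    words, current = [], ""
    for char, tag in zip(chars, tags):
        current += char
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:                          # flush an unterminated trailing word
        words.append(current)
    return words
```

In practice the transition table would also forbid invalid tag pairs (for example B followed by B) by giving them a very low score.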
4. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
applying the CHI statistical method based on the subjective 2-POS model to perform conditional screening on each sentence t_i specifically comprises:
classifying the words in each sentence t_i by part of speech, taking each sequential combination of 2 consecutive parts of speech in the sentence as one item identifying the text, and computing statistics using the following formula:
Figure FDA0002792069530000021
wherein χ² is the emotion statistical score; patt_i denotes a certain 2-POS item; c_k denotes the subjective/objective category, being objective when k = 0 and subjective when k = 1; N denotes the number of all sentences in the sentence set T; A denotes the number of sentences that contain the feature item patt_i and belong to category c_k; B denotes the number of sentences that contain the feature item patt_i but do not belong to category c_k; C denotes the number of sentences that belong to category c_k but do not contain the feature item patt_i; and D denotes the number of sentences that neither belong to category c_k nor contain the feature item patt_i;
according to the emotion statistical scores, the 2-POS items whose χ² scores rank in the top ten are screened out, the weight value w_ti of each sentence t_i containing such a 2-POS item is increased by 1, and each sentence t_i whose weight value w_ti is greater than 0 is judged to be a subjective sentence t′_i, obtaining the subjective sentence set T′ = {t′_1, t′_2, ……, t′_s}.
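Because the formula image is not reproduced in this text, the following sketch assumes the standard CHI feature-selection statistic, χ² = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)) with N = A + B + C + D; the patent's own formula may differ. The names `chi_square` and `top_items` are hypothetical.

```python
from typing import Dict, List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score of one 2-POS item for one category.

    a: sentences with the item, in the category
    b: sentences with the item, not in the category
    c: sentences without the item, in the category
    d: sentences without the item, not in the category
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:      # degenerate table: the item or category is constant
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def top_items(scores: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k items with the highest chi-square scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The default `k = 10` mirrors the top-ten screening in this claim; Example 1 of the description uses the top 3 for brevity.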
5. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
the pre-established emotion dictionary is a Hownet emotion dictionary, the degree grades comprise at least three grades, and the emotion expression degrees among the at least three grades are sequentially decreased; and the word weight values corresponding to the three levels are 1.5, 1.0 and 0.5, respectively.
6. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
in step S6, the symbolic sentences include sentences containing summarized and/or turning words, or sentences of segment head and segment tail in the text;
making the symbolic sentence judgment: if the sentence is a symbolic sentence, it is given the feature weight value weight_sp = 1.25; if the sentence is not a symbolic sentence, it is given the feature weight value weight_sp = 1.0.
7. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
the step S7 of performing emotional tendency statistics specifically includes:
according to the emotion dictionary, performing emotional tendency statistics on each word w_lk in the subjective sentence t′_l: if the word w_lk is a positive emotion word, its flag value flag_k = 1; if the word w_lk is a negative emotion word, its flag value flag_k = -1; and calculating the final score of the subjective sentence t′_l:
Figure FDA0002792069530000031
Calculating the final emotion score of the Text
Figure FDA0002792069530000032
wherein l = 1, 2, ……, s; k = 1, 2, ……, m; and s and m are both natural numbers.
8. The method for textual emotion analysis for national defense domain defense intelligence library according to claim 1,
the emotional tendency value O of the Text is calculated as O = sign(Ori_T), wherein sign is the sign function: when Ori_T is greater than 0, O = 1, representing a positive viewpoint; when Ori_T is equal to 0, O = 0, representing a neutral viewpoint; when Ori_T is less than 0, O = -1, representing a negative viewpoint.
9. A text emotion analysis system for a defense intelligence library in the national defense field, characterized in that the text emotion analysis system comprises:
the defense intelligence library Text acquisition module is used for acquiring Text of a defense intelligence library in the national defense field;
the Text segmentation module is used for segmenting chapters in the Text according to a preset word segmentation model to obtain a sentence set;
the preprocessing and word segmentation module is used for preprocessing the sentence set and segmenting words of the preprocessed sentences by adopting a preset model to obtain text data after word segmentation;
the screening and judging module is used for carrying out condition screening on the text data after word segmentation and carrying out weight adding judgment to obtain a subjective sentence set;
the emotion degree grading module is used for grading the degree of the emotion expression words and endowing corresponding word weight values;
the symbolic sentence judgment module is used for judging the symbolic sentences and giving characteristic weight values according to the judgment results;
the emotion score calculation module is used for calculating the final emotion score of the text;
and the emotion tendency judgment module is used for judging the emotion tendency of the text.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that,
the computer program, when executed by a processor, implements the steps of the text emotion analysis method for a defense intelligence library in the national defense field according to any one of claims 1 to 8.
CN202011318544.4A 2020-11-23 2020-11-23 Text emotion analysis method for defense intelligence library in national defense field Pending CN112464646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011318544.4A CN112464646A (en) 2020-11-23 2020-11-23 Text emotion analysis method for defense intelligence library in national defense field

Publications (1)

Publication Number Publication Date
CN112464646A true CN112464646A (en) 2021-03-09

Family

ID=74799233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011318544.4A Pending CN112464646A (en) 2020-11-23 2020-11-23 Text emotion analysis method for defense intelligence library in national defense field

Country Status (1)

Country Link
CN (1) CN112464646A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 Method and device for analyzing emotion tendentiousness of subjective text
CN104331394A (en) * 2014-08-29 2015-02-04 南通大学 Text classification method based on viewpoint
CN104731770A (en) * 2015-03-23 2015-06-24 中国科学技术大学苏州研究院 Chinese microblog emotion analysis method based on rules and statistical model
CN105022725A (en) * 2015-07-10 2015-11-04 河海大学 Text emotional tendency analysis method applied to field of financial Web
CN108920545A (en) * 2018-06-13 2018-11-30 四川大学 The Chinese affective characteristics selection method of sentiment dictionary and Ka Fang model based on extension
CN110096696A (en) * 2018-06-11 2019-08-06 电子科技大学 A kind of Chinese long text sentiment analysis method
CN110162781A (en) * 2019-04-09 2019-08-23 国金涌富资产管理有限公司 A kind of finance text subjectivity sentence automatic identifying method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐明 等: "基于改进卡方统计的微博特征提取方法", 《计算机工程与应用》, vol. 50, no. 19, pages 113 - 117 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination