CN117669566A - Real-time data online intelligent processing method for layout file - Google Patents
- Publication number
- CN117669566A (application CN202410121659.6A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- noun
- word
- species
- logic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of layout file data processing, and in particular to a real-time data online intelligent processing method for layout files, comprising: acquiring layout text data and extracting the associated emotion polarity word of each emotion word; for each noun type, calculating the full-text emotion feature tendency and the emotion conversion confusion of each emotion type; constructing the local emotion feature tendency of each emotion type for each occurrence of each noun type; constructing the position variability index of each occurrence according to the word positions of the noun in the text data set; calculating the abnormal change value of the emotion logic features of each occurrence on each emotion type, and further calculating the internal emotion logic chaotic coefficient of each occurrence on each emotion type; and intelligently processing the text data set in combination with the LOF anomaly detection algorithm. The invention accurately identifies nouns with logic errors in the layout file and ensures the data processing effect.
Description
Technical Field
The application relates to the technical field of layout file data processing, in particular to a real-time data online intelligent processing method of layout files.
Background
A layout file is a file type with a self-contained, fixed presentation format. It ensures that the same file is displayed identically on different devices, and is widely used in fields such as electronic contracts, document notification and document management. The most important data in a layout file is its text data, and ensuring the safety and reliability of that text data while the layout file is transmitted has always been a development direction of this technical field.
As a common office file type, the layout file has the function of automatically correcting text data while the user works. Traditional, data-cleaning-based processing of layout file text data mostly relies on rule-based text error correction, which can only detect grammatical errors and misprinted words within a single sentence, and does not further detect logic errors that occur between sentences of the text data.
Disclosure of Invention
In order to solve the technical problems, the invention provides a real-time data online intelligent processing method of layout files, which aims to solve the existing problems.
The real-time data online intelligent processing method for layout files of the invention adopts the following technical solution:
an embodiment of the invention provides a real-time data online intelligent processing method for layout files, comprising the following steps:
obtaining layout text data and performing word segmentation to obtain a word segmentation data set, wherein identical segmented words are regarded as the same word type; extracting the nouns, the emotion words with their corresponding emotion types and emotion intensity degrees, and the emotion polarity words with their corresponding polarity values from the word segmentation data set; extracting the associated emotion polarity word of each emotion word;
for each noun, obtaining the full-text emotion feature tendency of each emotion type of the $i$-th noun type according to the emotion intensity degree of the emotion words of each emotion type in the sentences where the $i$-th noun type is located, the polarity value of the associated emotion polarity word, and the word spacing between the emotion word and the noun of the $i$-th type; obtaining the emotion conversion confusion of each emotion type of the $i$-th noun type according to the full-text emotion feature tendency of each emotion type in the sentences where the $i$-th noun type is located; constructing the local emotion feature tendency of each emotion type of each $j$-th noun of the $i$-th type according to the emotion intensity degree of each emotion word of each emotion type in the sentence where that noun is located, the polarity value of the associated emotion polarity word, and the word spacing between the emotion word and that noun; constructing the position variability index of each $j$-th noun of the $i$-th type according to the word positions of the nouns of the $i$-th type in the text data set; constructing the abnormal change value of the emotion logic features of each $j$-th noun of the $i$-th type on each emotion type;
obtaining the internal emotion logic confusion of each sentence according to the full-text emotion feature tendency, the local emotion feature tendency and the information amount of each noun of each emotion type in each sentence; obtaining the internal emotion logic chaotic coefficient of each $j$-th noun of the $i$-th type on the $k$-th emotion type according to the internal emotion logic confusion of each sentence where the $i$-th noun type is located and the full-text and local emotion feature tendencies of each emotion type of each $j$-th noun of the $i$-th type;
obtaining the confusion degree of each emotion type of each $j$-th noun of the $i$-th type according to the abnormal change value of the emotion logic features and the internal emotion logic chaotic coefficient, and intelligently processing the text data set in combination with the LOF anomaly detection algorithm.
Further, the extracting the associated emotion polarity word of each emotion word includes:
taking a preset number of words on each side of the emotion word as a search range, and selecting from this range the emotion polarity word closest to the emotion word as the associated emotion polarity word of the emotion word.
Further, the full-text emotion feature tendency of each emotion type of the $i$-th noun type comprises:
for the $k$-th emotion type of the $i$-th noun type, calculating, in each sentence where the $i$-th noun type is located, the product of the emotion intensity degree of the emotion word of the $k$-th emotion type and the polarity value of its associated emotion polarity word, counting the word spacing between the emotion word and the noun of the $i$-th type, obtaining the ratio of the product to the word spacing, and taking the average of the ratios over all sentences where the $i$-th noun type is located as the full-text emotion feature tendency of the $k$-th emotion type of the $i$-th noun type;
when the emotion words do not have the associated emotion polarity words, the polarity value of the associated emotion polarity words of the emotion words is set to be 1.
Further, the emotion conversion confusion of each emotion type of the $i$-th noun type comprises:
recording the absolute value of the ratio as a first absolute value, and recording the average of the first absolute values over all sentences where the $i$-th noun type is located as a first mean; calculating the difference between the first mean and the absolute value of the full-text emotion feature tendency of the $k$-th emotion type of the $i$-th noun type, and taking the absolute value of the ratio of the difference to the first mean as the emotion conversion confusion of the $k$-th emotion type of the $i$-th noun type.
Further, said constructing the local emotion feature tendency of each emotion type of each $j$-th noun of the $i$-th type comprises:
for the $j$-th noun of the $i$-th type in the text data set, calculating the product of the emotion intensity degree of each emotion word of the $k$-th emotion type in the sentence where the noun is located and the polarity value of its associated emotion polarity word, recording the product as a first product, obtaining the ratio of the first product to the word spacing between the emotion word and the noun, and taking the average of the ratios over all emotion words of the $k$-th emotion type as the local emotion feature tendency of the $k$-th emotion type of the $j$-th noun of the $i$-th type.
Further, said constructing the position variability index of each $j$-th noun of the $i$-th type comprises:
counting the word sequence number of each noun of the $i$-th type in the word segmentation data set as its word position, and denoting the word positions of the $(j-1)$-th, $j$-th and $(j+1)$-th nouns of the $i$-th type as $x_{i,j-1}$, $x_{i,j}$ and $x_{i,j+1}$ respectively; the position variability index $D_{i,j}$ of the $j$-th noun of the $i$-th type is the sum of its word distances to the two adjacent nouns of the same type:

$$D_{i,j} = \left|x_{i,j} - x_{i,j-1}\right| + \left|x_{i,j+1} - x_{i,j}\right|, \quad 1 < j < n_i$$

where $n_i$ is the number of words of the $i$-th noun type.
Further, said constructing the abnormal change value of the emotion logic features of each $j$-th noun of the $i$-th type on each emotion type comprises:
acquiring the local emotion feature tendency of the $j$-th noun of the $i$-th type on the $k$-th emotion type and the full-text emotion feature tendency of the $i$-th noun type on the $k$-th emotion type, and calculating the absolute value of their difference; calculating the difference between this absolute value and the emotion conversion confusion of the $k$-th emotion type of the $i$-th noun type; obtaining the minimum of this difference and 0, and taking the ratio of the minimum to the position variability index as the abnormal change value of the emotion logic features of the $j$-th noun of the $i$-th type on the $k$-th emotion type.
Further, the internal emotion logic confusion of each sentence includes:
for each sentence, counting the information amount of each noun of the sentence in the word segmentation data set, obtaining for each noun the absolute value of the difference between its local emotion feature tendency and its full-text emotion feature tendency on the $k$-th emotion type, recording the product of this absolute difference and the information amount as a first product, and taking the sum of the first products of all nouns in the sentence as the internal emotion logic confusion of the sentence.
Further, the internal emotion logic chaotic coefficient $E_{i,j,k}$ of each $j$-th noun of the $i$-th type on the $k$-th emotion type is determined from the following quantities:
$E_{i,j,k}$ is the internal emotion logic chaotic coefficient of the $j$-th noun of the $i$-th type on the $k$-th emotion type; $G_{i,n}$ is the internal emotion logic confusion of the $n$-th sentence where the $i$-th noun type is located; $N$ is the number of sentences where the $i$-th noun type is located; $q_{i,j,k}$ is the local emotion feature tendency of the $j$-th noun of the $i$-th type on the $k$-th emotion type; $Q_{i,k}$ is the full-text emotion feature tendency of the $i$-th noun type on the $k$-th emotion type; $I_{i,j}$ is the information amount of the $j$-th noun of the $i$-th type in the word segmentation data set.
Further, said obtaining the confusion degree of each emotion type of each $j$-th noun of the $i$-th type according to the abnormal change value of the emotion logic features and the internal emotion logic chaotic coefficient, and intelligently processing the text data set in combination with the LOF anomaly detection algorithm, comprises:
taking the product of the abnormal change value of the emotion logic features of the $j$-th noun of the $i$-th type on the $k$-th emotion type and its internal emotion logic chaotic coefficient on the $k$-th emotion type as the confusion degree of the $k$-th emotion type of the $j$-th noun of the $i$-th type;
taking the confusion degrees of all emotion types of all nouns of the $i$-th type in the text data set as the input of the LOF anomaly detection algorithm, the output of which is the detection value of each noun of the $i$-th type; when the detection value is greater than or equal to a preset threshold, the corresponding noun has a logic error, and the nouns with logic errors are screened out.
The invention has at least the following beneficial effects:
According to the invention, emotion logic detection is performed on the nouns in the text data of the layout file, nouns whose emotion logic differs markedly from that of the whole text are marked as abnormal data, and real-time online processing of the data is completed. For each word type, the emotion intensity of each emotion word in the sentences where the word appears is used to construct the full-text emotion feature tendency and the emotion conversion confusion of the noun, which represent the emotion logic tendency of the word over the whole text.
Further, the local emotion feature tendency of the noun is constructed from the emotion features of the words around each occurrence, the difference between the local and the full-text emotion feature tendency is taken, and the emotion conversion confusion is used as a threshold to obtain the abnormal change value of the emotion logic features of the word, which represents how abnormal the emotion logic of the word is relative to the whole article. Within the sentence where the word is located, the influence of the word on the emotion logic confusion of the sentence is then calculated to obtain the internal emotion logic chaotic coefficient, which is combined with the abnormal change value as a quantitative index of the emotion logic confusion degree of the word. Words of the same type are examined with the LOF anomaly detection algorithm, nouns with logic errors are screened out, and the real-time online processing of the text data of the layout file is completed.
Whereas traditional layout file text data processing methods can only detect simple grammatical errors through hand-made rules, the present method further screens out semantically abnormal parts of the text data through the degree of logical coherence across the whole text, accurately detects abnormal data in the text data of the layout file, and obtains a better data processing effect.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an online intelligent processing method for real-time data of layout files;
FIG. 2 is a schematic diagram of a logic anomaly word detection process.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of the method for online intelligent processing of real-time data of layout files according to the invention in combination with the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a specific scheme of an online intelligent processing method for real-time data of format files, which is specifically described below with reference to the accompanying drawings.
Referring to fig. 1, the real-time data online intelligent processing method for layout files provided by the invention comprises the following steps:
Step S001: acquire text data and perform word segmentation.
The text data of the layout file is acquired and taken as input to a word segmentation algorithm based on statistical rules, whose corpus is readily available from public sources. The output is the segmented text data, which is recorded as the word segmentation data set. After segmentation, identical segmented words are regarded as words of the same type.
At this point, the text data of the layout file has been acquired and segmented, and the word segmentation data set made up of the individual segmented words has been obtained.
Step S002: construct the full-text emotion feature tendency and the emotion conversion confusion, calculate the abnormal change value of the emotion logic features, further construct the internal emotion confusion of each sentence, and screen out nouns with emotion logic errors.
In this embodiment, the nouns in the text data are taken as base points. The words expressing emotion near each noun are examined to determine the local emotion logic changes of the noun, and, through the emotion logic changes of the noun over the whole text, combinations of nouns and emotion words that may be abnormal are screened out to represent text data with logic errors. The abnormal text data is processed online to assist the layout file user in writing.
Therefore, the nouns, the words expressing emotion and the words expressing emotion polarity need to be screened first. In this embodiment, the required words are screened by matching against text databases, and the screening process is specifically as follows: taking the word segmentation data set as input, a part-of-speech discrimination algorithm based on a part-of-speech database outputs the part of speech of each word in the word segmentation data set; further, taking the word segmentation data set as input, a word matching algorithm based on an emotion word text database outputs the word types expressing emotion in the word segmentation data set and the emotion attributes of each such word type, where the emotion attributes of a word type expressing emotion are recorded as $(k, s)$, in which $k$ is the emotion type, one of seven emotion types represented by the numerals 1 to 7, and $s$ is the emotion intensity degree; further, taking the word segmentation data set as input, a word matching algorithm based on an emotion polarity word text database outputs the word types expressing emotion polarity in the word segmentation data set and the polarity value of each such word type, recorded as $p$, which is a signed value within a fixed interval: a positive value represents a positive polarity word, a negative value represents a negative polarity word, and a larger absolute value represents a stronger polarity of the expressed emotion.
The full-text emotion feature tendency and the emotion conversion confusion of the $k$-th emotion type of the $i$-th noun type are calculated as follows:

$$Q_{i,k} = \frac{1}{N}\sum_{n=1}^{N}\frac{s_{i,n,k}\,p_{i,n,k}}{d_{i,n,k}}$$

$$H_{i,k} = \left|\frac{\dfrac{1}{N}\sum_{n=1}^{N}\left|\dfrac{s_{i,n,k}\,p_{i,n,k}}{d_{i,n,k}}\right| - \left|Q_{i,k}\right|}{\dfrac{1}{N}\sum_{n=1}^{N}\left|\dfrac{s_{i,n,k}\,p_{i,n,k}}{d_{i,n,k}}\right|}\right|$$

where $s_{i,n,k}$ is the emotion intensity degree of the emotion word of the $k$-th emotion type in the $n$-th sentence where the $i$-th noun type is located; $p_{i,n,k}$ is the polarity value of the associated emotion polarity word of that emotion word. It should be noted that the associated emotion polarity word of an emotion word is found by taking the $A$ words on each side of the emotion word as a search range and selecting from this range the emotion polarity word closest to the emotion word; in this embodiment $A=3$, and if the emotion word has no associated emotion polarity word, its polarity value is set to $p_{i,n,k}=1$. $d_{i,n,k}$ is the word spacing between the emotion word of the $k$-th emotion type and the noun of the $i$-th type in the $n$-th sentence; $Q_{i,k}$ is the full-text emotion feature tendency of the $i$-th noun type on the $k$-th emotion type; $H_{i,k}$ is the emotion conversion confusion of the $k$-th emotion type of the $i$-th noun type; $N$ is the number of sentences where the $i$-th noun type is located.
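The search for the associated emotion polarity word can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `associated_polarity`, the dictionary-based `polarity_lexicon`, and the rule that the left-hand word wins a distance tie are all introduced here for illustration. The window of 3 words and the default polarity value of 1 are taken from the text.

```python
from typing import Dict, List


def associated_polarity(tokens: List[str], emo_idx: int,
                        polarity_lexicon: Dict[str, float],
                        window: int = 3) -> float:
    """Polarity value of the polarity word nearest to tokens[emo_idx].

    Only the `window` tokens on each side are searched. If no polarity
    word is found, the polarity value defaults to 1.0.
    """
    best_dist = None
    best_val = 1.0  # default when no associated polarity word exists
    lo = max(0, emo_idx - window)
    hi = min(len(tokens) - 1, emo_idx + window)
    for pos in range(lo, hi + 1):
        if pos == emo_idx:
            continue
        word = tokens[pos]
        if word in polarity_lexicon:
            dist = abs(pos - emo_idx)
            # Strict "<" keeps the earlier (left) word on a tie: an assumption.
            if best_dist is None or dist < best_dist:
                best_dist, best_val = dist, polarity_lexicon[word]
    return best_val
```

A polarity word that lies outside the window is ignored, so an emotion word with only distant modifiers keeps the neutral weight 1.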
In the expressions for the full-text emotion feature tendency and the emotion conversion confusion, the product $s_{i,n,k}\,p_{i,n,k}$ of the emotion intensity degree and the polarity value characterizes the emotion weight of the emotion word of the $k$-th emotion type in the $n$-th sentence where the $i$-th noun type is located: a positive value represents a positive polarity word, a negative value represents a negative polarity word, and a larger absolute value represents a stronger expressed emotion polarity. $1/d_{i,n,k}$ is the reciprocal of the word spacing: the larger its value, the closer the emotion word is to the noun of the $i$-th type in the $n$-th sentence, and the more the emotion word affects the emotion features of the $i$-th noun type. Averaging the emotional influence of the emotion words of the $k$-th emotion type on the $i$-th noun type over all sentences in which the noun appears gives the full-text emotion feature tendency $Q_{i,k}$, which characterizes the emotion tendency of the $i$-th noun type on the $k$-th emotion type: a positive value indicates a positive tendency, a negative value a negative tendency, and a larger absolute value a stronger expressed emotion polarity. Further, the numerator of the emotion conversion confusion, namely the mean of the absolute emotional influences minus the absolute value of $Q_{i,k}$, characterizes the emotional disorder: when the emotional influences are all positive or all negative, the numerator is 0, so $H_{i,k}$ is 0, which indicates that the emotion logic of the $i$-th noun type does not change; when the numerator is not 0, the larger it is, the more the emotion logic of the $i$-th noun type changes over the whole text data. Finally, $H_{i,k}$ characterizes the degree of emotion logic conversion of the $i$-th noun type on the $k$-th emotion type: the larger its value, the greater the change of the emotion logic of the $i$-th noun type on the $k$-th emotion type.
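As a hedged numeric illustration of the two quantities just discussed, the sketch below computes the full-text tendency and the conversion confusion from per-sentence observations. The function names and the `(intensity, polarity, spacing)` tuple layout are assumptions made for this example. Returning 0 when every influence is 0 is also an assumption, since the text does not cover that case.

```python
from typing import List, Tuple

Observation = Tuple[float, float, int]  # (intensity, polarity, word spacing)


def full_text_tendency(obs: List[Observation]) -> float:
    """Mean of intensity * polarity / spacing over all sentences."""
    ratios = [s * p / d for s, p, d in obs]
    return sum(ratios) / len(ratios)


def conversion_confusion(obs: List[Observation]) -> float:
    """|(mean of |ratio|) - |mean of ratio|| divided by (mean of |ratio|)."""
    ratios = [s * p / d for s, p, d in obs]
    mean_abs = sum(abs(r) for r in ratios) / len(ratios)
    if mean_abs == 0:
        return 0.0  # assumption: no emotional influence means no confusion
    tendency = sum(ratios) / len(ratios)
    return abs((mean_abs - abs(tendency)) / mean_abs)


# Two sentences with opposite influences cancel out: confusion is maximal.
mixed = [(4.0, 1.0, 2), (2.0, -1.0, 1)]
# Two sentences with same-sign influences: confusion is zero.
consistent = [(4.0, 1.0, 2), (3.0, 1.0, 3)]
```

With `mixed`, the per-sentence ratios are 2 and -2, so the tendency is 0 and the confusion is 1; with `consistent`, the ratios are 2 and 1, so the tendency is 1.5 and the confusion is 0.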
With the method of this embodiment, the change of the emotion logic of the $i$-th noun type over the whole text data can be calculated. This embodiment further calculates the local emotion feature tendency of each noun of the $i$-th type: the more the local emotion feature tendency of a noun of the $i$-th type differs from the emotion logic of the whole text data, the more likely that noun is locally a logic error.
Thus, this embodiment analyzes the local emotion feature tendency of the $k$-th emotion type of the $j$-th noun of the $i$-th type, whose expression is:

$$q_{i,j,k} = \frac{1}{M_{i,j,k}}\sum_{m=1}^{M_{i,j,k}}\frac{s_{i,j,k,m}\,p_{i,j,k,m}}{d_{i,j,k,m}}$$

where $s_{i,j,k,m}$ is the emotion intensity degree of the $m$-th emotion word of the $k$-th emotion type in the sentence where the $j$-th noun of the $i$-th type is located; $p_{i,j,k,m}$ is the polarity value of the associated emotion polarity word of that emotion word; $d_{i,j,k,m}$ is the word spacing between that emotion word and the $j$-th noun of the $i$-th type; $M_{i,j,k}$ is the number of emotion words of the $k$-th emotion type in the sentence where the $j$-th noun of the $i$-th type is located. Here the $j$-th noun of the $i$-th type means the $j$-th occurrence of the $i$-th noun type.
In the expression for the local emotion feature tendency, the product $s_{i,j,k,m}\,p_{i,j,k,m}$ of the emotion intensity degree and the polarity value characterizes the emotion tendency of the $m$-th emotion word of the $k$-th emotion type in the sentence where the $j$-th noun of the $i$-th type is located: a positive value represents a positive polarity word, a negative value represents a negative polarity word, and a larger absolute value represents a stronger expressed emotion polarity. The larger the reciprocal $1/d_{i,j,k,m}$ of the word spacing, the closer the $m$-th emotion word is to the $j$-th noun of the $i$-th type in that sentence, and the more it affects the emotion features of the noun. The ratio of the product to the word spacing therefore represents the emotional influence of the $m$-th emotion word on the $j$-th noun of the $i$-th type, and the larger its absolute value, the stronger the influence. Finally, the average $q_{i,j,k}$ of these ratios represents the local emotion feature tendency of the $k$-th emotion type of the $j$-th noun of the $i$-th type: a positive value indicates a positive tendency, a negative value a negative tendency, and a larger absolute value a stronger expressed emotion polarity.
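The local tendency of one noun occurrence can be sketched directly on a tokenized sentence. The sketch below is illustrative only: `local_tendency`, the lexicon layout `word -> (emotion type, intensity)`, and the `polarity_at` mapping from token index to polarity value are hypothetical names. Measuring word spacing as the difference of token indices, and returning 0 when the sentence contains no emotion word of the requested type, are assumptions not stated in the text.

```python
from typing import Dict, List, Tuple


def local_tendency(tokens: List[str], noun_idx: int, emotion_type: int,
                   emotion_lexicon: Dict[str, Tuple[int, float]],
                   polarity_at: Dict[int, float]) -> float:
    """Average of intensity * polarity / spacing over the emotion words
    of `emotion_type` in the sentence containing the noun occurrence."""
    ratios = []
    for pos, word in enumerate(tokens):
        entry = emotion_lexicon.get(word)
        if entry is None or entry[0] != emotion_type or pos == noun_idx:
            continue
        intensity = entry[1]
        polarity = polarity_at.get(pos, 1.0)  # default polarity value 1
        spacing = abs(pos - noun_idx)         # assumption: index distance
        ratios.append(intensity * polarity / spacing)
    return sum(ratios) / len(ratios) if ratios else 0.0


sentence = ["service", "was", "not", "pleasant", "and", "delightful"]
lexicon = {"pleasant": (1, 4.0), "delightful": (1, 5.0)}
# "pleasant" (index 3) is negated by a nearby polarity word.
result = local_tendency(sentence, 0, 1, lexicon, {3: -1.0})
```

Here the negated word contributes $4 \cdot (-1)/3$ and the other contributes $5 \cdot 1/5$, so the closer, negated emotion word pulls the local tendency slightly below zero.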
Further, a local emotion feature value variation data set is constructed for the nouns of the $i$-th type: $\{(q_{i,j,k},\ x_{i,j})\}$, where $q_{i,j,k}$ is the local emotion feature tendency of the $j$-th noun of the $i$-th type on the $k$-th emotion type, and $x_{i,j}$ indicates that the $j$-th noun of the $i$-th type is at the $x_{i,j}$-th word position in the word segmentation data set.
Further, the local emotion feature value of each noun of the $i$-th type is compared with the emotion features over the whole text data to determine whether the noun is locally a logic error, and the abnormal change value of the emotion logic features is constructed:

$$Y_{i,j,k} = \frac{\min\left(\left|q_{i,j,k} - Q_{i,k}\right| - H_{i,k},\ 0\right)}{D_{i,j}}$$

where $Y_{i,j,k}$ is the abnormal change value of the emotion logic features of the $j$-th noun of the $i$-th type on the $k$-th emotion type; $n_i$ is the number of words of the $i$-th noun type; $q_{i,j,k}$ is the local emotion feature tendency of the $j$-th noun of the $i$-th type on the $k$-th emotion type; $Q_{i,k}$ is the full-text emotion feature tendency of the $i$-th noun type on the $k$-th emotion type; $H_{i,k}$ is the emotion conversion confusion of the $k$-th emotion type of the $i$-th noun type; $D_{i,j}$ is the position variability index of the $j$-th noun of the $i$-th type; $\min(\cdot)$ takes the minimum value.
In the expression for the abnormal change value, the absolute difference $|q_{i,j,k} - Q_{i,k}|$ between the local emotion feature tendency and the full-text emotion feature tendency indicates how far the emotion logic of the $j$-th noun of the $i$-th type on the $k$-th emotion type departs from that of all other nouns of the $i$-th type: the larger the difference, the more likely the $j$-th noun of the $i$-th type is a word with a logic error. The emotion conversion confusion $H_{i,k}$ indicates how disordered the emotion logic changes of the $i$-th noun type are on the $k$-th emotion type across the text: the larger its value, the more disordered the changes. The position variability index $D_{i,j}$ is the sum of the distances between the $j$-th noun of the $i$-th type and the $(j-1)$-th and $(j+1)$-th nouns of the $i$-th type: the larger its value, the longer the context distance between the $j$-th noun and the adjacent nouns of the same type, and the greater the probability that the author of the text data has changed the emotion logic of the $i$-th noun type. Therefore $1/D_{i,j}$ is used as a weight on the difference between the local and the full-text emotion feature tendency: the longer the context distance between the $j$-th noun and the adjacent nouns of the same type, the smaller the weight, and the smaller the abnormal change value $Y_{i,j,k}$, the less likely the $j$-th noun of the $i$-th type is a word with an emotion logic error.
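The two quantities discussed above can be sketched as below. This is a reading of the verbal description rather than a confirmed formula: `position_variability` covers only interior occurrences (those with a neighbour on each side), because the text does not say how the first and last occurrences are treated, and `abnormal_change` follows the wording of the text literally in taking the minimum of the thresholded difference and 0. Both function names are hypothetical, and the index `j` is zero-based in the code.

```python
from typing import List


def position_variability(positions: List[int], j: int) -> int:
    """Sum of the word distances from occurrence j to its two neighbours.

    `positions` holds the sorted word positions of one noun type.
    Boundary occurrences are rejected: their handling is not specified.
    """
    if not 0 < j < len(positions) - 1:
        raise ValueError("only interior occurrences are supported")
    return abs(positions[j] - positions[j - 1]) + abs(positions[j + 1] - positions[j])


def abnormal_change(local_q: float, full_q: float,
                    confusion: float, variability: float) -> float:
    """min(|local - full| - confusion, 0) divided by the position index,
    as the text words it."""
    return min(abs(local_q - full_q) - confusion, 0.0) / variability


d = position_variability([3, 10, 25], 1)  # distances 7 and 15
```

Dividing by the position index means that an occurrence far from its neighbours is weighted down, which matches the explanation that a long context distance makes a deliberate change of emotion logic more plausible.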
Further, this embodiment constructs the internal emotion logic confusion of each sentence, which is used to screen words with emotion logic errors, and from it calculates the internal emotion logic chaotic coefficient caused in the text data by the $j$-th noun of the $i$-th type. The internal emotion logic confusion of the $n$-th sentence on the $k$-th emotion type is calculated as follows:

$$G_{n} = \sum_{m=1}^{R_{n}} \left|q_{n,m,k} - Q_{n,m,k}\right| \cdot I_{n,m}$$

where $G_{n}$ is the internal emotion logic confusion of the $n$-th sentence; $q_{n,m,k}$ is the local emotion feature tendency of the $m$-th noun of the $n$-th sentence on the $k$-th emotion type; $Q_{n,m,k}$ is the full-text emotion feature tendency of the $m$-th noun of the $n$-th sentence on the $k$-th emotion type; $R_{n}$ is the number of nouns in the $n$-th sentence; $I_{n,m}$ is the information amount of the $m$-th noun of the $n$-th sentence in the word segmentation data set. It should be noted that the calculation of the information amount is a well-known technique in the field and can be carried out with the prior art, so the calculation process is not repeated here.

The internal emotion logic chaotic coefficient $E_{i,j,k}$ of the $j$-th noun of the $i$-th type on the $k$-th emotion type is built from the following quantities: $q_{i,j,k}$ is the local emotion feature tendency of the $j$-th noun of the $i$-th type on the $k$-th emotion type; $Q_{i,k}$ is the full-text emotion feature tendency of the $i$-th noun type on the $k$-th emotion type; $I_{i,j}$ is the information amount of the $j$-th noun of the $i$-th type in the word segmentation data set; $G_{i,n}$ is the internal emotion logic confusion of the $n$-th sentence where the $i$-th noun type is located; $N$ is the number of sentences where the $i$-th noun type is located.
In the expression for the internal emotion logic confusion of a sentence, $|q_{n,m,k} - Q_{n,m,k}|$ is the difference between the local emotion feature tendency and the full-text emotion feature tendency of the $m$-th noun of the $n$-th sentence on the $k$-th emotion type: the larger the difference, the more confused the emotion logic of the $n$-th sentence. The information amount $I_{n,m}$ characterizes how much information the $m$-th noun contributes to the $n$-th sentence: the larger its value, the more information is contributed and the greater the influence of the $m$-th noun on the $n$-th sentence, so the weight of the $m$-th noun should be greater when the emotion logic confusion of the $n$-th sentence is calculated. This finally gives the internal emotion logic confusion $G_{n}$ of the $n$-th sentence: the larger its value, the more confused the internal emotion logic of the $n$-th sentence.
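A minimal sketch of the sentence-level confusion follows. The text leaves the information amount to the prior art, so this example assumes the usual self-information $-\log_2$ of the relative frequency of the word in the word segmentation data set; that choice, and the names `information_amount` and `sentence_confusion`, are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Tuple


def information_amount(word: str, counts: Counter, total: int) -> float:
    """Self-information -log2(frequency) of a word (assumed definition)."""
    return -math.log2(counts[word] / total)


def sentence_confusion(nouns: List[Tuple[float, float, float]]) -> float:
    """Sum over the nouns of one sentence of
    |local tendency - full-text tendency| * information amount."""
    return sum(abs(local - full) * info for local, full, info in nouns)


corpus = ["a", "a", "b", "c"]
counts = Counter(corpus)
info_b = information_amount("b", counts, len(corpus))  # rare word
info_a = information_amount("a", counts, len(corpus))  # frequent word
g = sentence_confusion([(0.5, 0.1, info_b), (0.2, 0.2, info_a)])
```

In this toy sentence only the rare noun deviates from its full-text tendency, and because rare words carry more information, its deviation dominates the sentence confusion.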
Further, in the expression for the internal emotion logic chaotic coefficient $E_{i,j,k}$, the term $|q_{i,j,k} - Q_{i,k}|\cdot I_{i,j}$ characterizes the contribution of the $j$-th noun of the $i$-th type to the internal emotion logic confusion of the $n$-th sentence where the $i$-th noun type is located. Therefore, $G_{i,n} - |q_{i,j,k} - Q_{i,k}|\cdot I_{i,j}$, where $G_{i,n}$ is the internal emotion logic confusion of that sentence, characterizes the internal emotion logic confusion of the sentence after the $j$-th noun of the $i$-th type is deleted, that is, the change in the internal emotion logic confusion. The larger the change, the more likely the $j$-th noun of the $i$-th type is what makes the emotion logic of the sentence confused, and therefore the more likely it is a word with a logic error.
For each kind of noun in the text data set, taking one occurrence of one kind of noun as an example, the product of the occurrence's emotion logic characteristic abnormal change value on an emotion type and its internal emotion logic chaotic coefficient on that emotion type is taken as the confusion of the occurrence on that emotion type. The same method is used to obtain the confusion of every occurrence of every kind of noun on the various emotion types.
Further, the confusion values on the various emotion types of all occurrences of all kinds of nouns in the text data set are used as the input of the LOF anomaly detection algorithm, whose output is a detection value for each occurrence of each kind of noun. A threshold is set; it is 1.5 in this embodiment and can be chosen by the practitioner. When the detection value of an occurrence is greater than or equal to the threshold, that occurrence has a logic error and is marked as erroneous. Words with logic errors are thereby screened out, which completes the online intelligent processing of the layout file data. The flow chart for detecting logically abnormal words is shown in Fig. 2, where "abnormal word" and "logically abnormal word" both refer to a logically incorrect noun in this embodiment.
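The final screening step can be sketched as follows, purely as an illustration. The sketch forms each occurrence's confusion vector as the element-wise product of its abnormal change values and chaotic coefficients, then scores the vectors with a small pure-Python Local Outlier Factor. The neighbourhood size k = 2, the sample values, and all names are assumptions; only the threshold of 1.5 comes from the embodiment.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor of each point in a small list of equal-length vectors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    lrd = []  # local reachability density of each point
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        # The small constant avoids division by zero when points coincide.
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-12))

    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical values: one row per noun occurrence, one column per emotion type.
abnormal_change = [[0.10, 0.20], [0.12, 0.18], [0.11, 0.22], [0.09, 0.20], [0.90, 1.10]]
chaotic_coeff = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]

# Confusion on each emotion type = abnormal change value * chaotic coefficient.
confusion = [
    [a * c for a, c in zip(row_a, row_c)]
    for row_a, row_c in zip(abnormal_change, chaotic_coeff)
]

THRESHOLD = 1.5  # value used in the embodiment
scores = lof_scores(confusion, k=2)
flagged = [i for i, s in enumerate(scores) if s >= THRESHOLD]
print(flagged)
```

Occurrences whose confusion vectors sit in a dense cluster receive scores near 1, while an isolated vector receives a much larger score and is flagged. In practice a library implementation of LOF would normally replace the hand-written function.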
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The foregoing describes specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Modifications to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application shall all fall within the protection scope of the present application.
Claims (10)
1. A real-time data online intelligent processing method for layout files, characterized by comprising the following steps:
obtaining layout text data and performing word segmentation to obtain a word segmentation data set, wherein identical segmented words are words of the same kind; extracting, from the word segmentation data set, the nouns, the emotion words with their emotion types and emotion intensities, and the emotion polarity words with their polarity values; extracting the associated emotion polarity word of each emotion word;
for each kind of noun, obtaining the full-text emotion feature tendency of that kind of noun on each emotion type according to the emotion intensity of the emotion words of each emotion type in the sentences containing that kind of noun, the polarity values of the associated emotion polarity words, and the word spacing between the emotion words and the noun; obtaining the emotion conversion confusion of that kind of noun on each emotion type according to its full-text emotion feature tendency on each emotion type in the sentences containing it; constructing the local emotion feature tendency of each occurrence of each kind of noun on each emotion type according to the emotion intensity of each emotion word of each emotion type in the sentence containing that occurrence, the polarity value of the associated emotion polarity word, and the word spacing between the emotion word and the noun; constructing the position variation index of each occurrence of each kind of noun according to the word positions of the occurrences in the text data set; constructing the emotion logic characteristic abnormal change value of each occurrence of each kind of noun on each emotion type;
obtaining the internal emotion logic confusion of each sentence according to the full-text emotion feature tendency, the local emotion feature tendency, and the information amount of each noun in the sentence on each emotion type; obtaining the internal emotion logic chaotic coefficient of each occurrence of each kind of noun on each emotion type according to the internal emotion logic confusion of each sentence containing that kind of noun and the full-text and local emotion feature tendencies of the noun on each emotion type;
obtaining the confusion of each occurrence of each kind of noun on each emotion type according to the emotion logic characteristic abnormal change value and the internal emotion logic chaotic coefficient, and performing intelligent processing on the text data set in combination with the LOF anomaly detection algorithm.
2. The real-time data online intelligent processing method for layout files according to claim 1, wherein extracting the associated emotion polarity word of each emotion word comprises:
taking a preset number of words on each side of the emotion word, and selecting from them the emotion polarity word closest to the emotion word as the associated emotion polarity word of that emotion word.
3. The real-time data online intelligent processing method for layout files according to claim 2, wherein obtaining the full-text emotion feature tendency of each kind of noun on each emotion type comprises:
for a given kind of noun and a given emotion type, calculating, in each sentence containing that kind of noun, the product of the emotion intensity of the emotion word of that emotion type and the polarity value of its associated emotion polarity word; counting the word spacing between the emotion word and the noun; obtaining the ratio of the product to the word spacing; and taking the mean of the ratios over all sentences containing that kind of noun as the full-text emotion feature tendency of that kind of noun on that emotion type;
when an emotion word has no associated emotion polarity word, the polarity value of its associated emotion polarity word is set to 1.
4. The real-time data online intelligent processing method for layout files according to claim 3, wherein obtaining the emotion conversion confusion of each kind of noun on each emotion type comprises:
recording the absolute value of the ratio as a first absolute value; recording the mean of the first absolute values over all sentences containing that kind of noun as a first mean; calculating the difference between the first mean and the absolute value of the full-text emotion feature tendency of that kind of noun on that emotion type; and taking the absolute value of the ratio of this difference to the first mean as the emotion conversion confusion of that kind of noun on that emotion type.
5. The real-time data online intelligent processing method for layout files according to claim 3, wherein constructing the local emotion feature tendency of each occurrence of each kind of noun on each emotion type comprises:
for each occurrence of each kind of noun in the text data set, recording the product of the emotion intensity of each emotion word of the given emotion type and the polarity value of its associated emotion polarity word as a first product; obtaining the ratio of the first product to the word spacing; and taking the mean of the ratios over all emotion words of that emotion type as the local emotion feature tendency of that occurrence on that emotion type.
6. The real-time data online intelligent processing method for layout files according to claim 1, wherein constructing the position variation index of each occurrence of each kind of noun comprises:
taking the word sequence number of each noun in the word segmentation data set as its word position; for a given occurrence of a given kind of noun, denoting the word positions of the relevant occurrences of that kind of noun; the position variation index of the occurrence is given by an expression in these word positions,
where the remaining quantity in the expression is the number of words of that kind of noun.
7. The real-time data online intelligent processing method for layout files according to claim 4, wherein constructing the emotion logic characteristic abnormal change value of each occurrence of each kind of noun on each emotion type comprises:
obtaining the local emotion feature tendency of the occurrence on the emotion type and the full-text emotion feature tendency of its kind of noun on the emotion type, and calculating the absolute value of their difference; calculating the difference between this absolute value and the emotion conversion confusion of that kind of noun on the emotion type; taking the minimum of this difference and 0; and taking the ratio of the minimum to the position variation index as the emotion logic characteristic abnormal change value of the occurrence on the emotion type.
8. The real-time data online intelligent processing method for layout files according to claim 3, wherein obtaining the internal emotion logic confusion of each sentence comprises:
for each sentence, counting the information amount of each noun of the sentence in the word segmentation data set; obtaining, for each noun, the absolute value of the difference between its local emotion feature tendency and its full-text emotion feature tendency on the emotion type; recording the product of this absolute difference and the information amount as a first product; and taking the sum of the first products of all nouns in the sentence as the internal emotion logic confusion of the sentence.
9. The real-time data online intelligent processing method for layout files according to claim 8, wherein the internal emotion logic chaotic coefficient of each occurrence of each kind of noun on each emotion type is given by an expression in the following quantities:
the internal emotion logic chaotic coefficient of the occurrence on the emotion type; the internal emotion logic confusion of each sentence containing that kind of noun; N, the number of sentences containing that kind of noun; the local emotion feature tendency of the occurrence on the emotion type; the full-text emotion feature tendency of that kind of noun on the emotion type; and the information amount of the occurrence in the word segmentation data set.
10. The real-time data online intelligent processing method for layout files according to claim 9, wherein obtaining the confusion of each occurrence of each kind of noun on each emotion type according to the emotion logic characteristic abnormal change value and the internal emotion logic chaotic coefficient, and performing intelligent processing on the text data set in combination with the LOF anomaly detection algorithm, comprises:
taking the product of the emotion logic characteristic abnormal change value of each occurrence of each kind of noun on an emotion type and its internal emotion logic chaotic coefficient on that emotion type as the confusion of the occurrence on that emotion type;
using the confusion values on the various emotion types of all occurrences of all kinds of nouns in the text data set as the input of the LOF anomaly detection algorithm, the output of which is a detection value for each occurrence of each kind of noun; when the detection value is greater than or equal to a preset threshold, the corresponding noun has a logic error, and the nouns with logic errors are screened out.
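For illustration, the feature steps recited in claims 2 to 4 can be sketched in Python. The window size of 3, the polarity lexicon, the sample sentences, the treatment of word spacing as the absolute difference of token indices, and the simplification to one emotion word per sentence are all assumptions of this sketch and are not specified by the claims.

```python
from statistics import mean


def nearest_polarity_value(tokens, emotion_idx, polarity_values, window=3):
    """Polarity value of the polarity word closest to the emotion word within
    `window` tokens on either side; 1.0 when none is found (claim 3)."""
    best = None
    lo = max(0, emotion_idx - window)
    hi = min(len(tokens), emotion_idx + window + 1)
    for j in range(lo, hi):
        if j != emotion_idx and tokens[j] in polarity_values:
            if best is None or abs(j - emotion_idx) < abs(best - emotion_idx):
                best = j
    return polarity_values[tokens[best]] if best is not None else 1.0


def full_text_tendency(ratios):
    """Mean over sentences of intensity * polarity / word spacing (claim 3)."""
    return mean(ratios)


def conversion_confusion(ratios):
    """| (mean|ratio| - |mean ratio|) / mean|ratio| | (claim 4)."""
    first_mean = mean(abs(r) for r in ratios)
    if first_mean == 0:
        return 0.0  # assumed convention when no emotion signal is present
    return abs((first_mean - abs(mean(ratios))) / first_mean)


# Hypothetical polarity lexicon and sentences:
# (tokens, index of emotion word, index of noun, emotion intensity).
polarity = {"not": -1.0, "very": 1.5}
sentences = [
    (["the", "service", "was", "very", "good"], 4, 1, 0.8),
    (["service", "not", "good"], 2, 0, 0.8),
]

ratios = []
for tokens, emo_idx, noun_idx, intensity in sentences:
    p = nearest_polarity_value(tokens, emo_idx, polarity)
    spacing = abs(emo_idx - noun_idx)  # assumed definition of word spacing
    ratios.append(intensity * p / spacing)

print(round(full_text_tendency(ratios), 3), round(conversion_confusion(ratios), 3))
```

When a noun is praised in one sentence and criticised in another, the signed ratios cancel in the full-text tendency while their absolute values do not, so the conversion confusion approaches 1; a noun used with a consistent polarity yields a value near 0.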
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410121659.6A CN117669566B (en) | 2024-01-30 | 2024-01-30 | Real-time data online intelligent processing method for layout file |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117669566A (en) | 2024-03-08 |
CN117669566B (en) | 2024-04-09 |
Family
ID=90064370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410121659.6A Active CN117669566B (en) | 2024-01-30 | 2024-01-30 | Real-time data online intelligent processing method for layout file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117669566B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103123620A (en) * | 2012-12-11 | 2013-05-29 | 中国互联网新闻中心 | Web text sentiment analysis method based on propositional logic |
KR102325022B1 (en) * | 2020-09-22 | 2021-11-11 | 김백기 | On-line image and review integrated analysis method and system using deep learning-based hybrid analysis method |
CN115907801A (en) * | 2022-11-24 | 2023-04-04 | 天翼电子商务有限公司 | E-commerce evaluation information processing method, system, equipment and medium |
CN115906810A (en) * | 2022-12-13 | 2023-04-04 | 中科世通亨奇(北京)科技有限公司 | Abnormal speech analysis method and equipment based on time series and viewpoint mining |
JP2023113268A (en) * | 2022-02-03 | 2023-08-16 | 株式会社Screenホールディングス | Text mining method, text mining program, and text mining apparatus |
Non-Patent Citations (1)
Title |
---|
任卓琳: "评论信息的无意义检测与异常检测", 《中国优秀硕士学位论文全文数据库 信息科技辑 (月刊)》, no. 2018, 15 March 2018 (2018-03-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN117669566B (en) | 2024-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111414393A (en) | Semantic similar case retrieval method and equipment based on medical knowledge graph | |
CN109036577B (en) | Diabetes complication analysis method and device | |
Chang et al. | Research on detection methods based on Doc2vec abnormal comments | |
CN107656952A (en) | The modeling method of parallel intelligent case recommended models | |
CN111292848A (en) | Bayesian estimation-based medical knowledge map assisted reasoning method | |
CN111191048A (en) | Emergency call question-answering system construction method based on knowledge graph | |
CN112131322B (en) | Time sequence classification method and device | |
CN108519971A (en) | A kind of across languages theme of news similarity comparison methods based on Parallel Corpus | |
CN110046228A (en) | Short text subject identifying method and system | |
CN111507827A (en) | Health risk assessment method, terminal and computer storage medium | |
CN112732910B (en) | Cross-task text emotion state evaluation method, system, device and medium | |
Ahmed et al. | Short text clustering algorithms, application and challenges: A survey | |
CN112149411B (en) | Method for constructing body in clinical application field of antibiotics | |
Shi et al. | DeepDiagnosis: DNN-based diagnosis prediction from pediatric big healthcare data | |
CN114420233A (en) | Method for extracting post-structured information of Chinese electronic medical record | |
CN112420148A (en) | Medical image report quality control system, method and medium based on artificial intelligence | |
Zweigenbaum et al. | Multiple Methods for Multi-class, Multi-label ICD-10 Coding of Multi-granularity, Multilingual Death Certificates. | |
CN109448808B (en) | Abnormal prescription screening method based on multi-view theme modeling technology | |
CN110299194A (en) | The similar case recommended method with the wide depth model of improvement is indicated based on comprehensive characteristics | |
CN117669566B (en) | Real-time data online intelligent processing method for layout file | |
Schraagen | Aspects of record linkage | |
CN115062602B (en) | Sample construction method and device for contrast learning and computer equipment | |
CN112559862B (en) | Product feature clustering method based on similarity of adjacent words | |
CN112287665A (en) | Chronic disease data analysis method and system based on natural language processing and integrated training | |
Falissard et al. | A deep artificial neural network based model for underlying cause of death prediction from death certificates |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |