CN117669566A - Real-time data online intelligent processing method for layout file - Google Patents

Real-time data online intelligent processing method for layout file

Info

Publication number
CN117669566A
Authority
CN
China
Prior art keywords
emotion
noun
word
species
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410121659.6A
Other languages
Chinese (zh)
Other versions
CN117669566B (en
Inventor
杨瑞钦
陆猛
朱静宇
赵云
庄玉龙
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dianju Information Technology Co ltd
Original Assignee
Beijing Dianju Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dianju Information Technology Co ltd filed Critical Beijing Dianju Information Technology Co ltd
Priority to CN202410121659.6A priority Critical patent/CN117669566B/en
Publication of CN117669566A publication Critical patent/CN117669566A/en
Application granted granted Critical
Publication of CN117669566B publication Critical patent/CN117669566B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of layout file data processing, and in particular to a real-time online intelligent processing method for layout file data, comprising the following steps: acquiring layout text data and determining the associated emotion polarity word of each emotion word; for each kind of noun, calculating the full-text emotion feature tendency and the emotion conversion confusion of the k-th kind of noun for each emotion type; constructing the local emotion feature tendency of each occurrence of the k-th kind of noun for each emotion type; constructing the position variation index of each occurrence of the k-th kind of noun according to the word positions of the noun in the text data set; calculating the abnormal change value of the emotion logic feature of each occurrence for each emotion type, and further calculating the internal emotion logic chaotic coefficient of each occurrence for each emotion type; and processing the text data set intelligently in combination with the LOF (Local Outlier Factor) anomaly detection algorithm. The invention accurately identifies nouns with logic errors in the layout file and ensures the data processing effect.

Description

Real-time data online intelligent processing method for layout file
Technical Field
The application relates to the technical field of layout file data processing, in particular to a real-time data online intelligent processing method of layout files.
Background
A layout file is a file type with a self-contained, fixed presentation format. It ensures that the same file is displayed identically on different devices, and it is widely used in electronic contracts, official notices, document management, and similar fields. The most important data in a layout file is its text data, and ensuring the security and reliability of that text data during transmission has long been a development direction of this technical field.
As a common office file type, the layout file offers automatic correction of text data while the user works. Traditional data-cleaning approaches to layout file text data mostly rely on rule-based text error correction. They can only detect grammatical errors and misprinted words within a single sentence, and they do not go on to detect logic errors that arise across sentences in the text data.
Disclosure of Invention
In order to solve the technical problems, the invention provides a real-time data online intelligent processing method of layout files, which aims to solve the existing problems.
The invention discloses a real-time data online intelligent processing method of format files, which adopts the following technical scheme:
the embodiment of the invention provides a real-time data online intelligent processing method of format files, which comprises the following steps:
obtaining layout text data and performing word segmentation to obtain a word segmentation data set, wherein identical segmented words are regarded as the same kind of word; extracting, from the word segmentation data set, the nouns, the emotion words with their emotion types and emotion intensity degrees, and the emotion polarity words with their polarity values; extracting the associated emotion polarity word of each emotion word;
for each kind of noun, obtaining the full-text emotion feature tendency of the k-th kind of noun for each emotion type according to the emotion intensity degree of the emotion words of that emotion type in the sentences where the k-th kind of noun is located, the polarity values of the associated emotion polarity words, and the word spacing between the emotion words and the k-th kind of noun; obtaining the emotion conversion confusion of the k-th kind of noun for each emotion type according to the full-text emotion feature tendency for that emotion type in the sentences where the k-th kind of noun is located; constructing the local emotion feature tendency of each occurrence of the k-th kind of noun for each emotion type according to the emotion intensity degree of each emotion word of that emotion type in the sentence where the occurrence is located, the polarity value of its associated emotion polarity word, and its word spacing to the noun; constructing the position variation index of each occurrence of the k-th kind of noun according to the word positions of the occurrences in the text data set; constructing the abnormal change value of the emotion logic feature of each occurrence of the k-th kind of noun for each emotion type;
obtaining the internal emotion logic confusion of each sentence according to the full-text emotion feature tendency, the local emotion feature tendency, and the information amount of each noun in the sentence for each emotion type; obtaining the internal emotion logic chaotic coefficient of each occurrence of the k-th kind of noun for each emotion type according to the internal emotion logic confusion of each sentence where the k-th kind of noun is located and the full-text and local emotion feature tendencies of the noun for each emotion type;
obtaining the confusion of each occurrence of the k-th kind of noun for each emotion type according to the abnormal change value of the emotion logic feature and the internal emotion logic chaotic coefficient, and processing the text data set intelligently in combination with the LOF anomaly detection algorithm.
Further, extracting the associated emotion polarity word of each emotion word includes:
taking a preset number of words on each side of the emotion word as a search range, and selecting from that range the emotion polarity word closest to the emotion word as the associated emotion polarity word of the emotion word.
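The search described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name, the `polarity_lexicon` dictionary, and the example polarity values are hypothetical, and breaking ties in favor of the earlier word is an assumption. The default value of 1 follows the rule the document states for emotion words that have no associated emotion polarity word.

```python
def associated_polarity_value(tokens, emotion_index, polarity_lexicon, window=3):
    """Return the polarity value of the polarity word nearest to an emotion word.

    tokens           -- segmented words of the text
    emotion_index    -- 0-based index of the emotion word in tokens
    polarity_lexicon -- hypothetical mapping: polarity word -> polarity value
    window           -- number of words searched on each side
    """
    best_distance = None
    best_value = 1.0  # used when no polarity word lies inside the search range
    start = max(0, emotion_index - window)
    stop = min(len(tokens), emotion_index + window + 1)
    for idx in range(start, stop):
        if idx == emotion_index or tokens[idx] not in polarity_lexicon:
            continue
        distance = abs(idx - emotion_index)
        # Assumption: on a tie, the word that appears earlier is kept.
        if best_distance is None or distance < best_distance:
            best_distance = distance
            best_value = polarity_lexicon[tokens[idx]]
    return best_value


lexicon = {"not": -1.0, "very": 0.5}  # hypothetical values
sentence = ["the", "service", "was", "not", "very", "good"]
value = associated_polarity_value(sentence, 5, lexicon)
```

Here "very" is one word away from "good" and "not" is two words away, so the value of "very" is returned.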
Further, the full-text emotion feature tendency of the k-th kind of noun for each emotion type is obtained as follows:
for a given emotion type of the k-th kind of noun, calculating, for each sentence where the k-th kind of noun is located, the product of the emotion intensity degree of the emotion word of that emotion type and the polarity value of its associated emotion polarity word; counting the word interval between the emotion word and the k-th kind of noun; obtaining the ratio of the product to the word interval; and taking the mean of the ratios over all sentences where the k-th kind of noun is located as the full-text emotion feature tendency of the k-th kind of noun for that emotion type;
when the emotion words do not have the associated emotion polarity words, the polarity value of the associated emotion polarity words of the emotion words is set to be 1.
Further, the emotion conversion confusion of the k-th kind of noun for each emotion type is obtained as follows:
the absolute value of the ratio is recorded as a first absolute value, and the mean of the first absolute values over all sentences where the k-th kind of noun is located is recorded as a first mean; the difference between the first mean and the absolute value of the full-text emotion feature tendency of the k-th kind of noun for the given emotion type is calculated; and the absolute value of the ratio of this difference to the first mean is taken as the emotion conversion confusion of the k-th kind of noun for that emotion type.
Further, constructing the local emotion feature tendency of each occurrence of the k-th kind of noun for each emotion type includes:
for the i-th occurrence of the k-th kind of noun in the text data set, recording as a first product the product of the emotion intensity degree of each emotion word of the given emotion type in the sentence where the occurrence is located and the polarity value of its associated emotion polarity word; obtaining the ratio of the first product to the word interval between the emotion word and the noun; and taking the mean of the ratios over all emotion words of that emotion type as the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for that emotion type.
Further, constructing the position variation index of each occurrence of the k-th kind of noun includes:
counting the word sequence number of each occurrence of the k-th kind of noun in the word segmentation data set as its word position; recording the word positions of the (i-1)-th, the i-th, and the (i+1)-th occurrences of the k-th kind of noun; and expressing the position variation index of the i-th occurrence of the k-th kind of noun in terms of these word positions and the number of words of the k-th kind of noun.
Further, constructing the abnormal change value of the emotion logic feature of each occurrence of the k-th kind of noun for each emotion type includes:
obtaining the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the given emotion type and the full-text emotion feature tendency of the k-th kind of noun for that emotion type; calculating the absolute value of their difference; calculating the difference between this absolute value and the emotion conversion confusion of the k-th kind of noun for that emotion type; obtaining the minimum of this difference and 0; and taking the ratio of the minimum to the position variation index as the abnormal change value of the emotion logic feature of the i-th occurrence of the k-th kind of noun for that emotion type.
Further, the internal emotion logic confusion of each sentence includes:
for each sentence, counting the information amount of each noun of the sentence in the word segmentation data set; for each noun and a given emotion type, obtaining the absolute value of the difference between the local emotion feature tendency and the full-text emotion feature tendency of the noun for that emotion type; recording the product of this absolute value and the information amount as a first product; and taking the sum of the first products of all nouns in the sentence as the internal emotion logic confusion of the sentence.
Further, the internal emotion logic chaotic coefficient of each occurrence of the k-th kind of noun for a given emotion type is expressed in terms of the following quantities:
the internal emotion logic confusion of each sentence where the k-th kind of noun is located; N, the number of sentences where the k-th kind of noun is located; the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the emotion type; the full-text emotion feature tendency of the k-th kind of noun for the emotion type; and the information amount of the i-th occurrence of the k-th kind of noun in the word segmentation data set.
Further, obtaining the confusion of each occurrence of the k-th kind of noun for each emotion type according to the abnormal change value of the emotion logic feature and the internal emotion logic chaotic coefficient, and processing the text data set intelligently in combination with the LOF anomaly detection algorithm, includes:
taking the product of the abnormal change value of the emotion logic feature of the i-th occurrence of the k-th kind of noun for a given emotion type and its internal emotion logic chaotic coefficient for that emotion type as the confusion of the i-th occurrence of the k-th kind of noun for that emotion type;
taking the confusion of all occurrences of the k-th kind of noun in the text data set for each emotion type as the input of the LOF anomaly detection algorithm, whose output is the LOF detection value of each occurrence of the k-th kind of noun; when the LOF detection value is greater than or equal to a preset threshold, the corresponding noun has a logic error, and the nouns with logic errors are screened out.
The invention has at least the following beneficial effects:
The invention performs emotion logic detection on the nouns in the text data of the layout file, marks nouns whose emotion logic differs markedly from that of the whole text as abnormal data, and thereby completes real-time online processing of the data. For each kind of word, the emotion intensity of every emotion word in the sentences where that word appears is used to construct the full-text emotion feature tendency and the emotion conversion confusion of the noun, which represent the emotion logic tendency of the word over the whole text.
The local emotion feature tendency of the noun is then constructed from the emotion features around each local occurrence of the word. The difference between the local and the full-text emotion feature tendency is taken, with the emotion conversion confusion used as a threshold, to obtain the abnormal change value of the emotion logic feature of the word, which represents how abnormal the emotion logic of the word is relative to the whole article. Next, within the sentence where the word is located, the influence of the word on the emotion logic confusion of the sentence is calculated to obtain the internal emotion logic chaotic coefficient. This coefficient is combined with the abnormal change value of the emotion logic feature to form a quantitative index of the emotion logic confusion of the word. The LOF anomaly detection algorithm is applied to words of the same kind, nouns with logic errors are screened out, and real-time online processing of the layout file text data is completed.
Traditional layout file text data processing methods can only detect simple grammatical errors through hand-written rules. By contrast, the invention further screens out semantically abnormal parts of the text data through the degree of logical coherence across the whole text, detects abnormal data in the layout file text data accurately, and obtains a better data processing effect.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an online intelligent processing method for real-time data of layout files;
FIG. 2 is a schematic diagram of a logic anomaly word detection process.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of the specific implementation, structure, characteristics and effects of the method for online intelligent processing of real-time data of layout files according to the invention in combination with the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a specific scheme of an online intelligent processing method for real-time data of format files, which is specifically described below with reference to the accompanying drawings.
The invention provides a real-time online intelligent processing method for layout file data. Referring to fig. 1, the method comprises the following steps:
and S001, acquiring text data and performing word segmentation.
The text data of the layout file is acquired and used as the input of a word segmentation algorithm based on statistical rules; a corpus for the word segmentation algorithm is readily available from public sources. The algorithm outputs the segmented text data, which is recorded as the word segmentation data set. After word segmentation, identical words are regarded as words of the same kind.
So far, text data of the format file can be obtained, word segmentation processing is carried out, and word segmentation data sets are obtained by obtaining each word segment.
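As a small illustration of how the word segmentation data set can be organized for the later steps, the sketch below groups identical tokens into one kind and records their word sequence numbers. It assumes the text has already been segmented into a list of tokens; the function name and the 1-based numbering are illustrative choices, not taken from the source.

```python
from collections import defaultdict


def group_word_positions(tokens):
    """Map each distinct token (one kind of word) to its word positions.

    Positions are 1-based word sequence numbers in the segmented text
    (the 1-based convention is an assumption of this sketch).
    """
    positions = defaultdict(list)
    for index, token in enumerate(tokens, start=1):
        positions[token].append(index)
    return dict(positions)


tokens = ["contract", "is", "valid", "contract", "is", "void"]
kinds = group_word_positions(tokens)
```

With this structure, the i-th occurrence of a kind of noun is simply the i-th entry of its position list.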
Step S002, the full text emotion feature tendency and emotion conversion confusion degree are built, an emotion logic feature abnormal change value is calculated, the sentence internal emotion confusion degree is further built, and nouns with emotion logic errors are screened.
In this embodiment, the nouns in the text data are taken as base points. The words expressing emotion near each noun are examined to determine the local emotion logic of the noun, and the change in the noun's emotion logic over the whole text is used to screen out combinations of nouns and emotion words that may be abnormal; these combinations represent text data with logic errors. The abnormal text data is processed online to assist the layout file user in writing.
Therefore, the nouns, the words expressing emotion, and the words expressing emotion polarity need to be screened first. In this embodiment, the required words are screened by matching against text databases, as follows. With the word segmentation data set as input, a part-of-speech discrimination algorithm based on a part-of-speech database outputs the part of speech of each word in the word segmentation data set. With the word segmentation data set again as input, a word matching algorithm based on an emotion word text database outputs the kinds of words that express emotion, together with the emotion attributes of each such kind of word; the emotion attributes of each kind of emotion word are recorded as a pair consisting of an emotion type and an emotion intensity degree, where there are seven emotion types, represented by the numerals 1 to 7. With the word segmentation data set once more as input, a word matching algorithm based on an emotion polarity word text database outputs the kinds of words that express emotion polarity, together with the polarity value of each such kind of word. The polarity value lies in a preset interval; positive values represent positive polarity words, negative values represent negative polarity words, and a larger absolute value represents a stronger expressed polarity.
The full-text emotion feature tendency and the emotion conversion confusion of the k-th kind of noun for the j-th emotion type are calculated as follows:

$$Q_{k,j}=\frac{1}{N}\sum_{n=1}^{N}\frac{E_{k,n,j}\,P_{k,n,j}}{d_{k,n,j}},\qquad H_{k,j}=\left|\frac{\dfrac{1}{N}\sum_{n=1}^{N}\left|\dfrac{E_{k,n,j}\,P_{k,n,j}}{d_{k,n,j}}\right|-\left|Q_{k,j}\right|}{\dfrac{1}{N}\sum_{n=1}^{N}\left|\dfrac{E_{k,n,j}\,P_{k,n,j}}{d_{k,n,j}}\right|}\right|$$

In the formula, $E_{k,n,j}$ is the emotion intensity degree of the emotion word of the j-th emotion type in the n-th sentence where the k-th kind of noun is located; $P_{k,n,j}$ is the polarity value of the associated emotion polarity word of that emotion word. It should be noted that the associated emotion polarity word of an emotion word is found by taking the A words on each side of the emotion word as a search range and selecting the emotion polarity word in that range closest to the emotion word; in this embodiment A=3, and if the emotion word has no associated emotion polarity word, its polarity value is set to 1. $d_{k,n,j}$ is the word spacing between the emotion word of the j-th emotion type in the n-th sentence and the k-th kind of noun; $Q_{k,j}$ is the full-text emotion feature tendency of the k-th kind of noun for the j-th emotion type; $H_{k,j}$ is the emotion conversion confusion of the k-th kind of noun for the j-th emotion type; and N is the number of sentences where the k-th kind of noun is located.
In the formula, $E_{k,n,j}P_{k,n,j}$ characterizes the emotion weight of the emotion word of the j-th emotion type in the n-th sentence where the k-th kind of noun is located: positive values represent positive polarity, negative values represent negative polarity, and a larger absolute value represents a stronger expressed polarity. $1/d_{k,n,j}$ is the reciprocal of the word spacing; the larger it is, the closer the emotion word is to the k-th kind of noun in the n-th sentence, and the more the emotion word affects the emotional features of the noun. Averaging the emotional influence $E_{k,n,j}P_{k,n,j}/d_{k,n,j}$ over all sentences where the k-th kind of noun appears gives the full-text emotion feature tendency $Q_{k,j}$, which characterizes the emotional tendency of the k-th kind of noun for the j-th emotion type, with the same sign and magnitude interpretation. Further, the numerator of $H_{k,j}$ characterizes emotional confusion: when the emotional influences are all positive or all negative, the numerator is 0 and $H_{k,j}$ is 0, indicating that the emotion logic of the k-th kind of noun does not change; when the numerator is not 0, the larger it is, the more the emotion logic of the k-th kind of noun changes over the whole text data. Finally, $H_{k,j}$ characterizes the degree of emotion logic conversion of the k-th kind of noun for the j-th emotion type; the larger its value, the greater the change in emotion logic for that emotion type.
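The two quantities described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the function name is hypothetical, each sentence is assumed to contribute one (intensity, polarity, spacing) triple for the emotion type in question, and returning a confusion of 0 when every influence is 0 is a choice made here to avoid dividing by zero.

```python
def full_text_tendency_and_confusion(observations):
    """Compute the full-text tendency and the emotion conversion confusion.

    observations -- one (intensity, polarity, spacing) triple per sentence
                    in which the noun kind appears, for one emotion type
    """
    if not observations:
        raise ValueError("at least one sentence is required")
    # Emotional influence of each sentence: intensity * polarity / spacing.
    ratios = [e * p / d for e, p, d in observations]
    tendency = sum(ratios) / len(ratios)
    mean_abs = sum(abs(r) for r in ratios) / len(ratios)
    if mean_abs == 0:
        return tendency, 0.0  # assumption: no influence means no confusion
    confusion = abs((mean_abs - abs(tendency)) / mean_abs)
    return tendency, confusion


# Same sign in every sentence: the tendency is strong and confusion is 0.
consistent = full_text_tendency_and_confusion([(4, 1.0, 2), (2, 1.0, 1)])
# Opposite signs of equal size: the tendency cancels and confusion is 1.
conflicting = full_text_tendency_and_confusion([(4, 1.0, 2), (4, -1.0, 2)])
```

The two examples correspond to the limiting cases discussed in the text: influences that all share a sign give zero confusion, while influences that cancel give the maximum value of 1.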
With the method of this embodiment, the change in the emotion logic of the k-th kind of noun over the whole text data can be calculated. This embodiment further calculates the local emotion feature tendency of each occurrence of the k-th kind of noun: the more the local emotion feature tendency of an occurrence differs from the emotion logic of the whole text data, the more likely that occurrence is a local logic error.
Thus, this embodiment analyzes the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the j-th emotion type, expressed as:

$$q_{k,i,j}=\frac{1}{M_{k,i,j}}\sum_{m=1}^{M_{k,i,j}}\frac{e_{k,i,j,m}\,p_{k,i,j,m}}{r_{k,i,j,m}}$$

In the formula, $e_{k,i,j,m}$ is the emotion intensity degree of the m-th emotion word of the j-th emotion type in the sentence where the i-th occurrence of the k-th kind of noun is located; $p_{k,i,j,m}$ is the polarity value of the associated emotion polarity word of that emotion word; $r_{k,i,j,m}$ is the word spacing between that emotion word and the noun; and $M_{k,i,j}$ is the number of emotion words of the j-th emotion type in that sentence. Here "the i-th occurrence of the k-th kind of noun" means the i-th noun among the nouns of the k-th kind.
In the formula, $e_{k,i,j,m}\,p_{k,i,j,m}$ characterizes the emotional tendency of the m-th emotion word of the j-th emotion type in the sentence where the i-th occurrence of the k-th kind of noun is located: positive values represent positive polarity, negative values represent negative polarity, and a larger absolute value represents a stronger expressed polarity. The larger $1/r_{k,i,j,m}$ is, the closer the m-th emotion word is to the noun in that sentence, and the more it affects the emotional features of the noun. The ratio $e_{k,i,j,m}\,p_{k,i,j,m}/r_{k,i,j,m}$ therefore represents the emotional influence of the m-th emotion word on the i-th occurrence of the k-th kind of noun, and a larger value means a stronger influence. Finally, the mean $q_{k,i,j}$ represents the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the j-th emotion type, with the same sign and magnitude interpretation.
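The local tendency of a single occurrence can be sketched the same way. The function name is hypothetical, and returning 0 when the sentence holds no emotion word of the given type is an assumption of this sketch; the source does not state how that case is handled.

```python
def local_tendency(emotion_words):
    """Local emotion feature tendency of one noun occurrence for one emotion type.

    emotion_words -- (intensity, polarity, spacing) triples, one per emotion
                     word of the given type in the occurrence's sentence
    """
    if not emotion_words:
        return 0.0  # assumption: no emotion word of this type in the sentence
    influences = [e * p / r for e, p, r in emotion_words]
    return sum(influences) / len(influences)


# Two emotion words: influences 4*1.0/2 = 2.0 and 6*(-0.5)/3 = -1.0.
q = local_tendency([(4, 1.0, 2), (6, -0.5, 3)])
```

The mean of the two influences, 2.0 and -1.0, is 0.5.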
Further, a local emotion feature value variation data set is constructed for the occurrences of the k-th kind of noun. Each element of the set pairs the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for a given emotion type with the word position of that occurrence in the word segmentation data set.
Further, the local emotion feature value of each occurrence of the k-th kind of noun is compared with the emotion features of the whole text data to judge whether the occurrence is a local logic error, and the abnormal change value of the emotion logic feature is constructed:

$$Y_{k,i,j}=\frac{\min\left(\left|q_{k,i,j}-Q_{k,j}\right|-H_{k,j},\;0\right)}{D_{k,i}}$$

In the formula, $Y_{k,i,j}$ is the abnormal change value of the emotion logic feature of the i-th occurrence of the k-th kind of noun for the j-th emotion type; $q_{k,i,j}$ is the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the j-th emotion type; $Q_{k,j}$ is the full-text emotion feature tendency of the k-th kind of noun for the j-th emotion type; $H_{k,j}$ is the emotion conversion confusion of the k-th kind of noun for the j-th emotion type; $D_{k,i}$ is the position variation index of the i-th occurrence of the k-th kind of noun, built from the word positions of the (i-1)-th, i-th, and (i+1)-th occurrences, where i does not exceed the number of words of the k-th kind of noun; and $\min$ takes the minimum value.
In the formula, $\left|q_{k,i,j}-Q_{k,j}\right|$ is the difference between the local and the full-text emotion feature tendency. The larger it is, the more the emotion logic of the i-th occurrence of the k-th kind of noun for the j-th emotion type differs from that of all other occurrences of the k-th kind of noun, and the more likely the occurrence is a word with a logic error. $H_{k,j}$ is the emotion conversion confusion; the larger it is, the more disordered the emotion logic changes of the k-th kind of noun are across the text for the j-th emotion type. The position variation index $D_{k,i}$ is the sum of the distance between the i-th and the (i-1)-th occurrences and the distance between the i-th and the (i+1)-th occurrences of the k-th kind of noun. The larger it is, the longer the context distance between the occurrence and the adjacent nouns of the same kind, and the more likely it is that the author of the text data intended a change in the emotion logic of the noun. Therefore $1/D_{k,i}$ is used as a weight on the difference between the local and the full-text emotion feature tendency: the longer the context distance, the smaller the weight, and the smaller $Y_{k,i,j}$, the less likely the i-th occurrence of the k-th kind of noun is a word with an emotion logic error.
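A sketch of this step is given below, with several assumptions labeled. The position variation index is taken as the sum of the distances to the previous and next occurrence, as the text describes; how the first and last occurrences are handled is not recoverable from the text, so this sketch simply uses whichever neighbor exists. The abnormal change value follows the wording of the claims literally (the minimum of the thresholded difference and 0, divided by the position variation index), which means the value is never positive. Function names are hypothetical.

```python
def position_variation_index(positions, i):
    """Sum of word distances from occurrence i to its adjacent occurrences.

    positions -- ascending word positions of one kind of noun
    i         -- 0-based occurrence index
    Assumption: a first or last occurrence uses only its single neighbor.
    """
    total = 0
    if i > 0:
        total += positions[i] - positions[i - 1]
    if i < len(positions) - 1:
        total += positions[i + 1] - positions[i]
    return total


def abnormal_change_value(local, full_text, confusion, position_index):
    """Thresholded local/full-text gap, weighted by 1 / position index."""
    if position_index <= 0:
        raise ValueError("position index must be positive")
    gap = abs(local - full_text) - confusion
    # Literal reading of the claims: the minimum of the gap and 0 is used.
    return min(gap, 0.0) / position_index


index = position_variation_index([3, 10, 30], 1)   # (10 - 3) + (30 - 10)
value = abnormal_change_value(1.0, 1.5, 1.0, 2)    # min(0.5 - 1.0, 0) / 2
```

A noun kind that occurs only once has no neighbor, so its index is 0 and the second function rejects it; a real implementation would need a rule for that case.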
Further, this embodiment constructs the internal emotion logic confusion of sentences and uses it to screen words with emotion logic errors. The internal emotion logic confusion of the n-th sentence for the j-th emotion type is calculated as:

$$S_{n,j}=\sum_{m=1}^{B_n}\left|q_{n,m,j}-Q_{n,m,j}\right|\cdot I_{n,m}$$

In the formula, $S_{n,j}$ is the internal emotion logic confusion of the n-th sentence; $q_{n,m,j}$ is the local emotion feature tendency of the m-th noun of the n-th sentence for the j-th emotion type; $Q_{n,m,j}$ is the full-text emotion feature tendency of the m-th noun of the n-th sentence for the j-th emotion type; $B_n$ is the number of nouns in the n-th sentence; and $I_{n,m}$ is the information amount of the m-th noun of the n-th sentence in the word segmentation data set. It should be noted that the calculation of the information amount is a well-known technique in the field and can be obtained with existing methods, so the calculation process is not repeated here. The internal emotion logic chaotic coefficient of the i-th occurrence of the k-th kind of noun for the j-th emotion type is then obtained from the following quantities: the local emotion feature tendency of the i-th occurrence of the k-th kind of noun for the j-th emotion type; the full-text emotion feature tendency of the k-th kind of noun for the j-th emotion type; the information amount of the i-th occurrence of the k-th kind of noun in the word segmentation data set; the internal emotion logic confusion of each sentence where the k-th kind of noun is located; and N, the number of sentences where the k-th kind of noun is located.
In the formula, $\left|q_{n,m,j}-Q_{n,m,j}\right|$ is the difference between the local and the full-text emotion feature tendency of the m-th noun of the n-th sentence for the j-th emotion type; the larger it is, the more confused the emotion logic of the n-th sentence. $I_{n,m}$ characterizes the amount of information that the m-th noun contributes to the n-th sentence; the larger it is, the greater the influence of the noun on the sentence, and the greater the weight of the noun should be when calculating the emotion logic confusion of the sentence. The larger the resulting $S_{n,j}$, the more confused the internal emotion logic of the n-th sentence.
Further, in the expression of the internal emotion logic chaotic coefficient, the product of the information amount of the i-th occurrence of the k-th kind of noun and the absolute difference between its local and full-text emotion feature tendency characterizes the contribution of that occurrence to the internal emotion logic confusion of its sentence. Subtracting this product from the internal emotion logic confusion of the sentence therefore characterizes the confusion of the sentence after the occurrence is deleted. The larger the change caused by the deletion, the more likely the occurrence is what makes the emotion logic of the sentence confused, and the more likely the i-th occurrence of the k-th kind of noun is a word with a logic error.
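The sentence-level confusion and the effect of deleting one noun can be sketched as below. The exact expression of the internal emotion logic chaotic coefficient is not recoverable from this text, so the sketch stops at the two ingredients the description names. The information amount is computed here as self-information, the negative base-2 logarithm of the word's relative frequency; that is one common choice and an assumption of this sketch, since the source only calls the calculation well known. All function names are hypothetical.

```python
import math


def information_amount(count, total):
    """Self-information of a word: -log2(count / total). Assumed definition."""
    return -math.log2(count / total)


def sentence_confusion(nouns):
    """Internal emotion logic confusion of one sentence for one emotion type.

    nouns -- (local_tendency, full_text_tendency, information) per noun
    """
    return sum(abs(local - full) * info for local, full, info in nouns)


def confusion_without(nouns, index):
    """Sentence confusion after removing the contribution of one noun."""
    local, full, info = nouns[index]
    return sentence_confusion(nouns) - abs(local - full) * info


info = information_amount(1, 4)                 # 2 bits
nouns = [(0.5, 2.0, info), (1.0, 1.0, 3.0)]     # only the first noun deviates
total = sentence_confusion(nouns)               # 1.5 * 2.0 + 0.0 * 3.0
remaining = confusion_without(nouns, 0)
```

Removing the first noun takes the sentence confusion from 3.0 to 0.0, which marks that noun as the source of the sentence's confusion; removing the second noun changes nothing.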
For each occurrence of each kind of noun in the text data set, taking the i-th occurrence of the k-th kind of noun as an example, the product of the abnormal change value of its emotion logic feature for a given emotion type and its internal emotion logic chaotic coefficient for that emotion type is taken as the confusion of the occurrence for that emotion type. Proceeding in this way gives the confusion of each occurrence of the k-th kind of noun for each emotion type.
Further, the confusion of all occurrences of the k-th kind of noun in the text data set for each emotion type is used as the input of the LOF anomaly detection algorithm, whose output is the LOF detection value of each occurrence of the k-th kind of noun. The threshold is 1.5 in this embodiment and can be set by the practitioner. When the LOF detection value of the i-th occurrence of the k-th kind of noun is greater than or equal to the threshold, that occurrence has a logic error; the noun is marked as erroneous, the words with logic errors are screened out, and the online intelligent processing of the layout file data is completed. The flow of logic abnormal word detection is shown in fig. 2, where both "abnormal word" and "logic abnormal word" refer to a noun with a logic error in this embodiment.
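To make the final screening step concrete, the sketch below implements the standard Local Outlier Factor computation in plain Python and applies the threshold of 1.5 from this embodiment. It is a simplified illustration: each occurrence is represented by a single confusion value for one emotion type (the source feeds in the confusion for each emotion type, which would make each input a vector), the neighbor count `k=2` is an arbitrary choice, and the example scores are invented. The treatment of duplicate points with infinite density is also an assumption.

```python
def lof_scores(values, k=2):
    """Local Outlier Factor of each value in a 1-D list (needs len > k)."""
    n = len(values)
    if n <= k:
        raise ValueError("need more points than neighbors")
    k_distance, neighbors = [], []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        kd = dists[k - 1][0]                      # distance to the k-th neighbor
        k_distance.append(kd)
        neighbors.append([j for d, j in dists if d <= kd])
    lrd = []                                       # local reachability density
    for i in range(n):
        reach = [max(k_distance[j], abs(values[i] - values[j])) for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(float("inf") if mean_reach == 0 else 1.0 / mean_reach)
    scores = []
    for i in range(n):
        if lrd[i] == float("inf"):
            scores.append(1.0)                     # assumption: duplicates are inliers
            continue
        ratios = [lrd[j] / lrd[i] for j in neighbors[i]]
        scores.append(sum(ratios) / len(ratios))
    return scores


confusion_values = [0.125, 0.25, 0.375, 0.5, 4.0]  # invented example scores
scores = lof_scores(confusion_values, k=2)
threshold = 1.5                                    # value used in this embodiment
flags = [score >= threshold for score in scores]
```

In this example the first four occurrences sit in an evenly spaced cluster and receive an LOF value of 1, while the last one receives a value far above the threshold and is the only occurrence flagged. A library implementation such as scikit-learn's `LocalOutlierFactor` could be used instead; note that it reports negated scores in `negative_outlier_factor_`.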
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate their relative merits. The foregoing describes specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described progressively. Parts that are the same or similar across embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments.
The above embodiments only illustrate the technical solution of the present application and do not limit it. Modifications to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application all fall within the protection scope of the present application.

Claims (10)

1. A real-time data online intelligent processing method for a layout file, characterized by comprising the following steps:
obtaining layout text data and performing word segmentation to obtain a word segmentation data set, wherein identical segmented words are words of the same kind; extracting from the word segmentation data set the nouns, the emotion words with their corresponding emotion types and emotion intensity degrees, and the emotion polarity words with their corresponding polarity values; extracting the associated emotion polarity word of each emotion word;
for each kind of noun: obtaining the full-text emotion feature tendency of the kind of noun for each emotion type according to the emotion intensity degrees of the emotion words of each emotion type in the sentences where the kind of noun is located, the polarity values of the associated emotion polarity words, and the word intervals between the emotion words and the noun; obtaining the emotion conversion confusion of the kind of noun for each emotion type according to the full-text emotion feature tendency of each emotion type in the sentences where the kind of noun is located; constructing the local emotion feature tendency of each occurrence of the kind of noun for each emotion type according to the emotion intensity degree of each emotion word of each emotion type in the sentence where the occurrence is located, the polarity value of the associated emotion polarity word, and the word interval between the emotion word and the occurrence; constructing the position variation index of each occurrence of the kind of noun according to the word positions of the occurrences in the text data set; constructing the abnormal change value of emotion logic features of each occurrence of the kind of noun on each emotion type;
obtaining the internal emotion logic confusion of each sentence according to the full-text emotion feature tendency and the local emotion feature tendency for each emotion type and the information amount of each noun in the sentence; obtaining the internal emotion logic chaos coefficient of each occurrence of each kind of noun on each emotion type according to the internal emotion logic confusion of each sentence where the kind of noun is located and the full-text emotion feature tendency and local emotion feature tendency of the occurrence for each emotion type;
obtaining the emotion type confusion of each occurrence of each kind of noun for each emotion type according to the abnormal change value of emotion logic features and the internal emotion logic chaos coefficient, and performing intelligent processing on the text data set in combination with the LOF anomaly detection algorithm.
2. The real-time data online intelligent processing method for a layout file according to claim 1, wherein extracting the associated emotion polarity word of each emotion word comprises:
within a preset number of words on either side of the emotion word, selecting the emotion polarity word closest to the emotion word as the associated emotion polarity word of the emotion word.
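The window search in this claim can be sketched as follows. This is an illustrative sketch only: the function name, the default window of 3, the left-before-right tie-break, and the sample tokens are assumptions that the claim does not specify.

```python
def associated_polarity_word(tokens, emotion_index, polarity_words, window=3):
    """Return the index of the polarity word nearest to tokens[emotion_index].

    Only tokens within `window` positions on either side are considered;
    returns None when no polarity word falls inside the window.
    """
    # Walk outwards one position at a time so the first hit is the nearest one.
    for offset in range(1, window + 1):
        # Assumption: on a tie, the word on the left is preferred.
        for idx in (emotion_index - offset, emotion_index + offset):
            if 0 <= idx < len(tokens) and tokens[idx] in polarity_words:
                return idx
    return None


# Hypothetical segmented sentence; "good" (index 4) is the emotion word.
tokens = ["service", "is", "not", "very", "good", "today"]
nearest = associated_polarity_word(tokens, 4, {"not", "very"})
```

Returning `None` lets the caller apply whatever fallback it needs when no polarity word is found inside the window.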
3. The real-time data online intelligent processing method for a layout file according to claim 2, wherein obtaining the full-text emotion feature tendency of the kind of noun for each emotion type comprises:
for each emotion type of each kind of noun, calculating, in each sentence where the kind of noun is located, the product of the emotion intensity degree of the emotion word of that emotion type and the polarity value of its associated emotion polarity word; counting the word interval between the emotion word and the noun; obtaining the ratio of the product to the word interval; and taking the mean of the ratios over all sentences where the kind of noun is located as the full-text emotion feature tendency of the kind of noun for that emotion type;
when an emotion word has no associated emotion polarity word, the polarity value of its associated emotion polarity word is set to 1.
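As a rough sketch of the averaging in this claim, assuming one (intensity, polarity value, word interval) triple has already been extracted per sentence; the function name, the input layout, and the sample values are hypothetical:

```python
def full_text_tendency(per_sentence_terms):
    """Mean of intensity * polarity / word_interval over a noun's sentences.

    Each entry is (intensity, polarity_value_or_None, word_interval). A missing
    associated polarity word (None) falls back to a polarity value of 1.
    Assumes a non-empty list and word intervals of at least 1.
    """
    ratios = []
    for intensity, polarity, interval in per_sentence_terms:
        if polarity is None:
            polarity = 1.0
        ratios.append(intensity * polarity / interval)
    return sum(ratios) / len(ratios)


# Illustrative values: one negated strong emotion word, one plain weaker one.
tendency = full_text_tendency([(4.0, -1.0, 2), (3.0, None, 3)])
```

Dividing by the word interval means that emotion words far from the noun weigh less than adjacent ones.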
4. The real-time data online intelligent processing method for a layout file according to claim 3, wherein obtaining the emotion conversion confusion of the kind of noun for each emotion type comprises:
recording the absolute value of the ratio as a first absolute value; recording the mean of the first absolute values over all sentences where the kind of noun is located as a first mean; calculating the difference between the first mean and the absolute value of the full-text emotion feature tendency of the kind of noun for the emotion type; and taking the absolute value of the ratio of the difference to the first mean as the emotion conversion confusion of the kind of noun for that emotion type.
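The computation in this claim can be illustrated with a short sketch that starts from the per-sentence ratios of claim 3. The function name and the zero-signal fallback are assumptions, not part of the claim.

```python
def emotion_conversion_confusion(ratios):
    """|(mean(|r|) - |mean(r)|) / mean(|r|)| over a noun's per-sentence ratios."""
    first_mean = sum(abs(r) for r in ratios) / len(ratios)
    if first_mean == 0:
        # Assumption: with no emotional signal at all, report no confusion.
        return 0.0
    full_text = sum(ratios) / len(ratios)  # full-text emotion feature tendency
    return abs((first_mean - abs(full_text)) / first_mean)


# Ratios that cancel out indicate the emotion flips between sentences.
flipping = emotion_conversion_confusion([0.5, -0.5])
steady = emotion_conversion_confusion([0.5, 0.5])
```

Under this definition the value is 0 when every sentence expresses the emotion with the same sign and approaches 1 as positive and negative expressions cancel each other.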
5. The real-time data online intelligent processing method for a layout file according to claim 3, wherein constructing the local emotion feature tendency of each occurrence of the kind of noun for each emotion type comprises:
for each occurrence of each kind of noun in the text data set, recording as a first product the product of the emotion intensity degree of each emotion word of an emotion type in the sentence where the occurrence is located and the polarity value of its associated emotion polarity word; obtaining the ratio of the first product to the word interval; and taking the mean of the ratios over all emotion words of that emotion type as the local emotion feature tendency of the occurrence for that emotion type.
6. The real-time data online intelligent processing method for a layout file according to claim 1, wherein constructing the position variation index of each occurrence of the kind of noun comprises:
counting the word sequence number of each kind of noun in the word segmentation data set as its word position, and recording the word positions of the previous occurrence, the current occurrence, and the next occurrence of the kind of noun respectively; the position variation index of the current occurrence is given by an expression over these word positions (formula not reproduced in this text);
in the expression, the remaining quantity is the number of words of the kind of noun.
7. The real-time data online intelligent processing method for a layout file according to claim 4, wherein constructing the abnormal change value of emotion logic features of each occurrence of the kind of noun on each emotion type comprises:
obtaining the local emotion feature tendency of each occurrence of the kind of noun for an emotion type and the full-text emotion feature tendency of the kind of noun for that emotion type; calculating the absolute value of their difference; obtaining the difference between this absolute value and the emotion conversion confusion of the kind of noun for the emotion type; obtaining the minimum of that difference and 0; and taking the ratio of the minimum to the position variation index as the abnormal change value of emotion logic features of the occurrence on that emotion type.
8. The real-time data online intelligent processing method for a layout file according to claim 3, wherein obtaining the internal emotion logic confusion of each sentence comprises:
for each sentence, counting the information amount in the word segmentation data set of each noun in the sentence; for each noun, obtaining the absolute value of the difference between its local emotion feature tendency and its full-text emotion feature tendency for the emotion type; recording the product of this absolute value and the information amount as a first product; and taking the sum of the first products of all nouns in the sentence as the internal emotion logic confusion of the sentence.
9. The real-time data online intelligent processing method for a layout file according to claim 8, wherein the internal emotion logic chaos coefficient of each occurrence of each kind of noun on an emotion type is given by an expression (formula not reproduced in this text) over the following quantities:
the internal emotion logic chaos coefficient of the occurrence of the kind of noun on the emotion type; the internal emotion logic confusion of each sentence where the kind of noun is located; N, the number of sentences where the kind of noun is located; the local emotion feature tendency of the occurrence for the emotion type; the full-text emotion feature tendency of the kind of noun for the emotion type; and the information amount of the occurrence in the word segmentation data set.
10. The real-time data online intelligent processing method for a layout file according to claim 9, wherein obtaining the emotion type confusion of each occurrence of each kind of noun according to the abnormal change value of emotion logic features and the internal emotion logic chaos coefficient, and performing intelligent processing on the text data set in combination with the LOF anomaly detection algorithm, comprises:
taking the product of the abnormal change value of emotion logic features of each occurrence of each kind of noun on an emotion type and the internal emotion logic chaos coefficient of the occurrence as the confusion of the occurrence for that emotion type;
using the emotion type confusions of all noun occurrences in the text data set as the input of the LOF anomaly detection algorithm, whose output is an LOF detection value for each noun occurrence; when the LOF detection value is greater than or equal to a preset threshold, the corresponding noun has a logic error, and the nouns with logic errors are screened out.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410121659.6A CN117669566B (en) 2024-01-30 2024-01-30 Real-time data online intelligent processing method for layout file

Publications (2)

Publication Number Publication Date
CN117669566A true CN117669566A (en) 2024-03-08
CN117669566B CN117669566B (en) 2024-04-09

Family

ID=90064370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410121659.6A Active CN117669566B (en) 2024-01-30 2024-01-30 Real-time data online intelligent processing method for layout file

Country Status (1)

Country Link
CN (1) CN117669566B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103123620A (en) * 2012-12-11 2013-05-29 中国互联网新闻中心 Web text sentiment analysis method based on propositional logic
KR102325022B1 (en) * 2020-09-22 2021-11-11 김백기 On-line image and review integrated analysis method and system using deep learning-based hybrid analysis method
CN115907801A (en) * 2022-11-24 2023-04-04 天翼电子商务有限公司 E-commerce evaluation information processing method, system, equipment and medium
CN115906810A (en) * 2022-12-13 2023-04-04 中科世通亨奇(北京)科技有限公司 Abnormal speech analysis method and equipment based on time series and viewpoint mining
JP2023113268A (en) * 2022-02-03 2023-08-16 株式会社Screenホールディングス Text mining method, text mining program, and text mining apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
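Ren Zhuolin (任卓琳): "Meaninglessness detection and anomaly detection of review information" (评论信息的无意义检测与异常检测), China Master's Theses Full-text Database, Information Science and Technology (monthly), no. 2018, 15 March 2018 (2018-03-15) *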

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant