CN111241290B - Comment tag generation method and device and computing equipment - Google Patents


Info

Publication number
CN111241290B
Authority
CN
China
Prior art keywords
dimension
word
emotion
words
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010059910.2A
Other languages
Chinese (zh)
Other versions
CN111241290A (en
Inventor
寇凯
息振兴
史立华
王田利
付一韬
杨林凤
谢健聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chezhi Interconnection Beijing Technology Co ltd
Original Assignee
Chezhi Interconnection Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chezhi Interconnection Beijing Technology Co ltd filed Critical Chezhi Interconnection Beijing Technology Co ltd
Priority to CN202010059910.2A priority Critical patent/CN111241290B/en
Publication of CN111241290A publication Critical patent/CN111241290A/en
Application granted granted Critical
Publication of CN111241290B publication Critical patent/CN111241290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a comment tag generation method executed in a computing device in which a rule set is stored, each element of the rule set being an association between an evaluation dimension and a corresponding tag generation rule. The method comprises the following steps: extracting a plurality of single-dimension clauses cut from a target comment, a single-dimension clause being a clause that contains only one dimension word and one emotion word; replacing, based on a standard word dictionary, the dimension word and emotion word of each single-dimension clause with the corresponding standard words; for each single-dimension clause, if the replaced standard words match a tag generation rule in the rule set, determining the clause to be a target single-dimension clause; and generating a clause tag for each target single-dimension clause based on the tag generation rule it matches, the clause tags forming the comment tags of the target comment. The invention also discloses a corresponding comment tag generation apparatus and computing device.

Description

Comment tag generation method and device and computing equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a comment tag generating method, apparatus, and computing device.
Background
With the rapid development of the information industry, more and more users purchase products and post comments on internet platforms, and before buying a product users want to know how people who have already used it evaluate it. Unlike overall emotion analysis, dimension-based emotion analysis works at a finer granularity: its main purpose is to derive a series of concise expressions from comment information that show how much a user group likes each dimension of a product. Fine-grained emotion analysis is valuable for understanding merchants and users in depth and for mining user emotion, and can be widely applied in fields such as personalized recommendation, intelligent search, product feedback, and business security.
At present, one method is based on a long short-term memory (LSTM) network. It treats the whole problem as a multi-dimension, multi-class emotion classification problem: if there are N categories and the emotion of each category is classified into two classes, the unified model is trained on 2N classes. This requires the categories to be clearly distinguishable from one another, as power and fuel consumption are in the automotive field. However, when the dimensions are refined further, for example into front space, rear space, head space, and leg space within the space dimension, the dimensions are very close to one another, and because the training objects are short sentences with little context, the classifier struggles to tell them apart and accuracy is very low.
Another method extracts the Chinese emotion words of a sentence to express the emotional tendency of the whole comment. However, the wording of comments varies widely, so the judgment derived from the extracted emotion words is likely to deviate from the actual meaning of the sentence, which causes errors in fine-grained emotion analysis.
Disclosure of Invention
In view of the above, the present invention proposes a comment tag generation method, apparatus, and computing device in an effort to solve, or at least alleviate, the problems presented above.
According to an aspect of the present invention, there is provided a comment tag generation method executed in a computing device in which a rule set is stored, each element of the rule set being an association between an evaluation dimension and a corresponding tag generation rule, the method including the steps of: extracting a plurality of single-dimension clauses cut from a target comment, a single-dimension clause being a clause that contains only one dimension word and one emotion word; replacing, based on a pre-stored standard word dictionary, the dimension word and emotion word of each single-dimension clause with the corresponding dimension standard word and emotion standard word, respectively; for each single-dimension clause, if the replaced standard words match a tag generation rule in the rule set, determining the clause to be a target single-dimension clause; and generating a clause tag for each target single-dimension clause based on the tag generation rule it matches, thereby obtaining the comment tags of the target comment.
Optionally, in the comment tag generating method according to the present invention, further including the step of: carrying out emotion analysis on each target single-dimension clause by adopting an emotion analysis model to obtain emotion polarity of the target single-dimension clause, wherein the emotion polarity comprises at least one of positive emotion, negative emotion and neutral emotion.
Optionally, in the comment tag generating method according to the present invention, the step of extracting a plurality of single-dimension clauses cut from the target comment includes: dividing the target comment into a plurality of short sentences, and performing word segmentation on the short sentences to identify target short sentences that contain both a dimension word and an emotion word; and when a target short sentence exceeds a predetermined number of words or contains two or more emotion words, segmenting the target short sentence into a plurality of single-dimension clauses based on a pre-trained clause model.
Optionally, in the comment tag generating method according to the present invention, the step of dividing the target comment into a plurality of phrases includes: dividing the target comment into a plurality of short sentences according to punctuation marks of the target comment; if the target comment does not have punctuation marks, the target comment is segmented into a plurality of single-dimension clauses based on the clause model.
Optionally, in the comment tag generating method according to the present invention, the step of dividing the target comment into a plurality of short sentences includes: eliminating target comments that contain special sentence patterns, and then dividing the remaining target comments into a plurality of short sentences, wherein the special sentence patterns include rhetorical questions.
Optionally, in the comment tag generating method according to the present invention, the step of performing word segmentation processing on the plurality of phrases includes: and performing word segmentation processing on each short sentence by adopting a word segmentation model, and extracting at least one of dimension words, emotion words and filtering words in the short sentences, wherein the words respectively represent evaluation dimension, emotion tendency and interference information.
Optionally, in the comment tag generating method according to the present invention, the standard word dictionary includes at least one of a dimension word dictionary, an emotion word dictionary, a filter word dictionary, and a macro file; the macro file represents the association between a dimension word and the emotion words that can modify it, and the words in all standard word dictionaries are stored in the word segmentation model.
Optionally, in the comment tag generating method according to the present invention, further including the step of: training a polysemous word classifier, and determining whether the polysemous word belongs to an emotion word or a degree word or whether the polysemous word belongs to a noun or an emotion word according to the classifier.
Optionally, in the comment tag generating method according to the present invention, the clause model is adapted to output a front character and a rear character of a position where the segmentation symbol is added in the sentence to add the segmentation symbol between the front character and the rear character, thereby segmenting the sentence into a plurality of single-dimensional clauses.
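By way of illustration only, the way such a clause model output could be applied may be sketched as follows. The sketch assumes that the model returns its predictions as a left-to-right list of (front character, rear character) pairs; the function name `split_by_char_pairs` and this output format are assumptions made for the example and are not specified by the invention.

```python
from typing import List, Tuple


def split_by_char_pairs(sentence: str, pairs: List[Tuple[str, str]]) -> List[str]:
    """Cut `sentence` between each predicted (front, rear) character pair.

    Assumption: pairs are given in left-to-right order, and each pair marks
    two adjacent characters between which a segmentation symbol is added.
    """
    clauses = []
    start = 0  # start index of the clause currently being built
    for front, rear in pairs:
        # Search for the adjacent characters after the previous cut position.
        idx = sentence.find(front + rear, start)
        if idx == -1:
            continue  # prediction does not occur in the text: ignore it
        cut = idx + len(front)
        clauses.append(sentence[start:cut])
        start = cut
    clauses.append(sentence[start:])
    return [c for c in clauses if c]
```

In this sketch a predicted pair that cannot be located is simply skipped, so a poor prediction leaves the sentence uncut rather than cutting it at an arbitrary position.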
Optionally, in the comment tag generating method according to the present invention, the method further includes a training step of a clause model: and acquiring text contents which are already segmented into single-dimension clauses, and training the constructed clause model by taking the text contents as a training set to obtain a trained clause model.
Optionally, in the comment tag generating method according to the present invention, the tag generating rule includes a plurality of placeholders, each of which represents one of the standard words X, and the plurality of placeholders are connected by a logical operation symbol; the standard word X comprises at least one of a dimension standard word, an emotion standard word related to the dimension and a filtering standard word, and the emotion standard word comprises at least one of a general emotion word, a positive emotion word and a negative emotion word.
Optionally, in the comment tag generating method according to the present invention, the logical operator includes at least one of "%", "()", "and", "or", and "not", wherein "%" is filled with the standard word X, "()" represents a priority operation, "and" represents a simultaneous occurrence, "or" represents one of them, and "not" represents an absence.
Optionally, in the comment tag generating method according to the present invention, the step of determining the target single-dimension clause includes: for each single-dimension clause, if the replaced dimension standard word and emotion standard word exist in a certain tag generation rule in the rule set at the same time, and the clause does not contain the filtering standard word in the tag generation rule, judging the single-dimension clause as a target single-dimension clause.
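As a non-limiting sketch, a tag generation rule of the kind described above could be evaluated against the standard words of a clause as follows. The concrete rule syntax used here (placeholders written as `%X%`, joined by `and`, `or`, `not` and parentheses) is an assumption made for the example, since the exact notation of FIG. 6 is not reproduced in this text; the function names `tokenize` and `matches` are likewise hypothetical.

```python
import re
from typing import List, Set

# Placeholder "%X%", a parenthesis, or one of the logical operators.
TOKEN_RE = re.compile(r"%([^%]+)%|(\(|\))|\b(and|or|not)\b")


def tokenize(rule: str) -> List[str]:
    tokens = []
    for m in TOKEN_RE.finditer(rule):
        if m.group(1) is not None:
            tokens.append("%" + m.group(1))  # keep "%" prefix to mark a placeholder
        else:
            tokens.append(m.group(2) or m.group(3))
    return tokens


def matches(rule: str, words: Set[str]) -> bool:
    """Return True if the clause's standard words satisfy the rule."""
    tokens = tokenize(rule)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_not()
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        nonlocal pos
        if tokens[pos] == "not":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return tok[1:] in words  # placeholder: is standard word X present?

    return parse_or()
```

With a hypothetical rule such as `%space% and (%large% or %spacious%) and not %schoolbag%`, a clause whose standard words are {space, large} matches, while a clause that also contains the filter standard word schoolbag does not.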
Optionally, in the comment tag generating method according to the present invention, each standard word dictionary is represented as a word cluster of standard words and synonyms thereof, the standard words include a first standard word and a second standard word, the standard words in the tag generating rule are all first standard words, and the replaced standard words are all replaced first standard words.
Optionally, in the comment tag generation method according to the present invention, the step of generating the clause tag includes: determining the dimension standard word matched in the tag generation rule; determining the emotion standard word matched in the tag generation rule, or selecting from the tag generation rule any emotion standard word whose emotion polarity is the same as that of the target single-dimension clause; and splicing the selected dimension standard word and emotion standard word into the clause tag.
Optionally, in the comment tag generating method according to the present invention, the step of concatenating the selected dimension standard word and emotion standard word into the clause tag includes: respectively determining second standard words corresponding to the dimension standard words and the emotion standard words; selecting one dimension standard word from the first standard word and the second standard word of the dimension word, and selecting one emotion standard word from the first standard word and the second standard word of the emotion word; and splicing the selected dimension standard words and emotion standard words into clause labels.
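The splicing step may be illustrated by the following minimal sketch. It assumes that the mapping from a first standard word to its second standard word is available as a dictionary (`second_of`, a hypothetical name), that one candidate is picked at random, and that the two words are joined with a space for readability in English; none of these details is prescribed by the invention.

```python
import random
from typing import Dict, Optional


def build_clause_tag(dim_first: str, emo_first: str,
                     second_of: Dict[str, str],
                     rng: Optional[random.Random] = None) -> str:
    """Pick one standard word per slot and splice them into a clause tag."""
    rng = rng or random.Random()
    # Candidates for the dimension slot: first standard word, plus the second if any.
    dim_candidates = [dim_first]
    if dim_first in second_of:
        dim_candidates.append(second_of[dim_first])
    # Candidates for the emotion slot, built the same way.
    emo_candidates = [emo_first]
    if emo_first in second_of:
        emo_candidates.append(second_of[emo_first])
    return rng.choice(dim_candidates) + " " + rng.choice(emo_candidates)
```

Choosing between the first and second standard word is what allows the same matched rule to yield differently worded tags.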
Optionally, in the comment tag generating method according to the present invention, before extracting the plurality of single-dimensional clauses cut from the target comment, the method further includes the step of: and carrying out data preprocessing operation on the target comment, wherein the data preprocessing operation comprises at least one of deleting HTML codes, replacing invisible characters, case-to-case conversion of English characters, half-angle/full-angle conversion of punctuation marks and deleting nonstandard characters.
According to another aspect of the present invention, there is provided a comment tag generation apparatus adapted to reside in a computing device in which a rule set is stored, each element of the rule set being an association between an evaluation dimension and a corresponding tag generation rule, the apparatus comprising: a clause segmentation module adapted to extract a plurality of single-dimension clauses cut from a target comment, a single-dimension clause being a clause that contains only one dimension word and one emotion word; a normalization module adapted to replace, based on a pre-stored standard word dictionary, the dimension word and emotion word of each single-dimension clause with the corresponding dimension standard word and emotion standard word, respectively; a rule matching module adapted to determine, for each single-dimension clause, that the clause is a target single-dimension clause if the replaced standard words match a tag generation rule in the rule set; and a tag generation module adapted to generate a clause tag for each target single-dimension clause based on the tag generation rule it matches, thereby obtaining the comment tags of the target comment.
According to yet another aspect of the present invention, there is provided a computing device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the comment tag generation method described above.
According to yet another aspect of the present invention, there is provided a readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, implement the steps of the comment tag generation method described above.
According to the technical scheme of the invention, the comment content is segmented into single-dimension clauses that each contain only one dimension word and one emotion word, which keeps the clauses independent of one another and improves the accuracy of fine-grained emotion analysis. The dimension word and emotion word in each single-dimension clause are extracted, the category of the clause and its tag generation rule are determined using a complex rule set, and the emotion polarity of each clause is judged with an emotion model. The problems of small distinction between dimensions and few categories are thereby effectively solved; because the emotion model analyses only one category at a time, errors caused by confusing categories are greatly reduced and accuracy is greatly improved.
In addition, normalized standard words are used when matching the complex rules, which makes statistics convenient. When tags are generated, emotion word expressions can be set for different object systems; for example, the power tag may be expressed as "satisfactory power" under brand A and as "strong power" under brand B. The tags under each system can be counted, and one of the candidate words can be selected to generate a tag, which gives the tags diversity.
The foregoing is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set forth below.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which set forth the various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to fall within the scope of the claimed subject matter. The above, as well as additional objects, features, and advantages of the present disclosure will become more apparent from the following detailed description when read in conjunction with the accompanying drawings. Like reference numerals generally refer to like parts or elements throughout the present disclosure.
FIG. 1 illustrates a block diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a comment tag generation method 200 according to one embodiment of the invention;
FIG. 3 illustrates a process exploded schematic diagram of comment tag generation according to one embodiment of the invention;
FIG. 4 shows a schematic diagram of a portion of an automotive domain index system according to one embodiment of the invention;
FIG. 5 shows a standard word schematic diagram according to one embodiment of the invention;
FIG. 6 shows a schematic diagram of a tag generation rule according to one embodiment of the invention;
FIG. 7 illustrates a candidate word schematic at the time of generating a comment tag according to one embodiment of the invention;
FIG. 8 illustrates a schematic diagram of a generated comment tag according to one embodiment of the invention; and
FIG. 9 shows a block diagram of a comment tag generating apparatus 900 according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 is a block diagram of a computing device 100 according to one embodiment of the invention. In a basic configuration 102, computing device 100 typically includes a system memory 106 and one or more processors 104. The memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor including, but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of caches, such as a first level cache 110 and a second level cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations, the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some implementations, the application 122 may be arranged to operate on the operating system with program data 124. Program data 124 includes instructions; in computing device 100 according to the present invention, program data 124 contains instructions for performing comment tag generation method 200.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to basic configuration 102 via bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 via one or more communication ports 164 over a network communication link.
The network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or special purpose network, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server, or a web server, or as part of a small-sized portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including desktop and notebook computer configurations. In some embodiments, computing device 100 is configured to perform comment tag generation method 200.
FIG. 2 illustrates a flow diagram of a comment tag generation method 200 according to one embodiment of the invention. The method 200 is performed in a computing device, such as the computing device 100. The detailed process of the tag generation method 200 can be understood with reference to the decomposition process in FIG. 3. As shown in FIG. 2, the method starts at step S210.
In step S210, a plurality of single-dimension clauses cut from the target comment are extracted, a single-dimension clause being a clause that contains only one dimension word and one emotion word.
According to one embodiment, before the plurality of single-dimension clauses are cut from the target comment, the method further includes a step of performing a data preprocessing operation on the target comment to remove text that could affect the result. The data preprocessing operation includes at least one of deleting HTML code, replacing invisible characters, converting the case of English characters, converting punctuation marks between half-width and full-width forms, and deleting nonstandard characters. Specifically, spaces and other special symbols are converted into commas, lowercase English letters are converted into uppercase, half-width symbols are all converted into full-width symbols, and nonstandard characters such as rare characters are deleted.
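A minimal sketch of such a preprocessing operation is given below for illustration. The set of invisible characters, the punctuation mapping, and the definition of a "standard" character (digits, uppercase letters, the basic CJK block, and a few full-width punctuation marks) are all assumptions chosen for the example; the function name `preprocess` is hypothetical.

```python
import html
import re

# Half-width punctuation -> full-width counterparts (assumed mapping).
HALF_TO_FULL = str.maketrans(",.!?;:", "\uff0c\u3002\uff01\uff1f\uff1b\uff1a")
# Whitespace and common zero-width characters (assumed list of invisible characters).
INVISIBLE = re.compile(r"[\s\u200b\u200c\u200d\ufeff]+")
# Everything outside this class is treated as a nonstandard character (assumption).
NONSTANDARD = re.compile(r"[^0-9A-Z\u4e00-\u9fff\uff0c\u3002\uff01\uff1f\uff1b\uff1a\u3001]")


def preprocess(comment: str) -> str:
    text = re.sub(r"<[^>]+>", "", comment)   # delete HTML tags
    text = html.unescape(text)               # decode entities such as &nbsp;
    text = INVISIBLE.sub(",", text)          # spaces / invisible characters -> comma
    text = text.upper()                      # lowercase English -> uppercase
    text = text.translate(HALF_TO_FULL)      # half-width symbols -> full-width
    text = NONSTANDARD.sub("", text)         # delete nonstandard characters
    text = re.sub("\uff0c+", "\uff0c", text) # collapse repeated commas
    return text.strip("\uff0c")
```

Note that this simple mapping also turns the decimal point in a number such as 7.5 into a full-width period, so a production implementation would likely need to protect numeric tokens first.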
According to another embodiment, the step of extracting a plurality of single-dimension clauses cut from the target comment includes: dividing the target comment into a plurality of short sentences, and performing word segmentation on the short sentences to identify target short sentences that contain both a dimension word and an emotion word. When a target short sentence exceeds a predetermined number of words (e.g., 20 words, although not limited thereto) or contains two or more emotion words, the target short sentence is split into a plurality of single-dimension clauses based on a pre-trained clause model.
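The decision described above can be sketched as a small routing function. The names `route_phrase`, `"skip"`, `"single"` and `"split"` are assumptions for this example, and the length is measured here in segmented words, which is one possible reading of "number of words".

```python
from typing import List, Set


def route_phrase(tokens: List[str], dimension_words: Set[str],
                 emotion_words: Set[str], max_words: int = 20) -> str:
    """Decide how a segmented short sentence is handled.

    Returns "skip" if it is not a target short sentence, "split" if it must be
    passed to the clause model, and "single" if it is already single-dimension.
    """
    dims = [t for t in tokens if t in dimension_words]
    emos = [t for t in tokens if t in emotion_words]
    if not dims or not emos:
        return "skip"       # lacks a dimension word or an emotion word
    if len(tokens) > max_words or len(emos) >= 2:
        return "split"      # too long, or two or more emotion words
    return "single"
```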
Specifically, in a first step, the target comment is split into a plurality of short sentences according to its punctuation marks. For example, the target comment is first split into long sentences at periods, exclamation marks, semicolons, and the like, and each long sentence is then split into short sentences at commas, enumeration commas, and the like. If the target comment contains no punctuation marks, it is segmented into a plurality of single-dimension clauses directly by the clause model.
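The two-level split can be sketched as follows. The separator sets and the representation of the clause model as a plain callable (`clause_model`) are assumptions made for the example.

```python
import re
from typing import Callable, List

# Long-sentence separators: period, exclamation mark, semicolon (half- and full-width).
LONG_SEP = re.compile("[.!;\u3002\uff01\uff1b]")
# Short-sentence separators: comma (half- and full-width) and enumeration comma.
SHORT_SEP = re.compile("[,\uff0c\u3001]")


def split_comment(comment: str, clause_model: Callable[[str], List[str]]) -> List[str]:
    """Split a comment into short sentences, falling back to the clause model."""
    if not (LONG_SEP.search(comment) or SHORT_SEP.search(comment)):
        # No punctuation at all: let the clause model segment the text directly.
        return clause_model(comment)
    phrases = []
    for sentence in LONG_SEP.split(comment):
        for phrase in SHORT_SEP.split(sentence):
            phrase = phrase.strip()
            if phrase:
                phrases.append(phrase)
    return phrases
```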
Moreover, before the target comments are divided into short sentences, target comments with special sentence patterns can be removed first, and the remaining target comments are then divided into short sentences. The special sentence patterns include rhetorical questions, associated sentences, and the like, which can be recognized by identifying the question mark and the associated words in the sentence, respectively. An associated sentence expresses at least one of an adversative (turning) relation, a hypothetical relation, a conditional relation, an alternative relation, a sequential relation, a parallel relation, and a causal relation, and each relation has corresponding associated words, such as the "although/but", "if/just", "whenever/just", and "not/just" combinations.
Because associated sentences may interfere with the judgment of emotion polarity and even cause wrong emotion tags to be generated, the method eliminates associated sentences that contain the common associated words and generates tags from the remaining corpus. This reduces the amount of corpus but ensures that wrong emotion tags are not generated. Moreover, when the volume of comment corpus is very large, the effect of this elimination on the corpus volume is negligible.
For associated sentences of the progressive relation, for example the "not only/but also" combination, the "not only/containing" combination, and the like, the emotional tendencies of the preceding and following clauses are the same. Therefore, in one implementation, the present invention can retain associated sentences of the progressive relation while deleting associated sentences of the other relations.
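The elimination of special sentence patterns can be sketched as below. The English word pairs in `DROP_PAIRS` are hypothetical stand-ins for the associated words of the original language, and simple substring matching is used only for brevity; progressive pairs such as "not only/but also" are deliberately left out of the list so that those sentences are retained.

```python
from typing import List, Tuple

# Associated-word pairs whose sentences are eliminated (hypothetical English stand-ins).
DROP_PAIRS: List[Tuple[str, str]] = [("although", "but"), ("if", "then")]


def has_pair(sentence: str, pairs: List[Tuple[str, str]]) -> bool:
    """True if some pair occurs in order (first word, then second word)."""
    s = sentence.lower()
    for first, second in pairs:
        i = s.find(first)
        if i != -1 and s.find(second, i + len(first)) != -1:
            return True
    return False


def filter_comments(comments: List[str]) -> List[str]:
    kept = []
    for comment in comments:
        if "?" in comment or "\uff1f" in comment:
            continue  # question mark: treated as a rhetorical question
        if has_pair(comment, DROP_PAIRS):
            continue  # adversative, hypothetical, ... associated sentence
        kept.append(comment)
    return kept
```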
Secondly, after dividing the target comment into a plurality of short sentences, performing word segmentation processing on each short sentence by adopting a word segmentation model, and extracting at least one of dimension words, emotion words and filtering words which respectively represent evaluation dimension, emotion tendency and interference information.
Dimension words include, for example, power, fuel consumption, space, front head space, and front tail space. An evaluation dimension represents the aspect being evaluated and may also be called an evaluation category, evaluation aspect, or index system; it is a fine-grained dimension that carries the path from the top-level class down to the finest class, i.e., from major class to minor class, such as "space-front head space". FIG. 4 illustrates part of an automotive-domain index system according to one embodiment of the invention, in which the evaluation dimensions are grouped into 10 major classes (space, power, interior trim, fuel consumption, electricity consumption, cost performance, handling, comfort, appearance, and failure), and the major classes are subdivided into 552 fine-grained minor classes, such as front head space, front tail space, and trunk space. Emotion words are mostly adjectives, for example spacious, large, satisfactory, good, and bad.
The filter words are mainly interference words unrelated to any dimension, such as schoolbag, plastic, and roadside. Such words can affect tag generation and even lead to tags with the wrong emotion. For example, from "the trunk can hold a small bag" the tag "small trunk" might be extracted, whereas users want the trunk to be large. Therefore, the invention extracts the filter words in a sentence for subsequent recognition processing. In one implementation, if a sentence (e.g., a short sentence) contains a filter word, the sentence is deleted so that it no longer takes part in the subsequent tag generation process. In another implementation, the filter word may first be marked and then handled in a subsequent step.
In addition, when word segmentation is carried out, the degree word and the special prompt word direction in each short sentence can be extracted, and the emotion degree and the auxiliary mood are correspondingly represented. The terms of degree include, for example, very, extremely, extra, twelve minutes, etc., and the auxiliary mood typically includes terms such as listening to, speaking to, meaning to, etc.
Word segmentation is carried out with a word segmentation model in which standard word dictionaries for each type of word are stored. The standard word dictionaries comprise at least one of a dimension word dictionary, an emotion word dictionary, a filter word dictionary and a macro file. The macro file records the association between a dimension word and the emotion words that can modify it; for example, the emotion words that can modify the space dimension include spacious, big, narrow, enough, small and the like. Each standard word dictionary may be expressed as word clusters, each consisting of a standard word and a plurality of its synonyms, and a standard word may comprise a first standard word and a second standard word. An example format is shown in fig. 5, where the words to the left and right of "# # #" are the first standard word and the second standard word, respectively, and the words after "@ @ @ @ @" are synonyms of the standard word. These words are stored in the word segmentation model so that keywords can be extracted from sentences based on them.
According to one embodiment, the word segmentation model includes at least one of the LTP language technology platform, a CRF++ conditional random field model, and an AC automaton model. Taking the AC automaton as an example, a word list containing all dimension words, emotion words, degree words, special prompt words, filter words and the like is prepared in advance and fed into the AC automaton in list form.
To improve the accuracy of the word segmentation results, each short sentence is segmented by a plurality of word segmentation models and the results are compared. When the models disagree on a sentence, the result containing the longest segmented word prevails. For example, if two models segment a short sentence into "front row" and "head" while a third model yields "front row head", the result of the third model is used.
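The selection among disagreeing segmentation results can be sketched as follows. This is a minimal illustration, assuming that "longest" refers to the longest single token in a result and that ties keep the earliest model; the function name `pick_segmentation` and the sample tokens are hypothetical and not taken from the patent.

```python
from typing import List


def pick_segmentation(candidates: List[List[str]]) -> List[str]:
    """Pick one result among several segmentations of the same short sentence.

    Assumption: the result containing the longest single token wins.
    On a tie, max() keeps the first candidate it encounters.
    """
    if not candidates:
        raise ValueError("at least one segmentation result is required")
    return max(
        candidates,
        key=lambda tokens: max((len(t) for t in tokens), default=0),
    )


# Two models split "front row head"; the third keeps it as one token.
results = [
    ["front row", "head", "space", "large"],
    ["front row", "head", "space", "large"],
    ["front row head", "space", "large"],
]
chosen = pick_segmentation(results)
```

Here `chosen` is the third result, because "front row head" is longer than any token produced by the other two models.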
In addition, short sentences may contain polysemous words such as "general" and "okay", which can act as adjectives ("the power is very general", "the power is okay") or as degree words ("the back row can generally seat four people", "the back row can seat four people"). Likewise, "sense of space" may be a noun ("the sense of space in the front row is ample") or an adjective ("the front row is very sense-of-space", that is, it feels very spacious). Therefore, the invention can train a polysemous word classifier and use it to determine whether a polysemous word is an emotion word or a degree word, or whether it is a noun or an emotion word. For a given word, a number of corpus samples containing the word are collected and labeled with the word's class; the labeled samples are divided into a training set and a test set; a model is trained on the training set and tested on the test set; and the model with the highest accuracy is selected as the final model.
Taking "general" as an example, the two classes are labeled emotion word and degree word. 100 samples containing "general" are extracted from the corpus and labeled; 80 of them are randomly selected as the training set and the remaining 20 form the test set. Word vectors trained with word2vec on the 80 training samples serve as input features, an SVM model is trained on these features and the corresponding labels, and the model is tested on the remaining 20 samples. The procedure is repeated several times and the model with the highest accuracy is selected as the final model; over multiple tests the accuracy exceeds 95%.
Based on the word segmentation result, short sentences that contain both a dimension word and an emotion word are identified as target short sentences and used for tag generation; short sentences that contain only dimension words, only emotion words, or neither are rejected. It should be noted that a special word library may also be maintained in the computing device to store special expressions of certain dimension words. For example, "strong push-back feel" in a user comment actually means strong power. When such an expression is identified, it can be replaced with the corresponding dimension word, so that a short sentence that originally had no dimension word now has one.
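The screening of target short sentences, including the replacement of special expressions, might look like the sketch below. The word sets and the `SPECIAL_EXPRESSIONS` mapping are small hypothetical samples chosen for illustration; a real system would load them from the dictionaries described above.

```python
# Hypothetical sample vocabularies; real ones come from the stored dictionaries.
DIMENSION_WORDS = {"power", "space", "trunk"}
EMOTION_WORDS = {"strong", "large", "small"}
# Special expressions that stand in for a dimension word.
SPECIAL_EXPRESSIONS = {"push-back feel": "power"}


def replace_special(tokens):
    """Replace special expressions with the dimension word they refer to."""
    return [SPECIAL_EXPRESSIONS.get(token, token) for token in tokens]


def is_target_short_sentence(tokens):
    """Keep a short sentence only if it has both a dimension and an emotion word."""
    tokens = replace_special(tokens)
    has_dimension = any(token in DIMENSION_WORDS for token in tokens)
    has_emotion = any(token in EMOTION_WORDS for token in tokens)
    return has_dimension and has_emotion


kept = is_target_short_sentence(["push-back feel", "very", "strong"])
dropped = is_target_short_sentence(["very", "strong"])
```

In this example `kept` is true because "push-back feel" is first mapped to "power", while `dropped` is false because the sentence has an emotion word but no dimension word.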
Thirdly, when the target comment has no punctuation marks, or a segmented short sentence is too long, or a segmented short sentence contains two or more emotion words, the clause model is used to split the sentence.
Take the short sentence "the trunk is large and deep", which contains two emotion words. In one implementation, it is split into "the trunk is large" and "deep". The first clause contains both a dimension word and an emotion word and can serve as a single-dimension clause; the second clause contains only an emotion word and is rejected. This guarantees the accuracy of all clauses finally presented, and when the comment corpus is very large, discarding some evaluation words in this way has a negligible effect on the actual results.
Of course, in another implementation, a subject may be added to the emotion word of the later clause, that is, the dimension word "trunk" of the first clause is supplied as its subject, so that a short sentence with two or more emotion words is split into two single-dimension clauses. This approach increases the amount of evaluation material available, but the subject may be added to the wrong emotion word, producing evaluations of other dimensions.
In addition, expressions such as "the front row is very sense-of-space" and "the car is very sense-of-power" appear in some comments. Here "sense of space" and "sense of power" act as dimension word and emotion word at the same time, so such a sentence can be regarded as having both a dimension word and an emotion word and can serve as a single-dimension clause.
The clause model is adapted to output the characters immediately before and after each position where a segmentation symbol is to be added, that is, the pre and next marks in the table below, so that a segmentation symbol (such as a comma) is added between the front character and the rear character, thereby splitting the sentence into a plurality of single-dimension clauses. It should be understood that the last character of a sentence carries only the front character mark, with no rear character mark, and that either a comma or a period may be appended after it to complete the split.
Optionally, step S210 may further include a step of training the clause model: text content that has already been split into single-dimension clauses is acquired and used as a training set to train the constructed clause model, yielding a trained clause model. The training set is annotated with the positions of the front and rear characters at which segmentation symbols were added when the text content was split into single-dimension clauses.
Characters: Sheet of paper Mr. You good That is You See One at a lower part To the point of When in use You Can be used for Directly and directly Conversely
Marks: o o pre next o o pre next o o o o pre
The clause model treats sentence splitting as a sequence labeling problem. The invention uses a CRF (conditional random field) model to label the key marks pre and next, that is, the characters before and after the position where a symbol is added. Samples collected from the corpus are labeled manually, and a training set and a test set are generated at a ratio of 8:2 to train and test the model, respectively. The accuracy of this method for splitting the corpus can exceed 93%; the splitting works well, improves the accuracy of subsequent emotion analysis, and ensures that one dimension is modified by only one emotion word. For example, take a sentence without punctuation marks: "the interior is slightly rough but the armrest box is large and very deep and can hold a large bottle of cola the front seats have no armrests it would be better if they had armrests". After processing, commas are added and the sentence becomes: "the interior is slightly rough, but the armrest box is large and very deep, can hold a large bottle of cola, the front seats have no armrests, it would be better if they had armrests".
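How pre and next marks translate into split positions can be sketched as follows. The sketch assumes the marks have already been predicted (the CRF model itself is not shown) and uses word-level tokens for readability, whereas the patent labels characters; the function name `split_by_marks` and the sample data are hypothetical.

```python
def split_by_marks(tokens, marks):
    """Split a token sequence into clauses using pre/next/o marks.

    A clause ends after a token marked "pre" when the following token is
    marked "next", or when the "pre" token is the last one in the sentence.
    """
    if len(tokens) != len(marks):
        raise ValueError("tokens and marks must have the same length")
    clauses, current = [], []
    for i, (token, mark) in enumerate(zip(tokens, marks)):
        current.append(token)
        is_last = i == len(tokens) - 1
        if mark == "pre" and (is_last or marks[i + 1] == "next"):
            clauses.append(current)
            current = []
    if current:  # keep trailing tokens that were not closed by a "pre" mark
        clauses.append(current)
    return clauses


tokens = ["interior", "rough", "armrest", "box", "large"]
marks = ["o", "pre", "next", "o", "pre"]
clauses = split_by_marks(tokens, marks)
text = ", ".join(" ".join(clause) for clause in clauses)
```

With these inputs the sequence is split after "rough" and after the final token, so `text` becomes "interior rough, armrest box large".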
Through step S210, a plurality of single-dimension clauses in the target comment are extracted.
Subsequently, in step S220, the dimension word and emotion word of each single-dimension clause are replaced with the corresponding dimension standard word and emotion standard word, respectively, based on the pre-stored standard word dictionary.
That is, based on the dimension standard word dictionary, dimension words are replaced with dimension standard words, and based on the emotion standard word dictionary, emotion words are replaced with emotion standard words. As described above, the dimension standard words include a first dimension standard word and a second dimension standard word, and the emotion standard words include a first emotion standard word and a second emotion standard word. Replacement always uses the corresponding first standard word, that is, the first dimension standard word and the first emotion standard word. For example, as shown in fig. 5, the dimension word "main driving" is replaced with "front row", and the emotion word "to my liking" is replaced with "satisfaction".
In addition, in step S220, identified filter words may also be converted into filter standard words. For example, words such as "cup", "spy photo" and "toy" can be converted into the standard word "things". Step S220 may further store the dimension standard words, emotion standard words and filter standard words extracted from each target single-dimension clause, for example as a word tuple or as a list of words, to facilitate subsequent rule matching.
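The replacement of words by their first standard word can be illustrated with a small lookup built from word clusters. The cluster line format used here ("first###second@@@synonym,synonym"), the separator strings, and the sample words are assumptions made for this sketch; the actual format is the one shown in fig. 5.

```python
def parse_cluster(line, std_sep="###", syn_sep="@@@"):
    """Parse one cluster line into (first standard word, second standard word, synonyms).

    The separators and the comma-separated synonym list are assumed for illustration.
    """
    head, _, synonym_part = line.partition(syn_sep)
    first, _, second = head.partition(std_sep)
    synonyms = [s.strip() for s in synonym_part.split(",") if s.strip()]
    return first.strip(), second.strip(), synonyms


def build_lookup(lines):
    """Map every word of every cluster to the cluster's first standard word."""
    lookup = {}
    for line in lines:
        first, second, synonyms = parse_cluster(line)
        for word in [first, second, *synonyms]:
            if word:
                lookup[word] = first
    return lookup


def normalize(tokens, lookup):
    """Replace each token with its first standard word; unknown tokens are kept."""
    return [lookup.get(token, token) for token in tokens]


# Hypothetical cluster lines.
clusters = [
    "front row###driver's seat@@@main driving",
    "satisfied###pleased@@@to my liking",
    "things###@@@cup,toy",
]
lookup = build_lookup(clusters)
normalized = normalize(["main driving", "to my liking", "cup"], lookup)
```

The result can be kept as a list or tuple per clause, which is the form the later rule matching step consumes.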
Subsequently, in step S230, each single-dimension clause whose replaced standard words can be matched to a corresponding tag generation rule in the rule set is determined to be a target single-dimension clause.
Here, a rule set is stored in the computing device in advance, and each element of the rule set is an association between an evaluation dimension and its corresponding tag generation rule. The evaluation dimension may be represented by a dimension identifier that uses a hierarchical representation from major class to minor class, such as "space-riding space-front head space". The tag generation rule specifies both how a clause is matched against the dimension and how the tag is generated; that is, a single tag generation rule can determine whether a sentence matches it and can generate the corresponding tag. The evaluation dimension identifier and the corresponding tag generation rule may be separated by a special symbol, such as "# #".
The tag generation rule includes a plurality of placeholders, each representing a standard word X, and the placeholders are connected by logical operators. The standard word X comprises at least one of a dimension standard word, an emotion standard word related to the dimension, and a filter standard word. The logical operators include at least one of "%", "()", "and", "or", and "not". A pair of "%" encloses a standard word X, "()" denotes precedence, "and" means that both operands occur, "or" means that either operand occurs, and "not" means non-occurrence and is followed by the filter words.
According to one embodiment, the tag generation rule may be expressed as: (%dimension standard word% and %emotion standard word that can describe the dimension%) not %dimension-independent word%. Here the X in %X% stands for the standard words of keyword X; for example, "%front head space%" represents one or more standard words associated with "front head space". Further, these standard words are the first standard words of the corresponding word clusters.
Based on this, the step of determining the target single-dimension clause includes: for each single-dimension clause, if its replaced dimension standard word and emotion standard word both appear in a certain tag generation rule in the rule set, and the clause does not contain any filter standard word of that rule, the single-dimension clause is judged to be a target single-dimension clause. Briefly, if a sentence contains the words before "not" and none of the words after "not" in a tag generation rule, the sentence matches that rule.
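The matching condition just described can be sketched with a rule held as three word sets. This is an assumption made for brevity: the patent stores rules as strings with "%", "and", "or" and "not", and parsing that syntax is not shown here. The class name `TagRule` and the sample words are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class TagRule:
    """A tag generation rule reduced to three word sets (illustrative form)."""
    dimension_id: str
    dimension_words: FrozenSet[str]  # all must occur ("and")
    emotion_words: FrozenSet[str]    # at least one must occur
    filter_words: FrozenSet[str] = frozenset()  # none may occur ("not")


def matches(rule: TagRule, standard_words: Iterable[str]) -> bool:
    """Return True if the clause's standard words satisfy the rule."""
    words = set(standard_words)
    if words & rule.filter_words:  # a word after "not" is present
        return False
    return rule.dimension_words <= words and bool(words & rule.emotion_words)


rule = TagRule(
    dimension_id="space-riding space-front head space",
    dimension_words=frozenset({"front row", "head", "space"}),
    emotion_words=frozenset({"spacious", "large"}),
    filter_words=frozenset({"things"}),
)
hit = matches(rule, ["front row", "head", "space", "spacious"])
blocked = matches(rule, ["front row", "head", "space", "large", "things"])
```

Checking the filter words first mirrors the preference stated in the text for eliminating a rule as early as possible.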
For dimension words, in one implementation the dimension is expressed as a major class, in which case the dimension standard words contain only the standard words of that major class. For example, the dimension "space" includes only the dimension standard word "space". In another implementation, the dimension is expressed as a fine-grained minor class; if its dimension word is split into a plurality of sub-dimension words, the dimension standard words include the standard words of all those sub-dimension words. If the replaced dimension words of a single-dimension clause contain the standard words of all the sub-dimension words at the same time, the dimension words of the clause match the dimension word rule.
For example, the fine-grained minor class "front head space" is split into the three dimension words "front row", "head" and "space", so the standard words of "front head space" must include the standard words of "front row", "head" and "space" at the same time. If, after the words of a single-dimension clause are converted into standard words, the converted words include "front row", "head" and "space" together, the dimension words of the sentence match the rule "%front head space%".
Next, the emotion word rule is matched. "%emotion standard word that can describe the dimension%" represents the emotion standard words that can describe the dimension. These include general emotion words, positive emotion words and negative emotion words. General emotion words can be used to evaluate any object, such as the standard words "good", "satisfactory", "normal", and the like. Positive emotion words express a favorable evaluation of the object; for example, the positive emotion words for space include spacious, large, sufficient, and the like. Negative emotion words express an unfavorable evaluation of the object; for example, the negative emotion words for space include narrow, crowded, small, and the like. Restricting emotion words in this way excludes cases in user comments where the dimension word and the emotion word are unrelated, such as "space strong".
As described above, these emotion words are all first standard words, and the emotion words in a single-dimension clause have been converted into the corresponding first emotion standard words. Therefore, once the replaced dimension words of a single-dimension clause match the dimension word rule of a tag generation rule, the clause also matches the emotion word rule as long as its replaced emotion standard word appears in the emotion word rule of that tag generation rule.
For filter word matching, the filter standard words generally follow the "not" logical operator and represent dimension-independent words; for example, the dimension-independent words for space include things, speed bumps, roadsides, plastics and the like. If the filter standard words of a tag generation rule include "things" and the replaced filter standard words of a single-dimension clause also include "things", the clause hits the filter word rule of that tag generation rule. Preferably, when tag generation rules are matched, filter words are matched first: when a single-dimension clause hits the filter word rule of a tag generation rule, that rule is excluded directly, and dimension word and emotion word matching is then performed.
In a specific implementation, the invention uses an inverted index to determine a plurality of candidate rules by finding the categories of all dimension words and emotion words of the single-dimension clause, and then uses stack-based fast rule matching to determine the final unique tag generation rule. For example, if the dimension standard word "space" is extracted from a single-dimension clause, all categories related to "space" are retrieved, and the words "front row" and "head" are then used to lock onto "space-riding space-front head space".
As described above, the dimension standard words, emotion standard words and filter standard words obtained by replacement for a single-dimension clause are stored in a list. During stack matching, all candidate rules are pushed onto a stack; then, starting from the stack-top pointer, the dimension words, emotion words and filter words of each candidate rule are popped one by one and compared with the words in the list. If the words of a tag generation rule match the words in the list, the single-dimension clause is judged to be a target single-dimension clause and that tag generation rule is hit.
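The combination of an inverted index and stack-based candidate checking can be sketched as below. Several details are assumptions made for this illustration: rules are stored as dictionaries of word sets, the index is keyed by dimension standard words only, and when more than one rule matches, the rule with the most dimension words is taken as the unique result. The rule identifiers "space" and "space-trunk space" are hypothetical.

```python
from collections import defaultdict

# Hypothetical rule set: dimension identifier -> word sets.
RULES = {
    "space": {
        "dimension": {"space"},
        "emotion": {"spacious", "large"},
        "filter": {"things"},
    },
    "space-riding space-front head space": {
        "dimension": {"front row", "head", "space"},
        "emotion": {"spacious", "large"},
        "filter": {"things"},
    },
    "space-trunk space": {
        "dimension": {"trunk", "space"},
        "emotion": {"spacious", "large"},
        "filter": {"things"},
    },
}


def build_inverted_index(rules):
    """Map each dimension standard word to the rules that mention it."""
    index = defaultdict(set)
    for rule_id, rule in rules.items():
        for word in rule["dimension"]:
            index[word].add(rule_id)
    return index


def find_rule(standard_words, rules, index):
    """Return the identifier of the matched rule, or None if no rule is hit."""
    words = set(standard_words)
    # Candidate rules from the inverted index, pushed onto a stack.
    stack = sorted({rid for word in words for rid in index.get(word, ())})
    hits = []
    while stack:
        rule_id = stack.pop()  # take the candidate at the top of the stack
        rule = rules[rule_id]
        if words & rule["filter"]:
            continue  # filter word present: this rule is excluded
        if rule["dimension"] <= words and words & rule["emotion"]:
            hits.append(rule_id)
    if not hits:
        return None
    # Assumption: the most specific rule (most dimension words) is the unique result.
    return max(hits, key=lambda rid: len(rules[rid]["dimension"]))


INDEX = build_inverted_index(RULES)
fine = find_rule(["front row", "head", "space", "large"], RULES, INDEX)
coarse = find_rule(["space", "large"], RULES, INDEX)
```

In this example the word "space" retrieves all three candidates, and "front row" and "head" then narrow the result to the fine-grained front head space rule.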
Further, the tag generation rule may also include candidate evaluation words, connected by the "or" logical operator, which represent a parallel option equivalent to the combination of a single dimension word and a single emotion word. For example, "the front row head has a sense of space" and "the front row head is spacious" are two alternative ways of expressing the same evaluation. As shown in fig. 6, the tag generation rule is: space-riding space-front head space # # # ((%front head space% and %can describe space%) or %front head sense of space%) not %space-independent word%, where "front head sense of space" can be split into the three standard words "front row", "head" and "sense of space". When a single-dimension clause hits the candidate evaluation word, both its dimension word and its emotion word are considered to hit the tag generation rule.
Subsequently, in step S240, the clause tag of each target single-dimension clause is generated based on the tag generation rule matched with the target single-dimension clause, so as to obtain the comment tag of the target comment.
The clause tags of the target single-dimension clauses together form comment tags of the target comments. For any target single-dimension clause, when generating a clause label, determining dimension standard words and emotion standard words matched in the label generation rule, and splicing the selected dimension standard words and emotion standard words into the clause label. For example, if the dimension standard word and the emotion standard word matched in the tag generation rule are "front head space" and "big", respectively, the clause tag "front head space big" may be generated.
Further, after the matched emotion standard word is determined, any emotion standard word with the same emotion polarity may instead be selected from the tag generation rule and spliced with the determined dimension standard word into the clause tag. For example, "spacious", "sufficient" and "large" are all positive evaluation words for space; when the matched emotion word is "spacious", any one of these positive words may be used in the tag, which increases the diversity of the generated tags.
According to one embodiment, in step S240, emotion analysis may be performed on each target single-dimension clause using an emotion analysis model to obtain the emotion polarity of the clause. The emotion polarity includes at least one of positive emotion, negative emotion and neutral emotion. The emotion analysis model combines a BERT model and a logistic regression model: the BERT model generates word vectors and effectively captures context information, and the logistic regression model outputs the two-class or three-class polarity of the emotion. The two-class polarities are positive emotion and negative emotion, and the three-class polarities are positive emotion, negative emotion and neutral emotion.
In training the emotion analysis model, a pre-trained model is fine-tuned downstream; the pre-training uses Chinese Wikipedia as its training corpus. First, a plurality of texts with emotion classification labels are acquired. During downstream fine-tuning, two columns are input, the first being the emotion category and the second the emotion sentence, and a classification model is finally generated according to the call parameters. The classification model is deployed as a remote service using the open-source BERT-as-service tool, sentence vectors of the texts are obtained by calling the service, and a two-class or three-class LR model is trained separately to obtain the final classification result. In actual tests, the two-class accuracy reaches 94.6% and the three-class accuracy reaches 94%. It can be seen that training the model in this way generalizes much better than training it as a whole.
Based on the above, when generating a clause tag, the emotion polarity of the target single-dimension clause can be obtained first, and an emotion standard word with the same polarity is selected from the tag generation rule matched by the clause. This emotion standard word is then spliced with the dimension standard word matched in the tag generation rule to form the clause tag. For example, in fig. 7, the positive emotion words for space include several candidates such as "enough", "spacious" and "big"; when the target single-dimension clause has positive polarity, any one of them may be selected to generate the tag.
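Selecting an emotion standard word by polarity and splicing it with the dimension standard word can be sketched as follows. The polarity is assumed to come from the emotion analysis model, which is not shown; the `EMOTION_BY_POLARITY` table, the function name and the space-separated tag format are hypothetical choices for this illustration.

```python
import random

# Hypothetical emotion standard words for the space dimension, grouped by polarity.
EMOTION_BY_POLARITY = {
    "positive": ["enough", "spacious", "big"],
    "negative": ["narrow", "crowded", "small"],
}


def make_clause_tag(dimension_word, polarity, rng=None):
    """Splice a dimension standard word with any emotion word of the given polarity."""
    if rng is None:
        rng = random.Random()
    emotion_word = rng.choice(EMOTION_BY_POLARITY[polarity])
    return f"{dimension_word} {emotion_word}"


# A seeded generator makes the choice repeatable across runs.
tag = make_clause_tag("front head space", "positive", random.Random(0))
```

Passing a seeded `random.Random` is optional; without it, a different candidate may be chosen on each call, which is what produces tag diversity.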
In this way, the invention first uses the complex rule set to extract the dimension words and emotion words in a sentence and hit the dimension category to which they belong, and then uses the emotion model to judge the emotion polarity of the sentence. This effectively addresses the problems of small differences between dimensions and few categories. Because the emotion model analyzes only one category at a time, each target single-dimension clause corresponds to one emotion polarity and one clause tag, which greatly reduces the resulting errors and improves accuracy.
Further, the tag generation rule uses only first standard words. Therefore, to further improve tag diversity, after the first dimension standard word and the first emotion standard word are selected from the tag generation rule, the second dimension standard word corresponding to the first dimension standard word and the second emotion standard word corresponding to the first emotion standard word can also be looked up. One dimension standard word is then chosen from the first and second standard words of the dimension word, one emotion standard word is chosen from the first and second standard words of the emotion word, and the chosen dimension standard word and emotion standard word are spliced into the clause tag.
For example, if the dimension standard word "driver's seat" has one second standard word and the emotion standard word "satisfied" has two second standard words, either of the two dimension forms can be combined with any of the three emotion forms, which yields 2*3 = 6 alternative tags and thereby satisfies tag diversity.
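The 2*3 = 6 count is simply the size of the Cartesian product of the dimension forms and the emotion forms, as the short sketch below shows. The concrete words are hypothetical stand-ins for the first and second standard words.

```python
from itertools import product


def tag_variants(dimension_forms, emotion_forms):
    """List every tag obtained by pairing a dimension form with an emotion form."""
    return [f"{d} {e}" for d, e in product(dimension_forms, emotion_forms)]


# Hypothetical first and second standard words.
dimension_forms = ["front row", "driver's seat"]       # first + one second form
emotion_forms = ["satisfied", "pleased", "content"]    # first + two second forms
variants = tag_variants(dimension_forms, emotion_forms)
```

Here `variants` holds six tags, one for each pairing of the two dimension forms with the three emotion forms.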
Further, in step S240, a corresponding tag word scheme may be set for each object system, the scheme specifying which dimension standard word and which emotion standard word should be selected. For example, the tag of one brand may be "driver seat satisfaction" while the tag of another brand is "driver seat mind", which increases the variety of brand evaluations.
Optionally, in step S240, any emotion word with the same emotion polarity as the target single-dimension clause may be selected from the macro file of the dimension and spliced with the dimension standard word to form the clause tag. The clause tag also includes the hidden attribute word and the degree word of the target single-dimension clause. In this case the tag generation rule may take the form: dimension word (hidden/mandatory) + attribute (mandatory/hidden) + degree (optional) + emotion word (mandatory), or dimension word (mandatory) + attribute (hidden) + degree (optional) + emotion word (mandatory) + emotion polarity. Here "mandatory" means the element must be present, "hidden" means the element may be left implicit, and "optional" means the element may or may not be present.
Suppose the following comment text exists: "The most satisfying point is the space. Let me describe it in detail: the front row head of the Polo is very spacious, and the push-back feel is very strong when driving." Here "the front row head is very spacious" hits the rule "space-riding space-front head space overall evaluation": the hit dimension word is "front row head" (mandatory), the attribute word is "space" (hidden), the degree word is "very" (optional), the emotion word is "spacious" (mandatory), and the emotion polarity is positive. The matched tag generation rule is: space-occupancy space-front space overall evaluation @ @ @ @ front # (front row and head space) and% space front #,% space front # # # # spacious, large, and like affective words. The generated tag is "front row head (space) spacious", where "space" is the hidden attribute word.
The comment tag generation method was tested. In the test, 40 popular car series were randomly selected from the word-of-mouth reviews, and the tags and short sentences of these car series were examined: 15 short sentences were spot-checked under each tag, and whether the tag and the extracted content were correct was checked manually. FIG. 8 shows a schematic diagram of the generated comment tags, including the detailed classification of each tag and each emotion polarity. The evaluation indexes are tag accuracy and extracted-content accuracy; in practice the tag accuracy reached 94% and the extracted-content accuracy reached 96.5%.
Fig. 9 illustrates a block diagram of a comment tag generating apparatus 900 according to one embodiment of the invention that may reside in a computing device, such as computing device 100. As shown in fig. 9, the apparatus 900 includes a clause segmentation module 910, a normalization module 920, a rule matching module 930, and a tag generation module 940.
Clause segmentation module 910 extracts a plurality of single-dimension clauses segmented from the target comment, a single-dimension clause being a clause that has exactly one dimension word and one emotion word. Optionally, the clause segmentation module 910 is further adapted to train the clause model and to train a polysemous word classifier, and to determine according to the classifier whether a polysemous word is an emotion word or a degree word, or whether it is a noun or an emotion word. The clause segmentation module 910 may perform processing corresponding to the processing described above in step S210, and will not be described in detail herein.
The normalization module 920 replaces the dimension word and emotion word of each single dimension clause with the corresponding dimension standard word and emotion standard word, respectively, based on a pre-stored standard word dictionary. The normalization module 920 may perform a process corresponding to the process described above in step S220, and a detailed description will not be repeated here.
The rule matching module 930 determines, for each single-dimension clause, the single-dimension clause as the target single-dimension clause if the standard word based on its substitution can be matched to the corresponding tag generation rule in the rule set. The rule matching module 930 may perform a process corresponding to the process described above in step S230, and a detailed description will not be repeated here.
The tag generation module 940 generates a clause tag of each target single-dimension clause based on the tag generation rule matched with the target single-dimension clause, thereby obtaining a comment tag of the target comment. The tag generation module 940 may perform a process corresponding to the process described above in step S240, and a detailed description thereof will not be repeated here.
According to one embodiment of the present invention, apparatus 900 may further include a preprocessing module and an emotion analysis module (neither shown). The preprocessing module is suitable for carrying out data preprocessing operation on the target comments. The emotion analysis module is suitable for carrying out emotion analysis on each target single-dimension clause by adopting an emotion analysis model to obtain emotion polarity of the target single-dimension clause.
The technical scheme provides fine-grained emotion analysis based on user comments. By extracting the dimension words and emotion words in sentences with a complex rule set and combining this with an emotion analysis model, fine-grained emotion analysis is divided into two parts: the first part extracts the dimension category and the second part performs emotion analysis, which greatly reduces engineering complexity. In addition, the rule set is easy to understand, parses efficiently when extracting dimension words and emotion words, and resembles the design of a programming language.
A8, the method of A6, further comprising the steps of: training a polysemous word classifier, and determining according to the classifier whether a polysemous word is an emotion word or a degree word, or whether it is a noun or an emotion word. A9, the method of A3, wherein the clause model is suitable for outputting a front character and a rear character of a segmentation symbol adding position in the sentence so as to add segmentation symbols between the front character and the rear character, thereby segmenting the sentence into a plurality of single-dimension clauses. A10, the method of A3, further comprising the training step of the clause model: and acquiring text contents which are already segmented into single-dimension clauses, and training the constructed clause model by taking the text contents as a training set to obtain a trained clause model.
A11, the method of any one of A1-A10, wherein the tag generation rule comprises a plurality of placeholders, each placeholder represents a standard word X, and the placeholders are connected by logical operators; the standard word X comprises at least one of a dimension standard word, an emotion standard word related to the dimension, and a filtering standard word, and the emotion standard word comprises at least one of a general emotion word, a positive emotion word and a negative emotion word. A12, the method of A11, wherein the logical operators comprise at least one of "%", "()", "and", "or", and "not", wherein "%" is a placeholder filled with a standard word X, "()" indicates operation precedence, "and" indicates simultaneous occurrence, "or" indicates that either one occurs, and "not" indicates non-occurrence. A13, the method of A11, wherein the step of determining the target single-dimension clause comprises: for each single-dimension clause, if the replaced dimension standard word and emotion standard word both appear in a tag generation rule in the rule set, and the clause does not contain a filtering standard word of that tag generation rule, judging the single-dimension clause to be a target single-dimension clause.
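The rule matching of A11-A13 can be sketched as a small Boolean evaluator. The concrete rule syntax is not given in this section, so the sketch assumes a rule has already been parsed into nested tuples: a string leaf is a filled placeholder (a standard word), and tuple nesting plays the role of "()". The function name `rule_matches` and the example words are hypothetical.

```python
from typing import Union

# A rule is either a standard word (leaf) or a tuple (operator, operand, ...).
Rule = Union[str, tuple]


def rule_matches(rule: Rule, words: set[str]) -> bool:
    """Evaluate a parsed tag generation rule against a clause's standard words."""
    if isinstance(rule, str):
        return rule in words  # a filled placeholder: the word must occur
    op, *operands = rule
    if op == "and":
        return all(rule_matches(r, words) for r in operands)
    if op == "or":
        return any(rule_matches(r, words) for r in operands)
    if op == "not":
        return not any(rule_matches(r, words) for r in operands)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule: dimension word "space", one of two emotion words,
# and the filtering word "trunk" must be absent.
SPACE_RULE = ("and", "space", ("or", "large", "spacious"), ("not", "trunk"))

is_target = rule_matches(SPACE_RULE, {"space", "large"})
```

A clause whose standard words satisfy the rule is treated as a target single-dimension clause; a clause that also contains the filtering word fails the "not" branch and is rejected.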
A14, the method of any one of A1-A13, wherein each standard word dictionary is represented as word clusters, each consisting of standard words and their synonyms, the standard words comprise a first standard word and a second standard word, the standard words in the tag generation rules are all first standard words, and the standard words substituted in by the replacement are all first standard words. A15, the method of A14, wherein the step of generating the clause tag of each target single-dimension clause based on the tag generation rule matched by the target single-dimension clause comprises: determining the dimension standard word matched with the tag generation rule; determining the emotion standard word matched with the tag generation rule, or selecting from the tag generation rule any one emotion standard word whose emotion polarity is the same as that of the target single-dimension clause; and splicing the selected dimension standard word and emotion standard word into the clause tag.
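The word-cluster dictionary of A14 can be sketched as a reverse lookup from every word in a cluster to its first standard word. The cluster contents, the field names, and the choice to map the second standard word to the first are assumptions made for illustration only.

```python
# Hypothetical dimension dictionary: each cluster has a first standard word,
# a second standard word, and a list of synonyms.
DIMENSION_CLUSTERS = [
    {"first": "fuel consumption", "second": "fuel economy",
     "synonyms": ["gas mileage", "fuel use"]},
    {"first": "space", "second": "roominess",
     "synonyms": ["cabin room", "legroom"]},
]


def build_lookup(clusters: list[dict]) -> dict[str, str]:
    """Map every word of every cluster to that cluster's first standard word."""
    lookup = {}
    for cluster in clusters:
        for word in [cluster["first"], cluster["second"], *cluster["synonyms"]]:
            lookup[word] = cluster["first"]
    return lookup


def normalize(words: list[str], lookup: dict[str, str]) -> list[str]:
    """Replace each known word with its first standard word; keep the rest."""
    return [lookup.get(word, word) for word in words]


LOOKUP = build_lookup(DIMENSION_CLUSTERS)
normalized = normalize(["gas mileage", "is", "low"], LOOKUP)
```

Because the rules contain only first standard words, normalizing to the first standard word lets a single rule cover every synonym in the cluster.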
A16, the method of A15, wherein the step of splicing the selected dimension standard word and emotion standard word into the clause tag comprises: determining the second standard words corresponding to the dimension standard word and the emotion standard word respectively; selecting one dimension standard word from the first standard word and the second standard word of the dimension word, and selecting one emotion standard word from the first standard word and the second standard word of the emotion word; and splicing the selected dimension standard word and emotion standard word into the clause tag. A17, the method of any one of A1-A16, further comprising, before extracting the plurality of single-dimension clauses segmented from the target comment, the step of: performing a data preprocessing operation on the target comment, wherein the data preprocessing operation comprises at least one of deleting HTML code, replacing invisible characters, converting the case of English characters, converting punctuation marks between half-width and full-width forms, and deleting nonstandard characters.
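The preprocessing operations listed above can be sketched with the Python standard library. This is one possible reading, not the patent's implementation: Unicode NFKC normalization is used as a stand-in for full-width to half-width conversion (it also folds other compatibility characters), "nonstandard characters" is interpreted as control and format characters, and the function name `preprocess` is hypothetical.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")
_INVISIBLE_RE = re.compile("[\u200b\u200c\u200d\ufeff\u00a0\t\r\n]+")


def preprocess(comment: str) -> str:
    """Clean one raw comment before clause extraction."""
    text = _TAG_RE.sub("", comment)             # delete HTML tags
    text = html.unescape(text)                  # decode entities such as &nbsp;
    text = _INVISIBLE_RE.sub(" ", text)         # replace invisible characters
    text = unicodedata.normalize("NFKC", text)  # full-width -> half-width forms
    text = text.lower()                         # unify the case of English letters
    # Drop remaining control/format characters (Unicode categories C*).
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    return re.sub(r" +", " ", text).strip()


cleaned = preprocess("<p>ＡＢＣ　Great！</p>")
```

Tags are removed before entities are decoded so that an escaped `&lt;` in the comment text is not mistaken for markup.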
The various techniques described herein may be implemented in hardware, in software, or in a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy diskettes, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
Where the program code executes on a programmable computer, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the comment tag generation method of the present invention in accordance with the instructions in the program code stored in the memory.
By way of example, and not limitation, readable media comprise readable storage media and communication media. The readable storage medium stores information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with examples of the invention. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as methods or combinations of method elements that may be implemented by a processor of a computer system or by other means of performing the functions. Thus, a processor with the necessary instructions for implementing the described method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal terms "first," "second," "third," etc., to describe a general object merely denotes different instances of like objects, and is not intended to imply that the objects so described must have a given order, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims (19)

1. A comment tag generation method executed in a computing device, the computing device having stored therein a rule set, each element of the rule set associating an evaluation dimension with a corresponding tag generation rule, the method comprising the steps of:
extracting a plurality of single-dimension clauses segmented from a target comment, wherein a single-dimension clause is a clause having only one dimension word and emotion word, the dimension word represents the evaluation dimension, the evaluation dimension refers to an evaluation category, an evaluation aspect or an index system, and the evaluation dimension is a fine-grained evaluation dimension comprising a path relation from the top level to the finest level, namely a path relation from a major class to a minor class;
replacing, based on a pre-stored standard word dictionary, the dimension word and emotion word of each single-dimension clause with the corresponding dimension standard word and emotion standard word respectively;
for each single-dimension clause, if the standard words after replacement can be matched with a corresponding tag generation rule in the rule set, judging the single-dimension clause to be a target single-dimension clause;
generating clause tags of each target single-dimension clause based on tag generation rules matched with the target single-dimension clause, so as to obtain comment tags of the target comments;
wherein the step of extracting the plurality of single-dimension clauses segmented from the target comment comprises:
splitting the target comment into a plurality of short sentences, and performing word segmentation on the short sentences to identify target short sentences that contain both a dimension word and an emotion word;
when a target short sentence exceeds a preset number of words or contains more than two emotion words, segmenting the target short sentence into a plurality of single-dimension clauses based on a pre-trained clause model.
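The extraction step of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace splitting stands in for the word segmentation model, the word lists and thresholds are invented, and `clause_model` is any callable that returns a list of clauses. None of these names come from the patent.

```python
import re

# Hypothetical word lists standing in for the dimension and emotion dictionaries.
DIMENSION_WORDS = {"space", "engine"}
EMOTION_WORDS = {"large", "loud", "smooth"}


def split_short_sentences(comment: str) -> list[str]:
    """Split a comment on common Western and Chinese punctuation marks."""
    parts = re.split(r"[,.;!?，。；！？]", comment)
    return [p.strip() for p in parts if p.strip()]


def needs_clause_model(tokens, max_words=8, max_emotion_words=2) -> bool:
    """True when the short sentence is too long or has too many emotion words."""
    emotion_count = sum(t in EMOTION_WORDS for t in tokens)
    return len(tokens) > max_words or emotion_count > max_emotion_words


def extract_single_dimension_clauses(comment, clause_model, tokenize=str.split):
    clauses = []
    for short in split_short_sentences(comment):
        tokens = tokenize(short)
        has_dimension = any(t in DIMENSION_WORDS for t in tokens)
        has_emotion = any(t in EMOTION_WORDS for t in tokens)
        if not (has_dimension and has_emotion):
            continue  # not a target short sentence
        if needs_clause_model(tokens):
            clauses.extend(clause_model(short))  # delegate to the clause model
        else:
            clauses.append(short)  # already a single-dimension clause
    return clauses


result = extract_single_dimension_clauses(
    "the space is large, nice day, the engine is loud",
    clause_model=lambda s: [s],  # placeholder model that returns its input
)
```

Both thresholds are parameters because the claim only says "a preset word number", and the translated wording "more than two emotion words" may admit a different cutoff in the original text.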
2. The method of claim 1, further comprising the step of:
carrying out emotion analysis on each target single-dimension clause by adopting an emotion analysis model to obtain emotion polarity of the target single-dimension clause, wherein the emotion polarity comprises at least one of positive emotion, negative emotion and neutral emotion.
3. The method of claim 1, wherein the step of splitting the target comment into a plurality of short sentences comprises:
splitting the target comment into a plurality of short sentences according to the punctuation marks of the target comment;
and if the target comment has no punctuation marks, segmenting the target comment into a plurality of single-dimension clauses based on the clause model.
4. The method of claim 1, wherein the step of splitting the target comment into a plurality of short sentences comprises:
eliminating target comments having special sentence patterns, and then splitting the remaining target comments into a plurality of short sentences, wherein the special sentence patterns comprise rhetorical questions.
5. The method of any one of claims 1-4, wherein the step of performing word segmentation on the plurality of short sentences comprises:
performing word segmentation on each short sentence using a word segmentation model, and extracting at least one of dimension words, emotion words and filtering words from the short sentence, wherein these words respectively represent the evaluation dimension, the emotion tendency and interference information.
6. The method of claim 5, wherein,
the standard word dictionary comprises at least one of a dimension word dictionary, an emotion word dictionary, a filtering word dictionary and a macro file;
and the macro file represents the association relation between the dimension word and the corresponding modifiable emotion word, and words in all standard word dictionaries are stored in the word segmentation model.
7. The method of claim 5, further comprising the step of:
training a polysemous word classifier, and determining according to the classifier whether a polysemous word is an emotion word or a degree word, or whether it is a noun or an emotion word.
8. The method of claim 1, wherein,
the clause model is adapted to output the front character and the rear character of each position in the sentence where a segmentation symbol is to be added, so that the segmentation symbol is added between the front character and the rear character, thereby segmenting the sentence into a plurality of single-dimension clauses.
9. The method of claim 1, further comprising the step of training the clause model:
and acquiring text contents which are already segmented into single-dimension clauses, and training the constructed clause model by taking the text contents as a training set to obtain a trained clause model.
10. The method of any one of claims 1-4, wherein,
the tag generation rule comprises a plurality of placeholders, each placeholder represents a standard word X, and the placeholders are connected through logical operation symbols;
the standard word X comprises at least one of a dimension standard word, an emotion standard word related to the dimension and a filtering standard word, and the emotion standard word comprises at least one of a general emotion word, a positive emotion word and a negative emotion word.
11. The method of claim 10, wherein,
the logical operators comprise at least one of "%", "()", "and", "or", and "not", wherein "%" is a placeholder filled with a standard word X, "()" indicates operation precedence, "and" indicates simultaneous occurrence, "or" indicates that either one occurs, and "not" indicates non-occurrence.
12. The method of claim 10, wherein the step of determining the target single-dimension clause comprises:
for each single-dimension clause, if the replaced dimension standard word and emotion standard word exist in a certain tag generation rule in the rule set at the same time, and the clause does not contain the filtering standard word in the tag generation rule, judging the single-dimension clause as a target single-dimension clause.
13. The method of any one of claims 1-4, wherein,
each standard word dictionary is represented as word clusters, each consisting of standard words and their synonyms, the standard words comprise a first standard word and a second standard word, the standard words in the tag generation rules are all first standard words, and the standard words substituted in by the replacement are all first standard words.
14. The method of claim 13, wherein the step of generating the clause tag of each target single-dimension clause based on the tag generation rule matched by the target single-dimension clause comprises:
determining the dimension standard word matched with the tag generation rule;
determining the emotion standard word matched with the tag generation rule, or selecting from the tag generation rule any one emotion standard word whose emotion polarity is the same as that of the target single-dimension clause;
and splicing the selected dimension standard word and emotion standard word into the clause tag.
15. The method of claim 14, wherein the step of concatenating the selected dimension standard word and emotion standard word into clause tags comprises:
respectively determining second standard words corresponding to the dimension standard words and the emotion standard words;
selecting one dimension standard word from the first standard word and the second standard word of the dimension word, and selecting one emotion standard word from the first standard word and the second standard word of the emotion word;
and splicing the selected dimension standard word and emotion standard word into the clause tag.
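The tag generation of claims 14 and 15 can be sketched in two small functions. The data shapes are assumptions made for illustration: `rule_emotions` maps each emotion standard word in a rule to a polarity label, `second_words` maps a first standard word to its second standard word, and the `joiner` parameter exists only so the English example reads naturally.

```python
def choose_emotion_word(matched, rule_emotions, clause_polarity):
    """Use the matched emotion word, else any rule word with the same polarity."""
    if matched is not None:
        return matched
    for word, polarity in rule_emotions.items():
        if polarity == clause_polarity:
            return word
    return None  # the rule offers no word with this polarity


def splice_tag(dimension_first, emotion_first, second_words,
               use_second=False, joiner=""):
    """Concatenate one dimension standard word and one emotion standard word."""
    def pick(first):
        second = second_words.get(first)
        return second if (use_second and second) else first
    return pick(dimension_first) + joiner + pick(emotion_first)


# Hypothetical second standard words for one dimension and one emotion word.
SECOND_WORDS = {"fuel consumption": "fuel economy", "low": "thrifty"}

tag = splice_tag("fuel consumption", "low", SECOND_WORDS, joiner=" ")
alt_tag = splice_tag("fuel consumption", "low", SECOND_WORDS,
                     use_second=True, joiner=" ")
```

How the choice between first and second standard word is made is not specified in the claims; the `use_second` flag is a placeholder for that decision.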
16. The method of any one of claims 1-4, further comprising, before extracting the plurality of single-dimension clauses segmented from the target comment, the step of:
performing a data preprocessing operation on the target comment, wherein the data preprocessing operation comprises at least one of deleting HTML code, replacing invisible characters, converting the case of English characters, converting punctuation marks between half-width and full-width forms, and deleting nonstandard characters.
17. A comment tag generating apparatus adapted to reside in a computing device having stored therein a rule set, each element of the rule set associating an evaluation dimension with a corresponding tag generation rule, the apparatus comprising:
the clause segmentation module is suitable for extracting a plurality of single-dimension clauses which are segmented from the target comment, wherein the single-dimension clauses are clauses with only one dimension word and emotion word, the dimension word represents the evaluation dimension, the evaluation dimension refers to an evaluation category, an evaluation aspect or an index system, and the evaluation dimension is a fine-grained evaluation dimension and comprises a path relation from a top level to a finest level, namely a path relation from a major class to a minor class;
the normalization module is suitable for replacing the dimension words and emotion words of each single-dimension clause with corresponding dimension standard words and emotion standard words respectively based on a prestored standard word dictionary;
the rule matching module is adapted to judge, for each single-dimension clause, the single-dimension clause to be a target single-dimension clause if the standard words after replacement can be matched with a corresponding tag generation rule in the rule set;
the tag generation module is adapted to generate a clause tag for each target single-dimension clause based on the tag generation rule matched by that clause, thereby obtaining the comment tags of the target comment;
wherein the clause segmentation module is further adapted to: split the target comment into a plurality of short sentences, and perform word segmentation on the short sentences to identify target short sentences that contain both a dimension word and an emotion word, wherein when a target short sentence exceeds a preset number of words or contains more than two emotion words, the target short sentence is segmented into a plurality of single-dimension clauses based on a pre-trained clause model.
18. A computing device, comprising:
a memory;
one or more processors;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-16.
19. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-16.
CN202010059910.2A 2020-01-19 2020-01-19 Comment tag generation method and device and computing equipment Active CN111241290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010059910.2A CN111241290B (en) 2020-01-19 2020-01-19 Comment tag generation method and device and computing equipment

Publications (2)

Publication Number Publication Date
CN111241290A CN111241290A (en) 2020-06-05
CN111241290B true CN111241290B (en) 2023-05-30

Family

ID=70866011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010059910.2A Active CN111241290B (en) 2020-01-19 2020-01-19 Comment tag generation method and device and computing equipment

Country Status (1)

Country Link
CN (1) CN111241290B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897955B (en) * 2020-07-13 2024-04-09 广州视源电子科技股份有限公司 Comment generation method, device, equipment and storage medium based on encoding and decoding
CN112559743B (en) * 2020-12-09 2024-02-13 深圳市网联安瑞网络科技有限公司 Method, device, equipment and storage medium for calculating government and enterprise network support
CN116090450A (en) * 2022-11-28 2023-05-09 荣耀终端有限公司 Text processing method and computing device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005128711A (en) * 2003-10-22 2005-05-19 Omron Corp Emotional information estimation method, character animation creation method, program using the methods, storage medium, emotional information estimation apparatus, and character animation creation apparatus
CN106407236A (en) * 2015-08-03 2017-02-15 北京众荟信息技术有限公司 An emotion tendency detection method for comment data
CN108984724A (en) * 2018-07-10 2018-12-11 凯尔博特信息科技(昆山)有限公司 It indicates to improve particular community emotional semantic classification accuracy rate method using higher-dimension
WO2019214145A1 (en) * 2018-05-10 2019-11-14 平安科技(深圳)有限公司 Text sentiment analyzing method, apparatus and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
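李勇敢; 周学广; 孙艳; 张焕国. Research and implementation of Chinese microblog sentiment analysis. Journal of Software. 2017, (12), full text. *
李良强; 李开明; 白梨霏; 曹云忠; 吴亮. Research on methods for extracting consumer emotion tags from reviews of agricultural products purchased online. Journal of University of Electronic Science and Technology of China (Social Sciences Edition). 2018, (04), full text. *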

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant