CN112184323A - Evaluation label generation method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112184323A
Authority
CN
China
Prior art keywords
evaluation
phrase
similarity
current
evaluation phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011091969.6A
Other languages
Chinese (zh)
Inventor
王千
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Fengzhi Technology Co ltd
Original Assignee
Shanghai Fengzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Fengzhi Technology Co ltd filed Critical Shanghai Fengzhi Technology Co ltd
Priority to CN202011091969.6A priority Critical patent/CN112184323A/en
Publication of CN112184323A publication Critical patent/CN112184323A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282Rating or review of business operators or products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an evaluation label generation method and device, a storage medium and electronic equipment. The method comprises the following steps: acquiring an evaluation text set issued by a plurality of user accounts in a network platform; performing word segmentation on each evaluation text in the evaluation text set to obtain an evaluation phrase set; converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrase; determining a first current evaluation phrase and a second current evaluation phrase from the evaluation phrase set; and, when the updated evaluation phrase set reaches the label generation condition, extracting from the updated evaluation phrase set the evaluation labels to be displayed on the network platform. The invention solves the technical problem of disordered evaluation attribution caused by inaccurate evaluation labels.

Description

Evaluation label generation method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of networks, in particular to an evaluation label generation method and device, a storage medium and electronic equipment.
Background
With the development of online shopping, online shopping platforms usually place evaluation tags on the evaluation page so that users can filter the evaluations of a commodity: by clicking a tag, a user browses the evaluations in the corresponding category and quickly obtains the relevant evaluation information.
In the prior art, evaluation tags are usually given fixed classifications, and an evaluation that contains a tag's classification is assigned to that tag. However, as the number of evaluations grows, the content of the comments keeps changing while the generation of evaluation tags remains limited, so evaluations can only be filed under one of the existing evaluation tags. The classification by evaluation tag is therefore not accurate enough, which causes the problem of disordered evaluation attribution.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides an evaluation tag generation method and device, a storage medium and electronic equipment, which are used for at least solving the technical problem of disordered evaluation attribution caused by inaccurate evaluation tags.
According to an aspect of an embodiment of the present invention, there is provided an evaluation label generation method, including: acquiring an evaluation text set issued by a plurality of user accounts in a network platform; performing word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation word group set; converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrases; repeatedly executing the following steps until the label generation condition is reached: determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase; acquiring target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity used for indicating the similarity between the evaluation phrases and a second similarity used for indicating the similarity between evaluation word vectors; updating the evaluation word group set according to the comparison result of the target similarity and a first threshold value to obtain an updated evaluation word group set, and determining a next group of the first current evaluation word group and the second current evaluation word group from the updated evaluation word group set; and extracting the evaluation tags for presentation on the network platform from the updated evaluation phrase set when the updated evaluation phrase set reaches the tag generation condition.
According to another aspect of the embodiments of the present invention, there is also provided an evaluation label generation apparatus, including: an acquisition module, configured to acquire an evaluation text set issued by a plurality of user accounts in a network platform; a processing module, configured to perform word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation word group set; a conversion module, configured to convert each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrases; a control module, configured to repeatedly execute the following steps until the label generation condition is reached: determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase; acquiring target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity used for indicating the similarity between the evaluation phrases and a second similarity used for indicating the similarity between evaluation word vectors; updating the evaluation word group set according to the comparison result of the target similarity and a first threshold value to obtain an updated evaluation word group set, and determining a next group of the first current evaluation word group and the second current evaluation word group from the updated evaluation word group set; and a generation module, configured to extract the evaluation tag for presentation on the network platform from the updated evaluation phrase set when the updated evaluation phrase set reaches the tag generation condition.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned evaluation label generation method when running.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the above-mentioned evaluation label generation method through the above-mentioned computer program.
In the embodiment of the invention, the evaluation word vector of each evaluation phrase in the evaluation text is obtained, and the latest evaluation word group set is determined according to the result of comparing the target similarity between evaluation phrases with the first threshold, so that representative words of the evaluation word group set are extracted as the evaluation labels. By controlling the label generation condition of the evaluation word group set, evaluation labels are generated from the evaluation text itself, which achieves the technical effect that the evaluation labels correspond to the evaluation text and thereby solves the technical problem of disordered evaluation attribution caused by inaccurate evaluation labels.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative evaluation tag generation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of an alternative evaluation tag generation method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram illustrating an alternative evaluation tag generation method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram illustrating an alternative evaluation tag generation method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram illustrating yet another alternative evaluation tag generation method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an alternative evaluation label generating apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative electronic device according to an embodiment of the invention;
FIG. 8 is a schematic structural diagram of yet another alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, there is provided an evaluation tag generation method. As an optional implementation, the method may be applied to, but is not limited to, the environment shown in FIG. 1, in which the terminal device 102 exchanges data with the server 112 through the network 110.
Optionally, in this embodiment, the terminal device 102 is a terminal device that uses the network platform, and may include but is not limited to at least one of the following: mobile phones (such as Android phones, iOS phones, etc.), notebook computers, tablet computers, palm computers, MID (Mobile Internet Devices), PAD, desktop computers, smart televisions, etc. The network 110 may include, but is not limited to, a wired network and a wireless network, wherein the wired network comprises a local area network, a metropolitan area network and a wide area network, and the wireless network comprises Bluetooth, Wi-Fi and other networks that enable wireless communication. The server 112 may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and this embodiment is not limited thereto.
As an alternative embodiment, as shown in fig. 2, the method for generating an evaluation label includes:
S202, acquiring an evaluation text set issued by a plurality of user accounts in a network platform;
S204, performing word segmentation processing on each evaluation text in the evaluation text set respectively to obtain an evaluation word group set;
S206, converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrases;
S208, repeatedly executing the following steps until the label generation condition is reached:
S2082, determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
S2084, obtaining a target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity used for indicating the similarity between the evaluation phrases and a second similarity used for indicating the similarity between the evaluation word vectors;
S2086, updating the evaluation word group set according to the comparison result of the target similarity and the first threshold value to obtain an updated evaluation word group set, and determining a next group of first current evaluation word group and second current evaluation word group from the updated evaluation word group set;
S210, under the condition that the updated evaluation phrase set reaches the label generation condition, extracting the evaluation label for displaying on the network platform from the updated evaluation phrase set.
It should be noted that the network platform is a network application platform that allows its users to evaluate items such as goods and services offered on the platform, that is, a platform with an evaluation function. It may be, but is not limited to, a shopping platform or a service platform, and it may take the form of, but is not limited to, a web page, an application client or an applet.
The user account is the account a user uses to log in to the network platform. It serves as the identity credential that distinguishes the user from other users when the user acts on the network platform.
The evaluation text set is the set of evaluation information that users, acting through their user accounts, have given on the goods, services and the like on the network platform. An evaluation text may include, but is not limited to: characters, numbers, emoticons and symbols.
The evaluation phrase set is the set of phrases obtained by processing the evaluation texts. The processing applied to an evaluation text to obtain phrases may be, but is not limited to: text cleanup, format normalization and word segmentation. Optionally, the word segmentation may be, but is not limited to, splitting the text into phrases that appear in a lexicon. Optionally, the lexicon may include, but is not limited to: Chinese phrases, English phrases, e-commerce-specific phrases and custom phrases. An e-commerce-specific phrase is a phrase used on online shopping platforms. Optionally, a custom phrase may be, but is not limited to, a compound word, that is, a phrase with complete semantics formed by combining two or more phrases.
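As an illustration of splitting text into phrases that appear in a lexicon, the following minimal sketch uses forward maximum matching. This technique is an assumption chosen for illustration; the description does not name a segmentation algorithm. The function name `segment`, the `max_len` parameter and the toy lexicon are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Split text into lexicon phrases by forward maximum matching.

    At each position the longest lexicon entry (up to max_len characters)
    is taken; characters that match no entry are skipped.
    This is an illustrative assumption, not the method fixed by the source.
    """
    phrases = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            i += 1  # no lexicon entry starts here; skip one character
        else:
            phrases.append(match)
            i += len(match)
    return phrases


# Hypothetical lexicon mixing ordinary and compound (custom) phrases.
toy_lexicon = {"质量", "质量好", "物流快"}
print(segment("质量好物流快", toy_lexicon))
```

Because longer entries are preferred, the compound phrase "质量好" is kept whole instead of being split into "质量" plus an unmatched character.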
The evaluation word vector is the vector-form data that the word vector model generates for an evaluation phrase. The word vector model is a data processing model that performs semantic dimension reduction on input phrases. Optionally, the word vector model may be, but is not limited to, a GloVe word vector model. GloVe is a word representation tool based on global word-frequency statistics; it expresses a phrase as a vector of real numbers that captures the semantic characteristics of the phrase.
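A minimal sketch of how phrase vectors might be looked up, assuming the vectors are stored in the plain-text layout commonly used for GloVe output (one token per line followed by space-separated real numbers). The function name `parse_glove_lines` and the three-dimensional toy vectors are hypothetical; how the vectors are trained or stored is not specified in the description.

```python
def parse_glove_lines(lines):
    """Build a phrase -> vector mapping from GloVe-style text lines.

    Each line is assumed to look like: "<phrase> <v1> <v2> ... <vn>".
    Lines without any numeric component are ignored.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Toy three-dimensional vectors for illustration only.
toy_lines = ["质量好 0.1 0.2 0.3\n", "物流快 -0.5 0.0 1.5\n"]
phrase_vectors = parse_glove_lines(toy_lines)
print(phrase_vectors["质量好"])
```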
The evaluation tag is a selectable tag displayed at the top of the evaluation area page of the network platform. Clicking a tag filters the evaluations so that only those whose content corresponds to the tag are shown for browsing.
The condition under which the evaluation phrase set reaches the tag generation condition may be, but is not limited to: the target similarity among all the evaluation phrases in the evaluation phrase set is greater than or equal to the first threshold.
In the embodiment of the application, evaluation phrases are obtained according to the obtained evaluation texts, and the target similarity between the evaluation phrases is compared with a preset threshold until a latest evaluation word group set is obtained, so that evaluation labels corresponding to the evaluation word group set are generated. Through the control of the label generation conditions of the evaluation word group set, the purpose of generating corresponding evaluation labels according to the evaluation texts is achieved, so that the technical effect that the labels correspond to the evaluation texts is achieved, and the technical problem that the evaluation attribution is disordered due to inaccurate evaluation labels is solved.
As an optional implementation manner, the obtaining of the target similarity between the first current evaluation phrase and the second current evaluation phrase includes:
acquiring a first similarity and a second similarity, wherein the first similarity comprises: an editing similarity for indicating an editing distance between the first current evaluation phrase and the second current evaluation phrase, and a co-occurrence similarity for indicating the number of characters co-occurring in the first current evaluation phrase and the second current evaluation phrase, the second similarity including: the vector similarity is used for indicating the cosine distance between a first current evaluation word vector corresponding to the first current evaluation phrase and a second current evaluation word vector corresponding to the second current evaluation phrase;
and determining the target similarity according to the weighted summation result of the editing similarity, the co-occurrence similarity and the vector similarity.
It should be noted that: the first similarity includes an edit similarity and a co-occurrence similarity, the second similarity includes a vector similarity, and the target similarity is a result of a weighted sum of the edit similarity, the co-occurrence similarity, and the vector similarity.
Optionally, the calculation method of the editing similarity may be, but is not limited to:
[Formula (1): image BDA0002722422980000071 in the original publication]
where $edit_{sim}$ denotes the edit similarity, $x$ denotes the edit distance, $len(str1)$ denotes the phrase length of the first current evaluation phrase, and $len(str2)$ denotes the phrase length of the second current evaluation phrase.
It should be noted that the phrase length is the number of characters in the phrase. The edit distance is the minimum number of editing operations needed to convert the first current evaluation phrase into the second current evaluation phrase. The editing operations are: replacing a character in the phrase, deleting a character from the phrase, and inserting a new character into the phrase; each operation on one character counts as one step. For example, let the first current evaluation phrase be "质量上乘" ("quality is superior") and the second current evaluation phrase be "质量好" ("quality is good"). The fewest editing steps that convert "质量上乘" into "质量好" are: delete the character "乘", then replace "上" with "好". That is two editing steps, so the edit distance x = 2.
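The edit distance x described above is the standard Levenshtein distance. The following minimal sketch computes it with dynamic programming; the function name `edit_distance` is hypothetical, and the sketch covers only x, not the normalization into the edit similarity.

```python
def edit_distance(a, b):
    """Return the minimum number of single-character replacements,
    deletions and insertions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # replace char_a with char_b (or keep it)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("质量上乘", "质量好"))  # 2
```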
Optionally, the calculation method of the co-occurrence similarity may be, but is not limited to:
[Formula (2): image BDA0002722422980000081 in the original publication]
where $concur_{sim}$ denotes the co-occurrence similarity, $y$ denotes the number of co-occurring characters, $len(str1)$ denotes the phrase length of the first current evaluation phrase, and $len(str2)$ denotes the phrase length of the second current evaluation phrase.
It should be noted that the phrase length is the number of characters in the phrase. The number of co-occurring characters is the number of characters that appear in both the first current evaluation phrase and the second current evaluation phrase. Taking "质量上乘" ("quality is superior") and "质量好" ("quality is good") again as the example, the co-occurring characters are "质" and "量" (together, "quality"), so the number of co-occurring characters is 2, that is, y = 2.
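A minimal sketch of counting the co-occurring characters y. It assumes that a character repeated in both phrases is counted as many times as it appears in both (a multiset intersection); the description does not say how repeated characters are handled, so this is an assumption. The function name `cooccurring_count` is hypothetical.

```python
from collections import Counter


def cooccurring_count(a, b):
    """Count the characters that appear in both phrases.

    Repeated characters are counted by their minimum multiplicity,
    which is an assumption made for this sketch.
    """
    shared = Counter(a) & Counter(b)  # per-character minimum of the counts
    return sum(shared.values())


print(cooccurring_count("质量上乘", "质量好"))  # 2
```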
Optionally, the calculation method of the vector similarity may be, but is not limited to:
$$w2v_{sim} = \frac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert} \tag{3}$$
where $w2v_{sim}$ denotes the vector similarity, $\vec{v}_1$ denotes the evaluation word vector of the first current evaluation phrase, and $\vec{v}_2$ denotes the evaluation word vector of the second current evaluation phrase.
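The vector similarity is described above in terms of the cosine between the two evaluation word vectors. The following is a minimal sketch of that calculation in plain Python. The function name `cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```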
Optionally, in this embodiment of the present application, the word vectors produced by the word vector model may be, but are not limited to, 200-dimensional.
Optionally, according to the weighted summation result of the editing similarity, the co-occurrence similarity and the vector similarity, the calculation method for determining the target similarity may be, but is not limited to:
$$total_{sim} = \alpha \times w2v_{sim} + \beta \times concur_{sim} + \beta \times edit_{sim} \tag{4}$$
where $total_{sim}$ denotes the target similarity, α and β are preset scaling factors, and α + β + β = 1.
Setting the values of α and β fixes the proportional weights of the edit similarity, the co-occurrence similarity and the vector similarity. This makes explicit how much the structural measures (edit similarity and co-occurrence similarity) and the semantic measure (vector similarity) each contribute when the similarity of two evaluation phrases is determined.
Preferably, α is set to 0.64 and β is set to 0.16.
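A minimal sketch of the weighted sum in equation (4), using the preferred values as defaults. The function name `total_similarity` is hypothetical. Note that with α = 0.64 and β = 0.16 as stated, the three weights add up to 0.96; the sketch applies the values as given and does not rescale them.

```python
def total_similarity(w2v_sim, concur_sim, edit_sim, alpha=0.64, beta=0.16):
    """Combine the three similarities as in equation (4):
    alpha weights the vector similarity, beta weights each of the
    co-occurrence similarity and the edit similarity."""
    return alpha * w2v_sim + beta * concur_sim + beta * edit_sim


# Illustrative inputs only; these are not values taken from the source.
print(total_similarity(w2v_sim=0.9, concur_sim=0.5, edit_sim=0.5))
```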
In the embodiment of the application, the target similarity is obtained by calculating the edit similarity, the co-occurrence similarity and the vector similarity and assigning each a proportional weight. In other words, the structural similarity of the phrases is calculated and their semantic similarity is also taken into account, which makes the similarity calculation between the first current evaluation phrase and the second current evaluation phrase more reasonable. The evaluation phrase set is therefore determined more reasonably, the generated tags stay closer to the evaluation text, and the evaluation texts are classified more sensibly.
As an optional implementation manner, the updating the evaluation word group set according to the comparison result between the target similarity and the first threshold to obtain an updated evaluation word group set, and determining a next group of the first current evaluation word group and the second current evaluation word group from the updated evaluation word group set includes:
under the condition that the target similarity is greater than or equal to a first threshold value, combining a first current evaluation phrase and a second current evaluation phrase to serve as a new evaluation phrase subset so as to update the evaluation phrase set to obtain an updated evaluation phrase set; determining a next group of first current evaluation phrase and second current evaluation phrase from the updated evaluation phrase set;
and under the condition that the target similarity is smaller than a first threshold value, keeping the first current evaluation phrase and the second current evaluation phrase, acquiring a next evaluation phrase from the evaluation phrase set as a new second current evaluation phrase, and continuously acquiring the target similarity between the first current evaluation phrase and the new second current evaluation phrase.
The first threshold is a processing threshold for the target similarity, which is set in advance.
Alternatively, the evaluation phrase subset may be a phrase subset formed by merging when the target similarity between two evaluation phrases in the evaluation phrase set is greater than or equal to the first threshold.
As shown in fig. 3, after the target similarity between the first current evaluation phrase and the second current evaluation phrase is obtained, the target similarity is compared with a first threshold, that is, step S302 is executed to determine whether the target similarity is smaller than the first threshold. And executing step S304 when the target similarity is smaller than the first threshold, determining a next second current evaluation phrase from the evaluation phrase set, and continuously obtaining the target similarity between the first current evaluation phrase and the new second current evaluation phrase.
Optionally, in a case that the similarity between the first current evaluation phrase and the second current evaluation phrase is small, the first current evaluation phrase and the second current evaluation phrase are respectively updated to be the object evaluation phrase, that is, the updated evaluation phrase set exists: the evaluation method comprises the following steps of forming a first object evaluation phrase by a first current evaluation phrase, forming a second object evaluation phrase by a second current evaluation phrase, and removing the first current evaluation phrase and the second current evaluation phrase from an original evaluation phrase set to obtain a residual evaluation phrase set. The evaluation phrase continues as the first current evaluation phrase from the first object. And determining one evaluation phrase in the rest evaluation phrase sets as a new second current evaluation phrase, and continuing to perform target similarity calculation with the first current evaluation phrase. By calculating and comparing the target similarity one by one, the target similarity is used as a forming standard of the evaluation phrase subset, and the evaluation phrases form an independent object evaluation phrase under the condition that the target similarity is smaller than a first threshold, so that the condition that the classification of the evaluation phrases is not accurate enough due to the limitation of the number of the evaluation phrase subsets is avoided, and the classification accuracy of the evaluation text is ensured.
And if the target similarity is greater than or equal to the first threshold, executing step S306, where the first current evaluation phrase and the second current evaluation phrase form an evaluation phrase subset, and determining a next group of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set.
Optionally, when the target similarity between the first current evaluation phrase and the second current evaluation phrase is greater than or equal to the first threshold, the two phrases are considered sufficiently similar and are merged to form an evaluation phrase subset, for example, the first evaluation phrase subset. To distinguish them, the merged first current evaluation phrase is referred to as the first evaluation phrase and the merged second current evaluation phrase as the second evaluation phrase. At this point, the updated evaluation phrase set includes the first evaluation phrase subset and the remaining evaluation phrase set obtained by removing the first and second current evaluation phrases from the original evaluation phrase set. The first evaluation phrase subset comprises the first evaluation phrase and the second evaluation phrase.
Optionally, the next group of first and second current evaluation phrases may be determined from the updated evaluation phrase set in two ways. In the first case, two evaluation phrases are selected from the remaining evaluation phrase set as the first and second current evaluation phrases, and it is then determined, for each phrase in the remaining set, whether the remaining set contains another evaluation phrase whose target similarity with it reaches the first threshold, so that a new evaluation phrase subset can be generated. In the second case, the target evaluation phrase of the first evaluation phrase subset is used as the first current evaluation phrase, and one evaluation phrase is selected from the remaining evaluation phrase set as the second current evaluation phrase; the phrases in the remaining set are checked in turn to determine whether any of them is similar to the first evaluation phrase subset.
Optionally, the target evaluation phrase of an evaluation phrase subset may be, but is not limited to, the evaluation phrase with the highest word frequency among the evaluation phrases in that subset.
It should be noted that the target similarity calculated between every two evaluation phrases contained in an evaluation phrase subset is greater than or equal to the first threshold.
In this embodiment of the application, the target similarity is calculated pair by pair among the evaluation phrases, and a target similarity greater than or equal to the first threshold is the criterion for merging evaluation phrases into a new evaluation phrase subset. Evaluation labels are generated only after this criterion has been applied, so that each label is representative of its evaluation phrase subset and the corresponding evaluation texts are classified reasonably.
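The merging procedure of steps S302 to S306 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the implementation of the application: the function name `merge_pass`, the `freq` mapping of word frequencies, and the injected `similarity` callable are hypothetical, and the default threshold of 0.76 is simply the preferred value mentioned in this description. The sketch follows the second case above, comparing each candidate only with the current target evaluation phrase of the subset, so it does not by itself enforce that every pair inside a subset reaches the threshold.

```python
from typing import Callable, Dict, List


def merge_pass(
    phrases: List[str],
    freq: Dict[str, int],
    similarity: Callable[[str, str], float],
    threshold: float = 0.76,
) -> List[List[str]]:
    """Greedily group phrases; a one-element group is an object evaluation phrase."""
    remaining = list(phrases)
    groups: List[List[str]] = []
    while remaining:
        # The first remaining phrase becomes the first current evaluation phrase.
        group = [remaining.pop(0)]
        representative = group[0]
        i = 0
        while i < len(remaining):
            candidate = remaining[i]  # second current evaluation phrase
            if similarity(representative, candidate) >= threshold:
                # S306: merge, then re-pick the highest-frequency member
                # as the target evaluation phrase of the subset.
                group.append(remaining.pop(i))
                representative = max(group, key=lambda p: freq.get(p, 0))
            else:
                # S304: keep both phrases and move on to the next candidate.
                # Candidates skipped earlier are not re-checked in this pass.
                i += 1
        groups.append(group)
    return groups
```

Any similarity function returning a value in [0, 1] can be passed in; the quality of the grouping depends entirely on that function and on the chosen threshold.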
As an optional implementation, when the updated evaluation phrase set reaches the label generation condition, extracting the evaluation label for display on the network platform from the updated evaluation phrase set includes:
determining an evaluation object from the updated evaluation phrase set, wherein the evaluation object comprises a target evaluation phrase in the evaluation phrase subset and an object evaluation phrase which is not combined in the evaluation phrase set, and the target evaluation phrase is the evaluation phrase with the largest word frequency in the evaluation phrase subset;
under the condition that the target similarity among all the evaluation objects in the updated evaluation phrase set is smaller than a first threshold value, determining that a label generation condition is reached;
taking the evaluation objects as the evaluation labels.
Optionally, as shown in Fig. 4, when determining whether the updated evaluation phrase set meets the label generation condition, step S402 is executed first to obtain the evaluation objects contained in the evaluation phrase set: the target evaluation phrase of each evaluation phrase subset is determined, and the object evaluation phrases that exist independently in the evaluation phrase set are also used as evaluation objects. Once the evaluation objects are determined, step S404 is executed to determine whether the target similarity between every two evaluation objects is smaller than the first threshold. If so, step S406 is executed and the evaluation objects are determined to be the evaluation labels. If the target similarity between any two evaluation objects is greater than or equal to the first threshold, step S408 is executed: the evaluation phrase subsets and/or object evaluation phrases corresponding to those evaluation objects are merged to form a new evaluation phrase subset, and the evaluation phrase set is updated. After the update, step S402 is executed again, and this continues until the target similarity between every two evaluation objects is smaller than the first threshold, at which point the final evaluation labels are determined.
In this embodiment of the application, whether the current evaluation phrase set meets the label generation condition is determined by judging whether the target similarity between the evaluation objects in the updated evaluation phrase set is smaller than the first threshold. This ensures that the evaluation phrases within each evaluation phrase subset are sufficiently similar and verifies that the evaluation phrase subsets and object evaluation phrases are sufficiently dissimilar from one another, so that the grouping of evaluation phrases, and hence the generation of evaluation labels, is reasonable.
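The loop of steps S402 to S408 might be sketched as follows. This is a hedged illustration, not the method of the application: the names `evaluation_objects`, `label_condition_met`, and `extract_labels` are hypothetical, the groups are assumed to be lists of phrases (a one-element list standing for an object evaluation phrase), and merging the first offending pair on each round is only one possible reading of step S408.

```python
from itertools import combinations
from typing import Callable, Dict, List


def evaluation_objects(groups: List[List[str]], freq: Dict[str, int]) -> List[str]:
    """S402: one representative per group, the member with the largest word frequency."""
    return [max(group, key=lambda p: freq.get(p, 0)) for group in groups]


def label_condition_met(
    objects: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.76,
) -> bool:
    """S404: true when no pair of evaluation objects reaches the threshold."""
    return not any(similarity(a, b) >= threshold for a, b in combinations(objects, 2))


def extract_labels(
    groups: List[List[str]],
    freq: Dict[str, int],
    similarity: Callable[[str, str], float],
    threshold: float = 0.76,
) -> List[str]:
    """Repeat S402 to S408 until the representatives are mutually dissimilar."""
    groups = [list(g) for g in groups]  # work on a copy
    while True:
        objects = evaluation_objects(groups, freq)
        if label_condition_met(objects, similarity, threshold):
            return objects  # S406: the evaluation objects become the labels
        # S408: merge the first pair of groups whose representatives are too similar.
        for i, j in combinations(range(len(groups)), 2):
            if similarity(objects[i], objects[j]) >= threshold:
                groups[i].extend(groups.pop(j))  # i < j, so index i stays valid
                break
```

Each merge reduces the number of groups by one, so the loop terminates after at most as many rounds as there are groups.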
As an optional implementation, performing word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation phrase set includes:
cleaning the evaluation texts in the evaluation text set to obtain evaluation texts in a unified target format;
performing word segmentation processing on the evaluation texts in the target format to obtain the evaluation phrase set.
Optionally, the cleaning of the evaluation text may include, but is not limited to: conversion between simplified and traditional Chinese characters, conversion between full-width and half-width characters, removal of special characters, unification of number formats, unification of letter formats, and unification of expressions.
Optionally, the target format may include, but is not limited to: simplified Chinese characters and no special symbols.
In the embodiment of the application, the text in the target format is obtained by cleaning the evaluation text, so that word segmentation processing is conveniently performed on the text, and inaccurate word segmentation caused by format disorder is avoided.
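A minimal Python sketch of such a cleaning step is given below, under stated assumptions. The function name `clean_review` is hypothetical; Unicode NFKC normalization stands in for full-width to half-width conversion, lower-casing stands in for a unified letter format, and conversion between simplified and traditional characters is left out because it requires a conversion dictionary that this description does not specify.

```python
import re
import unicodedata


def clean_review(text: str) -> str:
    """Normalize one evaluation text into a uniform format (illustrative only)."""
    # NFKC folds full-width letters, digits, punctuation and spaces
    # into their half-width counterparts.
    text = unicodedata.normalize("NFKC", text)
    # Unify the letter format by lower-casing.
    text = text.lower()
    # Replace every character that is neither a word character nor
    # whitespace (punctuation, symbols) with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

The cleaned string would then be passed to a word segmenter; the choice of segmenter is outside the scope of this sketch.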
As an optional implementation manner, after acquiring a set of evaluation texts issued by a plurality of user accounts in a network platform, the method further includes:
performing a global heuristic calculation on the evaluation text set to obtain the first threshold.
Optionally, the first threshold may lie in the range [0.6, 0.94]. Preferably, the first threshold is 0.76.
The value range of the first threshold is determined by the global heuristic calculation. The larger the first threshold, the stricter the similarity requirement, and evaluation phrases that should belong to the same evaluation phrase subset may be split into different subsets; the upper limit of the range is therefore 0.94. The smaller the first threshold, the looser the similarity requirement, and evaluation phrases that should belong to different subsets may be merged into the same subset; the lower limit of the range is therefore 0.6.
Through the global heuristic calculation, the division into evaluation phrase subsets is found to be relatively more reasonable when the first threshold is 0.76.
In this embodiment of the application, the value range of the first threshold is determined through global heuristic calculation, so that the judgment on the target similarity stays within a reasonable range. The evaluation phrase subsets are therefore formed more reasonably, the labels generated for them are representative of them, and the evaluation texts are classified reasonably.
Optionally, the embodiments of the present application are further described with reference to Fig. 5. Assume that ten evaluation phrases are obtained after the evaluation texts are processed, giving an evaluation phrase set 502. As shown in Fig. 5(1), the evaluation phrase set 502 includes phrase A, phrase B, phrase C, phrase D, phrase E, phrase F, phrase G, phrase H, phrase I, and phrase J. Phrase A is taken as the first current evaluation phrase and phrase B as the second current evaluation phrase, and the target similarity between phrase A and phrase B is calculated. If this target similarity is larger than the first threshold, phrase A and phrase B are merged into a first subset 504 and the evaluation phrase set 502 is updated. The updated evaluation phrase set 502 includes the first subset 504 and phrases C, D, E, F, G, H, I, and J. The current target evaluation phrase of the first subset 504 is determined to be phrase B. The remaining phrases are then compared in turn with the target evaluation phrase of the first subset 504, namely phrase B. Assume that the target similarity between phrase C and phrase B and that between phrase D and phrase B are both smaller than the first threshold. If the target similarity between phrase E and phrase B is greater than the first threshold, phrase E is merged into the first subset 504 and the target evaluation phrase of the first subset 504 is re-determined. Among phrases A, B, and E, phrase E has the highest word frequency, so the target evaluation phrase of the first subset 504 is updated to phrase E.
Phrase F, phrase G, phrase H, phrase I, and phrase J are then taken in turn as the second current evaluation phrase, and the target similarity between each of them and phrase E is calculated; each resulting target similarity is smaller than the first threshold. At this point, the updated evaluation phrase set 502 includes the first subset 504 together with phrases C, D, F, G, H, I, and J, as shown in Fig. 5(2).
Phrase C is then taken as the first current evaluation phrase and phrase D as the second current evaluation phrase, the target similarity is calculated, and it is compared with the first threshold. After the foregoing steps are repeated several times, the evaluation phrase set is updated as shown in Fig. 5(3): the evaluation phrase set 502 includes a first subset 504, a second subset 506, a third subset 508, and phrase H. The first subset 504 includes phrases A, B, and E; the second subset 506 includes phrases C, F, G, and J; and the third subset 508 includes phrases D and I. The target evaluation phrase of the first subset 504 is determined to be phrase E, that of the second subset 506 phrase G, and that of the third subset 508 phrase D. It is then checked whether the target similarity between every two of phrase E, phrase G, phrase D, and phrase H is smaller than the first threshold. If so, phrase E, phrase G, phrase D, and phrase H are displayed in the corresponding area as the evaluation labels, which completes the generation of evaluation labels from the evaluation texts.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided an evaluation tag generation apparatus for implementing the above evaluation tag generation method. As shown in fig. 6, the apparatus includes:
an obtaining module 602, configured to obtain an evaluation text set issued by a plurality of user accounts in a network platform;
a processing module 604, configured to perform word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation phrase set;
a conversion module 606, configured to convert each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrase;
a control module 608, configured to repeatedly perform the following steps until the label generation condition is reached:
a determining unit, configured to determine two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
a calculating unit, configured to acquire the target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity indicating the similarity between the evaluation phrases and a second similarity indicating the similarity between the evaluation word vectors;
an updating unit, configured to update the evaluation phrase set according to the result of comparing the target similarity with the first threshold to obtain an updated evaluation phrase set, and to determine a next group of first and second current evaluation phrases from the updated evaluation phrase set;
a generating module 610, configured to extract, from the updated evaluation phrase set, an evaluation label for display on the network platform when the updated evaluation phrase set reaches the label generation condition.
Optionally, the calculating unit includes:
a first calculating subunit, configured to obtain the first similarity and the second similarity, wherein the first similarity includes: an edit similarity indicating the edit distance between the first current evaluation phrase and the second current evaluation phrase, and a co-occurrence similarity indicating the number of characters that co-occur in the first current evaluation phrase and the second current evaluation phrase; and the second similarity includes: a vector similarity indicating the cosine distance between a first current evaluation word vector corresponding to the first current evaluation phrase and a second current evaluation word vector corresponding to the second current evaluation phrase;
a second calculating subunit, configured to determine the target similarity according to the weighted sum of the edit similarity, the co-occurrence similarity, and the vector similarity.
Optionally, the updating unit includes:
a first updating subunit, configured to merge the first current evaluation phrase and the second current evaluation phrase into an evaluation phrase subset when the target similarity is greater than or equal to the first threshold, so as to update the evaluation phrase set and obtain the updated evaluation phrase set, and to determine a next group of first and second current evaluation phrases from the updated evaluation phrase set;
a second updating subunit, configured to retain the first current evaluation phrase and the second current evaluation phrase when the target similarity is smaller than the first threshold, acquire a next evaluation phrase from the evaluation phrase set as a new second current evaluation phrase, and then acquire the target similarity between the first current evaluation phrase and the new second current evaluation phrase.
Optionally, the generating module includes:
a first generating unit, configured to determine evaluation objects from the updated evaluation phrase set, wherein the evaluation objects include the target evaluation phrase of each evaluation phrase subset and the object evaluation phrases that have not been merged in the evaluation phrase set, and the target evaluation phrase is the evaluation phrase with the largest word frequency in its evaluation phrase subset;
a first comparison unit, configured to determine that the label generation condition is reached when the target similarity between every two evaluation objects in the updated evaluation phrase set is smaller than the first threshold;
a second generating unit, configured to take the evaluation objects as the evaluation labels.
In this embodiment of the application, evaluation phrases are obtained from the acquired evaluation texts, and the target similarity between evaluation phrases is compared with a preset threshold until the latest updated evaluation phrase set is obtained, from which the corresponding evaluation labels are generated. By controlling the label generation condition of the evaluation phrase set, evaluation labels are generated from the evaluation texts themselves, so that the labels correspond to the evaluation texts. This addresses the technical problem of evaluations being attributed incorrectly because of inaccurate evaluation labels.
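The calculation performed by the calculating unit (an edit similarity, a co-occurrence similarity, and a vector similarity combined by a weighted sum) might be sketched in Python as follows. Several details are assumptions made only for illustration, since this description does not fix them: the edit distance is normalized by the longer phrase length, the co-occurrence similarity is taken as the share of characters common to both phrases (a Jaccard ratio), the cosine similarity is used directly as the vector similarity, and the default weights (0.3, 0.3, 0.4) are hypothetical.

```python
import math
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cooccurrence_similarity(a: str, b: str) -> float:
    """Assumed normalization: shared characters over all distinct characters."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def target_similarity(
    a: str,
    b: str,
    vec_a: Sequence[float],
    vec_b: Sequence[float],
    weights: Tuple[float, float, float] = (0.3, 0.3, 0.4),  # hypothetical weights
) -> float:
    """Weighted sum of the three component similarities."""
    w_edit, w_co, w_vec = weights
    return (w_edit * edit_similarity(a, b)
            + w_co * cooccurrence_similarity(a, b)
            + w_vec * cosine_similarity(vec_a, vec_b))
```

With weights that sum to 1 and non-negative word-vector cosines, the result stays in [0, 1], which makes it directly comparable with a first threshold in the range [0.6, 0.94].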
According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the evaluation label generation method, where the electronic device may be a terminal device or a server shown in fig. 1. The present embodiment is described by taking an example in which the electronic device is a terminal device. As shown in fig. 7, the electronic device comprises a memory 702 and a processor 704, the memory 702 having stored therein a computer program, the processor 704 being arranged to perform the steps of any of the above-described method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring an evaluation text set issued by a plurality of user accounts in a network platform;
S2, performing word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation phrase set;
S3, converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrase;
S4, repeatedly executing the following steps until the label generation condition is reached:
S4-1, determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
S4-2, acquiring the target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity indicating the similarity between the evaluation phrases and a second similarity indicating the similarity between the evaluation word vectors;
S4-3, updating the evaluation phrase set according to the result of comparing the target similarity with the first threshold to obtain an updated evaluation phrase set, and determining a next group of first and second current evaluation phrases from the updated evaluation phrase set;
S5, when the updated evaluation phrase set reaches the label generation condition, extracting from the updated evaluation phrase set the evaluation label for display on the network platform.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, and a Mobile Internet Device (MID), a PAD, and the like. Fig. 7 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 7, or have a different configuration than shown in FIG. 7.
The memory 702 may be used to store software programs and modules, such as program instructions/modules corresponding to the evaluation tag generation method and apparatus in the embodiments of the present invention, and the processor 704 executes various functional applications and data processing by running the software programs and modules stored in the memory 702, so as to implement the above-described evaluation tag generation method. The memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 702 can further include memory located remotely from the processor 704, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 702 may be, but not limited to, specifically configured to store information such as sample characteristics of the item and the target virtual resource account number. As an example, as shown in fig. 8, the memory 802 may include, but is not limited to, an obtaining module 602, a processing module 604, a converting module 606, a controlling module 608, and a generating module 610 of the evaluation tag generating apparatus. In addition, the evaluation tag generation apparatus may further include, but is not limited to, other module units in the evaluation tag generation apparatus, which is not described in detail in this example.
Optionally, the transmission device 806 is configured to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 806 includes a network adapter (Network Interface Controller, NIC) that can be connected via a network cable to a router and other network devices so as to communicate with the internet or a local area network. In one example, the transmission device 806 is a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.
In addition, the electronic device further includes: a display 808, configured to display information of the evaluation text to be processed; and a connection bus 810 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system formed by a plurality of nodes connected through network communication. The nodes can form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal, or another electronic device, can become a node in the blockchain system by joining the peer-to-peer network.
According to yet another aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer readable storage medium. A processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the evaluation tag generation method provided in the various alternative implementations of the evaluation tag generation aspect described above, wherein the computer program is arranged to perform the steps in any of the method embodiments described above when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring an evaluation text set issued by a plurality of user accounts in a network platform;
S2, performing word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation phrase set;
S3, converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrase;
S4, repeatedly executing the following steps until the label generation condition is reached:
S4-1, determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
S4-2, acquiring the target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity indicating the similarity between the evaluation phrases and a second similarity indicating the similarity between the evaluation word vectors;
S4-3, updating the evaluation phrase set according to the result of comparing the target similarity with the first threshold to obtain an updated evaluation phrase set, and determining a next group of first and second current evaluation phrases from the updated evaluation phrase set;
S5, when the updated evaluation phrase set reaches the label generation condition, extracting from the updated evaluation phrase set the evaluation label for display on the network platform.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An evaluation label generation method, comprising:
acquiring an evaluation text set issued by a plurality of user accounts in a network platform;
performing word segmentation processing on each evaluation text in the evaluation text set to obtain an evaluation phrase set;
converting each evaluation phrase in the evaluation phrase set to obtain an evaluation word vector matched with the evaluation phrases;
repeatedly executing the following steps until the label generation condition is reached:
determining two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
acquiring target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity used for indicating similarity between the evaluation phrases and a second similarity used for indicating similarity between evaluation word vectors;
updating the evaluation phrase set according to the comparison result of the target similarity and a first threshold value to obtain an updated evaluation phrase set, and determining a next group of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set;
and under the condition that the updated evaluation phrase set reaches the label generation condition, extracting the evaluation label for displaying on the network platform from the updated evaluation phrase set.
2. The method according to claim 1, wherein the updating the evaluation phrase set according to the comparison of the target similarity with the first threshold to obtain the updated evaluation phrase set, and determining a next pair of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set, comprises:
when the target similarity is greater than or equal to the first threshold, merging the first current evaluation phrase and the second current evaluation phrase into an evaluation phrase subset, thereby updating the evaluation phrase set to obtain the updated evaluation phrase set, and determining a next pair of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set;
and, when the target similarity is smaller than the first threshold, retaining the first current evaluation phrase and the second current evaluation phrase, acquiring a next evaluation phrase from the evaluation phrase set as a new second current evaluation phrase, and continuing by acquiring the target similarity between the first current evaluation phrase and the new second current evaluation phrase.
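The merge-or-retain loop of claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `merge_phrases` is hypothetical, the similarity function is passed in as a parameter, and each subset is represented by its first phrase when it is compared with later candidates (the claims do not fix this choice).

```python
from typing import Callable, List


def merge_phrases(
    phrases: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Greedily merge phrases whose similarity reaches the threshold.

    Assumption: a subset is represented by its first phrase, so the
    representative never changes after a merge.
    """
    groups: List[List[str]] = [[p] for p in phrases]
    i = 0
    while i < len(groups):
        j = i + 1
        while j < len(groups):
            first, second = groups[i][0], groups[j][0]
            if similarity(first, second) >= threshold:
                # Merge the second phrase (and its subset) into the first.
                groups[i].extend(groups.pop(j))
            else:
                # Retain both and move on to the next candidate phrase.
                j += 1
        i += 1
    return groups
```

Because the representative of each subset is fixed, a single pass leaves every pair of representatives below the threshold, which corresponds to the stopping condition described in claim 4.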
3. The method according to claim 1, wherein the acquiring the target similarity between the first current evaluation phrase and the second current evaluation phrase comprises:
acquiring the first similarity and the second similarity, wherein the first similarity comprises: an editing similarity indicating the edit distance between the first current evaluation phrase and the second current evaluation phrase, and a co-occurrence similarity indicating the number of characters that occur in both the first current evaluation phrase and the second current evaluation phrase; and the second similarity comprises: a vector similarity indicating the cosine distance between a first current evaluation word vector corresponding to the first current evaluation phrase and a second current evaluation word vector corresponding to the second current evaluation phrase;
and determining the target similarity according to a weighted sum of the editing similarity, the co-occurrence similarity and the vector similarity.
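The weighted combination in claim 3 can be illustrated with a short sketch. The claim does not specify how each component is normalized or which weights are used, so the choices below are assumptions made for illustration: the edit distance is converted to a similarity by dividing by the longer phrase length, the co-occurrence similarity is the share of distinct characters common to both phrases, and the default weights (0.3, 0.3, 0.4) are arbitrary.

```python
import math
from typing import Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def editing_similarity(a: str, b: str) -> float:
    # Assumed normalization: 1 minus distance over the longer length.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cooccurrence_similarity(a: str, b: str) -> float:
    # Assumed normalization: shared distinct characters over all distinct characters.
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def vector_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def target_similarity(
    a: str,
    b: str,
    vec_a: Sequence[float],
    vec_b: Sequence[float],
    weights: Tuple[float, float, float] = (0.3, 0.3, 0.4),  # illustrative only
) -> float:
    w_edit, w_co, w_vec = weights
    return (
        w_edit * editing_similarity(a, b)
        + w_co * cooccurrence_similarity(a, b)
        + w_vec * vector_similarity(vec_a, vec_b)
    )
```

Combining the string-level scores with the vector score lets phrases that are spelled differently but mean the same thing still reach the threshold, and vice versa.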
4. The method according to claim 1, wherein the extracting, when the updated evaluation phrase set reaches the label generation condition, the evaluation label for display on the network platform from the updated evaluation phrase set comprises:
determining evaluation objects from the updated evaluation phrase set, wherein the evaluation objects comprise a target evaluation phrase in each evaluation phrase subset and any object evaluation phrase that has not been merged into an evaluation phrase subset, and the target evaluation phrase is the evaluation phrase with the highest word frequency in the evaluation phrase subset;
determining that the label generation condition is reached when the target similarity between the evaluation objects in the updated evaluation phrase set is smaller than the first threshold;
and taking the evaluation objects as the evaluation labels.
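The selection rule in claim 4 (the most frequent phrase of each merged subset, plus every unmerged phrase) can be sketched as below. The function name `extract_labels` and the representation of subsets as lists are assumptions made for illustration; word frequencies are taken here from a precomputed count of how often each phrase appears in the evaluation texts.

```python
from collections import Counter
from typing import List, Mapping


def extract_labels(groups: List[List[str]], phrase_counts: Mapping[str, int]) -> List[str]:
    """Pick one label per group: the phrase with the highest frequency.

    A group holding a single (unmerged) phrase yields that phrase itself.
    Ties are resolved in favour of the phrase listed first in the group.
    """
    return [max(group, key=lambda p: phrase_counts.get(p, 0)) for group in groups]


# Usage with hypothetical data:
counts = Counter(["fast", "faster", "faster", "slow"])
labels = extract_labels([["fast", "faster"], ["slow"]], counts)
```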
5. The method according to claim 1, wherein the performing word segmentation on each evaluation text in the evaluation text set to obtain the evaluation phrase set comprises:
cleaning the evaluation texts in the evaluation text set to obtain evaluation texts in a unified target format;
and performing word segmentation on the evaluation texts in the target format to obtain the evaluation phrase set.
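The clean-then-segment sequence of claim 5 might look like the following sketch. The claim does not say what the target format is, so the cleaning rules here (Unicode NFKC normalization, lower-casing, replacing punctuation with spaces) are assumptions, as are the function names. The default segmenter simply splits on whitespace; for Chinese review text a dedicated segmenter such as `jieba.lcut` would normally be passed in instead.

```python
import re
import unicodedata
from typing import Callable, List


def clean_text(text: str) -> str:
    """Normalize a review into an assumed target format."""
    text = unicodedata.normalize("NFKC", text)  # fold full-width forms to ASCII
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)        # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


def segment(text: str, segmenter: Callable[[str], List[str]] = str.split) -> List[str]:
    """Clean the text, then hand it to the supplied segmenter."""
    return segmenter(clean_text(text))
```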
6. The method according to any one of claims 1 to 5, wherein after the obtaining of the evaluation text set published by the plurality of user accounts in the network platform, the method further comprises:
performing a global heuristic calculation on the evaluation text set to obtain the first threshold.
7. An evaluation label generation apparatus, comprising:
an acquisition module, configured to acquire an evaluation text set published by a plurality of user accounts on a network platform;
a processing module, configured to perform word segmentation on each evaluation text in the evaluation text set to obtain an evaluation phrase set;
a conversion module, configured to convert each evaluation phrase in the evaluation phrase set into an evaluation word vector matching that evaluation phrase;
a control module, configured to repeatedly execute the following steps until a label generation condition is reached:
a determining unit, configured to determine two evaluation phrases from the evaluation phrase set as a first current evaluation phrase and a second current evaluation phrase;
a calculating unit, configured to acquire a target similarity between the first current evaluation phrase and the second current evaluation phrase, wherein the target similarity is determined according to a first similarity indicating the similarity between the evaluation phrases and a second similarity indicating the similarity between the evaluation word vectors;
an updating unit, configured to update the evaluation phrase set according to a comparison of the target similarity with a first threshold to obtain an updated evaluation phrase set, and to determine a next pair of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set;
and a generating module, configured to extract, when the updated evaluation phrase set reaches the label generation condition, an evaluation label for display on the network platform from the updated evaluation phrase set.
8. The apparatus of claim 7, wherein:
the calculating unit comprises: a first calculating subunit, configured to acquire the first similarity and the second similarity, wherein the first similarity comprises an editing similarity indicating the edit distance between the first current evaluation phrase and the second current evaluation phrase, and a co-occurrence similarity indicating the number of characters that occur in both the first current evaluation phrase and the second current evaluation phrase, and the second similarity comprises a vector similarity indicating the cosine distance between a first current evaluation word vector corresponding to the first current evaluation phrase and a second current evaluation word vector corresponding to the second current evaluation phrase; and a second calculating subunit, configured to determine the target similarity according to a weighted sum of the editing similarity, the co-occurrence similarity and the vector similarity;
the updating unit comprises: a first updating subunit, configured to, when the target similarity is greater than or equal to the first threshold, merge the first current evaluation phrase and the second current evaluation phrase into an evaluation phrase subset, thereby updating the evaluation phrase set to obtain the updated evaluation phrase set, and to determine a next pair of the first current evaluation phrase and the second current evaluation phrase from the updated evaluation phrase set; and a second updating subunit, configured to, when the target similarity is smaller than the first threshold, retain the first current evaluation phrase and the second current evaluation phrase, acquire a next evaluation phrase from the evaluation phrase set as a new second current evaluation phrase, and continue by acquiring the target similarity between the first current evaluation phrase and the new second current evaluation phrase;
the generating module comprises: a first generating unit, configured to determine evaluation objects from the updated evaluation phrase set, wherein the evaluation objects comprise a target evaluation phrase in each evaluation phrase subset and any object evaluation phrase that has not been merged into an evaluation phrase subset, and the target evaluation phrase is the evaluation phrase with the highest word frequency in the evaluation phrase subset; a first comparing unit, configured to determine that the label generation condition is reached when the target similarity between the evaluation objects in the updated evaluation phrase set is smaller than the first threshold; and a second generating unit, configured to take the evaluation objects as the evaluation labels.
9. A computer-readable storage medium, comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
CN202011091969.6A 2020-10-13 2020-10-13 Evaluation label generation method and device, storage medium and electronic equipment Withdrawn CN112184323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011091969.6A CN112184323A (en) 2020-10-13 2020-10-13 Evaluation label generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011091969.6A CN112184323A (en) 2020-10-13 2020-10-13 Evaluation label generation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112184323A true CN112184323A (en) 2021-01-05

Family

ID=73949628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011091969.6A Withdrawn CN112184323A (en) 2020-10-13 2020-10-13 Evaluation label generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112184323A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN105825396A (en) * 2016-03-11 2016-08-03 合网络技术(北京)有限公司 Co-occurrence-based advertisement label clustering method and system
CN105975453A (en) * 2015-12-01 2016-09-28 乐视网信息技术(北京)股份有限公司 Method and device for comment label extraction
CN106777285A (en) * 2016-12-29 2017-05-31 中国移动通信集团江苏有限公司 The method and apparatus of label clustering
CN108470065A (en) * 2018-03-22 2018-08-31 北京航空航天大学 A kind of determination method and device of exception comment text
CN109461037A (en) * 2018-12-17 2019-03-12 北京百度网讯科技有限公司 Comment on viewpoint clustering method, device and terminal
CN109871447A (en) * 2019-03-05 2019-06-11 南京甄视智能科技有限公司 Clustering method, computer program product and the server system of Chinese comment unsupervised learning
CN110688455A (en) * 2019-09-09 2020-01-14 深圳壹账通智能科技有限公司 Method, medium and computer equipment for filtering invalid comments based on artificial intelligence
CN110750646A (en) * 2019-10-16 2020-02-04 乐山师范学院 Attribute description extracting method for hotel comment text
CN111259131A (en) * 2020-01-09 2020-06-09 杭州网易再顾科技有限公司 Information processing method, medium, device and computing equipment
CN111339247A (en) * 2020-02-11 2020-06-26 安徽理工大学 Microblog subtopic user comment emotional tendency analysis method
CN111414479A (en) * 2020-03-16 2020-07-14 北京智齿博创科技有限公司 Label extraction method based on short text clustering technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210105