CN115168677A

CN115168677A - Comment classification method, device, equipment and storage medium

Info

Publication number: CN115168677A
Application number: CN202210646589.7A
Authority: CN
Inventors: 甘心; 肖冠正
Original assignee: iMusic Culture and Technology Co Ltd
Current assignee: iMusic Culture and Technology Co Ltd
Priority date: 2022-06-09
Filing date: 2022-06-09
Publication date: 2022-10-11
Anticipated expiration: 2042-06-09
Also published as: CN115168677B

Abstract

The invention discloses a comment classification method, device, equipment and storage medium. Comment data are obtained, comprising a user comment, the comment time corresponding to the user comment, user attributes and a user IP address. A first true probability is obtained according to the user comment, the user attributes and a natural language processing model. The number of comments posted from the same user IP address within a preset time range is determined according to the comment time, and a second true probability is calculated according to that number of comments and a function model. A comment classification result is determined according to the first true probability and the second true probability. Because the comment time, the user IP address and the function model assist the natural language processing model, the accuracy of comment classification is improved; comment classification results are generated automatically without manual intervention, which improves both accuracy and efficiency. The invention can be widely applied in the technical field of natural language processing.

Description

Comment classification method, device, equipment and storage medium

Technical Field

The invention relates to the field of natural language processing, in particular to a comment classification method, device, equipment and storage medium.

Background

At present, with the development of internet technology, the number of users of various platforms is gradually increasing. Users can watch or listen to content such as videos, music and video ring-back tones on these platforms and can post comments in the comment areas to express their feelings. However, the comments may include false comments, for example machine-generated ones, which undermine the authenticity of the comment area, so true and false comments need to be distinguished. Currently this is usually done by manual review. Because the number of video ring-back tone comments is large, manual review requires a continuous input of manpower and time, resulting in high cost, poor real-time performance, low efficiency and low accuracy.

Disclosure of Invention

In view of the above, in order to solve at least one of the above technical problems, an object of the present invention is to provide a comment classification method, apparatus, device and storage medium, which improve accuracy and efficiency.

The embodiment of the invention adopts the technical scheme that:

a comment classification method, comprising:

obtaining comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;

obtaining a first real probability according to the user comment, the user attribute and a natural language processing model;

determining, according to the comment time, the number of comments posted from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and a function model;

and determining a comment classification result according to the first real probability and the second real probability.

Further, the obtaining a first true probability according to the user comment, the user attribute, and a natural language processing model includes:

carrying out first coding processing on the user attribute to obtain a first matrix;

carrying out second coding processing on the user comment to obtain a second matrix;

and splicing the first matrix and the second matrix, and converting a splicing result through a full connection layer and a sigmoid function to obtain a first true probability.

Further, the performing a first encoding process on the user attribute to obtain a first matrix includes:

encoding the user attribute into a first vector through a word vector model;

encoding the first vector into context-dependent vectors by a GRU encoder;

and splicing the context-dependent vectors to obtain a first matrix.

Further, the performing a second encoding process on the user comment to obtain a second matrix includes:

encoding the user comment into a second vector through a word vector model;

constructing a Query vector according to the second vector and the first weight, constructing a Key vector according to the second vector and the second weight, and constructing a Value vector according to the second vector and the third weight;

unifying the Query vector, the Key vector and the Value vector to a preset length;

and calculating a self-attention matrix representation according to the Query vector, the Key vector and the Value vector of the preset length to obtain a second matrix.

Further, the calculating to obtain a second true probability according to the number of the comments and the function model includes:

calculating the product of the number of comments and a slope parameter;

calculating a difference between an intercept parameter and the product;

and obtaining a second true probability according to the sigmoid function and the difference value.

Further, the determining a comment classification result according to the first real probability and the second real probability includes:

performing weighted summation according to the first true probability, the first probability weight, the second true probability and the second probability weight;

and when the weighted sum result is larger than a real threshold value, obtaining a comment classification result representing the real comment, otherwise obtaining a comment classification result representing the false comment.

Further, the method further comprises:

when the comment classification result represents a real comment, publishing the user comment;

or,

and when the comment classification result represents a false comment, deleting the user comment.

An embodiment of the present invention further provides a comment classification device, including:

the acquisition module is used for acquiring comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;

the first processing module is used for obtaining a first real probability according to the user comment, the user attribute and a natural language processing model;

the second processing module is used for determining, according to the comment time, the number of comments posted from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and a function model;

and the classification module is used for determining a comment classification result according to the first real probability and the second real probability.

An embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method.

Embodiments of the present invention also provide a computer-readable storage medium, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method.

The beneficial effects of the invention are as follows. Comment data are obtained, comprising a user comment, the comment time corresponding to the user comment, user attributes and a user IP address. A first true probability is obtained according to the user comment, the user attributes and a natural language processing model. The number of comments posted from the same user IP address within a preset time range is determined according to the comment time, and a second true probability is calculated according to that number of comments and a function model. A comment classification result is determined according to the first true probability and the second true probability. Because the comment time, the user IP address and the function model assist the natural language processing model, the accuracy of comment classification is improved; and because the comment classification result is generated automatically without manual intervention, both accuracy and efficiency are improved.

Drawings

FIG. 1 is a flow chart illustrating steps of a comment classification method according to the present invention;

FIG. 2 is a flowchart of a comment classification method according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.

As shown in fig. 1, an embodiment of the present invention provides a comment classification method, including steps S100 to S400:

S100, obtaining comment data.

In the embodiment of the invention, the comment data include, but are not limited to, the user comment, the comment time corresponding to the user comment, the user attributes and the user IP address. Optionally, the comment data are obtained through a data layer and may be collected by the system when a user comments on a webpage, in an app, in an applet, or the like. For example, when a user sends a comment request in the video ring-back tone comment area of a webpage, the system obtains from the comment request the user comment that the user wants to post, the comment time (such as the time at which the comment request was sent), the IP address from which the request was sent, and the user attributes. The user attributes include, but are not limited to, information such as user name, level and activity. In addition, when a system (such as a video ring-back tone system) receives a user's comment request, it can obtain the comment data and process them to obtain a comment classification result, and publish the user comment only after the classification result indicates that the comment passes review, thereby achieving delayed publication of the user comment.
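As an illustration only, the comment data described in S100 could be held in a small record type. The sketch below is a minimal assumption-laden example: the class names, field names and sample values are hypothetical and do not come from the source, which only lists the kinds of information collected.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserAttributes:
    # Hypothetical field names; the text lists user name, level and activity.
    user_name: str
    level: int
    activity: float


@dataclass(frozen=True)
class CommentData:
    user_comment: str                 # text the user wants to post
    comment_time: datetime            # e.g. the time the comment request was sent
    user_attributes: UserAttributes   # attributes of the commenting user
    user_ip: str                      # IP address the request was sent from


# Invented sample values, used only to show the shape of the record.
sample = CommentData(
    user_comment="Great ring-back video!",
    comment_time=datetime(2022, 6, 9, 12, 0, 0),
    user_attributes=UserAttributes("alice", level=3, activity=0.8),
    user_ip="203.0.113.7",
)
```

Keeping the record immutable is a design choice of this sketch: the same data can then be handed to the model layer and the application layer without risk of being modified while the comment is awaiting delayed publication.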

As shown in fig. 2, S200, a first true probability is obtained according to the user comment, the user attribute, and the natural language processing model.

It should be noted that the calculation of the first true probability and the second true probability and the comment classification result may be completed in a model layer, and a natural language processing model and a function model are disposed in the model layer.

Optionally, step S200 may include steps S210-S230:

S210, performing first encoding processing on the user attributes to obtain a first matrix.

Specifically, step S210 includes steps S2101-S2103:

S2101, encoding the user attributes into a first vector through a word vector model.

Optionally, the word vector model includes, but is not limited to, models such as word2vec and GloVe. In the embodiment of the present invention, GloVe is taken as an example, and the user attributes are encoded by GloVe into the first vector, which is specifically a set of word vectors.

S2102, encode the first vector as a context dependent vector by the GRU encoder.

S2103, splicing the context correlation vectors to obtain a first matrix.

It should be noted that the GRU encoder is a GRU-based autoencoder, such as a bidirectional GRU encoder. The GRU encoder encodes the first vector, whose $t$-th word vector is denoted $x_t$, into the context-dependent vectors $h_t$. The specific process is as follows:

$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$

$$r_t = \sigma(W_r x_t + U_r h_{t-1})$$

$$\tilde{h}_t = \tanh\left(W x_t + U (r_t \odot h_{t-1})\right)$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

where $z_t$ is the update gate, which determines how much information the current state retains from the historical state and how much it accepts from the candidate state; $\sigma$ is the sigmoid function, which maps data to a value in the range 0 to 1; $W_z$ and $U_z$ are the weight parameters of the update gate; $h_{t-1}$ is the hidden state of the previous node, i.e., the historical state; $t$ is the sequence number of the word; $r_t$ is the reset gate, which controls whether the calculation of the candidate state depends on the historical state; $W_r$ and $U_r$ are the weight parameters of the reset gate; $W$ and $U$ are the weight parameters of the candidate state; $\odot$ denotes element-wise multiplication of vectors; and $\tilde{h}_t$ is the candidate state.

In the embodiment of the invention, the context-dependent vectors are spliced to obtain a first matrix H, i.e., the encoding of the user attribute sequence.
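The GRU encoding of S2102 and S2103 can be sketched in plain Python as follows. This is a minimal sketch under several assumptions: it uses the standard GRU gate form without bias terms, it runs in one direction only (the text mentions a bidirectional encoder as one option), the weights are random placeholders rather than trained values, and all function names and dictionary keys (`gru_step`, `gru_encode`, `random_params`, `"W_z"`, `"W_r"` and so on) are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(values):
    """Element-wise sigmoid, mapping each value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def gru_step(x_t, h_prev, p):
    """One GRU step: returns h_t given word vector x_t and history h_prev."""
    z_t = sigmoid([a + b for a, b in zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev))])
    r_t = sigmoid([a + b for a, b in zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev))])
    gated = [r * h for r, h in zip(r_t, h_prev)]  # reset gate applied element-wise
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x_t), matvec(p["U"], gated))]
    # The update gate blends the historical state with the candidate state.
    return [z * h + (1.0 - z) * c for z, h, c in zip(z_t, h_prev, cand)]


def gru_encode(word_vectors, p, hidden_size):
    """Encode a sequence and stack the h_t vectors as rows of the first matrix H."""
    h = [0.0] * hidden_size
    rows = []
    for x_t in word_vectors:
        h = gru_step(x_t, h, p)
        rows.append(h)
    return rows


def random_params(input_size, hidden_size, seed=0):
    """Untrained placeholder weights; a real model would learn these."""
    rng = random.Random(seed)

    def mat(n_rows, n_cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)]

    params = {}
    for gate in ("z", "r"):
        params["W_" + gate] = mat(hidden_size, input_size)
        params["U_" + gate] = mat(hidden_size, hidden_size)
    params["W"] = mat(hidden_size, input_size)
    params["U"] = mat(hidden_size, hidden_size)
    return params


# Three toy 2-dimensional word vectors encoded into a 3 x 3 matrix H.
H = gru_encode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], random_params(2, 3), 3)
```

Each row of `H` is the hidden state after reading one more word vector, so later rows carry context from earlier attributes.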

As shown in fig. 2, in S220, a second encoding process is performed on the user comment, so as to obtain a second matrix.

Specifically, step S220 includes steps S2201-S2204:

S2201, encoding the user comment into a second vector through a word vector model.

Similarly, the word vector model includes, but is not limited to, models such as word2vec and GloVe. In the embodiment of the present invention, GloVe is taken as an example, and the user comment is encoded by GloVe into the second vector, which is specifically a set of word vectors.

S2202, constructing a Query vector according to the second vector and the first weight, constructing a Key vector according to the second vector and the second weight, and constructing a Value vector according to the second vector and the third weight.

It should be noted that the first weight $w_q$, the second weight $w_k$ and the third weight $w_v$ can be adjusted according to actual needs. Specifically, the Query vector is obtained as the first product of the second vector and the first weight $w_q$, the Key vector as the second product of the second vector and the second weight $w_k$, and the Value vector as the third product of the second vector and the third weight $w_v$.

S2203, uniformly processing the Query vector, the Key vector and the Value vector to preset lengths.

Optionally, each vector is encoded into a self-attention vector by a Transformer-based encoder, which gives a better detection effect on long user comments; the encoding process of the Transformer-based encoder corresponds to steps S2203 and S2204. In the embodiment of the present invention, the preset length max_length may be adjusted according to actual conditions. The lengths of the Query vector, the Key vector and the Value vector are each compared with the preset length max_length. When a length is greater than max_length, the excess part of the vector is truncated so that its length equals max_length; when a length is smaller than max_length, the tail of the vector is padded with content such as meaningless symbols or words (i.e., the corresponding word vectors) so that its length equals max_length. It should be noted that, in some embodiments, the user comment may instead be unified to the preset length in advance, before the second encoding processing is performed; this is not specifically limited.
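The truncate-or-pad rule of S2203 can be sketched as below. This is a minimal sketch that assumes "length" means the number of per-word vectors in the sequence and that padding uses a fixed filler vector; the function name `to_preset_length` and the parameter `pad_vector` are hypothetical.

```python
def to_preset_length(vectors, max_length, pad_vector):
    """Truncate or tail-pad a sequence of vectors to exactly max_length items."""
    if len(vectors) >= max_length:
        # Too long (or already exact): keep only the first max_length vectors.
        return [list(v) for v in vectors[:max_length]]
    # Too short: append copies of the filler vector at the tail.
    missing = max_length - len(vectors)
    return [list(v) for v in vectors] + [list(pad_vector) for _ in range(missing)]


truncated = to_preset_length([[1.0], [2.0], [3.0]], 2, [0.0])
padded = to_preset_length([[1.0]], 3, [0.0])
```

The same helper would be applied to the Query, Key and Value sequences in turn, or once to the comment's word vectors if the length is unified before the second encoding processing.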

S2204, calculating a matrix expression with self attention according to the Query vector with the preset length, the Key vector with the preset length and the Value vector with the preset length to obtain a second matrix.

It should be noted that the second vector includes vectors corresponding to a plurality of words, so that each word has a corresponding Query vector, Key vector and Value vector of the preset length. The Query vectors of the words are spliced in sequence into a matrix Q, the Key vectors into a matrix K, and the Value vectors into a matrix V. A self-attention matrix representation is then calculated from the matrices Q, K and V, specifically:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where Attention() is the self-attention function; Attention(Q, K, V) is the second matrix, i.e., the encoding of the comment sequence; softmax() is the softmax function; $T$ denotes transposition; $\sqrt{d_k}$ is the scaling factor; and $d_k$ is the dimension of the Key vector.
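Steps S2202 and S2204 can be sketched together in plain Python. This is a minimal single-head sketch assuming scaled dot-product self-attention with a row-wise softmax; the weights are supplied by the caller as matrices, and the helper names (`matmul`, `transpose`, `softmax`, `self_attention`) are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Build Q, K, V from the word vectors x and return softmax(QK^T / sqrt(d_k)) V."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])  # dimension of each Key vector
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# Toy input: three 2-dimensional word vectors and identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
second_matrix = self_attention(x, identity, identity, identity)
```

Each output row is a weighted average of the Value vectors, with weights determined by how strongly that word's Query matches every word's Key.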

S230, splicing (concatenating) the first matrix and the second matrix, and converting the splicing result through a fully connected layer and a sigmoid function to obtain the first true probability $p_1$ (i.e., the comment true probability P1).
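Step S230 can be sketched as follows. The source does not say how the two matrices are laid out before the fully connected layer, so this sketch assumes both are flattened and concatenated into one feature vector and that the layer has a single output unit; the names `flatten` and `first_true_probability` and the `weights` and `bias` parameters are hypothetical.

```python
import math


def flatten(matrix):
    """Flatten a matrix (list of rows) into one list of values."""
    return [v for row in matrix for v in row]


def first_true_probability(first_matrix, second_matrix, weights, bias):
    """Concatenate both encodings, apply one linear unit, then a sigmoid."""
    features = flatten(first_matrix) + flatten(second_matrix)
    if len(features) != len(weights):
        raise ValueError("weights must match the spliced feature length")
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# With all-zero weights the logit is 0, so the probability is exactly 0.5.
p1_neutral = first_true_probability([[0.1, 0.2]], [[0.3], [0.4]], [0.0] * 4, 0.0)
p1_positive = first_true_probability([[0.1, 0.2]], [[0.3], [0.4]], [1.0] * 4, 0.0)
```

In a trained model the weights and bias would be learned jointly with the two encoders; the values above are placeholders.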

As shown in FIG. 2, S300, determining, according to the comment time, the number of comments posted from the same user IP address within a preset time range, and calculating a second true probability (i.e., the comment true probability P2) according to the number of comments and the function model.

It should be noted that the preset time range may be set as needed. Taking 10 seconds as an example, it is determined whether comments posted from the same user IP address exist within the preset time range from 5 seconds before the comment time to 5 seconds after it; if so, they are counted, and the total is the number of comments.
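Counting comments from the same IP address inside the time window can be sketched as below. This is a minimal sketch assuming a 10-second window centred on the comment time with inclusive boundaries, and a history given as `(ip, time)` pairs; whether the current comment itself appears in the history is left to the caller. The function name, parameter names and sample addresses are hypothetical.

```python
from datetime import datetime, timedelta


def count_same_ip_comments(comment_time, user_ip, history,
                           half_window=timedelta(seconds=5)):
    """Count comments from user_ip within +/- half_window of comment_time."""
    start = comment_time - half_window
    end = comment_time + half_window
    return sum(1 for ip, t in history if ip == user_ip and start <= t <= end)


t0 = datetime(2022, 6, 9, 12, 0, 0)
history = [
    ("203.0.113.7", t0 - timedelta(seconds=3)),   # same IP, inside the window
    ("203.0.113.7", t0 + timedelta(seconds=4)),   # same IP, inside the window
    ("203.0.113.7", t0 + timedelta(seconds=6)),   # same IP, outside the window
    ("198.51.100.2", t0),                         # different IP
]
n = count_same_ip_comments(t0, "203.0.113.7", history)
```

Because the window extends after the comment time, the count is only complete once the window has closed, which fits the delayed publication described for the system.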

Specifically, in step S300, calculating the second true probability according to the number of comments and the function model includes steps S310 to S330:

S310, calculating the product of the number of comments and a slope parameter.

S320, calculating the difference between an intercept parameter and the product.

S330, obtaining the second true probability according to the sigmoid function and the difference.

Optionally, the slope parameter and the intercept parameter of the function model in the embodiment of the present invention may be obtained by training in advance or set to appropriate values based on prior knowledge from a large amount of data; the sigmoid function maps the value of the linear function into the interval from 0 to 1.

Specifically, the second true probability $p_2$ is calculated as:

$$p_2 = \mathrm{sigmoid}(b - kn)$$

where $b$ is the intercept parameter, $k$ is the slope parameter, and $n$ is the number of comments.
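The function model of steps S310 to S330 can be sketched directly from the formula. This is a minimal sketch; the function names are hypothetical, and the slope and intercept values used in the example calls are arbitrary placeholders, not values from the source.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid for a single value."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def second_true_probability(n, k, b):
    """p2 = sigmoid(b - k * n): intercept minus slope times comment count."""
    product = k * n            # S310
    difference = b - product   # S320
    return sigmoid(difference)  # S330


p2_few = second_true_probability(1, k=1.0, b=2.0)
p2_many = second_true_probability(10, k=1.0, b=2.0)
```

With a positive slope, the probability falls as more comments arrive from the same IP address within the window, which is the behaviour the formula encodes.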

S400, determining a comment classification result according to the first real probability and the second real probability.

Specifically, step S400 includes steps S410-S420:

As shown in FIG. 2, S410, performing weighted summation according to the first true probability, the first probability weight, the second true probability and the second probability weight.

Specifically, the weighted sum result $p$ is calculated as:

$$p = \alpha p_1 + \beta p_2$$

where $p_1$ is the first true probability, $p_2$ is the second true probability, $\alpha$ is the first probability weight, and $\beta$ is the second probability weight; the values of the first probability weight and the second probability weight may be adjusted as needed.

S420, when the weighted sum result is greater than a real threshold, obtaining a comment classification result representing a real comment; otherwise, obtaining a comment classification result representing a false comment.

Specifically, when the weighted sum result is greater than a real threshold t, a comment classification result representing a real comment is obtained, and the current user comment is considered to be a real comment; and when the weighted sum result is less than or equal to the real threshold value, obtaining a comment classification result representing the false comment, and considering that the current user comment is the false comment.
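Steps S410 and S420 can be sketched as a single function. This is a minimal sketch; the function name, the string labels `"real"` and `"false"`, and the weight and threshold values in the example calls are hypothetical choices, since the source leaves these values adjustable.

```python
def classify_comment(p1, p2, alpha, beta, threshold):
    """Weighted sum of both probabilities, compared against the real threshold."""
    p = alpha * p1 + beta * p2
    # Strictly greater than the threshold means real; equal or lower means false.
    return "real" if p > threshold else "false"


likely_real = classify_comment(0.9, 0.8, alpha=0.5, beta=0.5, threshold=0.6)
likely_false = classify_comment(0.9, 0.1, alpha=0.5, beta=0.5, threshold=0.6)
```

Note that a comment whose text looks genuine (high first probability) can still be classified as false when the IP-based second probability is low, which is how the function model assists the language model.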

Optionally, the comment classification method according to the embodiment of the present invention may further include step S500 or S600, and step S500 or S600 may be executed by the application layer:

S500, when the comment classification result represents a real comment, publishing the user comment.

Specifically, when the system obtains a comment classification result representing a real comment, the current user comment is considered a real comment, it passes review, and the user is allowed to publish it.

S600, when the comment classification result represents a false comment, deleting the user comment.

Specifically, when the system obtains a comment classification result representing a false comment, the current user comment is considered a false comment and does not pass review; it is either not allowed to be published or is deleted after it has been published.
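How the application layer might act on the classification result in S500 and S600 can be sketched as below. This is a minimal sketch of delayed publication using two in-memory dictionaries; the function name, the dictionary-based storage and the returned status strings are hypothetical, and a real system would use its own storage and review workflow.

```python
def apply_classification(pending, published, comment_id, is_real):
    """Publish a pending comment if it is real, otherwise drop it."""
    text = pending.pop(comment_id)  # the comment leaves the pending queue either way
    if is_real:
        published[comment_id] = text  # S500: publish the real comment
        return "published"
    return "deleted"                  # S600: the false comment is discarded


pending = {1: "Love this ring-back video", 2: "Buy followers at example.invalid"}
published = {}
first_status = apply_classification(pending, published, 1, is_real=True)
second_status = apply_classification(pending, published, 2, is_real=False)
```

Holding comments in a pending state until classification finishes corresponds to the variant in which a false comment is never published, rather than being deleted after publication.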

With this comment classification method, when a user comments in an actual comment scenario such as a video ring-back tone system, the comment can be detected and classified in real time, overcoming the limitation that conventional video ring-back tone systems do not detect and classify user comments in real time. The accuracy of comment classification is further improved by combining a self-attention mechanism, delayed publication, IP address checking, user attribute extraction, stacking of the natural language processing model and the function model, GRU encoding and the Transformer. Meanwhile, the system backend can determine whether a comment is a false, machine-generated comment by comparing the weighted sum result with the real threshold and automatically delete such comments, which reduces manual intervention, enables real-time monitoring, analysis and cleaning, and is highly efficient.

An embodiment of the present invention further provides a comment classification apparatus, including:

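the acquisition module is used for obtaining comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;

the first processing module is used for obtaining a first true probability according to the user comment, the user attribute and the natural language processing model;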

the second processing module is used for determining, according to the comment time, the number of comments posted from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and the function model;

and the classification module is used for determining the comment classification result according to the first real probability and the second real probability.

The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.

The embodiment of the present invention further provides an electronic device, where the electronic device includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the comment classification method of the foregoing embodiment. The electronic device of the embodiment of the invention includes but is not limited to a mobile phone, a tablet computer, a vehicle-mounted computer and the like.

The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the beneficial effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.

An embodiment of the present invention further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the comment classification method of the foregoing embodiment.

Embodiments of the present invention also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the comment classification method of the foregoing embodiment.

The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that, in this application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing programs, such as a usb disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A comment classification method, comprising:

2. The comment classification method according to claim 1, characterized in that: obtaining a first true probability according to the user comment, the user attribute and the natural language processing model, including:

3. The comment classification method of claim 2, wherein: the performing a first encoding process on the user attribute to obtain a first matrix includes:

encoding the user attribute into a first vector through a word vector model;

encoding, by a GRU encoder, the first vector as a context dependent vector;

and splicing the context correlation vectors to obtain a first matrix.

4. The comment classification method according to claim 2, characterized in that: performing second encoding processing on the user comment to obtain a second matrix, wherein the second encoding processing includes:

encoding the user comment into a second vector through a word vector model;

and calculating a matrix expression with self attention according to the Query vector with the preset length, the Key vector with the preset length and the Value vector with the preset length to obtain a second matrix.

5. The comment classification method according to claim 1, characterized in that: calculating to obtain a second true probability according to the number of the comments and the function model, wherein the calculating comprises the following steps:

calculating the product of the number of comments and a slope parameter;

calculating a difference between the intercept parameter and the product;

6. The comment classification method according to any one of claims 1 to 5, characterized in that: the determining a comment classification result according to the first real probability and the second real probability includes:

7. The comment classification method of claim 6, wherein: the method further comprises the following steps:

or,

8. A comment classification apparatus characterized by comprising:

9. An electronic device, characterized in that: the electronic device comprises a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method according to any one of claims 1-7.

10. A computer-readable storage medium characterized by: the storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by a processor to implement the method of any one of claims 1-7.