CN115168677A - Comment classification method, device, equipment and storage medium

Info

Publication number
CN115168677A
Authority
CN
China
Prior art keywords
comment
user
vector
probability
comments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210646589.7A
Other languages
Chinese (zh)
Other versions
CN115168677B (en)
Inventor
甘心
肖冠正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iMusic Culture and Technology Co Ltd
Original Assignee
iMusic Culture and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iMusic Culture and Technology Co Ltd
Priority to CN202210646589.7A
Publication of CN115168677A
Application granted
Publication of CN115168677B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a comment classification method, device, equipment and storage medium. Comment data are obtained, the comment data comprising a user comment, the comment time corresponding to the user comment, user attributes and a user IP address. A first real probability is obtained according to the user comment, the user attributes and a natural language processing model. The number of comments from the same user IP address within a preset time range is determined according to the comment time, and a second real probability is calculated according to the number of comments and a function model. A comment classification result is then determined according to the first real probability and the second real probability. Because the function model, which is based on the comment time and the user IP address, assists the natural language processing model, the accuracy of comment classification is improved. The comment classification result is generated automatically without manual intervention, which improves accuracy and efficiency, and the method can be widely applied in the technical field of natural language processing.

Description

Comment classification method, device, equipment and storage medium
Technical Field
The invention relates to the field of natural language processing, in particular to a comment classification method, device, equipment and storage medium.
Background
At present, with the development of internet technology, the number of users on various platforms is gradually increasing. Users can watch or listen to content such as videos, music and video color ring back tones on these platforms, and can post comments in the comment area to express their feelings. In practice, user comments may include false comments, such as machine-generated ones, which undermine the authenticity of the comments, so true and false comments need to be distinguished. At present this is usually done by manual review. Because the number of comments on video color ring back tones is large, manual review requires a continuous investment of manpower and time, and suffers from high cost, poor real-time performance, low efficiency and low accuracy.
Disclosure of Invention
In view of the above, in order to solve at least one of the above technical problems, an object of the present invention is to provide a comment classification method, apparatus, device and storage medium, which improve accuracy and efficiency.
The embodiment of the invention adopts the technical scheme that:
A comment classification method, comprising:
obtaining comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;
obtaining a first real probability according to the user comment, the user attribute and a natural language processing model;
determining, according to the comment time, the number of comments from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and a function model;
and determining a comment classification result according to the first real probability and the second real probability.
Further, the obtaining a first true probability according to the user comment, the user attribute, and a natural language processing model includes:
carrying out first coding processing on the user attribute to obtain a first matrix;
carrying out second coding processing on the user comment to obtain a second matrix;
and splicing the first matrix and the second matrix, and converting a splicing result through a full connection layer and a sigmoid function to obtain a first true probability.
Further, the performing a first encoding process on the user attribute to obtain a first matrix includes:
encoding the user attribute into a first vector through a word vector model;
encoding the first vector as a context dependent vector by a GRU encoder;
and splicing the context correlation vectors to obtain a first matrix.
Further, the performing a second encoding process on the user comment to obtain a second matrix includes:
encoding the user comment into a second vector through a word vector model;
constructing a Query vector according to the second vector and the first weight, constructing a Key vector according to the second vector and the second weight, and constructing a Value vector according to the second vector and the third weight;
uniformly processing the Query vector, the Key vector and the Value vector to preset lengths;
and calculating a matrix expression with self attention according to the Query vector with a preset length, the Key vector with a preset length and the Value vector with a preset length to obtain a second matrix.
Further, the calculating to obtain a second true probability according to the number of the comments and the function model includes:
calculating the product of the number of comments and a slope parameter;
calculating a difference between an intercept parameter and the product;
and obtaining a second true probability according to the sigmoid function and the difference value.
Further, the determining a comment classification result according to the first real probability and the second real probability includes:
performing weighted summation according to the first true probability, the first probability weight, the second true probability and the second probability weight;
and when the weighted sum result is larger than a real threshold value, obtaining a comment classification result representing the real comment, otherwise obtaining a comment classification result representing the false comment.
Further, the method further comprises:
when the comment classification result represents a real comment, publishing the user comment;
or,
and when the comment classification result represents a false comment, deleting the user comment.
An embodiment of the present invention further provides a comment classification device, including:
the acquisition module is used for acquiring comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;
the first processing module is used for obtaining a first real probability according to the user comment, the user attribute and a natural language processing model;
the second processing module is used for determining, according to the comment time, the number of comments from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and a function model;
and the classification module is used for determining a comment classification result according to the first real probability and the second real probability.
An embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the method.
Embodiments of the present invention also provide a computer-readable storage medium, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method.
The beneficial effects of the invention are as follows. Comment data are obtained, the comment data comprising a user comment, the comment time corresponding to the user comment, user attributes and a user IP address. A first real probability is obtained according to the user comment, the user attributes and a natural language processing model. The number of comments from the same user IP address within a preset time range is determined according to the comment time, and a second real probability is calculated according to the number of comments and a function model. A comment classification result is determined according to the first real probability and the second real probability. Because the function model, which is based on the comment time and the user IP address, assists the natural language processing model, the accuracy of comment classification is improved; and because the comment classification result is generated automatically without manual intervention, both accuracy and efficiency are improved.
Drawings
FIG. 1 is a flow chart illustrating steps of a comment classification method according to the present invention;
FIG. 2 is a flowchart of a comment classification method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
As shown in fig. 1, an embodiment of the present invention provides a comment classification method, including steps S100 to S400:
S100, obtaining comment data.
In the embodiment of the invention, the comment data include, but are not limited to, the user comment, the comment time corresponding to the user comment, the user attributes and the user IP address. Optionally, the comment data are obtained through a data layer, and may be collected by the system when a user comments in a web page, an app, a mini program or the like. For example, when a user sends a comment request in the comment area of a video color ring back tone on a web page, the system obtains from the comment request the user comment to be published, the comment time (such as the time at which the user sent the comment request), the IP address from which the request was sent, and the user attributes, which include but are not limited to information such as user name, level and activity. In addition, when a system (such as a video color ring back tone system) receives a comment request from a user, it can obtain the comment data and process them to obtain a comment classification result, and publish the user comment only after the classification result indicates that the comment has passed review, thereby realizing delayed release of user comments.
As shown in fig. 2, S200, a first true probability is obtained according to the user comment, the user attribute, and the natural language processing model.
It should be noted that the calculation of the first true probability and the second true probability and the comment classification result may be completed in a model layer, and a natural language processing model and a function model are disposed in the model layer.
Optionally, step S200 may include steps S210-S230:
S210, performing first coding processing on the user attributes to obtain a first matrix.
Specifically, step S210 includes steps S2101-S2103:
S2101, encoding the user attributes into a first vector through a word vector model.
Optionally, the word vector model includes, but is not limited to, a word2vec model. In the embodiment of the present invention, GloVe is taken as an example, and the user attributes are encoded by GloVe into the first vector, which is specifically a set of word vectors.
S2102, encode the first vector as a context dependent vector by the GRU encoder.
S2103, splicing the context correlation vectors to obtain a first matrix.
It should be noted that the GRU encoder is a GRU-based self-encoder, for example a bidirectional GRU encoder. The first vector, whose t-th word vector is written $x_t$ here, is encoded by the GRU encoder as the context-dependent vector $h_t$ as follows:
$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$
$$r_t = \sigma(W_t x_t + U_t h_{t-1})$$
$$\tilde{h}_t = \tanh\left(W x_t + U (r_t \odot h_{t-1})\right)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
where $z_t$ is the update gate, which determines how much information the current state retains from the historical state and how much it accepts from the candidate state; $\sigma$ is the sigmoid function, which maps data to a value in the range 0 to 1; $W_z$ and $U_z$ are the weight parameters for computing the update gate; $h_{t-1}$ is the hidden-layer state of the previous node, i.e., the historical state; t is the sequence number of the word; $r_t$ is the reset gate, which controls whether the calculation of the candidate state depends on the historical state; $W_t$ and $U_t$ are the weight parameters for computing the reset gate; W and U are the weight parameters for computing the candidate state; $\odot$ is element-wise vector multiplication; and $\tilde{h}_t$ is the candidate state.
In the embodiment of the invention, the context correlation vectors are spliced to obtain a first matrix H, namely the code of the user attribute sequence.
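The GRU encoding of steps S2102 and S2103 can be sketched as follows. This is only an illustration under stated assumptions: it uses the common GRU form in which the new state is $z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$ with a tanh candidate and no bias terms, it runs in one direction only (the bidirectional variant would run a second pass over the reversed sequence), and the names `gru_step`, `encode_attributes` and the `params` dictionary are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vector):
    # Matrix-vector product on plain Python lists.
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, params):
    """One GRU step; params holds W_z, U_z, W_t, U_t, W, U (assumed layout)."""
    # Update gate z_t and reset gate r_t.
    z = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_z"], x_t), matvec(params["U_z"], h_prev))]
    r = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_t"], x_t), matvec(params["U_t"], h_prev))]
    # Candidate state computed from the input and the reset-gated history.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in
            zip(matvec(params["W"], x_t), matvec(params["U"], gated))]
    # Blend the historical state and the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def encode_attributes(word_vectors, params, hidden_size):
    """Run the GRU over the word vectors and stack the states into H."""
    h = [0.0] * hidden_size
    rows = []
    for x_t in word_vectors:
        h = gru_step(x_t, h, params)
        rows.append(h)
    return rows  # one context-dependent vector per word
```

Each row of the returned list corresponds to one $h_t$, so the list as a whole plays the role of the first matrix H.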
As shown in fig. 2, in S220, a second encoding process is performed on the user comment, so as to obtain a second matrix.
Specifically, step S220 includes steps S2201-S2204:
S2201, encoding the user comment into a second vector through a word vector model.
Similarly, the word vector model includes, but is not limited to, a word2vec model. In the embodiment of the present invention, GloVe is taken as an example, and the user comment is encoded by GloVe into the second vector, which is specifically a set of word vectors.
S2202, constructing a Query vector according to the second vector and the first weight, constructing a Key vector according to the second vector and the second weight, and constructing a Value vector according to the second vector and the third weight.
It should be noted that the first weight $w_q$, the second weight $w_k$ and the third weight $w_v$ can be adjusted according to actual needs. Specifically, the Query vector is obtained as the first product of the second vector and the first weight $w_q$, the Key vector as the second product of the second vector and the second weight $w_k$, and the Value vector as the third product of the second vector and the third weight $w_v$.
S2203, uniformly processing the Query vector, the Key vector and the Value vector to preset lengths.
Optionally, each vector is encoded into a vector with self-attention by a Transformer-based encoder, which gives a better detection effect for long user comments; the encoding process of the Transformer-based encoder corresponds to steps S2203 and S2204. In the embodiment of the present invention, the preset length max_length may be adjusted according to actual conditions. The lengths of the Query vector, the Key vector and the Value vector are each compared with max_length: when a length is greater than max_length, the vector is truncated to max_length; when it is less than max_length, content such as meaningless symbols or words (the corresponding word vectors) is appended to the tail until the length equals max_length. It should be noted that, in some embodiments, the user comments may instead be unified to the preset length in advance, before the second encoding process; this is not specifically limited.
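The length unification of step S2203 can be sketched as below. The sketch assumes the sequence is a list of per-word vectors and that padding uses a caller-supplied filler vector; the function name `to_preset_length` and the `pad_vector` argument are hypothetical and not taken from the source.

```python
def to_preset_length(vectors, max_length, pad_vector):
    """Truncate or pad a sequence of per-word vectors to exactly max_length."""
    if len(vectors) >= max_length:
        # Too long (or already exact): keep only the first max_length entries.
        return list(vectors[:max_length])
    # Too short: append copies of the filler vector at the tail.
    padding = [list(pad_vector) for _ in range(max_length - len(vectors))]
    return list(vectors) + padding
```

The same helper could be applied either to the Query, Key and Value sequences or, as the text allows, to the user comment before the second encoding process.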
S2204, calculating a matrix expression with self attention according to the Query vector with the preset length, the Key vector with the preset length and the Value vector with the preset length to obtain a second matrix.
It should be noted that the second vector includes the vectors corresponding to a plurality of words, so each word has its own Query vector, Key vector and Value vector of the preset length. The Query vectors of the words are spliced in order into a matrix Q, the Key vectors into a matrix K, and the Value vectors into a matrix V. A matrix expression with self-attention is then calculated according to the matrix Q, the matrix K and the matrix V, specifically:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Attention() is self-attention, Attention(Q, K, V) is the second matrix, i.e., the encoding of the comment sequence, softmax() is the softmax function, T denotes transpose, $\sqrt{d_k}$ is the scaling factor, and $d_k$ is the dimension of the Key vector.
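A minimal sketch of steps S2202 and S2204, assuming the standard scaled dot-product form of self-attention (softmax of $QK^{T}/\sqrt{d_k}$ applied to V) and plain Python lists for matrices. The function names and the choice to pass the three weights as matrices `w_q`, `w_k`, `w_v` are illustrative assumptions.

```python
import math


def matmul(a, b):
    # Matrix product on lists of rows; zip(*b) iterates over columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    # Subtract the maximum for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, w_q, w_k, w_v):
    """X holds one word vector per row; returns Attention(Q, K, V)."""
    Q, K, V = matmul(X, w_q), matmul(X, w_k), matmul(X, w_v)
    d_k = len(K[0])                        # dimension of the Key vectors
    K_T = [list(col) for col in zip(*K)]   # transpose of K
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)              # the second matrix
```

Every output row is a weighted average of the rows of V, which is why each word's encoding can draw on every other word in a long comment.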
S230, splicing (concatenating) the first matrix and the second matrix, and converting the splicing result through a fully connected layer and a sigmoid function to obtain the first true probability $p_1$ (i.e., the comment true probability P1).
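Step S230 can be sketched as follows. The source does not say how the two matrices are laid out before the fully connected layer, so this sketch assumes they are flattened and concatenated into one feature vector and that the layer has a single output unit; `fc_weights` and `fc_bias` are hypothetical parameters that would come from training.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def first_true_probability(first_matrix, second_matrix, fc_weights, fc_bias=0.0):
    """Concatenate both encodings, apply one dense unit, then a sigmoid."""
    # Assumed layout: flatten each matrix row by row and join the results.
    features = [v for row in first_matrix for v in row]
    features += [v for row in second_matrix for v in row]
    if len(features) != len(fc_weights):
        raise ValueError("fc_weights must have one weight per feature")
    logit = sum(w * f for w, f in zip(fc_weights, features)) + fc_bias
    return sigmoid(logit)  # p_1, a value between 0 and 1
```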
As shown in fig. 2, in S300, the number of comments from the same user IP address within the preset time range is determined according to the comment time, and a second true probability (i.e., the comment true probability P2) is calculated according to the number of comments and the function model.
It should be noted that the preset time range may be set as needed. Taking 10 seconds as an example, it is determined whether comments issued from the same user IP address exist within the preset time range from 5 seconds before the comment time to 5 seconds after the comment time, and if so, these comments are counted to obtain the number of comments.
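The counting described above can be sketched as below, assuming comments are available as (IP address, comment time) pairs and that the window is centered on the comment time, as in the 10-second example. Whether the current comment itself is included in the count is not stated in the source; this sketch counts every matching entry in the list it is given. The function name and argument layout are hypothetical.

```python
from datetime import datetime, timedelta


def count_same_ip_comments(comments, ip, comment_time, window_seconds=10):
    """Count comments from `ip` within a window centered on comment_time.

    `comments` is an iterable of (ip_address, time) pairs.
    """
    half = timedelta(seconds=window_seconds / 2)
    start, end = comment_time - half, comment_time + half
    return sum(1 for c_ip, c_time in comments
               if c_ip == ip and start <= c_time <= end)
```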
Specifically, in step S300, the second true probability is calculated according to the number of comments and the function model, and the method includes steps S310 to S330:
S310, calculating the product of the number of comments and the slope parameter.
S320, calculating the difference between the intercept parameter and the product.
S330, obtaining the second true probability according to the sigmoid function and the difference.
Optionally, the function model in the embodiment of the present invention may be trained in advance, or suitable values of the slope parameter and the intercept parameter may be set based on prior knowledge from a large amount of data; the sigmoid function maps the output of the linear function into the interval from 0 to 1.
Specifically, the second true probability $p_2$ is calculated as:
$$p_2 = \mathrm{sigmoid}(b - kn)$$
where b is the intercept parameter, k is the slope parameter, and n is the number of comments.
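Steps S310 to S330 amount to a one-line function model, sketched here. The values of `k` and `b` passed in are placeholders; the source leaves them to training or to prior knowledge from data.

```python
import math


def second_true_probability(n, k, b):
    """p_2 = sigmoid(b - k * n) for n comments, slope k and intercept b."""
    product = k * n            # S310: number of comments times slope
    difference = b - product   # S320: intercept minus the product
    # S330: numerically stable sigmoid of the difference.
    if difference >= 0:
        return 1.0 / (1.0 + math.exp(-difference))
    e = math.exp(difference)
    return e / (1.0 + e)
```

With a positive slope, the probability falls as the number of comments from the same IP address in the window grows, which is what lets a burst of comments from one address pull the final score toward "false".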
S400, determining a comment classification result according to the first real probability and the second real probability.
Specifically, step S400 includes steps S410-S420:
As shown in fig. 2, S410, performing weighted summation according to the first true probability, the first probability weight, the second true probability and the second probability weight.
Specifically, the calculation formula of the weighted sum result p is:
$$p = \alpha p_1 + \beta p_2$$
where $p_1$ is the first true probability, $p_2$ is the second true probability, $\alpha$ is the first probability weight and $\beta$ is the second probability weight; the values of the first probability weight and the second probability weight may be adjusted as desired.
S420, when the weighted sum result is larger than a real threshold, obtaining a comment classification result representing a real comment; otherwise, obtaining a comment classification result representing a false comment.
Specifically, when the weighted sum result is greater than a real threshold t, a comment classification result representing a real comment is obtained, and the current user comment is considered to be a real comment; and when the weighted sum result is less than or equal to the real threshold value, obtaining a comment classification result representing the false comment, and considering that the current user comment is the false comment.
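Steps S410 and S420 can be sketched as a weighted sum followed by a strict threshold comparison. The function name and the returned tuple are illustrative choices; the weights and threshold in any call are placeholders to be tuned.

```python
def classify_comment(p1, p2, alpha, beta, threshold):
    """Return (weighted sum p, True if the comment is classified as real)."""
    p = alpha * p1 + beta * p2   # S410: weighted summation
    is_real = p > threshold      # S420: strictly greater means real
    return p, is_real
```

Because the comparison is strict, a weighted sum exactly equal to the threshold is classified as a false comment, matching the "less than or equal to" case described above.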
Optionally, the comment classification method according to the embodiment of the present invention may further include step S500 or S600, and step S500 or S600 may be executed by the application layer:
S500, when the comment classification result represents a real comment, publishing the user comment.
Specifically, when the system obtains a comment classification result representing a real comment, the current user comment is considered a real comment, passes review, and is allowed to be published.
S600, when the comment classification result represents a false comment, deleting the user comment.
Specifically, when the system obtains a comment classification result representing a false comment, the current user comment is considered a false comment and does not pass review: it is not allowed to be published, or is deleted after the user has published it.
With the comment classification method, when a user comments in an actual comment scenario such as a video color ring back tone system, the comment can be detected and classified in real time, which overcomes the limitation that existing video color ring back tone systems do not detect and classify comments in the comment area in real time. The accuracy of comment classification is further improved by combining a self-attention mechanism, delayed release, IP address checking, user attribute extraction, model stacking of the natural language processing model and the function model, GRU encoding and the Transformer. Meanwhile, the system back end can determine whether a comment is a false comment, such as a machine-generated one, by comparing the weighted sum result with the real threshold, and automatically delete such user comments. This reduces manual intervention, realizes real-time monitoring, real-time analysis and real-time cleaning, and is highly efficient.
An embodiment of the present invention further provides a comment classification apparatus, including:
the acquisition module is used for acquiring comment data; the comment data comprise user comments, comment time corresponding to the user comments, user attributes and user IP addresses;
the first processing module is used for obtaining a first real probability according to the user comment, the user attribute and the natural language processing model;
the second processing module is used for determining, according to the comment time, the number of comments from the same user IP address within a preset time range, and calculating a second true probability according to the number of comments and the function model;
and the classification module is used for determining the comment classification result according to the first real probability and the second real probability.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
The embodiment of the present invention further provides an electronic device, where the electronic device includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the comment classification method of the foregoing embodiment. The electronic device of the embodiment of the invention includes but is not limited to a mobile phone, a tablet computer, a vehicle-mounted computer and the like.
The contents of the above method embodiments are all applicable to the present electronic device embodiment; the functions specifically implemented by the present electronic device embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the comment classification method of the foregoing embodiment.
Embodiments of the present invention also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the comment classification method of the foregoing embodiment.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in this application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A comment classification method, comprising:
obtaining comment data, wherein the comment data comprises a user comment, a comment time corresponding to the user comment, a user attribute, and a user IP address;
obtaining a first true probability according to the user comment, the user attribute, and a natural language processing model;
determining, according to the comment time, the number of comments within a preset time range that have the same user IP address, and calculating a second true probability according to the number of comments and a function model;
and determining a comment classification result according to the first true probability and the second true probability.
2. The comment classification method according to claim 1, characterized in that obtaining the first true probability according to the user comment, the user attribute, and the natural language processing model comprises:
performing first encoding processing on the user attribute to obtain a first matrix;
performing second encoding processing on the user comment to obtain a second matrix;
and splicing the first matrix and the second matrix, and converting the splicing result through a fully connected layer and a sigmoid function to obtain the first true probability.
3. The comment classification method according to claim 2, characterized in that performing the first encoding processing on the user attribute to obtain the first matrix comprises:
encoding the user attribute into a first vector through a word vector model;
encoding the first vector into context-dependent vectors through a GRU encoder;
and splicing the context-dependent vectors to obtain the first matrix.
4. The comment classification method according to claim 2, characterized in that performing the second encoding processing on the user comment to obtain the second matrix comprises:
encoding the user comment into a second vector through a word vector model;
constructing a Query vector according to the second vector and a first weight, constructing a Key vector according to the second vector and a second weight, and constructing a Value vector according to the second vector and a third weight;
processing the Query vector, the Key vector, and the Value vector to a uniform preset length;
and calculating a self-attention matrix representation according to the Query vector of the preset length, the Key vector of the preset length, and the Value vector of the preset length to obtain the second matrix.
5. The comment classification method according to claim 1, characterized in that calculating the second true probability according to the number of comments and the function model comprises:
calculating a product of the number of comments and a slope parameter;
calculating a difference between an intercept parameter and the product;
and obtaining the second true probability according to a sigmoid function and the difference.
6. The comment classification method according to any one of claims 1 to 5, characterized in that determining the comment classification result according to the first true probability and the second true probability comprises:
performing a weighted summation of the first true probability, weighted by a first probability weight, and the second true probability, weighted by a second probability weight;
and when the weighted sum is greater than a true threshold, obtaining a comment classification result representing a real comment; otherwise, obtaining a comment classification result representing a false comment.
7. The comment classification method according to claim 6, characterized in that the method further comprises:
when the comment classification result represents a real comment, publishing the user comment;
or,
when the comment classification result represents a false comment, deleting the user comment.
8. A comment classification apparatus, characterized by comprising:
an acquisition module, configured to obtain comment data, wherein the comment data comprises a user comment, a comment time corresponding to the user comment, a user attribute, and a user IP address;
a first processing module, configured to obtain a first true probability according to the user comment, the user attribute, and a natural language processing model;
a second processing module, configured to determine, according to the comment time, the number of comments within a preset time range that have the same user IP address, and to calculate a second true probability according to the number of comments and a function model;
and a classification module, configured to determine a comment classification result according to the first true probability and the second true probability.
9. An electronic device, characterized in that: the electronic device comprises a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method according to any one of claims 1-7.
10. A computer-readable storage medium characterized by: the storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by a processor to implement the method of any one of claims 1-7.
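The arithmetic recited in claims 1 and 5 to 7 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the function names, the dictionary layout of a comment record, the choice of a time window ending at the comment time, and every numeric value (slope, intercept, probability weights, threshold, window length) are hypothetical. The first true probability is taken as a given number rather than computed by the natural language processing model of claims 2 to 4.

```python
import math
from datetime import datetime, timedelta


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def count_same_ip(comments, ip, comment_time, window):
    """Count comments from `ip` posted within `window` before `comment_time`.

    Assumption: each comment is a dict with "ip" and "time" keys, and the
    preset time range is the interval [comment_time - window, comment_time].
    """
    return sum(
        1
        for c in comments
        if c["ip"] == ip and timedelta(0) <= comment_time - c["time"] <= window
    )


def second_true_probability(n_comments, slope, intercept):
    """Claim 5: sigmoid of (intercept parameter - slope parameter * count)."""
    return sigmoid(intercept - slope * n_comments)


def classify(p1, p2, w1, w2, threshold):
    """Claim 6: weighted sum of both probabilities compared with a threshold."""
    score = w1 * p1 + w2 * p2
    return "real" if score > threshold else "false"


if __name__ == "__main__":
    now = datetime(2022, 6, 9, 12, 0)
    history = [
        {"ip": "192.0.2.1", "time": datetime(2022, 6, 9, 11, 30)},
        {"ip": "192.0.2.1", "time": datetime(2022, 6, 9, 11, 59)},
        {"ip": "198.51.100.7", "time": datetime(2022, 6, 9, 11, 45)},
    ]
    n = count_same_ip(history, "192.0.2.1", now, timedelta(hours=1))
    # Hypothetical parameter values, chosen only for illustration.
    p2 = second_true_probability(n, slope=0.5, intercept=3.0)
    result = classify(p1=0.8, p2=p2, w1=0.6, w2=0.4, threshold=0.5)
    # Claim 7: publish the comment if "real", delete it if "false".
    print(n, round(p2, 4), result)
```

With a positive slope parameter, the second true probability falls as the number of comments from the same IP address within the window grows, so a burst of comments from one address lowers the weighted sum.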
CN202210646589.7A 2022-06-09 2022-06-09 Comment classification method, device, equipment and storage medium Active CN115168677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210646589.7A CN115168677B (en) 2022-06-09 2022-06-09 Comment classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210646589.7A CN115168677B (en) 2022-06-09 2022-06-09 Comment classification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115168677A true CN115168677A (en) 2022-10-11
CN115168677B CN115168677B (en) 2023-03-28

Family

ID=83486291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210646589.7A Active CN115168677B (en) 2022-06-09 2022-06-09 Comment classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115168677B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211515A1 (en) * 2003-06-30 2010-08-19 Idocuments, Llc Worker and document management system
CN104008289A (en) * 2014-05-26 2014-08-27 沈苹 Method and device for evaluating artistic works
CN106055664A (en) * 2016-06-03 2016-10-26 腾讯科技(深圳)有限公司 Method and system for filtering UGC (User Generated Content) spam based on user comments
CN106484679A (en) * 2016-10-20 2017-03-08 北京邮电大学 False review information recognition method and device applied on a consumption platform
CN109241302A (en) * 2018-08-31 2019-01-18 深圳市轱辘汽车维修技术有限公司 Comment authorization method, device and terminal device for online courses
CN110399602A (en) * 2018-04-25 2019-11-01 北京京东尚科信息技术有限公司 Method and apparatus for evaluating text reliability
CN112732921A (en) * 2021-01-19 2021-04-30 福州大学 False user comment detection method and system
CN114266241A (en) * 2022-01-04 2022-04-01 成都晓多科技有限公司 Comment usefulness prediction method, device and medium based on text and emotion polarity

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211515A1 (en) * 2003-06-30 2010-08-19 Idocuments, Llc Worker and document management system
US20140156686A1 (en) * 2003-06-30 2014-06-05 Idocuments, Llc Document management system
CN104008289A (en) * 2014-05-26 2014-08-27 沈苹 Method and device for evaluating artistic works
CN106055664A (en) * 2016-06-03 2016-10-26 腾讯科技(深圳)有限公司 Method and system for filtering UGC (User Generated Content) spam based on user comments
CN106484679A (en) * 2016-10-20 2017-03-08 北京邮电大学 False review information recognition method and device applied on a consumption platform
CN110399602A (en) * 2018-04-25 2019-11-01 北京京东尚科信息技术有限公司 Method and apparatus for evaluating text reliability
CN109241302A (en) * 2018-08-31 2019-01-18 深圳市轱辘汽车维修技术有限公司 Comment authorization method, device and terminal device for online courses
CN112732921A (en) * 2021-01-19 2021-04-30 福州大学 False user comment detection method and system
CN114266241A (en) * 2022-01-04 2022-04-01 成都晓多科技有限公司 Comment usefulness prediction method, device and medium based on text and emotion polarity

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IFEOMA ADAJI et al.: "Modelling User Collaboration in Social Networks Using Edits and Comments" *
SUPRIANTO et al.: "Retrieval Information Using Generalized Vector Space Models And Sentiment Analysis Using Naïve Bayes Classifier For Evaluation Of Lecturers By Students" *
LIAO Chen (廖晨): "Research on a Credibility Evaluation Model and Visualization Tool for Microblog Information" (微博信息可信度的评判模型和可视化工具研究) *

Also Published As

Publication number Publication date
CN115168677B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN109033261B (en) Image processing method, image processing apparatus, image processing device, and storage medium
CN110781273B (en) Text data processing method and device, electronic equipment and storage medium
CN113392236A (en) Data classification method, computer equipment and readable storage medium
CN113139052B (en) Rumor detection method and device based on graph neural network feature aggregation
CN113516961B (en) Note generation method, related device, storage medium and program product
CN113283238A (en) Text data processing method and device, electronic equipment and storage medium
CN110046251A (en) Community content methods of risk assessment and device
CN116645624A (en) Video content understanding method and system, computer device, and storage medium
CN113836390B (en) Resource recommendation method, device, computer equipment and storage medium
CN115168677B (en) Comment classification method, device, equipment and storage medium
CN117081941A (en) Flow prediction method and device based on attention mechanism and electronic equipment
CN114398484A (en) Public opinion analysis method, device, equipment and storage medium
CN114707633A (en) Feature extraction method, feature extraction device, electronic equipment and storage medium
CN113722584A (en) Task pushing method and device and storage medium
CN111401070B (en) Word meaning similarity determining method and device, electronic equipment and storage medium
CN113254788A (en) Big data based recommendation method and system and readable storage medium
CN111552850A (en) Type determination method and device, electronic equipment and computer readable storage medium
CN114548083B (en) Title generation method, device, equipment and medium
CN111428118B (en) Method for detecting event reliability and electronic equipment
CN110992067B (en) Message pushing method, device, computer equipment and storage medium
CN109871487B (en) News recall method and system
CN115718696B (en) Source code cryptography misuse detection method and device, electronic equipment and storage medium
WO2021212377A1 (en) Method and apparatus for determining risky attribute of user data, and electronic device
CN116306666A (en) Semantic role analysis method, device, equipment and storage medium
CN112348273A (en) Information generation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant