CN111695357A - Text labeling method and related product - Google Patents

Text labeling method and related product

Info

Publication number
CN111695357A
CN111695357A (application CN202010465811.4A)
Authority
CN
China
Prior art keywords
text data
piece
evaluation
data set
labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010465811.4A
Other languages
Chinese (zh)
Inventor
李文斌
喻宁
冯晶凌
柳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010465811.4A priority Critical patent/CN111695357A/en
Priority to PCT/CN2020/099493 priority patent/WO2021114634A1/en
Publication of CN111695357A publication Critical patent/CN111695357A/en
Pending legal-status Critical Current

Classifications

    • G06F40/30 Semantic analysis (G PHYSICS › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F40/00 Handling natural language data)
    • G06F18/24 Classification techniques (G06F18/00 Pattern recognition › G06F18/20 Analysing)
    • G06N3/045 Combinations of networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/084 Backpropagation, e.g. using gradient descent (G06N3/02 Neural networks › G06N3/08 Learning methods)


Abstract

The application relates to emotion recognition in artificial intelligence and discloses a text labeling method and related products. The method comprises: acquiring a first text data set from a first third-party platform, where each piece of first text data in the set contains an emoji expression; labeling each piece of first text data according to its emoji expression to obtain a first labeling result, the first labeling result being a positive or a negative evaluation; obtaining a first training sample set according to the first labeling result of each piece of first text data; training a first neural network using the first training sample set; acquiring a second text data set from a second third-party platform; and labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the set, the second labeling result being one of a positive, negative, or neutral evaluation.

Description

Text labeling method and related product
Technical Field
The application relates to the technical field of emotion recognition in artificial intelligence, in particular to a text labeling method and a related product.
Background
With the development of artificial intelligence, neural networks are applied ever more widely. For example, in video surveillance a neural network can identify people in monitoring footage; in medicine, it can identify tumors in magnetic resonance images; and in text processing, neural networks are used to classify the emotion expressed in text.
Although neural networks perform well at tasks such as image recognition, training them requires data sets of sufficient quantity and quality, and producing such training data sets is costly. First, high-quality raw data must be obtained and labeled. For example, training a text emotion classifier requires a large volume of texts with complete semantics and clear emotion, which must then be labeled manually. Because the number of texts is extremely large, manual labeling demands substantial time and labor, and labeling efficiency is low.
Disclosure of Invention
The embodiments of the application provide a text labeling method and related products that broaden the application scenarios of text labeling and improve its efficiency.
In a first aspect, an embodiment of the present application provides a text labeling method applied to an electronic device, comprising:
the electronic device acquires a first text data set from a first third-party platform, where each piece of first text data in the first text data set contains an emoji expression;
the electronic device labels each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data, the first labeling result being a positive or a negative evaluation;
the electronic device obtains a first training sample set according to the first labeling result of each piece of first text data;
the electronic device trains a first neural network using the first training sample set;
the electronic device acquires a second text data set from a second third-party platform;
and the electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, the second labeling result being one of a positive, negative, or neutral evaluation.
In a second aspect, an embodiment of the present application provides an electronic device, including:
an acquisition unit, configured to acquire a first text data set from a first third-party platform, where each piece of first text data in the first text data set contains an emoji expression;
a labeling unit, configured to label each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data, the first labeling result being a positive or a negative evaluation;
a training unit, configured to obtain a first training sample set according to the first labeling result of each piece of first text data and to train a first neural network using the first training sample set;
the acquisition unit being further configured to acquire a second text data set from a second third-party platform;
and the labeling unit being further configured to label the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, the second labeling result being one of a positive, negative, or neutral evaluation.
In a third aspect, embodiments of the present application provide an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for performing the steps in the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, where the computer program makes a computer execute the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The embodiment of the application has the following beneficial effects:
it can be seen that in the embodiments of the application, comment data are labeled via the emoji expressions in the text data, without semantic analysis of the comment data; labeling is therefore not limited by the language of the text data, which broadens the application scenarios of text labeling. In addition, text data can be labeled automatically via emoji expressions, without manual labeling, saving labor and material resources.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a labeling method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another annotation method provided in the embodiments of the present application;
FIG. 3 is a schematic flow chart of another annotation method provided in the embodiments of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 5 is a block diagram illustrating functional units of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic device in the present application may be a smartphone (such as an Android, iOS, or Windows phone), a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a wearable device, or the like. These examples are illustrative, not exhaustive. In practical applications, the electronic device may also include an intelligent vehicle-mounted terminal, computer equipment, and the like.
Referring to fig. 1, fig. 1 is a schematic flowchart of a text annotation method provided in an embodiment of the present application, where the method is applied to an electronic device, and the method includes the following steps:
101: the electronic device obtains a first text data set from a first three-party platform.
The first third-party platform can be a social application such as Weibo, Twitter, or Facebook, or an e-commerce platform such as Amazon or Taobao; that is, a third-party platform containing a large amount of positively-evaluated text data and negatively-evaluated text data. The electronic device obtains the first text data set from multiple random pieces of first text data on the platform through an application programming interface (API) provided by the first third-party platform. That is, the electronic device complies with the platform's robots protocol (robots.txt) and obtains the first text data set through the platform's API.
In some possible embodiments, because the first text data are obtained through the API of the first third-party platform without manual review, some pieces may not meet the requirements; for example, they contain no emoji expression, or the text content is too short. Therefore, after multiple pieces of first text data are obtained, they are cleaned to remove pieces that contain no emoji expression or whose text is too short, and the cleaned pieces form the first text data set.
In this way, each piece of first text data in the first text data set contains an emoji expression.
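The cleaning step described above can be sketched as follows. This is an illustrative Python fragment, not the patent's implementation: the emoji code-point ranges are a non-exhaustive assumption, and the minimum length of 5 characters is an invented cutoff, since the patent fixes neither.

```python
import re

# Common emoji blocks (non-exhaustive assumption for illustration).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def contains_emoji(text: str) -> bool:
    """True if the text contains at least one emoji expression."""
    return bool(EMOJI_PATTERN.search(text))

def clean_first_text_data(texts, min_length=5):
    """Drop pieces that lack an emoji or whose content is too short."""
    return [t for t in texts if contains_emoji(t) and len(t) >= min_length]
```

After this filter, every retained piece satisfies the precondition stated above: it contains an emoji expression and enough text to be worth labeling.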
102: the electronic equipment labels each piece of first text data according to the emoji expression of each piece of first question data in the first text data to obtain a first labeling result of each piece of first text data.
Wherein the first labeling result is a positive or a negative evaluation.
Illustratively, after cleaning, each piece of first text data in the first text data set contains an emoji expression, and emoji expressions themselves carry emotional ratings. For example, one emoji (shown as an image in the original document, e.g. a smiling face) represents a positive evaluation, while another (e.g. a frowning face) represents a negative evaluation. Therefore, the first emotion evaluation of each piece of first text data can be determined from its emoji expression, and each piece can then be labeled according to that evaluation, i.e., an emotion label is added to each piece. That is, if the emoji expression of a piece of first text data belongs to the positively-evaluated emoji set, the piece is labeled as a positive evaluation; if it belongs to the negatively-evaluated emoji set, the piece is labeled as a negative evaluation.
The first labeling result comprises positive and negative evaluations; the emotions corresponding to a positive evaluation include happiness, approval, and appreciation, and those corresponding to a negative evaluation include anger, pessimism, and disapproval.
It should be noted that for some emoji expressions the corresponding emotional rating cannot be determined. For example, one emoji (shown as an image in the original document) can be read as expressing either a positive or a negative emotion. Such ambiguous pieces are left unlabeled: within the first text data set, only pieces that contain solely emoji expressions corresponding to a positive evaluation or solely emoji expressions corresponding to a negative evaluation are labeled.
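The labeling rule described above, including the skip for ambiguous pieces, can be sketched as follows. The membership of `POSITIVE_EMOJI` and `NEGATIVE_EMOJI` is an illustrative assumption; the patent does not enumerate the emoji sets.

```python
# Hypothetical emoji sets for illustration only.
POSITIVE_EMOJI = {"\U0001F600", "\U0001F44D"}   # e.g. grinning face, thumbs up
NEGATIVE_EMOJI = {"\U0001F620", "\U0001F44E"}   # e.g. angry face, thumbs down

def label_by_emoji(text: str):
    """Return 'positive' or 'negative' when the text contains emoji from
    only one set; return None (unlabeled) when it is ambiguous or has no
    known emoji, per the passage above."""
    pos = any(e in text for e in POSITIVE_EMOJI)
    neg = any(e in text for e in NEGATIVE_EMOJI)
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return None
```

Pieces for which `label_by_emoji` returns `None` are simply excluded from the first training sample set.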
Further, to improve the accuracy of emoji-based labeling, the text content of each piece of first text data can be extracted and semantically analyzed to obtain its semantic information; a second emotion evaluation of each piece is then determined from that semantic information. First text data whose first and second emotion evaluations agree are retained in the first text data set, and pieces where the two evaluations disagree are deleted. This double labeling with both semantic analysis and emoji expressions reduces the errors of labeling by emoji expressions alone and improves the accuracy of labeling the first text data set.
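The double-labeling filter can be sketched as a small function. Here `emoji_label` and `semantic_label` are stand-ins for the emoji-based and semantic-analysis labelers; the patent does not specify the semantic analyzer, so any routine returning the same label vocabulary fits.

```python
def filter_consistent(texts, emoji_label, semantic_label):
    """Keep only pieces whose emoji-based label (first evaluation)
    agrees with the semantic-analysis label (second evaluation).
    Both label functions return 'positive', 'negative', or None."""
    kept = []
    for text in texts:
        first = emoji_label(text)
        second = semantic_label(text)
        if first is not None and first == second:
            kept.append((text, first))   # retained, with its agreed label
    return kept
```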
103: the electronic equipment obtains a first training sample set according to the first labeling result of each piece of first text data.
That is, the labeled first text data are used as labeled training samples to obtain the first training sample set.
104: the electronic device trains a first neural network using a first set of training samples.
Specifically, initial parameters of the first neural network are constructed, and the training samples in the first training sample set are input into the network to obtain predictions; a loss is then computed from the predictions and the samples' labels, and the gradient of the loss is used to update the parameters by backpropagation and gradient descent; training is complete when the first neural network converges.
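As a minimal, hedged illustration of this predict-loss-update cycle, the fragment below trains a single logistic unit by gradient descent on the log-loss. It is a stand-in for the patent's (unspecified) first neural network and assumes the text has already been converted to numeric feature vectors.

```python
import math

def train_logistic(samples, lr=0.5, epochs=200):
    """samples: list of (feature_vector, label), label 1 = positive,
    0 = negative. Returns learned weights with the bias as last element."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                      # n weights + 1 bias
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))   # forward pass (prediction)
            g = p - y                        # dLoss/dz for log-loss
            for i in range(n):               # backward pass: update weights
                w[i] -= lr * g * x[i]
            w[-1] -= lr * g                  # update bias
    return w

def predict(w, x):
    """Probability that x is a positive evaluation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))
```

A real implementation would use a deep network and a framework's autograd, but the loop structure (predict, compute loss gradient, update until convergence) matches the step described above.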
105: the electronic device obtains a second text data set from a second three-party platform.
The second third-party platform can be a news platform publishing technology news, a wiki, or abstract-style text; that is, a third-party platform containing a large amount of neutrally-rated text data.
Similarly, the electronic device follows the second third-party platform's robots protocol and acquires multiple pieces of second text data through the platform's API to obtain the second text data set.
Of course, after the pieces of second text data are acquired, they may be cleaned to remove pieces whose text content is illegal or too short.
106: The electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set.
Wherein the second annotation result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
Specifically, the electronic device uses the first neural network to classify each piece of second text data in the second text data set, obtaining a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation. Pieces whose first probability exceeds a first threshold (i.e., the network is sufficiently confident the evaluation is positive) are labeled as positive evaluations; pieces whose second probability exceeds the first threshold (sufficient confidence in a negative evaluation) are labeled as negative evaluations; and pieces for which neither probability exceeds the first threshold but at least one exceeds a second threshold (i.e., the network is confident in neither class) are labeled as neutral evaluations.
Wherein, the first threshold value may be 0.7, 0.75, 0.8 or other values. The second threshold may be 0.4, 0.45, 0.5, or other values.
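The mapping from the first network's probabilities to a three-way label can be sketched as follows. The defaults 0.75 and 0.45 are two of the example threshold values given above; treating pieces below both thresholds as unlabeled is an assumption, since the text is ambiguous on that case.

```python
def second_label(p_positive, p_negative, t1=0.75, t2=0.45):
    """Map the first network's class probabilities to a second labeling
    result: 'positive', 'negative', 'neutral', or None (unlabeled)."""
    if p_positive > t1:
        return "positive"
    if p_negative > t1:
        return "negative"
    if max(p_positive, p_negative) > t2:
        return "neutral"       # confident in neither class
    return None                # below both thresholds (assumed unlabeled)
```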
It can be seen that in the embodiments of the application, text data are labeled via their emoji expressions without semantic analysis, so labeling is not limited by the language of the text data, which broadens the application scenarios of the labeling method; in addition, text data can be labeled automatically via emoji expressions without manual labeling, saving labor and material resources.
In some possible embodiments, the method further comprises:
the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set; that is, the second text data set is combined with its per-piece labeling results into a labeled second training sample set. A second neural network is then trained using that set. For any piece of comment data to be published, the second neural network classifies it to obtain a classification result, and whether to publish the comment is decided from that result.
For example, if the comment data to be published belong to a news website, the comment is published when the classification result is a positive or neutral evaluation and withheld when it is negative. Compared with existing manual review of comments to be published, the embodiments can review them automatically through the second neural network, saving human resources.
Alternatively, if the classification result is a positive or negative evaluation, the comment data to be published can be checked against the user's purchase records to determine their authenticity; if the comment is determined to be a malicious review, it is not published. Automatically reviewing comments through the second neural network to determine their authenticity likewise saves human resources.
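The moderation rule sketched in the two paragraphs above can be written as a small decision function. `has_purchase_record` is a hypothetical stand-in for the purchase-record authenticity check; the patent does not name such an interface.

```python
def should_publish(label, has_purchase_record=None):
    """label: the second network's classification of the comment.
    has_purchase_record: None for the plain news-site rule; for the
    purchase-check variant, whether the commenting user actually bought
    the product (hypothetical authenticity signal)."""
    if label == "negative":
        return False
    if has_purchase_record is not None and label == "positive":
        return has_purchase_record   # no purchase: suspected malicious review
    return True
```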
In some possible embodiments, because the second text data obtained from the second third-party platform are mostly neutral while the first text data obtained from the first third-party platform are mostly positively or negatively rated, the second training sample set can be merged with the first training sample set to increase the number of positive and negative training samples. Training the second neural network on this merged, sufficiently large set makes the trained network more accurate.
In some possible embodiments, after the first emotion evaluation of each piece of first text data is determined from its emoji expression, the method further comprises:
extracting the text content of each piece of first text data; converting the text content into a second emoji expression; determining a second emotion evaluation for each piece according to the second emoji expression; and determining whether the first and second emotion evaluations of each piece agree. If they agree, the piece is labeled according to its first emotion evaluation. Verifying the emotion evaluation of each piece through this text-to-emoji operation improves the accuracy of the subsequent labeling of the first text data.
In some possible embodiments, the method further comprises:
obtaining comment data of any user, the comment data being the user's comments on a target product, where the target product includes financial products; classifying the user's comment data with the second neural network to obtain a classification result; screening target users according to the classification results, i.e., taking users whose classification result is a positive evaluation as target users; and recommending the target product to the target users.
It can be seen that, in this embodiment, the second neural network screens out users interested in the target product (e.g., a financial product), which ensures the accuracy of user screening and improves the success rate of recommendation.
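The screening step above reduces to selecting the users whose comments classify as positive. In this sketch, `classify` stands in for the trained second neural network.

```python
def select_target_users(user_comments, classify):
    """user_comments: mapping of user -> comment text on the target
    product. Returns users whose comment is classified as a positive
    evaluation; these become the recommendation targets."""
    return [u for u, c in user_comments.items() if classify(c) == "positive"]
```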
Referring to fig. 2, fig. 2 is a flowchart of another text labeling method according to an embodiment of the present application. Content that is the same as in the embodiment of fig. 1 is not repeated here. The method is applied to the electronic device and comprises the following steps:
201: the electronic device obtains a first set of text data from a first platform.
202: and the electronic equipment cleans each piece of first text data in the first text data set, deletes the first text data which does not contain the emoji expression to obtain a new first text data set, and takes the new first text data set as the first text data set.
203: the electronic equipment determines a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation comprises a positive evaluation or a negative evaluation.
204: the electronic equipment extracts the text content of each piece of first text data, and performs semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data.
205: and the electronic equipment determines a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data.
206: and the electronic equipment reserves the first text data with consistent first emotion evaluation and second emotion evaluation in the first text data set, and deletes the first text data with inconsistent first emotion evaluation and second emotion evaluation.
207: and the electronic equipment marks the remaining first text data according to the first emotion evaluation of the remaining first text data to obtain a first training sample set.
The remaining first text data are those left after deleting, from the first text data set, the first text data whose first and second emotion evaluations are inconsistent.
208: the electronic device trains a first neural network using a first set of training samples.
209: the electronic device obtains a second set of text data from the second platform.
210: The electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, the second labeling result being one of a positive, negative, or neutral evaluation.
It can be seen that in the embodiment of the application, comment data are labeled through the emoji expressions they contain, without semantic analysis, so labeling is not limited by the language of the comment data, which broadens the application scenarios of the labeling method. In addition, comment data can be labeled automatically through emoji expressions, and a training sample set with emotion-classification labels is obtained without manual labeling, saving labor and material resources. Moreover, cleaning the first text data set before labeling retains high-quality first text data, improving labeling accuracy.
Referring to fig. 3, fig. 3 is a flowchart of another text labeling method according to an embodiment of the present application. Content that is the same as in the embodiments of fig. 1 and fig. 2 is not repeated here. The method is applied to the electronic device and comprises the following steps:
301: the electronic device obtains a first set of text data from a first platform.
302: and the electronic equipment cleans each piece of first text data in the first text data set, deletes the first text data which does not contain the emoji expression to obtain a new first text data set, and takes the new first text data set as the first text data set.
303: the electronic equipment determines a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation comprises a positive evaluation or a negative evaluation.
304: the electronic equipment extracts the text content of each piece of first text data, and performs semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data.
305: and the electronic equipment determines a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data.
306: the electronic device retains, in the first text data set, the first text data whose first emotion evaluation is consistent with the second emotion evaluation, and deletes the first text data whose first emotion evaluation is inconsistent with the second emotion evaluation.
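Steps 304 to 306 cross-check the emoji-based first emotion evaluation against a semantics-based second one and keep only the pieces where the two agree. In this sketch a toy word lexicon stands in for real semantic analysis, which is an assumption; any sentiment analyzer could take its place:

```python
# Toy word lexicons standing in for semantic analysis (an assumption).
POSITIVE_WORDS = {"great", "good", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}

def second_emotion_evaluation(text_content):
    """Steps 304-305: a semantics-based evaluation of the text content."""
    words = set(text_content.lower().split())
    pos, neg = len(words & POSITIVE_WORDS), len(words & NEGATIVE_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None

def filter_consistent(labeled_data_set):
    """Step 306: keep pieces whose first and second evaluations agree."""
    return [
        (text, first_eval)
        for text, first_eval in labeled_data_set
        if second_emotion_evaluation(text) == first_eval
    ]
```

The retained pairs are exactly the high-quality samples labeled in step 307.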
307: the electronic device labels the remaining first text data according to the first emotion evaluation of the remaining first text data to obtain a first training sample set.
The remaining first text data is the first text data left in the first text data set after the first text data with inconsistent first and second emotion evaluations is deleted.
308: the electronic device trains a first neural network using the first training sample set.
309: the electronic device obtains a second text data set from a second third-party platform.
310: the electronic device labels the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, where the second labeling result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
311: the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data, and trains a second neural network using the second training sample set.
312: the electronic device acquires any piece of comment data, classifies the comment data using the second neural network to obtain a classification result of the comment data, and determines whether to publish the comment data according to the classification result.
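Step 312 can be sketched as a small moderation wrapper. The `second_network` callable and the publish policy (block only negative comments) are assumptions; the application only states that publication is decided from the classification result:

```python
# Sketch of step 312. `second_network` stands in for the trained second
# neural network; any callable mapping a comment string to one of
# "positive", "negative", or "neutral" will do here.
def moderate_comment(comment, second_network):
    """Classify a piece of comment data and decide whether to publish it."""
    label = second_network(comment)
    # Publish policy is an assumption: block only negative comments.
    return {"comment": comment, "label": label, "publish": label != "negative"}

# Toy stand-in classifier for demonstration (not the trained network).
stub_network = lambda c: "negative" if "terrible" in c else "positive"
decision = moderate_comment("terrible service", stub_network)  # not published
```

In a deployment, `second_network` would be the model trained in step 311, and blocked comments would be shielded from display rather than deleted.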
It can be seen that in this embodiment of the application, text data is labeled through the emoji expressions it contains, without semantic analysis of the text data, so that labeling is not limited by the language of the text data and the labeling method applies to more scenarios. In addition, text data can be labeled automatically through emoji expressions, so a training sample set containing emotion classification labels is obtained without manual labeling, which saves manpower and material resources. Before the first text data set is labeled, it is cleaned to retain only high-quality first text data, which improves labeling accuracy. Furthermore, the trained second neural network classifies comment data to be published and automatically shields comment data that does not meet the requirements, so no manual review is needed and human resources are saved.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, electronic device 400 includes a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of:
acquiring a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set comprises an emoji expression;
labeling each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data, wherein the first labeling result comprises positive evaluation or negative evaluation;
obtaining a first training sample set according to a first labeling result of each piece of first text data;
training a first neural network using the first set of training samples;
acquiring a second text data set from a second third-party platform;
and labeling the second text data set by using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result comprises one of positive evaluation, negative evaluation or neutral evaluation.
In some possible embodiments, in terms of labeling each piece of first text data according to its emoji expression in the first text data set, the above program is specifically configured to execute the following instructions:
determining a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation comprises a positive evaluation or a negative evaluation;
and labeling each piece of first text data according to the first emotion evaluation of each piece of first text data.
In some possible embodiments, after determining the first emotion rating of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, the program is further configured to execute the following steps:
extracting text content of each piece of first text data;
performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
determining a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data;
and retaining, in the first text data set, the first text data whose first emotion evaluation is consistent with the second emotion evaluation, and deleting the first text data whose first emotion evaluation is inconsistent with the second emotion evaluation.
In some possible embodiments, the program is further configured to, prior to labeling the first text data set, execute instructions for:
cleaning each piece of first text data in the first text data set, and deleting the first text data which do not contain emoji expressions to obtain a new first text data set;
taking the new first text data set as the first text data set.
In some possible embodiments, in terms of labeling the second text data set with the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, the above program is specifically configured to execute the following steps:
classifying each piece of second text data in the second text data set by using the first neural network to obtain a first probability of positive evaluation and a second probability of negative evaluation of each piece of second text data;
determining a second labeling result of the second text data with the first probability larger than a first threshold value as a positive evaluation;
determining a second labeling result of the second text data with the second probability larger than the first threshold value as a negative evaluation;
and determining a second labeling result of the second text data with the first probability smaller than the first threshold and larger than the second threshold as a neutral evaluation.
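The two-threshold labeling rule above can be sketched directly. The numeric threshold values below are illustrative assumptions, since the application does not fix them:

```python
# Illustrative thresholds (assumptions; the application leaves them open).
FIRST_THRESHOLD = 0.7
SECOND_THRESHOLD = 0.3

def second_labeling_result(first_prob, second_prob):
    """Map (P(positive), P(negative)) to one of the three labels."""
    if first_prob > FIRST_THRESHOLD:
        return "positive"
    if second_prob > FIRST_THRESHOLD:
        return "negative"
    if SECOND_THRESHOLD < first_prob < FIRST_THRESHOLD:
        return "neutral"
    return None  # low-confidence piece: left unlabeled in this sketch
```

Pieces falling through all three branches could simply be excluded from the second training sample set.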
In some possible embodiments, the program is further for executing the instructions of:
obtaining a second training sample set according to a second labeling result of each piece of second text data in the second text data set;
training a second neural network using the second set of training samples;
acquiring any piece of comment data to be published;
performing sentiment classification on the comment data to be published by using the second neural network to obtain a classification result of the comment data to be published;
and determining whether to publish the comment data to be published according to the classification result.
In some possible embodiments, after obtaining a second training sample set according to the second labeling result of each piece of the second text data in the second text data set, the program is further configured to execute the following steps:
merging the second training sample set with the first training sample set to obtain a new second training sample set;
in terms of training the second neural network using the second training sample set, the program is specifically configured to execute the following instructions:
training the second neural network using the new second training sample set.
Referring to fig. 5, fig. 5 is a block diagram of functional units of an electronic device according to an embodiment of the present application. The electronic device 500 includes: an acquisition unit 510, a labeling unit 520 and a training unit 530; wherein:
an obtaining unit 510, configured to obtain a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression;
a labeling unit 520, configured to label each piece of first text data in the first text data set according to the emoji expression of each piece of first text data, so as to obtain a first labeling result of each piece of first text data, where the first labeling result includes positive evaluation or negative evaluation;
a training unit 530, configured to obtain a first training sample set according to a first labeling result of each piece of first text data, and train a first neural network using the first training sample set;
an obtaining unit 510, further configured to obtain a second text data set from a second third-party platform;
the labeling unit 520 is further configured to label the second text data set by using the first neural network, so as to obtain a second labeling result of each piece of second text data in the second text data set, where the second labeling result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible embodiments, in terms of labeling each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, the labeling unit 520 is specifically configured to:
determining a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation comprises a positive evaluation or a negative evaluation;
and labeling each piece of first text data according to the first emotion evaluation of each piece of first text data.
In some possible embodiments, the electronic device 500 further includes a cleaning unit 540, and after the first emotion evaluation of each piece of first text data is determined according to the emoji expression of each piece of first text data in the first text data set, the cleaning unit 540 is configured to:
extracting text content of each piece of first text data;
performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
determining a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data;
and retaining, in the first text data set, the first text data whose first emotion evaluation is consistent with the second emotion evaluation, and deleting the first text data whose first emotion evaluation is inconsistent with the second emotion evaluation.
In some possible embodiments, the electronic device 500 further includes a cleaning unit 540, and before labeling the first text data set, the cleaning unit 540 is configured to:
cleaning each piece of first text data in the first text data set, and deleting the first text data which do not contain emoji expressions to obtain a new first text data set;
taking the new first text data set as the first text data set.
In some possible embodiments, in terms of labeling the second text data set using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, the labeling unit 520 is specifically configured to:
classifying each piece of second text data in the second text data set by using the first neural network to obtain a first probability of positive evaluation and a second probability of negative evaluation of each piece of second text data;
determining a second labeling result of the second text data with the first probability larger than a first threshold value as a positive evaluation;
determining a second labeling result of the second text data with the second probability larger than the first threshold value as a negative evaluation;
and determining a second labeling result of the second text data with the first probability smaller than the first threshold and larger than the second threshold as a neutral evaluation.
In some possible embodiments, a determination unit 550 is further included;
the training unit 530 is further configured to obtain a second training sample set according to a second labeling result of each piece of second text data in the second text data set;
a training unit 530, further configured to train a second neural network using the second training sample set;
a determining unit 550, configured to obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published by using the second neural network to obtain a classification result of the comment data to be published; and determine whether to publish the comment data to be published according to the classification result.
In some possible embodiments, after obtaining a second training sample set according to the second labeling result of each piece of the second text data in the second text data set, the training unit 530 is further configured to:
merging the second training sample set with the first training sample set to obtain a new second training sample set;
in terms of training the second neural network using the second training sample set, the training unit 530 is specifically configured to:
training a second neural network using the new second set of training samples.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any one of the text labeling methods described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the text annotation methods as described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; for instance, the division of the units is only a division of logical functions, and there may be other divisions in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A text labeling method, applied to an electronic device, comprising:
the electronic device acquires a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set comprises an emoji expression;
the electronic device labels each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data, wherein the first labeling result comprises a positive evaluation or a negative evaluation;
the electronic device obtains a first training sample set according to the first labeling result of each piece of first text data;
the electronic device trains a first neural network using the first training sample set;
the electronic device acquires a second text data set from a second third-party platform;
and the electronic device labels the second text data set by using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
2. The method of claim 1, wherein the electronic device labels each piece of first text data in the first text data set according to its emoji expression, comprising:
determining a first emotion evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, wherein the first emotion evaluation comprises a positive evaluation or a negative evaluation;
and labeling each piece of first text data according to the first emotion evaluation of each piece of first text data.
3. The method of claim 2, wherein after determining the first sentiment rating for each piece of first text data in the first text data set based on an emoji expression of each piece of first text data, the method further comprises:
extracting text content of each piece of first text data;
performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
determining a second emotion evaluation of each piece of first text data according to the semantic information of each piece of first text data;
and retaining, in the first text data set, the first text data whose first emotion evaluation is consistent with the second emotion evaluation, and deleting the first text data whose first emotion evaluation is inconsistent with the second emotion evaluation.
4. The method of any of claims 1-3, wherein prior to the electronic device annotating the first set of text data, the method further comprises:
cleaning each piece of first text data in the first text data set, and deleting the first text data which do not contain emoji expressions to obtain a new first text data set;
taking the new first text data set as the first text data set.
5. The method of claim 1, wherein the labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set comprises:
classifying each piece of second text data in the second text data set by using the first neural network to obtain a first probability of positive evaluation and a second probability of negative evaluation of each piece of second text data;
determining a second labeling result of the second text data with the first probability larger than a first threshold value as a positive evaluation;
determining a second labeling result of the second text data with the second probability larger than the first threshold value as a negative evaluation;
and determining a second labeling result of the second text data with the first probability smaller than the first threshold and larger than the second threshold as a neutral evaluation.
6. The method of claim 1, further comprising:
the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set;
the electronic device trains a second neural network using the second training sample set;
the electronic device acquires any piece of comment data to be published;
the electronic device performs sentiment classification on the comment data to be published by using the second neural network to obtain a classification result of the comment data to be published;
and the electronic device determines whether to publish the comment data to be published according to the classification result.
7. The method of claim 6, wherein after the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set, the method further comprises:
merging the second training sample set with the first training sample set to obtain a new second training sample set;
the electronic device trains a second neural network using the second set of training samples, including:
the electronic device trains a second neural network using the new second training sample set.
8. An electronic device, comprising:
an acquisition unit, configured to acquire a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set comprises an emoji expression;
a labeling unit, configured to label each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data, wherein the first labeling result comprises a positive evaluation or a negative evaluation;
a training unit, configured to obtain a first training sample set according to the first labeling result of each piece of first text data and train a first neural network using the first training sample set;
the acquisition unit is further configured to acquire a second text data set from a second third-party platform;
and the labeling unit is further configured to label the second text data set by using the first neural network to obtain a second labeling result of each piece of second text data in the second text data set, wherein the second labeling result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.
CN202010465811.4A 2020-05-28 2020-05-28 Text labeling method and related product Pending CN111695357A (en)


Publications (1)

Publication Number Publication Date
CN111695357A true CN111695357A (en) 2020-09-22


Also Published As

Publication number Publication date
WO2021114634A1 (en) 2021-06-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination