WO2021114634A1 - Text annotation method, device, and storage medium - Google Patents

Text annotation method, device, and storage medium

Info

Publication number
WO2021114634A1
Authority
WO
WIPO (PCT)
Prior art keywords
text data
piece
evaluation
data set
text
Prior art date
Application number
PCT/CN2020/099493
Other languages
French (fr)
Chinese (zh)
Inventor
李文斌
喻宁
冯晶凌
柳阳
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021114634A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the technical field of emotion recognition in artificial intelligence, and specifically relates to a text labeling method, device and storage medium.
  • For example, neural networks can be used to recognize people in surveillance videos; in the medical field, they can be used to recognize tumors in MRI images; and in the field of text recognition, they can be used for sentiment classification of text.
  • Neural networks perform well at image recognition.
  • However, training a neural network first requires a sufficient number of sufficiently high-quality training data sets.
  • Producing such training data sets is very costly.
  • Manual labeling requires a great deal of time and labor, and its efficiency is low.
  • The embodiments of the present application provide a text labeling method, device, and storage medium, which broaden the application scenarios of text annotation and improve its efficiency.
  • A first aspect of the embodiments of the present application provides a text labeling method applied to an electronic device, including: the electronic device obtains a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; the electronic device labels each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result is a positive evaluation or a negative evaluation; the electronic device obtains a first training sample set according to the first annotation result of each piece of first text data; the electronic device uses the first training sample set to train a first neural network; the electronic device obtains a second text data set from a second third-party platform; and the electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • A second aspect of the embodiments of the present application provides an electronic device, including: an acquiring unit configured to acquire a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; a labeling unit configured to label each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data, where the first labeling result is a positive evaluation or a negative evaluation; and a training unit configured to obtain a first training sample set according to the first labeling result of each piece of first text data and to use the first training sample set to train a first neural network.
  • The acquiring unit is further configured to acquire a second text data set from a second third-party platform, and the labeling unit is further configured to use the first neural network to label the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • A third aspect of the embodiments of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor, the programs including instructions for performing the following steps: obtaining a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; labeling each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result is a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; using the first training sample set to train a first neural network; obtaining a second text data set from a second third-party platform; and using the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is a positive evaluation, a negative evaluation, or a neutral evaluation.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: obtaining a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; labeling each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result is a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; using the first training sample set to train a first neural network; obtaining a second text data set from a second third-party platform; and using the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • In the embodiments of the present application, the comment data is annotated using the emoji expressions in the text data, so no semantic analysis of the comment data is required and the annotation is not restricted by the language of the text data. This broadens the application scenarios of text annotation.
  • In addition, the text data can be annotated automatically through emoji expressions, without manual annotation, which saves human and material resources.
  • FIG. 1 is a schematic flowchart of a text labeling method provided by an embodiment of the application.
  • FIG. 2 is a schematic flowchart of another text labeling method provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of another text labeling method provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 5 is a block diagram of the functional units of an electronic device provided by an embodiment of the application.
  • The electronic devices in this application may include smartphones (such as Android, iOS, and Windows Phone phones), tablets, handheld computers, laptops, mobile Internet devices (MIDs), wearable devices, and the like.
  • FIG. 1 is a schematic flowchart of a text labeling method provided by an embodiment of the application. The method is applied to an electronic device and includes the following steps.
  • The electronic device obtains the first text data set from the first third-party platform.
  • The first third-party platform may be a social application such as Weibo, Twitter, or Facebook, or an e-commerce platform such as Amazon, Taobao, or JD.com (Jingdong). That is, the first third-party platform is one that contains a large amount of positive-review and negative-review text data.
  • The electronic device randomly obtains multiple pieces of first text data through an application programming interface (API) provided by the first third-party platform to form the first text data set. That is, the electronic device complies with the robots protocol of the first third-party platform and obtains the first text data set through that platform's API.
  • Because the first text data is obtained through the API of the first third-party platform without manual review, some of it may not meet the requirements; for example, it may contain no emoji, or its text content may be too short. Therefore, after multiple pieces of first text data are obtained, they are first cleaned to remove the first text data that contains no emoji expression or whose text content is too short, and the cleaned first text data constitutes the first text data set.
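The cleaning step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: detecting an emoji by checking a few Unicode pictograph ranges, and the minimum-length value `MIN_TEXT_CHARS`, are both hypothetical choices, since the text does not say how "contains an emoji" or "too short" is decided.

```python
from typing import List

# Assumed emoji detection: a character counts as an emoji if its code point
# falls in one of these Unicode blocks (hypothetical, not from the patent).
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
]
# Hypothetical minimum number of non-emoji characters for "long enough".
MIN_TEXT_CHARS = 5


def is_emoji(ch: str) -> bool:
    """Return True if the single character lies in an assumed emoji range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def text_without_emoji(text: str) -> str:
    """Remove emoji characters and strip surrounding whitespace."""
    return "".join(ch for ch in text if not is_emoji(ch)).strip()


def clean_first_text_data(raw: List[str]) -> List[str]:
    """Keep entries that contain an emoji and enough non-emoji text."""
    cleaned = []
    for text in raw:
        has_emoji = any(is_emoji(ch) for ch in text)
        long_enough = len(text_without_emoji(text)) >= MIN_TEXT_CHARS
        if has_emoji and long_enough:
            cleaned.append(text)
    return cleaned


raw = [
    "Great phone, fast delivery \U0001F600",
    "ok \U0001F600",                 # too short once the emoji is removed
    "No emoji here at all",          # contains no emoji
    "Terrible battery life \U0001F620",
]
print(clean_first_text_data(raw))
```

A production system would more likely rely on the emoji property data published by Unicode than on hand-picked ranges; the ranges only keep the sketch dependency-free.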
  • Each piece of first text data in the first text data set contains an emoji expression.
  • The electronic device labels each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data.
  • The first labeling result is a positive evaluation or a negative evaluation.
  • Each piece of first text data in the first text data set includes an emoji expression.
  • Emoji expressions themselves carry an emotional evaluation.
  • For example, some emoji express a positive evaluation, while others express a negative evaluation. Therefore, the first emotion evaluation of each piece of first text data can be determined according to its emoji expression; each piece of first text data is then labeled according to its first emotion evaluation, that is, an emotion tag is added to each piece of first text data.
  • Specifically, when the emoji expression belongs to the set of positive-evaluation emoji expressions, the first text data is marked as a positive evaluation; when the emoji expression belongs to the set of negative-evaluation emoji expressions, the first text data is marked as a negative evaluation.
  • That is, the first annotation result is either a positive evaluation or a negative evaluation.
  • The emotions corresponding to a positive evaluation include happiness, approval, appreciation, and the like.
  • The emotions corresponding to a negative evaluation include anger, pessimism, disagreement, and the like.
  • Some emoji are ambiguous: the same emoji can be used to express happiness, a positive feeling, or sarcasm, a negative feeling.
  • First text data containing such ambiguous emoji expressions is not labeled; only first text data containing emoji expressions that correspond to positive reviews or to negative reviews is labeled.
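The emoji-based labeling rule, including the exclusion of ambiguous emoji, can be sketched as a set lookup. The three emoji sets below are hypothetical examples; the patent does not enumerate them. Treating text that contains both a positive and a negative emoji as unlabeled is also an assumption of this sketch. The check compares single code points, so multi-code-point emoji (ZWJ sequences, skin-tone modifiers) would need extra handling.

```python
from typing import List, Optional, Tuple

# Hypothetical emoji sets; the patent does not enumerate them.
POSITIVE_EMOJI = {"\U0001F600", "\U0001F60D", "\U0001F44D"}  # grinning, heart-eyes, thumbs up
NEGATIVE_EMOJI = {"\U0001F620", "\U0001F621", "\U0001F44E"}  # angry, pouting, thumbs down
AMBIGUOUS_EMOJI = {"\U0001F642"}  # slightly smiling face: happiness or sarcasm


def first_evaluation(text: str) -> Optional[str]:
    """Return 'positive', 'negative', or None (leave the text unlabeled)."""
    chars = set(text)  # single-code-point emoji only
    if chars & AMBIGUOUS_EMOJI:
        return None  # ambiguous emoji: do not label
    has_pos = bool(chars & POSITIVE_EMOJI)
    has_neg = bool(chars & NEGATIVE_EMOJI)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # no known emoji, or conflicting emoji (assumption)


def label_first_text_data(texts: List[str]) -> List[Tuple[str, str]]:
    """Attach a first annotation result to every text that can be labeled."""
    labeled = []
    for text in texts:
        label = first_evaluation(text)
        if label is not None:
            labeled.append((text, label))
    return labeled


texts = [
    "Love it \U0001F60D",
    "Broke in a day \U0001F621",
    "Sure, great job \U0001F642",           # ambiguous, skipped
    "Mixed feelings \U0001F44D\U0001F44E",  # conflicting, skipped
]
print(label_first_text_data(texts))
```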
  • Optionally, the text content of each piece of first text data may be extracted and semantically analyzed to obtain the semantic information of each piece of first text data; a second sentiment evaluation of each piece of first text data is then determined according to its semantic information; the first text data whose first and second sentiment evaluations are consistent is retained, and the first text data whose first and second sentiment evaluations are inconsistent is deleted. Double labeling through semantic analysis and emoji expressions reduces the labeling errors caused by relying on emoji alone and improves the accuracy of labeling the first text data set.
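The double-labeling check keeps only samples on which the emoji-based label and a text-based label agree. In this sketch, `toy_semantic_eval` is a hypothetical keyword-lexicon stand-in for the semantic analysis, which the text does not specify; any sentiment model returning "positive", "negative", or None could be passed in instead.

```python
from typing import Callable, List, Optional, Tuple

Labeled = Tuple[str, str]  # (text, first sentiment evaluation)

# Hypothetical keyword lexicons standing in for real semantic analysis.
POS_WORDS = {"love", "great", "excellent"}
NEG_WORDS = {"broke", "terrible", "awful"}


def toy_semantic_eval(text: str) -> Optional[str]:
    """Second sentiment evaluation from the words alone (toy stand-in)."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & POS_WORDS)
    neg = len(words & NEG_WORDS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # undecided


def keep_consistent(
    labeled: List[Labeled],
    semantic_eval: Callable[[str], Optional[str]],
) -> List[Labeled]:
    """Retain samples whose first and second evaluations agree.

    Samples whose second evaluation is undecided (None) are dropped too.
    """
    return [(text, first) for text, first in labeled if semantic_eval(text) == first]


labeled = [
    ("Love it \U0001F60D", "positive"),
    ("Terrible battery \U0001F60D", "positive"),  # emoji and words disagree
    ("Broke in a day \U0001F621", "negative"),
]
print(keep_consistent(labeled, toy_semantic_eval))
```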
  • The electronic device obtains the first training sample set according to the first annotation result of each piece of first text data.
  • The labeled first text data is used as labeled training samples, which form the first training sample set.
  • The electronic device uses the first training sample set to train the first neural network.
  • Specifically, the initial parameters of the first neural network are constructed first, and the training samples in the first training sample set are input into the first neural network to obtain their prediction results. A loss function is then constructed from the prediction results and the labeling results of the training samples, and the loss gradient is determined. Finally, the parameter values are updated by backpropagation using gradient descent, and training of the first neural network is complete once the first neural network converges.
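The training procedure described above (forward pass, loss from predictions and labels, gradient, gradient-descent update until convergence) can be illustrated with the simplest possible network: a single sigmoid unit over bag-of-words counts, that is, logistic regression trained with binary cross-entropy. This is an assumed stand-in for the unspecified first neural network; the tokenizer, learning rate, and the "loss stops changing" convergence test are likewise hypothetical.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], float]  # (per-token weights, bias)


def featurize(text: str) -> Dict[str, float]:
    """Bag-of-words token counts (hypothetical featurization)."""
    feats: Dict[str, float] = {}
    for tok in text.lower().split():
        feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(model: Model, feats: Dict[str, float]) -> float:
    """Predicted probability that the sample is a positive evaluation."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(k, 0.0) * v for k, v in feats.items()))


def train(samples: List[Tuple[str, int]], lr: float = 0.5,
          max_epochs: int = 500, tol: float = 1e-6) -> Model:
    """Full-batch gradient descent on binary cross-entropy (1 = positive)."""
    weights: Dict[str, float] = {}
    bias = 0.0
    data = [(featurize(t), y) for t, y in samples]
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grad_w: Dict[str, float] = {}
        grad_b, loss = 0.0, 0.0
        for x, y in data:
            p = forward((weights, bias), x)            # prediction
            p_c = min(max(p, 1e-12), 1.0 - 1e-12)      # avoid log(0)
            loss -= y * math.log(p_c) + (1 - y) * math.log(1.0 - p_c)
            err = p - y                                # dLoss/dlogit
            for k, v in x.items():
                grad_w[k] = grad_w.get(k, 0.0) + err * v
            grad_b += err
        n = len(data)
        for k, g in grad_w.items():                    # gradient-descent update
            weights[k] = weights.get(k, 0.0) - lr * g / n
        bias -= lr * grad_b / n
        loss /= n
        if abs(prev_loss - loss) < tol:                # loss has stopped changing
            break
        prev_loss = loss
    return weights, bias


samples = [("great phone love it", 1), ("love the screen", 1),
           ("terrible battery", 0), ("awful and terrible", 0)]
model = train(samples)
print(forward(model, featurize("love it")), forward(model, featurize("terrible")))
```

A real first neural network would add hidden layers and use an autodiff framework, but the loop structure (predict, compute loss and gradient, update, stop at convergence) is the same.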
  • The electronic device obtains the second text data set from the second third-party platform.
  • The second third-party platform may be a news platform that publishes science and technology news, or a platform hosting wiki or summary texts. That is, the second third-party platform is one that contains a large amount of text data with a neutral evaluation.
  • The electronic device complies with the robots protocol of the second third-party platform and obtains multiple pieces of second text data through that platform's API to form the second text data set.
  • The multiple pieces of second text data may be cleaned to remove invalid second text data whose text content is too short.
  • The electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set.
  • The second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • Specifically, the electronic device uses the first neural network to classify each piece of second text data in the second text data set, obtaining a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation. Second text data whose first probability is greater than a first threshold (that is, the network is confident that its emotional evaluation is positive) is marked as a positive evaluation; second text data whose second probability is greater than the first threshold (that is, the network is confident that its emotional evaluation is negative) is marked as a negative evaluation; and second text data whose first probability is less than the first threshold and greater than a second threshold (that is, the network cannot confidently tell whether its emotional evaluation is positive or negative) is marked as a neutral evaluation.
  • The first threshold may be 0.7, 0.75, 0.8, or another value.
  • The second threshold may be 0.4, 0.45, 0.5, or another value.
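The three-way thresholding rule can be written as a small function. The default thresholds use the example values 0.7 and 0.4 given above. The text does not say what happens when neither probability exceeds the first threshold and the first probability is not above the second threshold (for example, 0.35 positive and 0.65 negative); returning None to leave such data unlabeled is an assumption of this sketch.

```python
from typing import Optional


def second_annotation(p_pos: float, p_neg: float,
                      first_threshold: float = 0.7,
                      second_threshold: float = 0.4) -> Optional[str]:
    """Map the first neural network's two probabilities to a second annotation result."""
    if p_pos > first_threshold:
        return "positive"
    if p_neg > first_threshold:
        return "negative"
    if second_threshold < p_pos < first_threshold:
        return "neutral"
    return None  # case not specified in the text; left unlabeled (assumption)


for probs in [(0.8, 0.2), (0.1, 0.9), (0.55, 0.45), (0.35, 0.65)]:
    print(probs, second_annotation(*probs))
```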
  • In this way, the text data is annotated using the emoji expressions it contains, so no semantic analysis of the text data is required and the annotation is not restricted by the language of the text data, thereby broadening the application scenarios of text annotation.
  • Moreover, the text data can be annotated automatically through emoji expressions, without manual annotation, thereby saving human and material resources.
  • In some embodiments, the method further includes: the electronic device obtains a second training sample set according to the second annotation result of each piece of second text data in the second text data set, that is, the second text data set, together with the annotation results of its second text data, forms a labeled second training sample set; the second training sample set is then used to train a second neural network; and any piece of comment data to be published is obtained.
  • The second neural network is used to classify the comment data to be published to obtain its classification result, and whether to publish the comment data is determined according to the classification result.
  • For example, the comment data to be published may be comment data under any news website. When the classification result is a positive or neutral evaluation, the comment data is published; when the classification result is a negative evaluation, the comment data is not published.
  • In this way, the comment data to be published can be reviewed automatically by the second neural network, thereby saving human resources.
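A minimal sketch of the publish decision for news-site comments. Here `classify` stands for the trained second neural network and is passed in as any callable returning "positive", "neutral", or "negative" (a hypothetical interface); `toy_classify` is a keyword-based placeholder for demonstration only.

```python
from typing import Callable


def should_publish_news_comment(comment: str, classify: Callable[[str], str]) -> bool:
    """Publish positive or neutral comments; withhold negative ones."""
    result = classify(comment)  # stand-in for the second neural network
    if result in ("positive", "neutral"):
        return True
    if result == "negative":
        return False
    raise ValueError(f"unexpected classification result: {result!r}")


def toy_classify(text: str) -> str:
    """Hypothetical keyword classifier used only to exercise the rule."""
    lowered = text.lower()
    if "awful" in lowered:
        return "negative"
    if "great" in lowered:
        return "positive"
    return "neutral"


print(should_publish_news_comment("Great reporting", toy_classify))
print(should_publish_news_comment("Awful article", toy_classify))
```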
  • As another example, the review data to be published may be review data under any e-commerce platform. When the classification result is a positive or negative review, the review data is checked against the user's purchase records to determine its authenticity, and if the review data is determined to be a malicious review, it is not published.
  • In this way, the review data to be published can be reviewed automatically by the second neural network and its authenticity verified, thereby saving human resources.
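For the e-commerce case, the text only says that positive and negative reviews are checked against the user's purchase records. The sketch below assumes a deliberately simple authenticity rule that is hypothetical and not from the source: a review is treated as malicious if the reviewer has no purchase record for the reviewed product. The `Review` fields and the decision to publish neutral reviews without the check are also assumptions.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Review:
    user_id: str      # hypothetical field names
    product_id: str
    text: str


def should_publish_ecommerce_review(review: Review, classification: str,
                                    purchases: Set[Tuple[str, str]]) -> bool:
    """Check positive/negative reviews against purchase records before publishing."""
    if classification in ("positive", "negative"):
        # Assumed authenticity rule: no purchase record means a malicious review.
        return (review.user_id, review.product_id) in purchases
    return True  # neutral reviews: not specified in the text; published here


purchases = {("u1", "p1")}
print(should_publish_ecommerce_review(Review("u1", "p1", "Broke fast"), "negative", purchases))
print(should_publish_ecommerce_review(Review("u2", "p1", "Broke fast"), "negative", purchases))
```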
  • The second training sample set may also be combined with the first training sample set to obtain a new second training sample set with sufficient training samples, and the second neural network is trained using the new second training sample set, which makes the trained second neural network more accurate.
  • In some embodiments, the method further includes: extracting the text content of each piece of first text data; converting the text content into a second emoji expression; determining a second emotional evaluation of each piece of first text data according to the second emoji expression; and determining whether the first and second sentiment evaluations of each piece of first text data are consistent, and if they are, labeling the piece of first text data according to its first sentiment evaluation.
  • In this way, the sentiment evaluation of each piece of first text data is verified, which improves the accuracy of the subsequent labeling of the first text data.
  • In some embodiments, the method further includes: obtaining comment data of any user, the comment data being the user's comment on a target product, where the target product includes wealth management products;
  • classifying the user's comment data with the second neural network to obtain its classification result;
  • screening target users according to the classification result of the user's comment data, that is, taking users whose comment data is classified as a positive evaluation as target users;
  • and recommending the target product to the target users.
  • Using the second neural network to screen out users who are interested in the target product (a wealth management product) ensures the accuracy of user screening and improves the success rate of recommendation.
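Screening target users reduces to filtering users by the classification of their comments on the target product. As in the moderation examples, `classify` is a hypothetical callable standing in for the second neural network, and the keyword lambda below is a placeholder for demonstration only.

```python
from typing import Callable, Dict, List


def screen_target_users(comments_by_user: Dict[str, str],
                        classify: Callable[[str], str]) -> List[str]:
    """Return users whose comment on the target product is classified as positive."""
    return sorted(user for user, comment in comments_by_user.items()
                  if classify(comment) == "positive")


comments = {
    "alice": "Good returns, happy with this fund",
    "bob": "Fees are too high",
    "carol": "Good product overall",
}
# Toy classifier standing in for the second neural network (hypothetical).
targets = screen_target_users(comments, lambda c: "positive" if "good" in c.lower() else "negative")
print(targets)
```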
  • FIG. 2 is a schematic flowchart of another text labeling method provided by an embodiment of this application.
  • Content in this embodiment that is the same as in the embodiment shown in FIG. 1 is not repeated here.
  • The method is applied to an electronic device and includes the following steps.
  • The electronic device obtains the first text data set from the first third-party platform.
  • The electronic device cleans each piece of first text data in the first text data set, deletes the first text data that does not contain emoji expressions to obtain a new first text data set, and uses the new first text data set as the first text data set.
  • The electronic device determines a first emotional evaluation of each piece of first text data according to its emoji expression, where the first emotional evaluation is a positive evaluation or a negative evaluation.
  • The electronic device extracts the text content of each piece of first text data and performs semantic analysis on it to obtain the semantic information of each piece of first text data.
  • The electronic device determines the second sentiment evaluation of each piece of first text data according to its semantic information.
  • The electronic device retains the first text data in the first text data set whose first and second sentiment evaluations are consistent, and deletes the first text data whose first and second sentiment evaluations are inconsistent.
  • The electronic device labels the remaining first text data according to its first sentiment evaluation to obtain the first training sample set.
  • The remaining first text data is the first text data left in the first text data set after deleting the first text data whose first and second sentiment evaluations are inconsistent.
  • The electronic device uses the first training sample set to train the first neural network.
  • The electronic device obtains the second text data set from the second third-party platform.
  • The electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • In this embodiment, the comment data is annotated using the emoji expressions it contains, so no semantic analysis of the comment data is required and the annotation is not restricted by the language of the comment data, thereby broadening the application scenarios of text annotation.
  • In addition, the comment data can be annotated automatically through emoji expressions, and training sample sets containing emotion classification labels can be obtained without manual labeling, thereby saving human and material resources. Moreover, the first text data set is cleaned before it is annotated so that only high-quality first text data is retained, which improves annotation accuracy.
  • FIG. 3 is a schematic flowchart of another text labeling method provided by an embodiment of the application.
  • Content in this embodiment that is the same as in the embodiments shown in FIG. 1 and FIG. 2 is not repeated here.
  • The method is applied to an electronic device and includes the following steps.
  • The electronic device obtains the first text data set from the first third-party platform.
  • The electronic device cleans each piece of first text data in the first text data set, deletes the first text data that does not contain emoji expressions to obtain a new first text data set, and uses the new first text data set as the first text data set.
  • The electronic device determines a first emotional evaluation of each piece of first text data according to its emoji expression, where the first emotional evaluation is a positive evaluation or a negative evaluation.
  • The electronic device extracts the text content of each piece of first text data, performs semantic analysis on it, and obtains the semantic information of each piece of first text data.
  • The electronic device determines the second sentiment evaluation of each piece of first text data according to its semantic information.
  • The electronic device retains the first text data in the first text data set whose first and second sentiment evaluations are consistent, and deletes the first text data whose first and second sentiment evaluations are inconsistent.
  • The electronic device labels the remaining first text data according to its first sentiment evaluation to obtain a first training sample set.
  • The remaining first text data is the first text data left in the first text data set after deleting the first text data whose first and second sentiment evaluations are inconsistent.
  • The electronic device uses the first training sample set to train the first neural network.
  • The electronic device obtains the second text data set from the second third-party platform.
  • The electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • The electronic device obtains a second training sample set according to the second annotation result of each piece of second text data, and uses the second training sample set to train the second neural network.
  • The electronic device obtains any piece of comment data, uses the second neural network to classify it to obtain its classification result, and determines according to the classification result whether to publish the comment data.
  • In this embodiment, the text data is annotated using the emoji expressions it contains, so no semantic analysis of the text data is required and the annotation is not restricted by the language of the text data, thereby broadening the application scenarios of text annotation.
  • In addition, the text data can be annotated automatically through emoji expressions, and training sample sets containing emotion classification labels can be obtained without manual labeling, thereby saving human and material resources. Moreover, the first text data set is cleaned before it is annotated so that only high-quality first text data is retained, which improves annotation accuracy. Finally, the trained second neural network classifies the comment data to be published and automatically blocks comment data that does not meet the publication requirements, without manual review, saving human resources.
  • The electronic device 400 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor, the programs including instructions for performing the following steps: obtaining a first text data set from the first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; labeling each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result is a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; training a first neural network using the first training sample set; obtaining a second text data set from the second third-party platform; and using the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • In some embodiments, in terms of labeling each piece of first text data according to its emoji expression, the processor is specifically configured to: determine the first emotional evaluation of each piece of first text data according to its emoji expression, where the first emotional evaluation is a positive evaluation or a negative evaluation; and label each piece of first text data according to its first emotional evaluation.
  • In some embodiments, the processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content to obtain the semantic information of each piece of first text data; determine the second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data in the first text data set whose first and second sentiment evaluations are consistent, deleting the first text data whose first and second sentiment evaluations are inconsistent.
  • In some embodiments, before annotating the first text data set, the processor is further configured to: clean each piece of first text data in the first text data set and delete the first text data that does not contain emoji expressions to obtain a new first text data set; and use the new first text data set as the first text data set.
  • In some embodiments, in terms of using the first neural network to annotate the second text data set, the processor is specifically configured to: use the first neural network to classify each piece of second text data in the second text data set to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than the first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than the second threshold is a neutral evaluation.
  • In some embodiments, the processor is further configured to: obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data to be published; use the second neural network to perform sentiment classification on the comment data to obtain its classification result; and determine, according to the classification result, whether to publish the comment data.
  • In some embodiments, the processor is further configured to combine the second training sample set with the first training sample set to obtain a new second training sample set; in terms of using the second training sample set to train the second neural network, the processor is specifically configured to train the second neural network using the new second training sample set.
  • the electronic device 500 includes: an acquisition unit 510, a labeling unit 520, and a training unit 530.
  • the acquisition unit 510 is configured to obtain a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; the labeling unit 520 is configured to annotate each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation.
  • the training unit 530 is configured to obtain a first training sample set according to the first annotation result of each piece of first text data, and to train a first neural network using the first training sample set.
  • the acquisition unit 510 is further configured to obtain a second text data set from a second third-party platform.
  • the labeling unit 520 is further configured to annotate the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • the labeling unit 520 is specifically configured to: determine a first sentiment evaluation of each piece of first text data according to its emoji expression, where the first sentiment evaluation includes a positive evaluation or a negative evaluation; and annotate each piece of first text data according to its first sentiment evaluation.
  • the electronic device 500 further includes a cleaning unit 540.
  • the cleaning unit 540 is configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece of first text data to obtain its semantic information; determine a second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data in the first text data set whose first sentiment evaluation is consistent with its second sentiment evaluation, and delete the first text data whose first sentiment evaluation is inconsistent with its second sentiment evaluation.
  • the cleaning unit 540 is configured to: clean each piece of first text data in the first text data set, delete the first text data that does not contain an emoji expression to obtain a new first text data set, and use the new first text data set as the first text data set.
  • in terms of annotating the second text data set using the first neural network to obtain the second annotation result of each piece of second text data in the second text data set, the labeling unit 520 is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  • the electronic device 500 further includes a determining unit 550; the training unit 530 is further configured to obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set, and to train a second neural network using the second training sample set; the determining unit 550 is configured to obtain any piece of comment data to be published, perform sentiment classification on it using the second neural network to obtain a classification result, and determine, according to the classification result, whether to publish the comment data.
  • the training unit 530 is further configured to: combine the second training sample set with the first training sample set to obtain a new second training sample set; in terms of training the second neural network using the second training sample set, the training unit 530 is specifically configured to train the second neural network using the new second training sample set.
  • the embodiments of the present application also provide a computer-readable storage medium, which may be non-volatile or volatile and which stores a computer program that, when executed by a processor, implements the following steps: obtaining a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; annotating each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; training a first neural network using the first training sample set; obtaining a second text data set from a second third-party platform; and annotating the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • in terms of annotating each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, the processor is specifically configured to: determine a first sentiment evaluation of each piece of first text data according to its emoji expression, where the first sentiment evaluation includes a positive evaluation or a negative evaluation; and annotate each piece of first text data according to its first sentiment evaluation.
  • the processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece of first text data to obtain its semantic information; determine a second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data in the first text data set whose first sentiment evaluation is consistent with its second sentiment evaluation, and delete the first text data whose first sentiment evaluation is inconsistent with its second sentiment evaluation.
  • before annotating the first text data set, the processor is further configured to: clean each piece of first text data in the first text data set, delete the first text data that does not contain an emoji expression to obtain a new first text data set, and use the new first text data set as the first text data set.
  • in terms of annotating the second text data set using the first neural network to obtain the second annotation result of each piece of second text data, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  • the processor is further configured to: obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published using the second neural network to obtain a classification result; and determine, according to the classification result, whether to publish the comment data.
  • the processor is further configured to: combine the second training sample set with the first training sample set to obtain a new second training sample set; in terms of training the second neural network using the second training sample set, the processor is specifically configured to train the second neural network using the new second training sample set.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementations: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software program module.
  • if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory.
  • the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory.
  • the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
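The three-way rule that turns the first neural network's two probabilities into a second annotation result can be sketched as below. This is a minimal illustration, not the applicant's code: the threshold values 0.7 and 0.3 are assumptions (the text does not give them), and inputs that match none of the stated rules, such as a probability exactly equal to a threshold, are returned as `None` because the text does not say how they are labeled.

```python
from typing import Optional

# Illustrative thresholds (assumptions; the patent text gives no values).
FIRST_THRESHOLD = 0.7
SECOND_THRESHOLD = 0.3


def second_annotation(p_positive: float, p_negative: float,
                      first_threshold: float = FIRST_THRESHOLD,
                      second_threshold: float = SECOND_THRESHOLD) -> Optional[str]:
    """Map the first (positive) and second (negative) probabilities to a label."""
    if p_positive > first_threshold:
        return "positive"
    if p_negative > first_threshold:
        return "negative"
    if second_threshold < p_positive < first_threshold:
        return "neutral"
    # Cases the described rules do not cover (e.g. values exactly on a threshold).
    return None


print(second_annotation(0.92, 0.08))  # positive
print(second_annotation(0.15, 0.85))  # negative
print(second_annotation(0.55, 0.45))  # neutral
```

If the two probabilities come from a two-class softmax they sum to 1, so choosing the second threshold as 1 minus the first makes the neutral band exactly the region that is neither confidently positive nor confidently negative.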

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A text annotation method, a device, and a storage medium, relating to the technical field of emotion recognition in artificial intelligence. The text annotation method comprises: acquiring a first text dataset from a first third-party platform, each piece of first text data in the first text dataset comprising an emoji expression; annotating each piece of first text data on the basis of its emoji expression to produce a first annotation result for each piece of first text data, the first annotation result comprising a positive evaluation or a negative evaluation; producing a first training sample set on the basis of the first annotation result of each piece of first text data; using the first training sample set to train a first neural network; acquiring a second text dataset from a second third-party platform; and using the first neural network to annotate the second text dataset to produce a second annotation result for each piece of second text data in the second text dataset, the second annotation result comprising one of a positive evaluation, a negative evaluation, or a neutral evaluation.

Description

Text annotation method, device, and storage medium
This application claims priority to Chinese Patent Application No. 2020104658114, filed with the Chinese Patent Office on May 28, 2020 and entitled "Text labeling method and related products", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of emotion recognition in artificial intelligence, and specifically to a text annotation method, device, and storage medium.
Background
With the development of artificial intelligence, neural networks are applied ever more widely. For example, in video surveillance, neural networks can recognize people in surveillance video; in medicine, they can recognize tumors in MRI images; and in text recognition, they can classify the sentiment of text.
Neural networks perform well on tasks such as image recognition, but training them first requires a training data set that is large enough and of high enough quality, and producing such a data set is very costly. High-quality raw data must first be obtained from a database and then annotated. For example, training a text sentiment classification network requires a large amount of semantically complete text with clear sentiment, which is then annotated manually. However, the inventors found that because the volume of text is extremely large, manual annotation takes a great deal of time and labor, and annotation efficiency is low.
Summary
The embodiments of the present application provide a text annotation method, device, and storage medium that broaden the application scenarios of text annotation and improve its efficiency.
A first aspect of the embodiments of the present application provides a text annotation method applied to an electronic device, including: obtaining, by the electronic device, a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; annotating, by the electronic device, each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtaining, by the electronic device, a first training sample set according to the first annotation result of each piece of first text data; training, by the electronic device, a first neural network using the first training sample set; obtaining, by the electronic device, a second text data set from a second third-party platform; and annotating, by the electronic device, the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
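The training and scoring steps of the first aspect can be illustrated with a deliberately small sketch. It is not the applicant's implementation: a single-layer logistic-regression model trained by stochastic gradient descent stands in for the "first neural network", whitespace-split words stand in for real text features, and the four toy samples are assumed to have already been labeled from their emojis (1 = positive, 0 = negative) with the emojis removed. The model outputs the first (positive) and second (negative) probabilities that the thresholding step consumes.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

BIAS = "<bias>"


def features(text: str) -> Counter:
    # Bag-of-words features; the emoji-derived labels are supplied separately.
    return Counter(text.lower().split())


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def positive_probability(weights: Dict[str, float], feats: Counter) -> float:
    z = weights.get(BIAS, 0.0) + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return sigmoid(z)


def train_first_network(samples: List[Tuple[str, int]], epochs: int = 200,
                        lr: float = 0.5) -> Dict[str, float]:
    """Fit a single-layer logistic model with stochastic gradient descent.

    samples: (text with emojis removed, label) pairs, 1 = positive, 0 = negative.
    """
    weights: Dict[str, float] = {BIAS: 0.0}
    for _ in range(epochs):
        for text, label in samples:
            feats = features(text)
            # Gradient of the log-loss with respect to the logit.
            grad = positive_probability(weights, feats) - label
            weights[BIAS] -= lr * grad
            for word, count in feats.items():
                weights[word] = weights.get(word, 0.0) - lr * grad * count
    return weights


def score(weights: Dict[str, float], text: str) -> Tuple[float, float]:
    """Return (first probability: positive, second probability: negative)."""
    p_pos = positive_probability(weights, features(text))
    return p_pos, 1.0 - p_pos


# Toy first training sample set (emoji-derived labels, emojis already stripped).
first_training_set = [("love great", 1), ("hate awful", 0), ("nice happy", 1), ("bad sad", 0)]
weights = train_first_network(first_training_set)

# Toy second-platform texts without emojis.
for text in ["love great camera", "awful sad support", "package arrived tuesday"]:
    p_pos, p_neg = score(weights, text)
    print(f"{text!r}: positive={p_pos:.2f} negative={p_neg:.2f}")
```

Words never seen in training carry no weight, so a sentence such as "package arrived tuesday" scores near 0.5, which is the kind of sample a neutral band is meant to catch.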
A second aspect of the embodiments of the present application provides an electronic device, including: an acquisition unit configured to obtain a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; a labeling unit configured to annotate each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; and a training unit configured to obtain a first training sample set according to the first annotation result of each piece of first text data and to train a first neural network using the first training sample set. The acquisition unit is further configured to obtain a second text data set from a second third-party platform, and the labeling unit is further configured to annotate the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
A third aspect of the embodiments of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, the programs including instructions for performing the following steps: obtaining a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; annotating each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; training a first neural network using the first training sample set; obtaining a second text data set from a second third-party platform; and annotating the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps: obtaining a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; annotating each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtaining a first training sample set according to the first annotation result of each piece of first text data; training a first neural network using the first training sample set; obtaining a second text data set from a second third-party platform; and annotating the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
It can be seen that, in the embodiments of the present application, comment data is annotated using the emoji expressions in the text data, without semantic analysis of the comment data. Annotation is therefore not restricted by the language of the text data, which broadens the application scenarios of this text annotation. In addition, text data can be annotated automatically via emoji expressions without manual annotation, saving human and material resources.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an annotation method according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of another annotation method according to an embodiment of the present application.
FIG. 3 is a schematic flowchart of another annotation method according to an embodiment of the present application.
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
FIG. 5 is a block diagram of the functional units of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive of other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The electronic device in this application may include a smartphone (such as an Android, iOS, or Windows Phone phone), a tablet computer, a handheld computer, a laptop, a mobile Internet device (MID), a wearable device, and the like. These electronic devices are examples rather than an exhaustive list; the electronic device includes but is not limited to them. In practice, the electronic device may also include an in-vehicle smart terminal, a computer device, and so on.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a text annotation method according to an embodiment of the present application. The method is applied to an electronic device and includes the following steps.
101: The electronic device obtains a first text data set from a first third-party platform.
The first third-party platform may be a social application such as Weibo, Twitter, or Facebook, or an e-commerce platform such as Amazon, Taobao, or Jingdong; that is, a third-party platform that hosts a large amount of text data expressing positive and negative evaluations. The electronic device randomly obtains multiple pieces of first text data from the first third-party platform through the application programming interface (API) provided by that platform to form the first text data set. In doing so, the electronic device complies with the robots protocol of the first third-party platform.
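The text does not describe how robots-protocol compliance is checked. As one possible sketch, Python's standard `urllib.robotparser` module can evaluate a platform's robots.txt rules before a URL is requested; the robots.txt content, the user-agent name, and the URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without a network call
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt for a hypothetical platform.
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_to_fetch(robots, "annotator-bot", "https://example.com/api/comments"))  # True
print(allowed_to_fetch(robots, "annotator-bot", "https://example.com/private/data"))  # False
```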
In some possible implementations, because the first text data is obtained through the API of the first third-party platform without manual review, some pieces may not meet requirements; for example, they may contain no emoji expression, or their text content may be too short. Therefore, after multiple pieces of first text data are obtained, the first text data in the first text data set is first cleaned to remove pieces that contain no emoji expression or whose text content is too short, and the cleaned first text data forms the first text data set.
Therefore, each piece of first text data in the first text data set contains an emoji expression.
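The cleaning step can be sketched as follows. The emoji test uses a few common Unicode emoji blocks rather than a complete emoji list, and the minimum length of 5 characters is a hypothetical cutoff; the text specifies neither.

```python
import re
from typing import List

# Approximate emoji detection over a few common Unicode blocks (an assumption;
# a production system might use a complete emoji list instead).
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001F5FF"   # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"    # emoticons
    "\U0001F680-\U0001F6FF"    # transport and map symbols
    "\U0001F900-\U0001F9FF"    # supplemental symbols and pictographs
    "\u2600-\u27BF]"           # miscellaneous symbols and dingbats
)

MIN_TEXT_LENGTH = 5  # hypothetical cutoff for "text content too short"


def clean_first_text_data(texts: List[str], min_length: int = MIN_TEXT_LENGTH) -> List[str]:
    """Keep only texts that contain an emoji and enough non-emoji text."""
    cleaned = []
    for text in texts:
        if not EMOJI_PATTERN.search(text):
            continue  # no emoji: cannot be labeled from its emoji
        body = EMOJI_PATTERN.sub("", text).strip()
        if len(body) < min_length:
            continue  # text content too short
        cleaned.append(text)
    return cleaned


raw = ["Great product \U0001F600", "\U0001F600", "no emoji here at all", "ok \U0001F44D"]
print(clean_first_text_data(raw))  # ['Great product 😀']
```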
102: The electronic device annotates each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, to obtain a first annotation result of each piece of first text data.
The first annotation result includes a positive evaluation or a negative evaluation.
Exemplarily, once the first text data has been cleaned, each piece of first text data in the first text data set includes an emoji. Emoji themselves carry an emotional evaluation. For example, the emoji shown in [Figure PCTCN2020099493-appb-000001] expresses a positive evaluation, whereas the emoji shown in [Figure PCTCN2020099493-appb-000002] expresses a negative evaluation. Therefore, the first sentiment evaluation of each piece of first text data can be determined from its emoji, and each piece of first text data is then labeled according to its first sentiment evaluation, that is, a sentiment tag is added to each piece of first text data. Specifically, if the emoji of a piece of first text data belongs to the set of positive-evaluation emoji, that piece is labeled as a positive evaluation; if the emoji belongs to the set of negative-evaluation emoji, that piece is labeled as a negative evaluation.
Correspondingly, the first labeling result for the first text data is either a positive evaluation or a negative evaluation. Emotions corresponding to a positive evaluation include happiness, approval, and appreciation; emotions corresponding to a negative evaluation include anger, pessimism, and disapproval.
It should be noted that, for some emoji, the corresponding emotional evaluation cannot be determined with confidence. For example, the emoji shown in [Figure PCTCN2020099493-appb-000003] can express happiness (a positive emotion) or sarcasm (a negative emotion). First text data in the first text data set containing such emoji is not labeled; only first text data containing emoji that correspond to a positive evaluation or a negative evaluation is labeled.
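The set-based labeling rule, including the exclusion of ambiguous emoji, might look like the following sketch. The three emoji sets are hypothetical examples (the patent's emoji appear only as figures), and treating a piece that mixes positive and negative emoji as unlabeled is an assumed policy the text does not address.

```python
# Hypothetical emoji sets; the patent's own emoji appear only as figure images.
POSITIVE_EMOJI = {"\U0001F600", "\U0001F44D", "\u2764"}      # grinning face, thumbs up, heart
NEGATIVE_EMOJI = {"\U0001F620", "\U0001F622", "\U0001F44E"}  # angry, crying, thumbs down
AMBIGUOUS_EMOJI = {"\U0001F642"}  # slightly smiling face: happiness or sarcasm

def label_by_emoji(text):
    """Return "positive", "negative", or None (piece is left unlabeled)."""
    found = set()
    for ch in text:
        if ch in AMBIGUOUS_EMOJI:
            return None  # ambiguous emoji: do not label this piece
        if ch in POSITIVE_EMOJI:
            found.add("positive")
        elif ch in NEGATIVE_EMOJI:
            found.add("negative")
    if len(found) == 1:
        return found.pop()
    return None  # no known emoji, or conflicting emoji (assumed policy)
```

Pieces for which the function returns `None` are simply not used as training samples.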
Further, to improve the accuracy of emoji-based labeling, the text content of each piece of first text data may be extracted and semantically analyzed to obtain the semantic information of each piece; a second sentiment evaluation of each piece of first text data is determined from its semantic information; the first text data in the first text data set whose first and second sentiment evaluations agree is retained, and the first text data whose first and second sentiment evaluations disagree is deleted. This double labeling, by semantic analysis and by emoji, reduces the labeling error that emoji-only labeling would introduce and improves the accuracy of labeling the first text data set.
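The agreement filter can be sketched as below. The patent does not say which semantic analysis is used, so a tiny word-list scorer stands in for it here purely as a hypothetical placeholder; any sentiment model returning the same labels could be substituted.

```python
# Hypothetical sentiment lexicon standing in for the unspecified semantic analysis.
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "awful", "hate", "broken"}

def semantic_eval(text):
    """Second sentiment evaluation from the text content alone."""
    words = text.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no usable semantic signal

def keep_consistent(pieces):
    """pieces: (text_content, first_eval) pairs; keep those whose second evaluation agrees."""
    return [(text, first) for text, first in pieces if semantic_eval(text) == first]
```

With this rule, a piece whose semantic evaluation is `None` is also dropped, because it cannot agree with a positive or negative emoji label.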
103: The electronic device obtains the first training sample set from the first labeling result of each piece of first text data.
That is, the labeled first text data is used as labeled training samples, forming the first training sample set.
104: The electronic device trains the first neural network using the first training sample set.
Specifically, the initial parameters of the first neural network are first constructed, and the training samples in the first training sample set are input into the first neural network to obtain prediction results for those samples; then a loss function is constructed from the prediction results and the labeling results of the training samples, and the loss gradient is determined from it; finally, the parameter values are updated by backpropagation using gradient descent based on the loss gradient, and this is repeated until the first neural network converges, which completes the training of the first neural network.
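The training loop described above (initial parameters, forward prediction, loss, gradient, gradient-descent update, stop at convergence) can be illustrated with the smallest possible network: a single sigmoid neuron over bag-of-words features, trained with cross-entropy loss. This is a simplified stand-in, not the patent's architecture; the vocabulary, learning rate, tolerance, and the "loss stopped improving" convergence test are all assumptions. With one layer, backpropagation reduces to a single gradient computation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurize(text, vocab):
    # Hypothetical bag-of-words features over a fixed vocabulary.
    words = set(text.lower().split())
    return [1.0 if v in words else 0.0 for v in vocab]

def train_first_network(samples, vocab, lr=0.5, tol=1e-6, max_epochs=2000):
    """samples: (text, label) pairs with label 1 = positive, 0 = negative."""
    data = [(featurize(t, vocab), y) for t, y in samples]
    w, b = [0.0] * len(vocab), 0.0  # initial parameters
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grad_w, grad_b, loss = [0.0] * len(vocab), 0.0, 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # prediction
            q = min(max(p, 1e-12), 1.0 - 1e-12)                    # avoid log(0)
            loss -= y * math.log(q) + (1 - y) * math.log(1 - q)    # cross-entropy
            err = p - y  # dLoss/dz for a sigmoid output with cross-entropy
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        n = len(data)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]  # gradient-descent update
        b -= lr * grad_b / n
        loss /= n
        if prev_loss - loss < tol:  # loss no longer improving: treat as converged
            break
        prev_loss = loss
    return w, b

def positive_probability(text, w, b, vocab):
    x = featurize(text, vocab)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

A real implementation would use a deep-learning framework and a multi-layer network, but the update cycle is the same.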
105: The electronic device obtains the second text data set from the second third-party platform.
The second third-party platform may be a news platform that publishes science and technology news, wiki articles, or summary texts, that is, a third-party platform that contains a large amount of neutrally evaluated text data.
Similarly, the electronic device complies with the robots protocol of the second third-party platform and obtains multiple pieces of second text data from that platform through its API, forming the second text data set.
Of course, after the multiple pieces of second text data are obtained, they may be cleaned to remove invalid second text data and second text data whose text content is too short.
106: The electronic device labels the second text data set using the first neural network, obtaining a second labeling result for each piece of second text data in the second text data set.
The second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
Specifically, the electronic device uses the first neural network to classify each piece of second text data in the second text data set, obtaining a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation. Second text data whose first probability is greater than a first threshold (that is, the network is confident that its sentiment evaluation is positive) is labeled as a positive evaluation; second text data whose second probability is greater than the first threshold (that is, the network is confident that its sentiment evaluation is negative) is labeled as a negative evaluation; and second text data whose first probability is less than the first threshold and greater than a second threshold (that is, the network is not confident whether its sentiment evaluation is positive or negative) is labeled as a neutral evaluation.
The first threshold may be 0.7, 0.75, 0.8, or another value; the second threshold may be 0.4, 0.45, 0.5, or another value.
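The three-way rule can be written as a small function, using 0.7 and 0.4 from the example values above. The text does not cover every probability combination (for instance, a first probability at or below the second threshold while neither probability exceeds the first threshold); returning `None` (unlabeled) in that case is an assumption of this sketch.

```python
FIRST_THRESHOLD = 0.7   # one of the example values given in the text
SECOND_THRESHOLD = 0.4  # likewise

def label_second_text(p_positive, p_negative, t1=FIRST_THRESHOLD, t2=SECOND_THRESHOLD):
    """Map the first network's two probabilities to a second labeling result."""
    if p_positive > t1:
        return "positive"
    if p_negative > t1:
        return "negative"
    if t2 < p_positive < t1:
        return "neutral"  # network is not confident either way
    return None  # combination not covered by the described rules (assumed: leave unlabeled)
```

For example, `label_second_text(0.55, 0.45)` returns `"neutral"`, while `label_second_text(0.9, 0.1)` returns `"positive"`.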
It can be seen that, in the embodiments of this application, text data is labeled by the emoji it contains, without semantic analysis of the text, so labeling is not restricted by the language of the text data, which widens the application scenarios of the labeling method. In addition, text data can be labeled automatically via emoji, without manual labeling, which saves human and material resources.
In some possible implementations, the method further includes: the electronic device obtains a second training sample set from the second labeling result of each piece of second text data in the second text data set, that is, the second text data set together with its labeling results forms a labeled second training sample set; the second neural network is then trained using the second training sample set; any piece of comment data to be published is obtained and classified by the second neural network to obtain a classification result for that comment data; and whether to publish the comment data is determined according to the classification result.
When the comment data to be published is comment data under any news website, the comment data is published if the classification result is a positive or neutral evaluation, and is not published if the classification result is a negative evaluation. Compared with the existing practice of manually reviewing comment data before publication, this application reviews comment data automatically through the second neural network, thereby saving human resources.
When the comment data to be published is comment data under any e-commerce platform and the classification result is a positive or negative evaluation, the comment data is checked against the user's purchase records to determine its authenticity; if the comment data is determined to be a malicious fake review, it is not published. In this application, the second neural network automatically reviews the comment data to be published to determine its authenticity, thereby saving human resources.
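The two publishing policies can be sketched as one decision helper. The site-type strings, the representation of purchase records as a set of `(user_id, product_id)` pairs, the rule that a review without a matching purchase counts as a malicious review, and publishing neutral e-commerce comments are all assumptions; the text only states that the purchase records are checked.

```python
def should_publish(site_type, classification, purchases=None, user_id=None, product_id=None):
    """Decide whether a pending comment is published (hypothetical policy helper)."""
    if site_type == "news":
        # News sites: publish positive and neutral comments, block negative ones.
        return classification in ("positive", "neutral")
    if site_type == "ecommerce":
        if classification in ("positive", "negative"):
            # Assumption: a review with no matching purchase record is a malicious review.
            return (user_id, product_id) in (purchases or set())
        return True  # neutral e-commerce comments: not specified, assumed published
    raise ValueError(f"unknown site type: {site_type}")
```

The `classification` argument would come from the trained second neural network.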
In some possible implementations, most of the second text data obtained from the second third-party platform is neutral, whereas most of the first text data obtained from the first third-party platform is positively or negatively evaluated. Therefore, to increase the number of positive and negative training samples in the second training sample set, the second training sample set may be merged with the first training sample set to obtain a new second training sample set with sufficient training samples, and the second neural network is trained using this new second training sample set, making the trained second neural network more accurate.
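A minimal sketch of the merge, assuming both training sample sets are lists of `(text, label)` pairs. It concatenates them and reports the label distribution so the added positive and negative samples are visible. The helper names are hypothetical.

```python
from collections import Counter

def merge_training_sets(second_set, first_set):
    """New second training sample set: mostly-neutral samples plus emoji-labeled ones."""
    return list(second_set) + list(first_set)

def label_counts(samples):
    return Counter(label for _, label in samples)

second = [("chip news", "neutral"), ("wiki entry", "neutral"), ("nice launch", "positive")]
first = [("love it", "positive"), ("awful", "negative")]
print(label_counts(merge_training_sets(second, first)))
```

Before the merge the second set contains no negative samples; afterwards each class is represented.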
In some possible implementations, after the first sentiment evaluation of each piece of first text data is determined from its emoji, the method further includes: extracting the text content of each piece of first text data; converting the text content into a second emoji; determining a second sentiment evaluation of each piece of first text data from the second emoji; and determining whether the first and second sentiment evaluations of each piece agree, and if they agree, labeling that piece of first text data according to its first sentiment evaluation. This text-to-emoji step verifies the sentiment evaluation of each piece of first text data and thereby improves the accuracy of the subsequent labeling.
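The verification step can be sketched as follows. The patent does not name a text-to-emoji model, so a keyword lookup stands in for it here; the keyword table and emoji-to-evaluation table are hypothetical.

```python
# Hypothetical keyword -> emoji table standing in for a text-to-emoji model.
KEYWORD_EMOJI = {"love": "\U0001F60D", "great": "\U0001F44D",
                 "terrible": "\U0001F620", "broken": "\U0001F622"}
# Hypothetical emoji -> sentiment evaluation table.
EMOJI_EVAL = {"\U0001F60D": "positive", "\U0001F44D": "positive",
              "\U0001F620": "negative", "\U0001F622": "negative"}

def text_to_emoji(text):
    """Return the 'second emoji' predicted from the text content, or None."""
    for word in text.lower().split():
        if word in KEYWORD_EMOJI:
            return KEYWORD_EMOJI[word]
    return None

def verified_label(text_content, first_eval):
    """Label with the first evaluation only if the text-derived evaluation agrees."""
    second_eval = EMOJI_EVAL.get(text_to_emoji(text_content))
    return first_eval if second_eval == first_eval else None
```

A piece such as `("screen arrived broken", "positive")` fails the check and is left unlabeled.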
In some possible implementations, the method further includes: obtaining comment data of any user, the comment data being the user's comments on a target product, where the target product includes wealth management products; classifying the user's comment data using the second neural network to obtain a classification result for that comment data; screening target users according to the classification results, that is, taking users whose comment data is classified as a positive evaluation as target users; and recommending the target product to the target users.
It can be seen that, in this embodiment, the second neural network is used to screen out users interested in the target product (a wealth management product), ensuring accurate user screening and improving the recommendation success rate.
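The screening step reduces to a filter over users' comments. In this sketch the trained second neural network is abstracted as a `classify` callable (text to label), and `toy_classify` is a hypothetical stand-in used only to show the call.

```python
def select_target_users(comments_by_user, classify):
    """comments_by_user: user_id -> comment on the target product.
    classify: stands in for the trained second neural network (text -> label)."""
    return [user for user, comment in comments_by_user.items()
            if classify(comment) == "positive"]

# Toy stand-in classifier, for illustration only.
def toy_classify(text):
    return "positive" if "interested" in text.lower() else "neutral"

comments = {"u1": "Very interested in this fund", "u2": "Just browsing"}
print(select_target_users(comments, toy_classify))  # ['u1']
```

The returned users would then receive the product recommendation.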
Refer to FIG. 2, which is a schematic flowchart of another text labeling method provided by an embodiment of this application. Content in this embodiment that is the same as in the embodiment shown in FIG. 1 is not repeated here. The method is applied to an electronic device and includes the following steps.
201: The electronic device obtains the first text data set from the first platform.
202: The electronic device cleans each piece of first text data in the first text data set, deleting first text data that contains no emoji, to obtain a new first text data set, and uses the new first text data set as the first text data set.
203: The electronic device determines the first sentiment evaluation of each piece of first text data according to its emoji; the first sentiment evaluation is a positive evaluation or a negative evaluation.
204: The electronic device extracts the text content of each piece of first text data and performs semantic analysis on it to obtain the semantic information of each piece of first text data.
205: The electronic device determines the second sentiment evaluation of each piece of first text data according to its semantic information.
206: The electronic device retains the first text data in the first text data set whose first and second sentiment evaluations agree, and deletes the first text data whose first and second sentiment evaluations disagree.
207: The electronic device labels the remaining first text data according to its first sentiment evaluation to obtain the first training sample set.
The remaining first text data is the first text data left in the first text data set after the pieces whose first and second sentiment evaluations disagree have been deleted.
208: The electronic device trains the first neural network using the first training sample set.
209: The electronic device obtains the second text data set from the second platform.
210: The electronic device labels the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set; the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
It can be seen that, in this embodiment, the labels for the comment data come from the emoji it contains rather than from semantic analysis of the text, so labeling is not restricted by the language of the comment data, which widens the application scenarios of the labeling method. In addition, comment data can be labeled automatically via emoji, so a training sample set carrying sentiment classification labels is obtained without manual labeling, saving human and material resources. Moreover, before the first text data set is labeled, it is cleaned to retain high-quality first text data, which improves labeling accuracy.
Refer to FIG. 3, which is a schematic flowchart of another text labeling method provided by an embodiment of this application. Content in this embodiment that is the same as in the embodiments shown in FIG. 1 and FIG. 2 is not repeated here. The method is applied to an electronic device and includes the following steps.
301: The electronic device obtains the first text data set from the first platform.
302: The electronic device cleans each piece of first text data in the first text data set, deleting first text data that contains no emoji, to obtain a new first text data set, and uses the new first text data set as the first text data set.
303: The electronic device determines the first sentiment evaluation of each piece of first text data according to its emoji; the first sentiment evaluation is a positive evaluation or a negative evaluation.
304: The electronic device extracts the text content of each piece of first text data and performs semantic analysis on it to obtain the semantic information of each piece of first text data.
305: The electronic device determines the second sentiment evaluation of each piece of first text data according to its semantic information.
306: The electronic device retains the first text data in the first text data set whose first and second sentiment evaluations agree, and deletes the first text data whose first and second sentiment evaluations disagree.
307: The electronic device labels the remaining first text data according to its first sentiment evaluation to obtain the first training sample set.
The remaining first text data is the first text data left in the first text data set after the pieces whose first and second sentiment evaluations disagree have been deleted.
308: The electronic device trains the first neural network using the first training sample set.
309: The electronic device obtains the second text data set from the second platform.
310: The electronic device labels the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set; the second labeling result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
311: The electronic device obtains the second training sample set from the second labeling result of each piece of second text data and trains the second neural network using the second training sample set.
312: The electronic device obtains any piece of comment data, classifies it using the second neural network to obtain a classification result, and determines, according to the classification result, whether to publish the comment data.
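The ordering of steps 301 to 312 can be sketched as an orchestration function in which every stage is an injected callable, so any cleaning rule, evaluator, or training routine can be plugged in. This is a structural illustration only: `train` is assumed to return a text-to-label callable (for the first network, the threshold rule from step 310 would live inside it), and the toy stand-ins at the bottom are hypothetical.

```python
def fig3_pipeline(first_raw, second_raw, pending_comment, *,
                  clean, emoji_eval, semantic_eval, train, publish_ok):
    """Orchestrates steps 301-312; each stage is supplied by the caller."""
    # 301-302: clean the first text data set (e.g. drop pieces without emoji).
    first = clean(first_raw)
    # 303-306: keep pieces whose emoji-based and semantic evaluations agree.
    first_samples = []
    for text in first:
        label = emoji_eval(text)
        if label is not None and semantic_eval(text) == label:
            first_samples.append((text, label))
    # 307-308: train the first network on the first training sample set.
    first_net = train(first_samples)
    # 309-310: label the second text data set (positive / negative / neutral).
    second_samples = [(t, first_net(t)) for t in second_raw]
    second_samples = [(t, y) for t, y in second_samples if y is not None]
    # 311: train the second network on the second training sample set.
    second_net = train(second_samples)
    # 312: classify the pending comment and decide whether to publish it.
    return first_samples, second_samples, publish_ok(second_net(pending_comment))

# Toy stand-ins (hypothetical) to show the call shape; ":)" and ":(" play the role of emoji.
def toy_clean(xs):
    return [x for x in xs if ":)" in x or ":(" in x]

def toy_emoji(t):
    return "positive" if ":)" in t else "negative"

def toy_semantic(t):
    return "positive" if "good" in t else ("negative" if "bad" in t else None)

def toy_train(samples):
    table = dict(samples)  # "network" that memorizes its training samples
    return lambda text: table.get(text, "neutral")

result = fig3_pipeline(["good :)", "bad :(", "good :(", "plain"], ["good :)", "news item"],
                       "good :)", clean=toy_clean, emoji_eval=toy_emoji,
                       semantic_eval=toy_semantic, train=toy_train,
                       publish_ok=lambda y: y != "negative")
```

In this toy run, "good :(" is dropped at step 306 because its emoji and text disagree.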
It can be seen that, in this embodiment, the labels for the text data come from the emoji it contains rather than from semantic analysis of the text, so labeling is not restricted by the language of the text data, which widens the application scenarios of the labeling method. In addition, text data can be labeled automatically via emoji, so a training sample set carrying sentiment classification labels is obtained without manual labeling, saving human and material resources. Moreover, before the first text data set is labeled, it is cleaned to retain high-quality first text data, which improves labeling accuracy. Furthermore, the trained second neural network classifies comment data to be published and automatically blocks comment data that does not meet the requirements, without manual review, saving human resources.
Refer to FIG. 4, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 4, the electronic device 400 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and include instructions for performing the following steps: obtaining a first text data set from a first third-party platform, each piece of first text data in the first text data set including an emoji; labeling each piece of first text data according to its emoji to obtain a first labeling result for each piece of first text data, the first labeling result being a positive evaluation or a negative evaluation; obtaining a first training sample set from the first labeling result of each piece of first text data; training a first neural network using the first training sample set; obtaining a second text data set from a second third-party platform; and labeling the second text data set using the first neural network to obtain a second labeling result for each piece of second text data in the second text data set, the second labeling result being one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible implementations, in labeling each piece of first text data according to its emoji, the processor is specifically configured to: determine the first sentiment evaluation of each piece of first text data in the first text data set according to its emoji, the first sentiment evaluation being a positive evaluation or a negative evaluation; and label each piece of first text data according to its first sentiment evaluation.
In some possible implementations, after the first sentiment evaluation of each piece of first text data is determined according to its emoji, the processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece to obtain its semantic information; determine the second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data in the first text data set whose first and second sentiment evaluations agree, deleting the first text data whose first and second sentiment evaluations disagree.
In some possible implementations, before the first text data set is labeled, the processor is further configured to: clean each piece of first text data in the first text data set, deleting first text data that contains no emoji, to obtain a new first text data set; and use the new first text data set as the first text data set.
In some possible implementations, in labeling the second text data set using the first neural network to obtain the second labeling result of each piece of second text data in the second text data set, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second labeling result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second labeling result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second labeling result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
In some possible implementations, the processor is further configured to: obtain a second training sample set according to the second labeling result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published using the second neural network to obtain a classification result for it; and determine, according to the classification result, whether to publish the comment data.
In some possible implementations, after obtaining the second training sample set according to the second labeling result of each piece of second text data in the second text data set, the processor is further configured to merge the second training sample set with the first training sample set to obtain a new second training sample set; and, in training the second neural network using the second training sample set, the processor is specifically configured to train the second neural network using the new second training sample set.
参阅图5,图5本申请实施例提供的一种电子设备的功能单元组成框图。电子设备500包括:获取单元510、标注单元520和训练单元530。Refer to FIG. 5, which is a block diagram of a functional unit composition of an electronic device provided by an embodiment of the present application. The electronic device 500 includes: an acquisition unit 510, a labeling unit 520, and a training unit 530.
获取单元510,用于从第一三方平台获取第一文本数据集,所述第一文本数据集中的每条第一文本数据包括emoji表情;标注单元520,根据所述第一文本数据集中的每条第一文本数据的emoji表情,对每条第一文本数据进行标注,得到每条第一文本数据的第一标注结果,所述第一标注结果包括正面评价或负面评价;训练单元530,用于根据每条第一文本数据的第一标注结果得到第一训练样本集,并使用所述第一训练样本集对第一神经网络进行训练;获取单元510,还用于从第二三方平台获取第二文本数据集;标注单元520,还用于使用所述第一神经网络对所述第二文本数据集进行标注,得到所述第二文本数据集中每条第二文本数据的第二标注结果,所述第二标注结果包括正面评价、负面评价或中性评价中的一种。The obtaining unit 510 is configured to obtain a first text data set from the first third party platform, each piece of first text data in the first text data set includes an emoji expression; the labeling unit 520, according to each of the first text data set An emoji expression of the first text data is annotated for each first text data, and the first annotation result of each first text data is obtained. The first annotation result includes a positive evaluation or a negative evaluation; the training unit 530 uses To obtain a first training sample set according to the first annotation result of each piece of first text data, and use the first training sample set to train the first neural network; the obtaining unit 510 is also used to obtain from the second third party platform The second text data set; the labeling unit 520 is further configured to use the first neural network to label the second text data set to obtain a second labeling result of each piece of second text data in the second text data set , The second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible implementations, when labeling each piece of first text data according to its emoji, the labeling unit 520 is specifically configured to: determine a first sentiment evaluation of each piece of first text data according to the emoji it contains, where the first sentiment evaluation is a positive evaluation or a negative evaluation; and label each piece of first text data according to its first sentiment evaluation.
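One simple way to derive the first sentiment evaluation from emoji is a polarity lexicon: add up a score per emoji and take the sign. The scores below are hypothetical, since the description names no lexicon, and the per-character scan only handles single-code-point emoji, not multi-code-point ZWJ sequences.

```python
from typing import Dict, Optional, Tuple

# Hypothetical emoji polarity scores; the source does not define a lexicon.
EMOJI_SCORES: Dict[str, int] = {
    "😀": 1, "😊": 1, "👍": 1, "🎉": 1,
    "😡": -1, "😢": -1, "👎": -1, "💔": -1,
}


def first_sentiment(text: str) -> Optional[str]:
    """Return 'positive' or 'negative' from the emoji in text, or None."""
    score = sum(EMOJI_SCORES.get(ch, 0) for ch in text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no scored emoji, or positive and negative cancel out


def annotate_first(text: str) -> Optional[Tuple[str, str]]:
    """Attach the first sentiment evaluation as the first annotation result."""
    label = first_sentiment(text)
    return (text, label) if label is not None else None


print(first_sentiment("great 😀😀 but 😢"))
print(annotate_first("bad 👎"))
```

Returning None for a zero score leaves ambiguous texts unlabeled instead of forcing them into either class.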
In some possible implementations, the electronic device 500 further includes a cleaning unit 540. After the first sentiment evaluation of each piece of first text data has been determined from its emoji, the cleaning unit 540 is configured to: extract the text content of each piece of first text data; perform semantic analysis on that text content to obtain semantic information for each piece of first text data; determine a second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data whose first and second sentiment evaluations agree, deleting the first text data whose first and second sentiment evaluations disagree.
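The consistency check performed by the cleaning unit 540 can be sketched as follows. The word lists stand in for the unspecified semantic analysis and are purely illustrative; a text for which this analysis yields no polarity is treated as inconsistent and deleted, which is an assumption rather than something the description states.

```python
from typing import Callable, List, Optional, Tuple

# (text, first sentiment evaluation derived from emoji)
Sample = Tuple[str, str]

# Hypothetical lexicons standing in for the unspecified semantic analysis.
POS_WORDS = {"good", "great", "love", "excellent"}
NEG_WORDS = {"bad", "awful", "hate", "broken"}
# Hypothetical emoji set removed when extracting the text content.
EMOJI = {"😀", "😊", "👍", "😡", "😢", "👎"}


def text_content(text: str) -> str:
    """Extract the text content of a piece of data by removing emoji."""
    return "".join(ch for ch in text if ch not in EMOJI)


def second_sentiment(content: str) -> Optional[str]:
    """Word-count polarity of the text alone; None if it has no polarity."""
    words = content.lower().split()
    score = sum(w in POS_WORDS for w in words) - sum(w in NEG_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def keep_consistent(
    samples: List[Sample],
    analyze: Callable[[str], Optional[str]] = second_sentiment,
) -> List[Sample]:
    """Keep samples whose first and second sentiment evaluations agree."""
    return [(t, y) for t, y in samples if analyze(text_content(t)) == y]


samples = [
    ("love it 😀", "positive"),
    ("great service 😡", "negative"),  # emoji contradicts the words
    ("arrived today 👍", "positive"),  # words carry no polarity
    ("awful support 😢", "negative"),
]
print(keep_consistent(samples))
```

The `analyze` parameter lets a real semantic model replace the word-list stand-in without changing the filtering logic.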
In some possible implementations, the electronic device 500 further includes a cleaning unit 540. Before the first text data set is labeled, the cleaning unit 540 is configured to: clean each piece of first text data in the first text data set by deleting the first text data that contains no emoji, obtaining a new first text data set; and use the new first text data set as the first text data set.
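Deleting first text data that contains no emoji requires some emoji detector. The sketch below assumes a regular expression over a handful of common Unicode emoji blocks; these ranges are an approximation chosen for illustration, not a complete definition of emoji, and the function name is hypothetical.

```python
import re
from typing import List

# Approximate emoji detector over common Unicode blocks (an assumption;
# the source does not say how emoji are detected).
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
    "\u2600-\u27BF"          # Miscellaneous Symbols and Dingbats
    "]"
)


def drop_without_emoji(texts: List[str]) -> List[str]:
    """Return the new first text data set: only texts containing an emoji."""
    return [t for t in texts if EMOJI_RE.search(t)]


print(drop_without_emoji(["nice 😀", "no emoji", "ok ☀", "🚀 fast"]))
```

Filtering before labeling keeps the emoji-based labeler from seeing texts it cannot label at all.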
In some possible implementations, when labeling the second text data set using the first neural network to obtain the second annotation result of each piece of second text data, the labeling unit 520 is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
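The three threshold rules can be written as a small Python function. The defaults t1 = 0.8 and t2 = 0.2 are hypothetical, since the description gives no numbers. Inputs that satisfy none of the rules, such as probabilities exactly equal to a threshold, return None instead of being forced into a class.

```python
from typing import Optional


def second_label(p_pos: float, p_neg: float,
                 t1: float = 0.8, t2: float = 0.2) -> Optional[str]:
    """Map the first network's class probabilities to a three-way label.

    t1 is the first threshold and t2 the second; both values are assumptions.
    Returns None for cases the description does not cover.
    """
    if p_pos > t1:
        return "positive"
    if p_neg > t1:
        return "negative"
    if t2 < p_pos < t1:
        return "neutral"
    return None


print(second_label(0.9, 0.1), second_label(0.1, 0.9), second_label(0.5, 0.5))
```

When the first network outputs a two-class softmax (p_neg = 1 - p_pos) and t2 = 1 - t1, the three rules cover every value of p_pos except the two boundary points p_pos = t1 and p_pos = t2.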
In some possible implementations, the electronic device further includes a determining unit 550. The training unit 530 is further configured to obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set, and to train a second neural network using the second training sample set. The determining unit 550 is configured to: obtain any piece of comment data awaiting publication; perform sentiment classification on that comment data using the second neural network to obtain a classification result; and determine, according to the classification result, whether to publish the comment data.
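The description does not say which classification results allow a comment to be published. The sketch below assumes a hypothetical policy that publishes positive and neutral comments and holds negative ones. Here `classify` is any callable wrapping the second neural network, and the keyword-based `fake_classify` exists only to make the example runnable.

```python
from typing import Callable, Dict, List

# Hypothetical moderation policy; the source only says the classification
# result decides whether a comment is published.
PUBLISHABLE = {"positive", "neutral"}


def should_publish(comment: str, classify: Callable[[str], str]) -> bool:
    """Classify a pending comment and decide whether to publish it."""
    return classify(comment) in PUBLISHABLE


def moderate(comments: List[str], classify: Callable[[str], str]) -> Dict[str, bool]:
    """Publish decision for each comment awaiting publication."""
    return {c: should_publish(c, classify) for c in comments}


def fake_classify(text: str) -> str:
    # Illustrative stand-in for the second neural network.
    return "negative" if "scam" in text else "neutral"


print(moderate(["fast delivery", "this is a scam"], fake_classify))
```

Keeping the policy in a separate set makes it easy to change, for example to hold only negative comments with high confidence.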
In some possible implementations, after the second training sample set has been obtained from the second annotation result of each piece of second text data in the second text data set, the training unit 530 is further configured to merge the second training sample set with the first training sample set to obtain a new second training sample set. When training the second neural network with the second training sample set, the training unit 530 is specifically configured to train the second neural network using the new second training sample set.
An embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: acquiring a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji; labeling each piece of first text data according to its emoji to obtain a first annotation result for that piece, where the first annotation result is a positive evaluation or a negative evaluation; obtaining a first training sample set from the first annotation results; training a first neural network using the first training sample set; acquiring a second text data set from a second third-party platform; and labeling the second text data set using the first neural network to obtain a second annotation result for each piece of second text data in the second text data set, where the second annotation result is one of a positive evaluation, a negative evaluation, or a neutral evaluation.
In some possible implementations, when labeling each piece of first text data according to its emoji, the processor is specifically configured to: determine a first sentiment evaluation of each piece of first text data according to the emoji it contains, where the first sentiment evaluation is a positive evaluation or a negative evaluation; and label each piece of first text data according to its first sentiment evaluation.
In some possible implementations, after the first sentiment evaluation of each piece of first text data has been determined from its emoji, the processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on that text content to obtain semantic information for each piece of first text data; determine a second sentiment evaluation of each piece of first text data according to its semantic information; and retain the first text data whose first and second sentiment evaluations agree, deleting the first text data whose first and second sentiment evaluations disagree.
In some possible implementations, before the first text data set is labeled, the processor is further configured to: clean each piece of first text data in the first text data set by deleting the first text data that contains no emoji, obtaining a new first text data set; and use the new first text data set as the first text data set.
In some possible implementations, when labeling the second text data set using the first neural network to obtain the second annotation result of each piece of second text data, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain a first probability that the piece is a positive evaluation and a second probability that it is a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
In some possible implementations, the processor is further configured to: obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data awaiting publication; perform sentiment classification on that comment data using the second neural network to obtain a classification result; and determine, according to the classification result, whether to publish the comment data.
In some possible implementations, after the second training sample set has been obtained from the second annotation result of each piece of second text data in the second text data set, the processor is further configured to merge the second training sample set with the first training sample set to obtain a new second training sample set. When training the second neural network with the second training sample set, the processor is specifically configured to train the second neural network using the new second training sample set.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (20)

  1. A text annotation method, applied to an electronic device, comprising:
    acquiring, by the electronic device, a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
    labeling, by the electronic device, each piece of first text data according to the emoji in each piece of first text data in the first text data set, to obtain a first annotation result of each piece of first text data, wherein the first annotation result comprises a positive evaluation or a negative evaluation;
    obtaining, by the electronic device, a first training sample set according to the first annotation result of each piece of first text data;
    training, by the electronic device, a first neural network using the first training sample set;
    acquiring, by the electronic device, a second text data set from a second third-party platform; and
    labeling, by the electronic device, the second text data set using the first neural network, to obtain a second annotation result of each piece of second text data in the second text data set, wherein the second annotation result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  2. The method according to claim 1, wherein the labeling, by the electronic device, of each piece of first text data according to the emoji in each piece of first text data in the first text data set comprises:
    determining a first sentiment evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first sentiment evaluation comprises a positive evaluation or a negative evaluation; and
    labeling each piece of first text data according to the first sentiment evaluation of each piece of first text data.
  3. The method according to claim 2, wherein after the first sentiment evaluation of each piece of first text data is determined according to the emoji in each piece of first text data in the first text data set, the method further comprises:
    extracting text content of each piece of first text data;
    performing semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
    determining a second sentiment evaluation of each piece of first text data according to the semantic information of each piece of first text data; and
    retaining the first text data in the first text data set whose first sentiment evaluation is consistent with its second sentiment evaluation, and deleting the first text data whose first sentiment evaluation is inconsistent with its second sentiment evaluation.
  4. The method according to any one of claims 1 to 3, wherein before the electronic device labels the first text data set, the method further comprises:
    cleaning each piece of first text data in the first text data set by deleting the first text data that does not contain an emoji, to obtain a new first text data set; and
    using the new first text data set as the first text data set.
  5. The method according to claim 1, wherein the labeling of the second text data set using the first neural network, to obtain the second annotation result of each piece of second text data in the second text data set, comprises:
    classifying each piece of second text data in the second text data set using the first neural network, to obtain a first probability that each piece of second text data is a positive evaluation and a second probability that it is a negative evaluation;
    determining that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation;
    determining that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and
    determining that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  6. The method according to claim 1, wherein the method further comprises:
    obtaining, by the electronic device, a second training sample set according to the second annotation result of each piece of second text data in the second text data set;
    training, by the electronic device, a second neural network using the second training sample set;
    obtaining, by the electronic device, any piece of comment data to be published;
    performing, by the electronic device, sentiment classification on the comment data to be published using the second neural network, to obtain a classification result of the comment data to be published; and
    determining, by the electronic device according to the classification result, whether to publish the comment data to be published.
  7. The method according to claim 6, wherein after the electronic device obtains the second training sample set according to the second annotation result of each piece of second text data in the second text data set, the method further comprises:
    merging the second training sample set with the first training sample set to obtain a new second training sample set;
    and the training, by the electronic device, of the second neural network using the second training sample set comprises:
    training, by the electronic device, the second neural network using the new second training sample set.
  8. An electronic device, comprising:
    an acquisition unit, configured to acquire a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
    a labeling unit, configured to label each piece of first text data according to the emoji in each piece of first text data in the first text data set, to obtain a first annotation result of each piece of first text data, wherein the first annotation result comprises a positive evaluation or a negative evaluation; and
    a training unit, configured to obtain a first training sample set according to the first annotation result of each piece of first text data, and to train a first neural network using the first training sample set;
    wherein the acquisition unit is further configured to acquire a second text data set from a second third-party platform; and
    the labeling unit is further configured to label the second text data set using the first neural network, to obtain a second annotation result of each piece of second text data in the second text data set, wherein the second annotation result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  9. An electronic device, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs comprising instructions for performing the following steps:
    acquiring a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
    labeling each piece of first text data according to the emoji in each piece of first text data in the first text data set, to obtain a first annotation result of each piece of first text data, wherein the first annotation result comprises a positive evaluation or a negative evaluation;
    obtaining a first training sample set according to the first annotation result of each piece of first text data;
    training a first neural network using the first training sample set;
    acquiring a second text data set from a second third-party platform; and
    labeling the second text data set using the first neural network, to obtain a second annotation result of each piece of second text data in the second text data set, wherein the second annotation result comprises one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  10. The device according to claim 9, wherein, in terms of labeling each piece of first text data according to the emoji in each piece of first text data in the first text data set, the processor is specifically configured to:
    determine a first sentiment evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, wherein the first sentiment evaluation comprises a positive evaluation or a negative evaluation; and
    label each piece of first text data according to the first sentiment evaluation of each piece of first text data.
  11. The device according to claim 10, wherein after determining the first sentiment evaluation of each piece of first text data according to the emoji in each piece of first text data in the first text data set, the processor is further configured to:
    extract text content of each piece of first text data;
    perform semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
    determine a second sentiment evaluation of each piece of first text data according to the semantic information of each piece of first text data; and
    retain the first text data in the first text data set whose first sentiment evaluation is consistent with its second sentiment evaluation, and delete the first text data whose first sentiment evaluation is inconsistent with its second sentiment evaluation.
  12. The device according to any one of claims 9 to 11, wherein before the first text data set is labeled, the processor is further configured to: clean each piece of first text data in the first text data set by deleting the first text data that does not contain an emoji, to obtain a new first text data set; and use the new first text data set as the first text data set.
  13. The device according to claim 9, wherein, in terms of labeling the second text data set using the first neural network to obtain the second annotation result of each piece of second text data in the second text data set, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network, to obtain a first probability that each piece of second text data is a positive evaluation and a second probability that it is a negative evaluation;
    determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation;
    determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and
    determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  14. The device according to claim 9, wherein the processor is further configured to:
    obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set;
    train a second neural network using the second training sample set;
    obtain any piece of comment data to be published;
    perform sentiment classification on the comment data to be published using the second neural network, to obtain a classification result of the comment data to be published; and
    determine, according to the classification result, whether to publish the comment data to be published.
  15. The device according to claim 14, wherein, after obtaining the second training sample set according to the second annotation result of each piece of second text data in the second text data set, the processor is further configured to combine the second training samples with the first training sample set to obtain a new second training sample set;
    and, to train the second neural network by using the second training sample set, the processor is specifically configured to train the second neural network by using the new second training sample set.
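The combination step in claim 15 can be sketched with (text, label) pairs. This is an illustration under assumptions: the function name `merge_training_sets` is hypothetical, and dropping exact duplicate pairs is an added choice, because the claim only says that the two sets are combined.

```python
from typing import List, Set, Tuple

Sample = Tuple[str, str]  # (text, label)


def merge_training_sets(second: List[Sample], first: List[Sample]) -> List[Sample]:
    """Combine the second training samples with the first training sample set.

    Order is preserved and exact duplicate (text, label) pairs are dropped;
    the de-duplication is an assumption, not part of the claim.
    """
    seen: Set[Sample] = set()
    merged: List[Sample] = []
    for sample in second + first:
        if sample not in seen:
            seen.add(sample)
            merged.append(sample)
    return merged
```

Since the first set is emoji-labelled with only positive and negative results, the merged set still has neutral examples only from the second set.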
  16. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and the stored computer program is executed by a processor to implement the following steps:
    acquiring a first text data set from a first third-party platform, wherein each piece of first text data in the first text data set includes an emoji;
    annotating each piece of first text data according to the emoji in that piece of first text data in the first text data set, to obtain a first annotation result of each piece of first text data, the first annotation result including a positive evaluation or a negative evaluation;
    obtaining a first training sample set according to the first annotation result of each piece of first text data;
    training a first neural network by using the first training sample set;
    acquiring a second text data set from a second third-party platform; and
    annotating the second text data set by using the first neural network, to obtain a second annotation result of each piece of second text data in the second text data set, the second annotation result being one of a positive evaluation, a negative evaluation, or a neutral evaluation.
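The train-then-annotate steps of claim 16 can be sketched with a deliberately simple model. The claims do not specify the architecture of the first neural network, so this sketch substitutes a single sigmoid unit over bag-of-words counts (logistic regression trained by gradient descent). All function names are hypothetical. The model outputs the first and second probabilities that claims 13 and 20 then threshold.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

BIAS = "<bias>"  # hypothetical token used as a bias feature


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_first_network(samples: List[Tuple[str, str]],
                        epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Fit a one-unit network on emoji-labelled samples ("positive"/"negative")."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for text, label in samples:
            y = 1.0 if label == "positive" else 0.0
            feats = tokenize(text) + [BIAS]
            p = sigmoid(sum(w[f] for f in feats))
            grad = p - y  # gradient of the log-loss with respect to z
            for f in feats:
                w[f] -= lr * grad
    return dict(w)


def predict_probabilities(w: Dict[str, float], text: str) -> Tuple[float, float]:
    """Return (first probability: positive, second probability: negative)."""
    p_pos = sigmoid(sum(w.get(f, 0.0) for f in tokenize(text) + [BIAS]))
    return p_pos, 1.0 - p_pos
```

A real system would use a deeper network and proper text features. The point here is only the data flow: emoji-labelled first text data trains the model, and the model's probabilities annotate the second text data set.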
  17. The medium according to claim 16, wherein, to annotate each piece of first text data according to the emoji in that piece of first text data in the first text data set, the processor is specifically configured to:
    determine a first sentiment evaluation of each piece of first text data according to its emoji, the first sentiment evaluation including a positive evaluation or a negative evaluation; and
    annotate each piece of first text data according to its first sentiment evaluation.
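The emoji-to-sentiment step of claim 17 can be sketched as a lexicon vote. The lexicon `EMOJI_POLARITY` and both function names are hypothetical, because the claims do not list which emoji count as positive or negative. Texts whose emoji cancel out are skipped, because the first sentiment evaluation must be either positive or negative. Skipping them is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: +1 for positive emoji, -1 for negative emoji.
EMOJI_POLARITY: Dict[str, int] = {
    "😀": 1, "😍": 1, "👍": 1, "❤": 1,
    "😡": -1, "😢": -1, "👎": -1, "💔": -1,
}


def first_sentiment(text: str) -> Optional[str]:
    """Majority vote over known emoji code points; None when there is no majority."""
    score = sum(EMOJI_POLARITY.get(ch, 0) for ch in text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def annotate_by_emoji(texts: List[str]) -> List[Tuple[str, str]]:
    """Annotate each first text with its first sentiment evaluation."""
    labelled = []
    for text in texts:
        label = first_sentiment(text)
        if label is not None:
            labelled.append((text, label))
    return labelled
```

Iterating over code points means multi-code-point emoji sequences (skin-tone modifiers, ZWJ sequences) are matched only through their base character. This is a simplification.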
  18. The medium according to claim 17, wherein, after determining the first sentiment evaluation of each piece of first text data according to its emoji, the processor is further configured to:
    extract the text content of each piece of first text data;
    perform semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data;
    determine a second sentiment evaluation of each piece of first text data according to its semantic information; and
    retain the first text data in the first text data set whose first and second sentiment evaluations are consistent, and delete the first text data whose first and second sentiment evaluations are inconsistent.
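The consistency filter of claim 18 can be sketched as follows. The claim does not say how the semantic analysis works, so a hypothetical keyword counter (`keyword_sentiment`) stands in for it. The emoji pattern used to extract the text content is also an approximation that covers only the main pictograph blocks.

```python
import re
from typing import Callable, List, Tuple

# Approximate emoji pattern (main pictograph blocks plus variation selector 16).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

# Hypothetical keyword lists standing in for real semantic analysis.
POSITIVE_WORDS = {"love", "great", "excellent", "good"}
NEGATIVE_WORDS = {"awful", "broken", "bad", "hate"}


def extract_text_content(text: str) -> str:
    """Remove emoji so that only the text content is analysed."""
    return EMOJI_RE.sub("", text).strip()


def keyword_sentiment(content: str) -> str:
    """Toy second sentiment evaluation based on keyword counts."""
    words = content.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def filter_consistent(samples: List[Tuple[str, str]],
                      second_eval: Callable[[str], str] = keyword_sentiment
                      ) -> List[Tuple[str, str]]:
    """Keep samples whose emoji-based (first) label equals the text-based (second) label."""
    return [(text, first_eval) for text, first_eval in samples
            if second_eval(extract_text_content(text)) == first_eval]
```

A sarcastic text such as "Broken again 😍" gets a positive first evaluation from its emoji but a negative second evaluation from its words, so the filter removes it from the training data.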
  19. The medium according to any one of claims 16 to 18, wherein, before the first text data set is annotated, the processor is further configured to: clean each piece of first text data in the first text data set by deleting the first text data that does not contain an emoji, to obtain a new first text data set; and use the new first text data set as the first text data set.
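The cleaning step of claim 19 keeps only first text data that contains at least one emoji. A minimal sketch follows. The regular expression is an assumed approximation that covers the main emoji and pictograph blocks, not the full Unicode emoji set, and `clean_first_dataset` is a hypothetical name.

```python
import re
from typing import List

# Approximate emoji detector covering the main pictograph blocks (an assumption).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_first_dataset(texts: List[str]) -> List[str]:
    """Return the new first text data set: texts that contain at least one emoji."""
    return [t for t in texts if EMOJI_RE.search(t)]
```

Running this before emoji-based annotation guarantees that every remaining text has at least one candidate emoji for the labelling step.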
  20. The medium according to claim 16, wherein, to annotate the second text data set by using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, the processor is specifically configured to: classify each piece of second text data in the second text data set by using the first neural network, to obtain a first probability that the piece of second text data is a positive evaluation and a second probability that it is a negative evaluation;
    determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation;
    determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and
    determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than the second threshold is a neutral evaluation.
PCT/CN2020/099493 2020-05-28 2020-06-30 Text annotation method, device, and storage medium WO2021114634A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010465811.4 2020-05-28
CN202010465811.4A CN111695357A (en) 2020-05-28 2020-05-28 Text labeling method and related product

Publications (1)

Publication Number Publication Date
WO2021114634A1 true WO2021114634A1 (en) 2021-06-17

Family

ID=72478683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099493 WO2021114634A1 (en) 2020-05-28 2020-06-30 Text annotation method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111695357A (en)
WO (1) WO2021114634A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172248A (en) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium
CN117689998A (en) * 2024-01-31 2024-03-12 数据空间研究院 Nonparametric adaptive emotion recognition model, method, system and storage medium
CN117725909A (en) * 2024-02-18 2024-03-19 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170364797A1 (en) * 2016-06-16 2017-12-21 Sysomos L.P. Computing Systems and Methods for Determining Sentiment Using Emojis in Electronic Data
CN109034203A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Method, device, equipment and medium for training an expression recommendation model and recommending expressions
CN109684478A (en) * 2018-12-18 2019-04-26 腾讯科技(深圳)有限公司 Classification model training method, classification method and device, equipment and medium
CN110704581A (en) * 2019-09-11 2020-01-17 阿里巴巴集团控股有限公司 Computer-executed text emotion analysis method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172248A (en) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium
CN117172248B (en) * 2023-11-03 2024-01-30 翼方健数(北京)信息科技有限公司 Text data labeling method, system and medium
CN117689998A (en) * 2024-01-31 2024-03-12 数据空间研究院 Nonparametric adaptive emotion recognition model, method, system and storage medium
CN117689998B (en) * 2024-01-31 2024-05-03 数据空间研究院 Nonparametric adaptive emotion recognition model, method, system and storage medium
CN117725909A (en) * 2024-02-18 2024-03-19 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium
CN117725909B (en) * 2024-02-18 2024-05-14 四川日报网络传媒发展有限公司 Multi-dimensional comment auditing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111695357A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN107346336B (en) Information processing method and device based on artificial intelligence
CN104735468B (en) Method and system for synthesizing images into a new video based on semantic analysis
WO2021114634A1 (en) Text annotation method, device, and storage medium
US20220254348A1 (en) Automatically generating a meeting summary for an information handling system
WO2022116418A1 (en) Method and apparatus for automatically determining trademark infringement, electronic device, and storage medium
CN112749326B (en) Information processing method, information processing device, computer equipment and storage medium
US20200134398A1 (en) Determining intent from multimodal content embedded in a common geometric space
JP7334395B2 (en) Video classification methods, devices, equipment and storage media
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
CN110046293B (en) User identity correlation method and device
US11019012B2 (en) File sending in instant messaging application
CN111931859B (en) Multi-label image recognition method and device
US11436446B2 (en) Image analysis enhanced related item decision
CN108959323B (en) Video classification method and device
WO2018205845A1 (en) Data processing method, server, and computer storage medium
WO2017206376A1 (en) Searching method, searching device and non-volatile computer storage medium
CN111177462B (en) Video distribution timeliness determination method and device
CN110516203B (en) Dispute focus analysis method, device, electronic equipment and computer-readable medium
CN113596130A (en) Artificial intelligence module training method, system and server based on interest portrait
US20210256221A1 (en) System and method for automatic summarization of content with event based analysis
CN115661302A (en) Video editing method, device, equipment and storage medium
CN113392205A (en) User portrait construction method, device and equipment and storage medium
TWI575391B (en) Social data filtering system, method and non-transitory computer readable storage medium of the same
WO2018120575A1 (en) Method and device for identifying main picture in web page
WO2021081914A1 (en) Pushing object determination method and apparatus, terminal device and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 20900623; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 20900623; country of ref document: EP; kind code of ref document: A1)