WO2021114634A1 - Text annotation method, device, and storage medium - Google Patents

Text annotation method, device, and storage medium

Info

Publication number
WO2021114634A1
Authority
WO
WIPO (PCT)
Prior art keywords
text data
piece
evaluation
data set
text
Prior art date
Application number
PCT/CN2020/099493
Other languages
English (en)
Chinese (zh)
Inventor
李文斌
喻宁
冯晶凌
柳阳
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021114634A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the technical field of emotion recognition in artificial intelligence, and in particular to a text labeling method, device, and storage medium.
  • Neural networks can be used to recognize people in surveillance videos; in the medical field, they can be used to recognize tumors in MRI images; and in the field of text recognition, they can be used for emotion classification of text.
  • Neural networks perform well at image recognition.
  • However, training a neural network requires a sufficiently large and sufficiently high-quality training data set.
  • Producing such a training data set is very costly.
  • In particular, manual labeling requires a lot of time and labor, and its efficiency is low.
  • To address this, the embodiments of the present application provide a text labeling method, device, and storage medium that broaden the application scenarios of text annotation and improve its efficiency.
  • The first aspect of the embodiments of the present application provides a text labeling method applied to an electronic device, including: the electronic device obtains a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; the electronic device labels each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; the electronic device obtains a first training sample set according to the first annotation result of each piece of first text data; the electronic device uses the first training sample set to train a first neural network; the electronic device obtains a second text data set from a second third-party platform; and the electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • A second aspect of the embodiments of the present application provides an electronic device, including: an acquiring unit configured to acquire a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; a labeling unit configured to label each piece of first text data according to its emoji expression to obtain a first labeling result of each piece of first text data, where the first labeling result includes a positive evaluation or a negative evaluation; and a training unit configured to obtain a first training sample set according to the first labeling result of each piece of first text data, and to use the first training sample set to train a first neural network. The acquiring unit is further configured to acquire a second text data set from a second third-party platform; the labeling unit is further configured to use the first neural network to label the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • The third aspect of the embodiments of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and executed by the processor to perform the following steps: obtain a first text data set from a first third-party platform, where each piece of first text data includes an emoji expression; label each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtain a first training sample set according to the first annotation result of each piece of first text data; use the first training sample set to train a first neural network; obtain a second text data set from a second third-party platform; and use the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the following steps: obtain a first text data set from a first third-party platform, where each piece of first text data includes an emoji expression; label each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtain a first training sample set according to the first annotation result of each piece of first text data; use the first training sample set to train a first neural network; obtain a second text data set from a second third-party platform; and use the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • It can be seen that the comment data is annotated by the emoji expressions in the text data, with no need to perform semantic analysis on the comment data, so the annotation is not restricted by the language of the text data. This increases the application scenarios of text annotation.
  • In addition, the text data can be annotated automatically through emoji expressions without manual annotation, which saves human and material resources.
  • FIG. 1 is a schematic flowchart of a labeling method provided by an embodiment of the application.
  • Fig. 2 is a schematic flowchart of another labeling method provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of another labeling method provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 5 is a block diagram of the functional unit composition of an electronic device provided by an embodiment of the application.
  • The electronic devices in this application can include smartphones (such as Android phones, iOS phones, Windows Phone phones, etc.), tablets, handheld computers, laptops, mobile Internet devices (MID), wearable devices, and the like.
  • Figure 1 is a schematic flow diagram of a text labeling method provided by an embodiment of the application, the method is applied to an electronic device, the method includes the following steps.
  • the electronic device obtains the first text data set from the first third party platform.
  • The first third-party platform can be a social application such as Weibo, Twitter, or Facebook, or an e-commerce platform such as Amazon, Taobao, or Jingdong. That is, the first third-party platform is a third-party platform that contains a large amount of text data of positive reviews and text data of negative reviews.
  • The electronic device obtains multiple pieces of first text data from the first third-party platform through the application programming interface (API) provided by that platform to form the first text data set. That is, the electronic device complies with the Robots protocol of the first third-party platform and obtains the first text data set through the API of the first third-party platform.
  • Since the first text data is obtained through the API of the first third-party platform without manual review, some of it may not meet the requirements; for example, it may contain no emoji or the text content may be too short. Therefore, after multiple pieces of first text data are obtained, the first text data is first cleaned to remove pieces that contain no emoji expression or whose text content is too short, and the cleaned first text data constitutes the first text data set.
  • each piece of first text data in the first text data set contains emoji expressions.
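The cleaning step described above can be sketched as follows; the emoji detection pattern and the minimum-length threshold are illustrative assumptions, since the patent does not fix concrete values:

```python
# Sketch of the cleaning step: drop first text data that contains no emoji
# or whose remaining text content is too short. The Unicode ranges and the
# length threshold below are assumptions for illustration only.
import re

EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # rough emoji/symbol ranges
)
MIN_LENGTH = 5  # assumed minimum number of characters of text content


def clean_text_data_set(texts):
    """Keep only texts that contain at least one emoji and enough content."""
    cleaned = []
    for text in texts:
        stripped = EMOJI_PATTERN.sub("", text).strip()
        if EMOJI_PATTERN.search(text) and len(stripped) >= MIN_LENGTH:
            cleaned.append(text)
    return cleaned
```

The surviving pieces constitute the first text data set used in the subsequent labeling steps.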
  • The electronic device labels each piece of first text data according to the emoji expression of each piece of first text data in the first text data set to obtain a first labeling result of each piece of first text data.
  • The first labeling result includes a positive evaluation or a negative evaluation.
  • each piece of first text data in the first text data set includes an emoji expression.
  • emoji expressions themselves carry emotional evaluation.
  • Some emoji expressions convey a positive evaluation, while others convey a negative evaluation. Therefore, the first emotion evaluation of each piece of first text data can be determined according to its emoji expression; then each piece of first text data is labeled according to its first emotion evaluation, that is, an emotional tag is added to each piece of first text data.
  • When the emoji expression belongs to the emoji expression set of positive evaluations, the first text data is marked as a positive evaluation; when the emoji expression belongs to the emoji expression set of negative evaluations, the first text data is marked as a negative evaluation.
  • The first annotation result includes a positive evaluation or a negative evaluation.
  • The emotions corresponding to a positive evaluation include happiness, approval, appreciation, and so on.
  • The emotions corresponding to a negative evaluation include anger, pessimism, disagreement, and so on.
  • Some emoji expressions are ambiguous: the same emoji can be used to express happiness, a positive feeling, or sarcasm, a negative feeling.
  • Therefore, first text data in the first text data set containing such ambiguous emoji expressions is not labeled; only first text data containing emoji expressions corresponding to positive reviews or emoji expressions corresponding to negative reviews is labeled.
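A minimal sketch of this labeling rule is shown below; the concrete positive and negative emoji sets are illustrative assumptions (the patent does not enumerate them), and text with only ambiguous or missing emoji is left unlabeled:

```python
# Sketch of emoji-based labeling. The emoji sets below are assumptions for
# illustration; the patent only requires that such sets exist.
POSITIVE_EMOJI = {"\U0001F600", "\U0001F44D"}  # grinning face, thumbs up
NEGATIVE_EMOJI = {"\U0001F620", "\U0001F44E"}  # angry face, thumbs down


def label_by_emoji(text):
    """Return 'positive', 'negative', or None when no unambiguous emoji is found."""
    if any(e in text for e in POSITIVE_EMOJI):
        return "positive"
    if any(e in text for e in NEGATIVE_EMOJI):
        return "negative"
    return None  # ambiguous or missing emoji: leave unlabeled
```

Texts that return a label become training samples; texts that return `None` are skipped, matching the rule above.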
  • In some embodiments, the text content of each piece of first text data may be extracted and semantically analyzed to obtain its semantic information; a second sentiment evaluation of each piece of first text data is then determined from that semantic information; first text data whose first sentiment evaluation is consistent with its second sentiment evaluation is retained, and first text data whose first sentiment evaluation and second sentiment evaluation are inconsistent is deleted. This double labeling through semantic analysis and emoji expressions reduces the labeling error of unilateral emoji labeling and improves the accuracy of labeling the first text data set.
  • the electronic device obtains the first training sample set according to the first annotation result of each piece of first text data.
  • The labeled first text data serves as labeled training samples, forming the first training sample set.
  • the electronic device uses the first training sample set to train the first neural network.
  • Specifically, the initial parameters of the first neural network are constructed first, and the training samples in the first training sample set are input into the first neural network to obtain prediction results; then a loss gradient is determined from the prediction results and the labeling results of the training samples, and a loss function is constructed from the loss gradient; finally, the parameter values are updated in reverse based on the loss function and gradient descent, until the first neural network converges, completing the training of the first neural network.
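As one hedged illustration of this training loop, the sketch below trains a minimal one-layer network (logistic regression) with gradient descent on the cross-entropy loss. The patent does not specify the first neural network's architecture, learning rate, or stopping criterion, so all of those are assumptions here:

```python
# Sketch of the described training procedure: forward pass, loss gradient,
# and reverse (gradient-descent) parameter update. A one-layer network
# stands in for the unspecified first neural network.
import numpy as np


def train_first_network(X, y, lr=0.5, epochs=500):
    """Train weights on labeled samples (y: 1 = positive, 0 = negative)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # initial parameters
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward pass: predictions
        grad = p - y                             # cross-entropy loss gradient
        w -= lr * (X.T @ grad) / len(y)          # reverse update of weights
        b -= lr * grad.mean()                    # reverse update of bias
    return w, b
```

A fixed epoch count replaces the convergence check for brevity; in practice the loop would stop when the loss plateaus.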
  • the electronic device obtains the second text data set from the second third party platform.
  • The second third-party platform may be a news platform that publishes science and technology news, a wiki, or summary text. That is, the second third-party platform is a third-party platform that contains a large amount of neutrally evaluated text data.
  • the electronic device complies with the Robot protocol of the second third party platform, and obtains multiple pieces of second text data from the second third party platform through the API of the second third party platform to obtain the second text data set.
  • The multiple pieces of second text data can also be cleaned to remove invalid second text data whose text content is too short.
  • the electronic device uses the first neural network to annotate the second text data set to obtain an annotation result of each piece of second text data in the second text data set.
  • the second labeling result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • The electronic device uses the first neural network to classify each piece of second text data in the second text data set to obtain a first probability that it is a positive evaluation and a second probability that it is a negative evaluation. Second text data whose first probability is greater than the first threshold (that is, the network is sufficiently confident that the emotional evaluation is positive) is marked as a positive evaluation; second text data whose second probability is greater than the first threshold (that is, the network is sufficiently confident that the emotional evaluation is negative) is marked as a negative evaluation; and second text data whose first probability is less than the first threshold but greater than the second threshold (that is, the network cannot confidently decide between a positive and a negative evaluation) is marked as a neutral evaluation.
  • the first threshold may be 0.7, 0.75, 0.8 or other values.
  • the second threshold may be 0.4, 0.45, 0.5 or other values.
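The three-way second-annotation rule can be sketched as follows, using 0.8 and 0.4 (values listed above) as the first and second thresholds; the fallback for probabilities outside the described ranges is an assumption:

```python
# Sketch of the three-way second-annotation rule. Threshold values are taken
# from the examples in the text; other listed values would work the same way.
FIRST_THRESHOLD = 0.8
SECOND_THRESHOLD = 0.4


def second_annotation(p_positive, p_negative):
    """Map class probabilities from the first neural network to a label."""
    if p_positive > FIRST_THRESHOLD:
        return "positive"
    if p_negative > FIRST_THRESHOLD:
        return "negative"
    if SECOND_THRESHOLD < p_positive < FIRST_THRESHOLD:
        return "neutral"
    return None  # outside the described ranges: leave unlabeled (assumption)
```

These three-way labels are what turn the second text data set into the second training sample set described below.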
  • In this way, the text data is annotated by the emoji expressions it contains, with no need for semantic analysis, so the annotation is not restricted by the language of the text data, thereby increasing the application scenarios of text annotation.
  • Moreover, the text data can be annotated automatically through emoji expressions without manual annotation, thereby saving human and material resources.
  • In some embodiments, the method further includes: the electronic device obtains a second training sample set according to the second labeling result of each piece of second text data in the second text data set, that is, the labeled second text data set forms the second training sample set; the second training sample set is then used to train a second neural network; any piece of comment data to be published is obtained; the second neural network is used to classify the comment data to be published to obtain a classification result of the comment data to be published; and according to the classification result, it is determined whether to publish the comment data to be published.
  • For example, the comment data to be published can be comment data under any news website. If the classification result is a positive evaluation or a neutral evaluation, the comment data to be published is disclosed; if the classification result is a negative evaluation, the comment data to be published is not disclosed.
  • the review data to be published can be automatically reviewed through the second neural network, thereby saving human resources.
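A sketch of this automatic review rule for news-site comments, assuming classification results are represented as simple strings:

```python
# Sketch of the publish/withhold decision: positive or neutral comments are
# disclosed, negative ones are withheld. The string labels are assumptions.
def should_publish(classification):
    """Decide whether a comment awaiting publication is disclosed."""
    return classification in ("positive", "neutral")
```

In the described pipeline the `classification` argument would come from the second neural network's output for the comment.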
  • Alternatively, the review data to be published can be review data under any e-commerce platform. If the classification result is a positive review or a negative review, the review data to be published is checked against the user's purchase records to determine its authenticity; when the review data to be published is determined to be a malicious review, it is not disclosed.
  • the review data to be published can be automatically reviewed through the second neural network to determine the authenticity of the review data to be published, thereby saving human resources.
  • In some embodiments, the second training sample set can be combined with the first training sample set to obtain a new second training sample set with a sufficient number of training samples, and the new second training sample set is used to train the second neural network, making the trained second neural network more accurate.
  • In some embodiments, the method further includes: extracting the text content of each piece of first text data; converting the text content into a second emoji expression; determining a second emotional evaluation of each piece of first text data according to the second emoji expression; and determining whether the first emotional evaluation and the second emotional evaluation of each piece of first text data are consistent. If they are consistent, each piece of first text data is labeled according to its first emotional evaluation.
  • the sentiment evaluation corresponding to each piece of first text data is verified, thereby improving the accuracy of subsequent labeling of the first text data.
  • the method further includes: obtaining comment data of any user, the comment data being the user’s comment data on a target product, and the target product includes wealth management products;
  • the user’s comment data is classified to obtain a classification result of the user’s comment data;
  • Target users are screened according to the classification results of the users' comment data, that is, users whose comment data is classified as a positive evaluation are regarded as target users; and the target product is recommended to the target users.
  • the second neural network is used to screen out users who are interested in the target product (financial management product) to ensure the accuracy of user screening and improve the success rate of recommendation.
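The screening step above can be sketched as follows; the (user, comment) pairing and the classifier interface are illustrative assumptions:

```python
# Sketch of target-user screening: keep users whose comment on the target
# product is classified as a positive evaluation. The data shape and the
# classify callable are assumptions for illustration.
def screen_target_users(user_comments, classify):
    """Return users whose comment classification is a positive evaluation."""
    return [user for user, comment in user_comments
            if classify(comment) == "positive"]
```

In the described system, `classify` would be the trained second neural network; the returned users are the ones to whom the target product is recommended.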
  • FIG. 2 is a schematic flowchart of another text labeling method provided by an embodiment of this application.
  • the content of this embodiment is the same as that of the embodiment shown in FIG. 1, and the description will not be repeated here.
  • the method is applied to electronic equipment, and the method includes the following steps.
  • the electronic device obtains the first text data set from the first platform.
  • The electronic device cleans each piece of first text data in the first text data set, deletes the first text data that does not contain an emoji expression, obtains a new first text data set, and uses the new first text data set as the first text data set.
  • the electronic device determines a first emotional evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, and the first emotional evaluation includes a positive evaluation or a negative evaluation.
  • the electronic device extracts the text content of each piece of first text data, and performs semantic analysis on the text content of each piece of first text data to obtain semantic information of each piece of first text data.
  • the electronic device determines the second sentiment evaluation of each piece of first text data according to the semantic information of each piece of first text data.
  • the electronic device retains the first text data in the first text data set that has the same first sentiment evaluation and the second sentiment evaluation, and deletes the first text data in which the first sentiment evaluation and the second sentiment evaluation are inconsistent.
  • the electronic device labels the remaining first text data according to the first sentiment evaluation of the remaining first text data to obtain the first training sample set.
  • The remaining first text data is the first text data left after deleting, from the first text data set, the first text data whose first sentiment evaluation and second sentiment evaluation are inconsistent.
  • the electronic device uses the first training sample set to train the first neural network.
  • the electronic device obtains the second text data set from the second platform.
  • the electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, and the second annotation result includes a positive One of evaluation, negative evaluation, or neutral evaluation.
  • In this way, the comment data is annotated by the emoji expressions it contains, with no need for semantic analysis, so the annotation is not restricted by the language of the comment data, thereby increasing the application scenarios of text annotation.
  • Moreover, the comment data can be annotated automatically through emoji expressions, and training sample sets containing emotion classification labels can be obtained without manual labeling, thereby saving human and material resources. In addition, before the first text data set is annotated, it is cleaned to retain high-quality first text data, thereby improving the accuracy of the annotation.
  • FIG. 3 is a schematic flowchart of another text labeling method provided by an embodiment of the application.
  • the content in this embodiment is the same as the embodiment shown in FIG. 1 and FIG. 2, and the description will not be repeated here.
  • the method is applied to electronic equipment, and the method includes the following steps.
  • the electronic device obtains the first text data set from the first platform.
  • The electronic device cleans each piece of first text data in the first text data set, deletes the first text data that does not contain an emoji expression, obtains a new first text data set, and uses the new first text data set as the first text data set.
  • the electronic device determines a first emotional evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, and the first emotional evaluation includes a positive evaluation or a negative evaluation.
  • the electronic device extracts the text content of each piece of first text data, performs semantic analysis on the text content of each piece of first text data, and obtains semantic information of each piece of first text data.
  • the electronic device determines the second sentiment evaluation of each piece of first text data according to the semantic information of each piece of first text data.
  • the electronic device retains the first text data in the first text data set that has the same first sentiment evaluation and the second sentiment evaluation, and deletes the first text data in which the first sentiment evaluation and the second sentiment evaluation are inconsistent.
  • the electronic device labels the remaining first text data according to the first sentiment evaluation of the remaining first text data to obtain a first training sample set.
  • The remaining first text data is the first text data left after deleting, from the first text data set, the first text data whose first sentiment evaluation and second sentiment evaluation are inconsistent.
  • the electronic device uses the first training sample set to train the first neural network.
  • the electronic device obtains the second text data set from the second platform.
  • the electronic device uses the first neural network to annotate the second text data set to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes a positive One of evaluation, negative evaluation, or neutral evaluation.
  • The electronic device obtains a second training sample set according to the second labeling result of each piece of second text data, and uses the second training sample set to train the second neural network.
  • the electronic device obtains any piece of comment data, uses the second neural network to classify the comment data to obtain a classification result of the comment data, and determines whether to disclose the comment data according to the classification result.
  • In this way, the text data is annotated by the emoji expressions it contains, with no need for semantic analysis, so the annotation is not restricted by the language of the text data, thereby increasing the application scenarios of text annotation.
  • Moreover, the text data can be annotated automatically through emoji expressions, and training sample sets containing emotion classification labels can be obtained without manual labeling, thereby saving human and material resources. Before the first text data set is annotated, it is cleaned to retain high-quality first text data, thereby improving the accuracy of the annotation. In addition, the trained second neural network is used to classify comment data to be published, automatically blocking comment data that does not meet the requirements without human review, saving human resources.
  • The electronic device 400 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor to execute instructions for the following steps: obtain a first text data set from the first third-party platform, where each piece of first text data includes an emoji expression; label each piece of first text data according to its emoji expression to obtain a first annotation result of each piece of first text data, where the first annotation result includes a positive evaluation or a negative evaluation; obtain a first training sample set according to the first annotation result of each piece of first text data; use the first training sample set to train the first neural network; obtain a second text data set from the second third-party platform; and use the first neural network to annotate the second text data set to obtain the second annotation result of each piece of second text data in the second text data set.
  • In terms of labeling each piece of first text data according to its emoji expression, the processor is specifically configured to: determine a first emotional evaluation of each piece of first text data according to its emoji expression, where the first emotional evaluation includes a positive evaluation or a negative evaluation; and label each piece of first text data according to its first emotional evaluation.
  • The processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece of first text data to obtain its semantic information; determine a second sentiment evaluation of each piece of first text data according to the semantic information; retain the first text data in the first text data set whose first sentiment evaluation is consistent with its second sentiment evaluation; and delete the first text data whose first sentiment evaluation is inconsistent with its second sentiment evaluation.
  • Before annotating the first text data set, the processor is further configured to: clean each piece of first text data in the first text data set, delete the first text data that does not contain an emoji expression to obtain a new first text data set, and use the new first text data set as the first text data set.
  • in terms of annotating the second text data set using the first neural network, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of being a positive evaluation and a second probability of being a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
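The two-threshold rule above can be sketched as follows; the threshold values 0.7 and 0.3 are illustrative assumptions, since the description only requires the first threshold to exceed the second:

```python
# Sketch of the two-threshold second annotation rule. Threshold values
# are assumed for illustration; only their ordering is fixed by the text.

FIRST_THRESHOLD = 0.7   # assumed value
SECOND_THRESHOLD = 0.3  # assumed value

def second_annotation(first_probability, second_probability):
    """Map (P(positive), P(negative)) to a second annotation result."""
    if first_probability > FIRST_THRESHOLD:
        return "positive"
    if second_probability > FIRST_THRESHOLD:
        return "negative"
    if SECOND_THRESHOLD < first_probability < FIRST_THRESHOLD:
        return "neutral"
    return None  # below the second threshold: left unannotated in this sketch

# For a binary softmax classifier the two probabilities sum to one, so
# every piece of second text data lands in one of the three evaluations.
```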
  • the processor is further configured to: obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published; and determine, according to the classification result, whether to publish the comment data to be published.
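The publishing decision above can be sketched as follows; the stub classifier stands in for the trained second neural network, and withholding only negative comments is an assumed policy, since the description leaves the decision rule to the classification result:

```python
# Sketch of the publishing decision driven by the second network's
# sentiment classification. The keyword "classifier" and the
# "withhold negative comments" policy are both assumptions.

def second_network_classify(comment):
    # Stand-in for the trained second neural network.
    return "negative" if "awful" in comment.lower() else "positive"

def should_publish(comment_to_publish):
    """Return True when the comment data may be published."""
    classification = second_network_classify(comment_to_publish)
    return classification != "negative"
```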
  • the processor is further configured to: combine the second training sample set with the first training sample set to obtain a new second training sample set; and in terms of training the second neural network using the second training sample set, the processor is specifically configured to: train the second neural network using the new second training sample set.
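The sample-set combination step above can be sketched as follows; plain concatenation with de-duplication by text content is an assumption, since the description only states that the two sets are combined:

```python
# Sketch of combining the second training sample set with the first to
# form the new second training sample set. Skipping duplicate texts is
# an assumed detail, not specified in the description.

def merge_sample_sets(second_samples, first_samples):
    """Concatenate two (text, label) sample lists, skipping duplicate texts."""
    merged = list(second_samples)
    seen = {text for text, _ in merged}
    for text, label in first_samples:
        if text not in seen:
            merged.append((text, label))
            seen.add(text)
    return merged

second_set = [("great service", "positive"), ("cold food", "negative")]
first_set = [("great service", "positive"), ("slow delivery", "negative")]
new_second_set = merge_sample_sets(second_set, first_set)
# "great service" appears in both sets and is kept once.
```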
  • the electronic device 500 includes: an acquisition unit 510, a labeling unit 520, and a training unit 530.
  • the obtaining unit 510 is configured to obtain a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; the labeling unit 520 is configured to annotate each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, and obtain a first annotation result of each piece of first text data.
  • the first annotation result includes a positive evaluation or a negative evaluation
  • the training unit 530 is configured to obtain a first training sample set according to the first annotation result of each piece of first text data, and to train the first neural network using the first training sample set
  • the obtaining unit 510 is further configured to obtain a second text data set from a second third-party platform
  • the labeling unit 520 is further configured to use the first neural network to label the second text data set to obtain a second labeling result of each piece of second text data in the second text data set
  • the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • the labeling unit 520 is specifically configured to: determine the first sentiment evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, where the first sentiment evaluation includes a positive evaluation or a negative evaluation; and annotate each piece of first text data according to its first sentiment evaluation.
  • the electronic device 500 further includes a cleaning unit 540.
  • the cleaning unit 540 is configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece of first text data to obtain the semantic information of each piece of first text data; determine the second sentiment evaluation of each piece of first text data according to its semantic information; retain the first text data in the first text data set whose first sentiment evaluation is consistent with the second sentiment evaluation, and delete the first text data whose first sentiment evaluation is inconsistent with the second sentiment evaluation.
  • the electronic device 500 further includes a cleaning unit 540.
  • the cleaning unit 540 is configured to: clean each piece of first text data in the first text data set, delete the first text data that does not contain emoji expressions to obtain a new first text data set, and use the new first text data set as the first text data set.
  • in terms of annotating the second text data set using the first neural network to obtain the second annotation result of each piece of second text data in the second text data set, the annotation unit 520 is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of being a positive evaluation and a second probability of being a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  • the electronic device 500 further includes a determining unit 550; the training unit 530 is further configured to obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set, and to train a second neural network using the second training sample set; the determining unit 550 is configured to: obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published; and determine, according to the classification result, whether to publish the comment data to be published.
  • the training unit 530 is further configured to combine the second training sample set with the first training sample set to obtain a new second training sample set; in terms of training the second neural network using the second training sample set, the training unit 530 is specifically configured to train the second neural network using the new second training sample set.
  • the embodiments of the present application also provide a computer-readable storage medium, which may be non-volatile or volatile; the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following steps: obtain a first text data set from a first third-party platform, where each piece of first text data in the first text data set includes an emoji expression; annotate each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, and obtain a first annotation result of each piece of first text data.
  • the first annotation result includes a positive evaluation or a negative evaluation
  • obtain a first training sample set according to the first annotation result of each piece of first text data; train a first neural network using the first training sample set; obtain a second text data set from a second third-party platform; and annotate the second text data set using the first neural network to obtain a second annotation result of each piece of second text data in the second text data set, where the second annotation result includes one of a positive evaluation, a negative evaluation, or a neutral evaluation.
  • in terms of annotating each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, the processor is specifically configured to: determine a first sentiment evaluation of each piece of first text data according to the emoji expression of each piece of first text data in the first text data set, where the first sentiment evaluation includes a positive evaluation or a negative evaluation; and annotate each piece of first text data according to its first sentiment evaluation.
  • the processor is further configured to: extract the text content of each piece of first text data; perform semantic analysis on the text content of each piece of first text data to obtain the semantic information of each piece of first text data; determine a second sentiment evaluation of each piece of first text data according to its semantic information; retain the first text data in the first text data set whose first sentiment evaluation is consistent with the second sentiment evaluation, and delete the first text data whose first sentiment evaluation is inconsistent with the second sentiment evaluation.
  • before annotating the first text data set, the processor is further configured to: clean each piece of first text data in the first text data set, delete the first text data that does not contain emoji expressions to obtain a new first text data set, and use the new first text data set as the first text data set.
  • in terms of annotating the second text data set using the first neural network, the processor is specifically configured to: classify each piece of second text data in the second text data set using the first neural network to obtain, for each piece of second text data, a first probability of being a positive evaluation and a second probability of being a negative evaluation; determine that the second annotation result of second text data whose first probability is greater than a first threshold is a positive evaluation; determine that the second annotation result of second text data whose second probability is greater than the first threshold is a negative evaluation; and determine that the second annotation result of second text data whose first probability is less than the first threshold and greater than a second threshold is a neutral evaluation.
  • the processor is further configured to: obtain a second training sample set according to the second annotation result of each piece of second text data in the second text data set; train a second neural network using the second training sample set; obtain any piece of comment data to be published; perform sentiment classification on the comment data to be published using the second neural network to obtain a classification result of the comment data to be published; and determine, according to the classification result, whether to publish the comment data to be published.
  • the processor is further configured to: combine the second training sample set with the first training sample set to obtain a new second training sample set; and in terms of training the second neural network using the second training sample set, the processor is specifically configured to: train the second neural network using the new second training sample set.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software program module.
  • when the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product, and this computer software product is stored in a memory.
  • a number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, or other media that can store program code.
  • the program can be stored in a computer-readable memory, and the memory can include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text annotation method, a device, and a storage medium, in the technical field of emotion recognition within artificial intelligence. The text annotation method comprises the steps of: acquiring a first text data set from a first third-party platform, each piece of first text data in the first text data set including an emoji expression; annotating each piece of first text data on the basis of the emoji expression of each piece of first text data in the first text data set, to obtain a first annotation result for each piece of first text data, the first annotation result including a positive evaluation or a negative evaluation; obtaining a first training sample set on the basis of the first annotation result of each piece of first text data; training a first neural network using the first training sample set; acquiring a second text data set from a second third-party platform; and annotating the second text data set using the first neural network, to obtain a second annotation result for each piece of second text data in the second text data set, the second annotation result including a positive evaluation, a negative evaluation, or a neutral evaluation.
PCT/CN2020/099493 2020-05-28 2020-06-30 Text annotation method, device, and storage medium WO2021114634A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010465811.4 2020-05-28
CN202010465811.4A CN111695357A (zh) 2020-05-28 Text annotation method and related products

Publications (1)

Publication Number Publication Date
WO2021114634A1 true WO2021114634A1 (fr) 2021-06-17

Family

ID=72478683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099493 WO2021114634A1 (fr) Text annotation method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111695357A (fr)
WO (1) WO2021114634A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172248A (zh) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data annotation method, system, and medium
CN117689998A (zh) * 2024-01-31 2024-03-12 数据空间研究院 Non-parametric adaptive emotion recognition model, method, system, and storage medium
CN117725909A (zh) * 2024-02-18 2024-03-19 四川日报网络传媒发展有限公司 Multi-dimensional comment review method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170364797A1 (en) * 2016-06-16 2017-12-21 Sysomos L.P. Computing Systems and Methods for Determining Sentiment Using Emojis in Electronic Data
CN109034203A (zh) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Training of an expression recommendation model, expression recommendation method, apparatus, device, and medium
CN109684478A (zh) * 2018-12-18 2019-04-26 腾讯科技(深圳)有限公司 Classification model training method, classification method and apparatus, device, and medium
CN110704581A (zh) * 2019-09-11 2020-01-17 阿里巴巴集团控股有限公司 Computer-implemented text sentiment analysis method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172248A (zh) * 2023-11-03 2023-12-05 翼方健数(北京)信息科技有限公司 Text data annotation method, system, and medium
CN117172248B (zh) * 2023-11-03 2024-01-30 翼方健数(北京)信息科技有限公司 Text data annotation method, system, and medium
CN117689998A (zh) * 2024-01-31 2024-03-12 数据空间研究院 Non-parametric adaptive emotion recognition model, method, system, and storage medium
CN117689998B (zh) * 2024-01-31 2024-05-03 数据空间研究院 Non-parametric adaptive emotion recognition model, method, system, and storage medium
CN117725909A (zh) * 2024-02-18 2024-03-19 四川日报网络传媒发展有限公司 Multi-dimensional comment review method and apparatus, electronic device, and storage medium
CN117725909B (zh) * 2024-02-18 2024-05-14 四川日报网络传媒发展有限公司 Multi-dimensional comment review method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN111695357A (zh) 2020-09-22

Similar Documents

Publication Publication Date Title
CN107346336B Artificial-intelligence-based information processing method and apparatus
CN104735468B Method and system for synthesizing images into a new video based on semantic analysis
WO2021114634A1 Text annotation method, device, and storage medium
US20220254348A1 Automatically generating a meeting summary for an information handling system
WO2022116418A1 Method and apparatus for automatically determining trademark counterfeiting, electronic device, and storage medium
CN112749326B Information processing method and apparatus, computer device, and storage medium
US20200134398A1 Determining intent from multimodal content embedded in a common geometric space
JP7334395B2 Video classification method, apparatus, device, and storage medium
CN112559800B Method, apparatus, electronic device, medium, and product for processing video
CN110046293B User identity association method and apparatus
US11019012B2 File sending in instant messaging application
CN111931859B Multi-label image recognition method and apparatus
US11436446B2 Image analysis enhanced related item decision
CN108959323B Video classification method and apparatus
WO2018205845A1 Data processing method, server, and computer storage medium
WO2017206376A1 Search method, search device, and non-volatile computer storage medium
CN111177462B Method and apparatus for determining video distribution timeliness
CN110516203B Dispute focus analysis method and apparatus, electronic device, and computer storage medium
CN113596130A Interest-profile-based artificial intelligence module training method, system, and server
US20210256221A1 System and method for automatic summarization of content with event based analysis
CN115661302A Video editing method, apparatus, device, and storage medium
CN113392205A User profile construction method, apparatus, device, and storage medium
TWI575391B Social data filtering system, method, and non-volatile computer-readable recording medium thereof
WO2018120575A1 Method and device for identifying the main image in a web page
WO2021081914A1 Method and apparatus for determining an object to be pushed, terminal device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900623

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900623

Country of ref document: EP

Kind code of ref document: A1