CN114580409A - Text classification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114580409A
CN114580409A
Authority
CN
China
Prior art keywords
label
sample data
pseudo
data set
tag
Prior art date
Legal status
Pending
Application number
CN202210233373.8A
Other languages
Chinese (zh)
Inventor
刘欢 (Liu Huan)
Current Assignee
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202210233373.8A
Publication of CN114580409A
Legal status: Pending


Classifications

    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

The invention relates to the technical field of artificial intelligence, and provides a text classification method and device, electronic equipment and a storage medium. The method comprises the following steps: predicting an unlabeled sample data set by using an initialized classification model obtained by training on a labeled sample data set, to obtain a label set of each unlabeled sample data and the probability of each label in the label set; determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set; pre-marking the unlabeled sample data set to obtain a sample data set with pseudo labels; and inputting a text to be classified into a target classification model trained on the sample data set with pseudo labels and the labeled sample data set to obtain a classification result. According to the invention, training the classification model on the sample data set with pseudo labels together with the labeled sample data set before classifying text improves the accuracy of text classification.

Description

Text classification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text classification method and device, electronic equipment and a storage medium.
Background
Supervised learning has long been one of the most important fields of artificial intelligence, but it depends on a large amount of labeled corpora to train a model with high accuracy. In the prior art, a small original sample data set is used for training first, and then the predicted unlabeled sample data set and the small original sample data set are trained together to obtain a classification model.
However, in a multi-label scenario, when a trained model is used to predict an unlabeled sample data set, it cannot be determined which labels are the real labels of a sample; many wrongly predicted labels may appear, and an accurate classification model cannot be trained with wrongly predicted labels, so the text classification accuracy is low.
Disclosure of Invention
In view of the above, there is a need to provide a text classification method and device, electronic equipment and a storage medium, which train a classification model on a sample data set with pseudo labels together with a labeled sample data set to classify text, so as to improve the accuracy of text classification.
A first aspect of the present invention provides a text classification method, including:
acquiring a sample data set with a label and a sample data set without a label from a preset text library;
training a preset classification model based on the sample data set with the labels to obtain an initialized classification model;
predicting the unlabeled sample data set by adopting the initialized classification model to obtain a label set of each unlabeled sample data and the probability of each label in the label set;
determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set;
pre-marking the label-free sample data set according to a target pseudo label set of the label-free sample data set to obtain a sample data set with pseudo labels;
training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model;
and responding to the received text to be classified, and inputting the text to be classified into the target classification model to obtain a classification result.
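For orientation, the flow of the first aspect can be pictured end to end in a short, self-contained Python sketch. It uses scikit-learn as a stand-in for the preset classification model, which the application does not prescribe; the sample texts and the 0.5 pseudo label threshold are invented, and the similarity-based filtering of pseudo labels (detailed later) is omitted here for brevity.

```python
# Illustrative end-to-end sketch of the claimed pipeline; library choice,
# data and threshold are assumptions, not taken from the application.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Step 1: labeled and unlabeled sample data sets from the "text library"
labeled_texts = ["goal scored in the match", "stock market rises"]
labeled_tags = [["sports"], ["finance"]]
unlabeled_texts = ["the team won the championship", "shares fell sharply"]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labeled_tags)
vec = TfidfVectorizer()
X = vec.fit_transform(labeled_texts)

# Step 2: initialized classification model trained on the labeled set
init_model = OneVsRestClassifier(LogisticRegression()).fit(X, y)

# Step 3: per-label probabilities for each unlabeled sample
probs = init_model.predict_proba(vec.transform(unlabeled_texts))

# Steps 4-5: keep labels whose probability clears the preset pseudo label
# threshold (0.5 assumed) as the pseudo labels of each unlabeled sample
threshold = 0.5
pseudo_tags = [[mlb.classes_[j] for j, p in enumerate(row) if p >= threshold]
               for row in probs]

# Step 6: retrain on the labeled set plus the pseudo-labeled set
all_texts = labeled_texts + unlabeled_texts
all_y = mlb.transform(labeled_tags + pseudo_tags)
target_model = OneVsRestClassifier(LogisticRegression()).fit(
    vec.fit_transform(all_texts), all_y)

# Step 7: classify a received text
result = target_model.predict(vec.transform(["cup final tonight"]))
print(mlb.inverse_transform(result))
```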
Optionally, the determining, according to the label set of each unlabeled sample data and the probability of each label in the label set, a target pseudo label set of the unlabeled sample data set includes:
determining a first pseudo label set of each unlabeled sample data according to the label set of each unlabeled sample data and the probability of each label in the label set;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarity;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of each unlabeled sample data, and associating each unlabeled sample data with the corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
Optionally, the determining the first pseudo label set of each unlabeled sample data according to the label set of each unlabeled sample data and the probability of each label in the label set includes:
acquiring a preset pseudo label threshold value, screening out, from the label set of each unlabeled sample data, first labels whose label probability is greater than or equal to the preset pseudo label threshold value, and determining the first labels as the first pseudo label set of each unlabeled sample data.
Optionally, the determining the first pseudo label set of each unlabeled sample data according to the label set of each unlabeled sample data and the probability of each label in the label set includes:
sorting the label set of each unlabeled sample data in descending order according to the probability of each label; and selecting a plurality of top-ranked labels from the descending sorting result, and determining them as the first pseudo label set of each unlabeled sample data.
Optionally, the calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data includes:
identifying the label of each labeled sample in the labeled sample data set, and classifying the labeled sample data set according to the identified labels to obtain a sample data set corresponding to each label;
inputting the sample data set corresponding to each label into a pre-trained Bert model to obtain a first label code of each label in the labeled sample data set;
inputting each label in the label set of each unlabeled sample data into the pre-trained Bert model to obtain a second label code of each label in the label set of each unlabeled sample data;
and calculating the similarity between each first label code and each second label code by adopting a preset function to obtain the similarity between each label in the label set of each unlabeled sample data and each label in the labeled sample data set.
Optionally, the determining the second pseudo label set of each unlabeled sample data according to the calculated similarity includes:
acquiring a preset similarity threshold;
and screening out, from the label set of each unlabeled sample data, the labels whose similarity to a label in the labeled sample data set is greater than the preset similarity threshold, and determining the screened labels as the second pseudo label set of each unlabeled sample data.
Optionally, the pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain a sample data set with pseudo labels includes:
identifying the position information of each target pseudo label in the target pseudo label set of the unlabeled sample data set;
and pre-marking the unlabeled sample data set by adopting a preset pre-marking module according to the position information of each target pseudo label to obtain the sample data set with pseudo labels.
A second aspect of the present invention provides a text classification apparatus, comprising:
the acquisition module is used for acquiring a sample data set with a label and a sample data set without the label from a preset text library;
the first training module is used for training a preset classification model based on the sample data set with the labels to obtain an initialized classification model;
the prediction module is used for predicting the unlabeled sample data set by adopting the initialized classification model to obtain a label set of each unlabeled sample data and the probability of each label in the label set;
a determining module, configured to determine a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set;
the pre-marking module is used for pre-marking the label-free sample data set according to the target pseudo label set of the label-free sample data set to obtain a sample data set with pseudo labels;
the second training module is used for training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model;
and the input module is used for responding to the received text to be classified, inputting the text to be classified into the target classification model and obtaining a classification result.
A third aspect of the invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the text classification method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the text classification method.
In summary, according to the text classification method, the text classification device, the electronic device and the storage medium of the present invention, on one hand, the probability of each label of each unlabeled sample data is predicted by performing classification prediction on the unlabeled sample data set by using the initialized classification model, and then the probability of each label is considered in the process of performing pre-labeling on each unlabeled sample data, so that the accuracy of each label of each unlabeled sample data is improved, and further, the accuracy of text classification is improved. And determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set, and pre-marking the unlabeled sample data set according to the target pseudo label set to obtain a sample data set with pseudo labels, so that the label prediction accuracy of each unlabeled sample data is improved, the accuracy of sample data used by a subsequent training classification model is ensured, and the text classification accuracy is improved. And training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model, solving the problem of low classification accuracy of the initial classification model obtained by learning a small sample data set, ensuring the accuracy of the target classification model, and further improving the accuracy of the subsequent text classification.
Drawings
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a text classification apparatus according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example one
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present invention.
In this embodiment, the text classification method may be applied to an electronic device, and for an electronic device that needs to perform text classification, the text classification function provided by the method of the present invention may be directly integrated on the electronic device, or may be run in the electronic device in the form of a Software Development Kit (SDK).
The embodiment of the invention can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning, deep learning and the like.
As shown in fig. 1, the text classification method specifically includes the following steps, and the order of the steps in the flowchart may be changed and some steps may be omitted according to different requirements.
And S11, acquiring a sample data set with a label and a sample data set without a label from a preset text library.
In this embodiment, when text classification is performed in a scenario such as spam filtering or a customer service robot, a labeled sample data set and an unlabeled sample data set to be trained are obtained from a preset text library, wherein the number of samples in the labeled sample data set is smaller than the number of samples in the unlabeled sample data set.
In this embodiment, the sample data set with tags includes a plurality of tags, each tag corresponds to a plurality of sample data, and the sample data in the sample data set without tags is not labeled.
And S12, training a preset classification model based on the labeled sample data set to obtain an initialized classification model.
In this embodiment, a classification model may be preset, where the preset classification model may be a convolutional neural network or a deep learning network, and the convolutional neural network and the deep learning network are the prior art, which is not described in detail herein.
In this embodiment, the initialized classification model is obtained by training the preset classification model on the labeled sample data set. Because the labeled sample data set contains few samples and cannot cover all sample data, the text classification accuracy would be low if the initialized classification model were directly used for text classification later.
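Concretely, the preset classification model can be sketched as a small PyTorch network. The embodiment only states that a convolutional neural network or a deep learning network may be used, so the architecture, sizes and the single training step below are illustrative assumptions; for multi-label text, one sigmoid output per label trained with binary cross-entropy is a common choice.

```python
# A minimal multi-label text classifier as one possible "preset
# classification model"; all sizes and data here are invented.
import torch
import torch.nn as nn

class PresetClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, num_labels=3):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools tokens
        self.fc = nn.Linear(embed_dim, num_labels)

    def forward(self, token_ids, offsets):
        return self.fc(self.embed(token_ids, offsets))  # raw logits per label

model = PresetClassifier()
criterion = nn.BCEWithLogitsLoss()  # sigmoid per label for multi-label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One invented training step on a two-sample batch.
token_ids = torch.tensor([1, 5, 9, 2, 7])  # concatenated token ids
offsets = torch.tensor([0, 3])             # where each sample starts
targets = torch.tensor([[1., 0., 0.], [0., 1., 0.]])
loss = criterion(model(token_ids, offsets), targets)
loss.backward()
optimizer.step()
```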
S13, predicting the unlabeled sample data set by adopting the initialized classification model to obtain the label set of each unlabeled sample data and the probability of each label in the label set.
In this embodiment, in order to solve the problem that text classification is inaccurate because the labeled sample data set cannot cover all sample data, the initialized classification model is used to predict the classification of the unlabeled sample data set, giving the probability of each label of each unlabeled sample data. The probability of each label is then taken into account when pre-marking each unlabeled sample data, which improves the accuracy of the labels of the unlabeled sample data and further improves the accuracy of text classification.
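As an illustration of what this step produces (all sample identifiers and probability values are invented), each unlabeled sample data ends up paired with its predicted label set and a probability per label:

```python
# Hypothetical output of step S13: unlabeled sample -> {label: probability}.
predicted = {
    "x1": {"sports": 0.91, "entertainment": 0.47, "finance": 0.08},
    "x2": {"finance": 0.83, "sports": 0.12},
}
# With an assumed pseudo label threshold of 0.5, the first pseudo label set
# of x1 would be {"sports"} and that of x2 would be {"finance"}.
first_pseudo = {k: {t for t, p in v.items() if p >= 0.5}
                for k, v in predicted.items()}
print(first_pseudo)  # {'x1': {'sports'}, 'x2': {'finance'}}
```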
And S14, determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set.
In this embodiment, the target pseudo label set is obtained by merging the pseudo label sets of each unlabeled sample data in the unlabeled sample data set.
In an optional embodiment, the determining the target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set includes:
acquiring a preset pseudo label threshold value, screening out, from the label set of each unlabeled sample data, first labels whose label probability is greater than or equal to the preset pseudo label threshold value, and determining the first labels as the first pseudo label set of each unlabeled sample data;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarity;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of each unlabeled sample data, and associating each unlabeled sample data with the corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
In this embodiment, the target pseudo label set of the unlabeled sample data set is screened out according to the preset pseudo label threshold and by calculating the similarity between the label of each labeled sample in the labeled sample data set and each label in the label set of each unlabeled sample data. This solves the prior-art problem that, in a multi-label scenario, it cannot be determined which labels are the real labels of each unlabeled sample data; it improves the label prediction accuracy of each unlabeled sample data, ensures the accuracy of the sample data used to subsequently train the classification model, and further improves the text classification accuracy.
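A minimal sketch of the intersection logic just described, with invented sample identifiers and labels; the first and second pseudo label sets are assumed to have been computed already by the threshold and similarity steps:

```python
# Invented example data: per-sample first and second pseudo label sets.
first_pseudo = {"x1": {"sports", "entertainment"}, "x2": {"finance", "sports"}}
second_pseudo = {"x1": {"sports"}, "x2": {"finance", "entertainment"}}

# Third pseudo label set of each unlabeled sample data: the intersection
# of its first and second pseudo label sets.
third_pseudo = {x: first_pseudo[x] & second_pseudo[x] for x in first_pseudo}

# Target pseudo label set of the unlabeled sample data set: each sample
# associated with its third pseudo label set.
print(third_pseudo)  # {'x1': {'sports'}, 'x2': {'finance'}}
```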
Further, the calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data includes:
identifying the label of each labeled sample in the labeled sample data set, and classifying the labeled sample data set according to the identified labels to obtain a sample data set corresponding to each label;
inputting the sample data set corresponding to each label into a pre-trained Bert model to obtain a first label code of each label in the labeled sample data set;
inputting each label in the label set of each unlabeled sample data into the pre-trained Bert model to obtain a second label code of each label in the label set of each unlabeled sample data;
and calculating the similarity between each first label code and each second label code by adopting a preset function to obtain the similarity between each label in the label set of each unlabeled sample data and each label in the labeled sample data set.
In this embodiment, the preset function may be a Sim function, and the similarity of each label in the label set of each unlabeled sample data is calculated through the Sim function. Illustratively: for the unlabeled sample data set [x1, x2, …, xn], the corresponding label sets are [x1(x11, x12, x13), x2(x21, x22), …, xn(xn1, xn2, xn3, xn4, xn5)], where x11, x12 and x13 form the label set of x1.
In this embodiment, each unlabeled sample data may have one label or a plurality of labels, and the one or more labels form the label set of the unlabeled sample data. The code of each label in the labeled sample data set is calculated. For example, if the labeled sample data set contains the three labels sports, finance and entertainment, the sample data set corresponding to the label "sports" is obtained and input into a pre-trained Bert model for coding; the codes of all sample data in that sample data set are added and averaged to obtain the first label code of the label "sports". The label set of each unlabeled sample data in the unlabeled sample data set is input into the pre-trained Bert model to obtain the second label code of each label in the label set of each unlabeled sample data. The similarity is then calculated between the second label code of each label in the label set of each unlabeled sample data and the first label codes corresponding to the labels "sports", "finance" and "entertainment".
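The label-coding step can be sketched with the Hugging Face transformers package as one possible pre-trained Bert model. The embodiment does not name a specific implementation, so the checkpoint name, the mean-pooling of hidden states per text, and the cosine form of the Sim function are all assumptions:

```python
# Sketch of first/second label codes via Bert; checkpoint, pooling and the
# cosine Sim function are assumptions, and the sample texts are invented.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def encode(text: str) -> torch.Tensor:
    # Mean-pool the last hidden states into a single vector for the text.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# First label code of "sports": average the codes of all labeled samples
# carrying that label (two invented samples shown).
sports_samples = ["The team won the cup final", "A record was set at the games"]
first_code_sports = torch.stack([encode(s) for s in sports_samples]).mean(dim=0)

# Second label code of a label from an unlabeled sample's predicted label set.
second_code = encode("sports")

# Sim function: cosine similarity is one common choice (an assumption).
similarity = torch.nn.functional.cosine_similarity(
    first_code_sports.unsqueeze(0), second_code.unsqueeze(0)).item()
print(similarity)
```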
Further, the determining the second pseudo label set of each unlabeled sample data according to the calculated similarity includes:
acquiring a preset similarity threshold;
and screening out, from the label set of each unlabeled sample data, the labels whose similarity to a label in the labeled sample data set is greater than the preset similarity threshold, and determining the screened labels as the second pseudo label set of each unlabeled sample data.
Specifically, the preset similarity threshold may be set by using the following formula:
t = r · max_{1≤i≤s} f(x, l_i) + (1 - r) · min_{1≤i≤s} f(x, l_i),
where r is a weight with a value between 0 and 1; l_i is the first label code of the i-th label in the labeled sample data set; x is the second label code of any label of any unlabeled sample data in the unlabeled sample data set; f(x, l_i) is the similarity between the second label code x and the first label code l_i; s is the number of labels in the labeled sample data set; max_{1≤i≤s} f(x, l_i) is the maximum similarity in the similarity set corresponding to the label set of the unlabeled sample data; and min_{1≤i≤s} f(x, l_i) is the minimum similarity in that similarity set.
In this embodiment, the weight r may be preset, for example to 0.4; r determines how much the maximum similarity contributes to the similarity threshold relative to the minimum similarity.
In this embodiment, in order to ensure the accuracy of the label set of each unlabeled sample data, both the maximum value and the minimum value of the similarities are considered when setting the similarity threshold, which reduces the influence of threshold selection on pseudo label selection and ensures that the set similarity threshold is reasonable.
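Transcribed directly into code (similarity values invented), the threshold and the resulting second pseudo label set look as follows; this is a sketch, not code from the application:

```python
# t = r * max_i f(x, l_i) + (1 - r) * min_i f(x, l_i)
def similarity_threshold(values, r=0.4):
    return r * max(values) + (1 - r) * min(values)

# Invented similarities between one unlabeled sample's label code and the
# first label codes of the s labels in the labeled sample data set.
sims = {"sports": 0.82, "finance": 0.35, "entertainment": 0.61}
t = similarity_threshold(list(sims.values()), r=0.4)
print(t)  # 0.4 * 0.82 + 0.6 * 0.35 = 0.538

# Labels whose similarity exceeds t join the second pseudo label set.
second_pseudo = {label for label, s in sims.items() if s > t}
print(second_pseudo)  # {'sports', 'entertainment'} (set order may vary)
```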
In other optional embodiments, the determining, according to the label set of each unlabeled sample data and the probability of each label in the label set, the target pseudo label set of the unlabeled sample data set includes:
sorting the label set of each unlabeled sample data in descending order according to the probability of each label, selecting a plurality of top-ranked labels from the descending sorting result, and determining them as the first pseudo label set of each unlabeled sample data;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarity;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of each unlabeled sample data, and associating each unlabeled sample data with the corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
In this embodiment, in the process of determining the first pseudo label set of each unlabeled sample data, a plurality of labels with a high label probability may be directly selected as the first pseudo label set without further calculation, which improves the efficiency of determining the first pseudo label set.
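A sketch of this top-k variant, with invented probabilities and an assumed k of 2:

```python
# Top-k selection of the first pseudo label set (k = 2 assumed).
label_probs = {"sports": 0.91, "entertainment": 0.47, "finance": 0.08}
k = 2
ranked = sorted(label_probs.items(), key=lambda kv: kv[1], reverse=True)
first_pseudo = {label for label, _ in ranked[:k]}
print(first_pseudo)  # {'sports', 'entertainment'} (set order may vary)
```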
And S15, pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain a sample data set with pseudo labels.
In this embodiment, the sample data set with pseudo labels is obtained by pre-marking each unlabeled sample in the unlabeled sample data set with its corresponding target pseudo labels.
In an optional embodiment, the pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain the sample data set with pseudo labels includes:
identifying the position information of each target pseudo label in the target pseudo label set of the unlabeled sample data set;
and pre-marking the unlabeled sample data set by adopting a preset pre-marking module according to the position information of each target pseudo label to obtain the sample data set with pseudo labels.
In this embodiment, a pre-marking module may be preset, and the preset pre-marking module pre-marks each unlabeled sample data in the unlabeled sample data set according to the position information of each target pseudo label. No manual marking is needed, which avoids the high marking error rate caused by manual marking and improves the efficiency and accuracy of pre-marking.
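The embodiment leaves the internals of the pre-marking module open; the following sketch assumes the "position information" is simply the index of the unlabeled sample that each target pseudo label set belongs to, which is an interpretation, not a disclosed detail:

```python
# Hypothetical pre-marking: attach each sample's target pseudo labels by
# position index to produce the sample data set with pseudo labels.
unlabeled_texts = ["the team won the championship", "shares fell sharply"]
target_pseudo = {0: {"sports"}, 1: {"finance"}}  # position -> pseudo labels

pseudo_labeled_set = [(unlabeled_texts[pos], sorted(labels))
                      for pos, labels in target_pseudo.items()]
print(pseudo_labeled_set)
# [('the team won the championship', ['sports']),
#  ('shares fell sharply', ['finance'])]
```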
And S16, training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model.
In this embodiment, the pseudo label of each sample data in the sample data set with pseudo labels is obtained by intersecting a first pseudo label set, determined by the preset pseudo label threshold, with a second pseudo label set, determined by calculating the similarity between the label of each labeled sample in the labeled sample data set and each label in the label set of each unlabeled sample data. This ensures the accuracy of the pseudo label of each unlabeled sample data in the unlabeled sample data set. Training the target classification model on the sample data set with pseudo labels together with the labeled sample data set solves the problem that an initial classification model learned from a small sample data set has low classification accuracy, ensures the accuracy of the target classification model, and further improves the accuracy of subsequent text classification.
S17, responding to the received text to be classified, inputting the text to be classified into the target classification model, and obtaining a classification result.
In this embodiment, the text to be classified refers to a text that needs to be classified, and the text to be classified may include a text with a tag or a text without a tag. For example, for spam classification, some spam mails contain tags, some spam mails do not contain tags, and the spam mails not containing tags need to be marked, so as to obtain a classification result.
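For a multi-label target classification model, reading off the classification result typically means applying a sigmoid to the output logits and keeping every label above a cut-off; the 0.5 cut-off and all values below are assumptions for illustration, not taken from the application:

```python
# Inference sketch for a multi-label classifier (values invented).
import torch

labels = ["sports", "finance", "entertainment"]
logits = torch.tensor([2.1, -1.3, 0.4])  # hypothetical model output
probs = torch.sigmoid(logits)
result = [l for l, p in zip(labels, probs) if p > 0.5]
print(result)  # ['sports', 'entertainment']
```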
In other optional embodiments, using the target classification model for text classification solves the problem that fast and accurate text classification cannot be achieved when a project contains only a small amount of labeled data.
In summary, in the text classification method according to this embodiment, on one hand, the initialized classification model is used to classify and predict the unlabeled sample data set, so as to predict the probability of each label of each unlabeled sample data, and then the probability of each label is considered in the process of pre-labeling each unlabeled sample data, so that the accuracy of each label of each unlabeled sample data is improved, and further, the accuracy of text classification is improved. And determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set, and pre-marking the unlabeled sample data set according to the target pseudo label set to obtain a sample data set with pseudo labels, so that the label prediction accuracy of each unlabeled sample data is improved, the accuracy of sample data used by a subsequent training classification model is ensured, and the text classification accuracy is improved. And training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model, solving the problem of low classification accuracy of the initial classification model obtained by learning a small sample data set, ensuring the accuracy of the target classification model, and further improving the accuracy of the subsequent text classification.
Example two
Fig. 2 is a structural diagram of a text classification apparatus according to a second embodiment of the present invention.
In some embodiments, the text classification apparatus 20 may include a plurality of functional modules composed of program code segments. The program code of the various program segments in the text classification apparatus 20 may be stored in a memory of the electronic device and executed by the at least one processor to perform the functions of text classification (described in detail in fig. 1).
In this embodiment, the text classification device 20 may be divided into a plurality of functional modules according to the functions performed by the device. The functional module may include: an acquisition module 201, a first training module 202, a prediction module 203, a determination module 204, a pre-marking module 205, a second training module 206, and an input module 207. The module referred to herein is a series of computer readable instruction segments stored in a memory that can be executed by at least one processor and that can perform a fixed function. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The obtaining module 201 obtains a sample data set with a tag and a sample data set without a tag from a preset text library.
In this embodiment, when text classification is performed in a scenario such as spam filtering or a customer service robot, a labeled sample data set and an unlabeled sample data set to be trained are obtained from a preset text library, wherein the number of samples in the labeled sample data set is smaller than the number of samples in the unlabeled sample data set.
In this embodiment, the sample data set with tags includes a plurality of tags, each tag corresponds to a plurality of sample data, and the sample data in the sample data set without tags is not labeled.
The first training module 202 is configured to train a preset classification model based on the labeled sample data set to obtain an initialized classification model.
In this embodiment, a classification model may be preset, where the preset classification model may be a convolutional neural network or a deep learning network, and the convolutional neural network and the deep learning network are the prior art, which is not described in detail herein.
In this embodiment, the initialized classification model is obtained by training the preset classification model on the labeled sample data set. Because the labeled sample data set contains few samples and cannot cover all sample data, the text classification accuracy would be low if the initialized classification model were directly used for text classification later.
The predicting module 203 is configured to predict the unlabeled sample data set by using the initialized classification model, so as to obtain a label set of each unlabeled sample data and a probability of each label in the label set.
In this embodiment, in order to solve the problem that text classification is inaccurate because the labeled sample data set cannot cover all sample data, the initialized classification model is used to predict the classification of the unlabeled sample data set, giving the probability of each label of each unlabeled sample data. The probability of each label is then taken into account when pre-marking each unlabeled sample data, which improves the accuracy of the labels of the unlabeled sample data and further improves the accuracy of text classification.
A determining module 204, configured to determine a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set.
In this embodiment, the target pseudo label set is obtained by merging the pseudo label sets of each unlabeled sample data in the unlabeled sample data set.
In an optional embodiment, the determining module 204 determining the target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set includes:
acquiring a preset pseudo label threshold value, screening out, from the label set of each unlabeled sample data, first labels whose label probability is greater than or equal to the preset pseudo label threshold value, and determining the first labels as the first pseudo label set of each unlabeled sample data;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarity;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of each unlabeled sample data, and associating each unlabeled sample data with the corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
In this embodiment, the target pseudo label set of the unlabeled sample data set is screened out according to the preset pseudo label threshold and by calculating the similarity between the label of each labeled sample in the labeled sample data set and each label in the label set of each unlabeled sample data. This solves the prior-art problem that, in a multi-label scenario, it cannot be determined which labels are the real labels of each unlabeled sample data; it improves the label prediction accuracy of each unlabeled sample data, ensures the accuracy of the sample data used to subsequently train the classification model, and further improves the text classification accuracy.
Further, the calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data includes:
identifying the label of each labeled sample in the labeled sample data set, and classifying the labeled sample data set according to the identified labels to obtain a sample data set corresponding to each label;
inputting the sample data set corresponding to each label into a pre-trained Bert model to obtain a first label code of each label in the labeled sample data set;
inputting each label in the label set of each unlabeled sample data into the pre-trained Bert model to obtain a second label code of each label in the label set of each unlabeled sample data;
and calculating the similarity between each first label code and each second label code by adopting a preset function to obtain the similarity between each label in the label set of each unlabeled sample data and each label in the labeled sample data set.
In this embodiment, the preset function may be a Sim function, and the similarity of each label in the label set of each unlabeled sample data is calculated through the Sim function. Illustratively: for the unlabeled sample data set [x1, x2, …, xn], the corresponding label sets are [x1(x11, x12, x13), x2(x21, x22), …, xn(xn1, xn2, xn3, xn4, xn5)], where x11, x12 and x13 form the label set of x1.
In this embodiment, each unlabeled sample data may have one label or a plurality of labels, and the one or more labels form the label set of the unlabeled sample data. The code of each label in the labeled sample data set is calculated. For example, if the labeled sample data set contains the three labels sports, finance and entertainment, the sample data set corresponding to the label "sports" is obtained and input into a pre-trained Bert model for coding; the codes of all sample data in that sample data set are added and averaged to obtain the first label code of the label "sports". The label set of each unlabeled sample data in the unlabeled sample data set is input into the pre-trained Bert model to obtain the second label code of each label in the label set of each unlabeled sample data. The similarity is then calculated between the second label code of each label in the label set of each unlabeled sample data and the first label codes corresponding to the labels "sports", "finance" and "entertainment".
Further, the determining the second pseudo label set of each unlabeled sample data according to the calculated similarity includes:
acquiring a preset similarity threshold;
and screening out, from the label set of each unlabeled sample data, the labels whose similarity to a label in the labeled sample data set is greater than the preset similarity threshold, and determining the screened labels as the second pseudo label set of each unlabeled sample data.
Specifically, the preset similarity threshold may be set by using the following formula:
t = r · max_{1≤i≤s} f(x, l_i) + (1 - r) · min_{1≤i≤s} f(x, l_i),
where r is a weight with a value between 0 and 1; l_i is the first label code of the i-th label in the labeled sample data set; x is the second label code of any label of any unlabeled sample data in the unlabeled sample data set; f(x, l_i) is the similarity between the second label code x and the first label code l_i; s is the number of labels in the labeled sample data set; max_{1≤i≤s} f(x, l_i) is the maximum similarity in the similarity set corresponding to the label set of the unlabeled sample data; and min_{1≤i≤s} f(x, l_i) is the minimum similarity in that similarity set.
In this embodiment, the weight r may be preset, for example to 0.4; r determines how much the maximum similarity contributes to the similarity threshold relative to the minimum similarity.
In this embodiment, in order to ensure the accuracy of the label set of each unlabeled sample data, both the maximum value and the minimum value of the similarities are considered when setting the similarity threshold, which reduces the influence of threshold selection on pseudo label selection and ensures that the set similarity threshold is reasonable.
In other optional embodiments, the determining module 204 determining, according to the label set of each unlabeled sample data and the probability of each label in the label set, the target pseudo label set of the unlabeled sample data set includes:
sorting the label set of each unlabeled sample data in descending order according to the probability of each label, selecting a plurality of top-ranked labels from the descending sorting result, and determining them as the first pseudo label set of each unlabeled sample data;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarity;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of each unlabeled sample data, and associating each unlabeled sample data with the corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
In this embodiment, in the process of determining the first pseudo label set of each unlabeled sample data, a plurality of labels with a high label probability may be directly selected as the first pseudo label set without further calculation, which improves the efficiency of determining the first pseudo label set.
The pre-marking module 205 is configured to pre-mark the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set, so as to obtain a sample data set with pseudo labels.
In this embodiment, the sample data set with pseudo labels is obtained by pre-marking each unlabeled sample in the unlabeled sample data set with its corresponding target pseudo labels.
In an optional embodiment, the pre-marking module 205 pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain the sample data set with pseudo labels includes:
identifying the position information of each target pseudo label in the target pseudo label set of the unlabeled sample data set;
and pre-marking the unlabeled sample data set by adopting the preset pre-marking module according to the position information of each target pseudo label to obtain the sample data set with pseudo labels.
In this embodiment, a pre-marking module may be preset, and the preset pre-marking module pre-marks each unlabeled sample data in the unlabeled sample data set according to the position information of each target pseudo label. No manual marking is needed, which avoids the high marking error rate caused by manual marking and improves the efficiency and accuracy of pre-marking.
The second training module 206 is configured to train the preset classification model according to the sample data set with the pseudo tag and the sample data set with the tag, so as to obtain a target classification model.
In this embodiment, the pseudo label of each sample data in the sample data set with pseudo labels is obtained by intersecting a first pseudo label set, determined by the preset pseudo label threshold, with a second pseudo label set, determined by calculating the similarity between the label of each labeled sample in the labeled sample data set and each label in the label set of each unlabeled sample data. This ensures the accuracy of the pseudo label of each unlabeled sample data in the unlabeled sample data set. Training the target classification model on the sample data set with pseudo labels together with the labeled sample data set solves the problem that an initial classification model learned from a small sample data set has low classification accuracy, ensures the accuracy of the target classification model, and further improves the accuracy of subsequent text classification.
And the input module 207 is used for responding to the received text to be classified, inputting the text to be classified into the target classification model, and obtaining a classification result.
In this embodiment, the text to be classified refers to a text that needs to be classified, and the text to be classified may include a text with a tag or a text without a tag. For example, for spam classification, some spam mails contain tags, some spam mails do not contain tags, and the spam mails not containing tags need to be marked, so as to obtain a classification result.
In other optional embodiments, using the target classification model for text classification solves the problem that fast and accurate text classification cannot be achieved when a project contains only a small amount of labeled data.
In summary, in the text classification device according to this embodiment, on one hand, the initialized classification model is used to classify and predict the unlabeled sample data set, so as to predict the probability of each label of each unlabeled sample data, and then the probability of each label is considered in the process of pre-labeling each unlabeled sample data, so that the accuracy of each label of each unlabeled sample data is improved, and further, the accuracy of text classification is improved. And determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set, and pre-marking the unlabeled sample data set according to the target pseudo label set to obtain a sample data set with pseudo labels, so that the label prediction accuracy of each unlabeled sample data is improved, the accuracy of sample data used by a subsequent training classification model is ensured, and the text classification accuracy is improved. And training the preset classification model according to the sample data set with the pseudo label and the sample data set with the label to obtain a target classification model, solving the problem of low classification accuracy of the initial classification model obtained by learning a small sample data set, ensuring the accuracy of the target classification model, and further improving the accuracy of the subsequent text classification.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 3 does not constitute a limitation of the embodiment of the present invention, and may be a bus-type configuration or a star-type configuration, and the electronic device 3 may include more or less other hardware or software than those shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 3 is only an example, and other existing or future electronic products, such as those that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
In some embodiments, the memory 31 is used for storing program codes and various data, such as the text classification apparatus 20 installed in the electronic device 3, and realizes high-speed and automatic access to programs or data during the operation of the electronic device 3. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, a magnetic disk memory, a tape memory, or any other computer-readable medium capable of carrying or storing data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects various components of the electronic device 3 by using various interfaces and lines, and executes various functions and processes data of the electronic device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute operating means of the electronic device 3 and various installed applications (such as the text classification device 20), program codes, and the like, for example, the above-mentioned modules.
The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the various modules illustrated in fig. 2 are program code stored in the memory 31 and executed by the at least one processor 32 to implement the functions of the various modules for text classification purposes.
Illustratively, the program code may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 32 to accomplish the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing certain functions, which are used for describing the execution process of the program code in the electronic device 3. For example, the program code may be partitioned into an acquisition module 201, a first training module 202, a prediction module 203, a determination module 204, a pre-marking module 205, a second training module 206, and an input module 207.
In one embodiment of the present invention, the memory 31 stores a plurality of computer-readable instructions, and the at least one processor 32 executes those instructions to implement text classification.
For the specific steps performed by the at least one processor 32, refer to the description of the relevant steps in the embodiment corresponding to fig. 1; the details are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one kind of logical functional division, and other divisions may be used in practice.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and that the present invention may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the present invention may also be implemented by one unit or means through software or hardware. The terms first, second, and the like are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims (10)

1. A method of text classification, the method comprising:
acquiring a labeled sample data set and an unlabeled sample data set from a preset text library;
training a preset classification model on the labeled sample data set to obtain an initialized classification model;
predicting the unlabeled sample data set with the initialized classification model to obtain a label set of each unlabeled sample data and the probability of each label in the label set;
determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set;
pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain a sample data set with pseudo labels;
training the preset classification model on the sample data set with pseudo labels and the labeled sample data set to obtain a target classification model;
and in response to receiving a text to be classified, inputting the text to be classified into the target classification model to obtain a classification result.
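To make the claimed training loop easier to follow, here is a minimal Python sketch of the method of claim 1. Every name in it (the model factory, predict_label_probs, pseudo_label_selector) is hypothetical; the claim fixes neither a model family nor a library.

```python
from typing import Dict, List, Tuple

def train_target_classifier(
    labeled: List[Tuple[str, str]],   # (text, label) pairs from the text library
    unlabeled: List[str],             # raw unlabeled texts
    model_factory,                    # returns a fresh "preset classification model"
    pseudo_label_selector,            # implements claims 2-6; returns a label set
):
    # Train the preset classification model on the labeled sample data
    # set to obtain the initialized classification model.
    init_model = model_factory()
    init_model.fit(labeled)

    # Predict a label set and a per-label probability for each unlabeled sample.
    label_probs: List[Dict[str, float]] = [
        init_model.predict_label_probs(text) for text in unlabeled
    ]

    # Determine the target pseudo label set (claims 2-6) and pre-mark the
    # unlabeled data with it.
    pseudo_labeled = [
        (text, label)
        for text, probs in zip(unlabeled, label_probs)
        for label in pseudo_label_selector(probs, labeled)
    ]

    # Retrain the preset model on pseudo-labeled plus labeled data to
    # obtain the target classification model.
    target_model = model_factory()
    target_model.fit(labeled + pseudo_labeled)
    return target_model

# Classifying a received text then amounts to
# target_model.predict(text_to_classify)  -- hypothetical method name.
```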
2. The text classification method according to claim 1, wherein said determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set comprises:
determining a first pseudo label set of each unlabeled sample data according to the label set of that unlabeled sample data and the probability of each label in the label set;
calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data, and determining a second pseudo label set of each unlabeled sample data according to the calculated similarities;
calculating the intersection of the first pseudo label set and the corresponding second pseudo label set of each unlabeled sample data to obtain a third pseudo label set of that unlabeled sample data, and associating each unlabeled sample data with its corresponding third pseudo label set;
and determining the plurality of third pseudo label sets corresponding to the unlabeled sample data set as the target pseudo label set of the unlabeled sample data set.
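Read as code, the combining step of claim 2 reduces to a set intersection per unlabeled sample. A minimal sketch, with hypothetical names and example values:

```python
def third_pseudo_label_set(first_set: set, second_set: set) -> set:
    # Claim 2: the third pseudo label set of one unlabeled sample is the
    # intersection of its probability-based first set (claims 3-4) and
    # its similarity-based second set (claims 5-6).
    return first_set & second_set

# Illustrative values only:
# third_pseudo_label_set({"loan", "finance"}, {"loan", "legal"}) -> {"loan"}
```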
3. The text classification method according to claim 2, wherein said determining a first pseudo label set of each unlabeled sample data according to the label set of that unlabeled sample data and the probability of each label in the label set comprises:
acquiring a preset pseudo label threshold, screening out, from the label set of each unlabeled sample data, the first labels whose probability is greater than or equal to the preset pseudo label threshold, and determining the first labels as the first pseudo label set of that unlabeled sample data.
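A minimal sketch of claim 3's threshold-based screening. The default value 0.9 and the function name are illustrative assumptions; the claim only requires "a preset pseudo label threshold".

```python
def first_pseudo_labels_by_threshold(label_probs: dict, threshold: float = 0.9) -> set:
    # Keep every label whose predicted probability is greater than or
    # equal to the preset pseudo label threshold (claim 3).
    return {label for label, prob in label_probs.items() if prob >= threshold}
```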
4. The text classification method according to claim 2, wherein said determining a first pseudo label set of each unlabeled sample data according to the label set of that unlabeled sample data and the probability of each label in the label set comprises:
sorting the label set of each unlabeled sample data in descending order of the probability of each label, selecting a plurality of top-ranked labels from the descending sorting result, and determining them as the first pseudo label set of that unlabeled sample data.
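Claim 4's alternative, rank-based selection might look as follows; k = 3 is an illustrative choice, since the claim only says "a plurality of top-ranked labels".

```python
def first_pseudo_labels_by_rank(label_probs: dict, k: int = 3) -> set:
    # Sort the label set in descending order of probability (claim 4) ...
    ranked = sorted(label_probs.items(), key=lambda item: item[1], reverse=True)
    # ... and keep the k top-ranked labels as the first pseudo label set.
    return {label for label, _ in ranked[:k]}
```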
5. The text classification method according to claim 2, wherein said calculating the similarity between each label in the labeled sample data set and each label in the label set of each unlabeled sample data comprises:
identifying the label of each labeled sample in the labeled sample data set, and classifying the labeled sample data set according to the identified labels to obtain a sample data set corresponding to each label;
inputting the sample data set corresponding to each label into a pre-trained BERT model to obtain a first label code of each label in the labeled sample data set;
inputting each label in the label set of each unlabeled sample data into the pre-trained BERT model to obtain a second label code of each label in the label set of that unlabeled sample data;
and calculating the similarity between each first label code and each second label code with a preset function, to obtain the similarity between each label in the label set of the unlabeled sample data and each label in the labeled sample data set.
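One way to realize claim 5, sketched with the Hugging Face transformers library standing in for "a pre-trained BERT model" and cosine similarity standing in for the unspecified "preset function". The model name and the mean-pooling strategy are assumptions, not part of the claim.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "bert-base-chinese" is an illustrative checkpoint choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModel.from_pretrained("bert-base-chinese")

def encode_label(label_text: str) -> torch.Tensor:
    # Feed a label (or the sample text grouped under it) through BERT
    # and mean-pool the last hidden states into one label code.
    inputs = tokenizer(label_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

def label_similarity(first_code: torch.Tensor, second_code: torch.Tensor) -> float:
    # One plausible "preset function": cosine similarity of label codes.
    return torch.nn.functional.cosine_similarity(
        first_code.unsqueeze(0), second_code.unsqueeze(0)
    ).item()
```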
6. The text classification method according to claim 2, wherein said determining a second pseudo label set of each unlabeled sample data according to the calculated similarities comprises:
acquiring a preset similarity threshold;
and screening out, from the label set of each unlabeled sample data, the labels whose similarity to a label in the labeled sample data set is greater than the preset similarity threshold, and determining them as the second pseudo label set of that unlabeled sample data.
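Claim 6 is then a second screening pass, this time over similarity scores. A minimal sketch; the 0.8 threshold and the shape of candidate_sims (each candidate label mapped to its best similarity against the labeled set) are illustrative assumptions.

```python
def second_pseudo_labels(candidate_sims: dict, sim_threshold: float = 0.8) -> set:
    # Keep each label whose similarity to some label in the labeled
    # sample data set exceeds the preset similarity threshold (claim 6).
    return {label for label, sim in candidate_sims.items() if sim > sim_threshold}
```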
7. The text classification method according to claim 1, wherein said pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain a sample data set with pseudo labels comprises:
identifying position information of each target pseudo label in the target pseudo label set of the unlabeled sample data set;
and pre-marking the unlabeled sample data set with a preset pre-marking module according to the position information of each target pseudo label, to obtain the sample data set with pseudo labels.
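The claim defines neither the "position information" nor the pre-marking module, so the sketch below is only one guess at its shape: position is modeled as each pseudo label's index within its (sorted) set, and the module as a plain function.

```python
def pre_mark(unlabeled_texts: list, target_pseudo_sets: list) -> list:
    # Hypothetical pre-marking module for claim 7: attach each sample's
    # target pseudo labels together with their recorded positions.
    marked = []
    for text, labels in zip(unlabeled_texts, target_pseudo_sets):
        ordered = sorted(labels)
        positions = {label: pos for pos, label in enumerate(ordered)}
        marked.append({"text": text, "pseudo_labels": ordered, "positions": positions})
    return marked
```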
8. A text classification apparatus, the apparatus comprising:
an acquisition module for acquiring a labeled sample data set and an unlabeled sample data set from a preset text library;
a first training module for training a preset classification model on the labeled sample data set to obtain an initialized classification model;
a prediction module for predicting the unlabeled sample data set with the initialized classification model to obtain a label set of each unlabeled sample data and the probability of each label in the label set;
a determining module for determining a target pseudo label set of the unlabeled sample data set according to the label set of each unlabeled sample data and the probability of each label in the label set;
a pre-marking module for pre-marking the unlabeled sample data set according to the target pseudo label set of the unlabeled sample data set to obtain a sample data set with pseudo labels;
a second training module for training the preset classification model on the sample data set with pseudo labels and the labeled sample data set to obtain a target classification model;
and an input module for, in response to receiving a text to be classified, inputting the text to be classified into the target classification model to obtain a classification result.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, wherein the processor implements the text classification method according to any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the text classification method according to any one of claims 1 to 7.
CN202210233373.8A 2022-03-10 2022-03-10 Text classification method and device, electronic equipment and storage medium Pending CN114580409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210233373.8A CN114580409A (en) 2022-03-10 2022-03-10 Text classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210233373.8A CN114580409A (en) 2022-03-10 2022-03-10 Text classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114580409A true CN114580409A (en) 2022-06-03

Family

ID=81773411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210233373.8A Pending CN114580409A (en) 2022-03-10 2022-03-10 Text classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114580409A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination