CN113157927A - Text classification method and device, electronic equipment and readable storage medium - Google Patents
- Publication number
- CN113157927A (application number CN202110581189.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- sequence
- classified
- word
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the field of semantic analysis and discloses a text classification method, which comprises the following steps: marking each text in a text set with a category label to obtain a target label set of the text set; performing text splicing processing on the text set and the target label set to obtain a sample sequence set; performing iterative training based on neural feature fusion extraction on a pre-constructed text classification model by using the sample sequence set until the text classification model converges, to obtain a trained text classification model; and when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result. The invention also relates to blockchain technology; the text set can be stored in blockchain nodes. The invention further provides a text classification device, an electronic device and a storage medium. The invention can improve the accuracy of text classification.
Description
Technical Field
The invention relates to the field of semantic parsing, in particular to a text classification method and device, electronic equipment and a readable storage medium.
Background
With the development of artificial intelligence, natural language processing has become an important component of the field, and text classification, as a basic technology of natural language processing, has received increasing attention.
However, current text classification is performed by means of a model that considers only simple feature fusion between words during classification, so feature extraction is incomplete and the accuracy of text classification is low.
Disclosure of Invention
The invention provides a text classification method, a text classification device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of text classification.
In order to achieve the above object, the present invention provides a text classification method, including:
performing intention recognition on each text in a text set, and performing category label marking on each text in the text set according to the result of the intention recognition to obtain a target label set of the text set;
performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text;
performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
performing model training based on neural feature fusion extraction on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model;
when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
Optionally, the performing text splicing processing on all the tags in the target tag set and the text sequences to obtain a sample sequence set includes:
randomly combining all the tags in the target tag set to obtain a tag sequence;
splicing each text sequence with the label sequence by using preset characters to obtain a sample sequence;
and summarizing all sample sequences to obtain the sample sequence set.
Optionally, the performing word segmentation on each text in the text set, and performing sequence combination according to a result of the word segmentation to obtain a text sequence of each text includes:
segmenting each text in the text set by using a preset segmentation dictionary to obtain a corresponding initial text word set;
deleting stop words from the initial text word set to obtain a text word set;
and combining each word in the text word set according to the sequence in the corresponding text to obtain the text sequence of each text.
Optionally, the performing, by using the sample sequence set, model training based on neural feature fusion extraction on the pre-constructed text classification model to obtain a trained text classification model includes:
step A: converting words in each sample sequence into vectors by using a coding layer of the text classification model, and combining all the vectors obtained by conversion according to the sequence of corresponding words in the sample sequence to obtain a sample matrix;
step B: carrying out neural feature fusion extraction on the sample matrix by utilizing a feature extraction layer of the text classification model to obtain a fusion feature matrix;
step C: performing weight calculation on the fusion feature matrix by using an attention mechanism layer of the text classification model to obtain a target matrix;
step D: calculating a classification prediction probability value corresponding to the target matrix by using a preset activation function;
step E: determining a sample classification true value according to the class label of the text corresponding to the sample matrix, and calculating a loss value between the classification prediction probability value and the sample classification true value by using a preset loss function;
step F: and when the loss value is greater than or equal to a preset loss threshold value, updating the model parameters of the text classification model, returning to the step A for iterative training, and stopping training until the loss value is less than the preset loss threshold value to obtain the trained text classification model.
Optionally, the performing neural feature fusion extraction on the sample matrix by using the feature extraction layer of the text classification model to obtain a fusion feature matrix includes:
obtaining a target column by traversing and selecting columns of the sample matrix;
carrying out neural feature fusion extraction on the target column to obtain a feature word vector;
and transversely combining all the feature word vectors according to the sequence of the corresponding target columns in the sample matrix to obtain the fusion feature matrix.
Optionally, the performing neural feature fusion extraction on the target column to obtain a feature word vector includes:
carrying out tensor multiplication between the target column and each column of the sample matrix to obtain a first word vector matrix;
stacking all the first word vector matrices according to the sequence of the corresponding columns in the sample matrix to obtain a three-dimensional word vector matrix;
longitudinally splitting the three-dimensional word vector matrix by columns to obtain a plurality of second word vector matrices;
and selecting the maximum value in each second word vector matrix and combining these maxima to obtain the feature word vector.
Optionally, the performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result, includes:
performing word segmentation processing on the text to be classified to obtain a word segmentation word set;
combining the word segmentation word sets according to the sequence of each word in the text to be classified to obtain a text sequence to be classified;
splicing the text sequence to be classified with the label sequence by using the preset characters to obtain the final text sequence to be classified;
and classifying the text sequence to be classified by using the trained text classification model to obtain the classification result.
In order to solve the above problem, the present invention also provides a text classification apparatus, including:
the data processing module is used for carrying out category label marking on each text in the text set to obtain a target label set of the text set; performing text splicing processing on the text set and the target label set to obtain a sample sequence set;
the model training module is used for carrying out iterative training based on neural feature fusion extraction on the pre-constructed text classification model by utilizing the sample sequence set until the text classification model is converged to obtain a trained text classification model;
and the text classification module is used for performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified when receiving the text to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
and a processor executing the computer program stored in the memory to implement the text classification method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium, in which at least one computer program is stored, the at least one computer program being executed by a processor in an electronic device to implement the text classification method described above.
The method comprises the steps of: marking each text in a text set with a category label to obtain a target label set of the text set; performing text splicing processing on the text set and the target label set to obtain a sample sequence set; performing iterative training based on neural feature fusion extraction on a pre-constructed text classification model by using the sample sequence set until the text classification model converges, to obtain a trained text classification model; and when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result. Because the feature extraction layer contained in the text classification model can perform neural feature fusion extraction on the text to be classified, feature extraction is more comprehensive; the trained text classification model therefore has a stronger feature extraction capability, and its text classification accuracy is higher. Hence, the text classification method and device, the electronic device and the computer-readable storage medium provided by the embodiments of the invention improve the accuracy of text classification.
Drawings
Fig. 1 is a schematic flowchart of a text classification method according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a text classification apparatus according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device implementing a text classification method according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a text classification method. The execution subject of the text classification method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the text classification method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Referring to fig. 1, which is a schematic flow chart of a text classification method according to an embodiment of the present invention, in an embodiment of the present invention, the text classification method includes:
s1, performing intention recognition on each text in the text set, and performing category label marking on each text in the text set according to the result of the intention recognition to obtain a target label set of the text set;
in the embodiment of the present invention, the text set is composed of a plurality of user dialog texts in a certain scene, optionally, the text set may be obtained from a customer service database of a certain company,
in another embodiment of the present invention, the text set may be stored in a block chain node, and the high throughput of the block chain to the data is utilized to improve the access efficiency of the text set.
Further, in order to better train subsequent models, in the embodiment of the present invention each text in the text set is tagged with a category label, where the category label is the text intention. For example: the text set is a text set of a travel scene, and a text A contained in the text set is "does the hotel have a room available tomorrow"; the intention corresponding to text A is hotel booking, so text A is marked with a hotel booking category label.
Optionally, in the embodiment of the present invention, a pre-constructed intention recognition model is used to perform intention recognition on each text in the text set to obtain an intention recognition result, and the corresponding text is then tagged with a category label according to the intention recognition result. For example: the text set is a text set of a travel scene, and a text A contained in the text set is "does the hotel have a room available tomorrow"; the intention of text A is recognized as hotel booking, so text A is marked with a hotel booking category label.
Further, different texts may correspond to the same label, and in order to avoid counting repeated labels, in the embodiment of the present invention each text in the labelled text set and its corresponding category label are summarized to obtain an initial label set. For example: the text set comprises a text A, a text B and a text C; text A is marked with a class-A label, text B with a class-B label, and text C with a class-A label, so the initial label set comprises two class-A labels and one class-B label. The initial label set is de-duplicated, the repeated class-A label is removed, and a target label set is obtained, which comprises one class-A label and one class-B label.
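The summarize-and-de-duplicate step above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the example texts, label names and data format are all assumptions.

```python
# Hypothetical labelled dialog texts (text, category label); illustrative only.
labelled_texts = [
    ("does the hotel have a room available tomorrow", "hotel_booking"),
    ("book me a flight to Beijing", "flight_booking"),
    ("is there a room free tonight", "hotel_booking"),
]

# Summarize every text's label into the initial label set (duplicates kept).
initial_label_set = [label for _, label in labelled_texts]

# De-duplicate while preserving first-seen order (dicts keep insertion order).
target_label_set = list(dict.fromkeys(initial_label_set))
```

Here "hotel_booking" appears twice in the initial label set but only once in the target label set, mirroring the class-A example in the text.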
S2, performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text;
in the embodiment of the invention, in order to construct a model training sample, each text in the text set is subjected to word segmentation processing, and sequence combination is carried out according to the result of the word segmentation processing to obtain the text sequence of each text.
In detail, in the embodiment of the present invention, word segmentation processing is performed on each text in the text set to obtain a text word set for each text, and the words in the text word set are combined according to their order in the corresponding text to obtain the corresponding text sequence. For example: text A is "I am Chinese", the text word set corresponding to text A comprises the three words "I", "am" and "Chinese", and these words are combined according to the order of each word in text A to obtain the text sequence [I, am, Chinese]. Optionally, in the embodiment of the present invention, a preset word segmentation dictionary is used to segment each text in the text set to obtain an initial text word set; further, stop words are deleted from the initial text word set to obtain the text word set. Here, a stop word is a word with little semantic content, such as an auxiliary word, adverb, preposition or conjunction.
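The segmentation and stop-word deletion of S2 can be sketched as follows. Whitespace splitting stands in for the patent's dictionary-based segmenter, and the stop-word list is an illustrative assumption.

```python
# Illustrative stop-word list; the patent does not fix its contents.
STOP_WORDS = {"am", "is", "the", "a"}

def to_text_sequence(text):
    # Word segmentation (whitespace split stands in for dictionary lookup).
    initial_words = text.split()
    # Delete stop words; remaining words keep their order in the text.
    return [w for w in initial_words if w not in STOP_WORDS]

sequence = to_text_sequence("I am Chinese")
```

For the example text A, this yields the sequence with the stop word removed while the original word order is preserved.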
S3, performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
Specifically, in the embodiment of the present invention, all tags in the target tag set are randomly combined to obtain a tag sequence. Further, in order to distinguish text from tags, each text sequence is spliced with the tag sequence by using preset characters to obtain a sample sequence. For example: the text sequence is [A] and the label sequence is [B]; the text sequence and the label sequence are spliced with a special character SEP, and the sample sequence is represented as [A, SEP, B]. All sample sequences are then summarized to obtain the sample sequence set.
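The [A, SEP, B] splicing can be sketched as below. The separator token, label names and function name are assumptions for illustration; the patent only requires some preset character between text and labels.

```python
import random

SEP = "SEP"  # preset separator character, as in the [A, SEP, B] example

def build_sample_sequence(text_sequence, label_sequence):
    # Splice the text sequence, the separator, and the label sequence.
    return text_sequence + [SEP] + label_sequence

# Randomly combine all tags of the target tag set into a label sequence.
labels = ["hotel_booking", "flight_booking"]
random.shuffle(labels)

sample = build_sample_sequence(["I", "Chinese"], labels)
```

Each sample sequence produced this way carries both the text's words and the full label inventory, separated by SEP.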
S4, performing model training based on neural feature fusion extraction on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model;
in the embodiment of the invention, in order to perform better model classification on the text to be classified subsequently, the sample sequence set is used for performing model training based on neural feature fusion extraction on the pre-constructed text classification model to obtain the trained text classification model.
Specifically, the text classification model has the capability of neural feature fusion extraction, so the dimensions covered by feature extraction are comprehensive and the trained text classification model classifies text more accurately.
Optionally, in an embodiment of the present invention, the text classification model includes an encoding layer, a feature extraction layer and an attention mechanism layer. Because the text classification model contains a feature extraction layer, neural feature fusion extraction can be performed on the text to be classified, feature extraction is more comprehensive, and the text classification accuracy of the model is higher.
Optionally, the coding layer is an Embedding layer.
In detail, in the embodiment of the present invention, the iterative training based on neural feature fusion extraction is performed on the pre-constructed text classification model by using the sample sequence set until the text classification model converges, so as to obtain a trained text classification model, including:
step A: converting words in each sample sequence into vectors by using a coding layer of the text classification model, and combining all the vectors obtained by conversion according to the sequence of corresponding words in the sample sequence to obtain a sample matrix;
step B: carrying out neural feature fusion extraction on the sample matrix by utilizing a feature extraction layer of the text classification model to obtain a fusion feature matrix;
step C: performing weight calculation on the fusion feature matrix by using an attention mechanism layer of the text classification model to obtain a target matrix;
step D: calculating a classification prediction probability value corresponding to the target matrix by using a preset activation function;
optionally, the activation function is the ReLU function;
step E: determining a sample classification true value according to the class label of the text corresponding to the sample matrix, and calculating a loss value between the classification prediction probability value and the sample classification true value by using a preset loss function;
For example: the target label set comprises a class-A label and a class-B label; if the class label of the text corresponding to the sample matrix is class A, the corresponding true value is 1 for class A and 0 for class B.
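Step E's true-value construction can be sketched as a one-hot mapping over the target label set; the function name is illustrative.

```python
def one_hot_true_value(target_labels, sample_label):
    # 1 for the sample's own class label, 0 for every other label in the set.
    return [1 if lab == sample_label else 0 for lab in target_labels]

truth = one_hot_true_value(["A", "B"], "A")
```

For the class-A example above this produces the true value 1 for class A and 0 for class B, which the loss function then compares against the classification prediction probability value.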
Optionally, in an embodiment of the present invention, the loss function is a cross entropy loss function.
Step F: and when the loss value is greater than or equal to a preset loss threshold value, updating the model parameters of the text classification model, returning to the step A for iterative training, and stopping training until the loss value is less than the preset loss threshold value to obtain the trained text classification model.
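The step A to F loop can be condensed into the following sketch. The model object and its `forward`/`update` methods are placeholders standing in for the coding, feature-extraction and attention layers described above; nothing here is the patent's actual implementation.

```python
def train(samples, true_values, model, loss_fn, threshold, max_iters=100):
    """Iterate until every sample's loss falls below the preset threshold."""
    for _ in range(max_iters):
        worst = 0.0
        for seq, truth in zip(samples, true_values):
            pred = model.forward(seq)        # steps A-D: forward pass
            loss = loss_fn(pred, truth)      # step E: loss value
            worst = max(worst, loss)
            if loss >= threshold:
                model.update(seq, truth)     # step F: update model parameters
        if worst < threshold:                # all losses below threshold: stop
            break
    return model
```

The loop mirrors step F: parameters are updated while the loss value is greater than or equal to the threshold, and training stops once it drops below.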
In detail, in the embodiment of the present invention, performing feature fusion extraction on the sample matrix by using the feature extraction layer to obtain a fusion feature matrix includes: traversing and selecting the columns of the sample matrix to obtain target columns, performing neural feature fusion extraction on each target column to obtain feature word vectors, and transversely combining all the feature word vectors according to the order of the corresponding target columns in the sample matrix to obtain the fusion feature matrix. For example: the sample matrix has three columns, and the feature word vectors corresponding to its columns are B, A and C, where B corresponds to the first column, A to the second column and C to the third column; then B is taken as the first column, A as the second column and C as the third column, and they are transversely combined to obtain the fusion feature matrix [B A C].
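The transverse combination can be sketched as follows; pure-Python lists of lists stand in for real matrices, and the function name is an assumption.

```python
def fuse(feature_word_vectors):
    """Place feature word vector k as column k of the fusion feature matrix."""
    n = len(feature_word_vectors[0])  # length of each feature word vector
    return [[vec[i] for vec in feature_word_vectors] for i in range(n)]

# Three length-2 feature word vectors become the three columns of a 2x3 matrix.
fused = fuse([[1, 2], [3, 4], [5, 6]])
```

As in the [B A C] example, the k-th feature word vector ends up as the k-th column of the fusion feature matrix.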
Further, in the embodiment of the present invention, performing neural feature fusion extraction on the target column to obtain a feature word vector includes: carrying out tensor multiplication between the target column and each column of the sample matrix to obtain the corresponding first word vector matrices; stacking all the first word vector matrices according to the order of the corresponding columns in the sample matrix to obtain a three-dimensional word vector matrix; longitudinally splitting the three-dimensional word vector matrix by columns to obtain a plurality of second word vector matrices; and selecting the maximum value in each second word vector matrix and combining these maxima to obtain the feature word vector. For example: the target column is an n*1 column vector and the sample matrix has m columns. Tensor multiplication of the target column with each column of the sample matrix yields a corresponding n*n first word vector matrix, giving m first word vector matrices of size n*n in total. The m first word vector matrices are stacked according to the order of the corresponding columns in the sample matrix to obtain an n*n*m three-dimensional word vector matrix; for instance, the first word vector matrix that results from multiplying the target column with the first column of the sample matrix is stacked on the first layer. The n*n*m three-dimensional word vector matrix is then longitudinally split by columns, that is, the same column of every layer (the first column of each layer, then the second column, and so on) is selected to form an n*m matrix, which yields n second word vector matrices of size n*m. Finally, the maximum value of each n*m second word vector matrix is selected, and these maxima are longitudinally combined according to the order of the columns of the corresponding three-dimensional word vector matrix to obtain the n*1 feature word vector.
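The worked example above can be sketched numerically. Pure-Python lists stand in for tensors, the function name is an assumption, and tensor multiplication of two column vectors is realized as their outer product, consistent with the n*1 by n*1 yielding n*n description.

```python
def feature_word_vector(target_col, sample_cols):
    n, m = len(target_col), len(sample_cols)
    # Step 1: m first word vector matrices (n x n outer products), one per
    # column of the sample matrix, in column order.
    firsts = [[[target_col[i] * col[j] for j in range(n)] for i in range(n)]
              for col in sample_cols]
    # Steps 2-3: stacking the m layers and then splitting longitudinally by
    # column j yields n second word vector matrices, each n x m.
    seconds = [[[firsts[k][i][j] for k in range(m)] for i in range(n)]
               for j in range(n)]
    # Step 4: the maximum of each second word vector matrix becomes one entry
    # of the n x 1 feature word vector, in column order.
    return [max(max(row) for row in mat) for mat in seconds]

vec = feature_word_vector([1, 2], [[1, 0], [0, 3]])
```

With target column [1, 2] and a 2-column sample matrix, the two outer products are [[1, 0], [2, 0]] and [[0, 3], [0, 6]], and the column-wise maxima give a length-2 feature word vector.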
By utilizing the feature extraction layer to perform neural feature fusion extraction on the sample matrix, the accuracy of feature extraction is improved, and thus the classification accuracy of the trained text classification model is improved.
And S5, when receiving the text to be classified, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
In detail, in the embodiment of the present invention, the text to be classified refers to text whose category needs to be determined.
Further, in the embodiment of the present invention, in order to better classify the text to be classified by using the text classification model, the text to be classified needs to be preprocessed.
In detail, in the embodiment of the present invention, preprocessing the text to be classified includes: performing word segmentation on the text to be classified, and combining all words obtained from the segmentation according to the order of each word in the text to be classified to obtain a text sequence to be classified. For example: the text to be classified is "I am Chinese"; the three words "I", "am" and "Chinese" are obtained after segmentation and are combined according to the order of each word in the text to be classified to obtain the text sequence to be classified [I, am, Chinese]. The text sequence to be classified is then spliced with the label sequence by using the preset characters to obtain the final text sequence to be classified. The technical means used for word segmentation and splicing are the same as those described above and are not repeated here.
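The inference-side preprocessing of S5 can be sketched as below, assuming a toy stop-word list and separator token; the trained model itself is out of scope here, so the sketch stops at the spliced sequence.

```python
SEP = "SEP"                        # preset separator character (illustrative)
STOP_WORDS = {"am", "is", "the", "a"}  # illustrative stop-word list

def preprocess(text, label_sequence):
    # Word segmentation with stop-word removal, preserving word order.
    words = [w for w in text.split() if w not in STOP_WORDS]
    # Splice with the label sequence using the preset separator.
    return words + [SEP] + label_sequence

seq = preprocess("I am Chinese", ["hotel_booking", "flight_booking"])
# The trained text classification model would then map `seq` to a class label.
```

The text to be classified thus passes through the same segmentation and label-splicing pipeline as the training samples before being fed to the model.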
Further, the embodiment of the present invention classifies the text sequence to be classified by using the trained text classification model, so as to obtain the classification result.
Fig. 2 is a functional block diagram of the text classification apparatus according to the present invention.
The text classification apparatus 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the text classification apparatus may include a data processing module 101, a model training module 102, and a text classification module 103. A module, which may also be referred to as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device and can be executed by a processor of the electronic device to perform a fixed function.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data processing module 101 is configured to perform category label tagging on each text in a text set to obtain a target label set of the text set; performing text splicing processing on the text set and the target label set to obtain a sample sequence set;
in the embodiment of the present invention, the text set is composed of a plurality of user dialog texts in a certain scene, and optionally, the text set may be obtained from a customer service database of a certain company.
In another embodiment of the present invention, the text set may be stored in a blockchain node, and the high data throughput of the blockchain is utilized to improve the access efficiency of the text set.
Further, in order to better train the subsequent model, in an embodiment of the present invention, the data processing module 101 performs category label tagging on each text in the text set, where the category label is the text intention. For example: the text set is a text set of a travel scene, a text A contained in the text set is 'does the hotel have a room for tomorrow', the intention corresponding to text A is hotel booking, and text A is therefore tagged with a hotel-booking category label.
Further, different texts may correspond to the same tag. In this embodiment of the present invention, the data processing module 101 collects the category tag of each text in the tagged text set to obtain an initial tag set. Since different texts may correspond to the same tag, repeated tags may exist in the initial tag set, so the data processing module 101 performs deduplication on the initial tag set to obtain the target tag set. For example: the text set comprises a text A, a text B and a text C; text A is tagged with a type A label, text B with a type B label, and text C with a type A label, so the initial tag set contains two type A labels and one type B label. Deduplicating the initial tag set removes the repeated type A label, and the resulting target tag set contains one type A label and one type B label.
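The collect-then-deduplicate step can be sketched as follows; the pair-based input format and the function name are illustrative assumptions:

```python
def build_target_label_set(labeled_texts):
    """Collect each text's category label, then deduplicate (illustrative sketch).

    labeled_texts: list of (text, category_label) pairs.
    """
    seen, target_label_set = set(), []
    for _text, label in labeled_texts:
        if label not in seen:               # different texts may share a label
            seen.add(label)
            target_label_set.append(label)  # keep first-seen order
    return target_label_set
```

Matching the example above, build_target_label_set([('text A', 'A'), ('text B', 'B'), ('text C', 'A')]) returns ['A', 'B'].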
In the embodiment of the present invention, in order to construct a model training sample, the data processing module 101 performs word segmentation on each text in the text set, and performs sequence combination according to a result of the word segmentation to obtain a text sequence of each text.
In detail, in the embodiment of the present invention, the data processing module 101 performs word segmentation processing on each text in the text set to obtain a text word set of each text, and combines the words of the text word set according to the sequence of each word in the corresponding text to obtain the corresponding text sequence. For example: the text A is 'I am Chinese', the text word set corresponding to text A comprises the three words 'I', 'am' and 'Chinese', and combining 'I', 'am' and 'Chinese' according to the sequence of each word in text A yields the text sequence [I, am, Chinese]. Optionally, in the embodiment of the present invention, a preset word segmentation dictionary is used to perform word segmentation on each text in the text set to obtain an initial text word set, and stop words are then deleted from the initial text word set to obtain the text word set. A stop word is a word without substantive meaning, such as an auxiliary word, adverb, preposition, or conjunction.
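A minimal sketch of this segmentation step, assuming a whitespace splitter stands in for the preset word segmentation dictionary and using an invented stop-word list:

```python
STOP_WORDS = {"a", "the", "of"}         # hypothetical stop-word list

def build_text_sequence(text, segment=str.split):
    """Segment a text, delete stop words, and keep the original word order (sketch).

    `segment` is only a placeholder for a dictionary-based word segmenter.
    """
    initial_text_words = segment(text)  # initial text word set
    # Deleting stop words preserves the sequence of the remaining words.
    return [w for w in initial_text_words if w not in STOP_WORDS]
```

build_text_sequence('I am a Chinese person') gives ['I', 'am', 'Chinese', 'person'] once the stop word 'a' is removed.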
Specifically, the data processing module 101 according to the embodiment of the present invention randomly combines all tags in the target tag set to obtain a tag sequence. Further, in order to distinguish between a text and a tag, each text sequence is spliced with the tag sequence by using a preset character to obtain a sample sequence. For example: the text sequence is [A] and the label sequence is [B]; splicing the text sequence and the label sequence with the special character SEP yields the sample sequence [A, SEP, B]. All sample sequences are then summarized to obtain the sample sequence set.
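The splicing described above can be sketched as follows; the SEP token value and the function name are illustrative choices, not fixed by the patent:

```python
import random

SEP = "[SEP]"                                        # preset separator character

def build_sample_sequences(text_sequences, target_label_set, seed=0):
    """Splice every text sequence with one randomly combined label sequence (sketch)."""
    label_sequence = list(target_label_set)
    random.Random(seed).shuffle(label_sequence)      # random combination of all labels
    # [A] + SEP + [B]  ->  sample sequence [A, SEP, B]
    return [seq + [SEP] + label_sequence for seq in text_sequences]
```

With a single text sequence ['A'] and target label set ['B'], the result is [['A', '[SEP]', 'B']], matching the [A, SEP, B] example above.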
The model training module 102 is configured to perform iterative training based on neural feature fusion extraction on a pre-constructed text classification model by using the sample sequence set until the text classification model converges to obtain a trained text classification model;
in order for the model to better classify the text to be classified subsequently, the embodiment of the invention utilizes the sample sequence set to perform iterative training based on neural feature fusion extraction on the pre-constructed text classification model until the text classification model converges, obtaining the trained text classification model.
Specifically, the text classification model has the capability of neural feature fusion extraction, so the dimensions of feature extraction are more comprehensive and the text classification of the trained text classification model is more accurate.
Optionally, in an embodiment of the present invention, the text classification model includes an encoding layer, a feature extraction layer, and an attention mechanism layer. Because the text classification model comprises a feature extraction layer, neural feature fusion extraction can be carried out on the text to be classified, the feature extraction is more comprehensive, and the classification accuracy of the model is higher.
Optionally, the coding layer is Embedding.
In detail, in the embodiment of the present invention, the model training module 102 obtains the trained text classification model by using the following means, including:
step A: converting words in each sample sequence into vectors by using a coding layer of the text classification model, and combining all the vectors obtained by conversion according to the sequence of corresponding words in the sample sequence to obtain a sample matrix;
Step B: carrying out neural feature fusion extraction on the sample matrix by utilizing a feature extraction layer of the text classification model to obtain a fusion feature matrix;
Step C: performing weight calculation on the fusion feature matrix by using an attention mechanism layer of the text classification model to obtain a target matrix;
step D: calculating a classification prediction probability value corresponding to the target matrix by using a preset activation function;
optionally, the activation function is a ReLU function;
step E: determining a sample classification true value according to the class label of the text corresponding to the sample matrix, and calculating a loss value between the classification prediction probability value and the sample classification true value by using a preset loss function;
such as: the target label set comprises a label of type A and a label of type B, the label of type A of the text corresponding to the sample matrix is of type A, and then the corresponding true value is 1 for type A and 0 for type B.
Optionally, in an embodiment of the present invention, the loss function is a cross entropy loss function.
Step F: and when the loss value is greater than or equal to a preset loss threshold value, updating the model parameters of the text classification model, returning to the step A for iterative training, and stopping training until the loss value is less than the preset loss threshold value to obtain the trained text classification model.
In detail, in the embodiment of the present invention, performing feature fusion extraction on the sample matrix by using the feature extraction layer to obtain a fusion feature matrix includes: traversing the columns of the sample matrix to select a target column, performing neural feature fusion extraction on the target column to obtain a feature word vector, and transversely combining all the feature word vectors according to the sequence of the corresponding target columns in the sample matrix to obtain the fusion feature matrix. For example: the sample matrix has three columns, and the feature word vectors corresponding to its first, second and third columns are B, A and C respectively; taking B as the first column, A as the second column and C as the third column and combining them transversely yields the fused feature matrix [B A C].
Further, in this embodiment of the present invention, the performing, by the model training module 102, neural feature fusion extraction on the target column to obtain a feature word vector includes: carrying out tensor multiplication on the target column and each column of the sample matrix to obtain a corresponding first word vector matrix, stacking all the first word vector matrices according to the sequence of the corresponding columns in the sample matrix to obtain a three-dimensional word vector matrix, longitudinally splitting the three-dimensional word vector matrix by columns to obtain a plurality of second word vector matrices, and selecting the maximum value in each second word vector matrix for combination to obtain the feature word vector. For example: the target column is an n × 1 column vector and the sample matrix has m columns, so tensor multiplication of the target column with each column of the sample matrix yields a corresponding n × n first word vector matrix, giving m first word vector matrices of n × n in total. The m first word vector matrices of n × n are stacked according to the sequence of the corresponding columns in the sample matrix to obtain an n × n × m three-dimensional word vector matrix; for instance, if a first word vector matrix is the result of the tensor multiplication of the target column with the first column of the sample matrix, it is stacked as the first layer. The n × n × m three-dimensional word vector matrix is then split longitudinally by columns to obtain n second word vector matrices of n × m; that is, the same column of each layer of the three-dimensional word vector matrix (the first column of each layer, the second column of each layer, and so on) is selected to form an n × m second word vector matrix. Finally, the maximum value in each n × m second word vector matrix is selected, and the maxima are combined longitudinally according to the sequence of the columns of the corresponding three-dimensional word vector matrix to obtain the n × 1 feature word vector.
Performing neural feature fusion extraction on the sample matrix with the feature extraction layer improves the accuracy of feature extraction, and thus the classification accuracy of the trained text classification model.
The text classification module 103 is configured to, when receiving a text to be classified, perform word segmentation and label concatenation on the text to be classified to obtain a text sequence to be classified, and classify the text sequence to be classified by using the trained text classification model to obtain a classification result.
In detail, in the embodiment of the present invention, the text to be classified refers to a text that needs to be classified.
Further, in the embodiment of the present invention, in order to better classify the text to be classified by using the text classification model, the text to be classified needs to be preprocessed.
In detail, in the embodiment of the present invention, the preprocessing of the text to be classified by the text classification module 103 includes: performing word segmentation on the text to be classified, and combining all words obtained after the word segmentation according to the sequence of each word in the text to be classified to obtain a text sequence to be classified. For example: if the text to be classified is 'I am Chinese', the three words 'I', 'am' and 'Chinese' are obtained after word segmentation, and combining them according to the sequence of each word in the text to be classified yields the text sequence to be classified [I, am, Chinese]. The text sequence to be classified is then spliced with the label sequence using the preset characters to obtain the final text sequence to be classified. The technical means used for the word segmentation processing and the splicing are the same as those described above, and are not repeated here.
Further, the embodiment of the present invention classifies the text sequence to be classified by using the trained text classification model, so as to obtain the classification result.
Fig. 3 is a schematic structural diagram of an electronic device implementing the text classification method according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a text classification program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a text classification program, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., text classification programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The text classification program 12 stored in the memory 11 of the electronic device 1 is a combination of computer programs that, when executed in the processor 10, enable:
performing intention recognition on each text in a text set, and performing category label marking on each text in the text set according to the result of the intention recognition to obtain a target label set of the text set;
performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text;
performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
performing model training based on neural feature fusion extraction on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model;
when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
Specifically, the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
Embodiments of the present invention may also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor of an electronic device, the computer program may implement:
performing intention recognition on each text in a text set, and performing category label marking on each text in the text set according to the result of the intention recognition to obtain a target label set of the text set;
performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text;
performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
performing model training based on neural feature fusion extraction on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model;
when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain, which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method; each data block contains information of a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A method of text classification, the method comprising:
performing intention recognition on each text in a text set, and performing category label marking on each text in the text set according to the result of the intention recognition to obtain a target label set of the text set;
performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text;
performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
performing model training based on neural feature fusion extraction on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model;
when a text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
2. The text classification method according to claim 1, wherein the text stitching processing of all the tags in the target tag set and the text sequence to obtain a sample sequence set comprises:
randomly combining all the tags in the target tag set to obtain a tag sequence;
splicing each text sequence with the label sequence by using preset characters to obtain a sample sequence;
and summarizing all sample sequences to obtain the sample sequence set.
3. The method for classifying texts according to claim 2, wherein the step of performing word segmentation on each text in the text set and performing sequence combination according to the result of the word segmentation to obtain the text sequence of each text comprises:
segmenting each text in the text set by using a preset segmentation dictionary to obtain a corresponding initial text word set;
deleting stop words by using the initial text word set to obtain the text word set;
and combining each word in the text word set according to the sequence in the corresponding text to obtain the text sequence of each text.
4. The text classification method according to claim 2, wherein the model training based on neural feature fusion extraction is performed on the pre-constructed text classification model by using the sample sequence set to obtain a trained text classification model, and the method comprises:
step A: converting words in each sample sequence into vectors by using a coding layer of the text classification model, and combining all the vectors obtained by conversion according to the sequence of corresponding words in the sample sequence to obtain a sample matrix;
step B: carrying out neural feature fusion extraction on the sample matrix by utilizing a feature extraction layer of the text classification model to obtain a fusion feature matrix;
step C: performing weight calculation on the fusion feature matrix by using an attention mechanism layer of the text classification model to obtain a target matrix;
step D: calculating a classification prediction probability value corresponding to the target matrix by using a preset activation function;
step E: determining a sample classification true value according to the class label of the text corresponding to the sample matrix, and calculating a loss value between the classification prediction probability value and the sample classification true value by using a preset loss function;
step F: and when the loss value is greater than or equal to a preset loss threshold value, updating the model parameters of the text classification model, returning to the step A for iterative training, and stopping training until the loss value is less than the preset loss threshold value to obtain the trained text classification model.
5. The method for classifying texts according to claim 4, wherein the performing neural feature fusion extraction on the sample matrix by using the feature extraction layer of the text classification model to obtain a fusion feature matrix comprises:
obtaining a target column by traversing and selecting columns of the sample matrix;
carrying out neural feature fusion extraction on the target column to obtain a feature word vector;
and transversely combining all the feature word vectors according to the sequence of the corresponding target columns in the sample matrix to obtain the fusion feature matrix.
6. The method for classifying texts according to claim 5, wherein the performing neural feature fusion extraction on the target columns to obtain feature word vectors comprises:
carrying out tensor multiplication calculation on the target column and each column of the sample matrix to obtain a first word vector matrix;
stacking all the first word vector matrixes according to the sequence of corresponding columns in the sample matrix to obtain a three-dimensional word vector matrix;
longitudinally dividing the three-dimensional word vector matrix according to columns to obtain a plurality of second word vector matrices;
and selecting the maximum value in each second word vector matrix for combination to obtain the feature word vector.
7. The method according to any one of claims 1 to 6, wherein the performing word segmentation and label concatenation on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result comprises:
performing word segmentation processing on the text to be classified to obtain a word segmentation word set;
combining the word segmentation word sets according to the sequence of each word in the text to be classified to obtain a text sequence to be classified;
splicing the text sequence to be classified and the label sequence by using the preset characters to obtain a text sequence to be classified;
and classifying the text sequence to be classified by using the trained text classification model to obtain the classification result.
8. A text classification apparatus, comprising:
the data processing module is used for identifying the intention of each text in the text set and marking the category label of each text in the text set according to the result of the intention identification to obtain a target label set of the text set; performing word segmentation processing on each text in the text set, and performing sequence combination according to the result of the word segmentation processing to obtain a text sequence of each text; performing text splicing processing on all the labels in the target label set and the text sequence to obtain a sample sequence set;
the model training module is used for carrying out model training based on neural feature fusion extraction on the pre-constructed text classification model by utilizing the sample sequence set to obtain a trained text classification model;
and the text classification module is used for, when the text to be classified is received, performing word segmentation and label splicing on the text to be classified to obtain a text sequence to be classified, and classifying the text sequence to be classified by using the trained text classification model to obtain a classification result.
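The three-module apparatus of claim 8 can be sketched as a thin coordinating class. The callables below are placeholders standing in for the data processing, model training, and classification modules; they are assumptions for illustration, not the patented internals.

```python
class TextClassifier:
    """Sketch of the claimed apparatus: a data processing module that builds
    the sample sequence set, a model training module, and a text
    classification module. Module internals are placeholder callables."""

    def __init__(self, data_processing, model_training, text_classification):
        self.data_processing = data_processing          # builds the sample sequence set
        self.model_training = model_training            # trains the classification model
        self.text_classification = text_classification  # classifies incoming text
        self.model = None

    def train(self, text_set):
        samples = self.data_processing(text_set)
        self.model = self.model_training(samples)
        return self.model

    def classify(self, text_to_classify):
        return self.text_classification(self.model, text_to_classify)
```

Separating the modules this way mirrors the claim's structure: the trained model produced by the first two modules is the only state the classification module needs.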
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the text classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the text classification method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110581189.8A CN113157927B (en) | 2021-05-27 | 2021-05-27 | Text classification method, apparatus, electronic device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113157927A true CN113157927A (en) | 2021-07-23 |
CN113157927B CN113157927B (en) | 2023-10-31 |
Family
ID=76877849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110581189.8A Active CN113157927B (en) | 2021-05-27 | 2021-05-27 | Text classification method, apparatus, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157927B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543032A (en) * | 2018-10-26 | 2019-03-29 | 平安科技(深圳)有限公司 | File classification method, device, computer equipment and storage medium |
US20210012199A1 (en) * | 2019-07-04 | 2021-01-14 | Zhejiang University | Address information feature extraction method based on deep neural network model |
CN111061881A (en) * | 2019-12-27 | 2020-04-24 | 浪潮通用软件有限公司 | Text classification method, equipment and storage medium |
CN112015863A (en) * | 2020-08-26 | 2020-12-01 | 华东师范大学 | Multi-feature fusion Chinese text classification method based on graph neural network |
CN112597312A (en) * | 2020-12-28 | 2021-04-02 | 深圳壹账通智能科技有限公司 | Text classification method and device, electronic equipment and readable storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114330357A (en) * | 2021-08-04 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Text processing method and device, computer equipment and storage medium |
CN113688239A (en) * | 2021-08-20 | 2021-11-23 | 平安国际智慧城市科技股份有限公司 | Text classification method and device under few samples, electronic equipment and storage medium |
CN113688239B (en) * | 2021-08-20 | 2024-04-16 | 平安国际智慧城市科技股份有限公司 | Text classification method and device under small sample, electronic equipment and storage medium |
CN113806540A (en) * | 2021-09-18 | 2021-12-17 | 平安银行股份有限公司 | Text labeling method and device, electronic equipment and storage medium |
CN113806540B (en) * | 2021-09-18 | 2023-08-08 | 平安银行股份有限公司 | Text labeling method, text labeling device, electronic equipment and storage medium |
CN113836303A (en) * | 2021-09-26 | 2021-12-24 | 平安科技(深圳)有限公司 | Text type identification method and device, computer equipment and medium |
CN113919344A (en) * | 2021-09-26 | 2022-01-11 | 腾讯科技(深圳)有限公司 | Text processing method and device |
CN113919344B (en) * | 2021-09-26 | 2022-09-23 | 腾讯科技(深圳)有限公司 | Text processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113157927B (en) | Text classification method, apparatus, electronic device and readable storage medium | |
CN112597312A (en) | Text classification method and device, electronic equipment and readable storage medium | |
CN112528616B (en) | Service form generation method and device, electronic equipment and computer storage medium | |
CN112860905A (en) | Text information extraction method, device and equipment and readable storage medium | |
CN112733551A (en) | Text analysis method and device, electronic equipment and readable storage medium | |
CN113268615A (en) | Resource label generation method and device, electronic equipment and storage medium | |
CN113658002B (en) | Transaction result generation method and device based on decision tree, electronic equipment and medium | |
CN115018588A (en) | Product recommendation method and device, electronic equipment and readable storage medium | |
CN113505273B (en) | Data sorting method, device, equipment and medium based on repeated data screening | |
CN114491047A (en) | Multi-label text classification method and device, electronic equipment and storage medium | |
CN113344125A (en) | Long text matching identification method and device, electronic equipment and storage medium | |
CN112801222A (en) | Multi-classification method and device based on two-classification model, electronic equipment and medium | |
CN115409041B (en) | Unstructured data extraction method, device, equipment and storage medium | |
CN113626605B (en) | Information classification method, device, electronic equipment and readable storage medium | |
CN113435308B (en) | Text multi-label classification method, device, equipment and storage medium | |
CN114708073B (en) | Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium | |
CN113591881B (en) | Intention recognition method and device based on model fusion, electronic equipment and medium | |
CN114943306A (en) | Intention classification method, device, equipment and storage medium | |
CN115146064A (en) | Intention recognition model optimization method, device, equipment and storage medium | |
CN113706207A (en) | Order transaction rate analysis method, device, equipment and medium based on semantic analysis | |
CN112434157A (en) | Document multi-label classification method and device, electronic equipment and storage medium | |
CN111680513B (en) | Feature information identification method and device and computer readable storage medium | |
CN113361274B (en) | Intent recognition method and device based on label vector, electronic equipment and medium | |
CN114723488B (en) | Course recommendation method and device, electronic equipment and storage medium | |
CN115546814A (en) | Key contract field extraction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||