CN115858776B - Variant text classification recognition method, system, storage medium and electronic equipment - Google Patents

Variant text classification recognition method, system, storage medium and electronic equipment

Info

Publication number
CN115858776B
Authority
CN
China
Prior art keywords
text
variant
corpus
data set
supervised
Prior art date
Legal status
Active
Application number
CN202211348321.1A
Other languages
Chinese (zh)
Other versions
CN115858776A (en)
Inventor
刘苏楠
Current Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Original Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shumei Tianxia Beijing Technology Co ltd, Beijing Nextdata Times Technology Co ltd
Priority to CN202211348321.1A
Publication of CN115858776A
Application granted
Publication of CN115858776B

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention relates to a variant text classification recognition method, system, storage medium and electronic device, comprising the following steps: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set from the supervised corpus data set and the unsupervised corpus data set; training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition; and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing the variant error correction and text classification of the text to be recognized. By constructing the variant error correction data set from the supervised and unsupervised corpus data sets, training the variant error correction task on that data set, and training the model on this auxiliary task together with the classification task, the invention regularizes the model's semantic understanding of variants and thereby improves the recognition accuracy of the classification model.

Description

Variant text classification recognition method, system, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of text classification, and in particular to a variant text classification recognition method, system, storage medium and electronic device.
Background
A neural network can be trained to obtain a classification model that identifies and intercepts forbidden content. To evade network supervision, harmful text content often contains large numbers of variants, either near-sound (homophone) or near-shape (visually similar) substitutions, which poses a great challenge to internet content supervision. A common solution to this challenge is to add corresponding variant samples to the data set on which the classification model is trained. However, while this scheme improves the model's recall on variant samples, it reduces the accuracy of the classification model.
Therefore, there is a need to provide a solution to the problems in the prior art.
Disclosure of Invention
To solve the above technical problem, the invention provides a variant text classification recognition method, system, storage medium and electronic device.
The technical scheme of the variant text classification recognition method is as follows:
acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition;
inputting the text to be identified into the target text classification model to obtain a target identification result containing the variant error correction and text classification of the text to be identified.
The variant text classification recognition method has the following beneficial effects:
according to the method, the variant error correction data set is constructed through the supervised and unsupervised corpus data sets, the variant error correction task is trained through the variant error correction data set, and the variant error correction task is used as an auxiliary task to train the model together with the classification task, so that the regular effect on the variant semantic understanding of the model can be achieved, and the recognition accuracy of the classification model is improved.
Based on the scheme, the variant text classification recognition method can be improved as follows.
Further, the method further comprises the following steps:
training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Further, the step of constructing a variant error correction text dataset from the supervised corpus dataset and the unsupervised corpus dataset, comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set;
based on a keyword extraction technique, extracting a black sample template from the unsupervised corpus black sample set, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
The beneficial effects of adopting this further scheme are as follows: the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual annotation.
Further, the step of generating a supervised language model by training the supervised corpus black sample set and generating an unsupervised language model by training the unsupervised corpus black sample set includes:
training the supervised corpus black sample set by adopting a Masked LM (masked language model) mode to obtain the supervised language model, and training the unsupervised corpus black sample set in the same mode to obtain the unsupervised language model.
The technical scheme of the variant text classification recognition system is as follows:
comprising: a construction module, a training module and an identification module;
the construction module is used for: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module is used for: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition;
the identification module is used for: inputting the text to be identified into the target text classification model to obtain a target identification result containing the variant error correction and text classification of the text to be identified.
The variant text classification recognition system has the following beneficial effects:
according to the system, the variant error correction data set is constructed through the supervised and unsupervised corpus data sets, the variant error correction task is trained through the variant error correction data set, and the variant error correction task is used as an auxiliary task to train the model together with the classification task, so that the regular effect can be played on the variant semantic understanding of the model, and the recognition accuracy of the classification model is improved.
Based on the scheme, the variant text classification recognition system can be improved as follows.
Further, the system further comprises: a processing module;
the processing module is used for: and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Further, the construction module is specifically configured to:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set;
based on a keyword extraction technique, extracting a black sample template from the unsupervised corpus black sample set, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
The beneficial effects of adopting this further scheme are as follows: the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual annotation.
Further, the construction module is specifically configured to:
training the supervised corpus black sample set by adopting a Masked LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set in the same mode to obtain the unsupervised language model.
The technical scheme of the storage medium is as follows:
the storage medium has stored therein instructions which, when read by a computer, cause the computer to perform the steps of a variant text classification recognition method according to the invention.
The technical scheme of the electronic equipment is as follows:
The electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the computer performs the steps of the variant text classification recognition method according to the invention.
Drawings
FIG. 1 is a flow chart of a variant text classification recognition method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a variant text classification recognition system according to an embodiment of the invention.
Detailed Description
As shown in FIG. 1, a variant text classification recognition method according to an embodiment of the present invention includes the following steps:
s1, acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set.
Wherein, (1) the first text data set is a data set comprising a plurality of texts that can be used to train a text classification model; each piece of data in the first text data set is labeled with a classification type, such as: forbidden, normal, etc. (2) The supervised corpus data set includes a plurality of supervised corpus texts, obtained from text content sent by a population subject to supervision; these texts contain many variant texts. (3) The unsupervised corpus data set includes a plurality of unsupervised corpus texts, obtained from text content sent by a population not subject to supervision; these texts essentially contain no variant texts. (4) The variant error correction text data set is used to train the variant error correction task and comprises a plurality of variant data pairs. For example, one variant data pair is: "你是沙子" ("you are sand", the variant text) and "你是傻子" ("you are a fool", the ontology text), where the near-sound character "沙" disguises the ontology character "傻".
S2, training the first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition.
Wherein, (1) the first original neural network model is a neural network model that can be used simultaneously for text variant error correction and text classification recognition; the two tasks share a model backbone and differ only in their output layers. (2) The target text classification model is the model for text variant error correction and text classification recognition obtained after training.
It should be noted that, in the training process of the first original neural network model, the first text data set is used for training the text classification task of the first original neural network model, and the variant error correction text data set is used for training the variant error correction task of the first original neural network model.
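To make the shared-backbone, two-head structure concrete, the following is a minimal PyTorch sketch of one way such a model could be assembled; the BERT-style encoder, the head shapes, the cross-entropy losses and the equal task weighting are illustrative assumptions, not details specified by the patent.

```python
# Hypothetical sketch: one shared encoder backbone with two output layers,
# a sentence-level classification head and a token-level error correction head.
import torch.nn as nn
from transformers import BertModel

class VariantAwareClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-chinese", num_classes=2):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)  # shared backbone
        hidden = self.encoder.config.hidden_size
        vocab = self.encoder.config.vocab_size
        self.cls_head = nn.Linear(hidden, num_classes)  # text classification output layer
        self.corr_head = nn.Linear(hidden, vocab)       # variant error correction output layer

    def forward(self, input_ids, attention_mask, cls_labels=None, corr_labels=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        cls_logits = self.cls_head(states[:, 0])        # prediction from the [CLS] position
        corr_logits = self.corr_head(states)            # prediction at every token position
        loss = None
        if cls_labels is not None or corr_labels is not None:
            ce = nn.CrossEntropyLoss(ignore_index=-100)
            loss = 0.0
            if cls_labels is not None:                  # classification task
                loss = loss + ce(cls_logits, cls_labels)
            if corr_labels is not None:                 # auxiliary error correction task
                loss = loss + ce(corr_logits.view(-1, corr_logits.size(-1)),
                                 corr_labels.view(-1))
        return cls_logits, corr_logits, loss
```

Under this reading, batches from the first text data set supply cls_labels and batches from the variant error correction text data set supply corr_labels (the ontology token ids, with -100 at positions to ignore), so each data set drives its own head while the auxiliary task regularizes the shared encoder.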
S3, inputting the text to be identified into the target text classification model to obtain a target identification result containing the variant error correction and text classification of the text to be identified.
Wherein, (1) the text to be recognized is any chosen text; it may be a variant text or an ontology text. (2) The target recognition result includes the variant error correction result and the text classification result. For example, if the text to be recognized is "你是沙子" ("you are sand"), the corresponding target recognition result includes the variant error correction result "你是傻子" ("you are a fool") and the text classification result "forbidden".
It should be noted that the text classification result is judged against a preset threshold of the target text classification model. For example, assuming the preset threshold, i.e. the lower limit of the forbidden probability, defaults to 0.7: when inputting the text to be recognized into the target text classification model yields a forbidden probability of 0.8, the text classification result is judged to be forbidden; when the forbidden probability is 0.3, the text classification result is judged to be normal. In this embodiment, the preset threshold can be set as required and is not limited here.
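As a small illustration of that thresholding step (the function name and interface are assumptions, with the 0.7 default taken from the example above):

```python
def classify_with_threshold(forbidden_prob: float, threshold: float = 0.7) -> str:
    # A forbidden probability at or above the threshold is judged forbidden.
    return "forbidden" if forbidden_prob >= threshold else "normal"

assert classify_with_threshold(0.8) == "forbidden"
assert classify_with_threshold(0.3) == "normal"
```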
Preferably, the method further comprises:
training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Wherein, (1) the second original neural network model is a neural network model that can be used for text classification. (2) The original text classification model is the text classification model obtained after training; its specific training process is not described in detail here.
Preferably, the step of constructing a variant error correction text dataset from the supervised corpus dataset and the unsupervised corpus dataset comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set.
The original text classification model judges forbidden text against a default threshold of 0.7: texts whose forbidden probability is greater than or equal to 0.7 form the black sample set, and texts below 0.7 form the white sample set. The supervised corpus data set and the unsupervised corpus data set are therefore each classified by the original text classification model, yielding the supervised corpus black sample set, the supervised corpus white sample set, the unsupervised corpus black sample set and the unsupervised corpus white sample set respectively.
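The black/white split could be implemented along the following lines; predict_forbidden_prob stands in for the original text classification model's scoring interface and is an assumed name, not an API from the patent.

```python
# Illustrative sketch: partition a corpus into black and white sample sets by
# the forbidden probability of the original text classification model.
def split_black_white(corpus, predict_forbidden_prob, threshold=0.7):
    black, white = [], []
    for text in corpus:
        (black if predict_forbidden_prob(text) >= threshold else white).append(text)
    return black, white

# Applied once per corpus:
# supervised_black, supervised_white = split_black_white(supervised_corpus, prob_fn)
# unsupervised_black, unsupervised_white = split_black_white(unsupervised_corpus, prob_fn)
```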
Generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set.
Specifically, a Masked LM mode is adopted to train on the supervised corpus black sample set to obtain the supervised language model, and on the unsupervised corpus black sample set to obtain the unsupervised language model.
Wherein, (1) training on a sample set in the Masked LM mode yields the corresponding language model. (2) The function of both the supervised language model and the unsupervised language model is to predict missing text from its context. For example, given the input "今天_气真好" ("the weather is really nice today", with one character masked), the model predicts the masked position and outputs "天".
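One conventional way to realize this Masked LM training is with the Hugging Face masked-language-modeling utilities; in the sketch below, one model is trained per black sample set, and the bert-base-chinese backbone, 15% masking rate and hyperparameters are all illustrative assumptions.

```python
# Sketch: train a masked language model on one black sample set.
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

def train_masked_lm(texts, output_dir):
    tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
    ds = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])
    # Randomly masks 15% of tokens; the model learns to restore them from context.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=32)
    Trainer(model=model, args=args, train_dataset=ds, data_collator=collator).train()
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

# train_masked_lm(supervised_black, "lm-supervised")      # supervised language model
# train_masked_lm(unsupervised_black, "lm-unsupervised")  # unsupervised language model
```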
Based on a keyword extraction technique, extracting a black sample template from the unsupervised corpus black sample set, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model.
Wherein, (1) the black sample template is a text template containing a forbidden phrase. For example, when the black sample is "你是傻子" ("you are a fool"), the keyword "傻子" is extracted by the keyword extraction technique; a character of the keyword is then deleted at random, giving a black sample template such as "你是_子" or "你是傻_". (2) The first variant mapping data set comprises a plurality of low-precision variant pairs. Specifically, the supervised language model and the unsupervised language model both predict (complete) the same black sample template: the supervised language model completes "你是_子" to "你是沙子" ("you are sand"), while the unsupervised language model completes it to "你是傻子" ("you are a fool"), and the two completions form a variant pair.
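A sketch of this template-and-completion step follows; jieba's TF-IDF extractor stands in for "a keyword extraction technique", the transformers fill-mask pipeline performs the completion, and the lm-supervised / lm-unsupervised paths refer to the models saved in the previous sketch — all of these tooling choices are assumptions.

```python
# Sketch: build low-precision variant pairs by masking one character of an
# extracted keyword and letting both language models fill in the blank.
import random
import jieba.analyse
from transformers import pipeline

sup_fill = pipeline("fill-mask", model="lm-supervised")      # tends to produce variants
unsup_fill = pipeline("fill-mask", model="lm-unsupervised")  # tends to restore the ontology

def first_variant_mapping(black_samples, top_keywords=5):
    pairs = []
    mask = sup_fill.tokenizer.mask_token
    for text in black_samples:
        for kw in jieba.analyse.extract_tags(text, topK=top_keywords):
            if kw not in text or len(kw) < 2:
                continue
            i = random.randrange(len(kw))                    # drop one keyword character
            template = text.replace(kw, kw[:i] + mask + kw[i + 1:], 1)
            variant = sup_fill(template)[0]["sequence"]      # e.g. completes to 你是沙子
            ontology = unsup_fill(template)[0]["sequence"]   # e.g. completes to 你是傻子
            if variant != ontology:                          # keep only candidate pairs
                pairs.append((variant, ontology))
    return pairs
```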
Manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
Wherein, (1) the target variant mapping data set comprises a plurality of manually labeled variant pairs. (2) Since white samples generally contain no variants, in the variant mapping pairs constructed from the supervised corpus white sample set and the unsupervised corpus white sample set, the ontology and the variant are both the corresponding white sample itself.
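Assembling the final variant error correction text data set could then be as simple as the following sketch, in which each white sample contributes an identity pair (the helper name is hypothetical):

```python
def build_correction_dataset(target_variant_pairs, supervised_white, unsupervised_white):
    # Manually labeled (variant, ontology) pairs, plus identity pairs in which
    # the ontology and the variant are the white sample itself.
    identity_pairs = [(text, text) for text in supervised_white + unsupervised_white]
    return target_variant_pairs + identity_pairs
```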
It should be noted that, since the first variant mapping data set may contain errors (it is generated automatically by the models, so keyword extraction errors, mispredictions by the supervised or unsupervised language model, and ontology-variant pairs that do not actually correspond may all occur), it needs to be corrected by manual labeling to obtain the high-precision target variant mapping data set.
According to the above technical scheme, the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual annotation; furthermore, the variant error correction task can be trained on this data set and, as an auxiliary task, trained together with the classification task, which regularizes the model's semantic understanding of variants and further improves the recognition accuracy of the classification model.
As shown in FIG. 2, a variant text classification recognition system 200 according to an embodiment of the present invention includes: a construction module 210, a training module 220, and an identification module 230;
the construction module 210 is configured to: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module 220 is configured to: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition;
the identification module 230 is configured to: inputting the text to be identified into the target text classification model to obtain a target identification result containing the variant error correction and text classification of the text to be identified.
Preferably, the system further comprises: a processing module;
the processing module is used for: and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Preferably, the construction module 210 is specifically configured to:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set;
based on a keyword extraction technology, extracting a black sample template from the black sample set of the non-supervised corpus, and obtaining a first variant mapping dataset according to the black sample template, the supervised language model and the non-supervised language model;
manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
Preferably, the construction module 210 is specifically configured to:
training the supervised corpus black sample set by adopting a Masked LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set in the same mode to obtain the unsupervised language model.
According to the above technical scheme, the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual annotation; furthermore, the variant error correction task can be trained on this data set and, as an auxiliary task, trained together with the classification task, which regularizes the model's semantic understanding of variants and further improves the recognition accuracy of the classification model.
For the steps by which the parameters and modules in the variant text classification recognition system 200 of this embodiment implement their corresponding functions, reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not repeated here.
The storage medium provided by the embodiment of the invention stores instructions which, when read by a computer, cause the computer to perform the steps of the variant text classification recognition method; for details, reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not repeated here.
Computer storage media include, for example, flash drives and portable hard disks.
The electronic device provided by the embodiment of the invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the computer performs the steps of the variant text classification recognition method; for details, reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not repeated here.
Those skilled in the art will appreciate that the present invention may be implemented as a method, system, storage medium, and electronic device.
Thus, the invention may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software, referred to herein generally as a "circuit", "module" or "system". Furthermore, in some embodiments, the invention may also be embodied as a computer program product in one or more computer-readable media containing computer-readable program code. Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and changes may be made to the above embodiments by those of ordinary skill in the art within the scope of the invention.

Claims (6)

1. A method for classifying and identifying variant text, comprising:
acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition;
inputting a text to be identified into the target text classification model to obtain a target identification result containing variant error correction and text classification of the text to be identified;
further comprises: training a second original neural network model for text classification based on the first text data set to obtain an original text classification model;
the step of constructing a variant error correction text dataset from the supervised corpus dataset and the unsupervised corpus dataset comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set;
based on a keyword extraction technique, extracting a black sample template from the unsupervised corpus black sample set, and obtaining a first variant mapping dataset according to the black sample template, the supervised language model and the unsupervised language model;
manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set;
wherein the first text data set is: a dataset comprising a plurality of text, each piece of data in the first text dataset being labeled with a classification type; the supervised corpus data set includes: a plurality of pieces of supervised corpus text, each piece of supervised corpus text being obtained from text content sent by a supervisory population and including a plurality of variant text; the unsupervised corpus data set includes: a plurality of pieces of non-supervised corpus texts, wherein the non-supervised corpus texts are obtained from text contents which are sent by non-supervised people and do not contain variant texts; the variant error correction text dataset comprises: a plurality of variant data pairs; the black sample template is: a text template containing forbidden phrases; the first variant mapping dataset comprises: a plurality of low precision variant pairs; the target variant mapping dataset comprises: a plurality of artificially labeled variant pairs.
2. The method of claim 1, wherein the step of generating a supervised language model using the supervised corpus black sample set and generating an unsupervised language model using the unsupervised corpus black sample set comprises:
training the supervised corpus black sample set by adopting a Masked LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
3. A variant text classification recognition system, comprising: the system comprises a construction module, a training module and an identification module;
the construction module is used for: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module is used for: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition;
the identification module is used for: inputting a text to be identified into the target text classification model to obtain a target identification result containing variant error correction and text classification of the text to be identified;
further comprises: a processing module;
the processing module is used for: training a second original neural network model for text classification based on the first text data set to obtain an original text classification model;
the construction module is specifically used for:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
generating a supervised language model by using the supervised corpus black sample set, and generating an unsupervised language model by using the unsupervised corpus black sample set;
based on a keyword extraction technique, extracting a black sample template from the unsupervised corpus black sample set, and obtaining a first variant mapping dataset according to the black sample template, the supervised language model and the unsupervised language model;
manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set;
wherein the first text data set is: a dataset comprising a plurality of text, each piece of data in the first text dataset being labeled with a classification type; the supervised corpus data set includes: a plurality of pieces of supervised corpus text, each piece of supervised corpus text being obtained from text content sent by a supervisory population and including a plurality of variant text; the unsupervised corpus data set includes: a plurality of pieces of non-supervised corpus texts, wherein the non-supervised corpus texts are obtained from text contents which are sent by non-supervised people and do not contain variant texts; the variant error correction text dataset comprises: a plurality of variant data pairs; the black sample template is: a text template containing forbidden phrases; the first variant mapping dataset comprises: a plurality of low precision variant pairs; the target variant mapping dataset comprises: a plurality of artificially labeled variant pairs.
4. A variant text classification recognition system according to claim 3, wherein the construction module is specifically configured to:
training the supervised corpus black sample set by adopting a Masked LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
5. A storage medium having instructions stored therein which, when read by a computer, cause the computer to perform a variant text classification recognition method according to claim 1 or 2.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform a variant text classification recognition method as claimed in claim 1 or 2.
CN202211348321.1A 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment Active CN115858776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211348321.1A CN115858776B (en) 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211348321.1A CN115858776B (en) 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115858776A CN115858776A (en) 2023-03-28
CN115858776B (en) 2023-06-23

Family

ID=85662162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211348321.1A Active CN115858776B (en) 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115858776B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501867B (en) * 2023-03-29 2023-09-12 北京数美时代科技有限公司 Variant knowledge mastery detection method, system and storage medium based on mutual information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766475A (en) * 2018-12-13 2019-05-17 北京爱奇艺科技有限公司 A kind of recognition methods of rubbish text and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287100A (en) * 2019-07-12 2021-01-29 阿里巴巴集团控股有限公司 Text recognition method, spelling error correction method and voice recognition method
CN113297833A (en) * 2020-02-21 2021-08-24 华为技术有限公司 Text error correction method and device, terminal equipment and computer storage medium
CN113642317A (en) * 2021-08-12 2021-11-12 广域铭岛数字科技有限公司 Text error correction method and system based on voice recognition result
CN114564942B (en) * 2021-09-06 2023-07-18 北京数美时代科技有限公司 Text error correction method, storage medium and device for supervision field
CN114203158A (en) * 2021-12-14 2022-03-18 苏州驰声信息科技有限公司 Child Chinese spoken language evaluation and error detection and correction method and device
CN114861636A (en) * 2022-05-10 2022-08-05 网易(杭州)网络有限公司 Training method and device of text error correction model and text error correction method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766475A (en) * 2018-12-13 2019-05-17 北京爱奇艺科技有限公司 A kind of recognition methods of rubbish text and device

Also Published As

Publication number Publication date
CN115858776A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
US20210150142A1 (en) Method and apparatus for determining feature words and server
US20170270912A1 (en) Language modeling based on spoken and unspeakable corpuses
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
US9626622B2 (en) Training a question/answer system using answer keys based on forum content
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
CN106528694B (en) semantic judgment processing method and device based on artificial intelligence
CN112580346B (en) Event extraction method and device, computer equipment and storage medium
CN115328756A (en) Test case generation method, device and equipment
CN112347760A (en) Method and device for training intention recognition model and method and device for recognizing intention
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN114003682A (en) Text classification method, device, equipment and storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN111508497B (en) Speech recognition method, device, electronic equipment and storage medium
CN110738056B (en) Method and device for generating information
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN111492364B (en) Data labeling method and device and storage medium
CN115455416A (en) Malicious code detection method and device, electronic equipment and storage medium
CN111785259A (en) Information processing method and device and electronic equipment
CN117077678B (en) Sensitive word recognition method, device, equipment and medium
CN114519357B (en) Natural language processing method and system based on machine learning
CN114648984B (en) Audio sentence-breaking method and device, computer equipment and storage medium
CN113722496B (en) Triple extraction method and device, readable storage medium and electronic equipment
CN117573956B (en) Metadata management method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant