CN115858776A - Variant text classification recognition method, system, storage medium and electronic equipment

Info

Publication number
CN115858776A
CN115858776A (application number CN202211348321.1A)
Authority
CN
China
Prior art keywords
data set
corpus
variant
text
unsupervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211348321.1A
Other languages
Chinese (zh)
Other versions
CN115858776B (en)
Inventor
刘苏楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Original Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shumei Tianxia Beijing Technology Co ltd, Beijing Nextdata Times Technology Co ltd filed Critical Shumei Tianxia Beijing Technology Co ltd
Priority to CN202211348321.1A priority Critical patent/CN115858776B/en
Publication of CN115858776A publication Critical patent/CN115858776A/en
Application granted granted Critical
Publication of CN115858776B publication Critical patent/CN115858776B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention relates to a variant text classification recognition method, system, storage medium and electronic equipment, comprising the following steps: constructing a variant error correction text data set according to a supervised corpus data set and an unsupervised corpus data set; training a first original neural network model based on a first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition; and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing the variant error correction and text classification of the text to be recognized. The method constructs the variant error correction data set from the supervised and unsupervised corpus data sets, trains the variant error correction task on that data set, and trains the model with the variant error correction task as an auxiliary task alongside the classification task; this regularizes the model's understanding of variant semantics and thereby improves the recognition accuracy of the classification model.

Description

Variant text classification recognition method, system, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of text classification, and in particular to a variant text classification recognition method, system, storage medium and electronic equipment.
Background
A classification model obtained through neural network training can be used to identify and intercept prohibited content. To evade network governance, prohibited text content often contains a large number of variants, such as near-sound or near-shape substitutions, which poses a significant challenge to Internet content governance. To address the challenge posed by these variants, a common solution is to add corresponding variant samples to the data set used to train the classification model. However, while this scheme improves the model's recall on variant samples, it also reduces the accuracy of the classification model.
Therefore, it is desirable to provide a technical solution to the problems in the prior art.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a variant text classification recognition method, system, storage medium and electronic equipment.
The technical scheme of the variant text classification recognition method of the invention is as follows:
acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification identification;
and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
The variant text classification recognition method of the invention has the following beneficial effects:
The method constructs the variant error correction data set from the supervised and unsupervised corpus data sets, trains the variant error correction task on that data set, and trains the model with the variant error correction task as an auxiliary task alongside the classification task; this regularizes the model's understanding of variant semantics and thereby improves the recognition accuracy of the classification model.
On the basis of the above scheme, the variant text classification recognition method of the invention can be further improved as follows.
Further, the method further includes:
and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Further, the step of constructing a variant error correction text dataset from the supervised corpus dataset and the unsupervised corpus dataset comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
training by utilizing the supervised corpus black sample set to generate a supervised language model, and training by utilizing the unsupervised corpus black sample set to generate an unsupervised language model;
extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
and manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
The beneficial effect of adopting the above further technical scheme is that the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual labeling.
Further, the step of training to generate the supervised language model using the supervised corpus black sample set and to generate the unsupervised language model using the unsupervised corpus black sample set includes:
and training the supervised corpus black sample set by adopting a Masked LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
The technical scheme of the variant text classification recognition system of the invention is as follows:
the method comprises the following steps: the system comprises a construction module, a training module and an identification module;
the building module is used for: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module is configured to: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification identification;
the identification module is configured to: and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
The variant text classification recognition system of the invention has the following beneficial effects:
the system of the invention constructs the variant error correction dataset through the supervised and unsupervised corpus dataset, performs variant error correction task training through the variant error correction dataset, takes the variant error correction task as an auxiliary task to train the model together with the classification task, can play a regular role in understanding the variant semantics of the model, and further improves the identification accuracy of the classification model.
On the basis of the above scheme, the variant text classification recognition system of the invention can be further improved as follows.
Further, the system further includes: a processing module;
the processing module is used for: and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Further, the building module is specifically configured to:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
training by utilizing the supervised corpus black sample set to generate a supervised language model, and training by utilizing the unsupervised corpus black sample set to generate an unsupervised language model;
extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
and manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
The beneficial effect of adopting the above further technical scheme is that the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual labeling.
Further, the building module is specifically configured to:
and training the supervised corpus black sample set by adopting a mask LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
The technical scheme of the storage medium of the invention is as follows:
the storage medium has stored therein instructions which, when read by a computer, cause the computer to perform the steps of a variant text classification recognition method according to the invention.
The technical scheme of the electronic equipment is as follows:
comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, causes the computer to carry out the steps of the variant text classification recognition method of the present invention.
Drawings
Fig. 1 is a schematic flow chart of a variant text classification and identification method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a variant text classification recognition system according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, a method for classifying and recognizing a variant text according to an embodiment of the present invention includes the following steps:
s1, a first text data set, a supervised corpus data set and an unsupervised corpus data set are obtained, and a variant error correction text data set is constructed according to the supervised corpus data set and the unsupervised corpus data set.
Wherein, (1) the first text data set is a data set containing a plurality of texts that can be used to train a text classification model; each piece of data in the first text data set is labeled with a classification type, such as contraband or normal. (2) The supervised corpus data set includes a plurality of supervised corpus texts, obtained from text content sent by supervised (moderated) user groups, which contain a large number of variant texts. (3) The unsupervised corpus data set includes a plurality of unsupervised corpus texts, obtained from text content sent by unsupervised user groups, which contain essentially no variant texts. (4) The variant error correction text data set is used to train the variant error correction task and includes a plurality of variant data pairs. For example, one variant data pair is: "you are sand" (variant text) and "you are fool" (ontology text).
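To make these data layouts concrete, the following is a minimal illustrative sketch in Python; the field names and label strings are assumptions based on the examples above, not definitions from the patent:

```python
# Illustrative only: field names and label strings are assumed for this sketch.
first_text_dataset = [
    {"text": "you are fool", "label": "contraband"},  # labeled classification sample
    {"text": "nice weather today", "label": "normal"},
]

# Variant error correction pairs: (variant text, ontology text).
variant_correction_dataset = [
    ("you are sand", "you are fool"),
]
```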
S2, training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification recognition.
Wherein, (1) the first original neural network model is a neural network model that can be used simultaneously for text variant error correction and text classification recognition; the two functions share the model backbone and differ only in the output layer of the model. (2) The target text classification model is the model for text variant error correction and text classification recognition obtained after training.
It should be noted that, in the training process of the first original neural network model, the first text data set is used to train the text classification task of the first original neural network model, and the variant error correction text data set is used to train the variant error correction task of the first original neural network model.
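The patent does not fix a concrete architecture, so the following PyTorch sketch is only one plausible realization of the shared-backbone, two-output-layer design described above: a Transformer encoder trunk, a sentence-level classification head, and a token-level error correction head trained jointly, with the correction loss as the auxiliary objective. All hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class SharedBackboneModel(nn.Module):
    """Sketch: one shared backbone, two output layers (classification + correction)."""
    def __init__(self, vocab_size=21128, hidden=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)  # shared trunk
        self.cls_head = nn.Linear(hidden, num_classes)              # text classification output layer
        self.corr_head = nn.Linear(hidden, vocab_size)              # variant error correction output layer

    def forward(self, token_ids):
        h = self.backbone(self.embed(token_ids))       # (batch, seq, hidden)
        return self.cls_head(h[:, 0]), self.corr_head(h)

# Joint training step: classification loss plus auxiliary error correction loss.
model = SharedBackboneModel()
ce = nn.CrossEntropyLoss()
tokens = torch.randint(0, 21128, (8, 32))              # stand-in batch of token ids
labels = torch.randint(0, 2, (8,))                     # classification labels
targets = torch.randint(0, 21128, (8, 32))             # ontology tokens for correction
logits_cls, logits_corr = model(tokens)
loss = ce(logits_cls, labels) + ce(logits_corr.transpose(1, 2), targets)
loss.backward()
```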
S3, inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
Wherein, (1) the text to be recognized is any selected text, which may be a variant text or an ontology text. (2) The target recognition result includes the variant error correction result and the text classification result. For example, if the text to be recognized is "you are sand", its target recognition result includes the variant error correction result "you are fool" and the text classification result "violation".
It should be noted that the text classification result is judged according to the preset threshold of the target text classification model. For example, assuming the preset threshold for the prohibited probability defaults to 0.7: when the text to be recognized is input into the target text classification model and the resulting prohibited probability is 0.8, the text classification result is judged to be "violation"; when the prohibited probability is 0.3, the text classification result is judged to be "normal". In this embodiment, the preset threshold may be set according to requirements and is not limited here.
Preferably, the method further comprises the following steps:
and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Wherein, (1) the second original neural network model is a neural network model that can be used for text classification. (2) The original text classification model is the model for text classification obtained after training; the specific training process is not described in detail here.
Preferably, the step of constructing a variant error corrected text data set from the supervised corpus data set and the unsupervised corpus data set comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set.
By default, the original text classification model judges whether a text is prohibited with a preset threshold of 0.7: texts with a prohibited probability greater than or equal to 0.7 go to the black sample set, and texts below 0.7 go to the white sample set. The supervised corpus data set and the unsupervised corpus data set are each classified by the original text classification model in this way, yielding the supervised corpus black sample set, the supervised corpus white sample set, the unsupervised corpus black sample set and the unsupervised corpus white sample set, respectively.
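A minimal sketch of this split follows; the `classifier` callable returning a prohibited probability is an assumption, and the 0.7 default follows this embodiment:

```python
def split_black_white(texts, classifier, threshold=0.7):
    """Split a corpus into black/white sample sets with the original text
    classification model; classifier(text) is assumed to return the
    prohibited probability in [0, 1]."""
    black, white = [], []
    for text in texts:
        (black if classifier(text) >= threshold else white).append(text)
    return black, white

# Applied to both corpora:
# sup_black, sup_white = split_black_white(supervised_corpus, classifier)
# unsup_black, unsup_white = split_black_white(unsupervised_corpus, classifier)
```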
And training by using the supervised corpus black sample set to generate a supervised language model, and training by using the unsupervised corpus black sample set to generate an unsupervised language model.
Specifically, a Masked LM mode is adopted to train the supervised corpus black sample set to obtain the supervised language model, and the unsupervised corpus black sample set is trained to obtain the unsupervised language model.
Wherein, (1) the process of training a corpus sample set in the Masked LM manner to obtain the corresponding language model is prior art. (2) The supervised and unsupervised language models both serve to predict missing characters from context. For example, given the input "the weather today is really _." with one character masked, the model predicts the blank and outputs the missing character.
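For illustration, this kind of Masked-LM completion can be exercised with the Hugging Face `fill-mask` pipeline; `bert-base-chinese` is only a stand-in checkpoint here, whereas the patent trains its own supervised and unsupervised language models on the black sample sets:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-chinese")
# Predict the masked character from context, as the language models do here.
for cand in fill("今天天气真[MASK]。")[:3]:
    print(cand["token_str"], round(cand["score"], 3))
```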
And extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model.
Wherein, (1) the black sample template is a text template containing a prohibited phrase with part of it masked. For example, when the black sample is "you are fool", the keyword "fool" is extracted by the keyword extraction technique; characters within the keyword are then randomly deleted (masked) to obtain a black sample template such as "you are _". (2) The first variant mapping data set includes a plurality of low-precision variant pairs. For example, the same black sample template is completed (predicted) by both the supervised and the unsupervised language model: the supervised language model completes the template "you are _" to "you are sand", and the unsupervised language model completes it to "you are fool", which yields one variant pair.
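The construction just described can be sketched as follows; `extract_keywords`, `sup_lm` and `unsup_lm` are assumed callables (a keyword extractor and the two trained language models used as single-slot fill-in predictors), so this is an illustration of the procedure rather than the patent's implementation:

```python
import random

def build_variant_pairs(black_samples, extract_keywords, sup_lm, unsup_lm):
    """Build the low-precision first variant mapping data set."""
    pairs = []
    for sample in black_samples:
        for kw in extract_keywords(sample):
            i = random.randrange(len(kw))  # mask one character inside the keyword
            template = sample.replace(kw, kw[:i] + "_" + kw[i + 1:], 1)
            variant = sup_lm(template)     # supervised LM tends to restore the variant form
            ontology = unsup_lm(template)  # unsupervised LM tends to restore the ontology form
            if variant != ontology:        # identical completions carry no mapping
                pairs.append((variant, ontology))
    return pairs  # to be corrected by manual labeling afterwards
```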
And manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
Wherein, (1) the target variant mapping data set includes a plurality of manually labeled variant pairs. (2) Since white samples generally contain no variants, in the variant mapping pairs constructed from the supervised corpus white sample set and the unsupervised corpus white sample set, both the ontology and the variant are the corresponding white sample itself.
It should be noted that, since the first variant mapping data set may contain errors (it is generated automatically by models, so there may be keyword extraction errors, prediction errors by the supervised and unsupervised language models, ontology-variant mismatches, and the like), the first variant mapping data set needs to be corrected by manual labeling to obtain the high-precision target variant mapping data set.
According to the above technical scheme, the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual labeling. The variant error correction task is trained on this data set and used as an auxiliary task alongside the classification task, which regularizes the model's understanding of variant semantics and thereby improves the recognition accuracy of the classification model.
As shown in fig. 2, a variant text classification recognition system 200 according to an embodiment of the present invention includes: a construction module 210, a training module 220, and a recognition module 230;
the building module 210 is configured to: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module 220 is configured to: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification identification;
the identification module 230 is configured to: and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
Preferably, the system further includes: a processing module;
the processing module is used for: and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
Preferably, the building module 210 is specifically configured to:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
training by utilizing the supervised corpus black sample set to generate a supervised language model, and training by utilizing the unsupervised corpus black sample set to generate an unsupervised language model;
extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
and manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
Preferably, the building module 210 is specifically configured to:
and training the supervised corpus black sample set by adopting a mask LM mode to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
According to the above technical scheme, the variant error correction data set is constructed automatically by building the supervised language model and the unsupervised language model, which improves the production efficiency of the variant error correction data set compared with fully manual labeling. The variant error correction task is trained on this data set and used as an auxiliary task alongside the classification task, which regularizes the model's understanding of variant semantics and thereby improves the recognition accuracy of the classification model.
For the steps by which each parameter and each module of the variant text classification recognition system 200 of this embodiment realizes its corresponding function, reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not described here again.
An embodiment of the present invention provides a storage medium storing instructions which, when read by a computer, cause the computer to execute the steps of the variant text classification recognition method; for details, reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not described here again.
Computer storage media include, for example, flash drives, portable hard disks, and the like.
An electronic device provided in an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the computer executes the steps of the variant text classification recognition method, for which reference may be made to the parameters and steps in the above embodiment of the variant text classification recognition method, which are not described here again.
Those skilled in the art will appreciate that the present invention may be embodied as methods, systems, storage media and electronic devices.
Thus, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining hardware and software, which may be referred to herein generally as a "circuit," "module," or "system." Furthermore, in some embodiments, the invention may also take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein. Any combination of one or more computer-readable media may be employed. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A variant text classification recognition method is characterized by comprising the following steps:
acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification identification;
and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
2. The variant text classification recognition method of claim 1, further comprising:
and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
3. The variant text classification recognition method according to claim 2, wherein the step of constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set comprises:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
training by utilizing the supervised corpus black sample set to generate a supervised language model, and training by utilizing the unsupervised corpus black sample set to generate an unsupervised language model;
extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
and manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
4. The method according to claim 3, wherein the steps of training to generate the supervised language model using the supervised corpus black sample set and training to generate the unsupervised language model using the unsupervised corpus black sample set comprise:
training the supervised corpus black sample set in the Masked LM manner to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
5. A variant text classification recognition system, comprising: the system comprises a construction module, a training module and an identification module;
the building module is used for: acquiring a first text data set, a supervised corpus data set and an unsupervised corpus data set, and constructing a variant error correction text data set according to the supervised corpus data set and the unsupervised corpus data set;
the training module is configured to: training a first original neural network model based on the first text data set and the variant error correction text data set to obtain a target text classification model for text variant error correction and text classification identification;
the identification module is configured to: and inputting the text to be recognized into the target text classification model to obtain a target recognition result containing variant error correction and text classification of the text to be recognized.
6. The variant text classification recognition system of claim 5, further comprising: a processing module;
the processing module is used for: and training a second original neural network model for text classification based on the first text data set to obtain an original text classification model.
7. The system according to claim 6, wherein the construction module is specifically configured to:
classifying the supervised corpus data set by using the original text classification model to obtain a supervised corpus black sample set and a supervised corpus white sample set, and classifying the unsupervised corpus data set by using the original text classification model to obtain an unsupervised corpus black sample set and an unsupervised corpus white sample set;
training by utilizing the supervised corpus black sample set to generate a supervised language model, and training by utilizing the unsupervised corpus black sample set to generate an unsupervised language model;
extracting a black sample template from the unsupervised corpus black sample set based on a keyword extraction technology, and obtaining a first variant mapping data set according to the black sample template, the supervised language model and the unsupervised language model;
and manually labeling the first variant mapping data set to obtain a target variant mapping data set, and obtaining the variant error correction text data set according to the target variant mapping data set, the supervised corpus white sample set and the unsupervised corpus white sample set.
8. The system according to claim 7, wherein the construction module is specifically configured to:
training the supervised corpus black sample set in the Masked LM manner to obtain the supervised language model, and training the unsupervised corpus black sample set to obtain the unsupervised language model.
9. A storage medium having stored thereon instructions which, when read by a computer, cause the computer to carry out a variant text classification recognition method as claimed in any one of claims 1 to 4.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform a variant text classification recognition method according to any one of claims 1 to 4.
CN202211348321.1A 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment Active CN115858776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211348321.1A CN115858776B (en) 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115858776A 2023-03-28
CN115858776B CN115858776B (en) 2023-06-23

Family

ID=85662162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211348321.1A Active CN115858776B (en) 2022-10-31 2022-10-31 Variant text classification recognition method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115858776B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766475A (en) * 2018-12-13 2019-05-17 北京爱奇艺科技有限公司 A kind of recognition methods of rubbish text and device
CN112287100A (en) * 2019-07-12 2021-01-29 阿里巴巴集团控股有限公司 Text recognition method, spelling error correction method and voice recognition method
CN113297833A (en) * 2020-02-21 2021-08-24 华为技术有限公司 Text error correction method and device, terminal equipment and computer storage medium
CN113642317A (en) * 2021-08-12 2021-11-12 广域铭岛数字科技有限公司 Text error correction method and system based on voice recognition result
CN114564942A (en) * 2021-09-06 2022-05-31 北京数美时代科技有限公司 Text error correction method, storage medium and device for supervision field
CN114203158A (en) * 2021-12-14 2022-03-18 苏州驰声信息科技有限公司 Child Chinese spoken language evaluation and error detection and correction method and device
CN114861636A (en) * 2022-05-10 2022-08-05 网易(杭州)网络有限公司 Training method and device of text error correction model and text error correction method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501867A (en) * 2023-03-29 2023-07-28 北京数美时代科技有限公司 Variant knowledge mastery detection method, system and storage medium based on mutual information
CN116501867B (en) * 2023-03-29 2023-09-12 北京数美时代科技有限公司 Variant knowledge mastery detection method, system and storage medium based on mutual information

Also Published As

Publication number Publication date
CN115858776B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US10192545B2 (en) Language modeling based on spoken and unspeakable corpuses
CN107992596B (en) Text clustering method, text clustering device, server and storage medium
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
JP5901001B1 (en) Method and device for acoustic language model training
JP6909832B2 (en) Methods, devices, equipment and media for recognizing important words in audio
US10755048B2 (en) Artificial intelligence based method and apparatus for segmenting sentence
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
WO2020108063A1 (en) Feature word determining method, apparatus, and server
EP3748548A1 (en) Adversarial learning-based text annotation method and device
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
EP3620994A1 (en) Methods, apparatuses, devices, and computer-readable storage media for determining category of entity
CN111125317A (en) Model training, classification, system, device and medium for conversational text classification
CN107861948B (en) Label extraction method, device, equipment and medium
US9811517B2 (en) Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
CN113177412A (en) Named entity identification method and system based on bert, electronic equipment and storage medium
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN112507706A (en) Training method and device of knowledge pre-training model and electronic equipment
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
CN111177375A (en) Electronic document classification method and device
CN110991175A (en) Text generation method, system, device and storage medium under multiple modes
CN113743101A (en) Text error correction method and device, electronic equipment and computer storage medium
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN116561298A (en) Title generation method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant