CN110223675B - Method and system for screening training text data for voice recognition - Google Patents
Method and system for screening training text data for voice recognition
- Publication number
- CN110223675B CN110223675B CN201910510814.2A CN201910510814A CN110223675B CN 110223675 B CN110223675 B CN 110223675B CN 201910510814 A CN201910510814 A CN 201910510814A CN 110223675 B CN110223675 B CN 110223675B
- Authority
- CN
- China
- Prior art keywords
- screening
- training text
- neural network
- processing
- text data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
An embodiment of the invention provides a method for screening training text data for speech recognition. The method comprises: normalizing the training text data, and preprocessing the normalized training text data before input, the preprocessing comprising: converting the normalized training text data into input information for a data screening model, the input information containing a unique number corresponding to each sentence of the training text; importing the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel; and screening the text sentences whose output from the fusion screening model reaches a preset positive-case probability score threshold as training text data for speech recognition. An embodiment of the invention also provides a system for screening training text data for speech recognition. Because the whole process is automated, a large amount of labor cost is saved and reusability is improved; and because the linguistic relationships within the text content are considered, the screening effect on the training text data is improved.
Description
Technical Field
The invention relates to the field of intelligent speech, and in particular to a method and a system for screening training text data for speech recognition.
Background
When training a speech recognition model, a large amount of high-quality training text data is usually required to achieve a good training effect. In existing schemes, large volumes of training text data are usually obtained either by simple, coarse matching based on character-level rules, or by manual inspection after such rule-based matching.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:
The existing text training data for speech recognition is obtained partly through a small amount of manually inspected labeling, but mostly through simple and coarse tools, without an efficient, automated processing method. Although data obtained by manual labeling is generally of high quality, the volume of speech recognition training text is usually very large, ranging from hundreds of megabytes (MB, on the order of a hundred million characters) up to terabytes (TB), so manual processing entails a huge expenditure of manpower and material resources.
First, the amount of text training data to be screened is usually huge and its distribution is highly diverse. A method based on simple rules may be effective on text data from some fixed range or domain, but when the same scheme is migrated to other data, much of the work has to be redone. The reusability of such schemes is extremely low, and they are hard to generalize.
Secondly, typical automated schemes cannot effectively take the semantic and linguistic relationships within the text content into account, so the effectiveness of the screened data is low; yet judging whether a text is reasonable or fluent depends precisely on those semantic and linguistic relationships.
In addition, typical automated schemes rely on rule-based screening and filtering. Once the number of rules grows, many rules become contradictory or redundant: the same input data may match several rules at once, and choosing which rule actually applies requires manual intervention or pre-set rule weights. Furthermore, in such schemes it is usually impossible to predict in advance which rule or rules apply to a given piece of input data, so every piece of data has to be checked against the rules one by one, which makes it difficult for these schemes to process data quickly and efficiently at large scale and in a distributed manner.
Disclosure of Invention
The embodiments of the invention at least solve the problem in the prior art that text training data is obtained either by coarse matching based on simple character rules or by manual inspection after simple rule matching.
In a first aspect, an embodiment of the present invention provides a method for screening training text data for speech recognition, including:
carrying out standardization processing on training text data, and carrying out preprocessing before inputting on the training text data after the standardization processing, wherein the preprocessing before inputting at least comprises the following steps: converting the training text data after the normalization processing into input information of a data screening model, wherein the input information comprises a unique number corresponding to a sentence of the training text;
and importing the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and screening the text sentences reaching a preset positive case probability score threshold value in the output of the fusion screening model into training text data for voice recognition.
In a second aspect, an embodiment of the present invention provides a system for screening training text data for speech recognition, including:
the pre-processing module is used for carrying out normalization processing on the training text data and carrying out pre-input preprocessing on the training text data after the normalization processing, wherein the pre-input preprocessing at least comprises the following steps: converting the training text data after the normalization processing into input information of a data screening model, wherein the input information comprises a unique number corresponding to a sentence of the training text;
and the training text screening module is used for importing the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and screening the text sentences which reach a preset positive case probability score threshold value in the output of the fusion screening model into training text data for voice recognition.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for screening training text data for speech recognition according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for screening training text data for speech recognition according to any embodiment of the present invention.
The embodiments of the invention have the following beneficial effects: the training text data is normalized and converted into input information for a data screening model, so the whole process is automated, which saves a large amount of labor cost, reduces expenditure for enterprises, and improves reusability. By fusing multiple neural network screening models and screening along several dimensions, the linguistic relationships within the text content are effectively taken into account, and the screening effect on the training text data is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for screening training text data for speech recognition according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a system for screening training text data for speech recognition according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for screening training text data for speech recognition according to an embodiment of the present invention, which includes the following steps:
s11: carrying out standardization processing on training text data, and carrying out preprocessing before inputting on the training text data after the standardization processing, wherein the preprocessing before inputting at least comprises the following steps: converting the training text data after the normalization processing into input information of a data screening model, wherein the input information comprises a unique number corresponding to a sentence of the training text;
s12: and importing the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and screening the text sentences reaching a preset positive case probability score threshold value in the output of the fusion screening model into training text data for voice recognition.
This embodiment adopts a classification method based on deep neural networks over deep semantic representations of the text. Because deep learning is used to represent the semantics of the text, effective semantic information can be obtained with a very simple and efficient scheme, making the screening more accurate. A deep-learning-based scheme only needs to consider different input and output information and can share the same model structure across different fields or scenarios, so the model is highly extensible. Both model training and the formal application of the model can be deployed on a distributed computer cluster, so screening efficiency can be multiplied.
For step S11, the training text data is normalized. As one embodiment, the normalization at least includes text format processing and/or character form processing. The text format processing includes: converting training text data in a non-standard format into a text form with one word per line or one sentence per line, where the non-standard formats include HTML and JSON. The character form processing includes: removing illegal symbols from the training text data, where the illegal symbols include web page tags and emoticons.
For example, since training text data is collected from a wide range of sources, the collected data is relatively cluttered. Data obtained in HTML format may look as follows:
- var articleTitle = "Text classification with machine learning (with training set + data set + all code)"; <br/>
- 1. Precise mode: tries to cut the sentence apart most accurately; suitable for text analysis. <br/>
- 2. Full mode: scans out all the words in the sentence that can form words; very fast, but cannot resolve ambiguity. <br/>
Such training text usually contains markup symbols of this kind. For data training, these illegal characters need to be removed; after text format processing, the result is:
text classification for machine learning (with training set + data set + all codes) "
1. And the accurate mode tries to start the sentence most accurately, and is suitable for text analysis.
2. The full mode scans all words that can be typed in a sentence very fast but does not resolve ambiguities.
Text format processing and character form processing are thus applied to the data collected from web pages, making the training text more standardized and improving the accuracy of the training text data.
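As an illustration, the text format and character form processing described above can be sketched in Python. The regular expressions and the set of characters treated as legal are assumptions of this example; the patent does not fix an exact symbol list.

```python
import re

def normalize_text(raw: str) -> list:
    """Strip web markup and illegal symbols, then emit one sentence per entry."""
    # Remove HTML tags such as <br/> (illustrative rule).
    text = re.sub(r"<[^>]+>", " ", raw)
    # Remove emoticons and other non-text symbols, keeping word characters,
    # CJK characters, and basic punctuation (assumed legal set).
    text = re.sub(r"[^\w\u4e00-\u9fff.,!?;:'()+\" -]", " ", text)
    # Collapse whitespace, then split into one sentence per entry.
    text = re.sub(r"\s+", " ", text).strip()
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```

A cluttered HTML fragment like the one above would thus be reduced to plain one-sentence-per-line text.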
After the character processing, the normalization further includes sentence-breaking processing.
The sentence-breaking processing includes: breaking sentences according to the punctuation marks in the training text data; when a stretch of training text exceeding a preset length contains no punctuation mark, punctuation is added through character form processing, and sentence breaking is then applied to the training text data with the added punctuation.
For example, the sentence above, "2. Full mode scans out all the words in the sentence that can form words; very fast, but cannot resolve ambiguity", has no sentence-final punctuation, so a period is appended and it is broken off as "2. Full mode scans out all the words in the sentence that can form words; very fast, but cannot resolve ambiguity."
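A minimal sketch of this sentence-breaking step, assuming a simple length threshold and Western/Chinese sentence-final punctuation (the patent fixes neither):

```python
import re

def break_sentences(line: str, max_len: int = 50) -> list:
    """Split on sentence-final punctuation; if an over-long span has no
    punctuation at all, append a period first (assumed policy)."""
    if len(line) > max_len and not re.search(r"[.!?\u3002\uff01\uff1f]", line):
        line += "."
    parts = re.split(r"(?<=[.!?\u3002\uff01\uff1f])", line)
    return [p.strip() for p in parts if p.strip()]
```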
After normalization, the training text data undergoes data conversion into input information suitable for the data screening model, the input information containing a unique number corresponding to each sentence of the training text. The input of the data screening module is usually a vector or a matrix, so the sentences cannot be used as input directly. Instead, the feature conversion module maps each word to a unique number, so that each text sentence corresponds to a string of numbers, and the vector formed by this number string serves as the input data of the data screening module.
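The word-to-number conversion can be sketched as a simple vocabulary lookup. The sequential numbering and the reserved id 0 for unknown or padding tokens are assumptions of this example:

```python
def build_vocab(sentences):
    """Assign each distinct token a unique integer id (0 reserved)."""
    vocab = {}
    for sent in sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(sent, vocab, unk=0):
    """Turn a tokenized sentence into its number string."""
    return [vocab.get(tok, unk) for tok in sent]
```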
For step S12, the converted input information is imported into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and the text sentences whose output from the fusion screening model reaches a preset positive-case probability score threshold are screened as training text data for speech recognition. Each batch of input data is fed into the multiple neural network screening models simultaneously; each model then performs its screening computation and produces a score for each classification label of the data. The groups of scores are fused by some fusion method into one group of scores per classification label of the input, the labels being the positive-case and negative-case labels of a sentence. For example, an incoherent sentence such as "Cano-femoral leg" may receive a positive-case label score of 0.1 and a negative-case label score of 0.9, while the fluent sentence "it's too dark, turn on the light" may receive a positive-case label score of 0.95 and a negative-case label score of 0.05. With a preset positive-case probability score threshold of 0.5, the sentence "it's too dark, turn on the light" is screened as training text data for speech recognition.
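The final screening step reduces to a threshold test on the fused positive-case scores, as in this sketch (the scores follow the example above; the 0.5 threshold is the example default):

```python
def screen(sentences, positive_scores, threshold=0.5):
    """Keep the sentences whose fused positive-case probability score
    reaches the preset threshold."""
    return [s for s, p in zip(sentences, positive_scores) if p >= threshold]
```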
According to this embodiment, normalizing the training text data and converting it into input information for the data screening model automates the whole process, saving a large amount of labor cost, reducing expenditure for enterprises, and improving reusability. By fusing multiple neural network screening models and screening along several dimensions, the linguistic relationships within the text content are effectively taken into account, and the screening effect on the training text data is improved.
As an implementation manner, in this embodiment, the pre-input preprocessing at least includes:
performing word segmentation on the normalized training text data to obtain word-string combinations of uniform granularity;
and converting the word-string combinations into input information suitable for the data screening model, the input information containing a unique number corresponding to each word-string combination.
In this embodiment, the normalized training text data is segmented into words. For example, the sentence "the sunflowers in Van Gogh's paintings are beautiful" yields, after segmentation, word-string combinations of uniform granularity such as "Van Gogh", "sunflower", and "beautiful", and these combinations are converted into input for the data screening model via their corresponding unique numbers. For example, "Van Gogh" may correspond to the number 37954567646040612330 (or a number of another type; this is not limited). Each word generally has one number corresponding to it, and the numbers form a number string.
Through this embodiment, segmenting the training text data into words further improves the distinguishability of the data and avoids misclassification in the subsequent screening process.
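The large per-word numbers in the example can be produced by any stable mapping; this sketch derives one from an MD5 digest, purely as an assumed illustration (the patent does not specify how the numbers are generated):

```python
import hashlib

def word_number(word: str) -> int:
    """Map a word to a stable, unique-looking number via its MD5 digest."""
    return int(hashlib.md5(word.encode("utf-8")).hexdigest()[:16], 16)
```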
As an implementation manner, in this embodiment, the method further includes:
performing word segmentation on the normalized training text data to obtain word-string combinations of uniform granularity, and determining the part of speech of each word-string combination;
and converting the word-string combinations and their corresponding parts of speech into input information suitable for the data screening model, the input information containing a unique number obtained for each combination of word string and part of speech.
In this embodiment, in addition to determining the word-string combinations, the part of speech of each combination is also determined, for example: nouns, verbs, adjectives, adverbs, distinguishing words, conjunctions, and so on. The word-string combinations and their corresponding parts of speech are then converted into input information for the data screening model.
According to this embodiment, each training text sentence is represented by both its word information and the part-of-speech information corresponding to each word. Because the way part-of-speech information combines largely reflects how fluent a sentence is, taking this information as a component of the scheme increases the accuracy of the screening model.
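Jointly numbering a word together with its part of speech can be sketched as a pair-keyed vocabulary, so the same word under a different part of speech maps to a different number (the pairing scheme is an assumption of this example):

```python
def encode_with_pos(tokens, tags, pair_vocab):
    """Give each (word, part-of-speech) pair its own unique id, so a word
    used as a noun and the same word used as a verb get different numbers."""
    ids = []
    for tok, tag in zip(tokens, tags):
        ids.append(pair_vocab.setdefault((tok, tag), len(pair_vocab) + 1))
    return ids
```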
As an implementation manner, in this embodiment, screening the text sentences whose output from the fusion screening model reaches the preset positive-case probability score threshold as training text data for speech recognition includes:
obtaining the positive-case probability score output by each neural network screening model, and screening as training text data for speech recognition those text sentences whose highest positive-case probability score reaches the preset positive-case probability score threshold; or
obtaining the positive-case probability score output by each neural network screening model, and screening the text sentence as training text data for speech recognition when the weighted mean of the positive-case probability scores reaches the preset positive-case probability score threshold.
In this embodiment, the positive-case probability score output by each neural network can be obtained, and the text sentence whose highest positive-case probability score reaches the preset positive-case probability score threshold is screened as training text data for speech recognition. For example, for the sentence "it's too dark, turn on the light", if the positive-case label score output by the first neural network is 0.95 and that output by the second is 0.75, then the score 0.95 is compared against the preset positive-case probability score threshold. A weighted average may also be taken, giving a score of (0.95 + 0.75) / 2 = 0.85.
It can be seen from this embodiment that different score determination methods can satisfy different user needs: different fusion modes are provided according to the user's requirements, so that text training data better suited to those requirements is screened out.
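Both fusion modes described above, taking the highest positive-case score or a weighted mean, can be sketched as:

```python
def fuse_scores(model_scores, mode="max", weights=None):
    """Fuse the positive-case scores from several screening models for one
    sentence: 'max' takes the highest, 'mean' a (weighted) average."""
    if mode == "max":
        return max(model_scores)
    weights = weights or [1.0] * len(model_scores)
    return sum(w * s for w, s in zip(weights, model_scores)) / sum(weights)
```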
As an implementation manner, in this embodiment, each neural network screening model has a first fully connected layer and a second fully connected layer, wherein the dimension of the first fully connected layer is larger than that of the second, and dropout is applied in the second fully connected layer during training to prevent overfitting.
In this embodiment, each model in the screening model structure has two fully connected layers of different dimensions: the first fully connected layer has a large dimension and the second a small one, and the second uses the dropout technique, so that the model does not overfit when the screening model is trained.
According to this embodiment, the model has stronger generalization ability and is more robust in formal use.
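A NumPy sketch of the two fully connected layers: the first layer wide, the second narrow, with inverted dropout applied only in training. The dimensions, ReLU activations, and random weights are illustrative assumptions, not the patent's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def classifier_head(features, train=False, p_drop=0.5):
    """Two fully connected layers: a wide first layer (256), a narrow second
    layer (64) with dropout applied only during training."""
    d_in, d1, d2 = features.shape[-1], 256, 64
    w1 = rng.standard_normal((d_in, d1)) * 0.01   # illustrative random weights
    w2 = rng.standard_normal((d1, d2)) * 0.01
    h = np.maximum(features @ w1, 0.0)            # first (larger) FC + ReLU
    z = np.maximum(h @ w2, 0.0)                   # second (smaller) FC + ReLU
    if train:                                     # dropout only while training
        mask = rng.random(z.shape) >= p_drop
        z = z * mask / (1.0 - p_drop)             # inverted dropout scaling
    return z
```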
As an implementation manner, in this embodiment, the plurality of neural network screening models includes at least two: a long short-term memory network screening model, into which each word of a sentence must be input in turn, and a convolutional neural network screening model, which allows the complete word string of a whole sentence to be input at once.
In the CNN (Convolutional Neural Network) screening model of this embodiment, each word after feature conversion is first converted into its corresponding word embedding. The word embeddings are processed by the convolution operations of the convolutional neural network, the main features after convolution are selected by a max pooling layer and passed through two consecutive fully connected layers, and finally a classification layer outputs the probability score of the current input for each classification label.
In the LSTM (Long Short-Term Memory) screening model, the main flow is similar to that of the CNN screening model, with two major differences. First, its main model structure is a long short-term memory network layer. Second, the LSTM screening model requires each word of a sentence to be input one at a time, whereas the CNN screening model allows the complete word string of a whole sentence to be input at once. The CNN screening model is therefore usually faster to compute than the long short-term memory network model.
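The classification layer at the end of either screening model turns its two logits into positive- and negative-case probability scores via a softmax, as in this small sketch:

```python
import math

def label_scores(neg_logit: float, pos_logit: float) -> dict:
    """Softmax over the [negative, positive] logits of the classification layer."""
    m = max(neg_logit, pos_logit)          # subtract max for numerical stability
    e_neg = math.exp(neg_logit - m)
    e_pos = math.exp(pos_logit - m)
    total = e_neg + e_pos
    return {"negative": e_neg / total, "positive": e_pos / total}
```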
As an implementation manner, in this embodiment, the training text data comes from raw data collected by a web crawler or uploaded manually.
In this embodiment, since massive amounts of data need to be screened and must be obtained from some source, relying on manual upload alone would incur high labor cost; having a web crawler automatically collect the initial training text is faster.
According to this embodiment, on top of the full automation of training text data screening, the source of the training text data is also automated, further reducing enterprise expenditure.
Overall, the quality and quantity of text training data greatly affect speech recognition performance; selecting a large amount of high-quality training data through an automated scheme and using it to train the speech recognition model therefore greatly improves recognition performance. Since the scheme is based on a classification method, similar training data screening tasks can reuse this framework with minor changes. For example, screening of picture data or audio data only requires corresponding changes to the data preprocessing module, the word segmentation module, the feature extraction module, and the word embedding layer in the screening model.
Fig. 2 is a schematic structural diagram of a system for screening training text data for speech recognition according to an embodiment of the present invention, which can execute the method for screening training text data for speech recognition according to any of the embodiments described above and is configured in a terminal.
The present embodiment provides a system for screening training text data for speech recognition, comprising: a pre-processing module 11 and a training text screening module 12.
The pre-processing module 11 is configured to perform normalization processing on training text data, and perform pre-processing before inputting on the training text data after the normalization processing, where the pre-processing before inputting at least includes: converting the training text data after the normalization processing into input information of a data screening model, wherein the input information comprises a unique number corresponding to a sentence of the training text; the training text screening module 12 is configured to import the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and screen a text sentence, which reaches a preset positive case probability score threshold value in the output of the fusion screening model, as training text data for speech recognition.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the method for screening the training text data for voice recognition in any method embodiment;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
performing normalization processing on training text data, and performing pre-input preprocessing on the normalized training text data, wherein the pre-input preprocessing at least comprises: converting the normalized training text data into input information of a data screening model, wherein the input information comprises a unique number corresponding to each sentence of the training text;
and importing the converted input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, and screening text sentences that reach a preset positive-case probability score threshold in the output of the fusion screening model as training text data for speech recognition.
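For illustration only, the two instructions above (normalization plus conversion to uniquely numbered input, then threshold screening on the positive-case probability) might be sketched roughly as follows. Every function name here is hypothetical, and the toy scorer merely stands in for the actual fusion screening model:

```python
# Hypothetical sketch of the screening flow; not the claimed implementation.

def normalize(sentence):
    # Normalization: lowercase and strip illegal symbols (toy version).
    return "".join(ch for ch in sentence.lower()
                   if ch.isalnum() or ch.isspace()).strip()

def to_ids(sentence, vocab):
    # Assign each word a unique number, growing the vocabulary on the fly.
    ids = []
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # 0 reserved for padding
        ids.append(vocab[word])
    return ids

def screen(sentences, score_fn, threshold=0.5):
    # Keep sentences whose positive-case probability reaches the threshold.
    vocab = {}
    kept = []
    for s in sentences:
        ids = to_ids(normalize(s), vocab)
        if score_fn(ids) >= threshold:
            kept.append(s)
    return kept

# Toy scorer standing in for the fusion screening model: favors longer sentences.
toy_score = lambda ids: min(1.0, len(ids) / 5)
print(screen(["Hello world!!", "This is a clean full sentence here."], toy_score))
# ['This is a clean full sentence here.']
```

In a real system the scorer would be the fusion of the trained neural network screening models; the point of the sketch is only the shape of the data flow.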
As a non-volatile computer-readable storage medium, it may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the method for screening training text data for speech recognition in any of the method embodiments described above.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for screening training text data for speech recognition according to any embodiment of the present invention.
The client of the embodiment of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players, handheld game consoles, e-book readers, smart toys, and portable vehicle-mounted navigation devices.
(4) Other electronic devices with data processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A method of automated screening of training text data for speech recognition, comprising:
acquiring original data from a web crawler or from manual upload;
performing normalization processing on the original data, the normalization processing comprising text format processing and/or character form processing;
performing word segmentation on the normalized original data;
converting the word-segmented original data into input information that corresponds to unique numbers and is suitable for a neural network screening model, wherein the input information comprises: the unique numbers corresponding to the word string combination and the corresponding part-of-speech combination of the segmented words;
importing the input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, wherein the plurality of neural network screening models at least comprise: a long short-term memory network screening model, into which the words of each sentence are input one by one in turn, and a convolutional neural network screening model, into which the complete word string of a sentence is input at one time;
and fusing the output results of the plurality of neural network screening models, and screening out training text data according to the fusion result.
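As a toy illustration of the parallel combination just recited, the sketch below pairs a scorer that folds a sentence in token by token (as a recurrent model would) with a scorer that sees the whole token string at once (as a convolutional model would), and fuses their positive-case probabilities by weighting. Both scoring functions are arbitrary stand-ins, not the claimed LSTM and CNN models:

```python
# Hypothetical stand-ins for the two parallel neural network screening models.

def lstm_style_score(token_ids):
    # Sequential model stand-in: folds tokens one at a time into a state.
    state = 0.0
    for t in token_ids:
        state = 0.5 * state + 0.5 * (t % 2)  # toy recurrence
    return state

def cnn_style_score(token_ids):
    # Whole-sentence model stand-in: one pass over the full token string.
    return sum(t % 2 for t in token_ids) / max(len(token_ids), 1)

def fused_score(token_ids, weights=(0.5, 0.5)):
    # Fusion: weighted combination of the parallel models' scores.
    scores = (lstm_style_score(token_ids), cnn_style_score(token_ids))
    return sum(w * s for w, s in zip(weights, scores))
```

The recurrence and the per-token parity are meaningless numerically; they only mirror the interface difference the claim describes, one model consuming words in turn and the other a complete sentence at a time.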
2. The method of claim 1, wherein the performing word segmentation on the normalized original data comprises:
performing the word segmentation so as to obtain word string combinations with uniform granularity.
3. The method of claim 2, wherein the method further comprises:
performing word segmentation on the normalized original data to obtain word string combinations with uniform granularity, and determining the parts of speech of the word string combinations.
4. The method of claim 1, wherein the fusing the output results of the plurality of neural network screening models and screening out training text data according to the fusion result comprises:
respectively obtaining the positive-case probability score output by each neural network screening model, and screening text sentences whose highest positive-case probability score reaches a preset positive-case probability score threshold as training text data for speech recognition; or
respectively obtaining the positive-case probability score output by each neural network screening model, and screening a text sentence as training text data for speech recognition when the weighted mean of its positive-case probability scores reaches a preset positive-case probability score threshold.
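The two fusion strategies of claim 4, keeping a sentence when the highest per-model score clears the threshold or when the weighted mean of the scores does, can be expressed in a few lines. This is an illustrative sketch with made-up scores, not the patented implementation:

```python
def passes_by_max(scores, threshold):
    # Strategy 1: the best single model's positive-case score decides.
    return max(scores) >= threshold

def passes_by_weighted_mean(scores, weights, threshold):
    # Strategy 2: a weighted mean of all models' scores decides.
    mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return mean >= threshold

# With one confident model and one doubtful one, the two strategies differ:
print(passes_by_max([0.4, 0.9], 0.8))                    # True
print(passes_by_weighted_mean([0.4, 0.9], [1, 1], 0.8))  # False (mean is 0.65)
```

The max rule accepts a sentence any one model is confident about, while the weighted-mean rule requires broad agreement, so the choice trades recall against precision of the screened data.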
5. The method of claim 1, wherein each neural network screening model has a first fully-connected layer and a second fully-connected layer, the dimension of the first fully-connected layer being larger than that of the second fully-connected layer, and wherein dropout is applied in the second fully-connected layer when training the neural network screening model, so as to prevent overfitting.
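Claim 5's arrangement, a wide first fully-connected layer feeding a narrower second one with dropout applied at the second layer during training, can be sketched in plain Python. Layer sizes, weights, and the dropout rate below are illustrative; a real model would use a deep-learning framework:

```python
import random

def linear(vector, weights, bias):
    # One fully-connected layer: y = W x + b.
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def dropout(vector, rate, training, rng):
    # Inverted dropout: during training, zero units at random and rescale
    # the survivors so the expected activation is unchanged. At inference
    # time the input passes through untouched.
    if not training or rate == 0.0:
        return vector
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]

def forward(x, w1, b1, w2, b2, training, rate=0.5, rng=random):
    hidden = linear(x, w1, b1)                      # first (wider) layer
    hidden = dropout(hidden, rate, training, rng)   # dropout before second layer
    return linear(hidden, w2, b2)                   # second (narrower) layer
```

Randomly silencing units in the narrow layer during training forces the model not to rely on any single feature, which is the overfitting protection the claim refers to.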
6. The method of claim 1, wherein,
the text format processing includes: converting original data in a non-standard format into a text form of one word per line or one sentence per line, wherein the non-standard format comprises HTML and JSON;
the character form processing includes: removing illegal symbols from the original data, wherein the illegal symbols comprise web-page tags and emoticons.
7. The method of claim 6, wherein, after the character form processing, the normalization processing further comprises: sentence-breaking processing;
the sentence-breaking processing comprises: breaking sentences according to punctuation marks in the original data; and, for a training text whose length exceeds a preset length but which contains no punctuation marks, adding punctuation through character form processing and then performing sentence breaking on the punctuated original data.
8. An automated screening system of training text data for speech recognition, comprising:
an original data acquisition program module, configured to acquire original data from a web crawler or from manual upload;
a normalization program module, configured to perform normalization processing of text format processing and/or character form processing on the original data;
a word segmentation program module, configured to perform word segmentation on the normalized original data;
a conversion program module, configured to convert the word-segmented original data into input information that corresponds to unique numbers and is suitable for a neural network screening model, wherein the input information comprises: the unique numbers corresponding to the word string combination and the corresponding part-of-speech combination of the segmented words;
a screening program module, configured to import the input information into a fusion screening model formed by combining a plurality of neural network screening models in parallel, wherein the plurality of neural network screening models at least comprise: a long short-term memory network screening model, into which the words of each sentence are input one by one in turn, and a convolutional neural network screening model, into which the complete word string of a sentence is input at one time;
and a training program module, configured to fuse the output results of the plurality of neural network screening models and to screen out training text data according to the fusion result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910510814.2A CN110223675B (en) | 2019-06-13 | 2019-06-13 | Method and system for screening training text data for voice recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910510814.2A CN110223675B (en) | 2019-06-13 | 2019-06-13 | Method and system for screening training text data for voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223675A CN110223675A (en) | 2019-09-10 |
CN110223675B true CN110223675B (en) | 2022-04-19 |
Family
ID=67816839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910510814.2A Active CN110223675B (en) | 2019-06-13 | 2019-06-13 | Method and system for screening training text data for voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223675B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929532B (en) * | 2019-11-21 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
CN111145732B (en) * | 2019-12-27 | 2022-05-10 | 思必驰科技股份有限公司 | Processing method and system after multi-task voice recognition |
CN111090970B (en) * | 2019-12-31 | 2023-05-12 | 思必驰科技股份有限公司 | Text standardization processing method after voice recognition |
CN111429913B (en) * | 2020-03-26 | 2023-03-31 | 厦门快商通科技股份有限公司 | Digit string voice recognition method, identity verification device and computer readable storage medium |
CN112560453B (en) * | 2020-12-18 | 2023-07-14 | 平安银行股份有限公司 | Voice information verification method and device, electronic equipment and medium |
CN113361644B (en) * | 2021-07-03 | 2024-05-14 | 上海理想信息产业(集团)有限公司 | Model training method, telecommunication service characteristic information extraction method, device and equipment |
CN116911305A (en) * | 2023-09-13 | 2023-10-20 | 中博信息技术研究院有限公司 | Chinese address recognition method based on fusion model |
CN117252539A (en) * | 2023-09-20 | 2023-12-19 | 广东筑小宝人工智能科技有限公司 | Engineering standard specification acquisition method and system based on neural network |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103514170B (en) * | 2012-06-20 | 2017-03-29 | 中国移动通信集团安徽有限公司 | A kind of file classification method and device of speech recognition |
CN104217717B (en) * | 2013-05-29 | 2016-11-23 | 腾讯科技(深圳)有限公司 | Build the method and device of language model |
CN105957518B (en) * | 2016-06-16 | 2019-05-31 | 内蒙古大学 | A kind of method of Mongol large vocabulary continuous speech recognition |
US9972308B1 (en) * | 2016-11-08 | 2018-05-15 | International Business Machines Corporation | Splitting utterances for quick responses |
CN107229684B (en) * | 2017-05-11 | 2021-05-18 | 合肥美的智能科技有限公司 | Sentence classification method and system, electronic equipment, refrigerator and storage medium |
CN107680579B (en) * | 2017-09-29 | 2020-08-14 | 百度在线网络技术(北京)有限公司 | Text regularization model training method and device, and text regularization method and device |
CN108509411B (en) * | 2017-10-10 | 2021-05-11 | 腾讯科技(深圳)有限公司 | Semantic analysis method and device |
CN109460472A (en) * | 2018-11-09 | 2019-03-12 | 北京京东金融科技控股有限公司 | File classification method and device and electronic equipment |
- 2019-06-13: CN application CN201910510814.2A granted as patent CN110223675B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN110223675A (en) | 2019-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223675B (en) | Method and system for screening training text data for voice recognition | |
CN108920622B (en) | Training method, training device and recognition device for intention recognition | |
CN108874776B (en) | Junk text recognition method and device | |
CN111339305B (en) | Text classification method and device, electronic equipment and storage medium | |
CN111177326B (en) | Key information extraction method and device based on fine labeling text and storage medium | |
CN111160031A (en) | Social media named entity identification method based on affix perception | |
CN111460162B (en) | Text classification method and device, terminal equipment and computer readable storage medium | |
CN113553848B (en) | Long text classification method, system, electronic device, and computer-readable storage medium | |
CN111460149A (en) | Text classification method, related equipment and readable storage medium | |
CN105975497A (en) | Automatic microblog topic recommendation method and device | |
CN112380866A (en) | Text topic label generation method, terminal device and storage medium | |
CN113469298A (en) | Model training method and resource recommendation method | |
CN115600605A (en) | Method, system, equipment and storage medium for jointly extracting Chinese entity relationship | |
CN111291551A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN106202349B (en) | Webpage classification dictionary generation method and device | |
CN111079433A (en) | Event extraction method and device and electronic equipment | |
CN113076720B (en) | Long text segmentation method and device, storage medium and electronic device | |
CN110969005A (en) | Method and device for determining similarity between entity corpora | |
CN113806483A (en) | Data processing method and device, electronic equipment and computer program product | |
CN111178080A (en) | Named entity identification method and system based on structured information | |
CN110874408A (en) | Model training method, text recognition device and computing equipment | |
CN112559750A (en) | Text data classification method and device, nonvolatile storage medium and processor | |
CN113761874A (en) | Event reality prediction method and device, electronic equipment and storage medium | |
CN112541352A (en) | Policy interpretation method based on deep learning | |
CN112015891A (en) | Method and system for classifying messages of network inquiry platform based on deep neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province Applicant after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province Applicant before: AI SPEECH Ltd. |
|
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |