CN109830229A - Intelligent audio corpus cleaning method, device, storage medium and computer equipment

Publication number
CN109830229A
Authority
CN
China
Prior art keywords
text
correct
clause
audio
translated text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811512398.1A
Other languages
Chinese (zh)
Inventor
贾克尧
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811512398.1A priority Critical patent/CN109830229A/en
Publication of CN109830229A publication Critical patent/CN109830229A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention provides an intelligent audio corpus cleaning method, comprising: obtaining target audio data and the correct text corresponding to the target audio data; segmenting the target audio data to obtain audio fragments; obtaining the translated text (the speech-to-text output) corresponding to each audio fragment using a preset speech-to-text conversion method; dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text; and obtaining the cleaned audio corpus from the audio fragments and the correct clause texts. By replacing manual checking with automated matching, the method cleans the audio corpus intelligently. Compared with manual processing, it markedly reduces processing cost, improves efficiency, and markedly improves the correspondence between text and audio, ensuring the accuracy of the audio corpus.

Description

Intelligent audio corpus cleaning method, device, storage medium and computer equipment
Technical field
The present invention relates to the field of computer technology, and in particular to an intelligent audio corpus cleaning method, a device, a computer-readable storage medium and computer equipment.
Background
With the rapid development of artificial intelligence, speech recognition is increasingly built into application software and devices, bringing convenience to people's work, study and daily life. Speech recognition requires a large speech and text corpus, and the degree of correspondence between text and audio affects the accuracy of the recognition results.
At present, corpora are obtained mainly through manual annotation. The limitation is that corpora obtained by manual annotation or other means are prone to problems such as extra characters, missing characters or wrong characters. Manual processing is also inefficient, error-prone and costly in labor.
Summary of the invention
To address at least one of the above technical deficiencies, the present invention provides the following technical solution: an intelligent audio corpus cleaning method and a corresponding device, computer-readable storage medium and computer equipment.
According to one aspect, an embodiment of the present invention provides an intelligent audio corpus cleaning method, comprising the following steps:
Obtaining target audio data and the correct text corresponding to the target audio data;
Segmenting the target audio data to obtain audio fragments, and obtaining the translated text corresponding to each audio fragment using a preset speech-to-text conversion method;
Dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text;
Obtaining the cleaned audio corpus from the audio fragments and the correct clause texts.
Preferably, dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text comprises:
Segmenting the correct text to obtain clause texts;
Dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text that corresponds to the translated text.
Further, dynamically matching the translated text against the clause texts and determining from the clause texts the correct clause text that corresponds to the translated text comprises:
Calculating, based on a minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text;
Determining, according to the similarity, the correct clause text that corresponds to the translated text from among the preset number of clause texts.
Further, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the method further comprises:
Obtaining the text serial number of the translated text;
Obtaining, according to the text serial number, the clause text whose text serial number corresponds to that of the translated text, together with at least one clause text before it and at least one clause text after it;
Setting the clause text whose text serial number corresponds to that of the translated text, together with the at least one clause text before it and the at least one clause text after it, as the preset number of clause texts used to calculate the similarity with the translated text.
Preferably, after determining, according to the similarity, the correct clause text that corresponds to the translated text from among the preset number of clause texts, the method further comprises:
Obtaining the similarity between the translated text and the correct clause text;
If the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning on the correct clause text.
Preferably, segmenting the target audio data to obtain audio fragments comprises:
Segmenting the target audio data using silence suppression, to obtain multiple audio fragments split at sentence pauses.
Preferably, before segmenting the target audio data to obtain audio fragments, the method further comprises:
Preprocessing the target audio data, the preprocessed target audio data being used for segmentation.
In addition, according to another aspect, an embodiment of the present invention provides an intelligent audio corpus cleaning device, comprising:
An audio corpus acquiring unit, for obtaining target audio data and the correct text corresponding to the target audio data;
A translated text acquiring unit, for segmenting the target audio data to obtain audio fragments, and for obtaining the translated text corresponding to each audio fragment using a preset speech-to-text conversion method;
A dynamic matching unit, for dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text;
A corpus cleaning unit, for obtaining the cleaned audio corpus from the audio fragments and the correct clause texts.
According to another aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above intelligent audio corpus cleaning method.
According to another aspect, an embodiment of the present invention provides computer equipment comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, are configured to be executed by the one or more processors, and are configured to execute the above intelligent audio corpus cleaning method.
Compared with the prior art, the present invention has the following beneficial effects:
The intelligent audio corpus cleaning method, device, computer-readable storage medium and computer equipment provided by the invention dynamically match the translated text against the correct text and accurately find the correct clause text corresponding to the translated text. Automated matching thus replaces manual checking. Compared with manual processing, this markedly reduces processing cost, improves efficiency, and markedly improves the correspondence between text and audio, ensuring the accuracy of the audio corpus.
In addition, the method, device, computer-readable storage medium and computer equipment provided by the invention segment the target audio data using silence suppression, which quickly yields multiple audio fragments split at sentence pauses. They calculate the similarity between the translated text and the clause texts with a minimum edit distance algorithm and determine the correct clause text from that similarity, which gives fast fuzzy matching of sentences with a simple algorithm and high matching efficiency, further improving the efficiency of audio corpus cleaning. They also perform secondary cleaning on correct clause texts whose similarity to the translated text is low, which helps ensure the accuracy of the cleaned audio corpus.
Additional aspects and advantages of the invention are set forth in part in the description below; they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the intelligent audio corpus cleaning method provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the intelligent audio corpus cleaning device provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the computer equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference labels throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should be further understood that the word "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. The wording "and/or" used herein includes any one of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the field to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art, and, unless specifically defined as here, should not be interpreted in an idealized or overly formal sense.
In speech recognition applications, recognition requires a large speech and text corpus, and the degree of correspondence between the text corpus and the audio corpus affects the accuracy of the recognition results: the higher the correspondence, the higher the accuracy. To guarantee the accuracy of an audio corpus comprising speech and text, the audio corpus usually needs to be cleaned.
An embodiment of the invention provides an intelligent audio corpus cleaning method. As shown in Fig. 1, the method comprises:
Step S101: obtaining target audio data and the correct text corresponding to the target audio data.
In this embodiment, the target audio data is the audio data on which corpus cleaning is to be performed, and may specifically be an audio file in .mp3 format.
The correct text corresponding to the target audio data is a text whose written content is fully consistent with the spoken content of the target audio data, and may specifically be a text file in .txt format.
For example, the target audio data to be cleaned is a recording of someone reading an article aloud, and the correct text is the text of that article.
Step S102: segmenting the target audio data to obtain audio fragments, and obtaining the translated text corresponding to each audio fragment using a preset speech-to-text conversion method.
In this embodiment, a preset audio segmentation technique, such as silence suppression, is used to segment the target audio data. Following the segmentation principle of that technique, the target audio data is cut into several segments, yielding multiple audio fragments. The audio fragments can be given audio file serial numbers in segmentation order: the first fragment obtained has audio file serial number #1, the second has #2, and so on.
In this embodiment, a preset speech-to-text tool, for example an openly available tool such as iFlytek or Baidu AI that provides speech-to-text conversion, converts the audio data into written text. Specifically, each of the audio fragments is converted by a speech-to-text algorithm into a corresponding translated text. Each translated text can be given a translated text serial number that follows the audio file serial number of its audio fragment: the translated text of the first audio fragment has translated text serial number #1, that of the second audio fragment has #2, and so on.
Using the audio file serial numbers and the translated text serial numbers, each audio file can be put in one-to-one correspondence with its translated text.
In practice, because a speech-to-text algorithm cannot guarantee 100% conversion accuracy, the translated text produced by the preset speech-to-text tool may be inconsistent in content with its audio fragment.
Step S103: dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text.
In this embodiment, the translated texts can be matched against the correct text one by one, in the order of their translated text serial numbers. Specifically, the written content of a translated text is compared with each clause in the correct text in turn, the clause that matches the translated text is determined, and that clause is taken as the correct clause text corresponding to the translated text.
Step S104: obtaining the cleaned audio corpus from the audio fragments and the correct clause texts.
In this embodiment, the correct clause text is the clause in the correct text that automated cleaning has determined to match the translated text. Because each translated text corresponds to an audio fragment, the correct clause text for a translated text also matches that audio fragment, and serves as the correct clause text for the fragment.
In this embodiment, the original audio corpus of the target audio data consists of the audio fragments in spoken form and the corresponding translated texts in written form. Replacing each translated text with the correct clause text obtained in step S103 yields the audio fragments paired with the correct clause texts from the correct text. The cleaned audio corpus therefore consists of the audio fragments in spoken form and the corresponding correct clause texts in written form. After cleaning, the content consistency between the written correct clause texts and the corresponding spoken audio fragments is markedly improved.
The intelligent audio corpus cleaning method provided by the invention dynamically matches the translated text against the correct text and accurately finds the correct clause text corresponding to the translated text. Automated matching thus replaces manual checking. Compared with manual processing, this markedly reduces processing cost, improves efficiency, and markedly improves the correspondence between text and audio, ensuring the accuracy of the audio corpus.
In one embodiment, segmenting the target audio data to obtain audio fragments comprises:
Segmenting the target audio data using silence suppression, to obtain multiple audio fragments split at sentence pauses.
In practice, a person reading a text aloud usually pauses at punctuation marks, so silence suppression can be used to segment the speech stream, that is, the target audio data, at the silences, yielding multiple audio fragments.
In this embodiment, silence suppression (Voice Activity Detection, VAD), also known as speech endpoint detection, detects valid speech segments in a continuous speech stream. Specifically, a valid speech segment is obtained by detecting the starting point of valid speech (the front endpoint) and its end point (the rear endpoint).
Silence suppression can segment the target audio data in various ways, for example based on energy, on signal-to-noise ratio (SNR), or on a deep neural network (DNN).
Taking energy-based segmentation as an example: audio is a one-dimensional continuous function of time, and the audio data a computer processes is a time-ordered sequence of sampled values whose magnitudes represent the energy of the audio at the sampling points. In the silent parts, where the reader pauses, the amplitude of the sound wave is very small; in the valid speech parts, where the reader is reading the article, the amplitude is larger. Because signal energy grows with amplitude, the silent and valid speech parts differ in signal energy. Silence suppression uses this difference to decide whether the audio at the current moment is silence, or the front or rear endpoint of a clause, and so segments the target audio data into multiple audio fragments.
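The energy-based idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the function name `split_on_silence`, the frame length, the energy threshold and the minimum number of silent frames are all hypothetical choices, and the input is assumed to be a plain list of float samples.

```python
def split_on_silence(samples, frame_len=4, energy_threshold=0.01,
                     min_silence_frames=2):
    """Return (start, end) sample-index pairs of voiced segments.

    Hypothetical sketch: a frame counts as speech when its mean squared
    amplitude reaches energy_threshold; a segment is closed once
    min_silence_frames consecutive low-energy frames have been seen.
    """
    segments = []
    start = None        # front endpoint of the current voiced segment
    silent_run = 0      # consecutive low-energy frames seen so far
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = i * frame_len          # speech begins here
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # rear endpoint: the end of the last voiced frame
                end = (i - silent_run + 1) * frame_len
                segments.append((start, end))
                start = None
                silent_run = 0
    if start is not None:
        # audio ended while still inside a voiced segment
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

For real recordings the frame length would typically cover tens of milliseconds and the threshold would be tuned to the noise floor; the tiny defaults here only keep the sketch easy to trace by hand.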
In this embodiment, segmenting the target audio data with silence suppression quickly yields multiple audio fragments split at sentence pauses, which improves the efficiency of audio corpus cleaning.
In one embodiment, before segmenting the target audio data to obtain audio fragments, the method further comprises:
Preprocessing the target audio data, the preprocessed target audio data being used for segmentation.
In this embodiment, the preprocessing comprises DC removal and windowing. Specifically, a high-pass filter removes the DC component from the target audio data, and the target audio data is windowed; the resulting preprocessed target audio data is then used for segmentation.
In this embodiment, applying these two preprocessing steps, DC removal and windowing, to the target audio data improves the accuracy with which the front and rear endpoints of valid speech segments are detected in the speech stream, and in turn the accuracy of audio segmentation.
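As a rough illustration of the two preprocessing steps, the sketch below removes the DC component with a first-order high-pass (DC-blocking) filter and applies a Hamming window to a frame. The source names neither the filter design nor the window type, so both are assumptions, as are the function names `remove_dc` and `hamming_window` and the pole value `r`.

```python
import math


def remove_dc(samples, r=0.95):
    """First-order DC-blocking high-pass filter (assumed design):
    y[n] = x[n] - x[n-1] + r * y[n-1], with 0 < r < 1."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def hamming_window(frame):
    """Multiply a frame by a Hamming window (assumed window type)."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]
```

A constant (pure DC) input decays toward zero under `remove_dc`, while the window tapers each frame toward its edges before any per-frame energy or spectral measurement is taken.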
In one embodiment, dynamically matching the translated text against the correct text to obtain the correct clause text, within the correct text, that corresponds to the translated text comprises:
Segmenting the correct text to obtain clause texts;
Dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text that corresponds to the translated text.
In this embodiment, segmenting the correct text means splitting its written content at punctuation marks such as ",", ";" and ".", which yields multiple clause texts, one for each clause in the correct text. The clause texts can be given clause text serial numbers in segmentation order: the first clause text obtained has clause text serial number #1, the second has #2, and so on.
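The punctuation-based splitting and numbering described above might look like the following sketch. The delimiter set is an assumption: the source lists only ",", ";" and "." with an "etc.", so the question and exclamation marks and the full-width Chinese forms are added here for illustration, and `split_into_clauses` is a hypothetical name.

```python
import re

# Assumed delimiter set: ASCII and full-width comma, semicolon, period,
# exclamation mark and question mark.
CLAUSE_DELIMITERS = r"[,;.!?，；。！？]"


def split_into_clauses(text):
    """Split text at punctuation and number the clauses from 1."""
    parts = re.split(CLAUSE_DELIMITERS, text)
    clauses = [p.strip() for p in parts if p.strip()]
    return {serial: clause for serial, clause in enumerate(clauses, start=1)}
```

Returning a dictionary keyed by serial number keeps the #1, #2, ... numbering explicit, so a translated text can later be looked up against the clause with the same serial number.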
In this embodiment, the translated texts can be matched one by one, in the order of their translated text serial numbers, against the clause texts obtained by segmenting the correct text. Specifically, the written content of a translated text is compared with the written content of each clause text in turn, the clause that matches the translated text is determined, and the clause text of that clause is taken as the correct clause text corresponding to the translated text.
In one embodiment, dynamically matching the translated text against the clause texts and determining from the clause texts the correct clause text that corresponds to the translated text comprises:
Calculating, based on a minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text;
Determining, according to the similarity, the correct clause text that corresponds to the translated text from among the preset number of clause texts.
In this embodiment, the minimum edit distance between two sentences is the minimum number of edit operations required to transform one sentence into the other, where the edit operations are insertion, deletion and substitution. The algorithm works as follows: using these three edit operations, the clause text currently being matched can be converted into the translated text by one or more sequences of steps, and the smallest number of operations over all such sequences is kept as the minimum edit distance.
After the minimum edit distance between the translated text and the clause text currently being matched has been calculated, the similarity between them is calculated from that distance.
The similarity is calculated as:
Similarity = 1 - ld / max(str1.length, str2.length)
where ld is the minimum edit distance between the translated text and the current clause text, and max(str1.length, str2.length) is the larger of the string lengths of the translated text and the clause text.
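A minimal sketch of the distance and the similarity formula above, using the standard dynamic-programming (Levenshtein) formulation with unit cost for insertion, deletion and substitution. The function names are illustrative, and returning 1.0 for two empty strings is an assumption made to avoid dividing by zero; the source does not cover that case.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))      # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                      # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity = 1 - ld / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0                      # assumed convention for two empty strings
    return 1 - edit_distance(a, b) / longest
```

Identical strings score 1.0 and strings with no characters in common positions score close to 0, so the value can be compared directly across candidate clauses of different lengths.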
In this embodiment, determining the correct clause text according to the similarity means selecting, from the preset number of clause texts, the one with the highest similarity to the current translated text. That is, among the preset number of clause texts, the clause text most similar to the translated text is the correct clause text corresponding to that translated text.
In this embodiment, the preset number of clause texts can be all clause texts obtained by segmenting the correct text, in which case the translated text is matched against every clause text using the minimum edit distance algorithm. Alternatively, it can be a selected subset of those clause texts, in which case the translated text is matched only against the selected clause texts. The preset number can be any positive integer, such as 1, 3, 5 or 10; those skilled in the art can determine or adjust its value according to practical requirements, and this embodiment places no limit on it.
In this embodiment, calculating the similarity between the translated text and the clause texts with a minimum edit distance algorithm, and determining the correct clause text from that similarity, gives fast fuzzy matching of sentences. The algorithm is simple and matching is efficient, which further improves the efficiency of audio corpus cleaning.
In one embodiment, it is described be based on most short editing distance algorithm, calculate the cypher text respectively with it is described just Before the similarity between preset quantity subordinate sentence text in true text, further includes:
Obtain the text serial number of the cypher text;
According to the text serial number, text serial number subordinate sentence text corresponding with the cypher text and the subordinate sentence text are obtained At least one each subordinate sentence text of front and back;
By before and after text serial number subordinate sentence text corresponding with the cypher text and subordinate sentence text it is each at least one Subordinate sentence text be set as the preset quantity subordinate sentence text, with for calculating the cypher text respectively and in the correct text Preset quantity subordinate sentence text between similarity.
For the present embodiment, the cypher text and subordinate sentence text are previously provided with text serial number, the cypher text It is correspondingly provided with cypher text serial number, the subordinate sentence text is correspondingly provided with subordinate sentence text serial number.
According to above content it is found that the corresponding audio fragment of cypher text be using preset audio segmentation technique, such as Silence suppression techniques carry out what cutting was handled to the target audio data, and the cypher text serial number of the cypher text is pressed The audio file serial number of its corresponding audio fragment is arranged;The subordinate sentence text is by the word content in the correct text by mark Point symbol cutting obtains, and the subordinate sentence text serial number of the subordinate sentence text is arranged by the cutting sequence of the correct text.
And in practical application scene, it reads aloud when people reads text and usually pauses according to punctuation mark, therefore the audio piece The cutting handling principle of section and the cutting handling principle of the subordinate sentence text can be considered identical, read text pause just reading aloud people Really and audio segmentation technique has very in the case where accuracy, the identical cypher text of the text serial number and subordinate sentence text The same content in the target audio data ought to be corresponded to, i.e., the described audio fragment and the subordinate sentence text are based on punctuate symbol It number pauses and to carry out cutting, the corresponding cypher text of the audio fragment ought to have corresponding pass with subordinate sentence text in text serial number System.
In this embodiment, to further narrow the set of candidate texts for dynamic matching, the translated text of an audio fragment may be dynamically matched against the clause text with the corresponding text serial number and the preset number of clause texts before and after it.
The preset number may be any positive integer, such as 1, 3, 5 or 10. Those skilled in the art may determine or adjust its specific value according to the needs of the application; this embodiment places no limit on it.
For example, if the text serial number of the translated text to be matched is #6, the translated text may be dynamically matched one by one against clause text #6 and the two clause texts on either side of it, i.e. clause texts #4, #5, #7 and #8.
In this embodiment, matching the translated text only against the clause text with the corresponding text serial number and the preset number of clause texts before and after it greatly reduces the number of matching candidates while preserving the quality of the match, which further improves the efficiency of audio corpus cleaning.
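The candidate window from the #6 example can be sketched as below. The function name and the parameter `neighbours_each_side` are hypothetical, and reading the preset number as "how many clauses to take on each side" is an assumption. Clipping at the first and last clause is also an assumption, since the source does not discuss boundary cases.

```python
def candidate_serials(serial, neighbours_each_side, total_clauses):
    """Return the clause serial numbers to match against.

    The window is centred on `serial` and extends `neighbours_each_side`
    clauses in both directions, clipped to the range 1..total_clauses.
    """
    first = max(1, serial - neighbours_each_side)
    last = min(total_clauses, serial + neighbours_each_side)
    return list(range(first, last + 1))


window = candidate_serials(6, 2, 20)  # clause texts #4 to #8
```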
In one embodiment, after determining, according to the similarity, the correct clause text in the correct text corresponding to the translated text from the preset number of clause texts, the method further includes:
Obtaining the similarity between the translated text and the correct clause text;
If the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning on the correct clause text.
In practice, the text content of the correct clause text obtained by dynamic matching may differ from the text content of the translated text, in which case the correct clause text needs secondary cleaning.
Whether the text content of the correct clause text is consistent with that of the translated text is judged as follows: obtain the similarity between the translated text and the correct clause text, and judge whether it is less than the preset threshold. If the similarity is less than the preset threshold, the text content of the correct clause text differs greatly from that of the translated text, and the correct clause text should undergo secondary cleaning. If the similarity is greater than or equal to the preset threshold, the difference is small and no secondary cleaning is needed.
In this embodiment, the secondary cleaning is specifically manual cleaning: a person edits or corrects the current clause text to obtain the clause text in the correct text that corresponds to the original machine-translated text for which matching failed or the matching similarity was too low.
In this embodiment, performing secondary cleaning on correct clause texts whose similarity to the translated text is low ensures the accuracy of the cleaned audio corpus.
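The threshold check can be sketched as a simple partition of matched pairs into those accepted as they are and those queued for manual review. The function name, the tuple layout and the default threshold of 0.8 are assumptions for illustration; the source only says the threshold is preset.

```python
def partition_matches(matches, threshold=0.8):
    """Split matches into accepted pairs and pairs needing secondary cleaning.

    `matches` is a list of (translated_text, clause_text, similarity) tuples.
    A pair needs secondary (manual) cleaning when its similarity is strictly
    less than the threshold; a similarity equal to the threshold is accepted.
    """
    accepted, needs_review = [], []
    for translated, clause, similarity in matches:
        if similarity < threshold:
            needs_review.append((translated, clause, similarity))
        else:
            accepted.append((translated, clause, similarity))
    return accepted, needs_review
```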
In addition, an embodiment of the present invention provides an intelligent audio corpus cleaning device. As shown in Fig. 2, the device includes an audio corpus acquiring unit 201, a translated text acquiring unit 202, a dynamic matching unit 203 and a corpus cleaning unit 204, wherein:
The audio corpus acquiring unit 201 is configured to obtain target audio data and the correct text corresponding to the target audio data;
The translated text acquiring unit 202 is configured to segment the target audio data to obtain segmented audio fragments, and to obtain the translated text corresponding to each audio fragment based on a preset speech-to-text conversion method;
The dynamic matching unit 203 is configured to dynamically match the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text;
The corpus cleaning unit 204 is configured to obtain the cleaned audio corpus from the audio fragments and the correct clause texts.
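How the four units could cooperate is sketched below as a single function with the individual steps injected as callables. All names (`clean_corpus`, `segment`, `transcribe`, `split_clauses`, `match`) are hypothetical. The speech-to-text step in particular is left as a parameter, because the source only refers to a preset conversion method.

```python
def clean_corpus(audio, correct_text, segment, transcribe, split_clauses, match):
    """Pair each audio fragment with its matched correct clause text.

    segment(audio)              -> list of audio fragments        (unit 202)
    transcribe(fragment)        -> translated text of a fragment  (unit 202)
    split_clauses(correct_text) -> dict of serial -> clause text  (unit 203)
    match(serial, text, clauses)-> correct clause text            (unit 203)
    """
    fragments = segment(audio)
    clauses = split_clauses(correct_text)
    corpus = []
    # Fragments are numbered from 1 in the order they were cut.
    for serial, fragment in enumerate(fragments, start=1):
        translated = transcribe(fragment)
        clause = match(serial, translated, clauses)
        corpus.append((fragment, clause))  # cleaned pair (unit 204)
    return corpus
```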
In one embodiment, the dynamic matching unit 203 is specifically configured to:
Segment the correct text to obtain the clause texts of the segmented correct text;
Dynamically match the translated text against the clause texts, and determine from the clause texts the correct clause text in the correct text that corresponds to the translated text.
In one embodiment, dynamically matching the translated text against the clause texts and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text includes:
Calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text;
Determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from the preset number of clause texts.
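The two steps above can be sketched with a standard Levenshtein (minimum edit distance) computation. How the distance is turned into a similarity score is not stated in the source. Dividing by the longer string's length, as done here, is one common choice and is an assumption, as are the function names.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(translated, candidates):
    """Pick the candidate clause most similar to the translated text.

    `candidates` maps clause serial numbers to clause texts.
    Returns (serial, clause_text, similarity).
    """
    serial = max(candidates, key=lambda s: similarity(translated, candidates[s]))
    clause = candidates[serial]
    return serial, clause, similarity(translated, clause)
```

Because only a few candidate clauses are compared for each translated text, the quadratic cost of the distance computation stays small in practice.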
In one embodiment, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the following is further included:
Obtaining the text serial number of the translated text;
Obtaining, according to the text serial number, the clause text whose text serial number corresponds to the translated text, together with at least one clause text before it and at least one clause text after it;
Setting the clause text whose text serial number corresponds to the translated text, together with at least one clause text before it and at least one clause text after it, as the preset number of clause texts used for calculating the similarity between the translated text and each of the preset number of clause texts in the correct text.
In one embodiment, after determining, according to the similarity, the correct clause text in the correct text corresponding to the translated text from the preset number of clause texts, the following is further included:
Obtaining the similarity between the translated text and the correct clause text;
If the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning on the correct clause text.
In one embodiment, segmenting the target audio data to obtain the segmented audio fragments includes:
Segmenting the target audio data using silence suppression to obtain multiple audio fragments split at sentence pauses.
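Splitting at pauses can be illustrated with a toy amplitude-threshold segmenter. This is not the silence suppression technique the source refers to, whose details are not given. It is a simplified stand-in that works on a plain list of samples, and the function name, `threshold` and `min_silence` values are assumptions.

```python
def split_on_silence(samples, threshold=0.02, min_silence=3):
    """Return (start, end) index pairs of non-silent fragments (end exclusive).

    A run of at least `min_silence` consecutive samples whose absolute
    amplitude is below `threshold` is treated as a pause between fragments.
    Shorter quiet runs stay inside the current fragment.
    """
    fragments = []
    start = None  # index where the current fragment began, if any
    quiet = 0     # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the fragment at the first sample of the quiet run.
                fragments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Close the last fragment, trimming any trailing quiet samples.
        fragments.append((start, len(samples) - quiet))
    return fragments
```

Production systems typically work on short frames and frame energy rather than individual samples, and use much longer pause lengths.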
In one embodiment, before segmenting the target audio data to obtain the segmented audio fragments, the following is further included:
Preprocessing the target audio data, the preprocessed target audio data being used for the segmentation.
The intelligent audio corpus cleaning device provided by the present invention dynamically matches the translated text against the correct text and thereby accurately finds the correct clause text corresponding to the translated text. Intelligent matching replaces manual checking, so the audio corpus is cleaned automatically; compared with manual processing, this significantly reduces processing cost, improves work efficiency, markedly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with silence suppression quickly yields multiple audio fragments split at sentence pauses. Calculating the similarity between the translated text and the clause texts with the minimum edit distance algorithm, and determining the correct clause text from that similarity, gives fast fuzzy matching of sentences with a simple algorithm and high matching efficiency, which further improves the efficiency of audio corpus cleaning. Finally, performing secondary cleaning on correct clause texts whose similarity to the translated text is low ensures the accuracy of the cleaned audio corpus.
The intelligent audio corpus cleaning device provided by this embodiment of the present invention can implement the method embodiments provided above; for its specific functions, refer to the description of the method embodiments, which is not repeated here.
In addition, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the intelligent audio corpus cleaning method described in the above embodiments is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or mobile phone), and may be a read-only memory, a magnetic disk, an optical disc, etc.
The computer-readable storage medium provided by the present invention makes it possible to dynamically match the translated text against the correct text and thereby accurately find the correct clause text corresponding to the translated text. Intelligent matching replaces manual checking, so the audio corpus is cleaned automatically; compared with manual processing, this significantly reduces processing cost, improves work efficiency, markedly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with silence suppression quickly yields multiple audio fragments split at sentence pauses. Calculating the similarity between the translated text and the clause texts with the minimum edit distance algorithm, and determining the correct clause text from that similarity, gives fast fuzzy matching of sentences with a simple algorithm and high matching efficiency, which further improves the efficiency of audio corpus cleaning. Finally, performing secondary cleaning on correct clause texts whose similarity to the translated text is low ensures the accuracy of the cleaned audio corpus.
The computer-readable storage medium provided by this embodiment of the present invention can implement the method embodiments provided above; for its specific functions, refer to the description of the method embodiments, which is not repeated here.
In addition, an embodiment of the present invention also provides a computer device, as shown in Fig. 3. The computer device described in this embodiment may be a server, a personal computer, a network device or similar equipment. The computer device includes components such as a processor 302, a memory 303, an input unit 304 and a display unit 305. Those skilled in the art will understand that the device structure shown in Fig. 3 does not limit all devices, which may include more or fewer components than shown, or combine certain components. The memory 303 may be used to store the computer program 301 and the functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to execute the various functional applications and data processing of the device. The memory may be internal memory or external memory, or include both. Internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory or random access memory. External memory may include hard disks, floppy disks, ZIP disks, USB flash drives, magnetic tape, etc. The memory disclosed in the present invention includes, but is not limited to, these types of memory, which are given only as examples and not as limitations.
The input unit 304 is used to receive signal input and to receive keywords entered by the user. The input unit 304 may include a touch panel and other input devices. The touch panel collects the user's touch operations on or near it (for example, operations performed on or near the touch panel with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. The other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse and a joystick. The display unit 305 may be used to display information entered by the user, information provided to the user and the various menus of the computer device, and may take the form of a liquid crystal display, an organic light-emitting diode display or the like. The processor 302 is the control center of the computer device; it connects the various parts of the whole computer through various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 303 and calling the data stored in the memory.
In one embodiment, the computer device includes one or more processors 302, a memory 303 and one or more computer programs 301, wherein the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to perform the intelligent audio corpus cleaning method described in any of the above embodiments.
The computer device provided by the present invention makes it possible to dynamically match the translated text against the correct text and thereby accurately find the correct clause text corresponding to the translated text. Intelligent matching replaces manual checking, so the audio corpus is cleaned automatically; compared with manual processing, this significantly reduces processing cost, improves work efficiency, markedly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with silence suppression quickly yields multiple audio fragments split at sentence pauses. Calculating the similarity between the translated text and the clause texts with the minimum edit distance algorithm, and determining the correct clause text from that similarity, gives fast fuzzy matching of sentences with a simple algorithm and high matching efficiency, which further improves the efficiency of audio corpus cleaning. Finally, performing secondary cleaning on correct clause texts whose similarity to the translated text is low ensures the accuracy of the cleaned audio corpus.
The computer device provided by this embodiment of the present invention can implement the method embodiments provided above; for its specific functions, refer to the description of the method embodiments, which is not repeated here.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An intelligent audio corpus cleaning method, characterized by comprising the following steps:
Obtaining target audio data and the correct text corresponding to the target audio data;
Segmenting the target audio data to obtain segmented audio fragments; obtaining, from the audio fragments, the translated text corresponding to each audio fragment based on a preset speech-to-text conversion method;
Dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text;
Obtaining the cleaned audio corpus from the audio fragments and the correct clause texts.
2. The intelligent audio corpus cleaning method according to claim 1, characterized in that dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text comprises:
Segmenting the correct text to obtain the clause texts of the segmented correct text;
Dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text.
3. The intelligent audio corpus cleaning method according to claim 2, characterized in that dynamically matching the translated text against the clause texts and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text comprises:
Calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text;
Determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from the preset number of clause texts.
4. The intelligent audio corpus cleaning method according to claim 3, characterized in that, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the method further comprises:
Obtaining the text serial number of the translated text;
Obtaining, according to the text serial number, the clause text whose text serial number corresponds to the translated text, together with at least one clause text before it and at least one clause text after it;
Setting the clause text whose text serial number corresponds to the translated text, together with at least one clause text before it and at least one clause text after it, as the preset number of clause texts used for calculating the similarity between the translated text and each of the preset number of clause texts in the correct text.
5. The intelligent audio corpus cleaning method according to claim 3, characterized in that, after determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from the preset number of clause texts, the method further comprises:
Obtaining the similarity between the translated text and the correct clause text;
If the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning on the correct clause text.
6. The intelligent audio corpus cleaning method according to claim 1, characterized in that segmenting the target audio data to obtain segmented audio fragments comprises:
Segmenting the target audio data using silence suppression to obtain multiple audio fragments split at sentence pauses.
7. The intelligent audio corpus cleaning method according to claim 1, characterized in that, before segmenting the target audio data to obtain segmented audio fragments, the method further comprises:
Preprocessing the target audio data, the preprocessed target audio data being used for the segmentation.
8. An intelligent audio corpus cleaning device, characterized by comprising:
An audio corpus acquiring unit, configured to obtain target audio data and the correct text corresponding to the target audio data;
A translated text acquiring unit, configured to segment the target audio data to obtain segmented audio fragments, and to obtain, from the audio fragments, the translated text corresponding to each audio fragment based on a preset speech-to-text conversion method;
A dynamic matching unit, configured to dynamically match the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text;
A corpus cleaning unit, configured to obtain the cleaned audio corpus from the audio fragments and the correct clause texts.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the intelligent audio corpus cleaning method according to any one of claims 1 to 7 is implemented.
10. A computer device, characterized by comprising:
One or more processors;
A memory;
One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform the intelligent audio corpus cleaning method according to any one of claims 1 to 7.
CN201811512398.1A 2018-12-11 2018-12-11 Audio corpus intelligence cleaning method, device, storage medium and computer equipment Pending CN109830229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811512398.1A CN109830229A (en) 2018-12-11 2018-12-11 Audio corpus intelligence cleaning method, device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN109830229A true CN109830229A (en) 2019-05-31

Family

ID=66859860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811512398.1A Pending CN109830229A (en) 2018-12-11 2018-12-11 Audio corpus intelligence cleaning method, device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN109830229A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination