CN109830229A

CN109830229A - Intelligent audio corpus cleaning method, apparatus, storage medium and computer device

Info

Publication number: CN109830229A
Application number: CN201811512398.1A
Authority: CN
Inventors: 贾克尧; 陈磊
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2019-05-31

Abstract

The present invention provides an intelligent audio corpus cleaning method, comprising: obtaining target audio data and the correct text corresponding to the target audio data; segmenting the target audio data to obtain segmented audio segments; obtaining, from the audio segments and based on a preset speech-to-text conversion method, the translated text (the speech-to-text output) corresponding to each audio segment; dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text; and obtaining the cleaned audio corpus from the audio segments and the correct clause texts. The method replaces manual checking with intelligent matching. Compared with manual processing, it significantly reduces processing cost, improves work efficiency, significantly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus.

Description

Intelligent audio corpus cleaning method, apparatus, storage medium and computer device

Technical field

The present invention relates to the field of computer technology, and in particular to an intelligent audio corpus cleaning method, an apparatus, a computer-readable storage medium and a computer device.

Background art

With the rapid development of artificial intelligence technology, speech recognition is increasingly applied in all kinds of application software and devices, bringing convenience to people's work, study and daily life. Speech recognition requires large voice and text corpora, and the degree of correspondence between text and audio affects the accuracy of the recognition results.

At present, corpora are obtained mainly by manual annotation. The limitation is that corpora obtained by manual annotation or similar means are prone to problems such as extra characters, missing characters or wrong characters. Manual processing is also inefficient, has a relatively high error rate, and incurs high labor cost.

Summary of the invention

To solve at least one of the above technical defects, the present invention provides the following technical solution: an intelligent audio corpus cleaning method and a corresponding apparatus, computer-readable storage medium and computer device.

According to one aspect, an embodiment of the present invention provides an intelligent audio corpus cleaning method, comprising the following steps:

obtaining target audio data and the correct text corresponding to the target audio data;

segmenting the target audio data to obtain segmented audio segments; and obtaining, from the audio segments and based on a preset speech-to-text conversion method, the translated text corresponding to each audio segment;

dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text;

obtaining the cleaned audio corpus from the audio segments and the correct clause texts.

Preferably, dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text comprises:

segmenting the correct text to obtain the clause texts of the segmented correct text;

dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text.

Further, dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text, comprises:

calculating, based on a minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text;

determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from among the preset number of clause texts.

Further, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the method further comprises:

obtaining the text serial number of the translated text;

obtaining, according to the text serial number, the clause text whose text serial number corresponds to that of the translated text, together with at least one clause text before it and at least one clause text after it;

setting the clause text whose text serial number corresponds to that of the translated text, together with the at least one clause text before and after it, as the preset number of clause texts used for calculating the similarity between the translated text and each of the preset number of clause texts in the correct text.

Preferably, after determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from among the preset number of clause texts, the method further comprises:

obtaining the similarity between the translated text and the correct clause text;

if the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning on the correct clause text.

Preferably, segmenting the target audio data to obtain segmented audio segments comprises:

segmenting the target audio data using silence suppression to obtain multiple audio segments split at sentence pause points.

Preferably, before segmenting the target audio data to obtain segmented audio segments, the method further comprises:

preprocessing the target audio data, the preprocessed target audio data being used for the segmentation.

In addition, according to another aspect, an embodiment of the present invention provides an intelligent audio corpus cleaning apparatus, comprising:

an audio corpus acquiring unit, configured to obtain target audio data and the correct text corresponding to the target audio data;

a translated text acquiring unit, configured to segment the target audio data to obtain segmented audio segments, and to obtain, from the audio segments and based on a preset speech-to-text conversion method, the translated text corresponding to each audio segment;

a dynamic matching unit, configured to dynamically match the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text;

a corpus cleaning unit, configured to obtain the cleaned audio corpus from the audio segments and the correct clause texts.

According to yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above intelligent audio corpus cleaning method.

According to a further aspect, an embodiment of the present invention provides a computer device comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the above intelligent audio corpus cleaning method.

Compared with the prior art, the present invention has the following beneficial effects:

The intelligent audio corpus cleaning method, apparatus, computer-readable storage medium and computer device provided by the present invention dynamically match the translated text against the correct text and accurately find the correct clause text corresponding to the translated text. Intelligent matching thereby replaces manual checking, which, compared with manual processing, significantly reduces processing cost, improves work efficiency, significantly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus.

In addition, by segmenting the target audio data using silence suppression, multiple audio segments split at sentence pause points can be obtained quickly. By calculating the similarity between the translated text and the clause texts based on the minimum edit distance algorithm, and determining the correct clause text corresponding to the translated text according to the similarity, fast fuzzy matching of sentences is achieved with a simple algorithm and high matching efficiency, which further improves the efficiency of audio corpus cleaning. By performing secondary cleaning on correct clause texts whose similarity to the translated text is low, the accuracy of the cleaned audio corpus is ensured.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of the intelligent audio corpus cleaning method provided by an embodiment of the present invention;

Fig. 2 is a structural schematic diagram of the intelligent audio corpus cleaning apparatus provided by an embodiment of the present invention;

Fig. 3 is a structural schematic diagram of the computer device provided by an embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and shall not be construed as limiting the claims.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural. It should further be understood that the word "comprising" used in the specification of the present invention means that the stated features, integers, steps, operations, elements and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. The wording "and/or" used herein includes all or any unit of, and all combinations of, one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.

In application scenarios of speech recognition technology, speech recognition requires large voice and text corpora. The degree of correspondence between the text corpus and the audio corpus affects the accuracy of the recognition results: the higher the correspondence, the higher the accuracy. To guarantee the accuracy of an audio corpus comprising voice and text, the audio corpus usually needs to be cleaned.

An embodiment of the present invention provides an intelligent audio corpus cleaning method. As shown in Fig. 1, the method comprises:

Step S101: obtain target audio data and the correct text corresponding to the target audio data.

For this embodiment, the target audio data is the audio data on which corpus cleaning is to be performed, and may specifically be an audio file in .mp3 format.

The correct text corresponding to the target audio data is a text whose content is fully consistent with the spoken content of the target audio data, and may specifically be a text in .txt format.

For example, the target audio data to be cleaned is a voice recording of an article being read aloud, and the correct text is the text of that article.

Step S102: segment the target audio data to obtain segmented audio segments; and obtain, from the audio segments and based on a preset speech-to-text conversion method, the translated text corresponding to each audio segment.

For this embodiment, a preset audio segmentation technique, such as silence suppression, is used to segment the target audio data. Based on the segmentation principle of the preset technique, the target audio data is split into several pieces, yielding multiple segmented audio segments. The audio segments may be given audio file serial numbers in segmentation order: for example, the first audio segment obtained has audio file serial number #1, the second has #2, and so on.

For this embodiment, a preset speech-to-text conversion tool, such as iFlytek, Baidu AI or another open-source tool capable of speech-to-text conversion, is used to convert the audio data into written text. Specifically, each of the segmented audio segments is converted by a speech-to-text algorithm into a corresponding translated text. Each translated text may be given a translated text serial number matching the audio file serial number of its audio segment: for example, the translated text of the first audio segment has translated text serial number #1, the translated text of the second audio segment has #2, and so on.

Using the audio file serial numbers and the translated text serial numbers, each audio file can be put in one-to-one correspondence with its translated text.

In practical application scenarios, because a speech-to-text algorithm cannot guarantee 100% conversion accuracy, the content of a translated text produced by the preset speech-to-text conversion tool may be inconsistent with its corresponding audio segment.

Step S103: dynamically match the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text.

For this embodiment, the translated texts may be dynamically matched against the correct text one by one, in order of their translated text serial numbers. Specifically, the content of a translated text is matched one by one against each clause in the correct text, the clause that matches the translated text is determined from the correct text, and that clause is taken as the correct clause text corresponding to the translated text.

Step S104: obtain the cleaned audio corpus from the audio segments and the correct clause texts.

For this embodiment, the correct clause text is the clause text in the correct text that machine-based intelligent cleaning has determined to match the translated text. Because each translated text corresponds to an audio segment, the correct clause text corresponding to the translated text also matches that audio segment and is the correct clause text for that audio segment.

For this embodiment, the original audio corpus of the target audio data comprises the audio segments in speech form and the corresponding translated texts in written form. Replacing each translated text with the correct clause text obtained in step S103 yields the audio segments paired with the correct clause texts from the correct text corresponding to the target audio data. The cleaned audio corpus thus comprises the audio segments in speech form and the corresponding correct clause texts in written form. In the cleaned audio corpus, the content consistency between the written correct clause texts and the corresponding spoken audio segments is significantly improved.
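As a rough illustration of step S104, the following Python sketch pairs each audio segment with the clause that was matched for it. The function name, the dictionary layout keyed by serial number, and the file names are assumptions made for this example; the patent does not prescribe a data structure.

```python
from typing import Dict, List, Tuple


def build_cleaned_corpus(audio_segments: Dict[int, str],
                         matched_clauses: Dict[int, str]) -> List[Tuple[str, str]]:
    """Pair each audio segment with the clause matched for the same serial number.

    audio_segments:  serial number -> audio file path (hypothetical layout)
    matched_clauses: serial number -> clause text chosen by dynamic matching
    Segments without a matched clause are left out of the cleaned corpus.
    """
    corpus = []
    for serial in sorted(audio_segments):
        if serial in matched_clauses:
            corpus.append((audio_segments[serial], matched_clauses[serial]))
    return corpus
```

Keying both inputs by serial number mirrors the one-to-one correspondence between audio files and their speech-to-text output described in step S102.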

The intelligent audio corpus cleaning method provided by the present invention dynamically matches the translated text against the correct text and accurately finds the correct clause text corresponding to the translated text. Intelligent matching thereby replaces manual checking, which, compared with manual processing, significantly reduces processing cost, improves work efficiency, significantly improves the correspondence between text and audio, and ensures the accuracy of the audio corpus.

In one embodiment, segmenting the target audio data to obtain segmented audio segments comprises: segmenting the target audio data using silence suppression to obtain multiple audio segments split at sentence pause points.

In practical application scenarios, a reader reading a text aloud usually pauses at punctuation marks, so silence suppression can be used to segment the voice stream, i.e., the target audio data, at the silences, yielding multiple segmented audio segments.

For this embodiment, silence suppression (Voice Activity Detection, VAD) is also called speech endpoint detection. Speech endpoint detection detects valid speech segments in a continuous voice stream. Specifically, a valid speech segment is obtained by detecting the starting point of valid speech (the front endpoint) and its end point (the rear endpoint).

Silence suppression can segment the target audio data in several ways, for example based on energy, on signal-to-noise ratio (SNR), or on a deep neural network (DNN).

Take energy-based segmentation as an example. Audio is a one-dimensional continuous function with time as the independent variable, and the audio data processed by a computer is a time-ordered sequence of sampled values whose magnitudes represent the energy of the audio at the sampling points. In silent parts, i.e., when the reader pauses while reading the article, the sound wave amplitude is very small; in valid speech parts, i.e., while the article is being read, the amplitude is larger. Because signal energy increases with the amplitude of the sound wave, the signal energy of silent parts and of valid speech parts differs in magnitude. Based on this difference, silence suppression can determine whether the audio data at the current moment is a silent part or the front or rear endpoint of a clause, thereby segmenting the target audio data into multiple audio segments.
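The energy-based idea can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the frame length, the energy threshold and the number of silent frames that close a segment are all assumed values, and `split_on_silence` is a hypothetical name.

```python
from typing import List, Tuple


def split_on_silence(samples: List[float], frame_len: int = 4,
                     energy_threshold: float = 0.01,
                     min_silence_frames: int = 2) -> List[Tuple[int, int]]:
    """Return (front endpoint, rear endpoint) sample indices of voiced segments.

    A frame counts as voiced when its mean squared amplitude exceeds
    energy_threshold. A segment is closed once min_silence_frames
    consecutive silent frames have been seen.
    """
    segments = []
    start = None         # front endpoint of the segment being built, if any
    last_voiced_end = 0  # index just past the most recent voiced frame
    silent_run = 0       # consecutive silent frames inside a segment
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            if start is None:
                start = i
            last_voiced_end = i + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                segments.append((start, last_voiced_end))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, last_voiced_end))
    return segments
```

Requiring several silent frames in a row keeps short dips in energy inside a word from splitting a segment. Real recordings would need thresholds tuned to the noise floor.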

In this embodiment, segmenting the target audio data using silence suppression quickly yields multiple audio segments split at sentence pause points, which improves the efficiency of audio corpus cleaning.

In one embodiment, before segmenting the target audio data to obtain segmented audio segments, the method further comprises: preprocessing the target audio data, the preprocessed target audio data being used for the segmentation.

For this embodiment, the preprocessing comprises DC removal and windowing. Specifically, a high-pass filter is used to remove the DC component from the target audio data, and windowing is applied to the target audio data to obtain the preprocessed target audio data, which is then used for segmentation.

In this embodiment, applying the two preprocessing operations of DC removal and windowing to the target audio data improves the accuracy of detecting the front and rear endpoints of valid speech in the voice stream, and thus the accuracy of audio data segmentation.
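The two preprocessing steps might look as follows in Python. The source names only "a high-pass filter" and "windowing", so the specific choices here are assumptions: a first-order DC-blocking filter with coefficient `r`, and a Hamming window applied to overlapping frames. All function names are hypothetical.

```python
import math
from typing import List


def remove_dc(samples: List[float], r: float = 0.995) -> List[float]:
    """First-order high-pass DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def hamming_window(n: int) -> List[float]:
    """Hamming window coefficients w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into frames of frame_len samples, hop apart, and window each."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

A constant offset fed through `remove_dc` decays toward zero, which is the property that matters before comparing frame energies against a silence threshold.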

In one embodiment, dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text that corresponds to the translated text comprises: segmenting the correct text to obtain clause texts; and dynamically matching the translated text against the clause texts to determine, from the clause texts, the correct clause text that corresponds to the translated text.

For this embodiment, segmenting the correct text specifically means splitting the content of the correct text at punctuation marks, such as "，", "；" and "。", to obtain multiple clause texts, one for each clause in the correct text. The clause texts may be given clause text serial numbers in segmentation order: for example, the first clause text obtained has clause text serial number #1, the second has #2, and so on.
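A minimal Python sketch of splitting the reference text at punctuation and numbering the pieces from #1 is shown below. The delimiter set is an assumption (the source gives only a few examples ending in "etc."), and `split_into_clauses` is a hypothetical name.

```python
import re
from typing import Dict

# Full-width and ASCII clause delimiters. The exact set is an assumption;
# note that treating "." as a delimiter would also split decimal numbers.
_DELIMITERS = r"[，,；;。.！!？?]"


def split_into_clauses(correct_text: str) -> Dict[int, str]:
    """Split the text at punctuation and number the clauses starting from 1."""
    parts = re.split(_DELIMITERS, correct_text)
    clauses = [p.strip() for p in parts if p.strip()]
    return {serial: clause for serial, clause in enumerate(clauses, start=1)}
```

Empty pieces, such as the one after a trailing full stop, are dropped so that serial numbers stay contiguous.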

For this embodiment, the translated texts may be dynamically matched, one by one and in order of their translated text serial numbers, against the clause texts obtained by segmenting the correct text. Specifically, the content of a translated text is matched one by one against the content of the clause texts, the clause that matches the translated text is determined from among them, and the clause text of that clause is taken as the correct clause text corresponding to the translated text.

In one embodiment, dynamically matching the translated text against the clause texts, and determining from the clause texts the correct clause text in the correct text that corresponds to the translated text, comprises: calculating, based on a minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text; and determining, according to the similarity, the correct clause text from among the preset number of clause texts.

For this embodiment, the minimum edit distance between two sentences is the minimum number of edit operations required to transform one sentence into the other, where the edit operations are of three kinds: insertion, deletion and replacement. The minimum edit distance algorithm works as follows: using these three edit operations, the clause text currently being matched can be converted into the translated text by one or more conversion sequences, and the smallest number of operations among those sequences is kept as the minimum edit distance.

After the minimum edit distance between the translated text and the clause text currently being matched has been calculated, the similarity between the two is calculated from that distance.
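The minimum edit distance with insertion, deletion and replacement is commonly computed by dynamic programming (the Levenshtein distance). The sketch below is one standard formulation, assuming unit cost for each operation; the source does not specify how the distance is computed, and `edit_distance` is a hypothetical name.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and replacements turning source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement or match
        previous = current
    return previous[-1]
```

Keeping only two rows of the table bounds memory by the length of the target string, which is convenient when many clause candidates are compared in turn.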

The similarity is calculated as:

Similarity = 1 - ld / Math.Max(str1.length, str2.length)

where ld is the minimum edit distance between the translated text and the current clause text, and Math.Max(str1.length, str2.length) is the larger of the string lengths of the translated text and the clause text.
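The formula can be written directly as a small Python function that takes an already computed edit distance and the two string lengths. Returning 1.0 when both strings are empty is an assumption added here to avoid division by zero; the source does not address that case.

```python
def similarity(ld: int, len1: int, len2: int) -> float:
    """Similarity = 1 - ld / max(len1, len2), for a given minimum edit distance ld."""
    longest = max(len1, len2)
    if longest == 0:
        return 1.0  # assumed convention: two empty strings are identical
    return 1 - ld / longest
```

For instance, two six-character strings that differ by one replaced character have ld = 1 and a similarity of 1 - 1/6, roughly 0.83.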

For this embodiment, determining the correct clause text in the correct text that corresponds to the translated text according to the similarity specifically means setting the clause text with the highest similarity among the preset number of clause texts as the correct clause text for the current translated text. That is, among the preset number of clause texts, the one with the highest similarity to the translated text is its correct clause text.
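Choosing the highest-scoring candidate can be sketched as below. The input is assumed to be a mapping from clause serial number to an already computed similarity score; the function name and this layout are illustrative only.

```python
from typing import Dict, Tuple


def pick_correct_clause(similarities: Dict[int, float]) -> Tuple[int, float]:
    """Return (serial number, similarity) of the best-scoring candidate clause."""
    if not similarities:
        raise ValueError("no candidate clause texts")
    best_serial = max(similarities, key=lambda serial: similarities[serial])
    return best_serial, similarities[best_serial]
```

Returning the score together with the serial number lets a later step compare it against a threshold without recomputing it.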

For this embodiment, the preset number of clause texts may be all clause texts obtained by segmenting the correct text, in which case the translated text is matched one by one against all clause texts based on the minimum edit distance algorithm. The preset number of clause texts may also be a selected subset of the clause texts obtained by segmenting the correct text, in which case the translated text is matched one by one against the selected clause texts. The preset number may be any positive integer, such as 1, 3, 5 or 10; those skilled in the art may determine or adjust its value according to practical needs, and this embodiment places no limit on it.

In this embodiment, calculating the similarity between the translated text and the clause texts based on the minimum edit distance algorithm, and determining the correct clause text corresponding to the translated text according to the similarity, achieves fast fuzzy matching of sentences with a simple algorithm and high matching efficiency, which further improves the efficiency of audio corpus cleaning.

In one embodiment, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the method further comprises:

obtaining the text serial number of the translated text; obtaining, according to the text serial number, the clause text whose text serial number corresponds to that of the translated text, together with at least one clause text before and after it; and setting these clause texts as the preset number of clause texts.

For this embodiment, the translated texts and the clause texts are given text serial numbers in advance: each translated text has a translated text serial number, and each clause text has a clause text serial number.

As described above, the audio segment corresponding to a translated text is obtained by segmenting the target audio data with a preset audio segmentation technique, such as silence suppression, and the translated text serial number follows the audio file serial number of the corresponding audio segment. The clause texts are obtained by splitting the content of the correct text at punctuation marks, and their clause text serial numbers follow the segmentation order of the correct text.

In practical application scenarios, a reader usually pauses at punctuation marks, so the segmentation principle of the audio segments and that of the clause texts can be regarded as the same. Provided the reader pauses correctly and the audio segmentation technique is highly accurate, a translated text and a clause text with the same text serial number should correspond to the same content in the target audio data. In other words, since both the audio segments and the clause texts are split at punctuation pauses, the translated text of an audio segment should correspond to the clause text with the same text serial number.

For this embodiment, to further reduce the set of texts to be dynamically matched, the translated text of an audio segment may be dynamically matched against only the clause text with the corresponding text serial number and a preset number of clause texts before and after it.

The preset number may be any positive integer, such as 1, 3, 5 or 10; those skilled in the art may determine or adjust its value according to practical needs, and this embodiment places no limit on it.

For example, if the text serial number of the translated text to be matched is #6, the translated text may be dynamically matched one by one against clause text #6 and the two clause texts on either side of it, i.e., clause texts #4, #5, #7 and #8.
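The window of candidates around a serial number can be computed as in this Python sketch. Clamping the window to the range 1 to the total number of clauses is an assumption for texts near the start or end; the source only gives the interior example for #6. The function and parameter names are hypothetical.

```python
from typing import List


def candidate_serials(text_serial: int, neighbours: int, total_clauses: int) -> List[int]:
    """Serial numbers of the clause with the same serial number plus
    `neighbours` clauses on each side, clamped to the valid range 1..total_clauses."""
    first = max(1, text_serial - neighbours)
    last = min(total_clauses, text_serial + neighbours)
    return list(range(first, last + 1))
```

With two neighbours on each side, serial number #6 yields candidates #4 through #8, matching the example above.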

In this embodiment, dynamically matching the translated text against only the clause text with the corresponding text serial number and a preset number of clause texts before and after it greatly reduces the set of texts to be matched while preserving matching quality, which further improves the efficiency of audio corpus cleaning.

In one embodiment, after determining, according to the similarity, the correct clause text in the correct text that corresponds to the translated text from among the preset number of clause texts, the method further comprises: obtaining the similarity between the translated text and the correct clause text; and, if that similarity is less than a preset threshold, performing secondary cleaning on the correct clause text.

In practical application scenarios, the content of the correct clause text obtained by dynamic matching may differ from the content of the translated text, in which case secondary cleaning of the correct clause text is required.

Whether the content of the correct clause text is consistent with that of the translated text is judged as follows: obtain the similarity between the translated text and the correct clause text, and judge whether the similarity is less than a preset threshold. If the similarity is less than the preset threshold, the content of the correct clause text differs considerably from that of the translated text, and the clause text should undergo secondary cleaning. If the similarity is greater than or equal to the preset threshold, the difference is small, and no secondary cleaning of the clause text is needed.
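The threshold check can be sketched as a routing step that separates accepted matches from those flagged for secondary cleaning. The threshold value 0.8 is purely an assumed example (the source says only "preset threshold"), and the tuple layout and function name are hypothetical.

```python
from typing import List, Tuple

Match = Tuple[int, str, float]  # (serial number, matched clause text, similarity)


def route_matches(matches: List[Match],
                  threshold: float = 0.8) -> Tuple[List[Match], List[Match]]:
    """Split matches into (accepted, needs_secondary_cleaning).

    A match needs secondary cleaning when its similarity is strictly below
    the threshold; a similarity equal to the threshold is accepted.
    """
    accepted = []
    needs_secondary_cleaning = []
    for match in matches:
        if match[2] < threshold:
            needs_secondary_cleaning.append(match)
        else:
            accepted.append(match)
    return accepted, needs_secondary_cleaning
```

Only the second list would be handed on for further cleaning, so the amount of follow-up work depends on where the threshold is set.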

In this embodiment, the secondary cleaning processing is specifically manual cleaning processing, i.e. the current clause text is manually edited, corrected and so on, to obtain the clause text in the correct text that corresponds to a machine-translated text for which matching failed or the matching similarity was too low.

In this embodiment, performing secondary cleaning processing on correct clause texts that have low similarity to the translated text ensures the accuracy of the cleaned audio corpus.

In addition, an embodiment of the present invention provides an intelligent audio corpus cleaning device. As shown in Fig. 2, the device includes an audio corpus acquiring unit 201, a translated text acquiring unit 202, a dynamic matching unit 203 and a corpus cleaning unit 204, wherein:

The audio corpus acquiring unit 201 is configured to obtain target audio data and the correct text corresponding to the target audio data;

The translated text acquiring unit 202 is configured to perform segmentation processing on the target audio data to obtain segmented audio fragments, and to obtain, according to the audio fragments, the translated text corresponding to the audio fragments based on a preset speech-to-text conversion method;

The dynamic matching unit 203 is configured to dynamically match the translated text against the correct text to obtain the correct clause text in the correct text corresponding to the translated text;

The corpus cleaning unit 204 is configured to obtain the cleaned audio corpus according to the audio fragments and the correct clause text.
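How the four units might cooperate can be sketched as below. This is only a structural illustration under stated assumptions: the class name `CleaningDevice` and the three injected callables (`split_audio`, `speech_to_text`, `match_clause`) are hypothetical stand-ins for the segmentation, speech-to-text and dynamic matching steps, none of which is specified at this level of detail in the source.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CleaningDevice:
    # Hypothetical pluggable steps; the source does not name concrete functions.
    split_audio: Callable[[bytes], Sequence[bytes]]  # segmentation (unit 202)
    speech_to_text: Callable[[bytes], str]           # speech-to-text (unit 202)
    match_clause: Callable[[str, str], str]          # dynamic matching (unit 203)

    def clean(self, audio: bytes, correct_text: str) -> list[tuple[bytes, str]]:
        """Pair each audio fragment with its matched correct clause text.

        `audio` and `correct_text` are what unit 201 would supply; the
        returned pairs correspond to the cleaned corpus of unit 204.
        """
        corpus = []
        for fragment in self.split_audio(audio):
            translated = self.speech_to_text(fragment)
            clause = self.match_clause(translated, correct_text)
            corpus.append((fragment, clause))
        return corpus
```

Passing the steps in as callables keeps the sketch independent of any particular speech recognition engine or matching algorithm.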

In one embodiment, the dynamic matching unit 203 is specifically configured to:

Obtain the text serial number of the translated text;

The intelligent audio corpus cleaning device provided by the present invention can dynamically match the translated text against the correct text and accurately find the correct clause text corresponding to the translated text. Intelligent matching thus replaces manual checking in audio corpus cleaning, which significantly reduces processing cost compared with manual processing, improves work efficiency, significantly improves the degree of correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with a silence suppression technique quickly yields multiple audio fragments segmented at sentence pause points. Calculating the similarity between the translated text and the clause texts with a minimum edit distance algorithm, and determining the correct clause text corresponding to the translated text according to that similarity, achieves fast fuzzy matching of sentences; the algorithm is simple and the matching efficiency is high, which further improves the efficiency of audio corpus cleaning. Performing secondary cleaning processing on correct clause texts that have low similarity to the translated text further ensures the accuracy of the cleaned audio corpus.
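The edit-distance matching mentioned above can be sketched as follows. The distance itself is the standard Levenshtein dynamic program; normalising it by the longer string's length to obtain a similarity in [0, 1] is an assumption, since the source does not state how distance is converted to similarity. The function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_match(translated: str, clauses: list[str]) -> tuple[int, float]:
    """Return (index, similarity) of the most similar clause; clauses must be non-empty."""
    scores = [similarity(translated, c) for c in clauses]
    best = max(range(len(clauses)), key=scores.__getitem__)
    return best, scores[best]
```

For instance, `best_match("hello word", ["goodbye", "hello world", "hi"])` selects index 1, because only one character has to be inserted to reach "hello world".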

The intelligent audio corpus cleaning device provided by the embodiment of the present invention can implement the method embodiments provided above; for the specific functional implementation, refer to the description in the method embodiments, which is not repeated here.

In addition, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the intelligent audio corpus cleaning method described in the above embodiments is implemented. The computer-readable storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a storage device includes any medium that stores or transmits information in a form readable by a device (for example, a computer or a mobile phone), and may be a read-only memory, a magnetic disk, an optical disk or the like.

The computer-readable storage medium provided by the present invention can dynamically match the translated text against the correct text and accurately find the correct clause text corresponding to the translated text. Intelligent matching thus replaces manual checking in audio corpus cleaning, which significantly reduces processing cost compared with manual processing, improves work efficiency, significantly improves the degree of correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with a silence suppression technique quickly yields multiple audio fragments segmented at sentence pause points. Calculating the similarity between the translated text and the clause texts with a minimum edit distance algorithm, and determining the correct clause text corresponding to the translated text according to that similarity, achieves fast fuzzy matching of sentences; the algorithm is simple and the matching efficiency is high, which further improves the efficiency of audio corpus cleaning. Performing secondary cleaning processing on correct clause texts that have low similarity to the translated text further ensures the accuracy of the cleaned audio corpus.
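Segmentation at sentence pause points can be illustrated with a simple energy threshold. This is a stand-in written for illustration, not the silence suppression technique the invention relies on, whose details are not given here; the function name `split_on_silence`, the frame length, the energy threshold and the minimum pause length are all assumed values.

```python
def split_on_silence(samples, frame_len=160, energy_threshold=0.01,
                     min_silence_frames=3):
    """Return (start, end) sample index pairs of voiced segments.

    A frame is treated as silent when its mean squared amplitude is below
    energy_threshold; a run of min_silence_frames silent frames ends the
    current segment. `end` is exclusive.
    """
    segments = []
    start = None      # first sample of the segment being built, if any
    silent_run = 0    # consecutive silent frames seen inside a segment
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold:
            if start is None:
                start = f * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment just after the last voiced frame.
                segments.append((start, (f - silent_run + 1) * frame_len))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended inside a segment; drop any short trailing silence.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

Each returned pair can be used to slice the sample sequence into one audio fragment per sentence-like unit.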

The computer-readable storage medium provided by the embodiment of the present invention can implement the method embodiments provided above; for the specific functional implementation, refer to the description in the method embodiments, which is not repeated here.

In addition, an embodiment of the present invention also provides a computer device, as shown in Fig. 3. The computer device described in this embodiment may be a server, a personal computer, a network device or similar equipment. The computer device includes a processor 302, a memory 303, an input unit 304, a display unit 305 and other components. Those skilled in the art will understand that the device structure shown in Fig. 3 does not limit all devices; a device may include more or fewer components than shown, or combine certain components. The memory 303 can be used to store the computer program 301 and the functional modules, and the processor 302 runs the computer program 301 stored in the memory 303 to execute the various functional applications and data processing of the device. The memory may be an internal memory or an external memory, or include both. The internal memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape and the like. The memory disclosed in the present invention includes, but is not limited to, these types of memory, which are given only as examples and not as limitations.

The input unit 304 is used to receive signal input and to receive keywords entered by the user. The input unit 304 may include a touch panel and other input devices. The touch panel collects the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. The other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as playback control keys and switch keys), a trackball, a mouse and a joystick. The display unit 305 can be used to display information entered by the user or provided to the user, as well as the various menus of the computer device. The display unit 305 may take the form of a liquid crystal display, an organic light-emitting diode display or the like. The processor 302 is the control center of the computer device; it connects the various parts of the entire computer through various interfaces and lines, and performs various functions and processes data by running or executing the software programs and/or modules stored in the memory 303 and calling the data stored in the memory.

As one embodiment, the computer device includes one or more processors 302, a memory 303 and one or more computer programs 301, wherein the one or more computer programs 301 are stored in the memory 303 and configured to be executed by the one or more processors 302, and the one or more computer programs 301 are configured to perform the intelligent audio corpus cleaning method described in any of the above embodiments.

The computer device provided by the present invention can dynamically match the translated text against the correct text and accurately find the correct clause text corresponding to the translated text. Intelligent matching thus replaces manual checking in audio corpus cleaning, which significantly reduces processing cost compared with manual processing, improves work efficiency, significantly improves the degree of correspondence between text and audio, and ensures the accuracy of the audio corpus. In addition, segmenting the target audio data with a silence suppression technique quickly yields multiple audio fragments segmented at sentence pause points. Calculating the similarity between the translated text and the clause texts with a minimum edit distance algorithm, and determining the correct clause text corresponding to the translated text according to that similarity, achieves fast fuzzy matching of sentences; the algorithm is simple and the matching efficiency is high, which further improves the efficiency of audio corpus cleaning. Performing secondary cleaning processing on correct clause texts that have low similarity to the translated text further ensures the accuracy of the cleaned audio corpus.

The computer device provided by the embodiment of the present invention can implement the method embodiments provided above; for the specific functional implementation, refer to the description in the method embodiments, which is not repeated here.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims

1. An intelligent audio corpus cleaning method, characterized by comprising the following steps:

Performing segmentation processing on the target audio data to obtain segmented audio fragments; obtaining, according to the audio fragments, the translated text corresponding to the audio fragments based on a preset speech-to-text conversion method;

Dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text corresponding to the translated text;

2. The intelligent audio corpus cleaning method according to claim 1, characterized in that dynamically matching the translated text against the correct text to obtain the correct clause text in the correct text corresponding to the translated text comprises:

Dynamically matching the translated text against the clause texts, and determining, from the clause texts, the correct clause text in the correct text corresponding to the translated text.

3. The intelligent audio corpus cleaning method according to claim 2, characterized in that dynamically matching the translated text against the clause texts and determining, from the clause texts, the correct clause text in the correct text corresponding to the translated text comprises:

Calculating, based on a minimum edit distance algorithm, the similarity between the translated text and each of a preset number of clause texts in the correct text;

Determining, according to the similarity, the correct clause text in the correct text corresponding to the translated text from the preset number of clause texts.

4. The intelligent audio corpus cleaning method according to claim 3, characterized in that, before calculating, based on the minimum edit distance algorithm, the similarity between the translated text and each of the preset number of clause texts in the correct text, the method further comprises:

Obtaining the text serial number of the translated text;

Obtaining, according to the text serial number, the clause text whose text serial number corresponds to the translated text, together with at least one clause text before and at least one clause text after that clause text;

Setting the clause text whose text serial number corresponds to the translated text, together with the at least one clause text before and after it, as the preset number of clause texts, for use in calculating the similarity between the translated text and each of the preset number of clause texts in the correct text.

5. The intelligent audio corpus cleaning method according to claim 3, characterized in that, after determining, according to the similarity, the correct clause text in the correct text corresponding to the translated text from the preset number of clause texts, the method further comprises:

If the similarity between the translated text and the correct clause text is less than a preset threshold, performing secondary cleaning processing on the correct clause text.

6. The intelligent audio corpus cleaning method according to claim 1, characterized in that performing segmentation processing on the target audio data to obtain segmented audio fragments comprises:

Performing segmentation processing on the target audio data using a silence suppression technique to obtain multiple audio fragments segmented at sentence pause points.

7. The intelligent audio corpus cleaning method according to claim 1, characterized in that, before performing segmentation processing on the target audio data to obtain segmented audio fragments, the method further comprises:

8. An intelligent audio corpus cleaning device, characterized by comprising:

An audio corpus acquiring unit, configured to obtain target audio data and the correct text corresponding to the target audio data;

A translated text acquiring unit, configured to perform segmentation processing on the target audio data to obtain segmented audio fragments, and to obtain, according to the audio fragments, the translated text corresponding to the audio fragments based on a preset speech-to-text conversion method;

A dynamic matching unit, configured to dynamically match the translated text against the correct text to obtain the correct clause text in the correct text corresponding to the translated text;

A corpus cleaning unit, configured to obtain the cleaned audio corpus according to the audio fragments and the correct clause text.

9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the intelligent audio corpus cleaning method according to any one of claims 1 to 7 is implemented.

10. A computer device, characterized by comprising:

One or more processors;

A memory;

One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to perform the intelligent audio corpus cleaning method according to any one of claims 1 to 7.