CN107239547B - Voice error correction method, terminal, and storage medium for ordering songs by voice

Info

Publication number: CN107239547B
Application number: CN201710414657.6A
Authority: CN
Inventors: 马雅茹; 刘昕; 苟津川; 雷宇
Original assignee: Beijing Rubo Technology Co Ltd
Current assignee: Beijing Rubu Technology Co.,Ltd.
Priority date: 2017-06-05
Filing date: 2017-06-05
Publication date: 2019-05-28
Anticipated expiration: 2037-06-05
Also published as: CN107239547A

Abstract

The invention discloses a voice error correction method, a terminal, and a storage medium for ordering songs by voice. The method comprises: matching a speech recognition result against the information in a preset music dictionary, wherein the preset music dictionary stores attribute information of music resources and the correspondences among that attribute information; obtaining, from the preset music dictionary, attribute information that matches the song information in the speech recognition result; judging, according to the matched attribute information, whether the song information contains an error; and, if an error exists, correcting the song information according to the matched attribute information. By using the preset music dictionary to correct the speech recognition result, the invention avoids music resource retrieval errors caused by speech recognition errors and improves the success rate of the music service.

Description

Voice error correction method, terminal, and storage medium for ordering songs by voice

Technical field

Embodiments of the present invention relate to music service technology, and in particular to a voice error correction method, a terminal, and a storage medium for ordering songs by voice.

Background art

As speech recognition technology continues to improve, it is applied more and more widely, and requesting songs by voice fits the usage scenarios of today's users. Compared with requesting songs on a touch screen, ordering songs by voice frees the user from the constraints of an interface and lets the user retrieve and play songs entirely according to personal habit. However, the complexity of human language greatly increases the difficulty of handling such requests: requesting songs in natural language must be flexible and broad enough to accommodate the different speaking styles and request habits of various users.

Ordering songs by voice depends on speech recognition, and whether the recognition is accurate has a great influence on whether the song finally played meets the user's expectations. Existing voice error correction is typically carried out during speech recognition. For example, for a natural-language speech recognition result, the sentence is evaluated, checked for errors, and corrected by analyzing in depth the syntactic information of its words (position, recognition stability), its semantic information (intended meaning of the sentence), and its pragmatic information (contextual coherence), and an optimized sentence is finally output. Such a method is voice error correction in the broad sense: it requires syntactic, semantic, and pragmatic analysis, is complex and time-consuming, and is not suited to ordering songs by voice. At present, no voice error correction method has been proposed specifically for the process of ordering songs by voice.

Summary of the invention

The present invention provides a voice error correction method, a terminal, and a storage medium for ordering songs by voice, which can quickly correct errors in the speech recognition result of a voice song request, avoid music resource retrieval errors caused by speech recognition errors, and improve the success rate of the music service.

In a first aspect, an embodiment of the present invention provides a voice error correction method for ordering songs by voice, comprising:

matching a speech recognition result against the information in a preset music dictionary, wherein the preset music dictionary stores attribute information of music resources and the correspondences among that attribute information;

obtaining, from the preset music dictionary, attribute information that matches song information in the speech recognition result;

judging, according to the matched attribute information, whether the song information contains an error; and

if an error exists, correcting the song information according to the matched attribute information.

Further, matching the speech recognition result against the information in the preset music dictionary comprises:

receiving voice information input by a user;

performing speech recognition on the voice information to obtain the speech recognition result;

performing word segmentation on the speech recognition result to obtain at least one word; and

matching the at least one word against the information in the preset music dictionary.

Further, obtaining, from the preset music dictionary, the attribute information that matches the song information in the speech recognition result comprises:

obtaining, from the preset music dictionary, the attribute information that matches the song information according to the text and the pinyin of the song information.

Further, judging, according to the matched attribute information, whether the song information contains an error comprises:

when there is only one piece of song information, judging whether the matched attribute information includes information that exactly matches the text of the song information;

if so, determining that the song information was recognized correctly; and

if not, determining that the song information contains an error.

Further, correcting the song information according to the matched attribute information comprises:

when there is only one piece of song information:

if there are multiple pieces of matched attribute information and none of them is an exact text match, calculating the similarity between each piece of matched attribute information and the song information, and correcting the song information to the piece with the greatest similarity;

if there is only one piece of matched attribute information and it is not an exact text match, correcting the song information to that matched attribute information.

Further, when there are multiple pieces of song information, for the current piece of song information, judging according to the preset music dictionary whether the attribute information matched by the current song information has a correspondence with the other, correctly recognized song information;

if so, determining that the current song information was recognized correctly; and

if not, determining that the current song information contains an error.

When there are multiple pieces of song information, the song information that contains an error is corrected according to the correctly recognized song information, the attribute information matched by each piece of song information, and the correspondences among the attribute information.

Further, the method also comprises: updating the preset music dictionary according to updated music resources.

Further, after the song information is corrected according to the matched attribute information, the method also comprises: searching for the corresponding song according to the corrected song information.

In a second aspect, an embodiment of the present invention also provides a terminal, the terminal comprising:

one or more processors; and

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice error correction method for ordering songs by voice described in any embodiment of the present invention.

In a third aspect, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice error correction method for ordering songs by voice described in any embodiment of the present invention.

By means of the preset music dictionary, the present invention can quickly correct errors in the speech recognition result of a voice song request, avoid music resource retrieval errors caused by speech recognition errors, and improve the success rate of the music service.

Brief description of the drawings

Fig. 1 is a flowchart of the voice error correction method for ordering songs by voice provided by Embodiment one of the present invention;

Fig. 2 is a structural schematic diagram of a terminal provided by Embodiment three of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of the voice error correction method for ordering songs by voice provided by Embodiment one of the present invention. This embodiment is applicable to ordering songs by voice, and the method can be executed by a terminal having a data processing function. As shown in Fig. 1, the method specifically comprises the following steps:

Step 110: match the speech recognition result against the information in the preset music dictionary, wherein the preset music dictionary stores attribute information of music resources and the correspondences among that attribute information.

The attribute information of a music resource may be the singer name, song title, album name, style, language, and so on. Correspondences exist among these attributes, for example between a song and the singer who performs it, the album it belongs to, its style, and its language. The preset music dictionary therefore also stores the correspondences among the attribute information, and these correspondences can be consulted during error correction to obtain a more accurate result. The preset music dictionary can be built by consolidating all existing music resources and their attribute information; the richer the dictionary, the more accurately the user's intent is understood. The preset music dictionary can be stored on the terminal or on a server.
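The dictionary described above can be pictured as an index from attribute values to the resources they describe. The following is only a minimal sketch under assumptions: the class names `MusicResource` and `MusicDictionary`, the field names, and the sample entries are hypothetical and are not taken from the patent, which does not specify a storage layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicResource:
    """One music resource and its attribute information (hypothetical fields)."""
    title: str
    singers: tuple
    album: str = ""
    style: str = ""
    language: str = ""


class MusicDictionary:
    """Indexes attribute values so that correspondences can be looked up."""

    def __init__(self, resources):
        self._by_value = defaultdict(set)
        for res in resources:
            values = [res.title, *res.singers, res.album, res.style, res.language]
            for value in values:
                if value:  # skip attributes that were left empty
                    self._by_value[value].add(res)

    def attribute_values(self):
        """All attribute values known to the dictionary."""
        return set(self._by_value)

    def related(self, a, b):
        """True if attribute values a and b describe at least one common resource."""
        return bool(self._by_value.get(a, set()) & self._by_value.get(b, set()))


# Illustrative entries only, written in romanized form.
music_dict = MusicDictionary([
    MusicResource("Liang Liang", ("Yang Zongwei", "Zhang Bichen")),
    MusicResource("Christmas Knot", ("Chen Yixun",)),
])
print(music_dict.related("Liang Liang", "Zhang Bichen"))
print(music_dict.related("Christmas Knot", "Yang Zongwei"))
```

Indexing by attribute value keeps the correspondence check to a set intersection, which is one plausible way to make the lookup fast; other layouts (for example a relational table) would serve equally well.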

Step 120: obtain, from the preset music dictionary, attribute information that matches the song information in the speech recognition result.

Song information refers to information related to a music resource and may be attribute information of the music resource, for example a song title, a singer name, or an album name. Matched attribute information may include exact matches and partial matches.

Step 130: judge, according to the matched attribute information, whether the song information contains an error.

In brief, if the matched attribute information does not include information that exactly matches the text of the song information, it can be determined that the song information contains an error. The error may be missing or extra text, or text that has the same pinyin but different characters, among others.

Step 140: if an error exists, correct the song information according to the matched attribute information.

If the song information contains no error, the corresponding song can be searched for directly according to the song information. If the song information contains an error, the corresponding song can be searched for according to the corrected song information after the correction, thereby completing the voice song request.

By means of the preset music dictionary, the technical solution of this embodiment can quickly correct errors in the speech recognition result of a voice song request, avoid music resource retrieval errors caused by speech recognition errors, and improve the success rate of the music service.

On the basis of the above technical solution, preferably, matching the speech recognition result against the information in the preset music dictionary in step 110 may comprise: receiving voice information input by the user; performing speech recognition on the voice information to obtain the speech recognition result; performing word segmentation on the speech recognition result to obtain at least one word; and matching the at least one word against the information in the preset music dictionary.

The speech recognition result may be text. Speech recognition can be performed with an existing speech recognition method, for example an algorithm based on dynamic time warping, a hidden Markov method based on a parametric model, a vector quantization method based on a non-parametric model, or an algorithm based on artificial neural networks; the embodiments of the present invention do not describe the speech recognition process in detail. Word segmentation can use an existing segmentation algorithm, for example a mechanical segmentation algorithm based on string matching, a segmentation algorithm based on understanding, or a segmentation algorithm based on statistics; the embodiments do not describe the segmentation process in detail. For example, if the user says "I want to listen to Zhou Jielun's Dong Feng Po", the words obtained after segmentation may be: "I", "want to listen", "Zhou Jielun", "Dong Feng Po".
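As one illustration of the "mechanical segmentation based on string matching" mentioned above, the sketch below uses forward maximum matching, a common technique of that family. It is an assumption that this particular variant is suitable; the patent does not name one. The function name, the toy vocabulary, and the Chinese strings (a reconstruction of the translated example sentence) are all hypothetical.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary word, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


# Toy vocabulary (hypothetical); a real system would use a full lexicon
# plus the entries of the music dictionary.
VOCABULARY = {"想听", "周杰伦", "东风破"}

# "I want to listen to Zhou Jielun's Dong Feng Po"
print(forward_max_match("我想听周杰伦的东风破", VOCABULARY))
# ['我', '想听', '周杰伦', '的', '东风破']
```

Function words such as "的" come out as single-character tokens here; they are filtered later when song information is identified.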

Preferably, the song information in the speech recognition result can be determined as follows: judge whether the preset music dictionary contains attribute information whose text similarity to the current word is higher than a preset value; if so, determine that the current word is song information; if not, the current word is not song information. The preset value can be set according to the actual situation, for example 0.7. Existing methods can be used to calculate word similarity. One example is a word similarity algorithm based on a semantic dictionary (such as WordNet or HowNet), in which all words are organized into a tree structure and the path length between nodes is computed as the distance between words. Another example is a word similarity algorithm based on corpus statistics that uses a word vector space model: a set of feature words is selected in advance, and the correlation between this set of feature words and each word is computed (generally measured by how often the feature words occur in the context of that word in a large real-world corpus), so that each word obtains a feature vector of correlations; the similarity between these vectors (generally the cosine of the angle between them) is then used as the similarity between the two words. The present invention does not describe the similarity calculation in detail. In this preferred embodiment, treating words whose text similarity is higher than the preset value as song information excludes interference from other words. For example, the user says "Play Zhou Jielun's Dong Feng", and segmentation yields "play", "Zhou Jielun", and "Dong Feng"; by comparison with the dictionary, the song information is determined to be "Zhou Jielun" and "Dong Feng". The purpose of determining the song information is to exclude interference from other text in the utterance, such as "play" or "I want to listen".
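The threshold test described above can be sketched as follows. Note the substitution: instead of the semantic-dictionary or corpus-statistics similarity named in the text, this sketch uses the character-level ratio from Python's standard `difflib.SequenceMatcher` as a stand-in. The function names, the sample attribute set, and the Chinese strings (reconstructed from the translated example) are assumptions for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the similarity
    measures the text mentions."""
    return SequenceMatcher(None, a, b).ratio()


def extract_song_info(words, attribute_values, threshold=0.7):
    """Keep the words whose best similarity to any dictionary attribute
    is higher than the preset value."""
    song_info = []
    for word in words:
        best = max((text_similarity(word, v) for v in attribute_values), default=0.0)
        if best > threshold:
            song_info.append(word)
    return song_info


# Hypothetical attribute values: a singer name and two song titles.
ATTRIBUTES = {"周杰伦", "东风破", "黑色毛衣"}

# "Play Zhou Jielun's Dong Feng" after segmentation.
print(extract_song_info(["播放", "周杰伦", "的", "东风"], ATTRIBUTES))
# ['周杰伦', '东风']
```

Here "东风" scores 0.8 against "东风破" and is kept even though it is incomplete, while "播放" and "的" share no characters with any attribute and are dropped.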

On the basis of the above technical solution, preferably, step 120 can obtain the attribute information that matches the song information from the preset music dictionary according to both the text and the pinyin of the song information. Comparing on both text and pinyin makes it possible to handle the errors that arise in speech recognition from missing text, extra text, and homophones.
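A rough sketch of matching on both text and pinyin is given below. It assumes, purely for illustration, that each dictionary entry is stored with its pinyin as a tuple of syllables and that a partial match means either text containment or one syllable sequence being a prefix of the other; the patent does not define partial matching this precisely. The function names and sample entries are hypothetical, and the Chinese titles are reconstructions of the translated examples.

```python
def is_prefix(short, long):
    """True if the syllable sequence `short` is a prefix of `long`."""
    return len(short) <= len(long) and tuple(long[:len(short)]) == tuple(short)


def match_attributes(text, syllables, entries):
    """Return (exact, partial) lists of dictionary names matching the
    recognized text or its pinyin syllables."""
    exact, partial = [], []
    for name, name_syllables in entries.items():
        if name == text:
            exact.append(name)
        elif (text in name or name in text
              or is_prefix(syllables, name_syllables)
              or is_prefix(name_syllables, syllables)):
            partial.append(name)
    return exact, partial


# Hypothetical entries: title -> pinyin syllables.
ENTRIES = {
    "半岛铁盒": ("ban", "dao", "tie", "he"),
    "圣诞结": ("sheng", "dan", "jie"),
    "不想长大": ("bu", "xiang", "zhang", "da"),
    "黑色毛衣": ("hei", "se", "mao", "yi"),
}

print(match_attributes("半岛", ("ban", "dao"), ENTRIES))             # missing text
print(match_attributes("圣诞节", ("sheng", "dan", "jie"), ENTRIES))  # homophone
print(match_attributes("黑色毛衣", ("hei", "se", "mao", "yi"), ENTRIES))
```

Comparing whole syllables rather than raw pinyin strings avoids false prefixes such as "ban" matching "bang".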

In addition, considering that music resources may be added at any time, the embodiments of the present invention can update the preset music dictionary according to updated music resources. This keeps the preset music dictionary timely and accurate, and thus ensures that speech recognition errors can be remedied promptly. Preferably, the preset music dictionary can be updated periodically at a preset time interval. The update can be executed while the dictionary is not in use.

Embodiment two

Building on Embodiment one, this embodiment provides ways of judging whether song information contains an error and the corresponding ways of correcting it, which are described separately below.

(1) When there is only one piece of song information, judge whether the matched attribute information includes information that exactly matches the text of the song information; if so, determine that the song information was recognized correctly; if not, determine that the song information contains an error.

If, in addition to the exact text match, there is other attribute information that is homophonic but written differently, or that is otherwise similar, that attribute information can also be output and the user prompted to choose.

For example, the user says "Black Sweater" and the speech recognition result is also "Black Sweater". Its text and pinyin are matched against the preset music dictionary, and the matched attribute information is "Black Sweater" (an exact match), so the speech recognition result is determined to be correct. If the matched attribute information also includes "Grey Sweater" (a partial match), it can still be determined that "Black Sweater" was recognized correctly; alternatively, both "Black Sweater" and "Grey Sweater" can be presented to the user with a prompt to choose. Specifically, the matched attribute information can be output by voice, for example with 1 standing for "Black Sweater" and 2 for "Grey Sweater", and the user replies 1 or 2 by voice. The matched attribute information can also be shown on a display screen, where the user can choose by pressing a key or by replying 1 or 2 by voice.

When there is only one piece of song information, the song information is corrected according to the matched attribute information as follows:

1) If there are multiple pieces of matched attribute information and none of them is an exact text match, calculate the similarity between each piece of matched attribute information and the song information, and correct the song information to the piece with the greatest similarity. The similarity can be calculated with existing techniques, as described in Embodiment one, and is not repeated here.

For example, the speech recognition result and the song information are "Bandao" ("Peninsula"). Matching on the text "Peninsula" and the pinyin "bandao" finds the matched attribute information "Bandao Tiehe" ("Peninsula Iron Box") and "Companion Island" in the preset music dictionary, neither of which is an exact text match. The similarities of "Peninsula Iron Box" and "Companion Island" to "Peninsula" are then calculated, for example with the word similarity algorithm based on corpus statistics; "Peninsula Iron Box" has the highest similarity to "Peninsula", so "Peninsula" is corrected to "Peninsula Iron Box". This is a case of missing text.

2) If there is only one piece of matched attribute information and it is not an exact text match, correct the song information to that matched attribute information.

For example, the speech recognition result and the song information are "Peninsula". Matching on the text "Peninsula" and the pinyin "bandao" finds the matched attribute information "Peninsula Iron Box" in the preset music dictionary. The result is unique and is not an exact text match, so "Peninsula" is corrected to "Peninsula Iron Box".

As another example, the speech recognition result and the song information are the title "Don't Want to Grow Up" followed by an extra final syllable, with the pinyin "buxiangzhangdaya". Matching on this text and pinyin finds the partial match "Don't Want to Grow Up" in the preset music dictionary. The result is unique and is not an exact text match, so the song information is corrected to "Don't Want to Grow Up". This is a case of extra text.
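The single-song-information rules above (exact match means correct; one non-exact match replaces the text; several non-exact matches are ranked by similarity) can be sketched as follows. This is a sketch under assumptions: `check_and_correct` is a hypothetical name, `difflib.SequenceMatcher` again stands in for the corpus-statistics similarity used in the text's example, and the Chinese strings, including the second candidate "伴岛", are reconstructions of the translated examples.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity; a stand-in for the patent's measure."""
    return SequenceMatcher(None, a, b).ratio()


def check_and_correct(song_info, matched):
    """Return (has_error, corrected_text).

    corrected_text is None when an error is detected but the dictionary
    offers no candidate to correct it with.
    """
    if song_info in matched:       # exact text match: recognized correctly
        return False, song_info
    if not matched:                # nothing matched: cannot correct
        return True, None
    if len(matched) == 1:          # unique non-exact match
        return True, matched[0]
    # several non-exact matches: pick the most similar one
    return True, max(matched, key=lambda m: similarity(song_info, m))


print(check_and_correct("黑色毛衣", ["黑色毛衣", "灰色毛衣"]))  # exact match present
print(check_and_correct("半岛", ["半岛铁盒", "伴岛"]))          # missing text
print(check_and_correct("不想长大呀", ["不想长大"]))            # extra text
```

A caller that receives `(True, None)` would fall back to the behaviour the text describes for the no-match case: prompt the user or search with the uncorrected result.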

(2) When there are multiple pieces of song information, whether a piece of song information contains an error is judged as follows: for the current piece of song information, judge according to the preset music dictionary whether the attribute information matched by the current song information has a correspondence with the other, correctly recognized song information; if so, determine that the current song information was recognized correctly; if not, determine that the current song information contains an error.

For example, the speech recognition result is "I want to listen to 'Good' sung by Yang Zongwei and Zhang Bichen", where "Good" is a title pronounced "liangliang". The song information is "Yang Zongwei", "Zhang Bichen", and "Good". Matching against the preset music dictionary determines that "Yang Zongwei" and "Zhang Bichen" are correctly recognized song information. The text "Good" and the pinyin "liangliang" are each matched against the preset music dictionary, and the matched attribute information found is "Good" and "Cool" (both pronounced "liangliang"). Judging by the correspondence with Yang Zongwei and Zhang Bichen, it can be determined that the song information "Good" contains an error. This is an error caused by homophones.

When there are multiple pieces of song information, the song information is corrected according to the matched attribute information as follows: the song information that contains an error is corrected according to the correctly recognized song information, the attribute information matched by each piece of song information, and the correspondences among the attribute information.

For example, the speech recognition result is "I want to listen to 'Good' sung by Yang Zongwei and Zhang Bichen", and the song information is determined to be "Yang Zongwei", "Zhang Bichen", and "Good". The text "Good" and the pinyin "liangliang" are each matched against the preset music dictionary, and the matched attribute information and its correspondences are found to be: "Good", performed by a different singer, and "Cool", sung by Yang Zongwei and Zhang Bichen. From the singer names it can be determined that the user wants to listen to "Cool", so "Good" is corrected to "Cool".

As another example, the speech recognition result is "I want to listen to Chen Yixun's Christmas Day", and the song information is "Chen Yixun" and "Christmas Day". Using the text "Christmas Day" and the pinyin "shengdanjie", the matched attribute information "Christmas Knot" is found in the preset music dictionary, and its singer is Chen Yixun. From the singer name it can be determined that "Christmas Day" was recognized incorrectly, and "Christmas Day" is corrected to "Christmas Knot". This is an error caused by homophones.
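The context-based rule for multiple pieces of song information can be sketched as a consistency check against the singers already confirmed as correct. Everything named here is an assumption for illustration: `correct_with_context`, the `SINGERS_OF` table, the placeholder singer "歌手甲", and the Chinese titles, which are reconstructions (in particular "亮亮" is only a guess at the homophone rendered "Good" in the translation).

```python
def correct_with_context(recognized, candidates, confirmed, singers_of):
    """Return (has_error, title).

    recognized: the title as recognized by speech recognition
    candidates: titles matched in the dictionary by text or pinyin
    confirmed:  set of singer names already judged correct
    singers_of: dictionary correspondences, title -> set of singers
    """
    def consistent(title):
        # every confirmed singer must correspond to this title
        return confirmed <= singers_of.get(title, frozenset())

    if consistent(recognized):
        return False, recognized
    for title in candidates:
        if consistent(title):
            return True, title
    return True, None  # error detected, but no consistent candidate


# Hypothetical correspondences.
SINGERS_OF = {
    "凉凉": {"杨宗纬", "张碧晨"},
    "亮亮": {"歌手甲"},
    "圣诞结": {"陈奕迅"},
}

print(correct_with_context("亮亮", ["亮亮", "凉凉"], {"杨宗纬", "张碧晨"}, SINGERS_OF))
print(correct_with_context("圣诞节", ["圣诞结"], {"陈奕迅"}, SINGERS_OF))
```

Requiring all confirmed singers to correspond to the chosen title is one reasonable reading of "has a correspondence"; a looser variant could accept a title that corresponds to any one of them.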

(3) If there is no matched attribute information at all, output a prompt informing the user that the voice input is wrong; alternatively, still perform the song search according to the speech recognition result and output the search result.

In summary, error correction falls into three cases: content supplementation, content removal, and wrong-character correction. Content supplementation automatically completes a resource name that the user did not state in full; content removal deletes the superfluous part of a resource name that the user stated; wrong-character correction fixes text that speech recognition rendered with the same sound but different characters. Error correction thus handles the missing text, extra text, and wrong text that occur in speech recognition, and reduces the resource retrieval failures caused by incomplete resource names, resource names with extra text, or misrecognized text, thereby improving the success rate of the music service.

Embodiment three

Fig. 2 is a structural schematic diagram of a terminal provided by Embodiment three of the present invention. As shown in Fig. 2, the terminal comprises a processor 210, a memory 220, an input device 230, and an output device 240. The terminal may have one or more processors 210; Fig. 2 takes one processor 210 as an example. The processor 210, memory 220, input device 230, and output device 240 of the terminal can be connected by a bus or in other ways; Fig. 2 takes a bus connection as an example.

The memory 220, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions corresponding to the voice error correction method for ordering songs by voice in the embodiments of the present invention. By running the software programs, instructions, and modules stored in the memory 220, the processor 210 executes the various functional applications and data processing of the terminal, that is, implements the above voice error correction method for ordering songs by voice.

The memory 220 may mainly comprise a program storage area and a data storage area, where the program storage area can store an operating system and the application programs required by at least one function, and the data storage area can store data created through use of the terminal. In addition, the memory 220 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 220 may further include memory located remotely from the processor 210, and such remote memory can be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 230 can be used to receive input voice information and character information and to generate key signal input related to user settings and function control of the terminal; for example, the input device 230 may be a microphone, a keyboard, a display screen, and so on. The output device 240 may include devices such as a loudspeaker and a display screen, where the loudspeaker is used to play speech and songs and the display screen is used to display songs and related information.

Embodiment four

Embodiment four of the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program performs a voice error correction method for ordering songs by voice, the method comprising:

matching a speech recognition result against the information in a preset music dictionary, wherein the preset music dictionary stores attribute information of music resources and the correspondences among that attribute information;

obtaining, from the preset music dictionary, attribute information that matches song information in the speech recognition result;

judging, according to the matched attribute information, whether the song information contains an error; and

if an error exists, correcting the song information according to the matched attribute information.

Of course, in the computer-readable storage medium provided by the embodiments of the present invention, the stored computer program (also called computer-executable instructions) is not limited to the method operations described above and can also perform the related operations of the voice error correction method for ordering songs by voice provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and of course also by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

1. A voice error correction method for ordering songs by voice, characterized by comprising:

matching a speech recognition result against the information in a preset music dictionary, wherein the preset music dictionary stores attribute information of music resources and the correspondences among that attribute information;

if an error exists, correcting the song information according to the matched attribute information;

wherein, when there is only one piece of song information, judging whether the matched attribute information includes information that exactly matches the text of the song information;

if so, determining that the song information was recognized correctly;

if not, determining that the song information contains an error;

if there are multiple pieces of matched attribute information and none of them is an exact text match, calculating the similarity between each piece of matched attribute information and the song information, and correcting the song information to the piece with the greatest similarity;

if there is only one piece of matched attribute information and it is not an exact text match, correcting the song information to that matched attribute information;

when there are multiple pieces of song information, for the current piece of song information, judging according to the preset music dictionary whether the attribute information matched by the current song information has a correspondence with the other, correctly recognized song information; if so, determining that the current song information was recognized correctly; if not, determining that the current song information contains an error.

2. The method according to claim 1, wherein matching the speech recognition result against the information in the preset music dictionary comprises:

receiving voice information input by a user;

3. The method according to claim 1, wherein obtaining, from the preset music dictionary, the attribute information that matches the song information in the speech recognition result comprises:

obtaining, from the preset music dictionary, the attribute information that matches the song information according to the text and the pinyin of the song information.

4. The method according to claim 1, wherein correcting the song information according to the matched attribute information comprises:

when there are multiple pieces of song information, correcting the song information that contains an error according to the correctly recognized song information, the attribute information matched by each piece of song information, and the correspondences among the attribute information.

5. The method according to claim 1, wherein the method also comprises:

updating the preset music dictionary according to updated music resources.

6. A terminal, characterized in that the terminal comprises:

one or more processors; and

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice error correction method for ordering songs by voice according to any one of claims 1 to 5.

7. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the voice error correction method for ordering songs by voice according to any one of claims 1 to 5.