CN104778946A

CN104778946A - Voice control method and system

Info

Publication number: CN104778946A
Application number: CN201410011484.XA
Authority: CN
Inventors: 马宇飞; 邓佳佳; 林毅
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2014-01-10
Filing date: 2014-01-10
Publication date: 2015-07-15

Abstract

The invention discloses a voice control method and system. In the voice control method, a mobile terminal sends collected user voice instruction information to a voiceprint recognition server via a network access terminal; the voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal; the network access terminal sends the user voice instruction information and the user identifier to a speech recognition server; the speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information and sends the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction. Voiceprint recognition is used to distinguish users, and speech recognition is performed on the basis of a personalized user corpus, which improves speech recognition accuracy, shortens the time speech recognition takes, and gives the user a better experience.

Description

Voice control method and system

Technical field

The present invention relates to the communications field, and in particular to a voice control method and system.

Background art

Speech recognition is a technology that lets a machine convert a voice signal into corresponding text or commands through recognition and semantic understanding. Voice control technology uses speech recognition to control real-world objects.

A corpus stores real language data that accumulates continuously during actual use of a voice control system. A corpus can improve the accuracy and efficiency of semantic understanding in the voice control system.

Voiceprint recognition is a technology that automatically identifies a speaker from the speaker information contained in the speech waveform. Because of physiological differences in the vocal organs and differences in acquired behavior, each person's voice carries strong individual characteristics, and it is difficult to find two people with identical voiceprints. This property can therefore be used for identity authentication.

However, in scenarios with many users, each user's usage habits and everyday language differ, which makes it difficult for the voice server to form an accurate corpus. Multiple rounds of interaction are often needed to confirm what the user means, which degrades the user experience.

Summary of the invention

Embodiments of the present invention provide a voice control method and system. Voiceprint recognition is used to distinguish users, and speech recognition is performed on the basis of a personalized user corpus, which improves the accuracy and efficiency of speech recognition.

According to one aspect of the present invention, a voice control method is provided, comprising:

A mobile terminal sends collected user voice instruction information to a network access terminal;

The network access terminal sends the user voice instruction information to a voiceprint recognition server;

The voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal;

The network access terminal sends the user voice instruction information and the user identifier to a speech recognition server;

The speech recognition server queries the user corpus associated with the user identifier;

The speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information, and sends the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction.

In one embodiment, the step in which the speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information comprises:

The speech recognition server determines whether a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier;

If a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, the step of extracting the control instruction corresponding to the user voice instruction information is performed.

In one embodiment, if no control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, speech recognition is performed on the user voice instruction information using a general corpus to obtain the control instruction, and the control instruction and the corresponding user voice instruction information are stored in the user corpus associated with the user identifier.

In one embodiment, the step in which the speech recognition server queries the user corpus associated with the user identifier comprises:

The speech recognition server determines whether a user corpus associated with the user identifier is found;

If a user corpus associated with the user identifier is found, the step in which the speech recognition server extracts, from that user corpus, the control instruction corresponding to the user voice instruction information is performed.

In one embodiment, if no user corpus associated with the user identifier is found, the speech recognition server creates a user corpus associated with the user identifier, performs speech recognition on the user voice instruction information to obtain the control instruction, stores the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier, and then performs the step of sending the control instruction to the network access terminal.

In one embodiment, the step in which the voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal comprises:

The voiceprint recognition server performs voiceprint recognition on the user voice instruction information to obtain voiceprint information;

It is determined whether the voiceprint information exists in a voiceprint library;

If the voiceprint information exists in the voiceprint library, the step of sending the user identifier corresponding to the recognized voiceprint to the network access terminal is performed.

In one embodiment, if the voiceprint information does not exist in the voiceprint library, the voiceprint information is stored in the voiceprint library, a corresponding user identifier is assigned to the voiceprint information, and the assigned user identifier is then sent to the network access terminal.

In one embodiment, the mobile terminal is a remote control and the network access terminal is a set-top box.

According to another aspect of the present invention, a voice control system is provided, comprising a mobile terminal, a network access terminal, a voiceprint recognition server and a speech recognition server, wherein:

The mobile terminal is configured to collect user voice instruction information and send the collected user voice instruction information to the network access terminal;

The network access terminal is configured to send the user voice instruction information to the voiceprint recognition server upon receiving it from the mobile terminal, and to send the user voice instruction information and the user identifier to the speech recognition server upon receiving the user identifier from the voiceprint recognition server;

The voiceprint recognition server is configured to, upon receiving the user voice instruction information sent by the network access terminal, perform voiceprint recognition on the user voice instruction information and send the user identifier corresponding to the recognized voiceprint to the network access terminal;

The speech recognition server is configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, query the user corpus associated with the user identifier, extract from that user corpus the control instruction corresponding to the user voice instruction information, and send the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction.

In one embodiment, the speech recognition server is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, determine whether a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, and, if so, perform the operation of extracting the control instruction corresponding to the user voice instruction information.

In one embodiment, the speech recognition server is further configured to, when no control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, perform speech recognition on the user voice instruction information using a general corpus to obtain the control instruction, and store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier.

In one embodiment, the speech recognition server is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, determine whether a user corpus associated with the user identifier is found, and, if so, perform the operation of extracting, from that user corpus, the control instruction corresponding to the user voice instruction information.

In one embodiment, the speech recognition server is further configured to, when no user corpus associated with the user identifier is found, create a user corpus associated with the user identifier, perform speech recognition on the user voice instruction information to obtain the control instruction, store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier, and then perform the operation of sending the control instruction to the network access terminal.

In one embodiment, the voiceprint recognition server is specifically configured to, upon receiving the user voice instruction information sent by the network access terminal, perform voiceprint recognition on the user voice instruction information to obtain voiceprint information; determine whether the voiceprint information exists in the voiceprint library; and, if the voiceprint information exists in the voiceprint library, perform the operation of sending the user identifier corresponding to the recognized voiceprint to the network access terminal.

In one embodiment, the voiceprint recognition server is further configured to, when the voiceprint information does not exist in the voiceprint library, store the voiceprint information in the voiceprint library, assign a corresponding user identifier to the voiceprint information, and then send the assigned user identifier to the network access terminal.

The present invention confirms the identity of the current user through voiceprint recognition and uses the personalized corpus associated with that identity to extract the control instruction corresponding to the user's voice instruction. This can improve speech recognition accuracy, shorten the time speech recognition takes, and give the user a better experience.

The description of the present invention is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of one embodiment of the voice control method of the present invention.

Fig. 2 is a schematic diagram of another embodiment of the voice control method of the present invention.

Fig. 3 is a schematic diagram of yet another embodiment of the voice control method of the present invention.

Fig. 4 is a schematic diagram of one embodiment of the voice control system of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.

It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to actual scale.

Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.

In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.

Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be discussed further in subsequent drawings.

Fig. 1 is a schematic diagram of one embodiment of the voice control method of the present invention. As shown in Fig. 1, the steps of the method in this embodiment are as follows:

Step 101: the mobile terminal sends the collected user voice instruction information to the network access terminal.

Step 102: the network access terminal sends the user voice instruction information to the voiceprint recognition server.

Step 103: the voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal.

Step 104: the network access terminal sends the user voice instruction information and the user identifier to the speech recognition server.

Step 105: the speech recognition server queries the user corpus associated with the user identifier.

Step 106: the speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information, and sends the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction.

With the voice control method provided by the above embodiment of the present invention, the identity of the current user is confirmed through voiceprint recognition, and the personalized corpus associated with that identity is used to extract the control instruction corresponding to the user's voice instruction. This can improve speech recognition accuracy, shorten the time speech recognition takes, and give the user a better experience.
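The flow of steps 101 to 106 can be sketched in Python as follows. This is only an illustrative sketch and not an implementation disclosed by the patent: the class names, the `VoiceInstruction` fields, the dictionary-based voiceprint library and corpus, and the instruction string `TUNE_5` are all assumptions made for the example. In particular, a plain string stands in for the voiceprint that a real system would derive from audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceInstruction:
    """User voice instruction information (hypothetical, simplified)."""
    voiceprint: str  # stand-in for speaker features carried by the audio
    utterance: str   # stand-in for the spoken content of the audio


class VoiceprintServer:
    def __init__(self, library: dict):
        self.library = library  # voiceprint -> user identifier

    def identify(self, voice: VoiceInstruction) -> str:
        # Step 103: recognize the voiceprint and return the user identifier.
        return self.library[voice.voiceprint]


class SpeechServer:
    def __init__(self, user_corpora: dict):
        # user identifier -> {utterance: control instruction}
        self.user_corpora = user_corpora

    def recognize(self, voice: VoiceInstruction, user_id: str) -> str:
        corpus = self.user_corpora[user_id]  # step 105: query the user corpus
        return corpus[voice.utterance]       # step 106: extract the instruction


class AccessTerminal:
    def __init__(self, voiceprint_server, speech_server):
        self.voiceprint_server = voiceprint_server
        self.speech_server = speech_server
        self.executed = []  # control instructions this terminal has acted on

    def on_voice(self, voice: VoiceInstruction) -> str:
        # Step 101 has already happened: the mobile terminal delivered `voice`.
        user_id = self.voiceprint_server.identify(voice)            # steps 102-103
        instruction = self.speech_server.recognize(voice, user_id)  # steps 104-106
        self.executed.append(instruction)  # perform the corresponding operation
        return instruction


terminal = AccessTerminal(
    VoiceprintServer({"vp-a": "user-1"}),
    SpeechServer({"user-1": {"sports channel": "TUNE_5"}}),
)
result = terminal.on_voice(VoiceInstruction("vp-a", "sports channel"))
```

In this sketch the network access terminal is the only component that talks to both servers, which mirrors the step order above: the user identifier has to come back from the voiceprint recognition server before the speech recognition request can be sent.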

In one embodiment, the method can be applied in an IPTV (Internet Protocol Television) voice control system, where the mobile terminal can be a remote control and the network access terminal can be a set-top box. In the IPTV voice control system, the IPTV voice remote control collects the voice of each new user and delivers it to the set-top box, forming that new user's voiceprint library under the set-top box. When a user controls IPTV by voice, the user's identity can be recognized from the user's voiceprint features, and the user's personalized corpus is built up gradually during everyday use. The IPTV speech recognition server can then search the user's personalized corpus first during speech recognition, which improves the accuracy of the recognition result, shortens the time speech recognition takes, and improves the product's user experience.

Equally, the present invention can be used in other similar voice control scenarios with a fixed set of users, such as smart home voice control systems, home KTV song-request voice control systems, and in-vehicle on-demand voice control systems.

Fig. 2 is a schematic diagram of another embodiment of the voice control method of the present invention. Compared with the embodiment shown in Fig. 1, the embodiment shown in Fig. 2 further supplements the user corpus when it contains no corresponding information, which can improve the user experience.

Step 201: the mobile terminal sends the collected user voice instruction information to the network access terminal.

Step 202: the network access terminal sends the user voice instruction information to the voiceprint recognition server.

Step 203: the voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal.

Step 204: the network access terminal sends the user voice instruction information and the user identifier to the speech recognition server.

Step 205: the speech recognition server queries the user corpus associated with the user identifier.

Step 206: the speech recognition server determines whether a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier. If no such control instruction exists in the user corpus, step 207 is performed; if such a control instruction exists in the user corpus, step 208 is performed.

Step 207: the speech recognition server performs speech recognition on the user voice instruction information using the general corpus to obtain the control instruction, and stores the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier. Step 209 is then performed.

Step 208: the speech recognition server extracts the control instruction corresponding to the user voice instruction information.

Step 209: the speech recognition server sends the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction.
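The branch in steps 206 to 208 amounts to a lookup in the user corpus with a fallback to the general corpus. A minimal Python sketch follows, assuming both corpora can be modeled as dictionaries from a recognized utterance to a control instruction; the function name `resolve_instruction`, the sample utterances, and the instruction strings are hypothetical. The patent does not say what happens when the general corpus also fails, so this sketch simply lets a `KeyError` propagate in that case.

```python
# Hypothetical general corpus shared by all users.
GENERAL_CORPUS = {
    "turn up the volume": "VOLUME_UP",
    "next channel": "CHANNEL_UP",
}


def resolve_instruction(utterance: str, user_corpus: dict, general_corpus: dict):
    """Return (control instruction, corpus that supplied it)."""
    if utterance in user_corpus:               # step 206: check the user corpus
        return user_corpus[utterance], "user"  # step 208: extract from it
    instruction = general_corpus[utterance]    # step 207: use the general corpus
    user_corpus[utterance] = instruction       # step 207: store the pair for next time
    return instruction, "general"


corpus = {}  # an empty user corpus for a user who has not issued this command before
first = resolve_instruction("next channel", corpus, GENERAL_CORPUS)
second = resolve_instruction("next channel", corpus, GENERAL_CORPUS)
```

Here the first call is served from the general corpus and populates the user corpus, so the second call for the same utterance is served from the user corpus.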

Fig. 3 is a schematic diagram of yet another embodiment of the voice control method of the present invention. In this embodiment, when a new user appears or when a user corpus does not exist, the system automatically adds the relevant information, which improves the user experience.

Step 301: the mobile terminal sends the collected user voice instruction information to the network access terminal.

Step 302: the network access terminal sends the user voice instruction information to the voiceprint recognition server.

Step 303: the voiceprint recognition server performs voiceprint recognition on the user voice instruction information to obtain voiceprint information.

Step 304: the voiceprint recognition server determines whether the voiceprint information exists in the voiceprint library. If the voiceprint information does not exist in the voiceprint library, step 305 is performed; if it exists, step 307 is performed.

Step 305: the voiceprint information is stored in the voiceprint library, and a corresponding user identifier is assigned to the voiceprint information.

Step 306: the assigned user identifier is sent to the network access terminal. Step 308 is then performed.

Step 307: the user identifier corresponding to the recognized voiceprint is sent to the network access terminal.

Step 308: the network access terminal sends the user voice instruction information and the user identifier to the speech recognition server.

Step 309: the speech recognition server determines whether a user corpus associated with the user identifier is found. If no such user corpus is found, step 310 is performed; if one is found, step 311 is performed.

Step 310: the speech recognition server creates a user corpus associated with the user identifier, performs speech recognition on the user voice instruction information to obtain the control instruction, and stores the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier. Step 312 is then performed.

Step 311: the speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information.

Preferably, the extraction operation in step 311 can use the processing of the embodiment shown in Fig. 2.

Step 312: the speech recognition server sends the control instruction to the network access terminal, so that the network access terminal performs the corresponding operation according to the control instruction.
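The two "create if missing" branches of Fig. 3 (steps 304 to 307 for the voiceprint library, steps 309 to 311 for the user corpus) can be sketched in Python as below. The class names, the `user-N` identifier format, and the use of exact dictionary lookup are assumptions for illustration only; a real voiceprint library would match speaker features by similarity rather than by exact key, and step 310 does not state which corpus the first recognition uses, so the general corpus is assumed here.

```python
import itertools


class EnrollingVoiceprintServer:
    def __init__(self):
        self.library = {}               # voiceprint -> user identifier
        self._ids = itertools.count(1)  # hypothetical identifier allocator

    def identify_or_enroll(self, voiceprint: str) -> str:
        if voiceprint not in self.library:                        # step 304
            self.library[voiceprint] = f"user-{next(self._ids)}"  # step 305
        return self.library[voiceprint]                           # step 306 or 307


class CorpusBuildingSpeechServer:
    def __init__(self, general_corpus: dict):
        self.general_corpus = general_corpus
        self.user_corpora = {}  # user identifier -> {utterance: control instruction}

    def recognize(self, utterance: str, user_id: str) -> str:
        if user_id not in self.user_corpora:  # step 309: no corpus found
            self.user_corpora[user_id] = {}   # step 310: create one
        corpus = self.user_corpora[user_id]
        if utterance not in corpus:
            # Steps 310 and 207: recognize with the general corpus (assumed),
            # then store the pair in the user corpus.
            corpus[utterance] = self.general_corpus[utterance]
        return corpus[utterance]              # step 311, then sent in step 312


voiceprints = EnrollingVoiceprintServer()
speech = CorpusBuildingSpeechServer({"play music": "PLAY"})
uid = voiceprints.identify_or_enroll("vp-a")  # unknown speaker, gets enrolled
instruction = speech.recognize("play music", uid)
```

A second request with the same voiceprint returns the same identifier, while a different voiceprint is enrolled under a new one.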

Fig. 4 is a schematic diagram of one embodiment of the voice control system of the present invention. As shown in Fig. 4, the system comprises a mobile terminal 401, a network access terminal 402, a voiceprint recognition server 403 and a speech recognition server 404, wherein:

The mobile terminal 401 is configured to collect user voice instruction information and send the collected user voice instruction information to the network access terminal 402.

The network access terminal 402 is configured to send the user voice instruction information to the voiceprint recognition server 403 upon receiving it from the mobile terminal 401, and to send the user voice instruction information and the user identifier to the speech recognition server 404 upon receiving the user identifier from the voiceprint recognition server 403.

The voiceprint recognition server 403 is configured to, upon receiving the user voice instruction information sent by the network access terminal 402, perform voiceprint recognition on the user voice instruction information and send the user identifier corresponding to the recognized voiceprint to the network access terminal 402.

The speech recognition server 404 is configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal 402, query the user corpus associated with the user identifier, extract from that user corpus the control instruction corresponding to the user voice instruction information, and send the control instruction to the network access terminal 402, so that the network access terminal 402 performs the corresponding operation according to the control instruction.

With the voice control system provided by the above embodiment of the present invention, the identity of the current user is confirmed through voiceprint recognition, and the personalized corpus associated with that identity is used to extract the control instruction corresponding to the user's voice instruction. This can improve speech recognition accuracy, shorten the time speech recognition takes, and give the user a better experience.
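One way to picture the interfaces between components 402, 403 and 404 is as four message types, sketched below in Python. The message and field names are hypothetical, since the patent does not define a wire format. The sketch highlights one consequence of the described flow: the network access terminal has to keep the user voice instruction information until the user identifier comes back, because it forwards both together to the speech recognition server. For simplicity the sketch tracks only one pending request at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceprintRequest:    # 402 -> 403
    audio: bytes


@dataclass(frozen=True)
class VoiceprintReply:      # 403 -> 402
    user_id: str


@dataclass(frozen=True)
class RecognitionRequest:   # 402 -> 404
    audio: bytes
    user_id: str


@dataclass(frozen=True)
class RecognitionReply:     # 404 -> 402
    control_instruction: str


class AccessTerminal402:
    """Holds the audio until the user identifier is returned."""

    def __init__(self):
        self._pending_audio = None

    def on_audio_from_mobile_terminal(self, audio: bytes) -> VoiceprintRequest:
        self._pending_audio = audio  # keep a copy for the later request to 404
        return VoiceprintRequest(audio)

    def on_voiceprint_reply(self, reply: VoiceprintReply) -> RecognitionRequest:
        audio, self._pending_audio = self._pending_audio, None
        return RecognitionRequest(audio, reply.user_id)


terminal_402 = AccessTerminal402()
to_403 = terminal_402.on_audio_from_mobile_terminal(b"pcm-bytes")
to_404 = terminal_402.on_voiceprint_reply(VoiceprintReply("user-1"))
```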

Preferably, the system can be an IPTV voice control system, in which the mobile terminal is a remote control and the network access terminal is a set-top box. Equally, the present invention can be used in other similar voice control scenarios with a fixed set of users, such as smart home voice control systems, home KTV song-request voice control systems, and in-vehicle on-demand voice control systems.

Preferably, the speech recognition server 404 is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal 402, determine whether a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, and, if so, perform the operation of extracting the control instruction corresponding to the user voice instruction information.

Preferably, the speech recognition server 404 is further configured to, when no control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, perform speech recognition on the user voice instruction information using the general corpus to obtain the control instruction, and store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier.

Preferably, the speech recognition server 404 is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, determine whether a user corpus associated with the user identifier is found, and, if so, perform the operation of extracting, from that user corpus, the control instruction corresponding to the user voice instruction information.

Preferably, the speech recognition server 404 is further configured to, when no user corpus associated with the user identifier is found, create a user corpus associated with the user identifier, perform speech recognition on the user voice instruction information to obtain the control instruction, store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier, and then perform the operation of sending the control instruction to the network access terminal.

Preferably, the voiceprint recognition server 403 is specifically configured to, upon receiving the user voice instruction information sent by the network access terminal, perform voiceprint recognition on the user voice instruction information to obtain voiceprint information; determine whether the voiceprint information exists in the voiceprint library; and, if the voiceprint information exists in the voiceprint library, perform the operation of sending the user identifier corresponding to the recognized voiceprint to the network access terminal.

Preferably, the voiceprint recognition server 403 is further configured to, when the voiceprint information does not exist in the voiceprint library, store the voiceprint information in the voiceprint library, assign a corresponding user identifier to the voiceprint information, and then send the assigned user identifier to the network access terminal.

For example, a new IPTV user allows the IPTV service provider to set up a voiceprint library and corpus for that user, and records a segment of speech. The voiceprint recognition server extracts the user's voiceprint features from this speech and stores them in the household's voiceprint library (the voiceprint recognition server and its storage can be located locally on the set-top box or in the speech recognition server). During the user's actual use, the speech recognition server builds the user's personalized corpus from the user's common expressions and speech habits while using IPTV, and stores it under the household's corpus.

Finally, when the user issues a voice instruction, the voiceprint recognition server identifies the user, and the speech recognition server searches the user's personalized corpus and returns the corresponding result.

In the present invention, because the corpus is specific to each individual user, speech recognition accuracy can improve significantly. A household has few users and a small voiceprint library, so the time taken by voiceprint recognition is almost negligible; and an individual corpus is much smaller than a household corpus, so the time taken by speech recognition can also be greatly shortened.

Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be implemented in hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk, an optical disc, or the like.

Claims

1. A voice control method, characterized by comprising:

2. The method according to claim 1, characterized in that

the step in which the speech recognition server extracts, from the user corpus associated with the user identifier, the control instruction corresponding to the user voice instruction information comprises:

3. The method according to claim 2, characterized in that

if no control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, speech recognition is performed on the user voice instruction information using a general corpus to obtain the control instruction, and the control instruction and the corresponding user voice instruction information are stored in the user corpus associated with the user identifier.

4. The method according to any one of claims 1-3, characterized in that

the step in which the speech recognition server queries the user corpus associated with the user identifier comprises:

5. The method according to claim 4, characterized in that

if no user corpus associated with the user identifier is found, the speech recognition server creates a user corpus associated with the user identifier, performs speech recognition on the user voice instruction information to obtain the control instruction, stores the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier, and then performs the step of sending the control instruction to the network access terminal.

6. The method according to any one of claims 1-3, characterized in that

the step in which the voiceprint recognition server performs voiceprint recognition on the user voice instruction information and sends the user identifier corresponding to the recognized voiceprint to the network access terminal comprises:

determining whether the voiceprint information exists in a voiceprint library;

7. The method according to claim 6, characterized in that

if the voiceprint information does not exist in the voiceprint library, the voiceprint information is stored in the voiceprint library, a corresponding user identifier is assigned to the voiceprint information, and the assigned user identifier is then sent to the network access terminal.

8. The method according to any one of claims 1-3, characterized in that

the mobile terminal is a remote control;

the network access terminal is a set-top box.

9. A voice control system, characterized by comprising a mobile terminal, a network access terminal, a voiceprint recognition server and a speech recognition server, wherein:

10. The system according to claim 9, characterized in that

the speech recognition server is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, determine whether a control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, and, if so, perform the operation of extracting the control instruction corresponding to the user voice instruction information.

11. The system according to claim 10, characterized in that

the speech recognition server is further configured to, when no control instruction corresponding to the user voice instruction information exists in the user corpus associated with the user identifier, perform speech recognition on the user voice instruction information using a general corpus to obtain the control instruction, and store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier.

12. The system according to any one of claims 9-11, characterized in that

the speech recognition server is specifically configured to, upon receiving the user voice instruction information and the user identifier sent by the network access terminal, determine whether a user corpus associated with the user identifier is found, and, if so, perform the operation of extracting, from that user corpus, the control instruction corresponding to the user voice instruction information.

13. The system according to claim 12, characterized in that

the speech recognition server is further configured to, when no user corpus associated with the user identifier is found, create a user corpus associated with the user identifier, perform speech recognition on the user voice instruction information to obtain the control instruction, store the control instruction and the corresponding user voice instruction information in the user corpus associated with the user identifier, and then perform the operation of sending the control instruction to the network access terminal.

14. The system according to any one of claims 9-11, characterized in that

the voiceprint recognition server is specifically configured to, upon receiving the user voice instruction information sent by the network access terminal, perform voiceprint recognition on the user voice instruction information to obtain voiceprint information; determine whether the voiceprint information exists in a voiceprint library; and, if the voiceprint information exists in the voiceprint library, perform the operation of sending the user identifier corresponding to the recognized voiceprint to the network access terminal.

15. The system according to claim 14, characterized in that

the voiceprint recognition server is further configured to, when the voiceprint information does not exist in the voiceprint library, store the voiceprint information in the voiceprint library, assign a corresponding user identifier to the voiceprint information, and then send the assigned user identifier to the network access terminal.

16. The system according to any one of claims 9-11, characterized in that

the mobile terminal is a remote control;

the network access terminal is a set-top box.