CN106228974A

CN106228974A - Control method, apparatus and system based on speech recognition

Info

Publication number: CN106228974A
Application number: CN201610700858.8A
Authority: CN
Inventors: 龙涛; 刘正东; 龙江; 乔磊
Original assignee: Zhenjiang Huitong Electronics Co Ltd
Current assignee: Zhenjiang Huitong Electronics Co Ltd
Priority date: 2016-08-19
Filing date: 2016-08-19
Publication date: 2016-12-14

Abstract

A control method, apparatus and system based on speech recognition. The control method based on speech recognition includes: obtaining speech data to be recognized from a source device; determining the language type of the speech data to be recognized, and selecting a matching speech recognition server according to the language type; sending the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result; and receiving the recognition result fed back by the speech recognition server, converting the recognition result into an executable operation instruction, and sending the operation instruction to a target device, the operation instruction being used to control the target device. The embodiments of the present invention make speech recognition more convenient and efficient for users.

Description

Control method, apparatus and system based on speech recognition

Technical field

The present invention relates to the technical field of information processing, and in particular to a control method, apparatus and system based on speech recognition.

Background art

With the development of speech recognition technology, the fields supported by speech recognition keep expanding and its accuracy keeps improving. At present, speech recognition can achieve a recognition rate close to 80% for the standard pronunciation of the world's languages, and the recognition rate for various dialects is also rising as the underlying databases grow. At the same time, because speech recognition technology has matured and its cost has fallen, more and more potential fields are adopting speech recognition, and devices in an increasing number of fields have begun to use it as their human-machine interaction interface.

In the prior art, speech recognition is mainly implemented in one of two ways. In the first, a speech recognition service provider offers the service: the user purchases the service and submits data in the format required by the provider to obtain speech recognition. In the second, a dedicated speech recognition chip is used.

However, of these prior-art approaches, the first requires a certain amount of development work and corresponding technical expertise on the user side, while the second has a higher hardware cost and a limited recognition range.

Summary of the invention

The technical problem solved by the present invention is how to make speech recognition more convenient and efficient for users.

To solve the above technical problem, an embodiment of the present invention provides a control method based on speech recognition, including:

Obtaining speech data to be recognized from a source device; determining the language type of the speech data to be recognized, and selecting a matching speech recognition server according to the language type; sending the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result; and receiving the recognition result fed back by the speech recognition server, converting the recognition result into an executable operation instruction, and sending the operation instruction to a target device, the operation instruction being used to control the target device.

Optionally, the control method based on speech recognition further includes: when the target device cannot respond to the operation instruction, forwarding the speech data to be recognized to a manual service platform.

Optionally, before the speech data to be recognized is obtained from the source device, the method further includes: performing P2P address registration for the source device and the target device.

Optionally, obtaining the speech data to be recognized and sending the operation instruction are performed via P2P communication.

Optionally, determining the language type of the speech data to be recognized includes: determining candidate language modes according to the geographical location of the source device of the speech data to be recognized; and matching the speech data to be recognized against the candidate language modes to determine the language type.

Optionally, sending the speech data to be recognized to the selected speech recognition server includes: converting the speech data to be recognized into an audio file in a preset format; and sending the audio file in the preset format to the selected speech recognition server, the audio file including the recognition mode of the source device.

To solve the above technical problem, an embodiment of the present invention further discloses a control device based on speech recognition, including:

An acquiring unit, adapted to obtain speech data to be recognized from a source device; a matching unit, adapted to determine the language type of the speech data to be recognized and to select a matching speech recognition server according to the language type; a sending unit, adapted to send the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result; and a control unit, adapted to receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to a target device, the operation instruction being used to control the target device.

Optionally, the control device based on speech recognition further includes a forwarding unit, adapted to forward the speech data to be recognized to a manual service platform when the target device cannot respond to the operation instruction.

Optionally, the control device based on speech recognition further includes an address registration unit, adapted to perform P2P address registration for the source device and the target device.

Optionally, the matching unit includes: a language mode determining subunit, adapted to determine candidate language modes according to the geographical location of the source device of the speech data to be recognized; and a language type determining subunit, adapted to match the speech data to be recognized against the candidate language modes to determine the language type.

Optionally, the sending unit includes: a format conversion subunit, adapted to convert the speech data to be recognized into an audio file in a preset format; and a sending subunit, adapted to send the audio file in the preset format to the selected speech recognition server, the audio file including the recognition mode of the source device.

An embodiment of the present invention further discloses a control system based on speech recognition, including:

The control device based on speech recognition; and at least one speech recognition server.

Optionally, the control system based on speech recognition further includes a P2P client server, which provides P2P communication addresses so that the source device can communicate over P2P.

Compared with the prior art, the technical solutions of the embodiments of the present invention have the following beneficial effects:

In the embodiments of the present invention, speech data to be recognized is obtained from a source device; the language type of the speech data is determined and a matching speech recognition server is selected according to that language type; the speech data is sent to the selected speech recognition server, which processes it to obtain a recognition result; and the recognition result fed back by the server is received, converted into an executable operation instruction, and sent to a target device to control it. Sending the speech data to a speech recognition server that matches its language type improves the accuracy of the speech recognition the user obtains. In addition, converting the recognition result into a control instruction for the target device enables voice control of target devices that have no speech recognition capability of their own, which improves the user experience.

Further, when the target device cannot respond to the operation instruction, the speech data to be recognized is forwarded to a manual service platform. The target device's failure to respond to the operation instruction indicates that the recognition result of the speech recognition server is wrong; forwarding the speech data to the manual service platform for recognition there helps raise the speech recognition rate, and hence the accuracy with which the target device is controlled, further improving the user experience.

Further, obtaining the speech data to be recognized and sending the operation instruction are performed via P2P communication. P2P communication enables high-speed communication throughout the speech recognition process, further improving the user experience.

Further, the speech data to be recognized is converted into an audio file in a preset format, and the audio file in the preset format is sent to the selected speech recognition server. This conversion ensures that the speech data can be accepted by the speech recognition server. It also allows the speech data to be recognized to arrive in multiple formats and relaxes the speech recognition server's requirements on that data, which avoids technical development work on the user side and reduces user cost.

Brief description of the drawings

Fig. 1 is a flowchart of a control method based on speech recognition according to an embodiment of the present invention;

Fig. 2 is a flowchart of another control method based on speech recognition according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a control device based on speech recognition according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another control device based on speech recognition according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a control system based on speech recognition according to an embodiment of the present invention.

Detailed description of the invention

As described in the background art, of the prior-art speech recognition approaches, the first requires a certain amount of development work and corresponding technical expertise on the user side, while the second has a higher hardware cost and a limited recognition range.

To make the above objects, features and advantages of the present invention more apparent and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a control method based on speech recognition according to an embodiment of the present invention.

The control method based on speech recognition may include the following steps:

Step S101: obtain speech data to be recognized from a source device;

Step S102: determine the language type of the speech data to be recognized, and select a matching speech recognition server according to the language type;

Step S103: send the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result;

Step S104: receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to a target device, the operation instruction being used to control the target device.
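Steps S101 to S104 can be sketched as a single orchestration function. This is a minimal illustration only, assuming that language detection, the per-language recognition servers, the text-to-instruction conversion and delivery to the target device are supplied as callables; the function name `control_by_speech` and all parameter names are hypothetical and are not defined by this patent.

```python
from typing import Any, Callable, Dict

# All callables are hypothetical stand-ins: the patent does not specify how
# language detection, recognition, instruction conversion or delivery are done.
def control_by_speech(
    audio: bytes,
    determine_language: Callable[[bytes], str],
    servers: Dict[str, Callable[[bytes], str]],
    to_instruction: Callable[[str], Dict[str, Any]],
    send_to_target: Callable[[Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Run steps S102-S104 on speech data already obtained from a source device (S101)."""
    # S102: determine the language type and pick the matching recognition server.
    language = determine_language(audio)
    if language not in servers:
        raise LookupError(f"no speech recognition server for language {language!r}")
    recognize = servers[language]
    # S103: the selected server turns the speech data into a text recognition result.
    text = recognize(audio)
    # S104: convert the text into an executable instruction and send it to the target device.
    instruction = to_instruction(text)
    send_to_target(instruction)
    return instruction
```

Passing each stage in as a callable keeps the flow of Fig. 1 independent of any particular recognition provider, which mirrors the idea of routing to whichever operator's server matches the language type.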

Steps S101 to S104 are described in detail below with reference to Fig. 1.

In a specific implementation, the source device can collect speech data. In step S101, the speech data obtained is the speech data to be recognized, that is, speech data on which a speech recognition operation needs to be performed.

It can be understood that the source device may be any smart device capable of collecting speech, for example an air conditioner, television, refrigerator or washing machine with a voice collection function; the embodiments of the present invention do not limit this.

In a specific implementation, a matching speech recognition server is selected according to the language type of the speech data to be recognized. The language type may be a language, such as Mandarin, English or Russian, or a dialect, such as Shanghainese or Sichuanese. Each language type has a matching speech recognition server, which may hold a speech database corresponding to that language type, thereby ensuring the accuracy of speech recognition.

It can be understood that the speech recognition server may be a server of a speech recognition service operator.

Specifically, determining the language type of the speech data to be recognized may include: determining candidate language modes according to the geographical location of the source device of the speech data; and matching the speech data against the candidate language modes to determine the language type. Once the geographical location of the source device is known, the candidate language modes likely to be used in the collected speech can be determined from the languages or dialects associated with that location; there may be one or more candidate modes. For example, if the source device is located in Shanghai, the candidate language modes may be Mandarin, Shanghainese and English.
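The two-stage language determination (candidate modes from the device location, then matching) might look like the following sketch. The region table, the `score` callback and the function names are hypothetical; only the Shanghai candidates (Mandarin, Shanghainese, English) come from the example above, and a real system would plug some language-identification model into `score`.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from the source device's region to candidate language modes.
# The Shanghai entry follows the example in the text; the others are illustrative.
CANDIDATES_BY_REGION: Dict[str, List[str]] = {
    "Shanghai": ["mandarin", "shanghainese", "english"],
    "Sichuan": ["mandarin", "sichuanese", "english"],
}
DEFAULT_CANDIDATES: List[str] = ["mandarin", "english"]


def candidate_modes(region: str) -> List[str]:
    """Language mode determining step: candidates from the device's location."""
    return CANDIDATES_BY_REGION.get(region, DEFAULT_CANDIDATES)


def determine_language(audio: bytes, region: str,
                       score: Callable[[bytes, str], float]) -> str:
    """Language type determining step: keep the candidate with the highest score.

    `score` is a placeholder for whatever matcher rates how well the audio fits a mode.
    """
    return max(candidate_modes(region), key=lambda mode: score(audio, mode))
```

Restricting the search to location-based candidates keeps the matching step small, which is the practical benefit of the location pre-filter described above.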

In a specific implementation, in step S103, the speech data to be recognized is sent to the selected speech recognition server so that the server can recognize it.

Specifically, sending the speech data to be recognized to the selected speech recognition server may include: converting the speech data into an audio file in a preset format; and sending the audio file in the preset format to the selected speech recognition server, the audio file including the recognition mode of the source device. The preset format corresponds to the speech recognition server. That is, each speech recognition server has format requirements for the speech data to be recognized, so before the speech data is sent to the server, it is converted into the format that server requires.

Whatever the type of user equipment, it only needs to collect audio files in a standard format, such as ADPCM, AMR, WAV, MP2 or MP3. This spares the user equipment any technical modification and makes speech recognition more convenient for the user.

Meanwhile, the recognition mode may refer to the lexical domain configured for a given source device, for example computer vocabulary, film vocabulary or medical vocabulary. Sending the recognition mode to the speech recognition server lets the server recognize the speech data in a targeted way, improving the accuracy of the recognition result.
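One possible way to perform the format conversion and attach the recognition mode is sketched below with Python's standard `wave` module. It assumes the source device delivers raw 16-bit mono PCM and that the selected server accepts WAV; carrying the recognition mode as a separate `recognition_mode` field of a request dictionary is an illustrative assumption, since the patent does not say how the mode is embedded in the audio file. The function names are hypothetical.

```python
import io
import wave


def to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1,
           sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container (the assumed preset format)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample, e.g. 2 for 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def build_request(pcm: bytes, recognition_mode: str,
                  sample_rate: int = 16000) -> dict:
    """Hypothetical request for the selected server: audio plus the device's recognition mode."""
    return {
        "format": "wav",
        "recognition_mode": recognition_mode,  # e.g. "computer", "film", "medical"
        "audio": to_wav(pcm, sample_rate),
    }
```

For a server that requires a different preset format, only `to_wav` would be swapped out; the request shape carrying the recognition mode stays the same.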

In a specific implementation, in step S104, the recognition result is converted into an executable operation instruction used to control the target device. The target device may be the same as or different from the source device. It can be understood that, depending on the actual application scenario, the target device and the source device may be located in the same local area network.

At this point, the control process based on speech recognition is complete, and the target device can perform the corresponding operation according to the control instruction. For example, on receiving a power-on control instruction, an air conditioner can switch on its power and start working.
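The conversion from a text recognition result into an executable operation instruction could be as simple as keyword matching, as in the sketch below. The keyword tables, the `OperationInstruction` type and action names such as `power_on` are hypothetical; the patent does not define an instruction format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationInstruction:
    device: str  # hypothetical device identifier on the target side
    action: str  # hypothetical action code understood by the target device


# Hypothetical keyword tables; a real deployment would use its own vocabulary.
DEVICE_KEYWORDS = {"air conditioner": "air_conditioner", "tv": "tv",
                   "washing machine": "washing_machine"}
ACTION_KEYWORDS = {"turn on": "power_on", "turn off": "power_off"}


def to_instruction(text: str) -> Optional[OperationInstruction]:
    """Map a recognized sentence to an instruction, or None if nothing matches."""
    lowered = text.lower()
    device = next((v for k, v in DEVICE_KEYWORDS.items() if k in lowered), None)
    action = next((v for k, v in ACTION_KEYWORDS.items() if k in lowered), None)
    if device is None or action is None:
        return None
    return OperationInstruction(device, action)
```

Returning `None` when no device or action is found gives the caller a natural hook for the manual-service fallback of step S206.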

By matching the speech data to be recognized to a suitable speech recognition server for recognition, the embodiment of the present invention avoids technical development work on the user side, which reduces user cost and makes speech recognition more convenient to use; at the same time, converting the recognition result into a control instruction for controlling the target device improves the user experience.

Fig. 2 is a flowchart of another control method based on speech recognition according to an embodiment of the present invention.

The control method based on speech recognition may include the following steps:

Step S201: perform P2P address registration for the source device and the target device;

Step S202: obtain speech data to be recognized from the source device;

Step S203: determine the language type of the speech data to be recognized, and select a matching speech recognition server according to the language type;

Step S204: send the speech data to be recognized to the selected speech recognition server;

Step S205: receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to the target device;

Step S206: when the target device cannot respond to the operation instruction, forward the speech data to be recognized to a manual service platform.

Steps S201 to S206 are described in detail below with reference to Fig. 2.

In a specific implementation, in step S201, peer-to-peer (P2P) address registration is performed for the source device, so that obtaining the speech data to be recognized and sending the operation instruction can both be carried out via P2P communication. P2P communication thus enables high-speed communication during the speech recognition process, further improving the user experience.
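The P2P address registration of step S201 amounts to keeping a mapping from device identifiers to reachable addresses. The in-memory registry below is a hypothetical sketch, assuming each device is identified by a string ID and reachable at a host and port; the patent does not specify the registration protocol or how the P2P client server stores addresses.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Address = Tuple[str, int]  # (host, port); an assumed address representation


@dataclass
class P2PRegistry:
    """Hypothetical registry kept by the P2P client server."""
    peers: Dict[str, Address] = field(default_factory=dict)

    def register(self, device_id: str, host: str, port: int) -> None:
        # Re-registering a device simply overwrites its previous address.
        self.peers[device_id] = (host, port)

    def lookup(self, device_id: str) -> Address:
        if device_id not in self.peers:
            raise KeyError(f"device {device_id!r} has not registered a P2P address")
        return self.peers[device_id]
```

Both the source device and the target device would be registered before step S202, so the control device can later reach the target directly when delivering the operation instruction.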

In a specific implementation, in step S202, the speech data to be recognized may be a standard audio file, such as ADPCM, AMR, WAV, MP2 or MP3.
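Because the source device may deliver any of several standard formats, the control device needs to know which one it received before converting it. The sketch below guesses the format from well-known file signatures: the `RIFF`/`WAVE` header (with format tags 0x0002 for Microsoft ADPCM and 0x0011 for IMA ADPCM), the `#!AMR` storage-format magic, an ID3v2 tag, and the MPEG audio frame sync with its layer bits. The function name is hypothetical, and it assumes the canonical WAV layout in which the `fmt ` chunk immediately follows the RIFF header.

```python
from typing import Optional


def sniff_audio_format(data: bytes) -> Optional[str]:
    """Guess which standard audio format a file uses from its first bytes."""
    if data.startswith(b"#!AMR-WB\n"):
        return "AMR-WB"
    if data.startswith(b"#!AMR\n"):
        return "AMR"
    if len(data) >= 22 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        # Assumes the 'fmt ' chunk directly follows the 12-byte RIFF header.
        if data[12:16] == b"fmt ":
            tag = int.from_bytes(data[20:22], "little")
            if tag in (0x0002, 0x0011):  # Microsoft ADPCM, IMA ADPCM
                return "ADPCM (WAV)"
        return "WAV"
    if data.startswith(b"ID3"):
        return "MP3"  # ID3v2 tag; assumed to precede an MP3 stream
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        layer = (data[1] >> 1) & 0x03  # MPEG layer bits: 01 = III, 10 = II
        if layer == 0b01:
            return "MP3"
        if layer == 0b10:
            return "MP2"
    return None
```

A `None` result would signal that the file is not in one of the standard formats the source device is expected to collect.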

In a specific implementation, in step S204, the standard audio file from the source device is sent to the selected speech recognition server through a unified data interface. The recognition result returned after the speech recognition server completes recognition may be a text file.

In a specific implementation, in step S205, the text-format recognition result is converted into a control instruction according to the requirements of the source device and sent to the target device, and the control instruction controls the target device to perform the corresponding operation. In step S206, if the target device cannot respond to the operation instruction, this indicates that the recognition result of the speech recognition server is wrong. Forwarding the speech data to be recognized to the manual service platform for more intelligent recognition or manual recognition can raise the speech recognition rate, thereby achieving accurate control of the target device and further improving the user experience.

Specifically, the forwarding may also be triggered by user feedback on the control instruction: for example, if the user is dissatisfied with the recognition result, the speech data to be recognized may be forwarded to the manual service platform.
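The fallback logic of step S206, including the user-feedback variant just described, can be sketched as follows. The callables and the `"done"`/`"forwarded"` return values are hypothetical; "the target device cannot respond" is modeled here simply as no instruction being available or `send_to_target` returning `False`.

```python
from typing import Callable, Optional


def dispatch_with_fallback(
    audio: bytes,
    instruction: Optional[str],
    send_to_target: Callable[[str], bool],
    user_satisfied: Callable[[], bool],
    forward_to_manual: Callable[[bytes], None],
) -> str:
    """Send the instruction; fall back to the manual service platform (S206)."""
    # The target "responds" only if an instruction exists and delivery reports success.
    responded = instruction is not None and send_to_target(instruction)
    # User feedback is consulted only when the target actually responded.
    if responded and user_satisfied():
        return "done"
    # Otherwise the original speech data, not the faulty text, goes to manual service.
    forward_to_manual(audio)
    return "forwarded"
```

Forwarding the original audio rather than the recognized text lets the manual service platform re-recognize the speech from scratch.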

For the specific implementation of steps S201 to S206 in this embodiment, reference may be made to the description of steps S101 to S104 in the foregoing embodiment; details are not repeated here.

The embodiment of the present invention determines the language type of the speech data to be recognized according to the source device and selects among different speech recognition service operators accordingly; raises the recognition rate by forwarding the speech data to manual service when the recognition result is wrong; and implements speech recognition over high-speed network communication through a P2P user service, reducing user cost while making speech recognition more convenient to use.

Fig. 3 is a schematic structural diagram of a control device based on speech recognition according to an embodiment of the present invention.

The control device 30 based on speech recognition may include: an acquiring unit 301, a matching unit 302, a sending unit 303 and a control unit 304.

The acquiring unit 301 is adapted to obtain speech data to be recognized from a source device;

The matching unit 302 is adapted to determine the language type of the speech data to be recognized, and to select a matching speech recognition server according to the language type;

The sending unit 303 is adapted to send the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result;

The control unit 304 is adapted to receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to a target device, the operation instruction being used to control the target device.

In this embodiment, matching the speech data to be recognized to a speech recognition server for recognition avoids technical development work on the user side, which reduces user cost and makes speech recognition more convenient to use; at the same time, converting the recognition result into a control instruction for controlling the target device improves the user experience.

Fig. 4 is a schematic structural diagram of another control device based on speech recognition according to an embodiment of the present invention.

The control device 40 based on speech recognition may include: an address registration unit 401, an acquiring unit 402, a matching unit 403, a sending unit 404, a control unit 405 and a forwarding unit 406.

The address registration unit 401 is adapted to perform P2P address registration for the source device and the target device.

The acquiring unit 402 is adapted to obtain speech data to be recognized from the source device;

The matching unit 403 is adapted to determine the language type of the speech data to be recognized, and to select a matching speech recognition server according to the language type.

In a specific implementation, the matching unit 403 may include a language mode determining subunit (not shown) and a language type determining subunit (not shown).

The language mode determining subunit is adapted to determine candidate language modes according to the geographical location of the source device of the speech data to be recognized; the language type determining subunit is adapted to match the speech data to be recognized against the candidate language modes to determine the language type.

The sending unit 404 is adapted to send the speech data to be recognized to the selected speech recognition server, which processes the speech data to obtain a recognition result.

Specifically, the sending unit 404 may include a format conversion subunit (not shown) and a sending subunit (not shown).

The format conversion subunit is adapted to convert the speech data to be recognized into an audio file in a preset format; the sending subunit is adapted to send the audio file in the preset format to the selected speech recognition server, the audio file including the recognition mode of the source device.

The control unit 405 is adapted to receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to the target device, the operation instruction being used to control the target device. The forwarding unit 406 is adapted to forward the speech data to be recognized to a manual service platform when the target device cannot respond to the operation instruction.

For the specific implementation of the control device 40 based on speech recognition in this embodiment, reference may be made to the corresponding embodiments of the control method based on speech recognition described above; details are not repeated here.

An embodiment of the present invention further discloses a control system based on speech recognition. The system may include the control device based on speech recognition and at least one speech recognition server, the speech recognition server being used to process the speech data to be recognized to obtain a recognition result.

The control device based on speech recognition sends the obtained speech data to be recognized to the matching speech recognition server and then converts the recognition result into an executable operation instruction, the operation instruction being used to control the source device of the speech data to be recognized.

In a specific implementation, the system may further include a P2P client server, which provides P2P communication addresses so that the source device can communicate over P2P.

By building a P2P network with the P2P client server, the embodiment of the present invention forms a basic service system based on speech recognition, further improving the accuracy and real-time performance of speech recognition while also simplifying integration within the system.

Fig. 5 is a schematic diagram of a control system based on speech recognition according to an embodiment of the present invention.

In this embodiment, the control device 502 based on speech recognition may be implemented as a server.

In this embodiment, whatever the type of source device 501, the source device 501 only needs to collect standard audio files. After an audio file is submitted to the control device 502 based on speech recognition, the control device 502 determines the language type of the audio file according to the address of the source device 501 and matches a suitable speech recognition server 503 according to the language type, for example speech recognition server A for Mandarin recognition, speech recognition server B for English recognition, or speech recognition server C for Shanghainese recognition.
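The language-to-server matching in this example can be expressed as a routing table. The sketch below is illustrative only: the labels `speech-server-A`, `speech-server-B` and `speech-server-C` stand in for servers A, B and C above, and the function name `select_server` is hypothetical.

```python
from typing import Dict

# Routing table mirroring the Fig. 5 example; server labels are placeholders.
SERVER_BY_LANGUAGE: Dict[str, str] = {
    "mandarin": "speech-server-A",
    "english": "speech-server-B",
    "shanghainese": "speech-server-C",
}


def select_server(language: str, routes: Dict[str, str] = SERVER_BY_LANGUAGE) -> str:
    """Return the speech recognition server registered for a language type."""
    try:
        return routes[language]
    except KeyError:
        raise LookupError(f"no speech recognition server registered for {language!r}") from None
```

Keeping the routes in a table means adding a new operator or dialect server is a data change rather than a code change.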

The control device 502 based on speech recognition converts the audio file into the format required by the speech recognition server 503 and submits it to the speech recognition server 503 in that format.

The speech recognition server 503 performs speech recognition on the audio file and returns the recognition result to the control device 502 based on speech recognition, which assembles the recognition result into the format required by the source device 501 and sends it to the source device 501.

If the speech recognition server 503 cannot recognize the audio, or the source device 501 reports that the recognition result is wrong, the control device 502 based on speech recognition can resubmit the audio file to the manual service platform 505 for more intelligent recognition or manual service.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.

Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the protection scope of the present invention shall be defined by the claims.

Claims

1. A control method based on speech recognition, characterized by comprising:

Obtaining speech data to be recognized from a source device;

Determining the language type of the speech data to be recognized, and selecting a matching speech recognition server according to the language type;

Sending the speech data to be recognized to the selected speech recognition server, the speech recognition server being used to process the speech data to be recognized to obtain a recognition result;

Receiving the recognition result fed back by the speech recognition server, converting the recognition result into an executable operation instruction, and sending the operation instruction to a target device, the operation instruction being used to control the target device.

2. The control method based on speech recognition according to claim 1, characterized by further comprising:

When the target device cannot respond to the operation instruction, forwarding the speech data to be recognized to a manual service platform.

3. The control method based on speech recognition according to claim 1, characterized in that, before obtaining the speech data to be recognized from the source device, the method further comprises:

Performing P2P address registration for the source device and the target device.

4. The control method based on speech recognition according to claim 3, characterized in that obtaining the speech data to be recognized and sending the operation instruction are performed via P2P communication.

5. The control method based on speech recognition according to claim 1, wherein the determining a language type of the speech data to be recognized comprises:

determining candidate language modes according to the geographic location of the source device of the speech data to be recognized; and

matching the speech data to be recognized using the candidate language modes to determine the language type.

6. The control method based on speech recognition according to any one of claims 1 to 5, wherein the sending the speech data to be recognized to the selected speech recognition server comprises:

converting the speech data to be recognized into an audio file in a preset format; and

sending the audio file in the preset format to the selected speech recognition server, the audio file comprising a recognition mode of the source device.

7. A control device based on speech recognition, characterized by comprising:

an acquiring unit, adapted to obtain speech data to be recognized from a source device;

a matching unit, adapted to determine a language type of the speech data to be recognized and to select a matching speech recognition server according to the language type;

a sending unit, adapted to send the speech data to be recognized to the selected speech recognition server, the speech recognition server being configured to process the speech data to be recognized to obtain a recognition result; and

a control unit, adapted to receive the recognition result fed back by the speech recognition server, convert the recognition result into an executable operation instruction, and send the operation instruction to a target device, the operation instruction being used to control the target device.

8. The control device based on speech recognition according to claim 7, further comprising:

a forwarding unit, adapted to forward the speech data to be recognized to a manual service platform when the target device cannot respond to the operation instruction.

9. The control device based on speech recognition according to claim 7, further comprising:

an address registration unit, adapted to perform P2P address registration for the source device and the target device.

10. The control device based on speech recognition according to claim 9, wherein the obtaining of the speech data to be recognized and the sending of the operation instruction are performed by means of P2P communication.

11. The control device based on speech recognition according to claim 7, wherein the matching unit comprises:

a language mode determining subunit, adapted to determine candidate language modes according to the geographic location of the source device of the speech data to be recognized; and

a language type determining subunit, adapted to match the speech data to be recognized using the candidate language modes to determine the language type.

12. The control device based on speech recognition according to any one of claims 7 to 11, wherein the sending unit comprises:

a format conversion subunit, adapted to convert the speech data to be recognized into an audio file in a preset format; and

a sending subunit, adapted to send the audio file in the preset format to the selected speech recognition server, the audio file comprising a recognition mode of the source device.

13. A control system based on speech recognition, comprising:

the control device based on speech recognition according to any one of claims 7 to 12; and

at least one speech recognition server.

14. The control system based on speech recognition according to claim 13, further comprising:

a P2P client server, configured to provide P2P addresses for P2P communication between the source device and the control device based on speech recognition.
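The control flow of claims 1, 2, 5 and 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The region-to-language table, the language codes, the command table, the "pcm16-16k-mono" format label and every function name are hypothetical. Recognition servers, the target device and the manual service platform are modeled as plain callables rather than networked services, and P2P registration and transport (claims 3 and 4) are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical region -> candidate-language table used to narrow the search (claim 5).
REGION_LANGUAGES: Dict[str, List[str]] = {
    "CN-GD": ["yue", "zh-CN"],  # Guangdong: Cantonese or Mandarin
    "CN-SH": ["wuu", "zh-CN"],  # Shanghai: Wu or Mandarin
}
DEFAULT_LANGUAGES: List[str] = ["zh-CN"]

# Hypothetical table converting recognized text into an executable instruction.
COMMANDS: Dict[str, str] = {
    "turn on the air conditioner": "AC_ON",
    "turn off the air conditioner": "AC_OFF",
}


@dataclass
class SpeechData:
    source_id: str         # identifies the source device
    region: str            # source device location, reduced to a region code
    samples: bytes         # raw audio captured by the source device
    recognition_mode: str  # carried inside the audio file (claim 6)


Scorer = Callable[[bytes, str], float]     # how well the audio matches a language
Server = Callable[[dict], str]             # recognition server: audio file -> text
Device = Callable[[str], bool]             # target device: True if it executed the instruction
ManualDesk = Callable[[SpeechData], None]  # manual service platform


def determine_language(data: SpeechData, score: Scorer) -> str:
    """Claim 5: narrow candidates by location, then keep the best-scoring language."""
    candidates = REGION_LANGUAGES.get(data.region, DEFAULT_LANGUAGES)
    return max(candidates, key=lambda lang: score(data.samples, lang))


def to_audio_file(data: SpeechData) -> dict:
    """Claim 6: wrap the audio in a preset format that also carries the recognition mode."""
    return {"format": "pcm16-16k-mono", "mode": data.recognition_mode, "payload": data.samples}


def handle(data: SpeechData, score: Scorer, servers: Dict[str, Server],
           device: Device, manual_desk: ManualDesk) -> Optional[str]:
    """Claims 1 and 2: route to a matching server, execute, or escalate to a human."""
    language = determine_language(data, score)
    text = servers[language](to_audio_file(data))
    instruction = COMMANDS.get(text.strip().lower())
    # Treating an unmapped result like a device that cannot respond is an assumption.
    if instruction is not None and device(instruction):
        return instruction
    manual_desk(data)
    return None


if __name__ == "__main__":
    escalated: List[SpeechData] = []

    # Stub scorer: pretend the audio matches Cantonese best.
    def score(audio: bytes, lang: str) -> float:
        return 1.0 if lang == "yue" else 0.1

    servers = {"yue": lambda f: "Turn on the air conditioner", "zh-CN": lambda f: ""}
    data = SpeechData("phone-01", "CN-GD", b"\x00\x01", "command")
    print(handle(data, score, servers, lambda ins: True, escalated.append))   # AC_ON
    print(handle(data, score, servers, lambda ins: False, escalated.append))  # None
    print(len(escalated))  # 1
```

Injecting the servers, the device and the manual platform as callables keeps the routing logic testable without network access; in a deployment they would be replaced by client stubs for the actual services.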