CN102868740A

CN102868740A - Method and system for controlling toy based on mobile communication terminal and internet voice interaction

Info

Publication number: CN102868740A
Application number: CN2012103297631A
Authority: CN
Inventors: 吴玉胜; 李新岗
Original assignee: SHENZHEN SILICON ELECTRONICS CO Ltd
Current assignee: SHENZHEN SILICON ELECTRONICS CO Ltd
Priority date: 2012-09-07
Filing date: 2012-09-07
Publication date: 2013-01-09

Abstract

The invention relates to a method and a system for controlling a toy based on a mobile communication terminal and internet voice interaction. The system comprises a toy with a communication connection, a mobile communication terminal with voice input, and a network server with speech recognition and conversion. The method and system recognize voice input through the mobile communication terminal and the Internet. Because content is retrieved from the network server over the Internet, they provide large content storage, good recognition results and timely content updates, which makes the toy more capable while greatly reducing cost.

Description

Method and system for controlling a toy based on mobile communication terminal and internet voice interaction

Technical field

The present invention relates to a voice-controlled toy method and system, and in particular to a method and system for controlling a toy based on a mobile communication terminal and internet voice interaction.

Background art

With social development and advances in voice technology, voice toys are increasingly widely used. Most existing voice toys contain a speech recognition chip that stores simple voice commands and content; after recognizing speech, the toy retrieves the stored command and content and acts on them. The prior art has the following drawbacks: 1. Toys usually have to be kept inexpensive, and a low-cost toy can store only a limited number of commands and little content. 2. Each toy needs its own voice input, speech recognition chip and memory module, which raises cost.

Summary of the invention

The technical problem solved by the present invention is to provide a method and system for controlling a toy based on a mobile communication terminal and internet voice interaction, overcoming the limited storage capacity of prior-art voice toys, which results in little content and high cost. The invention takes advantage of the fact that a communication terminal has far greater processing and network communication capability than the toy itself. Using communication terminals already widely deployed on the market, the user interacts with a physical toy through speech recognition and natural language understanding running on the terminal platform, which offer stronger interactive intelligence and higher recognition accuracy. The result is a user experience far beyond that of traditional toys.

The technical solution of the present invention is a method for controlling a toy based on a mobile communication terminal and internet voice interaction, involving a toy with a communication connection, a communication terminal with voice input and speech recognition, and a network server with speech recognition and conversion. The method comprises the following steps:

Voice input: voice is input through the communication terminal;

Voice upload: the communication terminal connects to the Internet and uploads the input voice information to the network server over that connection;

Speech recognition and conversion: the communication terminal and the network server recognize and convert the received voice in parallel, and the speech recognition conversion result takes the form of an instruction, or of an instruction and a parameter;

Execution of the conversion result: the speech recognition conversion result is executed jointly by the network server, the communication terminal and the toy; or by any two of them; or by either the toy or the communication terminal alone.

In a further technical solution of the present invention, the method also comprises building a semantic knowledge base according to the recognition scenario, the knowledge base containing the semantic attributes of words. The speech recognition and conversion step then also includes semantic recognition and conversion, specifically comprising the following steps:

Word segmentation and semantic disambiguation: the speech recognition result is segmented into words and semantically disambiguated according to the semantic attributes of the words in the knowledge base;

Intent classification and parameter extraction: the intent of the segmented and disambiguated result is classified, and its parameters are extracted.

In a further technical solution, the speech recognition conversion results of both the network server and the communication terminal include a confidence level. The communication terminal sets a confidence threshold for the conversion result. When the confidence of the terminal's result is greater than or equal to this threshold, the terminal's result is used; when it is below the threshold, whichever of the server's and the terminal's results has the higher confidence is used.

In a further technical solution, the method also comprises waking up the communication terminal: the terminal is woken by a voice command, a button press or a wireless signal, and then enters the voice input state.
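The wake-up step can be pictured as a small state machine. The sketch below is a minimal illustration under stated assumptions: the `Terminal` class, the `WakeSource` enum and the wake phrase "hello toy" are hypothetical. Returning to idle after one utterance is also an assumption; the text says only that a voice command, button or wireless signal puts the terminal into the voice input state.

```python
from enum import Enum, auto
from typing import Optional


class WakeSource(Enum):
    VOICE_COMMAND = auto()
    BUTTON = auto()
    WIRELESS_SIGNAL = auto()


class Terminal:
    """Hypothetical terminal that accepts speech only after it has been woken."""

    def __init__(self, wake_phrase: str = "hello toy") -> None:
        self.wake_phrase = wake_phrase  # assumed wake phrase, illustrative only
        self.listening = False

    def on_event(self, source: WakeSource, payload: str = "") -> bool:
        # A voice event wakes the terminal only if it matches the wake phrase;
        # a button press or a wireless wake signal always wakes it.
        if source is WakeSource.VOICE_COMMAND:
            if payload.strip().lower() == self.wake_phrase:
                self.listening = True
        else:
            self.listening = True
        return self.listening

    def accept_speech(self, utterance: str) -> Optional[str]:
        # Speech is ignored unless the terminal is in the voice input state.
        if not self.listening:
            return None
        self.listening = False  # assumption: one utterance per wake-up
        return utterance
```

For example, `Terminal().accept_speech("play a song")` returns `None`, while the same call after `on_event(WakeSource.BUTTON)` returns the utterance.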

In a further technical solution, the toy and the communication terminal are connected through any one of an infrared communication component, a high-frequency modulation communication component, a Bluetooth communication component, a 2.4 GHz wireless communication component or an RFID radio-frequency communication component.

In a further technical solution, when the input voice information cannot be recognized or cannot be executed, a voice dialogue is carried out through the communication terminal or the toy to obtain voice information that can be recognized or executed.

In a further technical solution, the toy also recognizes and converts the input voice, and the toy executes the resulting speech recognition conversion result.

The technical solution of the present invention also provides a system for controlling a toy based on a mobile communication terminal and internet voice interaction, comprising a toy with a communication connection, a communication terminal with voice input and speech recognition, and a network server with speech recognition and conversion. The toy comprises a second wireless communication module that connects to the communication terminal. The communication terminal comprises a voice input unit for inputting voice, a first wireless communication module for wireless communication with the toy, a first speech conversion unit for speech recognition and conversion, and a network connection module for connecting to the network server over the Internet. The network server has a third speech conversion unit that recognizes and converts the received voice information. The voice input unit receives voice; the first speech conversion unit of the communication terminal and the third speech conversion unit of the network server recognize and convert the input voice in parallel; and the speech recognition conversion result is executed jointly by the network server, the communication terminal and the toy, or by any two of them, or by either the toy or the communication terminal alone.

In a further technical solution, the third speech conversion unit comprises a speech recognition module and a semantic recognition module; the semantic recognition module determines the meaning of the voice entered through the voice input unit from the speech recognized by the speech recognition module.

In a further technical solution, the communication terminal and the network server each have a voice interaction library for conducting voice dialogue. When the input voice information cannot be recognized or cannot be executed, a voice dialogue is carried out to obtain voice information that can be recognized or executed.

In a further technical solution, the toy has a second speech conversion unit for speech recognition, which recognizes and converts the voice.

In a further technical solution, the communication terminal also comprises a wake-up module that wakes the terminal into the voice input state; the wake-up module is triggered by a voice command, a button or a wireless signal.

Technical effects of the present invention: the method and system for controlling a toy based on a mobile communication terminal and internet voice interaction involve a toy with a communication connection, a communication terminal with voice input, and a network server with speech recognition and conversion. Voice input is recognized through the communication terminal and the Internet, and content is retrieved from the network server over the Internet, which provides large content storage, good recognition results and timely content updates. The invention takes advantage of the fact that a communication terminal has far greater processing and network communication capability than the toy. Using communication terminals already widely deployed on the market, the user interacts with the physical toy through speech recognition and natural language understanding on the terminal platform, which offer stronger interactive intelligence and higher recognition accuracy, giving a user experience far beyond that of traditional toys. The toy thus becomes more capable while cost is greatly reduced.

Description of drawings

Fig. 1 is a flow chart of the present invention.

Fig. 2 is a schematic structural diagram of the present invention.

Embodiment

The technical solution of the present invention is further described below with reference to specific embodiments.

As shown in Fig. 1 and Fig. 2, a specific embodiment of the present invention provides a method for controlling a toy based on a mobile communication terminal and internet voice interaction, involving a toy 2 with a communication connection, a communication terminal 1 with voice input, and a network server 3 with speech recognition and conversion. The method comprises the following steps:

Step 100: voice input, that is, voice is input through the communication terminal 1;

Step 200: voice upload, that is, the communication terminal 1 connects to the Internet and uploads the input voice information to the network server 3 over that connection;

Step 300: speech recognition and conversion, that is, the communication terminal 1 and the network server 3 recognize and convert the received voice in parallel, and the speech recognition conversion result takes the form of an instruction, or of an instruction and a parameter;
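Step 300 runs the terminal's and the server's recognizers on the same audio in parallel, and the embodiment states that the first result obtained is used. The sketch below shows that first-result policy with Python's `concurrent.futures`. The two recognizer functions are hypothetical stand-ins that simulate latency with `time.sleep`; a real system would call an on-device engine and a network service.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, List, Tuple

Result = Tuple[str, str]  # (source, recognized text)


def recognize_on_terminal(audio: bytes) -> Result:
    time.sleep(0.05)  # hypothetical local recognizer: low latency
    return "terminal", "play Little Swallow"


def recognize_on_server(audio: bytes) -> Result:
    time.sleep(0.5)  # hypothetical server recognizer: network round trip adds latency
    return "server", "play Little Swallow"


def recognize_in_parallel(audio: bytes,
                          recognizers: List[Callable[[bytes], Result]]) -> Result:
    """Run every recognizer on the same audio and return the first result to finish."""
    pool = ThreadPoolExecutor(max_workers=len(recognizers))
    futures = [pool.submit(recognize, audio) for recognize in recognizers]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower recognizer(s)
    return next(iter(done)).result()
```

With these stand-ins, `recognize_in_parallel(audio, [recognize_on_terminal, recognize_on_server])` returns the terminal's result because it finishes first. A confidence-based choice between the two results would instead wait for both.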

Step 400: execution of the conversion result, that is, the speech recognition conversion result is executed jointly by the network server 3, the communication terminal 1 and the toy 2; or by any two of them; or by either the toy 2 or the communication terminal 1 alone.

As shown in Fig. 1 and Fig. 2, the specific implementation process of the present invention is as follows. Voice is input through the communication terminal 1, which uploads the received voice to the network server 3. The communication terminal 1 and the network server 3 recognize the received voice in parallel and then convert the recognition result into a speech recognition conversion result in the form of an instruction, or of an instruction and a parameter. The result can be executed in the following ways.

Executed by the toy 2 alone: the network server 3 sends the converted instruction, or instruction and parameter, to the communication terminal 1; the communication terminal 1 establishes a wireless connection with the toy 2 and forwards the result to it, and the toy 2 executes it. If the result contains a control command for the toy 2, and the toy stores that command together with the content matching the voice command, the toy 2 executes the command and retrieves the corresponding content.

Executed jointly by the network server 3 and the toy 2: if the network server 3 stores the content or interactive information corresponding to the voice command, the network server 3 retrieves it according to the conversion result and sends it through the communication terminal 1 to the toy 2, which carries it out.

Executed jointly by the communication terminal 1 and the toy 2: if the communication terminal 1 stores the content or interactive information corresponding to the voice command, the network server 3 sends the conversion result to the communication terminal 1, which executes it by retrieving that content or interactive information and then sends it to the toy 2, which carries it out.

Executed by the communication terminal 1 alone: if the communication terminal 1 has the corresponding interactive content, the terminal executes the conversion result itself and plays it back, completing the execution on its own.

Instructions and content for controlling the toy include, for example, playing music, telling a story, taking off and rotating. Because the conversion result is an instruction, or an instruction and a parameter, executing it means executing that instruction with its parameter: for "play 'Little Swallow'", "play" is the instruction and the audio content of "Little Swallow" is the parameter.

Executed jointly by the network server 3, the communication terminal 1 and the toy 2: for example, if the voice command "May I ask what the weather is like in Beijing today?" is input, the network server 3 looks up today's weather in Beijing and sends it to the communication terminal 1, which plays it and sends the played audio signal to the toy 2 for output.

In this embodiment, the content comprises audio content, text content, or both. In specific embodiments of the present invention, the method also includes a wake-up step in which the communication terminal 1 is woken into the state of receiving voice input, by a voice command or a button press. The communication terminal 1 itself also recognizes and converts the input voice; because the communication terminal 1 and the network server 3 do so in parallel, the first speech recognition conversion result obtained is taken as the result. Because the communication terminal 1 has a larger storage capacity than the toy, its content library can be larger, and it can store more voice commands and matching content.

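The execution modes described in the embodiment differ mainly in where the content matching a command is stored. The sketch below assumes a priority order (toy first, then terminal, then server) that the patent does not specify. It models each device's content library as a plain set, and the `Command` and `choose_executors` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Command:
    instruction: str                 # e.g. "play"
    parameter: Optional[str] = None  # e.g. "Little Swallow"; None for bare instructions


def choose_executors(cmd: Command, toy_content: Set[str], terminal_content: Set[str],
                     server_content: Set[str], toy_connected: bool = True) -> List[str]:
    """Return the devices that execute cmd, based on where its content is stored.

    Assumed priority (not specified in the patent): toy, then terminal, then server.
    Anything the toy receives arrives over the terminal's wireless link; only the
    executing devices are listed.
    """
    key = cmd.parameter or cmd.instruction
    if toy_connected and key in toy_content:
        return ["toy"]
    if key in terminal_content:
        return ["terminal", "toy"] if toy_connected else ["terminal"]
    if key in server_content:
        return ["server", "toy"] if toy_connected else ["server", "terminal"]
    return []  # nothing can execute it: fall back to a clarifying voice prompt
```

The three-device weather case, where the terminal both plays the server's answer and forwards the audio to the toy, is not modelled separately here. It could be added as a flag on the server branch.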
In this embodiment, the network server 3 and the communication terminal 1 recognize and convert the voice information in parallel, and each of their speech recognition conversion results includes a confidence level. Confidence, also called degree of belief, is the degree to which a particular individual believes a particular proposition to be true. In this interpretation, probability measures how strongly a rational individual is convinced: it is not a property of the event itself, but reflects the evidence held by the person who assigns it. In statistics, the confidence level is the probability that a population parameter falls within a given interval around a sample statistic, and the confidence interval is the error range between the sample statistic and the population parameter at a given confidence level; for the same sample, a wider interval corresponds to a higher confidence level. The confidence of a speech recognition conversion is therefore the degree of belief that the conversion result is correct. When the network server 3 and the communication terminal 1 recognize and convert the voice information in parallel, the communication terminal 1 sets a confidence threshold for the conversion result. If the confidence of the communication terminal 1's result is greater than or equal to the threshold, that result is used; if it is below the threshold, whichever of the communication terminal 1's and the network server 3's results has the higher confidence is used.
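The threshold rule above can be written as a short function. In this minimal sketch, the `Recognition` type and confidences in [0, 1] are assumptions. Ties below the threshold go to the terminal, a choice the patent does not address.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    text: str
    confidence: float  # degree of belief that the text is correct, in [0, 1]


def arbitrate(terminal: Recognition, server: Recognition, threshold: float) -> Recognition:
    """Choose between the terminal's and the server's parallel results.

    1. If the terminal's confidence >= threshold, use the terminal's result.
    2. Otherwise use whichever result has the higher confidence
       (ties go to the terminal here; the patent does not say).
    """
    if terminal.confidence >= threshold:
        return terminal
    return server if server.confidence > terminal.confidence else terminal
```

Because the terminal result is accepted as soon as it clears the threshold, the server's reply needs to be awaited only for low-confidence local results.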

In this embodiment, the communication terminal 1 includes a mobile phone, a tablet computer or a mobile entertainment device. The communication terminal 1 and the network server 3 each have a voice interaction library for conducting voice dialogue. When the input voice information cannot be recognized or cannot be executed, a voice dialogue is carried out through the communication terminal 1 or the toy 2 to obtain a voice command that can be recognized or executed. If the speech recognition result involves interactive information, the corresponding interactive information in the voice interaction library is retrieved and sent through the communication terminal 1 to the toy 2, which plays it to realize voice interaction. For example, for the voice query "Are there any songs by Wang Fei?", the network server 3 looks it up and obtains the result "yes" or "no"; that result is the corresponding interactive information. In addition, when the input voice information cannot be recognized or executed, the user carries on a voice dialogue through the communication terminal 1 until the network server 3, the communication terminal 1 or the toy 2 receives voice information it can execute. For example, when "start" is input but cannot be recognized, perhaps because the pronunciation is unclear or differs too much from standard pronunciation, the voice interaction library can be used to prompt the user to speak again. As another example, when "open story now" is input, the network server 3, communication terminal 1 or toy 2 may be unable to convert this voice command into a control command, so supplementary voice information is needed; the interaction library is used to ask, for example, "Do you want to listen to a story?". Such voice prompts complete the missing command information, so that the toy can be controlled with natural speech.

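The clarification dialogue can be sketched as a loop that keeps earlier words as context and asks a follow-up question until a command can be executed. Everything here is hypothetical: `toy_parser` recognizes only the "start" and story examples from the text, an empty string stands for unrecognizable audio, and the prompt wording and three-turn limit are assumptions.

```python
from typing import Callable, List, Optional, Tuple

RETRY_PROMPT = "Sorry, I didn't catch that. Please say it again."
FALLBACK_PROMPT = "Sorry, I can't do that yet. Please say it another way."


def toy_parser(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical parser returning (command, follow_up_question)."""
    t = text.lower()
    if "story" in t:
        if "yes" in t:
            return "tell_story", None
        return None, "Do you want to listen to a story?"
    if "start" in t:
        return "start", None
    return None, None


def dialogue(utterances: List[str],
             parser: Callable[[str], Tuple[Optional[str], Optional[str]]] = toy_parser,
             max_turns: int = 3) -> Tuple[Optional[str], List[str]]:
    """Prompt until an utterance, together with earlier context, becomes executable.

    Returns (command or None, list of prompts spoken to the user).
    """
    prompts: List[str] = []
    context = ""
    for utterance in utterances[:max_turns]:
        if not utterance:
            prompts.append(RETRY_PROMPT)  # unrecognizable audio: ask to repeat
            continue
        context = f"{context} {utterance}".strip()  # keep earlier words as context
        command, question = parser(context)
        if command:
            return command, prompts
        prompts.append(question or FALLBACK_PROMPT)  # not executable: ask for more
    return None, prompts
```

For example, `dialogue(["open story now", "yes"])` asks "Do you want to listen to a story?" once and then returns the command `"tell_story"`.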
As shown in Fig. 1, a preferred embodiment of the present invention also comprises building a semantic knowledge base according to the recognition scenario, the knowledge base containing the semantic attributes of words. The semantic knowledge base is a prerequisite for semantic recognition: a knowledge entry is built for each word and its semantic attribute is defined. For example, the entry for "Liu Dehua" contains: man, Hong Kong native, singer, actor, and its semantic attribute is "entertainment figure". "Rain" is a weather condition related to weather forecasts, and its semantic attribute is "weather". The speech conversion step also includes semantic conversion based on the speech conversion result, specifically comprising:
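A knowledge base of this kind can be as simple as a mapping from a word to its knowledge items and semantic attribute. The entries below come from the examples in the text; the "Beijing" and "tomorrow" entries, the attribute spellings and the helper names are illustrative assumptions.

```python
from typing import List, Optional

# Hypothetical knowledge-base entries; "Beijing" and "tomorrow" are added
# so that the weather example used later has attributes to look up.
KNOWLEDGE_BASE = {
    "Liu Dehua": {"knowledge": ["man", "Hong Kong native", "singer", "actor"],
                  "attribute": "entertainment_figure"},
    "rain": {"knowledge": ["weather condition", "weather forecast"],
             "attribute": "weather"},
    "Beijing": {"knowledge": ["city"], "attribute": "place"},
    "tomorrow": {"knowledge": ["the day after today"], "attribute": "time"},
}


def semantic_attribute(word: str) -> Optional[str]:
    """Return the semantic attribute defined for a word, or None if it is unknown."""
    entry = KNOWLEDGE_BASE.get(word)
    return entry["attribute"] if entry else None


def knowledge_of(word: str) -> List[str]:
    """Return the stored knowledge items for a word (empty list if unknown)."""
    return KNOWLEDGE_BASE.get(word, {}).get("knowledge", [])
```

In practice the entries would be grouped by recognition scenario (music, weather, stories), as the text suggests.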

Step 10: word segmentation and semantic disambiguation, that is, the speech recognition result is segmented into words and semantically disambiguated according to the semantic attributes of the words in the knowledge base. The detailed process is as follows. For example, the recognition result "Will it rain in Beijing tomorrow?" is segmented according to the knowledge base into "tomorrow", "Beijing", "will", "rain" and a question particle, where "tomorrow" has the time attribute, "Beijing" has the place attribute, "will" is a verb, "rain" has the weather attribute and the particle marks a question. In some cases disambiguation is needed: in "the song of Liu Dehua", the name may be misrecognized as a similar-sounding but meaningless string (rendered literally in translation as "clear must be sliding"). Because the knowledge base defines "Liu Dehua", analysis resolves the string to "Liu Dehua". This is disambiguation based on the semantic attributes of the words in the knowledge base.
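A common way to segment text against a lexicon is forward maximum matching: at each position, take the longest lexicon entry that matches. The sketch below applies it to English word tokens so that multi-word names such as "Liu Dehua" stay intact; the lexicon and attribute names are hypothetical. For disambiguation it uses a swapped-in technique, not necessarily the patent's: among alternative recognition hypotheses, it picks the one with the most knowledge-base hits.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical lexicon mapping token sequences to semantic attributes.
LEXICON = {
    ("tomorrow",): "time",
    ("beijing",): "place",
    ("will",): "verb",
    ("rain",): "weather",
    ("song",): "media",
    ("liu", "dehua"): "entertainment_figure",
}
MAX_LEN = max(len(key) for key in LEXICON)


def segment(text: str) -> List[Tuple[str, Optional[str]]]:
    """Forward maximum matching over word tokens; returns (phrase, attribute) pairs."""
    tokens = re.findall(r"[a-z]+|\?", text.lower())
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:  # longest match wins
                out.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:  # no lexicon entry starts here: emit a single unknown token
            out.append((tokens[i], "question" if tokens[i] == "?" else None))
            i += 1
    return out


def disambiguate(alternatives: List[str]) -> str:
    """Pick the recognition hypothesis with the most knowledge-base hits."""
    def hits(text: str) -> int:
        return sum(1 for _, attr in segment(text) if attr not in (None, "question"))
    return max(alternatives, key=hits)
```

For example, `disambiguate(["play the song of clear must be sliding", "play the song of Liu Dehua"])` prefers the second hypothesis, because "Liu Dehua" is a knowledge-base entry.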

Step 20: intent classification and parameter extraction, that is, the intent of the segmented and disambiguated result is classified and its parameters are extracted. For example, classifying the segmentation and disambiguation result for "Will it rain in Beijing tomorrow?" gives the intent class "query weather", and the extracted parameters are place: Beijing, time: tomorrow. This completes the semantic conversion of "Will it rain in Beijing tomorrow?".
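Intent classification and parameter extraction can be sketched with simple rules over the (phrase, attribute) pairs produced by segmentation. The intent names, the attribute names, and the defaults "local" and "today" for an under-specified weather query are assumptions made for illustration. A production system might use a trained classifier instead.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[str, Optional[str]]  # (phrase, semantic attribute)
SLOT_ATTRIBUTES = ("place", "time", "singer", "song")


def classify(segments: List[Segment]) -> Tuple[str, Dict[str, str]]:
    """Rule-based intent classification plus parameter (slot) extraction."""
    attributes = {attr for _, attr in segments}
    slots = {attr: phrase for phrase, attr in segments if attr in SLOT_ATTRIBUTES}
    if "weather" in attributes:
        slots.setdefault("place", "local")  # assumed default: local weather
        slots.setdefault("time", "today")   # assumed default: today
        return "query_weather", slots
    if "singer" in attributes or "song" in attributes:
        return "play_song", slots
    return "unknown", slots
```

Applied to the segmentation of "Will it rain in Beijing tomorrow?", this returns the intent `query_weather` with place "Beijing" and time "tomorrow", matching the example in the text.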

The detailed process is as follows. Suppose the input voice is "Is the weather nice today?". Speech recognition first outputs the recognition result "Is the weather nice today?", and semantic analysis of this result then determines the intent: broadcast today's local weather. As another example, for the voice input "I want to listen to Wang Fei's music", semantic analysis yields the intent "play songs" with the parameter "Wang Fei", and the song playback function then plays Wang Fei's songs. Because semantic recognition is used, the user does not need to memorize fixed voice control commands and can interact with the toy in whatever phrasing comes most naturally. For the same intent, the user could also say "Please help me find Wang Fei's songs", "Is there Wang Fei's latest album?" or "Wang Fei's 'Perverse'". In other words, the user can express commands and intentions freely, and the powerful speech recognition and semantic understanding engines on the mobile terminal can accurately identify the user's actual intent: play Wang Fei's songs, or play a particular song by Wang Fei. This makes the interaction between the intelligent toy and the user freer and more interesting without increasing the direct hardware cost of the toy, so toy manufacturers can achieve high-performance human-machine interaction at lower cost.
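To show why different phrasings can map to one intent, here is a deliberately simple keyword-spotting stand-in for the semantic understanding engine. It is not the patent's engine. The singer set and cue words are hypothetical, and a real engine would draw them from the knowledge base.

```python
SINGERS = {"wang fei"}  # hypothetical singer entries from the knowledge base
PLAY_CUES = ("listen", "play", "song", "album", "music")


def understand(utterance: str) -> dict:
    """Map free phrasing to a play_song intent when a known singer and a play cue appear."""
    text = utterance.lower()
    singer = next((s for s in SINGERS if s in text), None)
    if singer and any(cue in text for cue in PLAY_CUES):
        return {"intent": "play_song", "singer": singer}
    return {"intent": "unknown"}
```

With these rules, "I want to listen to Wang Fei's music", "Please help me find Wang Fei's songs" and "Is there Wang Fei's latest album?" all yield the same intent. A bare song title such as "Wang Fei's 'Perverse'" contains no cue word and would also need a song list in the knowledge base.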

As shown in Fig. 1 and Fig. 2, in a preferred embodiment of the present invention, the speech recognition and conversion step also includes conversion of the input voice by the toy 2. The toy 2 itself has a speech recognition and conversion function and a library of instructions and content; for simple voice input, the speech recognition conversion module of the toy 2 recognizes and converts the input voice in parallel with the communication terminal 1 and the network server 3.

As shown in Fig. 1 and Fig. 2, in a preferred embodiment of the present invention, the toy 2 and the communication terminal 1 are connected through any one of an infrared communication component, a high-frequency modulation communication component, a Bluetooth communication component, a 2.4 GHz wireless communication component or an RFID radio-frequency communication component. A wireless communication receiver is provided on the toy. In this technical solution, the second wireless communication module 21 on the toy 2 is one or more of an infrared receiver, a Bluetooth communication component, an RFID radio-frequency communication component and a 2.4 GHz wireless communication component, and the first wireless communication module 11 of the communication terminal 1 is any one or more of an infrared transmitter, a Bluetooth communication component, an RFID radio-frequency communication component and a 2.4 GHz wireless communication component. The communication terminal 1 sends the converted instruction, or instruction and parameter, to the toy 2 over the wireless link, and the toy 2 executes it.
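The patent does not define how an instruction and its parameter are encoded on the wireless link. The sketch below shows one possible compact frame that a low-cost toy could validate cheaply: a header byte, an instruction code, a one-byte parameter length, the UTF-8 parameter and an XOR checksum. The frame layout, the header value 0xAA and the instruction codes are all hypothetical.

```python
from functools import reduce
from operator import xor
from typing import Optional, Tuple

# Hypothetical instruction codes; the patent does not specify a wire format.
INSTRUCTION_CODES = {"play": 0x01, "tell_story": 0x02, "take_off": 0x03, "rotate": 0x04}
CODE_TO_INSTRUCTION = {code: name for name, code in INSTRUCTION_CODES.items()}
HEADER = 0xAA


def encode_frame(instruction: str, parameter: Optional[str] = None) -> bytes:
    """Frame = header | code | parameter length | parameter (UTF-8) | XOR checksum."""
    payload = (parameter or "").encode("utf-8")
    if len(payload) > 255:
        raise ValueError("parameter too long for a one-byte length field")
    body = bytes([INSTRUCTION_CODES[instruction], len(payload)]) + payload
    return bytes([HEADER]) + body + bytes([reduce(xor, body, 0)])


def decode_frame(frame: bytes) -> Tuple[str, Optional[str]]:
    """Validate a frame on the toy side and return (instruction, parameter)."""
    if len(frame) < 4 or frame[0] != HEADER:
        raise ValueError("bad header or short frame")
    body, checksum = frame[1:-1], frame[-1]
    if reduce(xor, body, 0) != checksum:
        raise ValueError("checksum mismatch")
    code, length, payload = body[0], body[1], body[2:]
    if length != len(payload):
        raise ValueError("length mismatch")
    return CODE_TO_INSTRUCTION[code], (payload.decode("utf-8") or None)
```

For example, `encode_frame("rotate")` yields the four bytes `AA 04 00 04`, and `decode_frame(encode_frame("play", "Little Swallow"))` recovers `("play", "Little Swallow")`.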

As shown in Fig. 2, a specific embodiment of the present invention provides a toy voice control system based on a mobile communication terminal 1 and the Internet, comprising a toy 2 with a communication connection, a communication terminal 1 with voice input and speech recognition, and a network server 3 with speech recognition and conversion. The toy 2 comprises a second wireless communication module 21 that connects to the communication terminal 1. The communication terminal 1 comprises a voice input unit 15 for inputting voice, a first wireless communication module 11 for wireless communication with the toy 2, a first speech conversion unit 13 for speech recognition and conversion, and a network connection module 12 for connecting to the network server 3 over the Internet. The network server 3 has a third speech conversion unit 31 that recognizes and converts the received voice information. The voice input unit 15 receives voice; the first speech conversion unit 13 of the communication terminal 1 and the third speech conversion unit 31 of the network server 3 recognize and convert the input voice in parallel; and the speech recognition conversion result is executed jointly by the network server 3, the communication terminal 1 and the toy 2, or by any two of them, or by either the toy 2 or the communication terminal 1 alone.
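The component layout can be mirrored in a few classes whose names follow the reference numerals. This is a structural sketch only. The recognizers are injected callables, the two conversion units are called one after the other for brevity (the system runs them in parallel), and preferring a non-empty local result is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Toy:
    """Toy 2: receives and executes results (no recognition in this sketch)."""
    executed: List[str] = field(default_factory=list)

    def receive(self, result: str) -> None:
        # Stands in for the second wireless communication module 21.
        self.executed.append(result)


@dataclass
class Server:
    """Network server 3 with its third speech conversion unit 31."""
    unit_31: Callable[[str], str]


@dataclass
class Terminal:
    """Communication terminal 1."""
    unit_13: Callable[[str], str]  # first speech conversion unit 13
    server: Server                 # reached through network connection module 12
    toy: Toy                       # reached through first wireless communication module 11

    def on_voice(self, speech: str) -> str:
        # Voice input unit 15 hands the speech to both conversion units
        # (sequentially here for brevity; the system runs them in parallel).
        local = self.unit_13(speech)
        remote = self.server.unit_31(speech)
        result = local or remote   # assumption: prefer a non-empty local result
        self.toy.receive(result)   # module 11 -> module 21
        return result
```

Injecting the recognizers as callables keeps the sketch independent of any particular speech engine.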

As shown in Figure 2, the invention is implemented as follows. Voice is input through the communication terminal 1. The network connection module 12 of the communication terminal 1 connects to the Internet and uploads the received voice to the web server 3. The first speech conversion unit 13 of the communication terminal 1 and the speech conversion unit 32 of the web server 3 recognize the received voice in parallel and then convert the recognition result. The speech recognition conversion result takes the form of an instruction, or an instruction plus a parameter. The result can be executed in the following ways.

Executed by the toy 2 alone: the web server 3 sends the recognized and converted instruction (or instruction and parameter) to the communication terminal 1. The first wireless communication module 11 of the communication terminal 1 establishes a wireless connection with the second wireless communication module 21 of the toy 2, and the communication terminal 1 then sends the recognition result to the toy 2, which executes it. If the result contains a control command for the toy 2, and the toy stores that control command together with the content matching the voice instruction, the communication terminal 1 sends the result to the toy 2, which executes the instruction and retrieves the content it calls for.

Executed jointly by the web server 3 and the toy 2: if the web server 3 stores the content or interaction information corresponding to the voice instruction, the web server 3 retrieves that content or interaction information according to the conversion result and sends it through the communication terminal 1 to the toy 2, which outputs it.

Executed jointly by the communication terminal 1 and the toy 2: if the communication terminal 1 stores the content or interaction information corresponding to the voice instruction, the web server 3 sends the conversion result to the communication terminal 1. The communication terminal 1 executes it by retrieving that content or interaction information and then sends it to the toy 2, which outputs it.

Executed by the communication terminal 1 alone: if the communication terminal 1 holds the corresponding interaction content, it executes the conversion result and plays it back itself.

Instructions and content for controlling the toy include, for example, playing music, telling a story, taking off and rotating. Because the conversion result is an instruction or an instruction plus a parameter, executing it means executing that instruction with its parameter. For example, in "play 'Little Swallow'", "play" is the instruction and the audio content of "Little Swallow" is the parameter.

Executed jointly by the web server 3, the communication terminal 1 and the toy 2: for example, if the input voice instruction is "What's the weather in Beijing today?", the web server 3 looks up today's weather in Beijing and sends it to the communication terminal 1, which plays it and forwards the audio signal to the toy 2 for output. Execution is thus shared by the web server 3, the communication terminal 1 and the toy 2.

In the specific embodiment, the content includes audio content, text content, or both. In specific embodiments of the invention, the communication terminal 1 further includes a wake module 14 that wakes the communication terminal 1 into a state for receiving voice input, triggered either by a spoken voice instruction or by a button. The communication terminal 1 also recognizes and converts the input voice itself, in parallel with the web server 3, and whichever speech recognition conversion result is obtained first is used.
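The execution modes above differ mainly in where the content matching the voice instruction is stored. The following Python sketch illustrates one possible dispatch rule under stated assumptions: the `ConversionResult` type, the `choose_executor` function, the set-based content catalogues and the lookup order (toy, then communication terminal, then web server) are all hypothetical and are not specified by the patent.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class ConversionResult:
    """Speech recognition conversion result: an instruction, optionally with a parameter."""
    instruction: str                  # e.g. "play"
    parameter: Optional[str] = None   # e.g. "Little Swallow"


def choose_executor(result: ConversionResult,
                    toy_content: Set[str],
                    terminal_content: Set[str],
                    server_content: Set[str],
                    toy_connected: bool = True) -> Optional[Tuple[str, ...]]:
    """Return the chain of devices the content passes through, ending with the one that outputs it.

    Hypothetical policy: prefer the device closest to the user (toy, then
    terminal, then server). Returns None when no device holds the content.
    """
    key = result.parameter or result.instruction
    if toy_connected and key in toy_content:
        # Toy stores the command and content; the terminal only relays the command.
        return ("toy",)
    if key in terminal_content:
        return ("terminal", "toy") if toy_connected else ("terminal",)
    if key in server_content:
        # Server fetches the content; the terminal forwards (or plays) it.
        return ("server", "terminal", "toy") if toy_connected else ("server", "terminal")
    return None  # nothing matches: fall back to voice interaction
```

For the weather example, the content exists only on the server, so the chain runs from the web server through the terminal to the toy. Preferring local content first is a design choice that avoids a network round trip when the toy or terminal already has what is needed.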

In the specific embodiment, the communication terminal 1 is a mobile phone, a mobile tablet computer or a mobile communication entertainment device. Both the communication terminal 1 and the web server 3 provide a voice interaction library for carrying out voice interaction: when the input voice information cannot be recognized or cannot be executed, voice interaction is carried out so that the web server or the communication terminal obtains voice information it can execute. If the recognition result includes interaction information, the corresponding interaction information is retrieved from the voice interaction library and sent through the communication terminal 1 to the toy 2, which plays it to realize voice interaction. For example, for the spoken query "Are there any songs by Wang Fei?", the web server 3 runs a query and obtains the result "yes" or "no"; that result is the corresponding interaction information. In addition, when the input voice information cannot be recognized or cannot be executed, the user inputs voice through the communication terminal 1 and voice interaction is carried out so that the web server 3, the communication terminal 1 or the toy 2 obtains executable voice information. For example, if the spoken input "start" cannot be recognized, perhaps because the pronunciation is unclear or differs too much from standard speech, the voice interaction information library can be called to prompt the user to speak again. As another example, for the input "open story now", the web server 3, the communication terminal 1 or the toy 2 may be unable to convert this voice instruction into a control command. Supplementary voice information is then needed, for example by calling the interaction information library to ask "Do you want to listen to a story?". Such voice interaction prompts complete the voice instruction information, so that the toy can be controlled with natural speech.
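The clarification behaviour described above can be sketched as a small dialogue handler. This is a minimal illustration that assumes recognition has already produced text; the `COMMANDS` and `CLARIFY` tables, the `handle` function and its `pending` confirmation mechanism are hypothetical stand-ins for the voice interaction library, not the patent's implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tables standing in for the voice interaction library.
COMMANDS: Dict[str, str] = {"start": "START", "tell a story": "PLAY_STORY"}
CLARIFY: Dict[str, Tuple[str, str]] = {
    "story": ("Do you want to listen to a story?", "PLAY_STORY"),
}
YES = {"yes", "yeah", "ok"}
REPEAT_PROMPT = "Sorry, I didn't catch that. Please say it again."


def handle(text: Optional[str],
           pending: Optional[str] = None) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (command, prompt, pending_command).

    command: executable command, if one was resolved.
    prompt: text for the toy or terminal to speak back, if any.
    pending_command: command awaiting the user's confirmation.
    """
    if pending is not None:
        # The previous turn asked a clarification question; expect a yes/no answer.
        if text and text.strip().lower() in YES:
            return pending, None, None
        return None, REPEAT_PROMPT, None
    if not text:
        # Nothing recognized (e.g. unclear pronunciation): ask the user to repeat.
        return None, REPEAT_PROMPT, None
    normalized = text.strip().lower()
    if normalized in COMMANDS:
        return COMMANDS[normalized], None, None
    for keyword, (question, command) in CLARIFY.items():
        if keyword in normalized:
            # Recognized but not executable as-is: ask a supplementary question.
            return None, question, command
    return None, REPEAT_PROMPT, None
```

With these tables, "open story now" yields the question "Do you want to listen to a story?" together with a pending `PLAY_STORY` command, and a following "yes" turns that pending command into an executable one.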

As shown in Figure 2, in a preferred implementation of the invention the speech conversion unit 32 includes a speech recognition module 321 and further includes a semantic recognition module 322. The semantic recognition module 322 determines the meaning of the voice input through the voice input unit 15 from the speech recognized by the speech recognition module 321. For example, if the voice input through the voice input unit 15 is "Is it a nice day today?", speech recognition is first carried out and outputs the recognition result "Is it a nice day today?". The semantic recognition module 322 then performs semantic judgment on this result and determines the intent: broadcast today's local weather. As another example, for the voice input "I want to listen to Wang Fei's music", semantic analysis by the semantic recognition module 322 yields the user intent "play songs" with the parameter "Wang Fei", and according to this analysis the song playback function is called to play Wang Fei's songs. Because semantic recognition is used, the user does not need to memorize fixed voice control commands and can interact with the toy in whatever wording is most natural to them. For the same intent, the user could also say "Please help me find Wang Fei's songs", "Is there a new album by Wang Fei?" or "Wang Fei perverse". In other words, users can express their commands and intentions freely, and the powerful speech recognition and semantic understanding engine on the mobile terminal can identify the user's real intent very well: play Wang Fei's songs, or play a particular Wang Fei song. This makes interaction between the intelligent toy and the user freer and more interesting without increasing the direct hardware cost of the toy, so toy manufacturers can achieve high-performance human-machine interaction at lower cost.

As shown in Figure 2, in a preferred implementation of the invention the toy 2 also recognizes and converts input voice and executes the resulting speech recognition conversion result. The toy 2 includes a second speech conversion unit 23 that performs speech recognition conversion on the voice. The toy 2 also stores content matching voice instructions, so that simple voice commands are recognized, converted and executed by the toy 2 itself. When the communication terminal 1 cannot perform voice input and recognition, the toy 2 takes voice input directly, or receives voice sent by the communication terminal 1, recognizes and converts it, and executes the result. This gives the toy 2 a certain ability to work independently, removes its dependence on the communication terminal 1, and makes the toy 2 more convenient to use. In the specific embodiment, the content matching voice instructions includes audio content, text content, or both.
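The split between simple commands handled on the toy and everything else handled upstream can be sketched as follows. This assumes the toy's second speech conversion unit 23 has already produced text; the `LOCAL_VOCAB` table, the `toy_handle` function, the `"UNSUPPORTED"` marker and the callable used to reach the communication terminal are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical small vocabulary the toy can recognize and execute on its own.
LOCAL_VOCAB: Dict[str, str] = {
    "take off": "TAKE_OFF",
    "rotate": "ROTATE",
    "sing": "PLAY_SONG",
}


def toy_handle(text: str, terminal: Optional[Callable[[str], str]]) -> str:
    """Return the command the toy will execute for a recognized utterance.

    terminal: a callable that forwards the utterance to the communication
    terminal (and, through it, the web server), or None when unavailable.
    """
    key = text.strip().lower()
    if key in LOCAL_VOCAB:
        # Simple command: recognized and executed on the toy itself.
        return LOCAL_VOCAB[key]
    if terminal is not None:
        # Anything else is delegated to the terminal/server chain.
        return terminal(text)
    # Offline and outside the local vocabulary.
    return "UNSUPPORTED"
```

Keeping the local vocabulary small matches the cost argument in the background section: the toy needs only enough recognition capability for a handful of commands.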

In a preferred implementation of the invention, the communication terminal 1 has a storage unit that stores voice instructions and the content matching them. An operation on the toy 2 consists of an operation instruction, or an instruction together with the content it refers to. For example, in "play 'Little Swallow'", "play" is the instruction and the audio content of "Little Swallow" is the parameter. Because the communication terminal 1 has a larger storage capacity, its content library can be larger and can hold more voice instructions and matching content.
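The instruction-plus-parameter form and the terminal's content library can be illustrated with a short sketch. It assumes, purely for illustration, that the first word of a recognized command is the instruction and the remainder is the parameter; `split_command`, `ContentLibrary` and `resolve` are hypothetical names, not part of the patent.

```python
from typing import Dict, Optional, Tuple


def split_command(text: str) -> Tuple[str, Optional[str]]:
    """Split 'play Little Swallow' into ('play', 'Little Swallow').

    Assumes the first word is the instruction and the rest is the parameter.
    """
    head, _, rest = text.strip().partition(" ")
    return head.lower(), (rest.strip() or None)


class ContentLibrary:
    """Hypothetical on-terminal store mapping content names to audio/text data."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def add(self, name: str, data: bytes) -> None:
        self._items[name.lower()] = data

    def lookup(self, name: str) -> Optional[bytes]:
        return self._items.get(name.lower())


def resolve(text: str, library: ContentLibrary) -> Optional[Tuple[str, Optional[bytes]]]:
    """Return (instruction, content) if the parameter is available locally, else None."""
    instruction, parameter = split_command(text)
    if parameter is None:
        return instruction, None          # bare instruction such as "rotate"
    content = library.lookup(parameter)
    return (instruction, content) if content is not None else None
```

A `None` return from `resolve` signals that the content is not stored on the terminal, in which case the request would have to be served by the web server instead.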

As shown in Figure 1 and Figure 2, in a preferred implementation of the invention the toy 2 is connected to the communication terminal 1 by infrared signals, high-frequency modulated communication signals, Bluetooth signals, 2.4 GHz wireless communication signals or RFID radio-frequency signals. A wireless communication receiver is provided on the toy. In this technical solution, the wireless communication mode includes one or more of infrared signals, high-frequency modulated communication signals, Bluetooth signals, 2.4 GHz wireless communication signals and RFID radio-frequency signals. Accordingly, the second wireless communication module 21 of the toy 2 is provided with one or more of an infrared signal receiver, a high-frequency modulated communication signal receiver, a Bluetooth signal receiver, a 2.4 GHz wireless communication signal receiver and an RFID radio-frequency receiving component, and the first wireless communication module 11 of the communication terminal 1 is provided with one or more of the corresponding infrared, high-frequency modulated, Bluetooth, 2.4 GHz and RFID transmitting components. The communication terminal 1 sends the converted instruction, or instruction and parameter, to the toy 2 over the wireless link, and the toy 2 executes it.
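The patent does not define how an instruction and its parameter are laid out on the wireless link. As a hedged illustration only, the sketch below packs them into a hypothetical frame: a start byte `0xAA`, a one-byte opcode, a one-byte parameter length, the UTF-8 parameter, and an 8-bit additive checksum. The physical transport (infrared, Bluetooth, 2.4 GHz or RFID) is outside the scope of the sketch.

```python
import struct
from typing import Tuple

HEADER = 0xAA  # hypothetical start-of-frame marker


def encode_frame(opcode: int, parameter: str = "") -> bytes:
    """Build header | opcode | length | UTF-8 parameter | checksum."""
    payload = parameter.encode("utf-8")
    if len(payload) > 255:
        raise ValueError("parameter too long for one frame")
    body = struct.pack("BBB", HEADER, opcode, len(payload)) + payload
    checksum = sum(body) & 0xFF  # 8-bit additive checksum over all preceding bytes
    return body + bytes([checksum])


def decode_frame(frame: bytes) -> Tuple[int, str]:
    """Validate a frame on the toy side and return (opcode, parameter)."""
    if len(frame) < 4 or frame[0] != HEADER:
        raise ValueError("bad header")
    if sum(frame[:-1]) & 0xFF != frame[-1]:
        raise ValueError("bad checksum")
    _, opcode, length = struct.unpack("BBB", frame[:3])
    if 3 + length + 1 != len(frame):
        raise ValueError("bad length")
    return opcode, frame[3:3 + length].decode("utf-8")
```

A fixed binary frame keeps the decoder on the toy's microcontroller trivial. A production design would more likely use a stronger check such as a CRC, which is a common alternative to the simple sum used here.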

Technical effect of the invention: the voice control method and system for toys based on a mobile communication terminal 1 and the Internet comprise a toy 2 with a communication connection, a communication terminal 1 with voice input, and a web server 3 with speech recognition conversion. The web server 3 has a storage unit 31 that stores voice instructions and the content matching them. By using the communication terminal 1 and the Internet to recognize voice input, and by retrieving content from the web server 3 over the Internet, the invention provides large content storage, effective recognition results and timely content updates. This makes the toy more capable while greatly reducing cost.

The above is a further detailed description of the invention in conjunction with specific preferred embodiments, and the implementation of the invention should not be regarded as limited to these descriptions. For those of ordinary skill in the technical field of the invention, several simple deductions or substitutions may be made without departing from the inventive concept, and all of these should be regarded as falling within the protection scope of the invention.

Claims

1. A toy control method based on a mobile communication terminal and Internet voice interaction, characterized by comprising a toy with a communication connection, a communication terminal with voice input and speech recognition, and a web server with speech recognition conversion, the toy control method based on a mobile communication terminal and Internet voice interaction comprising the following steps:

Inputting voice: voice is input through the communication terminal;

2. The toy control method based on a mobile communication terminal and Internet voice interaction according to claim 1, characterized by further comprising building a semantic knowledge base according to the recognition scenario, the semantic knowledge base including the semantic attributes of words, and the speech recognition conversion step further comprising semantic recognition conversion, specifically comprising the following steps:

3. The toy control method based on a mobile communication terminal and Internet voice interaction according to claim 1, characterized in that the speech recognition conversion results of both the web server and the communication terminal include a confidence level of the speech recognition conversion, and the communication terminal sets a confidence threshold for the speech recognition conversion result; when the confidence level of the communication terminal's result is greater than or equal to this threshold, the communication terminal's result is taken; when the confidence level of the communication terminal's result is less than the threshold, whichever of the web server's result and the communication terminal's result has the higher confidence level is taken.
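The confidence rule stated in this claim translates directly into a small selection function. In the sketch below, the `Recognition` type, the assumption that confidence lies in [0, 1], keeping the terminal's result on a tie, and falling back to the terminal's result when no server result is available are illustrative choices not fixed by the claim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def select_result(terminal: Recognition,
                  server: Optional[Recognition],
                  threshold: float) -> Recognition:
    """Choose between the terminal's and the server's recognition results."""
    if terminal.confidence >= threshold or server is None:
        # Terminal result is confident enough (or no server result is available).
        return terminal
    # Otherwise take whichever result has the higher confidence; ties keep the terminal's.
    return server if server.confidence > terminal.confidence else terminal
```

Checking the terminal's result first lets a confident local recognition be used without waiting for the server, which is consistent with the low-latency goal of parallel recognition.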

4. The toy control method based on a mobile communication terminal and Internet voice interaction according to claim 1, characterized by further comprising waking up the communication terminal, wherein the communication terminal is woken into the voice input state by any one of a voice instruction, a button or a wireless signal.
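The wake-up step can be pictured as a two-state machine that leaves the idle state on any of the three triggers named in this claim. The `WakeModule` class, the event source names and the wake phrase `"hello toy"` are hypothetical; a real device would run a dedicated keyword spotter rather than compare recognized text.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"


WAKE_PHRASE = "hello toy"  # hypothetical wake word


class WakeModule:
    """Switches the terminal into the voice input state on a wake trigger."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def on_event(self, source: str, payload: str = "") -> State:
        """source is 'voice', 'button' or 'wireless'; payload is recognized text for 'voice'."""
        if self.state is State.IDLE:
            if source in ("button", "wireless"):
                self.state = State.LISTENING
            elif source == "voice" and payload.strip().lower() == WAKE_PHRASE:
                self.state = State.LISTENING
        return self.state

    def finish(self) -> None:
        """Return to idle after the voice input has been captured."""
        self.state = State.IDLE
```

While idle, ordinary speech is ignored, so the terminal only starts capturing voice input after an explicit trigger.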

5. The toy control method based on a mobile communication terminal and Internet voice interaction according to claim 1, characterized in that, when the input voice information cannot be recognized or cannot be executed, voice interaction is carried out through the communication terminal or the toy to obtain voice information that can be recognized or executed.

6. The toy control method based on a mobile communication terminal and Internet voice interaction according to claim 1, characterized by comprising the toy recognizing and converting the input voice, and the toy executing this speech recognition conversion result.

7. A toy control system based on a mobile communication terminal and Internet voice interaction, characterized by comprising a toy with a communication connection, a communication terminal with voice input and speech recognition, and a web server with speech recognition conversion; the toy comprises a second wireless communication module that connects to the communication terminal; the communication terminal comprises a voice input unit for inputting voice, a first wireless communication module for wireless communication with the toy, a first speech conversion unit for speech recognition conversion, and a network connection module for an Internet connection with the web server; the web server has a third speech conversion unit that performs recognition conversion on the received voice information; the voice input unit inputs voice, the first speech conversion unit of the communication terminal and the third speech conversion unit of the web server recognize and convert the input voice in parallel, and the speech recognition conversion result is executed jointly by the web server, the communication terminal and the toy, or by any two of them, or by either one of the toy and the communication terminal alone.
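Parallel recognition on the terminal and the server can be sketched with Python's standard `concurrent.futures` module. This sketch implements the "first result wins" policy from the description; it treats the two recognizers as opaque callables (hypothetical stand-ins for the first and third speech conversion units), uses an assumed 5-second timeout, and skips a recognizer that fails. The confidence-based choice in claims 3 and 9 is an alternative policy that would instead wait for both results.

```python
import concurrent.futures as cf
from typing import Callable, Tuple

Recognizer = Callable[[bytes], str]


def recognize_parallel(audio: bytes,
                       terminal: Recognizer,
                       server: Recognizer,
                       timeout: float = 5.0) -> Tuple[str, str]:
    """Run both recognizers concurrently; return (source, text) of the first success."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = {pool.submit(terminal, audio): "terminal",
               pool.submit(server, audio): "server"}
    try:
        # as_completed yields futures in completion order; it raises
        # concurrent.futures.TimeoutError if nothing finishes in time.
        for future in cf.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return futures[future], future.result()
        raise RuntimeError("both recognizers failed")
    finally:
        # Do not block on the slower recognizer; its result is discarded.
        pool.shutdown(wait=False)
```

Threads suit this case because both recognizers are expected to spend most of their time waiting on I/O (the network request to the server, or a native recognition library on the terminal).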

8. The toy control system based on a mobile communication terminal and Internet voice interaction according to claim 7, characterized in that the third speech conversion unit comprises a speech recognition module and a semantic recognition module, and the semantic recognition module determines the meaning of the voice input through the voice input unit from the speech recognized by the speech recognition module.

9. The toy control system based on a mobile communication terminal and Internet voice interaction according to claim 7, characterized in that the speech recognition conversion results of both the web server and the communication terminal include a confidence level of the speech recognition conversion, and the communication terminal sets a confidence threshold for the speech recognition conversion result; when the confidence level of the communication terminal's result is greater than or equal to this threshold, the communication terminal's result is taken; when the confidence level of the communication terminal's result is less than the threshold, whichever of the web server's result and the communication terminal's result has the higher confidence level is taken.

10. The toy control system based on a mobile communication terminal and Internet voice interaction according to claim 7, characterized in that the toy has a second speech conversion unit that performs speech recognition, and the second speech conversion unit recognizes and converts voice.

11. The toy control system based on a mobile communication terminal and Internet voice interaction according to claim 7, characterized in that the communication terminal further comprises a wake module that wakes the communication terminal into the voice input state, and the wake module is triggered by any one of a voice instruction, a button or a wireless signal.