CN104536978A - Voice data identifying method and device - Google Patents
Classifications
- G10L15/26 — Speech recognition; speech-to-text systems
- G06F16/48 — Information retrieval of multimedia data; retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G10L15/30 — Speech recognition; distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Abstract
The invention discloses a voice data recognition method and device, belonging to the technical field of vehicle-mounted speech recognition. The method includes: receiving voice data to be recognized input by a user, sending it to a speech recognition server, and recording the send time; dividing the voice data into multiple data segments of a preset duration according to its sampling time; detecting the degree of match between the voice feature information of each data segment and pre-stored reference information; determining first reference information that matches the first voice feature information in the voice data; obtaining a first control instruction corresponding to the voice data based on the determined first reference information; and, if no recognition message is received from the speech recognition server, determining the first control instruction to be the recognition result of the voice data to be recognized. The method and device can improve the flexibility of voice data recognition.
Description
Technical field
The present invention relates to the field of vehicle-mounted speech recognition, and in particular to a method and apparatus for recognizing voice data.
Background technology
With the rapid development of automotive electronics, in-car entertainment functions have become increasingly rich, and operating them has become increasingly complex. Manually operating and controlling each entertainment function can distract the driver while driving and endanger traffic safety. Speech recognition technology can mitigate this safety problem to some extent.
The commonly used approach is local, instruction-based speech recognition: multiple instruction words are preset on the device. When the driver wants to start some vehicle function, he or she speaks the corresponding instruction word to the voice device. On receiving the voice data for that instruction word, the device converts it into text and compares the text against the locally stored instruction words; if a stored instruction word matches, it is determined to be the instruction word corresponding to the voice data and taken as the recognition result, which can then be output and acted on.
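A minimal sketch of this local, instruction-word matching; the instruction words, function name, and normalization are illustrative assumptions, not taken from the patent:

```python
# Hypothetical preset instruction words; a real head unit would store many more.
INSTRUCTION_WORDS = {"open navigation", "play music", "answer call"}

def match_instruction(transcribed_text):
    """Prior-art style matching: the text converted from the user's voice is
    compared against the locally stored instruction words; any input that is
    not a preset instruction word yields no recognition result."""
    normalized = transcribed_text.strip().lower()
    return normalized if normalized in INSTRUCTION_WORDS else None
```

The limitation this approach suffers from follows directly: any free-form utterance outside the preset list produces no result at all.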
In the process of realizing the present invention, the inventors found that the prior art has at least the following problem:

Because local, instruction-based speech recognition can only recognize preset instruction words, the driver must memorize a large number of them. If the driver speaks voice data that is not an instruction word, the above method cannot produce a recognition result, so its recognition flexibility is poor.
Summary of the invention
To solve this problem in the prior art, embodiments of the present invention provide a method and apparatus for recognizing voice data. The technical solution is as follows:

In a first aspect, a method for recognizing voice data is provided. The method comprises:

receiving voice data to be recognized input by a user, sending the voice data to be recognized to a speech recognition server, and recording the send time of the voice data;

dividing the voice data to be recognized into multiple data segments of a preset duration according to its sampling time; matching the voice feature information of each data segment against pre-stored reference information; determining first reference information that matches the first voice feature information in the voice data; and, based on the determined first reference information, obtaining a first control instruction corresponding to the voice data;

if no recognition message carrying a second control instruction is received from the speech recognition server within a preset duration starting from the send time, determining the first control instruction to be the recognition result of the voice data to be recognized; if such a recognition message is received within the preset duration, determining the second control instruction to be the recognition result.
Optionally, the method further comprises:

obtaining a confidence for the first control instruction according to the degree of match between the first voice feature information and the first reference information.

In that case, determining the first control instruction to be the recognition result when no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration comprises:

if no such recognition message is received within the preset duration starting from the send time, and the confidence of the first control instruction is not less than a preset confidence threshold, determining the first control instruction to be the recognition result of the voice data to be recognized.

Optionally, the method further comprises:

if no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration starting from the send time, but the confidence of the first control instruction is less than the preset confidence threshold, sending a prompt indicating that recognition of the voice data failed.
Optionally, receiving the voice data to be recognized input by the user comprises:

upon receiving a voice input request, receiving the voice data input by the user, and, when the duration for which the user has stopped inputting reaches a preset reception duration threshold, determining the voice data input before the user stopped to be the voice data to be recognized.
Optionally, the method further comprises:

sending a prompt indicating the determined recognition result of the voice data to be recognized.
In a second aspect, an apparatus for recognizing voice data is provided. The apparatus comprises:

a transceiver module, configured to receive voice data to be recognized input by a user, send the voice data to a speech recognition server, and record the send time of the voice data;

a first acquisition module, configured to divide the voice data to be recognized into multiple data segments of a preset duration according to its sampling time, match the voice feature information of each data segment against pre-stored reference information, determine first reference information that matches the first voice feature information in the voice data, and, based on the determined first reference information, obtain a first control instruction corresponding to the voice data;

a determination module, configured to determine the first control instruction to be the recognition result of the voice data to be recognized if no recognition message carrying a second control instruction is received from the speech recognition server within a preset duration starting from the send time, and to determine the second control instruction to be the recognition result if such a message is received within the preset duration.
Optionally, the apparatus further comprises a second acquisition module, configured to:

obtain a confidence for the first control instruction according to the degree of match between the first voice feature information and the first reference information.

The determination module is configured to:

determine the first control instruction to be the recognition result of the voice data to be recognized if no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration starting from the send time and the confidence of the first control instruction is not less than a preset confidence threshold.

Optionally, the apparatus further comprises a first prompting module, configured to:

send a prompt indicating that recognition of the voice data failed if no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration starting from the send time but the confidence of the first control instruction is less than the preset confidence threshold.

Optionally, the transceiver module is configured to:

receive, upon receiving a voice input request, the voice data input by the user, and, when the duration for which the user has stopped inputting reaches a preset reception duration threshold, determine the voice data input before the user stopped to be the voice data to be recognized.

Optionally, the apparatus further comprises a second prompting module, configured to:

send a prompt indicating the determined recognition result of the voice data to be recognized.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:

In the embodiments of the present invention, the voice data to be recognized input by a user is received and sent to a speech recognition server, and the send time is recorded. The voice data is divided into multiple data segments of a preset duration according to its sampling time; the voice feature information of each segment is matched against pre-stored reference information; first reference information matching the first voice feature information is determined; and a corresponding first control instruction is obtained from it. If no recognition message carrying a second control instruction is received from the server within a preset duration starting from the send time, the first control instruction is determined to be the recognition result; if such a message is received, the second control instruction is determined to be the recognition result. In this way, local recognition and server-side recognition are combined, each producing its own result, from which one is chosen as the recognition result of the voice data to be recognized, and the user does not need to memorize a large number of instruction words. The flexibility of voice data recognition can thereby be improved.
Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Clearly, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for recognizing voice data according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an apparatus for recognizing voice data according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a speech recognition device according to an embodiment of the present invention.
Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment one
An embodiment of the present invention provides a method for recognizing voice data. As shown in Fig. 1, the processing flow of the method may include the following steps.

Step 101: receive the voice data to be recognized input by a user, send it to a speech recognition server, and record its send time.

Step 102: divide the voice data to be recognized into multiple data segments of a preset duration according to its sampling time; match the voice feature information of each data segment against pre-stored reference information; determine first reference information that matches the first voice feature information in the voice data; and, based on the determined first reference information, obtain a first control instruction corresponding to the voice data.

Step 103: if no recognition message carrying a second control instruction is received from the speech recognition server within a preset duration starting from the send time, determine the first control instruction to be the recognition result of the voice data to be recognized; if such a message is received within the preset duration, determine the second control instruction to be the recognition result.

In this way, local recognition and server-side recognition are combined: each produces its own result, one of which is chosen as the recognition result of the voice data to be recognized, without requiring the user to memorize a large number of instruction words. The flexibility of voice data recognition is thereby improved.
Embodiment two
An embodiment of the present invention provides a method for recognizing voice data, which can be implemented by a speech recognition device. The speech recognition device may be any device with a speech recognition function.

The processing flow shown in Fig. 1 is described in detail below with reference to this embodiment. The content may be as follows.

Step 101: receive the voice data to be recognized input by a user, send it to a speech recognition server, and record its send time.

In implementation, speech recognition and processing technology has attracted wide attention as a human-machine interface in information technology, and its application in electronic products has made daily life more convenient: through voice instructions, people can make a controlled device perform the corresponding operation. Speech recognition can be applied in many fields; applied on a vehicle-mounted platform, for example, it can make driving simpler, more flexible, safer, and more comfortable. This embodiment describes the solution in detail for vehicle-mounted speech recognition; application to other fields is similar and is not repeated here.

With the development of the automobile industry and the popularization of automobiles, people place higher demands on the safety and convenience of cars. More and more functions are added to cars, which become increasingly intelligent, and vehicle-mounted voice has become an important component of onboard systems, whose various functions can be controlled by the user's voice. Specifically, as shown in Fig. 2, a device for recognizing voice data may be installed in the car, and the device may be provided with a speech recognition button. When the user wants to perform some operation, such as starting the navigator, the user can press the speech recognition button; the device then generates a voice input request and opens its microphone. Once the microphone is opened successfully, the device can play a prompt through the loudspeaker asking the user to input voice data. The user then speaks to the device; the voice is an analog signal, which the microphone converts into a digital signal, and the user can press the button again after finishing. If the device could not receive the voice data, for example because the user's voice was too quiet, it can send a prompt indicating that reception of the voice data failed. If the device did receive voice data, it can determine the received voice data to be the voice data to be recognized. To make the recognition result more accurate, the device can send the voice data to be recognized to the speech recognition server through its own wireless communication component, so that the server recognizes it upon receipt; when sending the voice data to the server, the device can record its send time.
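The bookkeeping in step 101 is small: transmit the utterance and note the send time, which step 103 later uses as the start of the wait window. A minimal sketch, in which the transport callback is an assumption:

```python
import time

def send_for_recognition(send, audio_bytes):
    """Send the utterance to the speech recognition server and record the
    send time. `send` stands in for the device's wireless transport; the
    returned monotonic timestamp bounds the later wait for the server reply."""
    sent_at = time.monotonic()
    send(audio_bytes)
    return sent_at
```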
Optionally, the voice data to be recognized input by the user can be received and processed in various ways. One optional way is as follows: upon receiving a voice input request, receive the voice data input by the user; when the duration for which the user has stopped inputting reaches a preset reception duration threshold, determine the voice data input before the user stopped to be the voice data to be recognized.

In implementation, the user can press the speech recognition button, causing the device to generate a voice input request and open its microphone. Once the microphone is open, the device can play a pre-stored voice prompt asking the user to input voice data, and the user can then speak to the device. To determine when the user has finished inputting, a duration threshold (the reception duration threshold) can be preset; when the duration for which the user has stopped inputting reaches this threshold, the voice data input before the user stopped is determined to be the voice data to be recognized.
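The end-of-input rule above, namely stop collecting once the user's pause reaches the reception duration threshold, can be sketched like this; the chunk reader, silence test, and threshold value are all assumptions:

```python
import time

RECEPTION_TIMEOUT = 1.5  # hypothetical reception duration threshold, in seconds

def capture_utterance(read_chunk, is_silence, timeout=RECEPTION_TIMEOUT):
    """Collect audio chunks until the user has been silent for `timeout`
    seconds; the chunks received before the pause form the voice data to
    be recognized."""
    chunks = []
    last_voice = time.monotonic()
    while True:
        chunk = read_chunk()
        if not is_silence(chunk):
            chunks.append(chunk)
            last_voice = time.monotonic()
        elif time.monotonic() - last_voice >= timeout:
            break  # pause long enough: input has ended
    return b"".join(chunks)
```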
Step 102: divide the voice data to be recognized into multiple data segments of a preset duration according to its sampling time; match the voice feature information of each data segment against pre-stored reference information; determine first reference information that matches the first voice feature information in the voice data; and, based on the determined first reference information, obtain a first control instruction corresponding to the voice data.

Here, the first voice feature information can be any piece of voice feature information, and the first reference information can be any piece of reference information.
In implementation, the device can first pre-process the voice data to be recognized, for example by sampling it (the sampling frequency may be 10 kHz or 16 kHz, etc.), anti-aliasing filtering, and removing glottal excitation and noise effects. The device can then perform feature extraction on the processed voice data. The purpose of feature extraction is to extract, from the waveform, one or more groups of parameters that describe the characteristics of the voice data, such as average energy, zero-crossing count, formants, cepstrum, and linear prediction coefficients, for subsequent voice training and recognition; the choice of parameters directly affects the recognition rate of the speech recognition device. The detailed process can be as follows. A speech signal can usually be regarded as short-term stationary; for example, within a preset time period (such as 10-20 ms), its spectral characteristics and some physical parameters can be regarded as approximately constant, so analysis methods for stationary processes can be applied to the voice data. Specifically, the voice data to be recognized is divided into multiple data segments of a preset duration, and endpoint detection can be performed on each segment; endpoint detection means determining the starting point and end point of the speech within a section of data containing speech. Multiple pieces of reference information can be pre-stored in the device; this reference information is obtained by training on large amounts of voice data processed as above. The device can obtain the voice feature information of each data segment and match it against the pre-stored reference information to obtain the reference information that matches it. If the first voice feature information in the voice data to be recognized matches the first reference information, the device can perform semantic understanding based on the first reference information, thereby obtaining its recognition result for the voice data, and from this recognition result generate the corresponding control instruction (the first control instruction).
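A toy sketch of the segmentation and matching in step 102, using two of the parameters the text names (average energy and zero-crossing count) as the feature vector; the distance measure and the reference table are illustrative assumptions:

```python
def split_frames(samples, frame_len):
    """Divide the sampled voice data into fixed-length segments."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

def features(frame):
    """Average energy and zero-crossing count, two of the descriptive
    parameters mentioned in the text."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (energy, crossings)

def best_reference(frame_features, references):
    """Match a segment's features against pre-stored reference entries
    ({label: feature tuple}) and return the closest label."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(references, key=lambda label: dist(frame_features, references[label]))
```

A real system would of course train the reference entries on large amounts of speech, as the text describes, rather than hard-code them.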
The device can also recognize the voice data to be recognized in other ways, for example based on acoustic channel models and phonetic knowledge, or using artificial neural networks; these methods can be handled in the manner of the prior art and are not repeated here.

The speech recognition server can recognize the voice data to be recognized by the above method. Because the amount of reference information pre-stored in the server is much larger than the amount stored in the device, the server's recognition result for the voice data is usually more accurate. Its specific processing can be found in the related content above and is not repeated here.
Optionally, because the device's recognition of the voice data to be recognized may be inaccurate, the accuracy of the recognition result can be characterized in some way. The corresponding processing can take various forms; one optional way is as follows: obtain a confidence for the first control instruction according to the degree of match between the first voice feature information and the first reference information.

In implementation, because voice data can be affected by noise and other factors, the voice feature information of the voice data may differ from the reference information. After the device determines the first reference information corresponding to the first voice feature information, it can calculate the degree of match between them. This degree of match can be the proportion of features in the first voice feature information that are also present in the first reference information, and the device can use this ratio as the confidence of the first control instruction.
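Read literally, the confidence here is the fraction of the utterance's features also found in the matched reference entry. A sketch, treating feature information as simple sets (an assumption; the patent does not fix a representation):

```python
def confidence(voice_features, reference_features):
    """Confidence of the first control instruction: the proportion of the
    first voice feature information that the first reference information
    shares with it."""
    voice = set(voice_features)
    if not voice:
        return 0.0
    return len(voice & set(reference_features)) / len(voice)
```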
Step 103: if no recognition message carrying a second control instruction is received from the speech recognition server within a preset duration starting from the send time, determine the first control instruction to be the recognition result of the voice data to be recognized; if such a message is received within the preset duration, determine the second control instruction to be the recognition result.

In implementation, the device often cannot connect to the speech recognition server, in which case the server cannot send its recognition result to the device in time. To give the user timely feedback on the recognition result, that is, to execute the corresponding control instruction promptly, a certain duration starting from the send time can be preset. If the device does not receive a recognition message carrying a second control instruction from the server within this preset duration, the device can determine the first control instruction to be the recognition result of the voice data to be recognized; if it does receive such a message within the preset duration, it can determine the second control instruction to be the recognition result. Alternatively, if a recognition message carrying a second control instruction is received within the preset duration, the device can also use some result-selection method to choose one of the first and second control instructions as the recognition result of the voice data to be recognized.
Optionally, for the case in step 103 above where the first control instruction is determined as the recognition result of the speech data to be identified, the device can additionally use the confidence level to judge whether the first control instruction can serve as the recognition result. Specifically: if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, and the confidence level of the first control instruction is not less than a preset confidence threshold, the first control instruction is determined as the recognition result of the speech data to be identified.
In implementation, the confidence threshold for the control instruction determined by the device can be preset in the above device. If, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, the device can obtain the confidence level of the first control instruction and compare it with the confidence threshold; if the confidence level of the first control instruction is greater than or equal to the preset confidence threshold, the first control instruction is determined as the recognition result of the speech data to be identified.
Optionally, if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, but the confidence level of the first control instruction is less than the preset confidence threshold, a prompt message indicating that recognition of the speech data to be identified has failed is sent.
In implementation, if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, the device can obtain the confidence level of the first control instruction and compare it with the confidence threshold; if the confidence level of the first control instruction is less than the preset confidence threshold, the device can play a preset voice prompt to inform the user that recognition of the speech data to be identified has failed.
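The two preceding paragraphs combine into a single fallback rule on server timeout: accept the local instruction only when its confidence reaches the threshold, otherwise report failure. A sketch (the 0.6 default is an illustrative value, not one given in the embodiment):

```python
def local_fallback(first_instruction, confidence, threshold=0.6):
    """On server timeout: accept the local first control instruction only
    when its confidence is not less than the preset threshold; otherwise
    report recognition failure so the device can prompt the user."""
    if confidence >= threshold:
        return ("result", first_instruction)
    return ("failure", None)

print(local_fallback("open_map", 0.75))  # ('result', 'open_map')
print(local_fallback("open_map", 0.40))  # ('failure', None)
```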
Optionally, after the above device determines the recognition result of the speech data to be identified, it can send a prompt message to the user. The processing can include the following: sending a prompt message of the determined recognition result of the speech data to be identified.
In implementation, when the device determines the recognition result of the speech data to be identified, it can play a prestored voice through the loudspeaker to prompt the user with the recognition result. At this point, the user can also judge whether the recognition result is correct. If it is correct, the user can input speech data for confirmation to the device, and when the device receives this speech data, it can execute the corresponding control instruction. If it is wrong, the user can input corresponding speech data to the device, and when the device receives this speech data, it can stop executing the corresponding control instruction and send a prompt message prompting the user to re-enter the speech data to be identified.
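The confirmation flow above — announce the result, then act on the user's confirming or rejecting utterance — might look like the following sketch (the reply labels and return strings are assumptions; the embodiment does not fix them):

```python
def handle_user_reply(reply, instruction):
    """After the device plays the recognition result, the user's next
    utterance either confirms it (execute the control instruction) or
    rejects it (stop execution and prompt for re-entry)."""
    if reply == "confirm":
        return "execute:" + instruction
    return "prompt:please re-enter the speech data"

print(handle_user_reply("confirm", "open_map"))  # execute:open_map
print(handle_user_reply("reject", "open_map"))   # prompt:please re-enter the speech data
```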
In the embodiment of the present invention, the speech data to be identified input by the user is received, sent to the speech recognition server, and its transmitting time is recorded; according to the sampling time of the speech data to be identified, the speech data is divided into multiple data segments of a preset duration; matching detection is performed between the voice feature information of each obtained data segment and the prestored reference information to determine the first reference information that matches the first voice feature information in the speech data to be identified; and based on the determined first reference information, the first control instruction corresponding to the speech data to be identified is obtained. If, within a preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, the first control instruction is determined as the recognition result of the speech data to be identified; if it is received within the preset duration, the second control instruction is determined as the recognition result. In this way, the local semantic recognition mode and the speech recognition server's recognition mode can be combined: the recognition result of each mode is obtained separately, and one of them is chosen as the recognition result of the speech data to be identified, without requiring the user to memorize a large number of instruction words, thereby improving the flexibility of recognizing speech data.
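The local matching step summarized above first divides the speech data into fixed-duration segments according to its sampling time. A minimal sketch (the sample-array representation and names are assumptions):

```python
def split_into_segments(samples, sample_rate_hz, segment_seconds):
    """Divide the speech data to be identified into data segments of a
    preset duration, derived from the sampling rate (samples per second)."""
    seg_len = int(sample_rate_hz * segment_seconds)
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]

# Two seconds of 8 kHz audio split into one-second segments
segments = split_into_segments(list(range(16000)), 8000, 1.0)
print(len(segments), len(segments[0]))  # 2 8000
```

Each segment's feature information would then be matched against the prestored reference information, as the embodiment describes.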
Embodiment three
Based on the same technical concept, an embodiment of the present invention further provides a device for identifying speech data. As shown in Figure 3, the device comprises:
Transceiver module 310, configured to receive the speech data to be identified input by the user, send the speech data to be identified to a speech recognition server, and record the transmitting time of the speech data to be identified;
First acquisition module 320, configured to divide the speech data to be identified into multiple data segments of a preset duration according to the sampling time of the speech data to be identified, perform matching detection between the voice feature information of each obtained data segment and the prestored reference information, determine the first reference information that matches the first voice feature information in the speech data to be identified, and obtain, based on the determined first reference information, the first control instruction corresponding to the speech data to be identified;
Determination module 330, configured to: if, within a preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, determine the first control instruction as the recognition result of the speech data to be identified; and if the identification message carrying the second control instruction sent by the speech recognition server is received within the preset duration starting from the transmitting time, determine the second control instruction as the recognition result of the speech data to be identified.
Optionally, the device further comprises a second acquisition module, configured to:
obtain the confidence level of the first control instruction according to the matching degree between the first voice feature information and the first reference information;
Determination module 330 is configured to:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, and the confidence level of the first control instruction is not less than a preset confidence threshold, determine the first control instruction as the recognition result of the speech data to be identified.
Optionally, the device further comprises a first prompting module, configured to:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, but the confidence level of the first control instruction is less than the preset confidence threshold, send a prompt message indicating that recognition of the speech data to be identified has failed.
Optionally, transceiver module 310 is configured to:
when a voice input request is received, receive the speech data input by the user, and when the duration after the user stops inputting reaches a preset reception duration threshold, determine the speech data input before the user stopped as the speech data to be identified.
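The end-of-input rule used by the transceiver module — treat the input as finished once the pause after the user stops speaking reaches the preset reception duration threshold — can be sketched frame by frame (the frame representation, predicate, and names are assumptions for illustration):

```python
def capture_until_silence(frames, is_silent, threshold_frames):
    """Return the speech frames entered before the user stopped, once the
    trailing run of silent frames reaches the preset threshold."""
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_silent(frame):
            silent_run += 1
            if silent_run >= threshold_frames:
                return frames[:i - silent_run + 1]  # drop trailing silence
        else:
            silent_run = 0
    return frames  # input ended without a long enough pause

speech = [0.9, 0.8, 0.0, 0.0, 0.0]
print(capture_until_silence(speech, lambda f: f == 0.0, 3))  # [0.9, 0.8]
```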
Optionally, the device further comprises a second prompting module, configured to:
send a prompt message of the determined recognition result of the speech data to be identified.
In the embodiment of the present invention, the speech data to be identified input by the user is received, sent to the speech recognition server, and its transmitting time is recorded; according to the sampling time of the speech data to be identified, the speech data is divided into multiple data segments of a preset duration; matching detection is performed between the voice feature information of each obtained data segment and the prestored reference information to determine the first reference information that matches the first voice feature information in the speech data to be identified; and based on the determined first reference information, the first control instruction corresponding to the speech data to be identified is obtained. If, within a preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, the first control instruction is determined as the recognition result of the speech data to be identified; if it is received within the preset duration, the second control instruction is determined as the recognition result. In this way, the local semantic recognition mode and the speech recognition server's recognition mode can be combined: the recognition result of each mode is obtained separately, and one of them is chosen as the recognition result of the speech data to be identified, without requiring the user to memorize a large number of instruction words, thereby improving the flexibility of recognizing speech data.
It should be noted that when the device for identifying speech data provided by the above embodiment identifies speech data, the division into the above functional modules is used only as an example; in practical applications, the above functions can be assigned to and completed by different functional modules as needed, that is, the internal structure of the equipment can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for identifying speech data provided by the above embodiment and the method embodiment for identifying speech data belong to the same concept; for its specific implementation process, refer to the method embodiment, which is not repeated here.
Embodiment four
Fig. 4 is a schematic structural diagram of a speech recognition apparatus provided by an embodiment of the present invention. Referring to Fig. 4, the speech recognition apparatus may be used to implement the method for identifying speech data provided in the above embodiments. The speech recognition apparatus can be a mobile phone, a tablet computer, a wearable mobile device (such as a smart watch), etc. Preferably:
Speech recognition apparatus 700 can comprise a communication unit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components. Those skilled in the art will understand that the speech recognition apparatus structure shown in the figure does not limit the speech recognition apparatus, which can comprise more or fewer components than illustrated, combine some components, or arrange the components differently. Wherein:
Communication unit 110 can be used for receiving and sending messages, or for receiving and sending signals during a call. The communication unit 110 can be an RF (Radio Frequency) circuit, a router, a modem, or other network communication equipment. In particular, when communication unit 110 is an RF circuit, it receives downlink information from a base station and transfers it to the one or more processors 180 for processing, and sends uplink data to the base station. Generally, the RF circuit serving as the communication unit includes but is not limited to an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, etc. In addition, communication unit 110 can also communicate with a network and other devices by wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), etc. Memory 120 can be used to store software programs and modules; processor 180 executes various functional applications and data processing by running the software programs and modules stored in memory 120. Memory 120 can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area can store data created according to the use of speech recognition apparatus 700 (such as audio data, a phone book, etc.). In addition, memory 120 can comprise a high-speed random access memory and can also comprise a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage components. Correspondingly, memory 120 can also comprise a memory controller to provide processor 180 and input unit 130 with access to memory 120.
Input unit 130 can be used to receive input numbers or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Preferably, input unit 130 can comprise a touch-sensitive surface 131 and other input equipment 132. Touch-sensitive surface 131, also referred to as a touch display screen or touchpad, can collect touch operations by the user on or near it (such as operations by the user on or near touch-sensitive surface 131 using a finger, a stylus, or any other suitable object or accessory) and drive a corresponding connecting device according to a preset program. Optionally, touch-sensitive surface 131 can comprise two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to processor 180, and can receive and execute commands sent by processor 180. In addition, touch-sensitive surface 131 can be realized in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides touch-sensitive surface 131, input unit 130 can also comprise other input equipment 132. Preferably, other input equipment 132 can include but is not limited to one or more of a physical keyboard, function keys (such as a volume control button, a switch button, etc.), a trackball, a mouse, a joystick, etc.
Display unit 140 can be used to display information input by the user or information provided to the user, and the various graphical user interfaces of speech recognition apparatus 700; these graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. Display unit 140 can comprise a display panel 141; optionally, display panel 141 can be configured in forms such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode). Further, touch-sensitive surface 131 can cover display panel 141; when touch-sensitive surface 131 detects a touch operation on or near it, it transmits the operation to processor 180 to determine the type of the touch event, and processor 180 then provides a corresponding visual output on display panel 141 according to the type of the touch event. Although in the figure touch-sensitive surface 131 and display panel 141 realize the input and output functions as two independent components, in certain embodiments touch-sensitive surface 131 and display panel 141 can be integrated to realize the input and output functions.
Speech recognition apparatus 700 can also comprise at least one sensor 150, such as an optical sensor, a motion sensor, and other sensors. Preferably, the optical sensor can comprise an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off display panel 141 and/or the backlight when speech recognition apparatus 700 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when static, and can be used for applications that identify the attitude of the mobile phone (such as horizontal/vertical screen switching, related games, magnetometer attitude calibration), vibration-identification-related functions (such as a pedometer or knocking), etc. As for other sensors that can also be configured on speech recognition apparatus 700, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, they are not repeated here.
Audio circuit 160, loudspeaker 161, and microphone 162 can provide an audio interface between the user and speech recognition apparatus 700. Audio circuit 160 can transmit the electric signal converted from received audio data to loudspeaker 161, which converts it into a sound signal for output; on the other hand, microphone 162 converts the collected sound signal into an electric signal, which audio circuit 160 receives and converts into audio data; after the audio data is output to processor 180 for processing, it is sent through RF circuit 110 to, for example, another speech recognition apparatus, or the audio data is output to memory 120 for further processing. Audio circuit 160 may also comprise an earphone jack to provide communication between a peripheral earphone and speech recognition apparatus 700.
In order to realize wireless communication, the speech recognition apparatus can be configured with a wireless communication unit 170, which can be a WiFi module. WiFi belongs to short-range wireless transmission technology; through wireless communication unit 170, speech recognition apparatus 700 can help the user send and receive e-mail, browse web pages, access streaming media, etc., providing the user with wireless broadband Internet access. Although wireless communication unit 170 is shown in the figure, it is understandable that it is not an essential component of speech recognition apparatus 700 and can be omitted as needed within the scope of not changing the essence of the invention.
Processor 180 is the control center of speech recognition apparatus 700. It connects the various parts of the whole mobile phone through various interfaces and lines, and performs the various functions of speech recognition apparatus 700 and processes data by running or executing the software programs and/or modules stored in memory 120 and calling the data stored in memory 120, thereby monitoring the mobile phone as a whole. Optionally, processor 180 can comprise one or more processing cores; preferably, processor 180 can integrate an application processor and a modem processor, wherein the application processor mainly processes the operating system, user interface, application programs, etc., and the modem processor mainly processes wireless communication. It is understandable that the above modem processor may also not be integrated into processor 180.
Speech recognition apparatus 700 also comprises a power supply 190 (such as a battery) for supplying power to the components. Preferably, the power supply can be logically connected with processor 180 through a power management system, thereby realizing functions such as managing charging, discharging, and power consumption through the power management system. Power supply 190 can also comprise any components such as one or more direct-current or alternating-current power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, speech recognition apparatus 700 can also comprise a camera, a Bluetooth module, etc., which are not repeated here. Specifically, in this embodiment, the display unit of the speech recognition apparatus is a touch-screen display, and the speech recognition apparatus also includes a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the one or more programs contain instructions for carrying out the following operations:
receiving the speech data to be identified input by the user, sending the speech data to be identified to a speech recognition server, and recording the transmitting time of the speech data to be identified;
dividing the speech data to be identified into multiple data segments of a preset duration according to the sampling time of the speech data to be identified, performing matching detection between the voice feature information of each obtained data segment and the prestored reference information, determining the first reference information that matches the first voice feature information in the speech data to be identified, and obtaining, based on the determined first reference information, the first control instruction corresponding to the speech data to be identified;
if, within a preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, determining the first control instruction as the recognition result of the speech data to be identified; if the identification message carrying the second control instruction sent by the speech recognition server is received within the preset duration starting from the transmitting time, determining the second control instruction as the recognition result of the speech data to be identified.
Optionally, the method also comprises:
obtaining the confidence level of the first control instruction according to the matching degree between the first voice feature information and the first reference information;
wherein determining the first control instruction as the recognition result of the speech data to be identified if the identification message carrying the second control instruction sent by the speech recognition server is not received within the preset duration starting from the transmitting time comprises:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, and the confidence level of the first control instruction is not less than a preset confidence threshold, determining the first control instruction as the recognition result of the speech data to be identified.
Optionally, the method also comprises:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, but the confidence level of the first control instruction is less than the preset confidence threshold, sending a prompt message indicating that recognition of the speech data to be identified has failed.
Optionally, receiving the speech data to be identified input by the user comprises:
when a voice input request is received, receiving the speech data input by the user, and when the duration after the user stops inputting reaches a preset reception duration threshold, determining the speech data input before the user stopped as the speech data to be identified.
Optionally, the method also comprises:
sending a prompt message of the determined recognition result of the speech data to be identified.
In the embodiment of the present invention, the speech data to be identified input by the user is received, sent to the speech recognition server, and its transmitting time is recorded; according to the sampling time of the speech data to be identified, the speech data is divided into multiple data segments of a preset duration; matching detection is performed between the voice feature information of each obtained data segment and the prestored reference information to determine the first reference information that matches the first voice feature information in the speech data to be identified; and based on the determined first reference information, the first control instruction corresponding to the speech data to be identified is obtained. If, within a preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, the first control instruction is determined as the recognition result of the speech data to be identified; if it is received within the preset duration, the second control instruction is determined as the recognition result. In this way, the local semantic recognition mode and the speech recognition server's recognition mode can be combined: the recognition result of each mode is obtained separately, and one of them is chosen as the recognition result of the speech data to be identified, without requiring the user to memorize a large number of instruction words, thereby improving the flexibility of recognizing speech data.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be completed by hardware, or by a program instructing related hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, etc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A method for identifying speech data, characterized in that the method comprises:
receiving speech data to be identified input by a user, sending the speech data to be identified to a speech recognition server, and recording the transmitting time of the speech data to be identified;
dividing the speech data to be identified into multiple data segments of a preset duration according to the sampling time of the speech data to be identified, performing matching detection between the voice feature information of each obtained data segment and prestored reference information, determining first reference information that matches first voice feature information in the speech data to be identified, and obtaining, based on the determined first reference information, a first control instruction corresponding to the speech data to be identified;
if, within a preset duration starting from the transmitting time, an identification message carrying a second control instruction sent by the speech recognition server is not received, determining the first control instruction as the recognition result of the speech data to be identified; if the identification message carrying the second control instruction sent by the speech recognition server is received within the preset duration starting from the transmitting time, determining the second control instruction as the recognition result of the speech data to be identified.
2. The method according to claim 1, characterized in that the method further comprises:
obtaining the confidence level of the first control instruction according to the matching degree between the first voice feature information and the first reference information;
wherein determining the first control instruction as the recognition result of the speech data to be identified if the identification message carrying the second control instruction sent by the speech recognition server is not received within the preset duration starting from the transmitting time comprises:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, and the confidence level of the first control instruction is not less than a preset confidence threshold, determining the first control instruction as the recognition result of the speech data to be identified.
3. The method according to claim 2, characterized in that the method further comprises:
if, within the preset duration starting from the transmitting time, the identification message carrying the second control instruction sent by the speech recognition server is not received, but the confidence level of the first control instruction is less than the preset confidence threshold, sending a prompt message indicating that recognition of the speech data to be identified has failed.
4. The method according to claim 1, characterized in that receiving the speech data to be identified input by the user comprises:
when a voice input request is received, receiving the speech data input by the user, and when the duration after the user stops inputting reaches a preset reception duration threshold, determining the speech data input before the user stopped as the speech data to be identified.
5. The method according to claim 1, characterized in that the method further comprises:
sending a prompt message of the determined recognition result of the speech data to be identified.
6. A device for identifying speech data, characterized in that the device comprises:
a transceiver module, configured to receive speech data to be identified input by a user, send the speech data to be identified to a speech recognition server, and record the transmitting time of the speech data to be identified;
a first acquisition module, configured to divide the speech data to be identified into multiple data segments of a preset duration according to the sampling time of the speech data to be identified, perform matching detection between the voice feature information of each obtained data segment and prestored reference information, determine first reference information that matches first voice feature information in the speech data to be identified, and obtain, based on the determined first reference information, a first control instruction corresponding to the speech data to be identified;
a determination module, configured to: if, within a preset duration starting from the transmitting time, an identification message carrying a second control instruction sent by the speech recognition server is not received, determine the first control instruction as the recognition result of the speech data to be identified; and if the identification message carrying the second control instruction sent by the speech recognition server is received within the preset duration starting from the transmitting time, determine the second control instruction as the recognition result of the speech data to be identified.
7. The device according to claim 6, characterized in that the device further comprises a second acquisition module, configured to:
Acquire a confidence level of the first control instruction according to the degree of matching between the first voice feature information and the first reference information;
The determination module is configured to:
If no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration from the sending time, and the confidence level of the first control instruction is not less than a preset confidence threshold, determine the first control instruction as the recognition result of the speech data to be identified.
8. The device according to claim 7, characterized in that the device further comprises a first prompting module, configured to:
If no recognition message carrying a second control instruction is received from the speech recognition server within the preset duration from the sending time, but the confidence level of the first control instruction is less than the preset confidence threshold, send a prompt message indicating that recognition of the speech data to be identified has failed.
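Claims 7 and 8 add a confidence gate to the local fallback: the first control instruction is accepted only when its confidence level is not less than the preset threshold, and recognition otherwise fails (triggering the failure prompt). A minimal sketch of the combined decision, with illustrative names:

```python
def resolve_with_confidence(first_instruction, confidence, confidence_threshold,
                            second_instruction, reply_elapsed_s, timeout_s):
    """Sketch of the decision in claims 6-8 combined.

    A server reply within the timeout wins outright; otherwise the local
    first control instruction is accepted only when its confidence is not
    less than the preset threshold. Returns None on recognition failure,
    on which the device would send a failure prompt.
    """
    if (second_instruction is not None
            and reply_elapsed_s is not None
            and reply_elapsed_s <= timeout_s):
        return second_instruction           # timely server result wins
    if confidence >= confidence_threshold:
        return first_instruction            # confident local match accepted
    return None                             # low confidence: recognition fails
```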
9. The device according to claim 6, characterized in that the transceiver module is configured to:
Upon receiving a voice input request, receive speech data input by a user, and when the duration for which the user has stopped inputting reaches a preset reception duration threshold, determine the speech data input before the user stopped inputting as the speech data to be identified.
10. The device according to claim 6, characterized in that the device further comprises a second prompting module, configured to:
Send a prompt message indicating the determined recognition result of the speech data to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410736576.4A CN104536978A (en) | 2014-12-05 | 2014-12-05 | Voice data identifying method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410736576.4A CN104536978A (en) | 2014-12-05 | 2014-12-05 | Voice data identifying method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104536978A true CN104536978A (en) | 2015-04-22 |
Family
ID=52852506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410736576.4A Pending CN104536978A (en) | 2014-12-05 | 2014-12-05 | Voice data identifying method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104536978A (en) |
- 2014-12-05: CN application CN201410736576.4A filed (published as CN104536978A; status: Pending)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103137129A (en) * | 2011-12-02 | 2013-06-05 | 联发科技股份有限公司 | Voice recognition method and electronic device |
CN103187060A (en) * | 2011-12-28 | 2013-07-03 | 上海博泰悦臻电子设备制造有限公司 | Vehicle-mounted speech processing device |
CN102708865A (en) * | 2012-04-25 | 2012-10-03 | 北京车音网科技有限公司 | Method, device and system for voice recognition |
CN103440867A (en) * | 2013-08-02 | 2013-12-11 | 安徽科大讯飞信息科技股份有限公司 | Method and system for recognizing voice |
Non-Patent Citations (2)
Title |
---|
TAN JIANHAO et al.: "Data Mining Technology", 31 January 2009, Beijing: China Water Resources and Hydropower Press * |
ZHAO SHIBIN: "Multimedia Technology Applications", 31 October 2009 * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104873062A (en) * | 2015-05-29 | 2015-09-02 | 王旭昂 | Water cup with voice control function |
CN105187710A (en) * | 2015-07-28 | 2015-12-23 | 广东欧珀移动通信有限公司 | Photography controlling method, intelligent watch, photography terminal and system |
CN106469556A (en) * | 2015-08-20 | 2017-03-01 | 现代自动车株式会社 | Speech recognition equipment, the vehicle with speech recognition equipment, control method for vehicles |
CN105355195A (en) * | 2015-09-25 | 2016-02-24 | 小米科技有限责任公司 | Audio frequency recognition method and audio frequency recognition device |
CN106531151B (en) * | 2016-11-16 | 2019-10-11 | 北京云知声信息技术有限公司 | Audio recognition method and device |
CN106531151A (en) * | 2016-11-16 | 2017-03-22 | 北京云知声信息技术有限公司 | Voice recognition method and voice recognition device |
CN107595010A (en) * | 2017-09-07 | 2018-01-19 | 太仓埃特奥数据科技有限公司 | A kind of mobile Internet big data analyzes sample message display platform |
CN108172219A (en) * | 2017-11-14 | 2018-06-15 | 珠海格力电器股份有限公司 | method and device for recognizing voice |
CN108419108A (en) * | 2018-03-06 | 2018-08-17 | 深圳创维数字技术有限公司 | Sound control method, device, remote controler and computer storage media |
CN110473570A (en) * | 2018-05-09 | 2019-11-19 | 广达电脑股份有限公司 | Integrated voice identification system and method |
CN108848012A (en) * | 2018-06-22 | 2018-11-20 | 广州钱柜软件科技有限公司 | A kind of home entertainment device intelligence control system |
CN109658931A (en) * | 2018-12-19 | 2019-04-19 | 平安科技(深圳)有限公司 | Voice interactive method, device, computer equipment and storage medium |
CN109658931B (en) * | 2018-12-19 | 2024-05-10 | 平安科技(深圳)有限公司 | Voice interaction method, device, computer equipment and storage medium |
CN112329457A (en) * | 2019-07-17 | 2021-02-05 | 北京声智科技有限公司 | Input voice recognition method and related equipment |
CN110534102A (en) * | 2019-09-19 | 2019-12-03 | 北京声智科技有限公司 | A kind of voice awakening method, device, equipment and medium |
CN110534109A (en) * | 2019-09-25 | 2019-12-03 | 深圳追一科技有限公司 | Audio recognition method, device, electronic equipment and storage medium |
CN111081248A (en) * | 2019-12-27 | 2020-04-28 | 安徽仁昊智能科技有限公司 | Artificial intelligence speech recognition device |
CN111524529A (en) * | 2020-04-15 | 2020-08-11 | 广州极飞科技有限公司 | Audio data processing method, device and system, electronic equipment and storage medium |
CN111524529B (en) * | 2020-04-15 | 2023-11-24 | 广州极飞科技股份有限公司 | Audio data processing method, device and system, electronic equipment and storage medium |
CN112349337A (en) * | 2020-11-03 | 2021-02-09 | 中科创达软件股份有限公司 | Vehicle-mounted machine detection method, system, electronic equipment and storage medium |
CN112349337B (en) * | 2020-11-03 | 2023-06-30 | 中科创达软件股份有限公司 | Vehicle-mounted device detection method, system, electronic equipment and storage medium |
CN112382292A (en) * | 2020-12-11 | 2021-02-19 | 北京百度网讯科技有限公司 | Voice-based control method and device |
CN113053363A (en) * | 2021-05-12 | 2021-06-29 | 京东数字科技控股股份有限公司 | Speech recognition method, speech recognition apparatus, and computer-readable storage medium |
CN113053363B (en) * | 2021-05-12 | 2024-03-01 | 京东科技控股股份有限公司 | Speech recognition method, speech recognition apparatus, and computer-readable storage medium |
CN117893244A (en) * | 2024-03-15 | 2024-04-16 | 中国海洋大学 | Comprehensive management and control system for seaweed hydrothermal carbonization application based on machine learning |
CN117893244B (en) * | 2024-03-15 | 2024-06-04 | 中国海洋大学 | Comprehensive management and control system for seaweed hydrothermal carbonization application based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104536978A (en) | Voice data identifying method and device | |
CN104123937B (en) | Remind method to set up, device and system | |
CN104133652B (en) | A kind of audio play control method, and terminal | |
CN103400508A (en) | Method, device and terminal for outputting guidance information of parking places | |
CN107710148A (en) | A kind for the treatment of method and apparatus of Voice command | |
CN104869468A (en) | Method and apparatus for displaying screen information | |
CN105005909A (en) | Method and device for predicting lost users | |
CN104636047A (en) | Method and device for operating objects in list and touch screen terminal | |
CN106331370B (en) | A kind of data transmission method and terminal device | |
CN104184587A (en) | Voiceprint generation method, voiceprint generation server, client and voiceprint generation system | |
CN104135728B (en) | Method for connecting network and device | |
CN104461597A (en) | Starting control method and device for application program | |
CN104281394A (en) | Method and device for intelligently selecting words | |
CN106453830A (en) | Falling detection method and device | |
CN104764458A (en) | Method and device for outputting navigation route information | |
CN107436758A (en) | The method for information display and mobile terminal of a kind of mobile terminal | |
CN103365419A (en) | Method and device for triggering alarm clock control command | |
CN104239343A (en) | User input information processing method and device | |
CN106126675A (en) | A kind of method of recommendation of audio, Apparatus and system | |
CN106940997A (en) | A kind of method and apparatus that voice signal is sent to speech recognition system | |
CN105526944B (en) | Information cuing method and device | |
CN103745133A (en) | Information processing method and terminal | |
CN103744574A (en) | Method and device for turning off alarm clock of mobile terminal and mobile terminal | |
CN105049591A (en) | Method and device for processing incoming call | |
CN103677633A (en) | Screen unlocking method and device and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20150422 |