WO2017032070A1 - Voice recognition method, device and computer storage medium - Google Patents

Info

Publication number
WO2017032070A1
Authority
WO
WIPO (PCT)
Prior art keywords
cloud server
retry
audio stream
module
retries
Prior art date
Application number
PCT/CN2016/081829
Other languages
English (en)
French (fr)
Inventor
赵永 (Zhao Yong)
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-08-21
Filing date
2016-05-12
Publication date
2017-03-02
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2017032070A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present invention relates to the field of voice recognition technology for mobile terminals, and in particular, to a voice recognition method, device, and computer storage medium.
  • existing voice recognition schemes, such as Apple's intelligent voice assistant (Apple Siri) and Google Now, all perform voice recognition in the cloud.
  • the mobile terminal is only responsible for the collection and transmission of voice commands.
  • the prior art does not wait for the user to finish the whole voice command; it transmits the collected audio stream directly to the cloud server as soon as the user starts speaking, and the cloud server directly processes and recognizes the audio stream.
  • the network connection of the terminal needs to be maintained during the voice recognition process.
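  • in practice, poor network connections occur often, for example while driving, when the terminal switches base stations, when passing through a tunnel, when leaving Wireless Fidelity (WIFI) range, or when a WIFI router restarts.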
  • the prior art directly transmits the audio stream to the cloud server in real time.
  • once the network becomes abnormal, neither the mobile terminal nor the cloud server has the complete audio data, and the terminal can only report the network abnormality to the user, for example a network error when the network drops, or a network timeout when a slow network means no server response is received for a long time.
  • the user must then clear the abnormality and re-issue the voice command to have it recognized again. The prior art therefore has a high speech recognition failure rate and needs manual user intervention to re-recognize, so recognition efficiency is low and the user experience is poor.
  • embodiments of the present invention are expected to provide a voice recognition method, device, and computer storage medium to improve cloud voice recognition efficiency and enhance user experience.
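  • an embodiment of the present invention provides a voice recognition method, where the method includes: collecting a voice command and, while transmitting the audio stream of the voice command to the cloud server for recognition, recording the audio stream data locally; starting a retry wait when network abnormality information is received; and, when a retry condition is met, sending the locally recorded audio stream data to the cloud server again.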
  • the method further includes: performing the retry waiting and the resending in a loop until the cloud server recognizes the audio successfully or the maximum number of retries is reached.
  • the method further includes: the network abnormality information is returned by the cloud server, or is generated automatically when no response from the cloud server has been received within a predetermined time.
  • the method further includes deleting the locally recorded audio stream data after the identification succeeds or the maximum number of retries is reached.
  • the method further includes: setting the maximum number of retries and each retry waiting time at the time of initialization and/or according to user input.
  • an embodiment of the present invention further provides a voice recognition device, where the device includes:
  • the audio collection module is configured to collect voice commands, and send the audio stream of the voice command to the cloud server for identification, and record the audio stream data locally;
  • a retry wait module, configured to initiate a retry wait when network abnormality information is received;
  • a retry processing module, configured to send the locally recorded audio stream data to the cloud server again when the retry condition is met.
  • the device further includes: a loop execution module, configured to cyclically execute the retry wait module and the retry processing module until the cloud server recognizes the audio successfully or the maximum number of retries is reached.
  • the device further includes: an abnormality identification module, configured to identify the network abnormality information returned by the cloud server, or to generate the network abnormality information automatically when no response from the cloud server has been received within a predetermined time.
  • the device further includes: a data cleaning module, configured to delete the locally recorded audio stream data after the identification succeeds or the maximum number of retries is reached.
  • the device further includes: a setting module configured to set a maximum number of retries and each retry waiting time at the time of initialization and/or according to user input.
  • the embodiment of the invention provides a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is used to execute the voice recognition method described above.
  • the voice recognition method, device and computer storage medium provided by the embodiments of the present invention use locally stored data to perform retry processing automatically after an abnormality, and can repeat the retry processing without user intervention, thereby keeping cloud voice recognition running smoothly. This can greatly improve the success rate of cloud speech recognition when the network connection is not ideal and spares the user from repeatedly inputting voice commands, which improves voice recognition efficiency and enhances the user experience.
  • FIG. 1 is a schematic flowchart of a voice recognition method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
  • the embodiment of the present invention provides a voice recognition method, where the method includes:
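  • S1: collecting a voice command and, while transmitting the audio stream of the voice command to the cloud server for recognition, recording the audio stream data locally;
  • S2: starting a retry wait when network abnormality information is received;
  • S3: when a retry condition is met, sending the locally recorded audio stream data to the cloud server again.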
  • the network environment in the embodiment of the present invention is a data link, WIFI, or another wireless network environment.
  • the network abnormality information includes, but is not limited to, network error information when the network drops, and network timeout information when, for example, a slow network means no server response is received for a long time.
  • the network abnormality information is returned by the cloud server, or is automatically generated when the response of the cloud server is not received within a predetermined time.
  • the retry condition includes: the network abnormality having been cleared and/or the retry waiting time having been reached.
  • the steps S2 and S3 for performing the retry are repeatedly performed until the cloud recognition succeeds or the upper limit of the number of retries is reached.
  • the maximum number of retries and the interval of each retry waiting time may be set at the time of voice recognition initialization or dynamically adjusted by the user.
  • the voice recognition efficiency of the cloud is improved by the retrying mechanism.
  • while the voice command is sent directly to the cloud as an audio stream for recognition, at least one complete copy of the audio stream data is also recorded locally, for example as a local recording file or as in-memory data; to keep local storage usage under control, the locally saved audio stream data is deleted after recognition succeeds or the maximum number of retries is reached.
  • the maximum number of retries N and the waiting time for each retry can be set in advance.
  • the waiting time for each retry can be the same, or can be set to different values, such as:
  • the waiting time Ti of the i-th retry may be set individually, or set automatically according to a certain pattern (increasing, decreasing, increasing then decreasing, decreasing then increasing, etc.); the pattern may be expressed by a preset function or may be a random sequence.
  • the maximum number of retries and retry waiting time can also be changed by the user at any time.
  • voice recognition is mainly used when it is inconvenient for the user to operate the terminal device by hand, for example while driving. After the audio stream of the voice command is sent to the cloud, the terminal waits for the cloud recognition result. If the cloud returns the recognized operation command, the terminal device is controlled according to that command and the voice recognition ends. If, instead of a recognized operation command, network abnormality information is returned for the i-th time, the i-th retry wait is started and timed against the preset waiting time Ti. When the waiting time Ti is reached, or when the network abnormality is cleared (for example, after an available network is found and a connection is established), the locally saved audio stream data is sent to the cloud server again for recognition. If the number of times i that network abnormality information has been returned exceeds the preset maximum number of retries N, recognition failure information is returned and the current voice recognition ends.
  • the embodiment of the present invention further provides a voice recognition device 1 that interacts with the cloud server 2, and the voice recognition device 1 includes:
  • the audio collection module 101 is configured to collect voice commands, and send the audio stream of the voice command to the cloud server 2 for identification, and record the audio stream data locally;
  • a retry waiting module 102, configured to initiate a retry wait when network abnormality information is received;
  • the retry processing module 103 is configured to send the locally recorded audio stream data to the cloud server 2 again when the retry condition is met.
  • the retry may be repeated multiple times, so the voice recognition device 1 further includes: a loop execution module, configured to cyclically execute the retry wait module and the retry processing module until the cloud server recognizes the audio successfully or the maximum number of retries is reached.
  • since the abnormality information is either returned by the cloud server or generated locally, the voice recognition device 1 further includes: an abnormality identification module, configured to identify the network abnormality information returned by the cloud server, or to generate the network abnormality information automatically when no response from the cloud server has been received within the predetermined time.
  • the voice recognition device 1 further includes: a data cleaning module configured to delete the locally recorded audio stream data after the identification succeeds or the maximum number of retries is reached.
  • the setting module is configured to set the maximum number of retries and each retry waiting time at initialization and/or according to user input.
  • the voice recognition device may be the mobile terminal itself, or a relatively independent functional unit that, once loaded by the mobile terminal, implements cloud voice recognition from the terminal to the cloud server.
  • the audio collection module 101, the retry waiting module 102, the retry processing module 103, the loop execution module, the abnormality identification module, the data cleaning module and the setting module may all be implemented by a central processing unit (CPU), microprocessor (MPU), digital signal processor (DSP), field programmable gate array (FPGA) or the like located in the terminal device.
  • the audio stream data is recorded in a data recording module, and the data recording module can be implemented by a storage medium such as various memories or storage devices.
  • when the user starts voice recognition, the maximum number of retries is preferably set to 3, and the retry waiting times to 10 seconds, 20 seconds and 30 seconds, respectively.
  • the user uses voice recognition on the terminal in a vehicle that is in an area with no or weak mobile data signal, and turns on voice recognition to issue a voice command. With the solution of the embodiment of the present invention, even if the first recognition fails, the voice can still be recognized successfully if the vehicle reaches a location with a good network signal within the subsequent retry waits of 10 seconds, 20 seconds and 30 seconds.
  • the user terminal is connected to WIFI for voice recognition, the user turns on voice recognition to issue a voice command, and the remote router fails or restarts. With the solution of the embodiment of the present invention, even if the first recognition fails, the voice can still be recognized successfully if network routing returns to normal within the subsequent retry waits of 10 seconds, 20 seconds and 30 seconds.
  • the embodiment of the invention further describes a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the voice recognition method described in the foregoing embodiments.
  • the retry processing can be repeated without user intervention, keeping cloud voice recognition running smoothly, greatly improving the success rate of cloud voice recognition when the network connection is not ideal, and sparing the user from repeatedly inputting voice commands, which improves voice recognition efficiency and enhances the user experience.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be other divisions in an actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units, that is, may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated into one unit;
  • the unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs the steps of the above method embodiments.
  • the foregoing storage medium includes: a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk, and the like, which can store program codes.
  • the above-described integrated unit of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a standalone product.
  • the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied as a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a ROM, a magnetic disk, or an optical disk.
  • the voice command is collected, and the audio stream of the voice command is sent to the cloud server for identification, and the audio stream data is recorded locally; when the network abnormality information is received, the retry wait is started; When the retry condition is met, the locally recorded audio stream data is sent to the cloud server again; thus, the retry processing after the abnormality is automatically performed by locally storing the data, and the retry processing can be repeatedly performed without user intervention.
  • the success rate of cloud voice recognition when the network connection is not ideal can be greatly improved, and the user does not need to repeatedly input voice commands, thereby improving voice recognition efficiency and the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

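A voice recognition method, the method comprising: collecting a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, recording the audio stream data locally (S1); upon receiving network exception information, starting a retry wait (S2); and when a retry condition is met, sending the locally recorded audio stream data to the cloud server again (S3).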

Description

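Voice recognition method, device and computer storage medium
Technical Field
The present invention relates to the field of voice recognition technology for mobile terminals, and in particular to a voice recognition method, device and computer storage medium.
Background
Because mobile terminals have relatively weak computing and storage capabilities, while voice recognition is computationally heavy and needs a large amount of space to store the speech library, existing voice recognition schemes, such as Apple's intelligent voice assistant (Apple Siri) and Google Now, all perform voice recognition in the cloud, and the mobile terminal is only responsible for collecting and transmitting voice commands. To keep response latency as low as possible, the prior art does not wait for the user to finish the whole voice command; as soon as the user starts speaking, the collected audio stream is transmitted directly to the cloud server, which processes and recognizes it directly. With this prior-art scheme, the terminal must stay connected to the network throughout voice recognition. In practice, however, mobile terminals often move quickly with their users, so poor network connectivity is common, for example while driving, when the terminal hands over between base stations, when passing through a tunnel, when leaving Wireless Fidelity (WIFI) range, or when a WIFI router restarts, all of which lower the success rate of voice recognition.
The prior art transmits the audio stream to the cloud server directly and in real time. Once a network exception occurs, neither the mobile terminal nor the cloud server holds the complete audio data, and the terminal can only report the exception to the user, for example a network error when the connection drops, or a network timeout when a slow network means no server response arrives for a long time. The user must then clear the exception and issue the voice command again to have it recognized. The prior art therefore has a high voice recognition failure rate and needs manual user intervention to re-recognize, giving low recognition efficiency and a poor user experience.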
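Summary
To solve the existing technical problems, embodiments of the present invention are expected to provide a voice recognition method, device and computer storage medium that improve the efficiency of cloud voice recognition and enhance the user experience.
The technical solutions of the embodiments of the present invention are implemented as follows:
In one aspect, an embodiment of the present invention provides a voice recognition method, the method comprising:
collecting a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, recording the audio stream data locally;
upon receiving network exception information, starting a retry wait;
when a retry condition is met, sending the locally recorded audio stream data to the cloud server again.
In the above solution, the method further comprises: performing the retry wait and the resending in a loop until the cloud server recognizes the audio successfully or a maximum number of retries is reached.
In the above solution, the network exception information is returned by the cloud server, or is generated automatically when no response from the cloud server has been received after a predetermined time.
In the above solution, the method further comprises: deleting the locally recorded audio stream data after recognition succeeds or the maximum number of retries is reached.
In the above solution, the method further comprises: setting the maximum number of retries and the waiting time of each retry at initialization and/or according to user input.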
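In another aspect, an embodiment of the present invention further provides a voice recognition device, the device comprising:
an audio collection module, configured to collect a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, record the audio stream data locally;
a retry wait module, configured to start a retry wait upon receiving network exception information;
a retry processing module, configured to send the locally recorded audio stream data to the cloud server again when a retry condition is met.
In the above solution, the device further comprises: a loop execution module, configured to execute the retry wait module and the retry processing module in a loop until the cloud server recognizes the audio successfully or a maximum number of retries is reached.
In the above solution, the device further comprises: an exception identification module, configured to identify the network exception information returned by the cloud server, or to generate the network exception information automatically when no response from the cloud server has been received after a predetermined time.
In the above solution, the device further comprises: a data cleaning module, configured to delete the locally recorded audio stream data after recognition succeeds or the maximum number of retries is reached.
In the above solution, the device further comprises: a setting module, configured to set the maximum number of retries and the waiting time of each retry at initialization and/or according to user input.
An embodiment of the present invention provides a computer storage medium storing a computer program, the computer program being used to execute the voice recognition method described above.
With the voice recognition method, device and computer storage medium provided by the embodiments of the present invention, retry processing after an exception is performed automatically using locally stored data and can be repeated without user intervention. This keeps cloud voice recognition running smoothly, can greatly raise the success rate of cloud voice recognition when the network connection is poor, spares the user from repeatedly inputting voice commands, improves voice recognition efficiency, and enhances the user experience.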
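Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a voice recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described below serve only to illustrate and explain the present invention and are not intended to limit it.
To ensure the success rate of cloud voice recognition when the network connection is poor and to spare the user from repeatedly inputting voice commands, as shown in FIG. 1, an embodiment of the present invention provides a voice recognition method, the method comprising: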
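S1: collecting a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, recording the audio stream data locally;
S2: upon receiving network exception information, starting a retry wait;
S3: when a retry condition is met, sending the locally recorded audio stream data to the cloud server again.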
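As an illustration of step S1 only, the Python sketch below shows one way to keep a complete local copy of the audio stream while it is being uploaded. The chunk source and the `send_chunk` upload callback are hypothetical stand-ins for the terminal's audio capture and network layer, and stopping the live upload after the first `ConnectionError` while continuing to record is an assumption, not something the embodiment specifies.

```python
from typing import Callable, Iterable, List, Tuple


def stream_and_record(chunks: Iterable[bytes],
                      send_chunk: Callable[[bytes], None]) -> Tuple[bytes, bool]:
    """Step S1 (sketch): upload each captured chunk in real time while keeping
    a complete local copy of the voice command for later retries.

    Returns the full local recording and whether the live upload got through.
    """
    local_copy: List[bytes] = []
    upload_ok = True
    for chunk in chunks:
        local_copy.append(chunk)      # record every chunk locally
        if upload_ok:
            try:
                send_chunk(chunk)     # hypothetical real-time upload call
            except ConnectionError:
                upload_ok = False     # stop uploading, but keep recording
    return b"".join(local_copy), upload_ok
```

If the second value comes back `False`, steps S2 and S3 can take over and resend the returned bytes, since the local copy holds the whole command even though the live upload was cut short.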
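Here, the network environment in the embodiments of the present invention is a data link, WIFI, or another wireless network environment, and the network exception information includes, but is not limited to, network error information when the network drops, and network timeout information when, for example, a slow network means no server response is received for a long time.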
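In step S2 above, the network exception information is returned by the cloud server, or is generated automatically when no response from the cloud server has been received after a predetermined time.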
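A minimal sketch of how the two sources of network exception information in step S2 might be told apart, assuming a hypothetical `send` function that returns a reply dictionary such as `{"status": "ok", "command": ...}` or `{"status": "network_error", "detail": ...}`. The reply format, the `Outcome` type and the use of a thread pool to enforce the predetermined timeout are illustrative choices, not part of the patent.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    command: Optional[str] = None            # recognized operation command
    network_exception: Optional[str] = None  # set when recognition did not complete


def request_recognition(send: Callable[[bytes], dict], audio: bytes,
                        timeout_s: float,
                        pool: concurrent.futures.Executor) -> Outcome:
    """Exception identification for step S2 (sketch)."""
    future = pool.submit(send, audio)
    try:
        reply = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # No response within the predetermined time: generate the exception locally.
        # (The worker keeps running; a real client would also cancel the request.)
        return Outcome(network_exception="timeout")
    except ConnectionError as exc:
        # Transport-level failure raised by the hypothetical send call.
        return Outcome(network_exception=f"error: {exc}")
    if reply.get("status") == "network_error":
        # Exception information returned explicitly by the cloud server.
        return Outcome(network_exception=f"server: {reply.get('detail', '')}")
    return Outcome(command=reply.get("command"))
```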
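In step S3 above, the retry condition includes: the network exception having been cleared and/or the retry waiting time having been reached.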
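One way to express this retry condition is a `threading.Event` that a connectivity listener sets when the network comes back: `Event.wait(timeout)` returns `True` as soon as the event is set and `False` once the timeout elapses, which matches "network exception cleared or waiting time reached". The `RetryGate` class and its `on_network_*` hooks are hypothetical, and the sketch implements the "or" reading of "and/or"; a stricter variant could require both.

```python
import threading


class RetryGate:
    """Retry condition of step S3 (sketch): resend as soon as the network
    exception is cleared, or when the retry waiting time Ti runs out,
    whichever comes first."""

    def __init__(self) -> None:
        self._network_up = threading.Event()

    def on_network_available(self) -> None:
        # Hypothetical hook: called once an available network is found
        # and a connection has been established.
        self._network_up.set()

    def on_network_lost(self) -> None:
        # Hypothetical hook: called when the connection drops again.
        self._network_up.clear()

    def wait_for_retry(self, wait_time_s: float) -> str:
        """Block until the retry condition holds and report which part fired."""
        if self._network_up.wait(timeout=wait_time_s):
            return "network restored"
        return "wait time reached"
```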
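In one implementation, the retry steps S2 and S3 of the method are executed repeatedly until cloud recognition succeeds or the upper limit on the number of retries is reached. The maximum number of retries and the interval of each retry wait may be set when voice recognition is initialized, or adjusted dynamically by the user.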
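The embodiments of the present invention improve the efficiency of cloud voice recognition through a retry mechanism. While the voice command is sent directly to the cloud as an audio stream for recognition, at least one complete copy of the audio stream data is also recorded locally; the audio stream data may be kept as a local recording file, as in-memory data, or in other forms. To keep local storage usage under control, the locally saved audio stream data is deleted after recognition succeeds or the maximum number of retries is reached. The maximum number of retries N and the waiting time of each retry can be set in advance. The waiting times of the retries may all be the same or may be set to different values; for example, the waiting time Ti of the i-th retry may be set individually, or set automatically according to a certain pattern (increasing, decreasing, increasing then decreasing, decreasing then increasing, and so on), and the pattern may be expressed by a preset function or may be a random sequence. In addition, the user may change the maximum number of retries and the retry waiting times at any time.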
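The change patterns for the waiting times Ti can be sketched as a small generator function. The linear step between successive waits, the parameter names and the uniform range used for the random variant are assumptions chosen for illustration; the embodiment leaves the preset function open.

```python
import random
from typing import List, Optional


def wait_schedule(n: int, pattern: str, base_s: float = 10.0, step_s: float = 10.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Waiting times T1..Tn in seconds for n retries (sketch; the linear
    step and the random range are assumptions)."""
    if n < 0:
        raise ValueError("number of retries must be non-negative")
    ramp = [min(i, n - 1 - i) for i in range(n)]  # 0, 1, ..., peak, ..., 1, 0
    peak = (n - 1) // 2
    if pattern == "constant":
        steps = [0] * n
    elif pattern == "increasing":
        steps = list(range(n))
    elif pattern == "decreasing":
        steps = list(range(n - 1, -1, -1))
    elif pattern == "up_then_down":
        steps = ramp
    elif pattern == "down_then_up":
        steps = [peak - k for k in ramp]
    elif pattern == "random":
        rng = rng or random.Random()
        return [rng.uniform(base_s, base_s + n * step_s) for _ in range(n)]
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return [base_s + k * step_s for k in steps]
```

With the defaults, `wait_schedule(3, "increasing")` yields `[10.0, 20.0, 30.0]`, the schedule used in the application scenarios below.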
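Voice recognition is mainly used when it is inconvenient for the user to operate the terminal device directly by hand, for example while driving. After the audio stream of the voice command is sent to the cloud, the terminal waits for the cloud recognition result. If the cloud returns the recognized operation command, the terminal device is controlled according to that command and the current voice recognition ends. If, instead of a recognized operation command, network exception information is returned for the i-th time, the i-th retry wait is started and timed against the preset waiting time Ti. When the waiting time Ti is reached, or when the network exception is cleared (for example, after an available network is found and a connection is established), the locally saved audio stream data is sent to the cloud server again for recognition. If the number of times i that network exception information has been returned exceeds the preset maximum number of retries N, recognition failure information is returned and the current voice recognition ends.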
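The retry loop in this example can be sketched as follows, under stated assumptions: `recognize` is a hypothetical call that returns the recognized operation command or `None` when network exception information comes back, `wait_for_retry(Ti)` blocks until the retry condition holds, and the local copy is an in-memory `bytearray` that the data cleaning step empties on both success and failure. As in the text, failure is reported once the exception count i exceeds N, so at most N retries follow the first attempt.

```python
from typing import Callable, List, Optional


def recognize_with_retries(
        audio: bytearray,
        recognize: Callable[[bytes], Optional[str]],
        wait_for_retry: Callable[[float], None],
        wait_times: List[float],
) -> Optional[str]:
    """Retry loop (sketch). The maximum number of retries N is len(wait_times)."""
    max_retries = len(wait_times)
    try:
        i = 0
        while True:
            command = recognize(bytes(audio))   # (re)send the local copy
            if command is not None:
                return command                  # control the terminal; recognition ends
            i += 1                              # exception returned for the i-th time
            if i > max_retries:
                return None                     # report recognition failure
            wait_for_retry(wait_times[i - 1])   # i-th retry wait, timed with Ti
    finally:
        audio.clear()                           # data cleaning: drop the local copy
```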
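As shown in FIG. 2, an embodiment of the present invention also provides a voice recognition device 1 that interacts with a cloud server 2, the voice recognition device 1 comprising:
an audio collection module 101, configured to collect a voice command and, while sending the audio stream of the voice command to the cloud server 2 for recognition, record the audio stream data locally;
a retry wait module 102, configured to start a retry wait upon receiving network exception information;
a retry processing module 103, configured to send the locally recorded audio stream data to the cloud server 2 again when a retry condition is met.
In a preferred embodiment of the present invention, the retry can be executed repeatedly in a loop, so the voice recognition device 1 further includes: a loop execution module, configured to execute the retry wait module and the retry processing module in a loop until the cloud server recognizes the audio successfully or the maximum number of retries is reached.
Since the exception information is either returned by the cloud server or generated locally, the voice recognition device 1 further includes: an exception identification module, configured to identify the network exception information returned by the cloud server, or to generate the network exception information automatically when no response from the cloud server has been received after a predetermined time.
In one implementation, the voice recognition device 1 further includes: a data cleaning module, configured to delete the locally recorded audio stream data after recognition succeeds or the maximum number of retries is reached; and a setting module, configured to set the maximum number of retries and the waiting time of each retry at initialization and/or according to user input.
In one implementation, the above voice recognition device may be the mobile terminal itself, or a relatively independent functional unit that, once loaded by the mobile terminal, implements cloud voice recognition from the terminal to the cloud server.
In practical applications, the audio collection module 101, the retry wait module 102, the retry processing module 103, the loop execution module, the exception identification module, the data cleaning module and the setting module may all be implemented by a Central Processing Unit (CPU), a Micro Processor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA) or the like located in the terminal device. The audio stream data is recorded in a data recording module, which may be implemented by a storage medium such as any kind of memory or storage device.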
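Several typical application scenarios of the embodiments of the present invention are described below. The user starts voice recognition; the maximum number of retries is preferably set to 3, and the retry waiting times to 10 seconds, 20 seconds and 30 seconds, respectively.
In the first example scenario, the user uses voice recognition on the terminal in a vehicle, the vehicle is in an area with no or weak mobile data signal, and the user starts voice recognition and issues a voice command. With the solution of the embodiments of the present invention, even if the first recognition attempt fails, the voice can still be recognized successfully as long as the vehicle reaches a location with a good network signal within the subsequent retry waits of 10 seconds, 20 seconds and 30 seconds.
In the second example scenario, the user terminal is connected to WIFI for voice recognition, the user starts voice recognition and issues a voice command, and the remote router fails or restarts. With the solution of the embodiments of the present invention, even if the first recognition attempt fails, the voice can still be recognized successfully as long as network routing returns to normal within the subsequent retry waits of 10 seconds, 20 seconds and 30 seconds.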
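To make the timing in these scenarios concrete, the hypothetical helper below computes when a resend would succeed if the network comes back `recovery_s` seconds after the first attempt. It assumes uploads and responses take no time and that the network stays up once restored; these simplifications are not from the patent. With waits of 10, 20 and 30 seconds, the retries cover roughly the first 60 seconds after the initial failure.

```python
from typing import List, Optional


def success_time(wait_times: List[float], recovery_s: float,
                 resend_on_recovery: bool = True) -> Optional[float]:
    """Seconds after the first attempt at which a resend succeeds, or None.

    Simplified model (assumption): the network is down from t = 0 until
    `recovery_s` and stays up afterwards; uploads and replies take no time.
    """
    t = 0.0
    if recovery_s <= t:
        return t                        # the first attempt already had a network
    for wait_s in wait_times:           # i-th retry wait, timed with Ti
        deadline = t + wait_s
        if resend_on_recovery and recovery_s <= deadline:
            return recovery_s           # exception cleared mid-wait: resend at once
        t = deadline                    # Ti elapsed: resend now
        if recovery_s <= t:
            return t
    return None                         # more than N failures: recognition fails
```

For example, if the network recovers 45 seconds in, resending on recovery succeeds at 45 s, whereas waiting out each Ti delays success to the third retry at 60 s; a recovery after 60 s falls outside the three retries.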
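An embodiment of the present invention further describes a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the voice recognition method described in the foregoing embodiments.
In summary, the technical solutions described in the embodiments of the present invention have the following technical effects:
Retry processing after an exception is performed automatically using locally stored data and can be repeated without user intervention, which keeps cloud voice recognition running smoothly, can greatly raise the success rate of cloud voice recognition when the network connection is poor, spares the user from repeatedly inputting voice commands, improves voice recognition efficiency, and enhances the user experience.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve separately as a single unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware, or in hardware plus software functional units.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware under the direction of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs steps including those of the above method embodiments; the storage medium includes various media that can store program code, such as a removable storage device, a Read-Only Memory (ROM), a magnetic disk or an optical disc.
Alternatively, if the above integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.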
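Industrial Applicability
In the embodiments of the present invention, a voice command is collected and, while the audio stream of the voice command is sent to a cloud server for recognition, the audio stream data is recorded locally; upon receiving network exception information, a retry wait is started; and when a retry condition is met, the locally recorded audio stream data is sent to the cloud server again. In this way, retry processing after an exception is performed automatically using locally stored data and can be repeated without user intervention, which keeps cloud voice recognition running smoothly, can greatly raise the success rate of cloud voice recognition when the network connection is poor, spares the user from repeatedly inputting voice commands, improves voice recognition efficiency, and enhances the user experience.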

Claims (11)

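  1. A voice recognition method, the method comprising:
    collecting a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, recording the audio stream data locally;
    upon receiving network exception information, starting a retry wait;
    when a retry condition is met, sending the locally recorded audio stream data to the cloud server again.
  2. The method according to claim 1, wherein the method further comprises:
    performing the retry wait and the resending in a loop until the cloud server recognizes the audio successfully or a maximum number of retries is reached.
  3. The method according to claim 1, wherein:
    the network exception information is returned by the cloud server, or is generated automatically when no response from the cloud server has been received after a predetermined time.
  4. The method according to claim 2, wherein the method further comprises:
    deleting the locally recorded audio stream data after recognition succeeds or the maximum number of retries is reached.
  5. The method according to claim 1, wherein the method further comprises:
    setting a maximum number of retries and the waiting time of each retry at initialization and/or according to user input.
  6. A voice recognition device, the device comprising:
    an audio collection module, configured to collect a voice command and, while sending the audio stream of the voice command to a cloud server for recognition, record the audio stream data locally;
    a retry wait module, configured to start a retry wait upon receiving network exception information;
    a retry processing module, configured to send the locally recorded audio stream data to the cloud server again when a retry condition is met.
  7. The device according to claim 6, wherein the device further comprises:
    a loop execution module, configured to execute the retry wait module and the retry processing module in a loop until the cloud server recognizes the audio successfully or a maximum number of retries is reached.
  8. The device according to claim 6, wherein the device further comprises:
    an exception identification module, configured to identify the network exception information returned by the cloud server, or to generate the network exception information automatically when no response from the cloud server has been received after a predetermined time.
  9. The device according to claim 7, wherein the device further comprises:
    a data cleaning module, configured to delete the locally recorded audio stream data after recognition succeeds or the maximum number of retries is reached.
  10. The device according to claim 6, wherein the device further comprises:
    a setting module, configured to set a maximum number of retries and the waiting time of each retry at initialization and/or according to user input.
  11. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the method according to any one of claims 1 to 5.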
PCT/CN2016/081829 2015-08-21 2016-05-12 Voice recognition method, device and computer storage medium WO2017032070A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510520981.7 2015-08-21
CN201510520981.7A CN106469558A (zh) 2015-08-21 2015-08-21 Voice recognition method and device

Publications (1)

Publication Number Publication Date
WO2017032070A1 true WO2017032070A1 (zh) 2017-03-02

Family

ID=58099355

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081829 WO2017032070A1 (zh) 2015-08-21 2016-05-12 Voice recognition method, device and computer storage medium

Country Status (2)

Country Link
CN (1) CN106469558A (zh)
WO (1) WO2017032070A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
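CN108648754A (zh) * 2018-04-26 2018-10-12 北京小米移动软件有限公司 Voice control method and apparatus
CN109074808A (zh) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Voice control method, central control device and storage medium
CN110046045A (zh) * 2019-04-03 2019-07-23 百度在线网络技术(北京)有限公司 Data packet processing method and apparatus for voice wake-up
CN110322885A (zh) * 2018-03-28 2019-10-11 塞舌尔商元鼎音讯股份有限公司 Artificial intelligence voice interaction method, computer program product and near-end electronic device thereof
CN112735414A (zh) * 2020-12-25 2021-04-30 肯特智能技术(深圳)股份有限公司 Online/offline dual-mode voice control method, system and storage medium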

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
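CN106971723B (zh) * 2017-03-29 2021-02-12 北京搜狗科技发展有限公司 Voice processing method and apparatus, and apparatus for voice processing
CN107146617A (zh) * 2017-06-15 2017-09-08 成都启英泰伦科技有限公司 Novel voice recognition device and method
CN109743436B (zh) * 2018-12-29 2020-08-28 苏州思必驰信息科技有限公司 Communication compensation method, apparatus, device and storage medium for voice dialogue
CN110085237B (zh) * 2019-04-29 2022-01-07 大众问问(北京)信息科技有限公司 Method, apparatus and device for recovering an interaction process
CN110246501B (zh) * 2019-07-02 2022-02-01 思必驰科技股份有限公司 Voice recognition method and system for meeting minutes
CN110853639B (zh) * 2019-10-23 2023-09-01 天津讯飞极智科技有限公司 Voice transcription method and related apparatus
CN112489660A (zh) * 2020-11-19 2021-03-12 中国第一汽车股份有限公司 Vehicle-mounted voice recognition method, apparatus, device and storage medium
CN113079394A (zh) * 2021-03-27 2021-07-06 深圳市研强物联技术有限公司 Method, system and terminal for streaming media playback by a voice assistant on an ASR platform
CN113810266B (zh) * 2021-08-13 2023-05-12 北京达佳互联信息技术有限公司 Retry operation method, apparatus, device and storage medium for message objects
CN113794622B (zh) * 2021-08-17 2023-03-24 北京达佳互联信息技术有限公司 Message processing method and apparatus, electronic device and storage medium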

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020184373A1 (en) * 2000-11-01 2002-12-05 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
CN2737089Y (zh) * 2003-12-22 2005-10-26 浙江华立通信集团有限公司 Device for detecting ringing responses in a CDMA system
CN103824560A (zh) * 2014-03-18 2014-05-28 上海言海网络信息技术有限公司 Chinese voice recognition system
CN104123942A (zh) * 2014-07-30 2014-10-29 腾讯科技(深圳)有限公司 Voice recognition method and system
CN104575502A (zh) * 2014-11-25 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent toy and voice interaction method for an intelligent toy

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
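CN110322885A (zh) * 2018-03-28 2019-10-11 塞舌尔商元鼎音讯股份有限公司 Artificial intelligence voice interaction method, computer program product and near-end electronic device thereof
CN110322885B (zh) * 2018-03-28 2023-11-28 达发科技股份有限公司 Artificial intelligence voice interaction method, computer program product and near-end electronic device thereof
CN108648754A (zh) * 2018-04-26 2018-10-12 北京小米移动软件有限公司 Voice control method and apparatus
CN108648754B (zh) * 2018-04-26 2021-09-21 北京小米移动软件有限公司 Voice control method and apparatus
CN109074808A (zh) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Voice control method, central control device and storage medium
CN110046045A (zh) * 2019-04-03 2019-07-23 百度在线网络技术(北京)有限公司 Data packet processing method and apparatus for voice wake-up
CN112735414A (zh) * 2020-12-25 2021-04-30 肯特智能技术(深圳)股份有限公司 Online/offline dual-mode voice control method, system and storage medium
CN112735414B (zh) * 2020-12-25 2024-06-07 肯特智能技术(深圳)股份有限公司 Online/offline dual-mode voice control method, system and storage medium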

Also Published As

Publication number Publication date
CN106469558A (zh) 2017-03-01

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16838332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16838332

Country of ref document: EP

Kind code of ref document: A1