WO2017080195A1 - Audio recognition method and apparatus - Google Patents

Audio recognition method and apparatus

Info

Publication number
WO2017080195A1
WO2017080195A1 (PCT/CN2016/084617)
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
audio
code value
character
feature information
Prior art date
Application number
PCT/CN2016/084617
Other languages
English (en)
French (fr)
Inventor
王云华
Original Assignee
Shenzhen TCL Digital Technology Co., Ltd. (深圳TCL数字技术有限公司)
Application filed by Shenzhen TCL Digital Technology Co., Ltd. (深圳TCL数字技术有限公司)
Publication of WO2017080195A1 publication Critical patent/WO2017080195A1/zh

Classifications

    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/26: Speech-to-text systems
    • G10L 15/34: Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H04N 21/4415: Acquiring end-user identification using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
    • H04N 21/462: Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a head-end

Definitions

  • the present invention relates to the field of smart television technologies, and in particular, to an audio recognition method and apparatus.
  • the main object of the present invention is to provide an audio recognition method and apparatus, which aim to solve the problem that the loss of PCM code stream data occurs during the voice transmission process, thereby causing poor accuracy of voice recognition.
  • the present invention provides an audio recognition method, and the audio recognition method includes:
  • the step of acquiring the first feature information of the audio data includes:
  • the second feature information corresponding to each character is acquired from the cloud in descending order of each character's repetition ratio.
  • the present invention also provides an audio recognition method.
  • the audio recognition method includes the following steps:
  • the determined character is taken as a character that matches the audio data.
  • the present invention further provides an audio recognition device, the audio recognition device comprising:
  • a first acquiring module configured to acquire audio data stored in an audio buffer, where the terminal stores the received audio data into the audio buffer when receiving the audio data;
  • a second acquiring module configured to acquire first feature information of the audio data and second feature information of each character in a current display interface of the terminal
  • a determining module configured to determine a character that matches the second feature information and the first feature information in each character of the current display interface.
  • a matching module configured to use the determined character as a character that matches the audio data.
  • the invention acquires the feature information of the audio data and matches it against the feature information, obtained from the cloud, that corresponds to each character of the current display interface of the terminal, so that even if part of the audio data is lost, the characters corresponding to the audio data can still be identified by matching the feature information, which improves voice-recognition accuracy.
  • FIG. 1 is a schematic flow chart of a first embodiment of an audio recognition method according to the present invention.
  • FIG. 2 is a detailed flowchart of the step in FIG. 1 of acquiring the first feature information of the audio data;
  • FIG. 3 is a detailed flowchart of the step in FIG. 2 of calculating the first feature information of the audio data from the acquired code values;
  • FIG. 4 is a detailed flowchart of the step in FIG. 1 of acquiring the second feature information of each character in the current display interface of the terminal;
  • FIG. 5 is a schematic flowchart diagram of a second embodiment of an audio recognition method according to the present invention.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of an audio recognition apparatus according to the present invention.
  • FIG. 7 is a schematic diagram of the detailed functional modules of the second acquisition module in FIG. 6;
  • FIG. 8 is a schematic diagram of functional modules of a second embodiment of an audio recognition apparatus according to the present invention.
  • the present invention provides an audio recognition method.
  • FIG. 1 is a schematic flowchart of a first embodiment of an audio recognition method according to the present invention.
  • the audio recognition method includes:
  • Step S10 Acquire audio data stored in an audio buffer area, where the terminal stores the received audio data into the audio buffer area when receiving the audio data;
  • the terminal may be a smart TV, and the smart TV is taken as an example for description.
  • when the smart TV detects a voice PCM stream input, the received voice PCM stream (audio data) is stored into the audio buffer; the audio buffer is then checked in real time or periodically, and once audio data is detected in the buffer, the audio data in the audio buffer is acquired.
  • Step S20 acquiring first feature information of the audio data and second feature information of each character in the current display interface of the terminal;
  • the audio data is summed and shifted to obtain the first feature information of the audio data; for example, after the summing and shifting, the first feature information of the audio data is 0x0A00.
  • the audio data is voice PCM stream data corresponding to a control instruction of the smart television, and includes a name of a person, voice PCM stream data corresponding to some specific nouns, and the like.
  • the first feature information is check data of the audio data, and the check data is unique, that is, the check data of each audio data uniquely represents one audio data.
  • each character is a character present on the current display interface of the terminal and exists in a specific character file; examples of the form in which the characters exist are as follows:
    Resource_String(x1)="影视"
    Resource_String(x2)="电影"
    Video.xml: <string name="app_name">影视</string>
  • the second feature information of the respective characters is obtained from the cloud.
  • the acquired second feature information of each character is: 0x0B00, 0x0A00, 0x0C00, and the like.
  • Step S30 determining a character that matches the second feature information and the first feature information in each character of the current display interface
  • after acquiring the first feature information of the audio data and the second feature information of each character, the first feature information of the audio data is matched against the second feature information of each character until the character corresponding to the second feature information that successfully matches the first feature information of the audio data is determined.
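A minimal sketch of this matching step (the feature values and characters are the illustrative ones used in this embodiment; the real feature computation is detailed with FIG. 2):

```python
def match_character(first_feature, char_features):
    """Return the character whose second feature information equals the
    first feature information of the audio data, or None if no character
    of the current display interface matches."""
    for char, second_feature in char_features.items():
        if second_feature == first_feature:
            return char
    return None

# Second feature information fetched from the cloud for each character
# of the current display interface (example values from the text above).
char_features = {"影视": 0x0B00, "电影": 0x0A00, "连续剧": 0x0C00}
print(match_character(0x0A00, char_features))  # 电影
```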
  • Step S40: the determined character is taken as the character matching the audio data.
  • after determining the character corresponding to the second feature information that successfully matches the first feature information of the audio data, that character is taken as the character matched by the audio data, i.e. the meaning of the audio data is that character; the successfully matched character is then displayed on the screen of the smart TV so that the user can confirm whether the displayed character is correct.
  • the invention acquires the feature information of the audio data and matches it against the feature information, obtained from the cloud, that corresponds to each character of the current display interface of the terminal, so that even if part of the audio data is lost, the characters corresponding to the audio data can still be identified by matching the feature information, which improves voice-recognition accuracy.
  • the step of acquiring the first feature information of the audio data includes:
  • Step S21 determining a maximum code value and a minimum code value in a code stream of the audio data
  • the smart TV reads the audio data in the audio buffer and arranges the code values in order of size; the code values may be arranged in descending or ascending order. After the sorting, the maximum code value and the minimum code value in the code stream of the audio data are easily obtained.
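As a rough illustration only (the buffer layout and sample encoding are not specified in this embodiment, so plain integer code values are assumed here), finding the extreme code values of the buffered code stream can look like this:

```python
def find_extreme_code_values(pcm_stream):
    """Return (max_code, min_code) for a buffered PCM code stream.

    pcm_stream: sequence of integer code values read from the audio
    buffer (plain non-negative integers assumed for illustration).
    """
    if not pcm_stream:
        raise ValueError("audio buffer is empty")
    # Sorting the code values, as the embodiment suggests, makes the
    # extremes trivial to read off; a single max()/min() pass works too.
    ordered = sorted(pcm_stream)
    return ordered[-1], ordered[0]

# Hypothetical buffer contents:
stream = [0x50, 0x40, 0xA0, 0x10, 0x7F]
print(find_extreme_code_values(stream))  # (160, 16), i.e. 0xA0 and 0x10
```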
  • Step S22: acquiring the code values between the moment at which the maximum code value occurs in the code stream of the audio data and the moment at which the minimum code value occurs;
  • Step S23 calculating first feature information of the audio data according to the obtained code value.
  • the speech PCM stream feature algorithm is an algorithm for summing and shifting the acquired code values.
  • the step S23 includes:
  • Step S231 summing the obtained code values to obtain a check value of the audio data
  • the smart TV sums the acquired code values to obtain a check value of the audio data.
  • the check value is represented by a 16-bit binary number, and the check value of the audio data is represented as 0x00A0.
  • Step S232: performing left-shift processing on the check value according to a preset left-shift algorithm to obtain the first feature information of the audio data.
  • the left-shift algorithm may shift the check value to the left by one bit or by several bits. For example, shifting the check value 0x00A0 of the audio data to the left by four bits yields the first feature information 0x0A00.
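Steps S231 and S232 can be sketched together as follows (a sketch, not the claimed implementation: the 16-bit check value is stated in the text, and the four-bit shift is an assumption inferred from the 0x00A0 to 0x0A00 example):

```python
def first_feature_info(code_values, shift_bits=4):
    """Compute the first feature information of the audio data:
    sum the acquired code values into a 16-bit check value, then
    left-shift it by a preset number of bits (4 assumed here,
    consistent with the 0x00A0 -> 0x0A00 example)."""
    check_value = sum(code_values) & 0xFFFF   # 16-bit check value
    return (check_value << shift_bits) & 0xFFFF

# Code values summing to 0x00A0 give the feature value 0x0A00:
print(hex(first_feature_info([0x50, 0x40, 0x10])))  # 0xa00
```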
  • this embodiment represents the audio data by the feature information obtained from specific code values in the audio data. Since the feature information of each piece of audio data is unique, the audio data can be represented by its feature information, which makes the representation of the audio data simpler and avoids the large transmission overhead of sending the raw audio data.
  • the step of acquiring the second feature information of each character in the current display interface of the terminal includes:
  • Step S24 Obtain a proportion of each character in the current display interface of the terminal that repeatedly appears in the preset character file.
  • the types include browser type and Android system type.
  • the type of the current interface of the smart TV may be only a browser type, or only an Android system type, or a browser type and an Android system type.
  • the preset character file includes a character file of an xml file corresponding to the current interface type being a browser type, and a character file of an xml file of a resource folder of an Android system control corresponding to the current interface type being an Android system type.
  • the smart TV reads the characters from the above two storage locations and determines the repetition ratio of each character. For example, a character appearing 10 or more times may be assigned a ratio of 100%, 5 occurrences 50%, 4 occurrences 40%, one occurrence 10%, and so on. After the characters are read, the repetition ratio of each character can be determined according to this preset rule.
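The repetition-ratio rule can be sketched as follows (the 10%-per-occurrence mapping, capped at 100%, is an assumption consistent with the examples given above, not a stated formula):

```python
from collections import Counter

def repetition_ratios(characters):
    """Map each character to a repetition ratio: 10% per occurrence,
    capped at 100% for 10 or more occurrences (per the example rule)."""
    counts = Counter(characters)
    return {ch: min(n, 10) * 10 for ch, n in counts.items()}

# Characters as read from the preset character files (hypothetical):
chars = ["电影"] * 5 + ["影视"] * 4 + ["综艺"]
print(repetition_ratios(chars))  # {'电影': 50, '影视': 40, '综艺': 10}
```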
  • Step S25 determining whether the proportion of the repeated occurrence of each character is greater than the first preset ratio
  • Step S26: if so, acquiring the second feature information corresponding to each character from the cloud in descending order of each character's repetition ratio;
  • the first preset ratio is a preset ratio.
  • the first preset ratio is set to 50%. It can be understood that the first preset ratio may also be set to other values.
  • the second feature information corresponding to the character with the largest repetition ratio is fetched from the cloud server first and matched against the first feature information of the audio data. If the match succeeds, the character is displayed; if it fails, the second feature information corresponding to the character with the next-largest repetition ratio is fetched from the cloud server and matched against the feature information of the audio data, and so on until a match succeeds. Because the second feature information of the most frequently repeated character is fetched from the cloud and matched first, the matching time is shortened and system efficiency is improved.
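The priority matching of steps S25 and S26 can be sketched like this (the cloud lookup is stubbed out with a dictionary and hypothetical feature values; in practice each second feature value would be fetched from the cloud server on demand):

```python
def match_by_priority(first_feature, ratios, fetch_second_feature,
                      threshold=50):
    """Try characters in descending order of repetition ratio, fetching
    each character's second feature information and comparing it with the
    first feature information of the audio data; stop at the first match.
    Only characters whose ratio exceeds the first preset ratio are tried."""
    candidates = sorted(
        (ch for ch, r in ratios.items() if r > threshold),
        key=lambda ch: ratios[ch], reverse=True)
    for ch in candidates:
        if fetch_second_feature(ch) == first_feature:
            return ch
    return None

# Stub standing in for the cloud server (hypothetical feature values):
cloud = {"电影": 0x0A00, "影视": 0x0B00}
ratios = {"电影": 60, "影视": 80, "综艺": 10}
print(match_by_priority(0x0A00, ratios, cloud.get))  # 电影
```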
  • before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further includes:
  • Step S11: determining the number of audio data items whose code value is greater than 1 among the audio data stored in the audio buffer, and the number of audio data items corresponding to a second preset ratio of the audio data;
  • the second preset ratio is set to 5%. It can be understood that the second preset ratio may also be set to other scale values according to specific conditions.
  • to determine the number of audio data items corresponding to the second preset ratio, the total number of audio data items in the audio buffer is determined first, and the number corresponding to the second preset ratio is then calculated from the total and the second preset ratio value.
  • Step S12 determining whether the number of audio data whose code value is greater than 1 in the audio data is greater than the number of audio data corresponding to the second preset ratio of the audio data;
  • Step S13 if yes, performing the step of acquiring audio data stored in the audio buffer.
  • after the number of audio data items corresponding to the second preset ratio and the number of items whose code value is greater than 1 are determined, the two are compared. When the number of items greater than 1 exceeds the number corresponding to the second preset ratio, it is determined that audio data is stored in the audio buffer; when it does not, it is determined that no audio data exists in the audio buffer.
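Steps S11 through S13 amount to a simple validity check before reading the buffer, which can be sketched as follows (the 5% second preset ratio is the example value given above):

```python
def buffer_has_audio(buffer, second_preset_ratio=0.05):
    """Decide whether the audio buffer holds real audio data: count the
    items whose code value is greater than 1 and require that count to
    exceed the given fraction (5% by default) of the buffer size."""
    if not buffer:
        return False
    above_one = sum(1 for v in buffer if v > 1)
    return above_one > len(buffer) * second_preset_ratio

# Two items out of ten exceed 1, which is more than 5% of the buffer:
print(buffer_has_audio([0, 1, 0x50, 0x40, 0, 1, 0, 0, 0, 1]))  # True
```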
  • the execution bodies of the audio recognition method of the above embodiments may each be a terminal. Further, the audio recognition method may be implemented by a client control program installed on the terminal, wherein the terminal may be a smart TV.
  • the present invention further provides an audio recognition apparatus based on a smart television.
  • FIG. 6 is a schematic diagram of functional modules of a first embodiment of an audio recognition apparatus according to the present invention.
  • the audio recognition apparatus includes: a first acquisition module 10, a second acquisition module 20, a determination module 30, and a matching module 40.
  • the first obtaining module 10 is configured to acquire audio data stored in an audio buffer, where the terminal stores the received audio data into the audio buffer when receiving the audio data;
  • the terminal may be a smart TV, and the smart TV is taken as an example for description.
  • when the smart TV detects a voice PCM stream input, the received voice PCM stream (audio data) is stored into the audio buffer; the audio buffer is then checked in real time or periodically, and once audio data is detected in the buffer, the audio data in the audio buffer is acquired.
  • the second acquiring module 20 is configured to acquire first feature information of the audio data and second feature information of each character in the current display interface of the terminal;
  • the audio data is summed and shifted to obtain the first feature information of the audio data; for example, after the summing and shifting, the first feature information of the audio data is 0x0A00.
  • the audio data is voice PCM stream data corresponding to a control instruction of the smart television, and includes a name of a person, voice PCM stream data corresponding to some specific nouns, and the like.
  • the first feature information is check data of the audio data, and the check data is unique, that is, the check data of each audio data uniquely represents one audio data.
  • each character is a character present on the current display interface of the terminal and exists in a specific character file; examples of the form in which the characters exist are as follows:
    Resource_String(x1)="影视"
    Resource_String(x2)="电影"
    Video.xml: <string name="app_name">影视</string>
  • the second feature information of the respective characters is obtained from the cloud.
  • the acquired second feature information of each character is: 0x0B00, 0x0A00, 0x0C00, and the like.
  • the determining module 30 is configured to determine a character that matches the second feature information and the first feature information in each character of the current display interface
  • after acquiring the first feature information of the audio data and the second feature information of each character, the first feature information of the audio data is matched against the second feature information of each character until the character corresponding to the second feature information that successfully matches the first feature information of the audio data is determined.
  • the matching module 40 is configured to use the determined character as a character that matches the audio data.
  • after determining the character corresponding to the second feature information that successfully matches the first feature information of the audio data, that character is taken as the character matched by the audio data, i.e. the meaning of the audio data is that character; the successfully matched character is then displayed on the screen of the smart TV so that the user can confirm whether the displayed character is correct.
  • the invention acquires the feature information of the audio data and matches it against the feature information, obtained from the cloud, that corresponds to each character of the current display interface of the terminal, so that even if part of the audio data is lost, the characters corresponding to the audio data can still be identified by matching the feature information, which improves voice-recognition accuracy.
  • the second acquisition module 20 includes a determination unit 21, an acquisition unit 22, a calculation unit 23, and a determination unit 24.
  • the determining unit 21 is configured to determine a maximum code value and a minimum code value in the code stream of the audio data;
  • the smart TV reads the audio data in the audio buffer and arranges the code values in order of size; the code values may be arranged in descending or ascending order. After the sorting, the maximum code value and the minimum code value in the code stream of the audio data are easily obtained, where the maximum code value is the code value of the audio data corresponding to the peak in the PCM code stream, and the minimum code value is the code value of the audio data corresponding to the valley in the PCM code stream.
  • the obtaining unit 22 is configured to acquire the code values between the moment at which the maximum code value occurs in the code stream of the audio data and the moment at which the minimum code value occurs;
  • the obtaining unit 22 is further configured to obtain a proportion of each character in the current display interface of the terminal that repeatedly appears in the preset character file;
  • the types include browser type and Android system type.
  • the type of the current interface of the smart TV may be only a browser type, or only an Android system type, or a browser type and an Android system type.
  • the preset character file includes a character file of an xml file corresponding to the current interface type being a browser type, and a character file of an xml file of a resource folder of an Android system control corresponding to the current interface type being an Android system type.
  • the smart TV reads the characters in the two storage locations and determines the repetition ratio of each character. For example, a character appearing 10 or more times may be assigned a ratio of 100%, 5 occurrences 50%, 4 occurrences 40%, one occurrence 10%, and so on. After the characters in the above two storage locations are read, the repetition ratio of each character can be determined according to this preset rule.
  • the calculating unit 23 is configured to calculate first feature information of the audio data according to the acquired code value.
  • the speech PCM stream feature algorithm is an algorithm for summing and shifting the acquired code values.
  • the calculating unit 23 further includes: a summing subunit 231 and a shifting subunit 232.
  • the summation subunit 231 is configured to obtain the check value of the audio data by summing the obtained code values
  • the smart TV sums the acquired code values to obtain a check value of the audio data.
  • the acquired code values are the code values corresponding to this short segment of audio data.
  • the check value is represented by a 16-bit binary number, and the check value of the audio data is represented as 0x00A0.
  • the shifting sub-unit 232 is configured to perform left shift processing on the check value according to a preset left shift algorithm to obtain first feature information of the audio data.
  • the left-shift algorithm may shift the check value to the left by one bit or by several bits. For example, shifting the check value 0x00A0 of the audio data to the left by four bits yields the first feature information 0x0A00.
  • the determining unit 24 is configured to determine whether a ratio of the repeated occurrence of each character is greater than a first preset ratio
  • the obtaining unit 22 is further configured to: if the repetition ratio of each character is greater than the first preset ratio, acquire the second feature information corresponding to each character from the cloud in descending order of repetition ratio.
  • the first preset ratio is a preset ratio.
  • the first preset ratio is set to 50%. It can be understood that the first preset ratio may also be set to other values.
  • the second feature information corresponding to the character with the largest repetition ratio is fetched from the cloud server first and matched against the first feature information of the audio data. If the match succeeds, the character is displayed; if it fails, the second feature information corresponding to the character with the next-largest repetition ratio is fetched from the cloud server and matched against the feature information of the audio data, and so on until a match succeeds.
  • this embodiment represents the audio data by the feature information obtained from specific code values in the audio data. Since the feature information of each piece of audio data is unique, the audio data can be represented by its feature information, which makes the representation of the audio data simpler and avoids the large transmission overhead of sending the raw audio data.
  • the audio recognition apparatus further includes a determination module 50.
  • the determining module 30 is further configured to determine the number of audio data items whose code value is greater than 1 among the audio data stored in the audio buffer, and the number of audio data items corresponding to the second preset ratio of the audio data;
  • the second preset ratio is set to 5%. It can be understood that the second preset ratio may also be set to other scale values according to specific conditions.
  • to determine the number of audio data items corresponding to the second preset ratio, the total number of audio data items in the audio buffer is determined first, and the number corresponding to the second preset ratio is then calculated from the total and the second preset ratio value.
  • the determining module 50 is configured to determine whether the number of audio data whose code value is greater than 1 in the audio data is greater than the number of audio data corresponding to the second preset ratio of the audio data;
  • the first acquiring module 10 is further configured to acquire the audio data stored in the audio buffer if the number of audio data items whose code value is greater than 1 in the audio data is greater than the number of audio data items corresponding to the second preset ratio of the audio data.
  • after the number of audio data items corresponding to the second preset ratio and the number of items whose code value is greater than 1 are determined, the two are compared. When the number of items greater than 1 exceeds the number corresponding to the second preset ratio, it is determined that audio data is stored in the audio buffer; when it does not, it is determined that no audio data exists in the audio buffer.
  • the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present invention.


Abstract

An audio recognition method and apparatus. The audio recognition method includes the following steps: acquiring audio data stored in an audio buffer, where a terminal, upon receiving audio data, stores the received audio data into the audio buffer; acquiring first feature information of the audio data and second feature information of each character in the current display interface of the terminal; determining, among the characters of the current display interface, the character whose second feature information matches the first feature information; and taking the determined character as the character that matches the audio data. The above audio recognition method and apparatus solve, on a smart TV, the problem of PCM code-stream data loss during voice transmission, and improve voice-recognition accuracy.

Description

Audio recognition method and apparatus
Technical Field
The present invention relates to the field of smart television technologies, and in particular to an audio recognition method and apparatus.
Background
With the rapid development of smart-TV voice recognition, users at home have gone from hunting everywhere for the remote control to simply telling the TV, without any remote, which films, TV series, or variety shows they want to watch. Although this is convenient, PCM (Pulse Code Modulation) code-stream data may be lost during the voice transmission stage of speech recognition, resulting in poor recognition accuracy. For example, Xiaomin says "I want to watch Andy Lau (刘德华) movies" to the TV through a phone on the wireless network. As the phone's microphone transmits the PCM code stream to the TV side, radiation interference on the wireless network, inter-chip bus interference on the integrated circuits, and other factors cause part of the transmitted PCM code stream to be lost. The code stream missing the character "影" (0X1100, 0X1000, lost data) is uploaded to the cloud, which recognizes the string "想看刘德华电" ("want to watch Andy Lau mov-"), and the TV displays "刘德华电". This confuses Xiaomin, because Andy Lau has movies, TV series, and so on.
Summary
The main object of the present invention is to provide an audio recognition method and apparatus, aiming to solve the problem that PCM code-stream data is lost during voice transmission, which causes poor voice-recognition accuracy.
To achieve the above object, the present invention provides an audio recognition method, including:
acquiring audio data stored in an audio buffer, where a terminal, upon receiving audio data, stores the received audio data into the audio buffer;
acquiring first feature information of the audio data and second feature information of each character in the current display interface of the terminal;
determining, among the characters of the current display interface, the character whose second feature information matches the first feature information; and
taking the determined character as the character that matches the audio data;
where the step of acquiring the first feature information of the audio data includes:
determining a maximum code value and a minimum code value in the code stream of the audio data;
acquiring the code values between the moment at which the maximum code value occurs in the code stream of the audio data and the moment at which the minimum code value occurs;
calculating the first feature information of the audio data from the acquired code values;
and the step of acquiring the second feature information of each character in the current display interface of the terminal includes:
acquiring the proportion with which each character in the current display interface of the terminal repeatedly appears in a preset character file;
judging whether the repetition ratio of each character is greater than a first preset ratio; and
if so, acquiring the second feature information corresponding to each character from the cloud in order of each character's repetition ratio.
In addition, to achieve the above objective, the present invention further provides an audio recognition method, including the following steps:
acquiring audio data stored in an audio buffer, wherein, upon receiving audio data, a terminal stores the received audio data in the audio buffer;
acquiring first feature information of the audio data and second feature information of each character in the terminal's current display interface;
determining, among the characters in the current display interface, the character whose second feature information matches the first feature information; and
taking the determined character as the character matching the audio data.
In addition, to achieve the above objective, the present invention further provides an audio recognition device, including:
a first acquisition module, configured to acquire audio data stored in an audio buffer, wherein, upon receiving audio data, a terminal stores the received audio data in the audio buffer;
a second acquisition module, configured to acquire first feature information of the audio data and second feature information of each character in the terminal's current display interface;
a determination module, configured to determine, among the characters in the current display interface, the character whose second feature information matches the first feature information; and
a matching module, configured to take the determined character as the character matching the audio data.
The present invention acquires the feature information of the audio data and matches it against the feature information, fetched from the cloud, of each character in the terminal's current display interface. Even if part of the audio data is lost, the character corresponding to the audio data can still be identified by matching the two sets of feature information, which improves speech-recognition accuracy.
Brief Description of the Drawings
FIG. 1 is a flowchart of a first embodiment of the audio recognition method of the present invention;
FIG. 2 is a detailed flowchart of acquiring the first feature information of the audio data in FIG. 1;
FIG. 3 is a detailed flowchart of computing the first feature information of the audio data from the acquired code values in FIG. 2;
FIG. 4 is a detailed flowchart of acquiring the second feature information of each character in the terminal's current display interface in FIG. 1;
FIG. 5 is a flowchart of a second embodiment of the audio recognition method of the present invention;
FIG. 6 is a functional block diagram of a first embodiment of the audio recognition device of the present invention;
FIG. 7 is a detailed functional block diagram of the second acquisition module in FIG. 6;
FIG. 8 is a functional block diagram of a second embodiment of the audio recognition device of the present invention.
The realization of the objectives, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here merely illustrate the present invention and are not intended to limit it.
In view of the above problems, the present invention provides an audio recognition method.
Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the audio recognition method of the present invention.
In this embodiment, the audio recognition method includes:
Step S10: acquiring audio data stored in an audio buffer, wherein, upon receiving audio data, the terminal stores the received audio data in the audio buffer;
In this embodiment, the terminal may be a smart TV, which is used as the example below.
When the smart TV detects a voice PCM stream at its input, it stores the received PCM stream (the audio data) in the audio buffer, then checks, in real time or periodically, whether the buffer contains audio data; once it does, it reads the audio data from the buffer. For example, the first storage unit of the buffer may hold 0x50, i.e. APCM_Data(x1) = 0x50, and the second storage unit 0x40, i.e. APCM_Data(x2) = 0x40.
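The store-then-poll behaviour of step S10 can be sketched as a simple FIFO of PCM code values. This is a minimal illustration only; the class and method names are hypothetical and not taken from the patent:

```python
from collections import deque

class AudioBuffer:
    """Minimal audio buffer: incoming PCM code values are appended as
    they arrive, and the recognizer drains whatever has accumulated."""

    def __init__(self):
        self._buf = deque()

    def store(self, pcm_value):
        # Called when the terminal receives audio data (step S10 premise).
        self._buf.append(pcm_value)

    def has_data(self):
        # Checked in real time or periodically by the recognizer.
        return len(self._buf) > 0

    def fetch_all(self):
        # Read and clear the buffered audio data.
        data = list(self._buf)
        self._buf.clear()
        return data

buf = AudioBuffer()
buf.store(0x50)   # APCM_Data(x1) = 0x50
buf.store(0x40)   # APCM_Data(x2) = 0x40
print(buf.fetch_all())  # [80, 64]
```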
Step S20: acquiring first feature information of the audio data and second feature information of each character in the terminal's current display interface;
After reading the audio data from the buffer, the smart TV sums and shifts the audio data to obtain its first feature information; for example, after summing and shifting, the first feature information may be 0x0A00. In this embodiment, the audio data is the voice PCM stream corresponding to a smart-TV control command, including the PCM streams of person names, certain specific nouns, and so on. The first feature information is checksum data of the audio data and is unique: each audio data item's checksum uniquely represents that item.
After obtaining the first feature information of the audio data, the TV acquires the second feature information of each character in its current display interface. In this embodiment, the characters are those present in the current display interface and stored in specific character files, for example:
Resource_String(x1)="影视"
Resource_String(x2)="电影"
Video.xml: <string name="app_name">影视</string>
After the characters are obtained, their second feature information is fetched from the cloud; for example, the second feature information of the characters may be 0x0B00, 0x0A00, 0x0C00, and so on.
Step S30: determining, among the characters in the current display interface, the character whose second feature information matches the first feature information;
After the first feature information of the audio data and the second feature information of each character are obtained, the first feature information is matched against each character's second feature information in turn, until the character whose second feature information matches successfully is found.
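The matching in step S30 amounts to a linear scan over the characters' second feature information. A minimal sketch, assuming features are compared by equality (the function name and data layout are illustrative assumptions):

```python
def match_character(first_feature, char_features):
    """Return the first character whose second feature information equals
    the audio data's first feature information, or None if nothing matches.

    char_features: list of (character, second_feature) pairs."""
    for ch, feat in char_features:
        if feat == first_feature:
            return ch
    return None

# Second feature information fetched from the cloud for each on-screen character.
candidates = [("影视", 0x0B00), ("电影", 0x0A00), ("连续剧", 0x0C00)]
print(match_character(0x0A00, candidates))  # 电影
```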
Step S40: taking the determined character as the character matching the audio data.
Once the character whose second feature information matches the first feature information is found, it is taken as the character matching the audio data; that is, the character is what the audio data means. The matched character is shown on the smart TV's screen so the user can confirm whether it is correct.
The present invention acquires the feature information of the audio data and matches it against the feature information, fetched from the cloud, of each character in the terminal's current display interface. Even if part of the audio data is lost, the character corresponding to the audio data can still be identified by matching the two sets of feature information, which improves speech-recognition accuracy.
Further, based on the first embodiment above, a second embodiment of the audio recognition method of the present invention is proposed. Referring to FIG. 2, the step of acquiring the first feature information of the audio data includes:
Step S21: determining the maximum code value and the minimum code value in the code stream of the audio data;
The smart TV reads the audio data from the buffer and sorts it by code value, in either descending or ascending order. Once sorted, the maximum and minimum code values in the stream are easy to obtain.
Step S22: acquiring the code values between the instant at which the maximum code value occurs and the instant at which the minimum code value occurs in the code stream;
After the maximum and minimum code values are determined, the instants at which they occur in the stream are located, and the code values between those two instants are extracted.
Step S23: computing the first feature information of the audio data from the acquired code values.
After the code values between the two instants are extracted, the first feature information is computed with the voice-PCM-stream feature algorithm, which sums and shifts the extracted code values.
Specifically, referring to FIG. 3, step S23 includes:
Step S231: summing the acquired code values to obtain a checksum of the audio data;
The smart TV sums the acquired code values to obtain the checksum of the audio data. For example, if the acquired code values consist of the largest PCM value 0x50, the second-largest 0x40, and the smallest 0x10, the checksum = 0x50 + 0x40 + 0x10 = 0xA0. Optionally, the checksum is represented as a 16-bit value, i.e. 0x00A0.
Step S232: left-shifting the checksum according to a preset left-shift algorithm to obtain the first feature information of the audio data.
The left-shift algorithm may shift the checksum by one position or by several. In this embodiment, the checksum is shifted left by one position (in the example, one hexadecimal digit), giving the feature information Personal_PCM_Data(x1) = |0x00A0 << 1| = 0x0A00, where 0x00A0 is the checksum. Every checksum obtained is processed with the left-shift algorithm, so that each audio data item's feature information uniquely represents that item.
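Steps S231 and S232 reduce to a 16-bit checksum followed by a left shift. A sketch under one stated assumption: reproducing the patent's example 0x00A0 → 0x0A00 requires a shift of one hexadecimal digit, i.e. four bits, so that is the default width used here; the function name is illustrative:

```python
def first_feature(code_values, shift_bits=4):
    """Compute the first feature information of an audio data segment:
    sum the code values between the max- and min-value instants into a
    16-bit checksum, then left-shift it.  shift_bits=4 (one hex digit)
    reproduces the example 0x00A0 -> 0x0A00; the exact shift width is an
    assumption, since the patent only says 'one position'."""
    checksum = sum(code_values) & 0xFFFF       # e.g. 0x50 + 0x40 + 0x10 = 0x00A0
    return (checksum << shift_bits) & 0xFFFF   # 0x00A0 << 4 = 0x0A00

print(hex(first_feature([0x50, 0x40, 0x10])))  # 0xa00
```

Note that a literal one-bit shift would yield 0x0140, not 0x0A00, which is why the four-bit reading of "one position" is assumed.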
This embodiment represents the audio data by the feature information of certain specific portions of it. Since each audio data item's feature information is unique, the feature information can stand in for that item, making the audio data simpler to represent and avoiding the large transmission-channel overhead of sending the raw audio data.
Further, based on the first or second embodiment above, a third embodiment of the audio recognition method of the present invention is proposed. Referring to FIG. 4, the step of acquiring the second feature information of each character in the terminal's current display interface includes:
Step S24: acquiring the repetition ratio of each character of the terminal's current display interface in a preset character file;
After the first feature information of the audio data is obtained, the type of the smart TV's current interface is read. The type includes a browser type and an Android-system type; the current interface may be only a browser interface, only an Android interface, or both at once.
Once the current interface type is determined, the repetition ratio of each character in the preset character file is acquired. The preset character files include the xml character file corresponding to the browser type and the xml character file in the Android control resource folder corresponding to the Android-system type. If the current interface includes the browser type, all characters in that type's preset character file are read and saved to one preset storage area; if it includes the Android-system type, all characters in that type's preset character file are read and saved to another preset storage area. The smart TV then reads the characters in both storage areas and determines each character's repetition ratio. For example, the ratio may be set to 100% for 10 or more occurrences, 50% for 5, 40% for 4, 10% for 1, and so on. After reading the characters from the two storage areas, each character's repetition ratio is determined according to this preset rule.
Step S25: determining whether the repetition ratio of each character is greater than a first preset ratio;
Step S26: if so, acquiring the second feature information corresponding to each character from the cloud in order of the characters' repetition ratios;
After the characters' repetition ratios are determined, each is compared with the first preset ratio, a preconfigured value; optionally it is set to 50%, though it may of course be set to other values. For characters whose repetition ratio exceeds the first preset ratio, the second feature information of the character with the highest ratio is fetched from the cloud server first and matched against the audio data's first feature information. If the match succeeds, the character is displayed; if it fails, the second feature information of the character with the next-highest ratio is fetched and matched, and so on until a match succeeds.
This embodiment determines each character's repetition ratio in the preset character files and, once the ratios are known, fetches and matches the most frequently occurring character's second feature information first, which shortens matching time and improves system efficiency.
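Steps S24 through S26 can be sketched as a ratio-ordered matching loop. A minimal illustration: `fetch_feature` stands in for the cloud lookup, the 10%-per-occurrence rule follows the example above, and all names are assumptions rather than the patent's own API:

```python
def repetition_ratio(count):
    # Example rule from the text: 10+ occurrences -> 100%, 5 -> 50%, 1 -> 10%.
    return min(count, 10) * 10

def match_by_ratio(first_feature, char_counts, fetch_feature, threshold=50):
    """Try characters whose repetition ratio exceeds the first preset
    ratio, most frequent first; stop at the first feature match."""
    ranked = sorted(char_counts.items(), key=lambda kv: kv[1], reverse=True)
    for ch, count in ranked:
        if repetition_ratio(count) <= threshold:
            continue
        if fetch_feature(ch) == first_feature:  # cloud lookup stand-in
            return ch
    return None

cloud = {"影视": 0x0B00, "电影": 0x0A00}
print(match_by_ratio(0x0A00, {"影视": 10, "电影": 6}, cloud.get))  # 电影
```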
Further, based on any of the embodiments above, a fourth embodiment of the audio recognition method of the present invention is proposed. Referring to FIG. 5, in this embodiment, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further includes:
Step S11: determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
Optionally, the second preset ratio is set to 5%; it may of course be set to other values as circumstances require. To determine the number of items corresponding to the second preset ratio, the total number of audio data items in the buffer is counted first, and the count is then computed from that total and the second preset ratio.
Step S12: determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
Step S13: if so, performing the step of acquiring the audio data stored in the audio buffer.
Once both counts are known, they are compared. If the number of items with code value greater than 1 exceeds the second-preset-ratio count, the audio data in the buffer is acquired; if it does not, the buffer is deemed to contain no audio data.
This embodiment decides whether the buffer contains audio data by comparing the number of items with code value greater than 1 against the second-preset-ratio count. Since the presence of audio data is related to how many items exceed 1, this comparison determines more accurately whether the buffer holds real audio data, filtering out some noise in advance.
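The pre-check of steps S11 through S13 is essentially a noise gate on the buffer. A minimal sketch, assuming the 5% default and that "code value greater than 1" is tested per buffered sample (the function name is illustrative):

```python
def buffer_has_speech(samples, ratio=0.05):
    """Treat the buffer as containing real audio only when the number of
    samples with code value > 1 exceeds `ratio` (the second preset ratio)
    of the total, so low-level noise never reaches recognition."""
    if not samples:
        return False
    loud = sum(1 for s in samples if s > 1)
    return loud > len(samples) * ratio

print(buffer_has_speech([0, 1, 0x50, 0x40, 0x10, 0, 1, 0, 1, 0]))  # True
print(buffer_has_speech([0, 1, 0, 1] * 25))                        # False
```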
The audio recognition methods of the above embodiments may all be executed by a terminal. Furthermore, the audio recognition method may be implemented by a client control program installed on the terminal, which may be a smart TV.
The present invention further provides a smart-TV-based audio recognition device.
Referring to FIG. 6, FIG. 6 is a functional block diagram of a first embodiment of the audio recognition device of the present invention.
In this embodiment, the audio recognition device includes a first acquisition module 10, a second acquisition module 20, a determination module 30, and a matching module 40.
The first acquisition module 10 is configured to acquire audio data stored in an audio buffer, wherein, upon receiving audio data, the terminal stores the received audio data in the audio buffer;
In this embodiment, the terminal may be a smart TV, which is used as the example below. When the smart TV detects a voice PCM stream at its input, it stores the received PCM stream (the audio data) in the audio buffer, then checks, in real time or periodically, whether the buffer contains audio data; once it does, it reads the audio data from the buffer. For example, the first storage unit of the buffer may hold 0x50, i.e. APCM_Data(x1) = 0x50, and the second storage unit 0x40, i.e. APCM_Data(x2) = 0x40.
The second acquisition module 20 is configured to acquire first feature information of the audio data and second feature information of each character in the terminal's current display interface;
After reading the audio data from the buffer, the smart TV sums and shifts the audio data to obtain its first feature information; for example, after summing and shifting, the first feature information may be 0x0A00. In this embodiment, the audio data is the voice PCM stream corresponding to a smart-TV control command, including the PCM streams of person names, certain specific nouns, and so on. The first feature information is checksum data of the audio data and is unique: each audio data item's checksum uniquely represents that item.
After obtaining the first feature information of the audio data, the second feature information of each character in the terminal's current display interface is acquired. In this embodiment, the characters are those present in the current display interface and stored in specific character files, for example:
Resource_String(x1)="影视"
Resource_String(x2)="电影"
Video.xml: <string name="app_name">影视</string>
After the characters are obtained, their second feature information is fetched from the cloud; for example, the second feature information of the characters may be 0x0B00, 0x0A00, 0x0C00, and so on.
The determination module 30 is configured to determine, among the characters in the current display interface, the character whose second feature information matches the first feature information;
After the first feature information of the audio data and the second feature information of each character are obtained, the first feature information is matched against each character's second feature information in turn, until the character whose second feature information matches successfully is found.
The matching module 40 is configured to take the determined character as the character matching the audio data.
Once the character whose second feature information matches the first feature information is found, it is taken as the character matching the audio data; that is, the character is what the audio data means. The matched character is shown on the smart TV's screen so the user can confirm whether it is correct.
The present invention acquires the feature information of the audio data and matches it against the feature information, fetched from the cloud, of each character in the terminal's current display interface. Even if part of the audio data is lost, the character corresponding to the audio data can still be identified by matching the two sets of feature information, which improves speech-recognition accuracy.
Further, based on the first embodiment above, a second embodiment of the audio recognition device of the present invention is proposed. Referring to FIG. 7, the second acquisition module 20 includes a determination unit 21, an acquisition unit 22, a computation unit 23, and a judgment unit 24.
The determination unit 21 is configured to determine the maximum code value and the minimum code value in the code stream of the audio data;
The smart TV reads the audio data from the buffer and sorts it by code value, in either descending or ascending order. Once sorted, the maximum and minimum code values in the stream are easy to obtain: the maximum code value is the code value of the audio data at the largest peak of the PCM stream, and the minimum code value is the code value at its smallest peak.
The acquisition unit 22 is configured to acquire the code values between the instant at which the maximum code value occurs and the instant at which the minimum code value occurs in the code stream;
After the maximum and minimum code values are determined, the instants at which they occur in the stream are located, and the code values between those two instants are extracted.
Further, the acquisition unit 22 is also configured to acquire the repetition ratio of each character of the terminal's current display interface in a preset character file;
After the first feature information of the audio data is obtained, the type of the smart TV's current interface is read. The type includes a browser type and an Android-system type; the current interface may be only a browser interface, only an Android interface, or both at once.
Once the current interface type is determined, the repetition ratio of each character in the preset character file is acquired. The preset character files include the xml character file corresponding to the browser type and the xml character file in the Android control resource folder corresponding to the Android-system type. If the current interface includes the browser type, all characters in that type's preset character file are read and saved to one preset storage area; if it includes the Android-system type, all characters in that type's preset character file are read and saved to another preset storage area. The smart TV then reads the characters in both storage areas and determines each character's repetition ratio. In this embodiment, the ratio is set to 100% for 10 or more occurrences, 50% for 5, 40% for 4, 10% for 1, and so on. After reading the characters from the two storage areas, each character's repetition ratio is determined according to this preset rule.
The computation unit 23 is configured to compute the first feature information of the audio data from the acquired code values.
After the code values between the two instants are extracted, the first feature information is computed with the voice-PCM-stream feature algorithm, which sums and shifts the extracted code values.
Further, the computation unit 23 includes a summing subunit 231 and a shifting subunit 232.
The summing subunit 231 is configured to sum the acquired code values to obtain a checksum of the audio data;
The smart TV sums the acquired code values, i.e. the code values of the extracted audio segment, to obtain the checksum. For example, if the acquired code values consist of the largest PCM value 0x50, the second-largest 0x40, and the smallest 0x10, the checksum = 0x50 + 0x40 + 0x10 = 0xA0. Optionally, the checksum is represented as a 16-bit value, i.e. 0x00A0.
The shifting subunit 232 is configured to left-shift the checksum according to a preset left-shift algorithm to obtain the first feature information of the audio data.
The left-shift algorithm may shift the checksum by one position or by several. In this embodiment, the checksum is shifted left by one position (in the example, one hexadecimal digit), giving the feature information Personal_PCM_Data(x1) = |0x00A0 << 1| = 0x0A00, where 0x00A0 is the checksum. Every checksum obtained is processed with the left-shift algorithm, so that each audio data item's feature information uniquely represents that item.
The judgment unit 24 is configured to determine whether the repetition ratio of each character is greater than a first preset ratio;
The acquisition unit 22 is also configured to, if a character's repetition ratio is greater than the first preset ratio, acquire the second feature information corresponding to each character from the cloud in order of the characters' repetition ratios.
After the characters' repetition ratios are determined, each is compared with the first preset ratio, a preconfigured value; optionally it is set to 50%, though it may of course be set to other values. For characters whose repetition ratio exceeds the first preset ratio, the second feature information of the character with the highest ratio is fetched from the cloud server first and matched against the audio data's first feature information. If the match succeeds, the character is displayed; if it fails, the second feature information of the character with the next-highest ratio is fetched and matched, and so on until a match succeeds.
This embodiment represents the audio data by the feature information of certain specific portions of it. Since each audio data item's feature information is unique, the feature information can stand in for that item, making the audio data simpler to represent and avoiding the large transmission-channel overhead of sending the raw audio data.
Further, based on the first or second embodiment above, a third embodiment of the audio recognition device of the present invention is proposed. Referring to FIG. 8, the audio recognition device further includes a judgment module 50.
The determination module 30 is configured to determine the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
Before the audio data stored in the audio buffer is acquired, the number of items with code value greater than 1 and the number corresponding to the second preset ratio are determined first. Optionally, the second preset ratio is set to 5%; it may of course be set to other values as circumstances require. To determine the number of items corresponding to the second preset ratio, the total number of audio data items in the buffer is counted first, and the count is then computed from that total and the second preset ratio.
The judgment module 50 is configured to determine whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
The first acquisition module 10 is also configured to acquire the audio data stored in the audio buffer if the number of audio data items with code value greater than 1 exceeds the number corresponding to the second preset ratio.
Once both counts are known, they are compared. If the number of items with code value greater than 1 exceeds the second-preset-ratio count, the audio data in the buffer is acquired; if it does not, the buffer is deemed to contain no audio data.
This embodiment decides whether the buffer contains audio data by comparing the number of items with code value greater than 1 against the second-preset-ratio count. Since the presence of audio data is related to how many items exceed 1, this comparison determines more accurately whether the buffer holds real audio data, filtering out some noise in advance.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing over the prior art, can be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a cell phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present invention.
The above are merely preferred embodiments of the present invention and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (20)

  1. An audio recognition method, characterized in that the audio recognition method comprises the following steps:
    acquiring audio data stored in an audio buffer, wherein, upon receiving audio data, a terminal stores the received audio data in the audio buffer;
    acquiring first feature information of the audio data and second feature information of each character in the terminal's current display interface;
    determining, among the characters in the current display interface, the character whose second feature information matches the first feature information; and
    taking the determined character as the character matching the audio data;
    wherein the step of acquiring the first feature information of the audio data comprises:
    determining the maximum code value and the minimum code value in the code stream of the audio data;
    acquiring the code values between the instant at which the maximum code value occurs and the instant at which the minimum code value occurs in the code stream;
    computing the first feature information of the audio data from the acquired code values;
    and the step of acquiring the second feature information of each character in the terminal's current display interface comprises:
    acquiring the repetition ratio of each character of the terminal's current display interface in a preset character file;
    determining whether the repetition ratio of each character is greater than a first preset ratio;
    if so, acquiring the second feature information corresponding to each character from the cloud in order of the characters' repetition ratios.
  2. The audio recognition method according to claim 1, characterized in that the step of computing the first feature information of the audio data from the acquired code values comprises:
    summing the acquired code values to obtain a checksum of the audio data;
    left-shifting the checksum according to a preset left-shift algorithm to obtain the first feature information of the audio data.
  3. The audio recognition method according to claim 2, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  4. The audio recognition method according to claim 1, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  5. An audio recognition method, characterized in that the audio recognition method comprises the following steps:
    acquiring audio data stored in an audio buffer, wherein, upon receiving audio data, a terminal stores the received audio data in the audio buffer;
    acquiring first feature information of the audio data and second feature information of each character in the terminal's current display interface;
    determining, among the characters in the current display interface, the character whose second feature information matches the first feature information; and
    taking the determined character as the character matching the audio data.
  6. The audio recognition method according to claim 5, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  7. The audio recognition method according to claim 5, characterized in that the step of acquiring the first feature information of the audio data comprises:
    determining the maximum code value and the minimum code value in the code stream of the audio data;
    acquiring the code values between the instant at which the maximum code value occurs and the instant at which the minimum code value occurs in the code stream;
    computing the first feature information of the audio data from the acquired code values.
  8. The audio recognition method according to claim 7, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  9. The audio recognition method according to claim 7, characterized in that the step of computing the first feature information of the audio data from the acquired code values comprises:
    summing the acquired code values to obtain a checksum of the audio data;
    left-shifting the checksum according to a preset left-shift algorithm to obtain the first feature information of the audio data.
  10. The audio recognition method according to claim 9, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  11. The audio recognition method according to claim 5, characterized in that the step of acquiring the second feature information of each character in the terminal's current display interface comprises:
    acquiring the repetition ratio of each character of the terminal's current display interface in a preset character file;
    determining whether the repetition ratio of each character is greater than a first preset ratio;
    if so, acquiring the second feature information corresponding to each character from the cloud in order of the characters' repetition ratios.
  12. The audio recognition method according to claim 11, characterized in that, before the step of acquiring the audio data stored in the audio buffer, the audio recognition method further comprises:
    determining the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    determining whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio;
    if so, performing the step of acquiring the audio data stored in the audio buffer.
  13. An audio recognition device, characterized in that the audio recognition device comprises:
    a first acquisition module, configured to acquire audio data stored in an audio buffer, wherein, upon receiving audio data, a terminal stores the received audio data in the audio buffer;
    a second acquisition module, configured to acquire first feature information of the audio data and second feature information of each character in the terminal's current display interface;
    a determination module, configured to determine, among the characters in the current display interface, the character whose second feature information matches the first feature information; and
    a matching module, configured to take the determined character as the character matching the audio data.
  14. The audio recognition device according to claim 13, characterized in that:
    the determination module is further configured to determine the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    the audio recognition device further comprises a judgment module, configured to determine whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio; and
    the first acquisition module is further configured to acquire the audio data stored in the audio buffer if the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio.
  15. The audio recognition device according to claim 13, characterized in that the second acquisition module comprises:
    a determination unit, configured to determine the maximum code value and the minimum code value in the code stream of the audio data;
    an acquisition unit, configured to acquire the code values between the instant at which the maximum code value occurs and the instant at which the minimum code value occurs in the code stream;
    a computation unit, configured to compute the first feature information of the audio data from the acquired code values.
  16. The audio recognition device according to claim 15, characterized in that:
    the determination module is further configured to determine the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    the audio recognition device further comprises a judgment module, configured to determine whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio; and
    the first acquisition module is further configured to acquire the audio data stored in the audio buffer if the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio.
  17. The audio recognition device according to claim 15, characterized in that the computation unit comprises:
    a summing subunit, configured to sum the acquired code values to obtain a checksum of the audio data;
    a processing subunit, configured to left-shift the checksum according to a preset left-shift algorithm to obtain the first feature information of the audio data.
  18. The audio recognition device according to claim 17, characterized in that:
    the determination module is further configured to determine the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    the audio recognition device further comprises a judgment module, configured to determine whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio; and
    the first acquisition module is further configured to acquire the audio data stored in the audio buffer if the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio.
  19. The audio recognition device according to claim 13, characterized in that the second acquisition module further comprises:
    an acquisition unit, further configured to acquire the repetition ratio of each character of the terminal's current display interface in a preset character file;
    a judgment unit, configured to determine whether the repetition ratio of each character is greater than a first preset ratio;
    the acquisition unit being further configured to, if the repetition ratio of each character is greater than the first preset ratio, acquire the second feature information corresponding to each character from the cloud in order of the characters' repetition ratios.
  20. The audio recognition device according to claim 19, characterized in that:
    the determination module is further configured to determine the number of audio data items stored in the audio buffer whose code value is greater than 1, and the number of audio data items corresponding to a second preset ratio of the audio data;
    the audio recognition device further comprises a judgment module, configured to determine whether the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio; and
    the first acquisition module is further configured to acquire the audio data stored in the audio buffer if the number of audio data items with code value greater than 1 exceeds the number of audio data items corresponding to the second preset ratio.
PCT/CN2016/084617 2015-11-12 2016-06-03 Audio recognition method and device WO2017080195A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510772801.4A CN105469783B (zh) 2015-11-12 2015-11-12 Audio recognition method and device
CN201510772801.4 2015-11-12

Publications (1)

Publication Number Publication Date
WO2017080195A1 true WO2017080195A1 (zh) 2017-05-18

Family

ID=55607413

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/084617 WO2017080195A1 (zh) 2015-11-12 2016-06-03 音频识别方法及装置

Country Status (2)

Country Link
CN (1) CN105469783B (zh)
WO (1) WO2017080195A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112397051A (zh) * 2019-08-16 2021-02-23 武汉Tcl集团工业研究院有限公司 语音识别方法、装置及终端设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469783B (zh) * 2015-11-12 2019-06-21 深圳Tcl数字技术有限公司 音频识别方法及装置
CN105847900B (zh) * 2016-05-26 2018-10-26 无锡天脉聚源传媒科技有限公司 一种节目频道确定方法及装置
CN106648532A (zh) * 2016-12-22 2017-05-10 惠州Tcl移动通信有限公司 一种实现自动搜索的方法、系统及其移动终端
CN115022108A (zh) * 2022-06-16 2022-09-06 深圳市欢太科技有限公司 会议接入方法、装置、存储介质及电子设备

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006043988A1 (en) * 2004-10-20 2006-04-27 Oracle International Corporation Computer-implemented methods and systems for entering and searching for non-roman-alphabet characters and related search systems
CN103618953A (zh) * 2013-08-15 2014-03-05 北京中视广信科技有限公司 基于音频特征的广播电视节目标识与识别的方法及系统
CN103634613A (zh) * 2013-08-15 2014-03-12 北京中视广信科技有限公司 移动终端与广播电视频道自动同步的方法及系统
CN104036773A (zh) * 2014-05-22 2014-09-10 立德高科(北京)数码科技有限责任公司 将录入的文本内容通过防伪辨别装置以播放的方法及系统
CN104423552A (zh) * 2013-09-03 2015-03-18 联想(北京)有限公司 一种处理信息的方法和电子设备
US20150255059A1 (en) * 2014-03-05 2015-09-10 Casio Computer Co., Ltd. Voice search device, voice search method, and non-transitory recording medium
CN105469783A (zh) * 2015-11-12 2016-04-06 深圳Tcl数字技术有限公司 音频识别方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102177726B (zh) * 2008-08-21 2014-12-03 杜比实验室特许公司 用于音频和视频签名生成和检测的特征优化和可靠性估计
KR101775532B1 (ko) * 2011-01-17 2017-09-06 엘지전자 주식회사 서로 다른 적어도 2개 이상의 데이터베이스를 이용하여 음성 인식 서비스를 제공하는 멀티미디어 디바이스 및 그 제어 방법
CN103686055B (zh) * 2012-09-24 2017-05-10 中兴通讯股份有限公司 电视会议系统中丢包补偿的处理方法及装置
CN104796729B (zh) * 2015-04-09 2018-04-17 宁波创视信息技术有限公司 高清晰实时获取电视播放画面的方法
CN104917671B (zh) * 2015-06-10 2017-11-21 腾讯科技(深圳)有限公司 基于移动终端的音频处理方法和装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006043988A1 (en) * 2004-10-20 2006-04-27 Oracle International Corporation Computer-implemented methods and systems for entering and searching for non-roman-alphabet characters and related search systems
CN103618953A (zh) * 2013-08-15 2014-03-05 北京中视广信科技有限公司 基于音频特征的广播电视节目标识与识别的方法及系统
CN103634613A (zh) * 2013-08-15 2014-03-12 北京中视广信科技有限公司 移动终端与广播电视频道自动同步的方法及系统
CN104423552A (zh) * 2013-09-03 2015-03-18 联想(北京)有限公司 一种处理信息的方法和电子设备
US20150255059A1 (en) * 2014-03-05 2015-09-10 Casio Computer Co., Ltd. Voice search device, voice search method, and non-transitory recording medium
CN104036773A (zh) * 2014-05-22 2014-09-10 立德高科(北京)数码科技有限责任公司 将录入的文本内容通过防伪辨别装置以播放的方法及系统
CN105469783A (zh) * 2015-11-12 2016-04-06 深圳Tcl数字技术有限公司 音频识别方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112397051A (zh) * 2019-08-16 2021-02-23 武汉Tcl集团工业研究院有限公司 语音识别方法、装置及终端设备
CN112397051B (zh) * 2019-08-16 2024-02-02 武汉Tcl集团工业研究院有限公司 语音识别方法、装置及终端设备

Also Published As

Publication number Publication date
CN105469783B (zh) 2019-06-21
CN105469783A (zh) 2016-04-06

Similar Documents

Publication Publication Date Title
WO2017080195A1 (zh) 音频识别方法及装置
WO2017143692A1 (zh) 智能电视及其语音控制方法
WO2017054592A1 (zh) 一种界面显示的方法及终端
WO2019061612A1 (zh) 贷款产品推广方法、装置及计算机可读存储介质
WO2019061613A1 (zh) 贷款资质筛选方法、装置及计算机可读存储介质
WO2018006489A1 (zh) 终端的语音交互方法及装置
WO2018223607A1 (zh) 电视终端及hdr图像转为sdr的方法和计算机可读存储介质
WO2019051902A1 (zh) 终端控制方法、空调器及计算机可读存储介质
WO2019196213A1 (zh) 接口测试方法、装置、设备及计算机可读存储介质
WO2018120457A1 (zh) 数据处理方法、装置、设备及计算机可读存储介质
WO2019041851A1 (zh) 家电售后咨询方法、电子设备和计算机可读存储介质
WO2018120429A1 (zh) 一种资源更新的方法、终端、计算机可读存储介质及资源更新设备
WO2016032021A1 (ko) 음성 명령 인식을 위한 장치 및 방법
WO2017088427A1 (zh) 音频输出控制方法及装置
WO2015139594A1 (en) Security verification method, apparatus, and system
WO2016000560A1 (en) File transmission method, file transmission apparatus, and file transmission system
WO2016127458A1 (zh) 改进的基于语义词典的词语相似度计算方法和装置
WO2018233221A1 (zh) 多窗口声音输出方法、电视机以及计算机可读存储介质
WO2018032680A1 (zh) 音视频播放方法及系统
WO2018223602A1 (zh) 显示终端、画面对比度提高方法及计算机可读存储介质
WO2018188196A1 (zh) 一种数据版本控制方法、数据版本控制器、设备及计算机可读存储介质
WO2017045435A1 (zh) 控制电视播放方法和装置
WO2017036208A1 (zh) 显示界面中的信息提取方法及系统
WO2019085543A1 (zh) 电视机系统及电视机控制方法
WO2019000466A1 (zh) 人脸识别方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16863376

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20/08/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16863376

Country of ref document: EP

Kind code of ref document: A1