WO2018108080A1 - Information recommendation method and apparatus based on voiceprint search - Google Patents

Information recommendation method and apparatus based on voiceprint search

Info

Publication number
WO2018108080A1
WO2018108080A1 (PCT/CN2017/115707)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
information
user
search
target keyword
Prior art date
Application number
PCT/CN2017/115707
Other languages
English (en)
French (fr)
Inventor
何坚强
Original Assignee
北京奇虎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司
Publication of WO2018108080A1 publication Critical patent/WO2018108080A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Definitions

  • The present invention relates to the field of speech recognition technology, and more particularly to an information recommendation method and apparatus based on voiceprint search.
  • Speech technology enables a computer to use appropriate algorithms to automatically extract the information humans need from speech.
  • Research on speech technology began in the 1950s and now spans some 60 years. With the rapid development of information technology, speech technology has become increasingly important, and its application prospects ever broader.
  • In view of this, the present invention proposes an information recommendation method and apparatus based on voiceprint search, which can quickly and accurately recommend program content to the user according to the user's voice.
  • In a first aspect, an embodiment of the present invention provides an information recommendation method based on voiceprint search, comprising: preprocessing input voice information to obtain voiceprint data to be recognized; extracting prosodic features from the voiceprint data; searching a voiceprint model library according to the prosodic features to identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals; and searching for title information containing the target keyword and recommending it to the user according to preset rules.
  • In a second aspect, an embodiment of the present invention provides an information recommendation apparatus based on voiceprint search.
  • The apparatus includes: at least one processor; and at least one memory communicatively connected to the at least one processor, the at least one memory containing processor-executable instructions that, when executed by the at least one processor, cause the apparatus to perform at least the following operations: preprocessing input voice information to obtain voiceprint data to be recognized; extracting prosodic features from the voiceprint data; searching a voiceprint model library according to the prosodic features to identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals; and searching for title information containing the target keyword and recommending it to the user according to preset rules.
  • In a third aspect, an embodiment of the present invention provides a computer program comprising computer-readable code which, when run, causes the method of the first aspect to be performed.
  • In a fourth aspect, an embodiment of the present invention provides a computer-readable medium storing the computer program of the third aspect.
  • Compared with the prior art, the solution provided by the present invention first preprocesses the input voice information to obtain the voiceprint data to be recognized.
  • For example, the user presses a voice key while inputting voice information, and after preprocessing such as denoising, the system obtains the voiceprint data to be recognized.
  • Then, prosodic features are extracted from the voiceprint data.
  • Note that voiceprint features include acoustic features, prosodic features, lexical features, and the like.
  • Prosodic features, also called suprasegmental features, refer to variations of pitch, duration, and intensity in speech beyond voice-quality features.
  • Prosody is a typical characteristic of natural human language, with many cross-language commonalities such as declining pitch, stress, and pauses.
  • Since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast.
  • To keep the analysis accurate, the preset voiceprint model library of the present invention contains a keyword lexicon indexed by finals. The voiceprint model library is searched according to the prosodic features to identify the target keyword, so that voice information is converted into text quickly and accurately.
  • Finally, title information containing the target keyword is searched for and recommended to the user according to preset rules.
  • The title information includes the names and summaries of books, news items, articles, and the like.
  • FIG. 1 is a flowchart of the information recommendation method based on voiceprint search according to the present invention.
  • FIG. 2 is a flowchart of an embodiment of the information recommendation method based on voiceprint search according to the present invention.
  • FIG. 3 is a schematic diagram of the information recommendation apparatus based on voiceprint search according to the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of the information recommendation apparatus based on voiceprint search according to the present invention.
  • FIG. 5 is a block diagram of a smartphone-based terminal for performing the method according to the present invention.
  • FIG. 6 is a schematic diagram of a storage unit for holding or carrying program code implementing the method according to the present invention.
  • FIG. 1 is a flowchart of the information recommendation method based on voiceprint search according to the present invention, including:
  • S101: preprocessing input voice information to obtain voiceprint data to be recognized;
  • S102: extracting prosodic features from the voiceprint data;
  • S103: searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals;
  • S104: searching for title information containing the target keyword and recommending it to the user according to preset rules.
  • Compared with the prior art, the solution provided by the present invention first preprocesses the input voice information to obtain the voiceprint data to be recognized.
  • For example, the user presses a voice key while inputting voice information, and after preprocessing such as denoising, the system obtains the voiceprint data to be recognized.
  • Then, prosodic features are extracted from the voiceprint data.
  • Note that voiceprint features include acoustic features, prosodic features, lexical features, and the like.
  • Prosodic features, also called suprasegmental features, refer to variations of pitch, duration, and intensity in speech beyond voice-quality features.
  • Prosody is a typical characteristic of natural human language, with many cross-language commonalities such as declining pitch, stress, and pauses.
  • Since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast.
  • To keep the analysis accurate, the preset voiceprint model library of the present invention contains a keyword lexicon indexed by finals. The voiceprint model library is searched according to the prosodic features to identify the target keyword, so that voice information is converted into text quickly and accurately.
  • Finally, title information containing the target keyword is searched for and recommended to the user according to preset rules.
  • The title information includes the names and summaries of books, news items, articles, and the like.
  • FIG. 2 is a flowchart of an embodiment of the information recommendation method based on voiceprint search according to the present invention. Compared with FIG. 1, the embodiment of FIG. 2 further includes logging in to an account by voiceprint and, further, using the account profile to recommend to the user title information matching the user's individual needs.
  • S201: preprocessing input voice information to obtain voiceprint data to be recognized;
  • S202: determining whether the current voiceprint data matches voiceprint data pre-stored for a user account, and if so, logging in to the user account;
  • S203: extracting prosodic features from the voiceprint data;
  • S204: searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals;
  • S205: displaying at least two target keywords for the user to select, and determining the target keyword to be searched according to the user's selection;
  • S206: determining the age of the speaker of the voice information and marking the speaker as an adult or a child; searching for title information containing the target keyword in the corresponding adult or child information zone;
  • S207: determining the gender of the speaker of the voice information and marking the speaker as male or female; searching for title information containing the target keyword in the information zone for that gender;
  • S208: displaying the retrieved title information to the user in order of time or page views.
  • The terminal implementing the present invention is not limited to wearable devices, mobile phones, iPads, personal computers, or other smart terminals with a microphone/sound receiver.
  • This embodiment is further explained with a child using a smartphone terminal to practice the invention. Suppose the child presses the voice key while speaking the voice message "灰太狼" (Grey Wolf) into the smartphone's microphone. The invention preprocesses the input voice information to obtain the voiceprint data to be recognized.
  • Preferably, the step of preprocessing the input voice information to obtain the voiceprint data to be recognized includes:
  • sampling the voice stream of the voice information in mono; dividing the voice stream into frames of 256 sampling points each, with 128 sampling points as the overlap between adjacent frames; computing the cumulative energy of each frame of voice data, and, if the cumulative energy of a run of consecutive voice frames exceeds a preset silence threshold, adopting that run of consecutive voice frames as the voiceprint data to be recognized.
  • Preprocessing comprises two parts: denoising and endpoint detection.
  • Denoising quantizes and samples the voice information input through the microphone to obtain a digitized voice stream; the noisy stream is then denoised to obtain a clean voice stream, and pre-emphasis filters out low-frequency interference, especially 50 Hz or 60 Hz mains-frequency interference, while boosting the high-frequency part of the voice stream; it can also remove DC drift, suppress random noise, and raise the energy of unvoiced segments.
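The pre-emphasis step described above is commonly implemented as a first-order high-pass filter. A minimal sketch, assuming the conventional coefficient 0.97 (the patent does not specify one; removing 50/60 Hz mains hum would additionally require a notch filter, which is omitted here):

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1].

    Attenuates low frequencies and boosts the high-frequency part of the
    speech stream, as the text describes. The coefficient 0.97 is a
    conventional default, not a value given in the patent.
    """
    # Keep the first sample unchanged, then apply the difference equation.
    return np.append(signal[:1], signal[1:] - alpha * signal[:-1])
```

For a constant (DC-like) input the filter output drops to a small residual, which is exactly the low-frequency suppression the text asks for.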
  • Specifically, the input voice information is sampled in mono at 8 bits and 16 kHz.
  • A run of consecutive voice frames whose cumulative energy exceeds the silence threshold is the voiceprint data to be recognized. All voice frames usable for training are retained.
  • For endpoint detection, the system uses the short-time energy and the short-time zero-crossing rate of the voice information.
  • The voice information is sampled at 8 kHz, with 20 ms of data per frame, i.e., 160 sampling points.
  • The short-time energy and the short-time zero-crossing rate are computed every 20 ms.
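The per-frame quantities above can be sketched directly: 160-sample (20 ms at 8 kHz) frames, each scored by short-time energy and zero-crossing rate. The thresholds and the speech/non-speech decision rule are illustrative assumptions; the patent only names the two measures:

```python
import numpy as np

FRAME_LEN = 160  # 20 ms at 8 kHz, as stated in the text

def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared sample values over one frame."""
    return float(np.sum(frame.astype(np.float64) ** 2))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def detect_speech_frames(samples: np.ndarray, energy_thresh: float,
                         zcr_thresh: float) -> list:
    """Mark each 20 ms frame as speech if its short-time energy exceeds
    energy_thresh or its zero-crossing rate exceeds zcr_thresh (a common
    heuristic for keeping unvoiced speech). Threshold values are
    assumptions for illustration, not taken from the patent."""
    n_frames = len(samples) // FRAME_LEN
    flags = []
    for i in range(n_frames):
        frame = samples[i * FRAME_LEN:(i + 1) * FRAME_LEN]
        flags.append(short_time_energy(frame) > energy_thresh
                     or zero_crossing_rate(frame) > zcr_thresh)
    return flags
```

Frames flagged False (silence, steady noise) would be discarded, keeping the voiced material useful for pitch and LPCC estimation.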
  • The present invention preprocesses the "灰太狼" voice information, finds that it matches the voiceprint data pre-stored for one of the user accounts, and logs in to that user account.
  • When registering the account, the child's age, gender, reading preferences, and the like can be entered, so that the present invention can tailor personalized information to the user based on these characteristics.
  • a prosodic feature of the voiceprint data is extracted.
  • For "灰太狼" (huī tài láng), the finals are, in order, ui, ai, and ang. Owing to language habits, the stress and duration of the last final are generally the greatest.
  • The voiceprint model library is searched according to the prosodic features to identify the target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals. Since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast. Moreover, the finals-indexed keyword lexicon stored in the voiceprint model library of this embodiment is relatively small, so comparison is fast and accuracy is high.
  • The voiceprint model library stores relevant information as shown in the following table:

    Category      Work/Program                          Keywords          Finals index
    Child (boy)   喜羊羊与灰太狼                          喜羊羊、灰太狼     i-ang-ang, ui-ai-ang
    Child (girl)  小红帽                                 小红帽、大灰狼     ao-ong-ao, ai-ui-ang
    Adult (male)  忍者乱太郎                              忍者、乱太郎       en-e, an-ai-ang

  • By comparison against this index, the target keywords can be quickly identified as "灰太狼" (Grey Wolf), "大灰狼" (Big Grey Wolf), and "乱太郎" (Rantaro).
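A finals-indexed lexicon of this kind can be sketched as a small lookup table. The matching strategy (exact finals sequence first, then fall back on the last, most prominent final) is an illustrative assumption; the patent only states that the lexicon is indexed by finals:

```python
# Toy lexicon keyed by finals sequences, mirroring the example table above.
LEXICON = {
    ("ui", "ai", "ang"): ["灰太狼"],   # Grey Wolf
    ("ai", "ui", "ang"): ["大灰狼"],   # Big Grey Wolf
    ("an", "ai", "ang"): ["乱太郎"],   # Rantaro
}

def candidate_keywords(finals):
    """Return candidate keywords for a recognized finals sequence.

    Exact sequence matches win; otherwise fall back on the last final,
    which the text notes carries the most stress and duration.
    """
    query = tuple(finals)
    exact = [w for key, words in LEXICON.items() if key == query for w in words]
    if exact:
        return exact
    return [w for key, words in LEXICON.items()
            if key[-1] == query[-1] for w in words]
```

With the finals ui-ai-ang extracted from "灰太狼", the exact entry is returned; a partial recognition still yields all candidates ending in "ang", matching the multi-candidate feedback described below.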
  • The retrieved title information is displayed to the user in order of time or page views.
  • The recommendations to the user are "喜羊羊与灰太狼" (Pleasant Goat and Big Big Wolf), "小红帽" (Little Red Riding Hood), and "忍者乱太郎" (Ninja Rantaro).
  • Preferably, at least two target keywords are displayed for the user to select; for example, the present invention displays "灰太狼", "大灰狼", "乱太郎", and the like as feedback.
  • According to the user's selection, the target keyword to be searched is determined to be "灰太狼".
  • The first recommendation to the user is then "喜羊羊与灰太狼".
  • The age of the speaker of the voice information is determined, and the speaker is marked as an adult or a child; title information containing the target keyword is then searched for in the corresponding adult or child information zone. The speaker's age is preferably determined from the age registered with the user account, but it may also be determined from voiceprint features during preprocessing. As shown in the table above, when the user who input the voice information is judged to be a child, the target keywords are determined to be "灰太狼" and "大灰狼". The retrieved title information is displayed to the user in order of time or page views. In this case, the first recommendations to the user are "喜羊羊与灰太狼" and "小红帽", followed by the related "忍者乱太郎".
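The zone-filtered recommendation described above amounts to partitioning the catalog by audience and ranking matches from the speaker's own zone first. A minimal sketch; the catalog entries follow the example table, while the data structure and function names are assumptions for illustration:

```python
# Illustrative catalog partitioned into audience zones, as in the example.
CATALOG = [
    {"title": "喜羊羊与灰太狼", "zone": "child", "keywords": {"喜羊羊", "灰太狼"}},
    {"title": "小红帽",        "zone": "child", "keywords": {"小红帽", "大灰狼"}},
    {"title": "忍者乱太郎",    "zone": "adult", "keywords": {"忍者", "乱太郎"}},
]

def recommend(target_keywords, speaker_zone):
    """Titles in the speaker's zone whose keywords intersect the targets
    come first; matching titles from other zones follow as secondary
    suggestions, as in the walkthrough above."""
    hits = [e for e in CATALOG if e["keywords"] & set(target_keywords)]
    primary = [e["title"] for e in hits if e["zone"] == speaker_zone]
    secondary = [e["title"] for e in hits if e["zone"] != speaker_zone]
    return primary + secondary
```

For a child speaker with the three candidate keywords, the two child-zone titles lead and the adult-zone title is appended last, reproducing the ordering in the text.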
  • The gender of the speaker of the voice information is determined, and the speaker is marked as male or female; title information containing the target keyword is then searched for in the information zone for that gender.
  • In this case, the target keywords are "灰太狼" and "大灰狼".
  • The retrieved title information is displayed to the user in order of time or page views.
  • The first recommendation to the user is "喜羊羊与灰太狼".
  • Preferably, the method further includes:
  • storing the search record of the title information recommended to the user in the historical search records of the user account.
  • Preferably, the method further includes:
  • storing the title information the user selects to read in the historical reading records of the user account.
  • Preferably, the method further includes:
  • storing the title information the user selects to bookmark in the bookmark bar of the user account.
  • FIG. 3 is a schematic diagram of the information recommendation apparatus based on voiceprint search according to the present invention, including:
  • a preprocessing unit, configured to preprocess input voice information to obtain voiceprint data to be recognized;
  • a feature extraction unit, configured to extract prosodic features from the voiceprint data;
  • a keyword identification unit, configured to search a voiceprint model library according to the prosodic features and identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals;
  • a search and recommendation unit, configured to search for title information containing the target keyword and recommend it to the user according to preset rules.
  • FIG. 3 corresponds to FIG. 1; each unit in the figure operates as in the method.
  • FIG. 4 is a schematic diagram of an embodiment of the information recommendation apparatus based on voiceprint search according to the present invention.
  • A keyword determination unit is configured to display at least two target keywords for the user to select, and to determine the target keyword to be searched according to the user's selection.
  • An account login unit connected to the preprocessing unit is configured to determine whether the current voiceprint data matches voiceprint data pre-stored for a user account and, if so, log in to the user account.
  • The search and recommendation unit includes:
  • an age determination unit, configured to determine the age of the speaker of the voice information, mark the speaker as an adult or a child, and search for title information containing the target keyword in the corresponding adult or child information zone.
  • The search and recommendation unit includes:
  • a gender determination unit, configured to determine the gender of the speaker of the voice information, mark the speaker as male or female, and search for title information containing the target keyword in the information zone for that gender.
  • The search and recommendation unit includes:
  • a title display unit, configured to display the retrieved title information to the user in order of time or page views.
  • FIG. 4 corresponds to FIG. 2; each unit in the figure operates as in the method.
  • Preferably, the preprocessing unit includes:
  • a sampling unit, configured to sample the voice stream of the voice information in mono;
  • a framing unit, configured to divide the voice stream into frames of 256 sampling points each, with 128 sampling points as the overlap between adjacent frames;
  • a computation unit, configured to compute the cumulative energy of each frame of voice data and, if the cumulative energy of a run of consecutive voice frames exceeds a preset silence threshold, adopt that run of consecutive voice frames as the voiceprint data to be recognized.
  • Preferably, the apparatus further includes:
  • a history recording unit connected to the search and recommendation unit and the account login unit, configured to store the search record of the title information recommended to the user in the historical search records of the user account.
  • Preferably, the apparatus further includes:
  • a reading recording unit connected to the search and recommendation unit and the account login unit, configured to store the title information the user selects to read in the historical reading records of the user account.
  • Preferably, the apparatus further includes:
  • a bookmark recording unit connected to the search and recommendation unit and the account login unit, configured to store the title information the user selects to bookmark in the bookmark bar of the user account.
  • FIG. 5 shows a smartphone terminal device (hereinafter collectively, "the device") that can implement the information recommendation method based on voiceprint search according to the present invention.
  • The device conventionally includes a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020.
  • The memory 1020 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM.
  • The memory 1020 has a storage space 1030 for program code 1031 for executing any of the method steps described above.
  • For example, the storage space 1030 for program code may include individual program codes 1031 for implementing the various steps of the above methods.
  • The program code can be read from or written to one or more computer program products.
  • These computer program products comprise program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks.
  • Such computer program products are typically portable or fixed storage units as described with reference to FIG. 6.
  • The storage unit may have storage segments or storage space arranged similarly to the memory 1020 in FIG. 5.
  • The program code may, for example, be compressed in an appropriate form.
  • Typically, the storage unit comprises program code 1031' for performing the method steps according to the present invention, i.e., code readable by a processor such as the processor 1010, which, when run by the device, causes the device to perform the steps of the methods described above.

Abstract

An information recommendation method and apparatus based on voiceprint search. The method includes: preprocessing input voice information to obtain voiceprint data to be recognized (S101); extracting prosodic features from the voiceprint data (S102); searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals (S103); and searching for title information containing the target keyword and recommending it to the user according to preset rules (S104). Program content is thus recommended to the user quickly and accurately based on the user's voice.

Description

Information recommendation method and apparatus based on voiceprint search

Technical Field

The present invention relates to the field of speech recognition technology, and more particularly to an information recommendation method and apparatus based on voiceprint search.

Background

Ever since machines were invented and put to use, humans have dreamed of machines that can understand human language and act on spoken commands, enabling spoken interaction between humans and machines. The emergence of speech technology has made this dream attainable. Speech technology enables a computer to use appropriate algorithms to automatically extract the practically meaningful information humans need from speech. Research on speech technology began in the 1950s and now spans some 60 years. With the rapid development of information technology, speech technology has become increasingly important, and its application prospects ever broader.

With the development of networking and multimedia technology, people want to quickly retrieve the audio content of a particular person so as to rapidly locate the program content of people they are interested in. How to use speaker segmentation and speaker clustering based on speaker recognition technology to find valid target data in large volumes of historical speech data and the latest broadcast news has become a common problem in the industry.
Summary of the Invention

In view of the above problems, the present invention proposes an information recommendation method and apparatus based on voiceprint search, which can quickly and accurately recommend program content to the user according to the user's voice.

In a first aspect, an embodiment of the present invention provides an information recommendation method based on voiceprint search, including: preprocessing input voice information to obtain voiceprint data to be recognized; extracting prosodic features from the voiceprint data; searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals; and searching for title information containing the target keyword and recommending it to the user according to preset rules.

In a second aspect, an embodiment of the present invention provides an information recommendation apparatus based on voiceprint search, including: at least one processor; and at least one memory communicatively connected to the at least one processor, the at least one memory containing processor-executable instructions that, when executed by the at least one processor, cause the apparatus to perform at least the following operations: preprocessing input voice information to obtain voiceprint data to be recognized; extracting prosodic features from the voiceprint data; searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals; and searching for title information containing the target keyword and recommending it to the user according to preset rules.

In a third aspect, an embodiment of the present invention provides a computer program comprising computer-readable code which, when run, causes the method of the first aspect to be performed.

In a fourth aspect, an embodiment of the present invention provides a computer-readable medium storing the computer program of the third aspect.

Compared with the prior art, the solution provided by the present invention first preprocesses the input voice information to obtain the voiceprint data to be recognized. For example, the user presses a voice key while inputting voice information, and after preprocessing such as denoising, the system obtains the voiceprint data to be recognized. Prosodic features are then extracted from the voiceprint data. Note that voiceprint features include acoustic features, prosodic features, lexical features, and the like. Prosodic features, also called suprasegmental features, refer to variations of pitch, duration, and intensity in speech beyond voice-quality features. Prosody is a typical characteristic of natural human language, with many cross-language commonalities: declining pitch, stress, pauses, and so on occur in all languages. Moreover, since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast. To keep the analysis accurate, the preset voiceprint model library of the present invention contains a keyword lexicon indexed by finals. The voiceprint model library is searched according to the prosodic features to identify the target keyword, so that voice information is converted into text quickly and accurately. Finally, title information containing the target keyword is searched for and recommended to the user according to preset rules, where the title information includes the names and summaries of books, news items, articles, and the like.
Additional aspects and advantages of the present invention will be set forth in part in the following description, become apparent from the description, or be learned through practice of the present invention.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an information recommendation method based on voiceprint search according to the present invention.

FIG. 2 is a flowchart of an embodiment of the information recommendation method based on voiceprint search according to the present invention.

FIG. 3 is a schematic diagram of an information recommendation apparatus based on voiceprint search according to the present invention.

FIG. 4 is a schematic diagram of an embodiment of the information recommendation apparatus based on voiceprint search according to the present invention.

FIG. 5 is a block diagram of a smartphone-based terminal for performing the method according to the present invention; and

FIG. 6 is a schematic diagram of a storage unit for holding or carrying program code implementing the method according to the present invention.
Detailed Description

To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.

Some flows described in the specification, claims, and drawings of the present invention contain multiple operations that appear in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. Operation numbers such as 101 and 102 merely distinguish the different operations; the numbers themselves do not imply any execution order. In addition, these flows may include more or fewer operations, which may be executed sequentially or in parallel. Note that descriptions such as "first" and "second" herein distinguish different messages, devices, modules, and the like; they do not imply an order, nor do they require "first" and "second" to be of different types.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
FIG. 1 is a flowchart of an information recommendation method based on voiceprint search according to the present invention, including:

S101: preprocessing input voice information to obtain voiceprint data to be recognized;

S102: extracting prosodic features from the voiceprint data;

S103: searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals;

S104: searching for title information containing the target keyword and recommending it to the user according to preset rules.

Compared with the prior art, the solution provided by the present invention first preprocesses the input voice information to obtain the voiceprint data to be recognized. For example, the user presses a voice key while inputting voice information, and after preprocessing such as denoising, the system obtains the voiceprint data to be recognized. Prosodic features are then extracted from the voiceprint data. Note that voiceprint features include acoustic features, prosodic features, lexical features, and the like. Prosodic features, also called suprasegmental features, refer to variations of pitch, duration, and intensity in speech beyond voice-quality features. Prosody is a typical characteristic of natural human language, with many cross-language commonalities: declining pitch, stress, pauses, and so on occur in all languages. Moreover, since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast. To keep the analysis accurate, the preset voiceprint model library of the present invention contains a keyword lexicon indexed by finals. The voiceprint model library is searched according to the prosodic features to identify the target keyword, so that voice information is converted into text quickly and accurately. Finally, title information containing the target keyword is searched for and recommended to the user according to preset rules, where the title information includes the names and summaries of books, news items, articles, and the like.
FIG. 2 is a flowchart of an embodiment of the information recommendation method based on voiceprint search according to the present invention. Compared with FIG. 1, the embodiment of FIG. 2 further includes logging in to an account by voiceprint and, further, using the account profile to recommend to the user title information matching the user's individual needs.

S201: preprocessing input voice information to obtain voiceprint data to be recognized;

S202: determining whether the current voiceprint data matches voiceprint data pre-stored for a user account, and if so, logging in to the user account;

S203: extracting prosodic features from the voiceprint data;

S204: searching a voiceprint model library according to the prosodic features to identify a target keyword, where the voiceprint model library contains a keyword lexicon indexed by finals;

S205: displaying at least two target keywords for the user to select, and determining the target keyword to be searched according to the user's selection;

S206: determining the age of the speaker of the voice information and marking the speaker as an adult or a child; searching for title information containing the target keyword in the corresponding adult or child information zone;

S207: determining the gender of the speaker of the voice information and marking the speaker as male or female; searching for title information containing the target keyword in the information zone for that gender;

S208: displaying the retrieved title information to the user in order of time or page views.
The terminal implementing the present invention is not limited to wearable devices, mobile phones, iPads, personal computers, or other smart terminals with a microphone/sound receiver. This embodiment is further explained with a child using a smartphone terminal to practice the invention. Suppose the child presses the voice key while speaking the voice message "灰太狼" (Grey Wolf) into the smartphone's microphone. The present invention preprocesses the input voice information to obtain the voiceprint data to be recognized.

Preferably, the step of preprocessing the input voice information to obtain the voiceprint data to be recognized includes:

sampling the voice stream of the voice information in mono;

dividing the voice stream into frames of 256 sampling points each, with 128 sampling points as the overlap between adjacent frames;

computing the cumulative energy of each frame of voice data, and, if the cumulative energy of a run of consecutive voice frames exceeds a preset silence threshold, adopting that run of consecutive voice frames as the voiceprint data to be recognized.
Preprocessing comprises two parts: denoising and endpoint detection.

Denoising quantizes and samples the voice information input through the microphone to obtain a digitized voice stream; the noisy stream is then denoised to obtain a clean voice stream, and pre-emphasis filters out low-frequency interference, especially 50 Hz or 60 Hz mains-frequency interference, while boosting the high-frequency part of the voice stream; it can also remove DC drift, suppress random noise, and raise the energy of unvoiced segments. Specifically, the input voice information is sampled in mono at 8 bits and 16 kHz. The input voice stream is divided into frames of 256 sampling points each, with an overlap of 128 sampling points between adjacent frames. The cumulative energy E of each frame of voice data is computed (its maximum is 256^3 = 16,777,216, so an int suffices):

E = Σ_{i=1}^{256} x(i)²

where x(i) is the magnitude of the i-th 8-bit sample in the frame. If the cumulative energy of a run of consecutive voice frames exceeds the preset silence threshold (run length > 100), that run of consecutive voice frames is adopted as the voiceprint data to be recognized. All voice frames usable for training are retained.

For endpoint detection, the system uses the short-time energy and short-time zero-crossing rate of the voice information. The voice information is sampled at 8 kHz, with 20 ms of data per frame, i.e., 160 sampling points. The short-time energy and short-time zero-crossing rate are computed every 20 ms. Detecting the short-time energy and zero-crossing rate of the speech signal removes silent frames, white-noise frames, and unvoiced frames, finally retaining the voiced signal that is most useful for deriving feature parameters such as pitch and LPCC.
After denoising and endpoint detection, it can be determined whether the current voiceprint data matches voiceprint data pre-stored for a user account; if so, the user account is logged in. Compared with the prior art, which requires the user to remember an account name and password, this is more convenient and secure, and especially suitable for children with weaker memories. Continuing the example above, after the child inputs "灰太狼", the present invention preprocesses the "灰太狼" voice information, finds that it matches the voiceprint data pre-stored for one of the user accounts, and logs in to that account.
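The voiceprint-matching login described above can be sketched as comparing the current voiceprint against each account's enrolled voiceprint. The vector representation, cosine similarity, and 0.85 threshold are illustrative assumptions; the patent only specifies matching against pre-stored voiceprint data:

```python
import numpy as np

def voiceprint_login(current: np.ndarray, accounts: dict,
                     threshold: float = 0.85):
    """Log in to the account whose enrolled voiceprint vector is most
    similar to the current one, provided the cosine similarity exceeds
    the threshold. Returns the account name, or None if nothing matches.
    All numeric choices here are assumptions for illustration."""
    best, best_sim = None, threshold
    for account, enrolled in accounts.items():
        sim = float(np.dot(current, enrolled) /
                    (np.linalg.norm(current) * np.linalg.norm(enrolled)))
        if sim > best_sim:
            best, best_sim = account, sim
    return best
```

A child whose voiceprint closely matches an enrolled child account is logged in without remembering a password, which is the convenience the text emphasizes.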
When registering a user account for a child, the child's age, gender, reading preferences, and the like can be entered, so that the present invention can tailor personalized information to the user based on these characteristics.
Prosodic features are extracted from the voiceprint data. For the prosodic features of "灰太狼" (huī tài láng), the finals are, in order, ui, ai, and ang. Owing to language habits, the stress and duration of the last final are generally the greatest. The voiceprint model library is searched according to the prosodic features to identify the target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals. Since prosodic features are easy to extract and analyze, the present invention performs analysis by extracting only prosodic features, making analysis fast. Moreover, the finals-indexed keyword lexicon stored in the voiceprint model library of this embodiment is relatively small, so comparison is fast and accuracy is high. For example, the voiceprint model library stores the information shown in the following table:

Category      Work/Program                          Keywords          Finals index
Child (boy)   喜羊羊与灰太狼                          喜羊羊、灰太狼     i-ang-ang, ui-ai-ang
Child (girl)  小红帽                                 小红帽、大灰狼     ao-ong-ao, ai-ui-ang
Adult (male)  忍者乱太郎                              忍者、乱太郎       en-e, an-ai-ang

By comparison, the target keywords can be quickly identified as "灰太狼" (Grey Wolf), "大灰狼" (Big Grey Wolf), and "乱太郎" (Rantaro). The retrieved title information is displayed to the user in order of time or page views. At this point, the recommendations to the user are "喜羊羊与灰太狼" (Pleasant Goat and Big Big Wolf), "小红帽" (Little Red Riding Hood), and "忍者乱太郎" (Ninja Rantaro).
Preferably, at least two target keywords are displayed for the user to select; for example, the present invention displays "灰太狼", "大灰狼", "乱太郎", and the like as feedback. According to the user's selection, the target keyword to be searched is finally determined to be "灰太狼". The first recommendation to the user is then "喜羊羊与灰太狼", followed by the related "小红帽" and "忍者乱太郎".

Preferably, the age of the speaker of the voice information is determined and the speaker is marked as an adult or a child; title information containing the target keyword is searched for in the corresponding adult or child information zone. The speaker's age is preferably determined from the age registered with the user account, but it may also be determined from voiceprint features during preprocessing. As shown in the table above, when the user who input the voice information is judged to be a child, the target keywords are determined to be "灰太狼" and "大灰狼". The retrieved title information is displayed to the user in order of time or page views. In this case, the first recommendations to the user are "喜羊羊与灰太狼" and "小红帽", followed by the related "忍者乱太郎".

Preferably, the gender of the speaker of the voice information is determined and the speaker is marked as male or female; title information containing the target keyword is searched for in the information zone for that gender. As shown in the table above, when the user who input the voice information is judged to be a boy, the target keywords are determined to be "灰太狼" and "大灰狼". The retrieved title information is displayed to the user in order of time or page views. In this case, the first recommendation to the user is "喜羊羊与灰太狼", followed by the related "小红帽" and "忍者乱太郎".
Preferably, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules, the method further includes:

storing the search record of the title information recommended to the user in the historical search records of the user account.

For example, the three works found in this search, "喜羊羊与灰太狼", "小红帽", and "忍者乱太郎", can be stored. When the child searches again, other works on related subjects can be obtained conveniently.

Preferably, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules, the method further includes:

storing the title information the user selects to read in the historical reading records of the user account.

For example, if the child selects and reads "喜羊羊与灰太狼" this time and finishes episode 30, the next time the application is opened it jumps directly to episode 30 in the reading history, making it convenient for the child to continue.

Preferably, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules, the method further includes:

storing the title information the user selects to bookmark in the bookmark bar of the user account.

For example, the child selects and reads "喜羊羊与灰太狼" this time and adds it to the bookmark bar. Next time, the child only needs to look it up in the bookmark bar, with no need to search again.
FIG. 3 is a schematic diagram of an information recommendation apparatus based on voiceprint search according to the present invention, including:

a preprocessing unit, configured to preprocess input voice information to obtain voiceprint data to be recognized;

a feature extraction unit, configured to extract prosodic features from the voiceprint data;

a keyword identification unit, configured to search a voiceprint model library according to the prosodic features and identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals;

a search and recommendation unit, configured to search for title information containing the target keyword and recommend it to the user according to preset rules.

FIG. 3 corresponds to FIG. 1; each unit in the figure operates as in the method.

FIG. 4 is a schematic diagram of an embodiment of the information recommendation apparatus based on voiceprint search according to the present invention.

As shown in FIG. 4, the apparatus further includes:

a keyword determination unit, configured to display at least two target keywords for the user to select, and determine the target keyword to be searched according to the user's selection.

As shown in FIG. 4, the apparatus includes:

an account login unit connected to the preprocessing unit, configured to determine whether the current voiceprint data matches voiceprint data pre-stored for a user account and, if so, log in to the user account.

As shown in FIG. 4, the search and recommendation unit includes:

an age determination unit, configured to determine the age of the speaker of the voice information, mark the speaker as an adult or a child, and search for title information containing the target keyword in the corresponding adult or child information zone.

As shown in FIG. 4, the search and recommendation unit includes:

a gender determination unit, configured to determine the gender of the speaker of the voice information, mark the speaker as male or female, and search for title information containing the target keyword in the information zone for that gender.

As shown in FIG. 4, the search and recommendation unit includes:

a title display unit, configured to display the retrieved title information to the user in order of time or page views.

FIG. 4 corresponds to FIG. 2; each unit in the figure operates as in the method.
Preferably, the preprocessing unit includes:

a sampling unit, configured to sample the voice stream of the voice information in mono;

a framing unit, configured to divide the voice stream into frames of 256 sampling points each, with 128 sampling points as the overlap between adjacent frames;

a computation unit, configured to compute the cumulative energy of each frame of voice data and, if the cumulative energy of a run of consecutive voice frames exceeds a preset silence threshold, adopt that run of consecutive voice frames as the voiceprint data to be recognized.

Preferably, the apparatus further includes:

a history recording unit connected to the search and recommendation unit and the account login unit, configured to store the search record of the title information recommended to the user in the historical search records of the user account.

Preferably, the apparatus further includes:

a reading recording unit connected to the search and recommendation unit and the account login unit, configured to store the title information the user selects to read in the historical reading records of the user account.

Preferably, the apparatus further includes:

a bookmark recording unit connected to the search and recommendation unit and the account login unit, configured to store the title information the user selects to bookmark in the bookmark bar of the user account.
FIG. 5 shows a smartphone terminal device (hereinafter collectively, "the device") that can implement the information recommendation method based on voiceprint search according to the present invention. The device conventionally includes a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020. The memory 1020 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 1020 has a storage space 1030 for program code 1031 for executing any of the method steps described above. For example, the storage space 1030 for program code may include individual program codes 1031 for implementing the various steps of the above methods. The program code can be read from or written to one or more computer program products. These computer program products comprise program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such computer program products are typically portable or fixed storage units as described with reference to FIG. 6. The storage unit may have storage segments or storage space arranged similarly to the memory 1020 in FIG. 5. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit comprises program code 1031' for performing the method steps according to the present invention, i.e., code readable by a processor such as the processor 1010, which, when run by the device, causes the device to perform the steps of the methods described above.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (22)

  1. An information recommendation method based on voiceprint search, comprising:
    preprocessing input voice information to obtain voiceprint data to be recognized;
    extracting prosodic features from the voiceprint data;
    searching a voiceprint model library according to the prosodic features to identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by finals;
    searching for title information containing the target keyword and recommending it to the user according to preset rules.
  2. The information recommendation method based on voiceprint search according to claim 1, wherein the step of preprocessing the input voice information to obtain the voiceprint data to be recognized comprises:
    sampling the voice stream of the voice information in mono;
    dividing the voice stream into frames of 256 sampling points each, with 128 sampling points as the overlap between adjacent frames;
    computing the cumulative energy of each frame of voice data, and, if the cumulative energy of a run of consecutive voice frames exceeds a preset silence threshold, adopting that run of consecutive voice frames as the voiceprint data to be recognized.
  3. The information recommendation method based on voiceprint search according to claim 1, further comprising, after the step of identifying the target keyword and before the step of searching for title information containing the target keyword:
    displaying at least two target keywords for the user to select;
    determining the target keyword to be searched according to the user's selection.
  4. The information recommendation method based on voiceprint search according to claim 1, comprising, after the step of preprocessing the input voice information to obtain the voiceprint data to be recognized:
    determining whether the current voiceprint data matches voiceprint data pre-stored for a user account, and if so, logging in to the user account.
  5. The information recommendation method based on voiceprint search according to claim 4, further comprising, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules:
    storing the search record of the title information recommended to the user in the historical search records of the user account.
  6. The information recommendation method based on voiceprint search according to claim 4, further comprising, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules:
    storing the title information the user selects to read in the historical reading records of the user account.
  7. The information recommendation method based on voiceprint search according to claim 4, further comprising, after the step of searching for title information containing the target keyword and recommending it to the user according to preset rules:
    storing the title information the user selects to bookmark in the bookmark bar of the user account.
  8. The information recommendation method based on voiceprint search according to claim 1 or 4, wherein the step of searching for title information containing the target keyword specifically comprises:
    determining the age of the speaker of the voice information and marking the speaker as an adult or a child;
    searching for title information containing the target keyword in the corresponding adult or child information zone.
  9. The information recommendation method based on voiceprint search according to claim 1 or 4, wherein the step of searching for title information containing the target keyword specifically comprises:
    determining the gender of the speaker of the voice information and marking the speaker as male or female;
    searching for title information containing the target keyword in the information zone for that gender.
  10. The information recommendation method based on voiceprint search according to claim 1 or 4, wherein the step of recommending to the user according to preset rules specifically comprises:
    displaying the retrieved title information to the user in order of time or page views.
  11. An information recommendation apparatus based on voiceprint search, comprising:
    at least one processor;
    and at least one memory communicatively connected to the at least one processor, the at least one memory containing processor-executable instructions which, when executed by the at least one processor, cause the apparatus to perform at least the following operations:
    preprocessing input speech information to obtain voiceprint data to be recognized;
    extracting prosodic features from the voiceprint data;
    looking up a voiceprint model library according to the prosodic features to identify a target keyword, wherein the voiceprint model library contains a keyword lexicon indexed by Pinyin finals;
    searching for title information containing the target keyword, and recommending it to a user according to a preset rule.
  12. The information recommendation apparatus based on voiceprint search according to claim 11, wherein the operation of preprocessing the input speech information to obtain voiceprint data to be recognized specifically comprises:
    sampling the speech stream of the speech information in mono;
    framing the speech stream with 256 sampling points per frame and 128 sampling points as the overlap unit between adjacent frames;
    computing the cumulative energy of each frame of speech data, and if the cumulative energy of a run of consecutive speech frames is greater than a preset silence threshold, adopting that run of consecutive speech frames as the voiceprint data to be recognized.
  13. The information recommendation apparatus based on voiceprint search according to claim 11, wherein the operations further comprise:
    displaying at least two target keywords for the user to choose from; and determining the target keyword to be searched according to the user's selection.
  14. The information recommendation apparatus based on voiceprint search according to claim 11, wherein the operations further comprise:
    determining whether the current voiceprint data matches voiceprint data pre-stored for a user account, and if so, logging in to the user account.
  15. The information recommendation apparatus based on voiceprint search according to claim 14, wherein the operations further comprise:
    storing the search records of the title information recommended to the user in the search history of the user account.
  16. The information recommendation apparatus based on voiceprint search according to claim 14, wherein the operations further comprise:
    storing the title information that the user clicks to read in the reading history of the user account.
  17. The information recommendation apparatus based on voiceprint search according to claim 14, wherein the operations further comprise:
    storing the title information that the user selects to bookmark in the bookmark bar of the user account.
  18. The information recommendation apparatus based on voiceprint search according to claim 11 or 14, wherein the operation of searching for title information containing the target keyword and recommending it to the user according to the preset rule specifically comprises:
    estimating the age of the speaker of the speech information, and labeling the speaker as an adult or a child; and searching for title information containing the target keyword in the corresponding adult or children's information section.
  19. The information recommendation apparatus based on voiceprint search according to claim 11 or 14, wherein the operation of searching for title information containing the target keyword and recommending it to the user according to the preset rule specifically comprises:
    estimating the gender of the speaker of the speech information, and labeling the speaker as male or female; and searching for title information containing the target keyword in the information section for the corresponding gender.
  20. The information recommendation apparatus based on voiceprint search according to claim 11 or 14, wherein the operation of searching for title information containing the target keyword and recommending it to the user according to the preset rule specifically comprises:
    presenting the retrieved title information to the user in chronological order or by view count.
  21. A computer program comprising computer-readable code which, when run, causes the method of any one of claims 1 to 10 to be performed.
  22. A computer-readable medium storing the computer program of claim 21.
PCT/CN2017/115707 2016-12-13 2017-12-12 Information recommendation method and apparatus based on voiceprint search WO2018108080A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611146872.4A 2016-12-13 2016-12-13 Information recommendation method and apparatus based on voiceprint search
CN201611146872.4 2016-12-13

Publications (1)

Publication Number Publication Date
WO2018108080A1 (zh) 2018-06-21

Family

ID=58802007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/115707 WO2018108080A1 (zh) 2016-12-13 2017-12-12 Information recommendation method and apparatus based on voiceprint search

Country Status (2)

Country Link
CN (1) CN106601259B (zh)
WO (1) WO2018108080A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110459210A 2019-07-30 2019-11-15 平安科技(深圳)有限公司 Question answering method, apparatus, device, and storage medium based on speech analysis
CN111104505A 2019-12-30 2020-05-05 浙江阿尔法人力资源有限公司 Information prompting method, apparatus, device, and storage medium
CN111627448A 2020-05-15 2020-09-04 公安部第三研究所 System and method for interrogation and interview control based on speech big data
US20200312337A1 2019-03-25 2020-10-01 Omilia Natural Language Solutions Ltd. Systems and methods for speaker verification
CN111798857A 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information recognition method and apparatus, electronic device, and storage medium
CN112423133A 2019-08-23 2021-02-26 腾讯科技(深圳)有限公司 Video switching method and apparatus, computer-readable storage medium, and computer device
CN114143608A 2021-11-05 2022-03-04 深圳Tcl新技术有限公司 Content recommendation method and apparatus, computer device, and readable storage medium
CN114339342A 2021-12-23 2022-04-12 歌尔科技有限公司 Remote control method, remote control, control apparatus, and medium

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106601259B (zh) 2016-12-13 2021-04-06 北京奇虎科技有限公司 Information recommendation method and apparatus based on voiceprint search
CN107357875B (zh) 2017-07-04 2021-09-10 北京奇艺世纪科技有限公司 Voice search method and apparatus, and electronic device
CN107656983A (zh) 2017-09-08 2018-02-02 广州索答信息科技有限公司 Intelligent recommendation method and apparatus based on voiceprint recognition
CN107886390B (zh) 2017-09-30 2019-09-06 北京小蓦机器人技术有限公司 Method, device, system, and storage medium for providing resources matching a user's actual needs
CN109671185B (zh) 2017-10-17 2021-12-14 杭州海康威视数字技术股份有限公司 Access control method and apparatus
CN108062354A (zh) 2017-11-22 2018-05-22 上海博泰悦臻电子设备制造有限公司 Information recommendation method and system, storage medium, electronic device, and vehicle
CN107886949B (zh) 2017-11-24 2021-04-30 科大讯飞股份有限公司 Content recommendation method and apparatus
CN108492836A (zh) 2018-03-29 2018-09-04 努比亚技术有限公司 Voice-based search method, mobile terminal, and storage medium
CN110867188A (zh) 2018-08-13 2020-03-06 珠海格力电器股份有限公司 Method and apparatus for providing content services, storage medium, and electronic apparatus
CN109165336B (zh) 2018-08-23 2021-10-01 广东小天才科技有限公司 Information output control method and tutoring device
CN110896501A (zh) 2018-08-24 2020-03-20 青岛海尔多媒体有限公司 Television and control method for a television
CN109460501B (zh) 2018-11-15 2020-12-29 成都傅立叶电子科技有限公司 Global-retrieval combat decision support system and method
CN109829035A (zh) 2018-12-19 2019-05-31 平安国际融资租赁有限公司 Process search method and apparatus, computer device, and storage medium
CN112447178A (zh) 2019-08-28 2021-03-05 北京声智科技有限公司 Voiceprint retrieval method and apparatus, and electronic device
CN110990685B (zh) 2019-10-12 2023-05-26 中国平安财产保险股份有限公司 Voiceprint-based voice search method, device, storage medium, and apparatus
CN110784768B (zh) 2019-10-17 2021-06-15 珠海格力电器股份有限公司 Multimedia resource playback method, storage medium, and electronic device
CN110879839A (zh) 2019-11-27 2020-03-13 北京声智科技有限公司 Hot-word recognition method, apparatus, and system
CN111078937B (zh) 2019-12-27 2021-08-10 北京世纪好未来教育科技有限公司 Speech information retrieval method and apparatus, device, and computer-readable storage medium
CN112052686B (zh) 2020-09-02 2023-08-18 合肥分贝工场科技有限公司 Voice learning resource push method for user-interactive education
CN113643700B (zh) 2021-07-27 2024-02-27 广州市威士丹利智能科技有限公司 Control method and system for an intelligent voice switch

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011069845A 2009-09-24 2011-04-07 Nippon Telegr & Teleph Corp <Ntt> Voice search method, voice search device, and voice search program
CN105868360A 2016-03-29 2016-08-17 乐视控股(北京)有限公司 Content recommendation method and apparatus based on speech recognition
CN105979376A 2015-12-02 2016-09-28 乐视致新电子科技(天津)有限公司 Recommendation method and apparatus
CN106128467A 2016-06-06 2016-11-16 北京云知声信息技术有限公司 Speech processing method and apparatus
CN106601259A 2016-12-13 2017-04-26 北京奇虎科技有限公司 Information recommendation method and apparatus based on voiceprint search

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1835076B (zh) 2006-04-07 2010-05-12 安徽中科大讯飞信息科技有限公司 Speech evaluation method integrating speech recognition, phonetic knowledge, and Chinese dialect analysis
CN102063282B (zh) 2009-11-18 2014-08-13 上海果壳电子有限公司 Chinese speech input system and method
KR101905827B1 (ko) 2013-06-26 2018-10-08 한국전자통신연구원 Apparatus and method for continuous speech recognition
CN105243143B (zh) 2015-10-14 2018-07-24 湖南大学 Recommendation method and system based on real-time speech content detection
CN105895096A (zh) 2016-03-30 2016-08-24 乐视控股(北京)有限公司 Method and apparatus for identity recognition and voice interaction operation


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200312337A1 2019-03-25 2020-10-01 Omilia Natural Language Solutions Ltd. Systems and methods for speaker verification
US11948582B2 2019-03-25 2024-04-02 Omilia Natural Language Solutions Ltd. Systems and methods for speaker verification
CN111798857A 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information recognition method and apparatus, electronic device, and storage medium
CN110459210A 2019-07-30 2019-11-15 平安科技(深圳)有限公司 Question answering method, apparatus, device, and storage medium based on speech analysis
CN112423133A 2019-08-23 2021-02-26 腾讯科技(深圳)有限公司 Video switching method and apparatus, computer-readable storage medium, and computer device
CN111104505A 2019-12-30 2020-05-05 浙江阿尔法人力资源有限公司 Information prompting method, apparatus, device, and storage medium
CN111104505B 2019-12-30 2023-08-25 浙江阿尔法人力资源有限公司 Information prompting method, apparatus, device, and storage medium
CN111627448A 2020-05-15 2020-09-04 公安部第三研究所 System and method for interrogation and interview control based on speech big data
CN114143608A 2021-11-05 2022-03-04 深圳Tcl新技术有限公司 Content recommendation method and apparatus, computer device, and readable storage medium
CN114339342A 2021-12-23 2022-04-12 歌尔科技有限公司 Remote control method, remote control, control apparatus, and medium

Also Published As

Publication number Publication date
CN106601259A (zh) 2017-04-26
CN106601259B (zh) 2021-04-06

Similar Documents

Publication Publication Date Title
WO2018108080A1 (zh) Information recommendation method and apparatus based on voiceprint search
CN111179975B Speech endpoint detection method for emotion recognition, electronic device, and storage medium
US10013977B2 (en) Smart home control method based on emotion recognition and the system thereof
CN107305541B Method and apparatus for segmenting speech-recognized text
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
US10515292B2 (en) Joint acoustic and visual processing
CN104598644B Preference tag mining method and apparatus
WO2019148586A1 Method and apparatus for identifying a speaker in multi-person speech
Maghilnan et al. Sentiment analysis on speaker specific speech data
CN105260416A Search method and apparatus based on speech recognition
CN107943786B Chinese named entity recognition method and system
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
CN111105785A Method and apparatus for recognizing prosodic boundaries in text
Ghosal et al. Automatic male-female voice discrimination
Krishna et al. Emotion recognition using dynamic time warping technique for isolated words
CN112231440A Voice search method based on artificial intelligence
Tripathi et al. VEP detection for read, extempore and conversation speech
JP2017204023A (ja) Conversation processing device
Phoophuangpairoj Automated Classification of Watermelon Quality Using Non-flicking Reduction and HMM Sequences Derived from Flicking Sound Characteristics.
EP3714455B1 (en) Extracting content from speech prosody
JP7159655B2 (ja) 感情推定システムおよびプログラム
Yue et al. Speaker age recognition based on isolated words by using SVM
Fennir et al. Acoustic scene classification for speaker diarization
Anila et al. Emotion recognition using continuous density HMM
Vijayalakshmi et al. Real-time Speech Emotion Recognition Using Support Vector Machine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17881580; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17881580; Country of ref document: EP; Kind code of ref document: A1)