WO2017020794A1 - Speech recognition method and device for an interactive system - Google Patents

Speech recognition method and device for an interactive system

Info

Publication number
WO2017020794A1
WO2017020794A1 PCT/CN2016/092412 CN2016092412W
Authority
WO
WIPO (PCT)
Prior art keywords
sample
expected
answer
matching
user
Prior art date
Application number
PCT/CN2016/092412
Other languages
English (en)
French (fr)
Inventor
齐路
韩笑
苑一时
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司 and 奇智软件(北京)有限公司
Publication of WO2017020794A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method and apparatus for an interactive system.
  • the question-and-answer system first poses a question to the user by voice or by displaying an image, and the user then answers by voice.
  • some display systems require the user to issue a voice command to select which directories to display.
  • the user's voice needs to be recognized accurately in order to judge, in the question-and-answer system, whether the voice answer given by the user is correct, and to determine, in the display system, which directories the user selected, so as to display the contents of the corresponding directories.
  • the present invention has been made in order to provide a speech recognition method and apparatus for an interactive system that overcomes the above problems or at least partially solves the above problems.
  • a voice recognition method for an interactive system includes: predetermining, in the voice recognition sample library, a plurality of expected samples corresponding to the interaction state and the expected answer; dividing the expected samples corresponding to the expected answer into at least two sample groups, each containing at least one expected sample; collecting a voice signal of the user in the interactive state; and matching the speech signal with the expected samples in one of the at least two sample groups.
  • a speech recognition apparatus for an interactive system, wherein the apparatus comprises:
  • the expected sample determining unit is adapted to predetermine, in the voice recognition sample library, a plurality of expected samples corresponding to the interaction state and the expected answer, according to the interaction state and the expected answer in the interaction state;
  • a grouping unit adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
  • the collecting unit is adapted to collect a voice signal of the user in the interactive state;
  • a matching processing unit adapted to match the speech signal with the expected samples in one of the at least two sample groups.
  • a computer program comprising computer readable code which, when run on a computing device, causes the computing device to perform a speech recognition method of an interactive system as described above.
  • a computer readable medium wherein a computer program as described above is stored.
  • in the technical solution of the invention, a plurality of expected samples corresponding to the interaction state and the expected answer are predetermined in the voice recognition sample library; the expected samples corresponding to the expected answer are divided into at least two sample groups, each containing at least one expected sample; the user's voice signal is collected in the interactive state; and the voice signal is matched with the expected samples in one of the at least two sample groups.
  • because the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one group of expected samples, the matching range is narrowed to a predictably small one, which not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
  • FIG. 1 is a flow chart showing a voice recognition method of an interactive system according to an embodiment of the present invention
  • FIG. 2 is a block diagram showing a structure of a voice recognition apparatus of an interactive system according to an embodiment of the present invention
  • FIG. 3 is a structural diagram of a voice recognition apparatus of an interactive system according to still another embodiment of the present invention.
  • Figure 4 shows schematically a block diagram of a computing device for performing the method according to the invention
  • Fig. 5 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • FIG. 1 shows a flow chart of a speech recognition method of an interactive system in accordance with one embodiment of the present invention. As shown in Figure 1, the method includes:
  • Step S110: Predetermine a plurality of expected samples corresponding to the interaction state and the expected answer in the voice recognition sample library, according to the interaction state and the expected answer in the interaction state.
  • an interactive state refers to a specific interaction scenario, such as a specific question scenario in the question-and-answer system, or a scenario in the display system in which particular content is displayed.
  • the question and answer system asks: “Is the celery leaf edible?”.
  • the corresponding expected answer is “yes”.
  • according to this expected answer, the user answering "yes", "may", or "can" is correct, so the three samples "yes", "may", and "can" are selected from the speech recognition sample library as the plurality of expected samples corresponding to the interaction state.
  • Step S120: Divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample.
  • for example, still taking the question "Is the celery leaf edible?" as the interaction state, the three expected samples are divided into two groups, where "yes" forms the first sample group and "may" and "can" form the second sample group.
  • Step S130: Collect a voice signal of the user in the interactive state.
  • when the question answering system outputs a question, the user's voice signal in response to the question is collected.
  • a microphone is generally used for the acquisition of voice signals.
  • Step S140: Match the speech signal with the expected samples in one of the at least two sample groups.
  • the acquired speech signal is matched with the expected sample "yes" in the first sample group, or with the expected samples "may" and "can" in the second sample group.
  • the user's voice signal is directly matched against one group of expected samples, narrowing the matching range to a predictably small one; this not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
  • the method illustrated in FIG. 1 further includes determining that the answer given by the user is the expected answer if a matching expected sample is found in the sample group. That is, no matter which sample group the speech signal is matched against, as long as a matching expected sample is found, the answer given by the user is determined to be the expected answer, that is, the correct answer.
  • the method illustrated in FIG. 1 further comprises: if no matching expected sample is found in the sample group, matching the speech signal with the expected samples in another of the at least two sample groups. If a matching expected sample is found in the other sample group, it is determined that the answer given by the user is the expected answer. If no matching expected sample is found in the other sample group, it is determined that the user did not give the expected answer.
  • if no matching expected sample is found in the sample group selected first, another sample group is selected and the speech signal is matched with the expected samples in that second selected group; likewise, if a matching expected sample is found in the second selected group, it is determined that the user gave the correct answer; otherwise, if no matching expected sample is found in the second selected group, either it is determined that the user did not give the correct answer, or, if there are further sample groups that have not yet been selected, the speech signal is further matched with the expected samples in those groups.
  • the method further includes: calculating a matching value of the voice signal with an expected sample; if the matching value reaches a preset value, determining that the voice signal matches the expected sample; conversely, if the matching value does not reach the preset value, determining that the voice signal does not match the expected sample.
  • a preset value is set in advance; when matching is performed, the matching value between the voice signal and the expected sample is calculated, and the calculated matching value is compared with the preset value.
  • the matching value of the speech signal and the expected sample may be the similarity value of the two, that is, the similarity value of the two speech signals.
  • dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups in the method shown in FIG. 1 includes: dividing the expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply.
  • the corresponding expected answer is “yes”.
  • the three samples "yes", "may", and "can" are selected as the three expected samples corresponding to the interaction state.
  • "yes" is 100% similar to the expected answer, so it is divided into the first sample group, while "may" and "can" are divided into the second sample group because their similarity is not 100%.
  • the probability that the user answers "yes" is 70%, so it is divided into the first sample group.
  • the probabilities that the user answers "may" and "can" are 16% and 14%, respectively, so they are divided into the second sample group.
  • matching the voice signal with the expected samples in one of the at least two sample groups in the method shown in FIG. 1 includes: matching the voice signal with the expected samples in the sample group with the highest degree of similarity to the expected answer, or matching the voice signal with the sample group containing the expected answer with the highest probability of being the user's reply.
  • "yes" in the first sample group is the answer with the highest reply probability (for example, 70% of users will answer "yes"), so the speech signal is first matched with the expected sample "yes" in the first sample group.
  • matching the speech signal with the expected samples in one of the at least two sample groups in the method of FIG. 1 includes: first matching the speech signal with the highest-priority expected sample in that sample group.
  • for example, if the second sample group is selected and the probability of the user answering "may" (16%) is higher than that of "can" (14%), "may" is given the higher priority; the voice signal is first matched with the expected sample "may" and, if it does not match, then with the expected sample "can".
  • the method shown in FIG. 1 further includes: according to the historical voice signals collected in the interactive state, correspondingly expanding the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
  • the preset speech recognition sample library may not cover all the samples corresponding to the expected answer, so the speech recognition sample library or the selected sample group can be supplemented through learning.
  • for example, learning may reveal that some users answer "hmm" and "OK", which in human context also mean "yes", so these two samples can be added to the speech recognition sample library, added to the selected sample group, or placed in a new sample group. For example, "hmm" and "OK" are added to the aforementioned second sample group, or "hmm" and "OK" are divided into a third sample group.
  • before collecting a voice signal of the user in the interactive state, the method of FIG. 1 further includes: presenting the interaction state in a form combining any one or more of voice, image, and video. For example, the question, or the content to be shown, is presented in such a combined form.
  • as shown in FIG. 2, the voice recognition apparatus 200 of the interactive system includes:
  • the expected sample determining unit 210 is adapted to predetermine a plurality of expected samples corresponding to the interaction state and the expected answer in the voice recognition sample library according to the interaction state and the expected answer in the interaction state.
  • the grouping unit 220 is adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group including at least one expected sample.
  • the collecting unit 230 is adapted to collect a voice signal of the user in the interactive state.
  • the matching processing unit 240 is adapted to match the speech signal with an expected sample in one of the at least two sample groups.
  • in the apparatus shown in FIG. 2, because the expected samples corresponding to the expected answer are identified and grouped in advance, the user's voice signal is matched directly against one group of expected samples, narrowing the matching range to a predictably small one; this not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
  • FIG. 3 is a block diagram showing a structure of a voice recognition apparatus of an interactive system according to still another embodiment of the present invention.
  • the voice recognition apparatus 300 of the interactive system includes:
  • the expected sample determining unit 310 is adapted to predetermine a plurality of expected samples corresponding to the interaction state and the expected answer in the voice recognition sample library according to the interaction state and the expected answer in the interaction state.
  • the grouping unit 320 is adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group including at least one expected sample. The at least two sample groups are stored in the grouping unit 320.
  • the collecting unit 330 is adapted to collect a voice signal of the user in the interactive state.
  • the matching processing unit 340 is adapted to match the speech signal with an expected sample in one of the at least two sample groups.
  • the matching processing unit 340 is adapted to determine that the answer given by the user is the expected answer when a matching expected sample is found in the sample group. That is, no matter which sample group the speech signal is matched against, as long as a matching expected sample is found, the answer given by the user is determined to be the expected answer, that is, the correct answer.
  • the matching processing unit 340 is adapted to match the speech signal with the expected samples in another of the at least two sample groups when no matching expected sample is found in the sample group.
  • the matching processing unit 340 is adapted to determine that the answer given by the user is the expected answer when a matching expected sample is found in the other sample group.
  • the matching processing unit 340 is adapted to determine that the user did not give the expected answer when no matching expected sample is found in the other sample group.
  • if no matching expected sample is found in the sample group selected first, another sample group is selected and the speech signal is matched with the expected samples in that second selected group; likewise, if a matching expected sample is found in the second selected group, it is determined that the user gave the correct answer; otherwise, if no matching expected sample is found in the second selected group, either it is determined that the user did not give the correct answer, or, if there are further sample groups that have not yet been selected, the speech signal is further matched with the expected samples in those groups.
  • the matching processing unit 340 is configured to calculate a matching value of the voice signal with an expected sample; if the matching value reaches a preset value, it determines that the voice signal matches the expected sample, and if the matching value does not reach the preset value, it determines that the voice signal does not match the expected sample. That is, when determining whether the voice signal matches an expected sample, a preset value is set in advance; when matching is performed, the matching value between the voice signal and the expected sample is calculated and compared with the preset value.
  • the matching value of the speech signal and the expected sample may be the similarity value of the two, that is, the similarity value of the two speech signals.
  • the grouping unit 320 is adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply. For example, the expected answer corresponding to "Is the celery leaf edible?" is "yes". According to this expected answer, the three samples "yes", "may", and "can" are selected as the three expected samples corresponding to the interaction state. "yes" is 100% similar to the expected answer, so it is divided into the first sample group, while "may" and "can" are divided into the second sample group because their similarity is not 100%. For another example, the probability that the user answers "yes" is 70%, so it is divided into the first sample group, and the probabilities that the user answers "may" and "can" are 16% and 14%, respectively, so they are divided into the second sample group.
  • the matching processing unit 340 is adapted to first match the voice signal with the expected samples in the sample group of the at least two sample groups that is most similar to the expected answer, or to match the voice signal with the sample group containing the expected answer with the highest probability of being the user's reply. For example, "yes" in the first sample group is the answer with the highest reply probability (for example, 70% of users will answer "yes"), so the speech signal is first matched with the expected sample "yes" in the first sample group.
  • the matching processing unit 340 is adapted to first match the voice signal with the expected sample with the highest priority in one of the at least two sample groups. For example, if the current selection is to match against the expected samples in the second sample group, and in that group the probability of the user answering "may" is 16% while the probability of answering "can" is 14%, the priority of "may" is set higher than that of "can". The voice signal is then matched with the expected sample "may" first and, if it does not match, with the expected sample "can".
  • the apparatus 300 further includes an expansion unit 350, configured to expand, according to the historical voice signals collected in the interactive state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
  • the preset speech recognition sample library may not cover all the samples corresponding to the expected answer, so the speech recognition sample library or the selected sample group can be supplemented through learning. For example, learning may reveal that some users answer "hmm" and "OK", which in human context also mean "yes", so these two samples can be added to the speech recognition sample library, added to the selected sample group, or placed in a new sample group.
  • the apparatus 300 further includes a presentation unit 360, adapted to present the interaction state in a form combining any one or more of voice, image, and video.
  • for example, the question, or the content to be shown, is presented in a form combining one or more of voice, image, and video.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • all of the features disclosed in this specification (including the accompanying claims, the abstract, and the drawings), and all of the processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components of the speech recognition device of the interactive system in accordance with embodiments of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 4 shows a block diagram of a computing device for performing the method in accordance with the present invention.
  • the computing device conventionally includes a processor 410 and a computer program product or computer readable medium in the form of a memory 420.
  • the memory 420 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • The memory 420 has a storage space 430 for program code 431 for performing any of the method steps described above.
  • storage space 430 for program code may include various program code 431 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such computer program products are typically portable or fixed storage units as described with reference to FIG. 5.
  • the storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 420 in the computing device of FIG. 4.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit includes computer readable code 431', i.e., code readable by a processor such as 410, which, when run by a computing device, causes the computing device to perform each of the steps in the methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A speech recognition method and device for an interactive system. The method comprises: predetermining, in a speech recognition sample library, a plurality of expected samples corresponding to an interaction state and an expected answer in the interaction state, according to the interaction state and the expected answer (S110); dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample (S120); collecting a voice signal of the user in the interaction state (S130); and matching the voice signal with the expected samples in one of the at least two sample groups (S140). This technical solution can improve the speed and accuracy of speech recognition of an interactive system.

Description

Speech recognition method and device for an interactive system
Technical Field
The present invention relates to the field of speech recognition technology, and in particular to a speech recognition method and device for an interactive system.
Background
With the development of multimedia technology, various interactive systems have adopted voice interaction in order to make interaction with users more efficient and more engaging. For example, a question-and-answer system first poses a question to the user by voice or by displaying an image, and the user then answers by voice. As another example, some display systems require the user to issue voice commands to select which directories' contents to display. In these scenarios, the user's voice must be recognized accurately so that the question-and-answer system can judge whether the voice answer given by the user is correct, and the display system can determine which directories the user selected, so as to display the contents of the corresponding directories.
Accurate and fast recognition of the user's voice in an interactive system is therefore a problem in urgent need of a solution.
Summary of the Invention
In view of the above problems, the present invention is proposed in order to provide a speech recognition method and device for an interactive system that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a speech recognition method for an interactive system is provided, wherein the method comprises:
predetermining, in a speech recognition sample library, a plurality of expected samples corresponding to an interaction state and an expected answer in the interaction state, according to the interaction state and the expected answer;
dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
collecting a voice signal of the user in the interaction state;
matching the voice signal with the expected samples in one of the at least two sample groups.
According to one aspect of the present invention, a speech recognition device for an interactive system is disclosed, wherein the device comprises:
an expected sample determining unit, adapted to predetermine, in a speech recognition sample library, a plurality of expected samples corresponding to an interaction state and an expected answer in the interaction state, according to the interaction state and the expected answer;
a grouping unit, adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit, adapted to collect a voice signal of the user in the interaction state;
a matching processing unit, adapted to match the voice signal with the expected samples in one of the at least two sample groups.
According to one aspect of the present invention, a computer program is disclosed, comprising computer-readable code which, when run on a computing device, causes the computing device to perform the speech recognition method of an interactive system described above.
According to one aspect of the present invention, a computer-readable medium is disclosed, in which the computer program described above is stored.
In this technical solution of the present invention, a plurality of expected samples corresponding to the interaction state and the expected answer are predetermined in the speech recognition sample library according to the interaction state and the expected answer in the interaction state; the plurality of expected samples corresponding to the expected answer are divided into at least two sample groups, each containing at least one expected sample; the user's voice signal in the interaction state is collected; and the voice signal is matched with the expected samples in one of the at least two sample groups. Because the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one group of expected samples, the matching range is narrowed to a predictably small one, which not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of a speech recognition method of an interactive system according to an embodiment of the present invention;
FIG. 2 shows a structural diagram of a speech recognition device of an interactive system according to an embodiment of the present invention;
FIG. 3 shows a structural diagram of a speech recognition device of an interactive system according to yet another embodiment of the present invention;
FIG. 4 schematically shows a block diagram of a computing device for performing the method according to the present invention; and
FIG. 5 schematically shows a storage unit for holding or carrying program code implementing the method according to the present invention.
Detailed Description of the Embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed completely to those skilled in the art.
FIG. 1 shows a flowchart of a speech recognition method of an interactive system according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S110: predetermine, in a speech recognition sample library, a plurality of expected samples corresponding to the interaction state and the expected answer in the interaction state, according to the interaction state and the expected answer.
In this embodiment of the invention, an interaction state refers to a specific interaction scenario, such as a specific question-answering scenario in a question-and-answer system, or a scenario in a display system in which specific content is displayed.
Taking a specific question-answering scenario in a question-and-answer system as an example, the system asks: "Is the celery leaf edible?" The corresponding expected answer is "yes" (是). Given this expected answer, the user answering "yes" (是), "may" (可以), or "can" (能) is all correct, so these three samples are selected from the speech recognition sample library as the plurality of expected samples corresponding to this interaction state.
Step S120: divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample.
For example, still taking the interaction state in which the question-and-answer system asks "Is the celery leaf edible?" as an example, the three corresponding expected samples are divided into two groups, where "yes" forms the first sample group and "may" and "can" form the second sample group.
Step S130: collect the voice signal of the user in the interaction state.
For example, after the question-and-answer system outputs a question, the user's voice signal in response to the question is collected. A microphone is generally used to collect the voice signal.
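As an illustrative Python sketch of this collection step (the patent does not prescribe an implementation), assuming the third-party sounddevice library and illustrative choices of sample rate and recording length:

    import sounddevice as sd  # third-party audio I/O library; an assumption, not mandated by the patent

    SAMPLE_RATE = 16000  # Hz; a common rate for speech, chosen for illustration
    DURATION = 3.0       # seconds to record; illustrative

    def collect_voice_signal():
        # Record a mono voice signal from the default microphone.
        signal = sd.rec(int(DURATION * SAMPLE_RATE),
                        samplerate=SAMPLE_RATE, channels=1)
        sd.wait()  # block until the recording is finished
        return signal.flatten()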
Step S140: match the voice signal with the expected samples in one of the at least two sample groups.
For example, the collected voice signal is matched with the expected sample "yes" in the first sample group, or with the expected samples "may" and "can" in the second sample group.
In the method shown in FIG. 1, because the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one group of expected samples, the matching range is narrowed to a predictably small one, which not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
In one embodiment of the present invention, the method shown in FIG. 1 further includes: if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as long as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e., the correct answer.
In one embodiment of the present invention, the method shown in FIG. 1 further includes: if no matching expected sample is found in the sample group, matching the voice signal with the expected samples in another of the at least two sample groups. If a matching expected sample is found in that other sample group, the answer given by the user is determined to be the expected answer. If no matching expected sample is found in that other sample group, it is determined that the user did not give the expected answer.
That is, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched with the expected samples in that second selected group; likewise, if a matching expected sample is found in the second selected group, it is determined that the user gave the correct answer; otherwise, if no matching expected sample is found in the second selected group, either it is determined that the user did not give the correct answer, or, if there are further sample groups that have not yet been selected, the voice signal is further matched with the expected samples in those groups.
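The group-by-group flow above can be sketched in Python as follows; the ordering of sample_groups and the matches predicate (for example, the threshold test sketched further below) are assumed to be supplied by the surrounding system:

    def recognize(voice_signal, sample_groups, matches):
        # sample_groups: the sample groups in selection order (first choice,
        # then fallbacks); each group is a list of expected samples.
        # matches(voice_signal, sample) -> bool is the similarity test.
        for group in sample_groups:
            for expected in group:
                if matches(voice_signal, expected):
                    return expected  # the user gave the expected (correct) answer
        return None  # no group matched: the user did not give the expected answer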
In one embodiment of the present invention, the above method further includes: calculating a matching value between the voice signal and an expected sample; if the matching value reaches a preset value, determining that the voice signal matches the expected sample; conversely, if the matching value does not reach the preset value, determining that the voice signal does not match the expected sample.
That is, when judging whether the voice signal matches an expected sample, a preset value is set in advance; when matching is performed, the matching value between the voice signal and the expected sample is calculated and then compared with the preset value. The matching value of the voice signal and the expected sample may be the similarity value of the two, i.e., the similarity value of the two speech signals.
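The threshold test can be sketched as follows, assuming the voice signal and the expected sample have already been reduced to fixed-length feature vectors (the patent does not fix a representation) and taking cosine similarity as the matching value:

    import numpy as np

    PRESET_VALUE = 0.8  # the preset threshold; the value is illustrative

    def matching_value(signal_features, sample_features):
        # Similarity of the two speech signals, here as cosine similarity
        # of their feature vectors.
        a = np.asarray(signal_features, dtype=float)
        b = np.asarray(sample_features, dtype=float)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def matches(signal_features, sample_features):
        # The voice signal matches the expected sample only if the matching
        # value reaches the preset value.
        return matching_value(signal_features, sample_features) >= PRESET_VALUE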
In one embodiment of the present invention, dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups in the method shown in FIG. 1 includes: dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply.
For example, the expected answer corresponding to "Is the celery leaf edible?" is "yes". Based on this expected answer, the three samples "yes", "may", and "can" are selected as the three expected samples corresponding to the interaction state. "Yes" is 100% similar to the expected answer and is therefore divided into the first sample group, while "may" and "can", whose similarity is not 100%, are divided into the second sample group. As another example, the probability that the user answers "yes" is 70%, so it is divided into the first sample group, while the probabilities that the user answers "may" and "can" are 16% and 14% respectively, so they are divided into the second sample group.
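Both grouping criteria can be sketched as follows, using the probability figures from the example; the sub-100% similarity values are assumptions made for the sketch:

    # Expected samples for the expected answer "yes" (是); the 0.6
    # similarities are assumed stand-ins for "not 100% similar".
    SAMPLES = [
        {"text": "是",   "similarity": 1.0, "probability": 0.70},
        {"text": "可以", "similarity": 0.6, "probability": 0.16},
        {"text": "能",   "similarity": 0.6, "probability": 0.14},
    ]

    def group_by_similarity(samples):
        # Samples 100% similar to the expected answer form the first group;
        # all other samples form the second group.
        first = [s for s in samples if s["similarity"] == 1.0]
        second = [s for s in samples if s["similarity"] < 1.0]
        return [first, second]

    def group_by_probability(samples, threshold=0.5):
        # Samples the user is most likely to reply with form the first group.
        first = [s for s in samples if s["probability"] >= threshold]
        second = [s for s in samples if s["probability"] < threshold]
        return [first, second]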
In one embodiment of the present invention, matching the voice signal with the expected samples in one of the at least two sample groups in the method shown in FIG. 1 includes: matching the voice signal with the expected samples in the one of the at least two sample groups with the highest degree of similarity to the expected answer, or matching the voice signal with the one of the at least two sample groups containing the expected answer with the highest probability of being the user's reply.
For example, "yes" in the first sample group is the answer with the highest reply probability (for instance, 70% of users will answer "yes"), so the voice signal is first matched with the expected sample "yes" in the first sample group.
In one embodiment of the present invention, matching the voice signal with the expected samples in one of the at least two sample groups in the method shown in FIG. 1 includes: matching the voice signal with the expected sample with the highest priority in that sample group.
For example, suppose the current selection is to match the voice signal with the expected samples in the aforementioned second sample group, and in that group the probability of the user answering "may" is 16% while the probability of answering "can" is 14%; the priority of "may" is therefore set higher than that of "can". The voice signal is then matched with the expected sample "may" first and, if it does not match, with the expected sample "can".
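Priority matching within one group can be sketched as follows, reusing the dictionary representation from the grouping sketch above and deriving each sample's priority from its reply probability, as in the example:

    def match_group_by_priority(voice_signal, group, matches):
        # Try the expected samples of one group in descending priority,
        # e.g. "may" (16%) before "can" (14%).
        ordered = sorted(group, key=lambda s: s["probability"], reverse=True)
        for expected in ordered:
            if matches(voice_signal, expected):
                return expected
        return None  # no expected sample in this group matched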
In one embodiment of the present invention, the method shown in FIG. 1 further includes: according to the historical voice signals collected in this interaction state, correspondingly expanding the number of sample groups corresponding to the expected answer, or correspondingly expanding the number of expected samples contained in a sample group corresponding to the expected answer, or correspondingly expanding the number of samples in the speech recognition sample library.
This is because the preset speech recognition sample library may not cover all the samples corresponding to the expected answer, so the speech recognition sample library, or a sample group that has already been selected, can be supplemented through learning. For example, learning may reveal that some users answer "hmm" (嗯) or "OK", which in human context also mean "yes"; these two samples can therefore be added to the speech recognition sample library, added to a sample group that has already been selected, or placed in a new sample group. For example, "hmm" and "OK" are added to the aforementioned second sample group, or "hmm" and "OK" are divided into a third sample group.
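Such expansion can be sketched as follows, assuming the historical voice signals have already been transcribed to answer strings and using an assumed frequency threshold as the learning criterion:

    from collections import Counter

    def expand_samples(historical_answers, known_samples, min_count=5):
        # historical_answers: answers recognized from historical voice
        # signals collected in this interaction state.
        # Recurring answers such as "hmm" (嗯) or "OK" that are not yet in
        # the library become new expected samples once they have been seen
        # at least min_count times.
        counts = Counter(historical_answers)
        new_samples = [answer for answer, n in counts.items()
                       if n >= min_count and answer not in known_samples]
        return list(known_samples) + new_samples

The new samples could then be appended to an existing sample group or placed in a new group, as described above.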
In one embodiment of the present invention, before collecting the voice signal of the user in the interaction state, the method shown in FIG. 1 further includes: presenting the interaction state in a form combining any one or more of voice, image, and video. For example, the question, or the content to be displayed, is presented in a form combining any one or more of voice, image, and video.
FIG. 2 shows a structural diagram of a speech recognition device of an interactive system according to an embodiment of the present invention. As shown in FIG. 2, the speech recognition device 200 of the interactive system includes:
an expected sample determining unit 210, adapted to predetermine, in a speech recognition sample library, a plurality of expected samples corresponding to the interaction state and the expected answer in the interaction state, according to the interaction state and the expected answer;
a grouping unit 220, adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit 230, adapted to collect the voice signal of the user in the interaction state;
a matching processing unit 240, adapted to match the voice signal with the expected samples in one of the at least two sample groups.
In the device shown in FIG. 2, because the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one group of expected samples, the matching range is narrowed to a predictably small one, which not only improves the speed of speech recognition of the interactive system but also improves its accuracy.
FIG. 3 shows a structural diagram of a speech recognition device of an interactive system according to yet another embodiment of the present invention. As shown in FIG. 3, the speech recognition device 300 of the interactive system includes:
an expected sample determining unit 310, adapted to predetermine, in a speech recognition sample library, a plurality of expected samples corresponding to the interaction state and the expected answer in the interaction state, according to the interaction state and the expected answer;
a grouping unit 320, adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample, the at least two sample groups being stored in the grouping unit 320;
a collecting unit 330, adapted to collect the voice signal of the user in the interaction state;
a matching processing unit 340, adapted to match the voice signal with the expected samples in one of the at least two sample groups.
In one embodiment of the present invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as long as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e., the correct answer.
In one embodiment of the present invention, the matching processing unit 340 is adapted to match, when no matching expected sample is found in the sample group, the voice signal with the expected samples in another of the at least two sample groups.
In one embodiment of the present invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in that other sample group, that the answer given by the user is the expected answer.
In one embodiment of the present invention, the matching processing unit 340 is adapted to determine, when no matching expected sample is found in that other sample group, that the user did not give the expected answer.
That is, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched with the expected samples in that second selected group; likewise, if a matching expected sample is found in the second selected group, it is determined that the user gave the correct answer; otherwise, if no matching expected sample is found in the second selected group, either it is determined that the user did not give the correct answer, or, if there are further sample groups that have not yet been selected, the voice signal is further matched with the expected samples in those groups.
In one embodiment of the present invention, the matching processing unit 340 is adapted to calculate a matching value between the voice signal and an expected sample; if the matching value reaches a preset value, the voice signal is determined to match the expected sample; conversely, if the matching value does not reach the preset value, the voice signal is determined not to match the expected sample. That is, when judging whether the voice signal matches an expected sample, a preset value is set in advance; when matching is performed, the matching value between the voice signal and the expected sample is calculated and then compared with the preset value. The matching value of the voice signal and the expected sample may be the similarity value of the two, i.e., the similarity value of the two speech signals.
In one embodiment of the present invention, the grouping unit 320 is adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply. For example, the expected answer corresponding to "Is the celery leaf edible?" is "yes". Based on this expected answer, the three samples "yes", "may", and "can" are selected as the three expected samples corresponding to the interaction state. "Yes" is 100% similar to the expected answer and is therefore divided into the first sample group, while "may" and "can", whose similarity is not 100%, are divided into the second sample group. As another example, the probability that the user answers "yes" is 70%, so it is divided into the first sample group, while the probabilities that the user answers "may" and "can" are 16% and 14% respectively, so they are divided into the second sample group.
In one embodiment of the present invention, the matching processing unit 340 is adapted to first match the voice signal with the expected samples in the one of the at least two sample groups with the highest degree of similarity to the expected answer, or to match the voice signal with the one of the at least two sample groups containing the expected answer with the highest probability of being the user's reply. For example, "yes" in the first sample group is the answer with the highest reply probability (for instance, 70% of users will answer "yes"), so the voice signal is first matched with the expected sample "yes" in the first sample group.
In one embodiment of the present invention, the matching processing unit 340 is adapted to first match the voice signal with the expected sample with the highest priority in one of the at least two sample groups. For example, suppose the current selection is to match the voice signal with the expected samples in the aforementioned second sample group, and in that group the probability of the user answering "may" is 16% while the probability of answering "can" is 14%; the priority of "may" is therefore set higher than that of "can". The voice signal is then matched with the expected sample "may" first and, if it does not match, with the expected sample "can".
In one embodiment of the present invention, the device 300 further includes: an expansion unit 350, adapted to expand, according to the historical voice signals collected in this interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library. This is because the preset speech recognition sample library may not cover all the samples corresponding to the expected answer, so the speech recognition sample library, or a sample group that has already been selected, can be supplemented through learning. For example, learning may reveal that some users answer "hmm" (嗯) or "OK", which in human context also mean "yes"; these two samples can therefore be added to the speech recognition sample library, added to a sample group that has already been selected, or placed in a new sample group. For example, "hmm" and "OK" are added to the aforementioned second sample group, or "hmm" and "OK" are divided into a third sample group.
In one embodiment of the present invention, the device 300 further includes: a presentation unit 360, adapted to present the interaction state in a form combining any one or more of voice, image, and video. For example, the question, or the content to be displayed, is presented in a form combining any one or more of voice, image, and video.
It should be noted that:
The algorithms and displays provided here are not inherently related to any particular computer, virtual device, or other apparatus. Various general-purpose devices may also be used with the teachings herein. The structure required to construct such devices is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the contents of the present invention described here can be implemented using a variety of programming languages, and that the descriptions above of specific languages are made in order to disclose the best mode of the present invention.
The specification provided here sets forth a large number of specific details. It will be understood, however, that embodiments of the present invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that in the above description of exemplary embodiments of the present invention, in order to streamline the present disclosure and to aid understanding of one or more of the various inventive aspects, various features of the present invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except insofar as at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features included in other embodiments rather than others, combinations of features of different embodiments are within the scope of the present invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the speech recognition device of an interactive system according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described here. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 4 shows a block diagram of a computing device for performing the method according to the present invention. The computing device conventionally includes a processor 410 and a computer program product or computer-readable medium in the form of a memory 420. The memory 420 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 420 has a storage space 430 for program code 431 for performing any of the method steps described above. For example, the storage space 430 for program code may include individual program codes 431 for implementing the various steps of the above method respectively. These program codes may be read from, or written into, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 5. The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 420 in the computing device of FIG. 4. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 431', i.e., code readable by a processor such as 410, which, when run by a computing device, causes the computing device to perform the steps of the method described above.
"One embodiment", "an embodiment", or "one or more embodiments" as used here means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. In addition, note that instances of the phrase "in one embodiment" here do not necessarily all refer to the same embodiment.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
In addition, it should also be noted that the language used in this specification has been selected mainly for purposes of readability and teaching, not to explain or limit the subject matter of the present invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present invention is illustrative rather than restrictive with respect to the scope of the present invention, which is defined by the appended claims.

Claims (24)

  1. A speech recognition method for an interactive system, wherein the method comprises:
    predetermining, in a speech recognition sample library, a plurality of expected samples corresponding to an interaction state and an expected answer in the interaction state, according to the interaction state and the expected answer;
    dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
    collecting a voice signal of the user in the interaction state;
    matching the voice signal with the expected samples in one of the at least two sample groups.
  2. The method according to claim 1, wherein the method further comprises:
    if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer.
  3. The method according to claim 1, wherein the method further comprises:
    if no matching expected sample is found in the sample group, matching the voice signal with the expected samples in another of the at least two sample groups.
  4. The method according to claim 3, wherein the method further comprises:
    if a matching expected sample is found in the other sample group, determining that the answer given by the user is the expected answer.
  5. The method according to claim 3, wherein the method further comprises:
    if no matching expected sample is found in the other sample group, determining that the user did not give the expected answer.
  6. The method according to any one of claims 1-5, wherein the method further comprises:
    calculating a matching value between the voice signal and an expected sample; if the matching value reaches a preset value, determining that the voice signal matches the expected sample; and conversely, if the matching value does not reach the preset value, determining that the voice signal does not match the expected sample.
  7. The method according to claim 1, wherein dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups comprises:
    dividing the plurality of expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply.
  8. The method according to claim 7, wherein matching the voice signal with the expected samples in one of the at least two sample groups comprises:
    matching the voice signal with the expected samples in the one of the at least two sample groups with the highest degree of similarity to the expected answer, or matching the voice signal with the one of the at least two sample groups containing the expected answer with the highest probability of being the user's reply.
  9. The method according to claim 1, wherein matching the voice signal with the expected samples in one of the at least two sample groups comprises:
    matching the voice signal with the expected sample with the highest priority in one of the at least two sample groups.
  10. The method according to claim 1, wherein the method further comprises:
    according to the collected historical voice signals in the interaction state, correspondingly expanding the number of sample groups corresponding to the expected answer, or correspondingly expanding the number of expected samples contained in a sample group corresponding to the expected answer, or correspondingly expanding the number of samples in the speech recognition sample library.
  11. The method according to claim 1, wherein before collecting the voice signal of the user in the interaction state, the method further comprises:
    presenting the interaction state in a form combining any one or more of voice, image, and video.
  12. A speech recognition device for an interactive system, wherein the device comprises:
    an expected sample determining unit, adapted to predetermine, in a speech recognition sample library, a plurality of expected samples corresponding to an interaction state and an expected answer in the interaction state, according to the interaction state and the expected answer;
    a grouping unit, adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
    a collecting unit, adapted to collect a voice signal of the user in the interaction state;
    a matching processing unit, adapted to match the voice signal with the expected samples in one of the at least two sample groups.
  13. The device according to claim 12, wherein
    the matching processing unit is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer.
  14. The device according to claim 12, wherein
    the matching processing unit is adapted to match, when no matching expected sample is found in the sample group, the voice signal with the expected samples in another of the at least two sample groups.
  15. The device according to claim 14, wherein
    the matching processing unit is adapted to determine, when a matching expected sample is found in the other sample group, that the answer given by the user is the expected answer.
  16. The device according to claim 14, wherein
    the matching processing unit is adapted to determine, when no matching expected sample is found in the other sample group, that the user did not give the expected answer.
  17. The device according to any one of claims 13-16, wherein
    the matching processing unit is adapted to calculate a matching value between the voice signal and an expected sample; if the matching value reaches a preset value, the voice signal is determined to match the expected sample; and conversely, if the matching value does not reach the preset value, the voice signal is determined not to match the expected sample.
  18. The device according to claim 12, wherein
    the grouping unit is adapted to divide the plurality of expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability of each expected answer being the user's reply.
  19. The device according to claim 18, wherein
    the matching processing unit is adapted to first match the voice signal with the expected samples in the one of the at least two sample groups with the highest degree of similarity to the expected answer, or to match the voice signal with the one of the at least two sample groups containing the expected answer with the highest probability of being the user's reply.
  20. The device according to claim 12, wherein
    the matching processing unit is adapted to first match the voice signal with the expected sample with the highest priority in one of the at least two sample groups.
  21. The device according to claim 12, wherein the device further comprises:
    an expansion unit, adapted to expand, according to the collected historical voice signals in the interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
  22. The device according to claim 12, wherein the device further comprises:
    a presentation unit, adapted to present the interaction state in a form combining any one or more of voice, image, and video.
  23. A computer program comprising computer-readable code which, when run on a computing device, causes the computing device to perform the speech recognition method of an interactive system according to any one of claims 1-11.
  24. A computer-readable medium in which the computer program according to claim 23 is stored.
PCT/CN2016/092412 2015-07-31 2016-07-29 Speech recognition method and device for an interactive system WO2017020794A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510463527.2A 2015-07-31 2015-07-31 Speech recognition method and device for an interactive system
CN201510463527.2 2015-07-31

Publications (1)

Publication Number Publication Date
WO2017020794A1 (zh)

Family

ID=54801931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/092412 WO2017020794A1 (zh) 2015-07-31 2016-07-29 Speech recognition method and device for an interactive system

Country Status (2)

Country Link
CN (1) CN105161098A (zh)
WO (1) WO2017020794A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105161098A (zh) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and device for an interactive system
CN105771234A (zh) * 2016-04-02 2016-07-20 深圳市熙龙玩具有限公司 Riddle-guessing toy machine and implementation method thereof
CN113870635A (zh) * 2019-10-25 2021-12-31 北京猿力教育科技有限公司 Voice question answering method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1457966A1 * 2003-02-27 2004-09-15 Siemens Aktiengesellschaft Method for determining the risk of confusion between vocabulary entries in phoneme-based speech recognition
CN102881284A * 2012-09-03 2013-01-16 江苏大学 Speaker-independent speech emotion recognition method and system
CN103794214A * 2014-03-07 2014-05-14 联想(北京)有限公司 Information processing method, device, and electronic apparatus
CN104115221A * 2012-02-17 2014-10-22 微软公司 Audio human interactive proof based on text-to-speech and semantics
CN104809103A * 2015-04-29 2015-07-29 北京京东尚科信息技术有限公司 Semantic analysis method and system for human-machine dialogue
CN105161098A * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and device for an interactive system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1575031A3 (en) * 2002-05-15 2010-08-11 Pioneer Corporation Voice recognition apparatus
JP5633638B2 (ja) * 2011-03-18 2014-12-03 富士通株式会社 Call evaluation device and call evaluation method
CN103021403A (zh) * 2012-12-31 2013-04-03 威盛电子股份有限公司 Speech-recognition-based selection method, mobile terminal device, and information system
CN104424290A (zh) * 2013-09-02 2015-03-18 佳能株式会社 Voice-based question answering system and method for an interactive voice system
CN104021786B (zh) * 2014-05-15 2017-05-24 北京中科汇联信息技术有限公司 Speech recognition method and device
CN104064062A (zh) * 2014-06-23 2014-09-24 中国石油大学(华东) Online listening learning method and system based on voiceprint and speech recognition


Also Published As

Publication number Publication date
CN105161098A (zh) 2015-12-16

Similar Documents

Publication Publication Date Title
CN106683663B Neural network training apparatus and method, and speech recognition apparatus and method
CN106658129B Emotion-based terminal control method and apparatus, and terminal
CN108665742B Method and device for reading by means of a reading device
WO2016015621A1 Method and system for recognizing person names in face pictures
US9928831B2 (en) Speech data recognition method, apparatus, and server for distinguishing regional accent
CN109754783B Method and apparatus for determining the boundaries of audio sentences
CN103839545A Apparatus and method for building a multilingual acoustic model
CN105488227A Electronic device and method for processing audio files based on voiceprint features
CN109376264A Audio detection method, apparatus, and device, and computer-readable storage medium
WO2017020794A1 Speech recognition method and device for an interactive system
CN110837586B Question-answer matching method, system, server, and storage medium
CN109102824B Speech error correction method and device based on human-computer interaction
US20150088513A1 (en) Sound processing system and related method
WO2017107843A1 Method and device for processing periodic tasks, computer program, and readable medium
WO2016029799A1 Information search method and device, and electronic device
Sonderegger Phonetic and phonological dynamics on reality television
CN109979450A Information processing method and device, and electronic device
CN107977394A Picture book recognition method and electronic device
RU2015152415A Multimodal search response
CN111081117A Writing detection method and electronic device
CN107729491B Method, device, and equipment for improving the accuracy of question answer searches
CN110175242B Human-computer interaction association method, device, and medium based on a knowledge graph
WO2017157067A1 Page turning method and device for an electronic book
WO2017128303A1 Property listing search method and system for a real-estate website
WO2016155626A1 Apparatus, system, and method for implementing search prompts

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16832282

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16832282

Country of ref document: EP

Kind code of ref document: A1