WO2021135561A1 - Skill voice wake-up method and device - Google Patents

Skill voice wake-up method and device

Info

Publication number
WO2021135561A1
Authority
WO
WIPO (PCT)
Prior art keywords
business
wake
skill
information
knowledge
Application number
PCT/CN2020/123643
Other languages
English (en)
French (fr)
Inventor
朱成亚
Original Assignee
思必驰科技股份有限公司
Application filed by 思必驰科技股份有限公司
Priority to EP20909040.6A (EP4086892A4)
Priority to US17/758,075 (US11721328B2)
Priority to JP2022540758A (JP7436077B2)
Publication of WO2021135561A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2455 Query execution
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/279 Recognition of textual entities
            • G06F40/30 Semantic analysis
              • G06F40/35 Discourse or dialogue representation
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
              • G10L15/063 Training
            • G10L15/08 Speech classification or search
              • G10L15/18 Speech classification or search using natural language modelling
                • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
                • G10L15/1822 Parsing for meaning understanding
              • G10L2015/088 Word spotting
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L2015/223 Execution procedure of a spoken command
            • G10L15/26 Speech to text systems
            • G10L15/28 Constructional details of speech recognition systems
              • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
              • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The invention belongs to the field of Internet technology and relates in particular to a method and device for waking up skills by voice.
  • Voice wake-up technology has developed significantly in smart devices such as smart-home products.
  • Smart devices host both knowledge skills and business skills. Knowledge skills provide users of smart devices with knowledge question-answering services, for example answering individual user questions.
  • Business skills provide users of smart devices with business services, such as music, taxi-hailing and weather-query services.
  • When a typical smart device recognizes the user's speech, it cannot tell whether the speech is intended to wake a business skill or a knowledge skill, which leads to the wrong skill being called.
  • For example, when a smart speaker receives the voice message "Who is Li Chen's mother", waking the music skill plays the song "Who Is Mom" by the singer Li Chen.
  • Waking the knowledge skill instead broadcasts the answer "Li Chen's mother is XXX".
  • The embodiments of the present invention provide a skill voice wake-up method and device for solving at least one of the above technical problems.
  • An embodiment of the present invention provides a skill voice wake-up method applied to an electronic device. The method includes: recognizing wake-up text information corresponding to a voice request message to be processed; calling a business skill semantic model to determine the target business field corresponding to the wake-up text information and a corresponding first confidence level, and calling a knowledge skill semantic model to determine the knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence level; and selecting, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up.
  • An embodiment of the present invention provides a skill voice wake-up device applied to an electronic device. The device includes: a voice recognition unit configured to recognize wake-up text information corresponding to a voice request message to be processed; a model calling unit configured to call the business skill semantic model to determine the target business field corresponding to the wake-up text information and the corresponding first confidence level, and to call the knowledge skill semantic model to determine the knowledge reply answer corresponding to the wake-up text information and the corresponding second confidence level; and a skill wake-up unit configured to select, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up.
  • An embodiment of the present invention provides an electronic device including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the foregoing method.
  • An embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the foregoing method.
  • The beneficial effect of the embodiments of the present invention is that, when a voice request message is received, the business skill semantic model and the knowledge skill semantic model determine the corresponding business field and reply answer in parallel and each output a confidence level, according to which either the knowledge skill or the target business skill is woken. Comparing how well the voice message matches the business skill and the knowledge skill in this way reduces the probability of waking the wrong skill.
  • Fig. 1 shows a flowchart of an example of a skill voice wake-up method according to an embodiment of the present invention
  • Fig. 2 shows a flowchart of an example of operations performed by invoking a business skill semantic model according to an embodiment of the present invention
  • Fig. 3 shows a flowchart of an example of an operation of determining service relevance information according to an embodiment of the present invention
  • Fig. 4 shows a principle flow chart of an example of a method for awakening music skills by voice according to an embodiment of the present invention.
  • Fig. 5 shows a structural block diagram of an example of a skill voice wake-up device according to an embodiment of the present invention.
  • The invention may be described in the general context of computer-executable instructions, such as program modules, executed by a computer.
  • Generally, program modules include routines, programs, objects, elements, data structures and the like that perform specific tasks or implement specific abstract data types.
  • The present invention can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network.
  • In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
  • Terms such as "module" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution.
  • An element can be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer.
  • An application or script running on a server, as well as the server itself, can be an element.
  • One or more elements can reside within a process and/or thread of execution; an element can be localized on one computer and/or distributed between two or more computers, and can run from various computer-readable media.
  • Elements can also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from an element that interacts with another element in a local or distributed system, and/or with other systems over a network such as the Internet.
  • The present invention provides a skill voice wake-up method and device that can be applied to an electronic device, which can be a terminal device or a server.
  • The terminal device can be any electronic device with human-machine voice interaction capability, such as a smart speaker, in-car head unit, smart TV, smartphone, tablet computer or smartwatch, which is not limited by the present invention.
  • The server can be a server device that provides terminal devices with technical support for human-machine voice interaction.
  • Fig. 1 shows a flowchart of an example of a skill voice wake-up method according to an embodiment of the present invention.
  • In step 110, the electronic device recognizes the wake-up text information corresponding to the voice request message to be processed.
  • The voice request message may be captured by the microphone of a smart voice device.
  • Various speech recognition techniques can be used to determine the wake-up text information corresponding to the voice request message, and no limitation is imposed here.
  • In step 120, the electronic device calls the business skill semantic model to determine the target business field corresponding to the wake-up text information and the corresponding first confidence level, and calls the knowledge skill semantic model to determine the knowledge reply answer corresponding to the wake-up text information and the corresponding second confidence level.
  • The business skill semantic model and the knowledge skill semantic model can be called in parallel so that they predict simultaneously and each output a prediction result and a confidence level.
  • The business skill semantic model can be semantically trained on a business-field label set.
  • The knowledge skill semantic model can be semantically trained on a knowledge question-answering label set; various training methods can be used, and no limitation is imposed here.
  • In step 130, the electronic device selects, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up. For example, when the first confidence level is greater than the second, the target business skill can be woken, and when the first confidence level is less than or equal to the second, the knowledge skill can be woken.
  • A confidence level can represent a prediction probability with a value between 0 and 1. This embodiment therefore compares the predicted probabilities of the user's voice intent for the music skill and the knowledge skill, reducing the probability of a skill being woken by mistake.
  • Fig. 2 shows a flowchart of an example of an operation performed by invoking a business skill semantic model according to an embodiment of the present invention.
  • In step 210, the electronic device extracts the wake-up business keyword and the wake-up business entity information from the wake-up text information.
  • The wake-up business keyword may be a keyword with a business attribute in the wake-up text information.
  • The wake-up business entity information may be a word with an entity attribute in the wake-up text information.
  • For example, the wake-up business keyword may be "Unforgettable Tonight" and the wake-up business entity information may be the singer "Li Guyi".
  • Various keyword extraction models can be used to extract the wake-up business keyword and wake-up business entity information, and no limitation is imposed here.
  • In step 220, the electronic device determines whether the business entity database contains the wake-up business keyword and the corresponding wake-up business entity information.
  • The business entity database includes multiple business keywords and the corresponding business entity information, and one business keyword can correspond to multiple pieces of business entity information.
  • Continuing the example, the business entity database stores several singers of "Unforgettable Tonight", such as "Li Guyi", "Dong Wenhua" and "Zhang Ye".
  • The data in the business entity database can be collected and configured in advance and reflects the relationship between business keywords and business entities, for example which singers have sung a song with a given title, or which actors have performed in a film with a given name.
  • If the business entity database contains the wake-up business keyword and the corresponding wake-up business entity information in step 220, the method jumps to step 231; otherwise, it jumps to step 232.
  • In step 231, the electronic device provides the wake-up business keyword to the business skill semantic model to determine the target business field and the corresponding first confidence level.
  • For example, if the wake-up business keyword and the corresponding wake-up business entity information are "Unforgettable Tonight" and "Li Guyi" respectively, "Unforgettable Tonight" can be provided directly to the business skill semantic model for prediction.
  • In step 232, the electronic device obtains the business relevance information corresponding to the wake-up business keyword.
  • The wake-up business keyword can be provided to a business relevance analysis tool, from which the corresponding business relevance information is obtained.
  • The business relevance information can also be determined by analyzing, for example, the popularity information corresponding to the wake-up business keyword, as detailed below.
  • In step 240, the electronic device provides the wake-up business keyword and the business relevance information to the business skill semantic model to determine the target business field and the corresponding first confidence level.
  • The input to the business skill semantic model then includes the business relevance information in addition to the wake-up business keyword; that is, the business relevance information influences the semantic prediction of the business skill, so that the first confidence level determined for the target business field is more accurate.
  • Fig. 3 shows a flowchart of an example of the operation of determining service relevance information according to an embodiment of the present invention.
  • In step 310, the electronic device uses a search engine to determine the business popularity information and the search-result business relevance indicator corresponding to the wake-up business keyword.
  • The search engine is called with the wake-up business keyword, and either the business popularity information and the search-result business relevance indicator are obtained from it directly, or search results are obtained from it and analyzed to determine them.
  • The search-result business relevance indicator reflects how relevant the search results returned by the search engine are to the business.
  • In some implementations, the search results corresponding to the wake-up business keyword are determined with the search engine.
  • Based on a preconfigured search-result evaluation strategy, the search-result business relevance indicator corresponding to those search results is then determined.
  • For example, a predetermined number (for example, 10) of top-ranked search results can be evaluated for relevance to the target business field.
  • The search-result business relevance indicator can have multiple levels, such as strong, general or weak relevance.
  • The wake-up business keyword can be provided to the search engine to determine a corresponding first search result.
  • The wake-up business keyword together with the business name corresponding to the target business field can be provided to the search engine to determine a corresponding second search result.
  • The first search result and the second search result can then be evaluated to determine the corresponding search-result business relevance indicator, for example by jointly considering how relevant the first and second search results are to the business.
  • In step 320, the electronic device determines the business relevance information based on the business popularity information and the search-result business relevance indicator.
  • The business relevance information may include the business popularity information and the search-result business relevance indicator, or the two may be weighted when forming the business relevance information.
  • Fig. 4 shows a principle flowchart of an example of a music skill voice wake-up method according to an embodiment of the present invention.
  • The popularity information and search information of music song names can be obtained with a web crawler, so that during semantic analysis the song name carries popularity information, search information and the like, together with confidence information.
  • To obtain the search information, the "song name" can be entered directly into a search engine to check whether the first entry is a music entry; if it is not, the search engine is queried with "song name + 'song'" (for example, "song Kiss Goodbye") and the first entry is again checked for being a music entry. The search information can therefore take several search-result relevance states.
  • A self-built music knowledge base allows all singers of a given song name to be looked up.
  • If semantic analysis finds that the semantic slots contain only a song name and an artist name, the music knowledge base is checked for a match. If they match, the confidence of the task-based skill is compared with the confidence of the knowledge-based skill. If they do not match, a new confidence is recalculated by combining the song popularity information, search information and confidence information, and the knowledge skill confidence and task skill confidence are then compared again.
  • Task-based skill semantic analysis and knowledge-based skills are scheduled in parallel. The task-based skills return semantic analysis results for multiple fields (including semantic slot information and confidence information, plus popularity and search information when the slot is a song name), and the knowledge-based skill returns the answer result, its confidence and similar information.
  • The task-based semantic analysis results and the knowledge-based skill results are passed to a scheduling fusion module.
  • If the song name and artist name do not match, the task-based skill analysis confidence is recalculated (combining search information, popularity information and confidence information), and the fusion module is then called to select between the task type and the knowledge type. If the task type is selected, one of multiple task-based skills is chosen by the fusion algorithm; if the knowledge type is selected, the response is organized according to the protocol and returned directly.
  • The recalculated task-based skill analysis confidence combines search information, popularity information and confidence information.
  • Table 1 shows experimental data before and after applying the music skill voice wake-up method according to the embodiment of the present invention.
  • TP: positive class (task skill) hit as a task skill.
  • TN: negative class hit as a knowledge skill.
  • FP: negative class recognized as positive class.
  • FN: positive class recognized as negative class.
  • Recall = TP / (TP + FN); precision = TP / (TP + FP); accuracy = (TP + TN) / (TP + FP + TN + FN); F value = 2 × precision × recall / (precision + recall).
  • Fig. 5 shows a structural block diagram of an example of a skill voice wake-up device according to an embodiment of the present invention.
  • the skill voice awakening device 500 includes a voice recognition unit 510, a model calling unit 520 and a skill awakening unit 530.
  • the voice recognition unit 510 is configured to recognize wake-up text information corresponding to the voice request message to be processed.
  • the operation of the voice recognition unit 510 can refer to the description above with reference to step 110 in FIG. 1.
  • The model calling unit 520 is configured to call the business skill semantic model to determine the target business field corresponding to the wake-up text information and the corresponding first confidence level, and to call the knowledge skill semantic model to determine the knowledge reply answer corresponding to the wake-up text information and the corresponding second confidence level.
  • the operation of the model calling unit 520 may refer to the description above with reference to step 120 in FIG. 1.
  • the skill awakening unit 530 is configured to select one of the awakening knowledge skill and the target business skill corresponding to the target business field according to the first confidence level and the second confidence level.
  • the operation of the skill awakening unit 530 may refer to the description above with reference to step 130 in FIG. 1.
  • the device in the foregoing embodiment of the present invention can be used to execute the corresponding method embodiment of the present invention, and correspondingly achieve the technical effects achieved by the foregoing method embodiment of the present invention, which will not be repeated here.
  • a hardware processor (hardware processor) may be used to implement related functional modules.
  • an embodiment of the present invention provides a storage medium on which a computer program is stored, and the program is executed by a processor to perform the steps of the above skill voice wake-up method.
  • the electronic devices of the embodiments of the present invention exist in various forms, including but not limited to:
  • Mobile communication devices: this type of device has mobile communication functions, and its main goal is to provide voice and data communication.
  • Such terminals include smartphones (such as iPhone), multimedia phones, feature phones and low-end phones.
  • Ultra-mobile personal computer devices: this type of device belongs to the category of personal computers, has computing and processing functions, and generally also supports mobile Internet access.
  • Such terminals include PDA, MID and UMPC devices, such as iPad.
  • Portable entertainment devices: this type of device can display and play multimedia content.
  • Such devices include audio and video players (such as iPod), handheld game consoles, e-book readers, smart toys and portable in-car navigation devices.
  • The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • Each implementation can be realized by software plus a general-purpose hardware platform, or, of course, by hardware.
  • The essence of the above technical solution, or the part of it that contributes over the related art, can be embodied as a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or a CD-ROM, including instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in parts of them.
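The recall, precision, accuracy and F value used for Table 1 can be computed from the four confusion counts as in the short Python sketch below. The counts in the usage line are made-up illustration values, not the Table 1 data, and reporting an undefined ratio (zero denominator) as 0.0 is an assumption.

```python
from dataclasses import dataclass


def _safe_div(num: float, den: float) -> float:
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


@dataclass
class Confusion:
    tp: int  # positive class (task skill) hit as a task skill
    tn: int  # negative class hit as a knowledge skill
    fp: int  # negative class recognized as positive
    fn: int  # positive class recognized as negative

    def recall(self) -> float:
        return _safe_div(self.tp, self.tp + self.fn)

    def precision(self) -> float:
        return _safe_div(self.tp, self.tp + self.fp)

    def accuracy(self) -> float:
        return _safe_div(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def f_value(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return _safe_div(2 * p * r, p + r)


if __name__ == "__main__":
    c = Confusion(tp=8, tn=5, fp=2, fn=2)  # illustrative counts only
    print(c.recall(), c.precision(), c.accuracy(), c.f_value())
```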

Abstract

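A skill voice wake-up method and device, applied to an electronic device. The skill voice wake-up method includes: recognizing wake-up text information corresponding to a voice request message to be processed (S110); calling a business skill semantic model to determine a target business field corresponding to the wake-up text information and a corresponding first confidence level, and calling a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence level (S120); and selecting, according to the first confidence level and the second confidence level, one of a knowledge skill and a target business skill corresponding to the target business field to wake up (S130). This reduces the probability of waking the wrong skill based on a voice message.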

Description

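Skill voice wake-up method and device
Technical Field
The present invention belongs to the field of Internet technology, and in particular relates to a skill voice wake-up method and device.
Background
With the continuous development of speech technology and artificial intelligence, voice wake-up technology has made significant progress in smart devices such as smart-home products.
At present, smart devices host knowledge skills and business skills. Knowledge skills provide users of smart devices with knowledge question-answering services, for example giving answers to individual user questions. Business skills provide users of smart devices with business services, such as music, taxi-hailing and weather-query services.
However, when a typical smart device recognizes the user's speech, it cannot determine whether the speech is intended to wake a business skill or a knowledge skill, which leads to skills being invoked by mistake. For example, when a smart speaker receives the voice message "李晨的妈妈是谁" ("Who is Li Chen's mother?"), waking the music skill plays the song "妈妈是谁" ("Who Is Mom") by the singer Li Chen, whereas waking the knowledge skill broadcasts the answer "Li Chen's mother is XXX".
The industry currently has no satisfactory solution to this problem.
Summary of the Invention
Embodiments of the present invention provide a skill voice wake-up method and device for solving at least one of the above technical problems.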
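In a first aspect, an embodiment of the present invention provides a skill voice wake-up method applied to an electronic device. The method includes: recognizing wake-up text information corresponding to a voice request message to be processed; calling a business skill semantic model to determine a target business field corresponding to the wake-up text information and a corresponding first confidence level, and calling a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence level; and selecting, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up.
In a second aspect, an embodiment of the present invention provides a skill voice wake-up device applied to an electronic device. The device includes: a voice recognition unit configured to recognize wake-up text information corresponding to a voice request message to be processed; a model calling unit configured to call a business skill semantic model to determine a target business field corresponding to the wake-up text information and a corresponding first confidence level, and to call a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence level; and a skill wake-up unit configured to select, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up.
In a third aspect, an embodiment of the present invention provides an electronic device including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the above method.
In a fourth aspect, an embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
The beneficial effect of the embodiments of the present invention is as follows: when a voice request message is received, the business skill semantic model and the knowledge skill semantic model determine the corresponding business field and reply answer in parallel and output the corresponding confidence levels, according to which the knowledge skill or the target business skill can be woken. Comparing how well the voice message matches the business skill and the knowledge skill in this way reduces the probability of waking the wrong skill.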
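Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an example of a skill voice wake-up method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an example of operations performed by calling a business skill semantic model according to an embodiment of the present invention;
Fig. 3 is a flowchart of an example of the operation of determining business relevance information according to an embodiment of the present invention;
Fig. 4 is a principle flowchart of an example of a music skill voice wake-up method according to an embodiment of the present invention; and
Fig. 5 is a structural block diagram of an example of a skill voice wake-up device according to an embodiment of the present invention.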
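Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that, provided there is no conflict, the embodiments of the present invention and the features in them may be combined with each other.
The present invention may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, elements, data structures and the like that perform specific tasks or implement specific abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer. An application or script running on a server, as well as the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers and may run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from an element interacting with another element in a local or distributed system, and/or interacting with other systems through signals over a network such as the Internet.
Finally, it should also be noted that herein the terms "comprise" and "include" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element introduced by "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
To solve the problems in the prior art, the present invention provides a skill voice wake-up method and device that can be applied to an electronic device, which may be a terminal device or a server. The terminal device may be any electronic device with human-machine voice interaction capability, such as a smart speaker, in-car head unit, smart TV, smartphone, tablet computer or smartwatch, which is not limited by the present invention; the server may be a server device at a service provider that gives terminal devices the technical support needed for human-machine voice interaction.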
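Fig. 1 shows a flowchart of an example of a skill voice wake-up method according to an embodiment of the present invention.
As shown in Fig. 1, in step 110, the electronic device recognizes the wake-up text information corresponding to the voice request message to be processed. Here, the voice request message may be captured by the microphone of a smart voice device. The wake-up text information corresponding to the voice request message may be determined with any of various speech recognition techniques, and no limitation is imposed here.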
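In step 120, the electronic device calls the business skill semantic model to determine the target business field corresponding to the wake-up text information and the corresponding first confidence level, and calls the knowledge skill semantic model to determine the knowledge reply answer corresponding to the wake-up text information and the corresponding second confidence level. For example, the business skill semantic model and the knowledge skill semantic model may be called in parallel so that they predict simultaneously and each output a prediction result and a confidence level. In addition, the business skill semantic model may be semantically trained on a business-field label set and the knowledge skill semantic model on a knowledge question-answering label set; any of various training methods may be used, and no limitation is imposed here.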
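Step 120 says the two semantic models may be called in parallel. A minimal Python sketch of that invocation pattern is shown below, assuming each model is a plain callable that returns a prediction with a confidence; `predict_in_parallel`, the two dataclasses and the toy models with their scores are hypothetical placeholders, not trained models or the patent's API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class BusinessPrediction:
    domain: str        # target business field, e.g. "music"
    confidence: float  # first confidence level, assumed in [0, 1]


@dataclass
class KnowledgePrediction:
    answer: str        # knowledge reply answer
    confidence: float  # second confidence level, assumed in [0, 1]


def predict_in_parallel(
    text: str,
    business_model: Callable[[str], BusinessPrediction],
    knowledge_model: Callable[[str], KnowledgePrediction],
) -> Tuple[BusinessPrediction, KnowledgePrediction]:
    """Submit the same wake-up text to both models at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        business_future = pool.submit(business_model, text)
        knowledge_future = pool.submit(knowledge_model, text)
        return business_future.result(), knowledge_future.result()


# Hypothetical stand-ins for the trained semantic models.
def toy_business_model(text: str) -> BusinessPrediction:
    return BusinessPrediction("music", 0.35 if "who" in text.lower() else 0.9)


def toy_knowledge_model(text: str) -> KnowledgePrediction:
    return KnowledgePrediction("Li Chen's mother is XXX", 0.8 if "who" in text.lower() else 0.1)


if __name__ == "__main__":
    print(predict_in_parallel("Who is Li Chen's mother", toy_business_model, toy_knowledge_model))
```

A thread pool keeps the sketch dependency-free; a production system calling remote model services would more likely use asynchronous requests, but the pattern (submit both, then wait for both) is the same.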
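In step 130, the electronic device selects, according to the first confidence level and the second confidence level, one of the knowledge skill and the target business skill corresponding to the target business field to wake up. For example, when the first confidence level is greater than the second confidence level, the target business skill may be woken, and when the first confidence level is less than or equal to the second confidence level, the knowledge skill may be woken. Here, a confidence level may represent a prediction probability, taking a value between 0 and 1. Thus, this embodiment compares the predicted probabilities of the user's voice intent for the music skill and the knowledge skill, reducing the probability of a skill being woken by mistake.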
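The selection rule of step 130 reduces to a single comparison. The sketch below is an illustration rather than the patent's implementation: `select_skill` is a hypothetical name, ties go to the knowledge skill as the text specifies, and the range check is an added assumption based on confidences lying between 0 and 1.

```python
def select_skill(first_confidence: float, second_confidence: float) -> str:
    """Step 130: wake the business skill only if its confidence is strictly higher."""
    for value in (first_confidence, second_confidence):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {value}")
    # Equal confidences go to the knowledge skill, as described in step 130.
    return "business" if first_confidence > second_confidence else "knowledge"


if __name__ == "__main__":
    print(select_skill(0.35, 0.8))  # illustrative scores for "Who is Li Chen's mother"
```

With illustrative scores of 0.35 for the music skill and 0.8 for the knowledge skill, the knowledge skill is woken.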
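Fig. 2 shows a flowchart of an example of operations performed by calling a business skill semantic model according to an embodiment of the present invention.
As shown in Fig. 2, in step 210, the electronic device extracts the wake-up business keyword and the wake-up business entity information from the wake-up text information. Here, the wake-up business keyword may be a keyword with a business attribute in the wake-up text information, and the wake-up business entity information may be a word with an entity attribute in the wake-up text information. For example, the wake-up business keyword may be "Unforgettable Tonight" (难忘今宵) and the wake-up business entity information may be the singer Li Guyi (李谷一). Various keyword extraction models may be used to extract the wake-up business keyword and wake-up business entity information, and no limitation is imposed here.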
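The patent leaves the keyword extraction model open. As one hypothetical stand-in, a lexicon lookup can pull a song title (business keyword) and a singer name (business entity) out of the wake-up text; the lexicons, the function names and the longest-match rule below are assumptions for illustration only, not the extraction model the source has in mind.

```python
from typing import Optional, Sequence, Tuple


def longest_match(text: str, lexicon: Sequence[str]) -> Optional[str]:
    """Return the longest lexicon entry that occurs in text, or None."""
    hits = [word for word in lexicon if word and word in text]
    return max(hits, key=len) if hits else None


def extract_keyword_and_entity(
    text: str, song_lexicon: Sequence[str], singer_lexicon: Sequence[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (wake-up business keyword, wake-up business entity) found in text."""
    return longest_match(text, song_lexicon), longest_match(text, singer_lexicon)


if __name__ == "__main__":
    # "播放李谷一的难忘今宵" = "play Li Guyi's Unforgettable Tonight"
    print(extract_keyword_and_entity("播放李谷一的难忘今宵", ["难忘今宵", "吻别"], ["李谷一", "张学友"]))
```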
在步骤220中,电子设备判断业务实体数据库中是否存在唤醒业务关键词和相应的唤醒业务实体信息。这里,业务实体数据库包括多个业务关键词和相应的业务实体信息,并且一个业务关键词可以与多个业务实体信息相对应。继上面的示例,在业务实体数据库中存储了与“难忘今宵”相对应的诸如“李谷一”、“董文华”、“张也”之类的多个演唱者。另外,业务实体数据库中的数据信息可以是通过预先收集而进行配置的,其能够反映出业务关键词与业务实体之间的关系,例如同一歌名的歌曲被哪些歌手所演唱过,或同一名称的电影被哪些演员所演绎过,等等。
如果在步骤220中的业务实体数据库中存在唤醒业务关键词和相应的唤醒业务实体信息时跳转至步骤231。另外,如果在步骤220中的业务实体数据库中不存在唤醒业务关键词和相应的唤醒业务实体信息时跳转至步骤232。
在步骤231中,电子设备将唤醒业务关键词提供给业务技能语义模型,以确定目标业务领域和相应的第一置信度。继上面的示例,如果唤醒业务关键词和相应的唤醒业务实体信息分别是“难忘今宵”和“李谷一”,则可以直接将“难忘今宵”提供给业务技能语义模型,以进行预测操作。
在步骤232中,电子设备获取唤醒业务关键词所对应的业务相关度信息。示例性地,可以将唤醒业务关键词提供给业务相关度分析工具,并从 业务相关度分析工具来得到相应的业务相关度信息。另外,还可以通过分析唤醒业务关键词所对应的热度信息等来确定相应的业务相关度信息,具体细节将在下文中展开。
在步骤240中,电子设备将唤醒业务关键词和业务相关度信息提供给业务技能语义模型,以确定目标业务领域和相应的第一置信度。此时,业务技能语义模型的输入除了唤醒业务关键词之外,还有业务相关度信息,亦即业务相关度信息能够影响业务技能语义预测过程,使得所确定的针对目标业务领域的第一置信度的结果具有较高的精确度。
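Steps 220 to 240 can be sketched as follows, assuming step 210 has already extracted the keyword and the entity. The in-memory dictionary is a hypothetical stand-in for the business entity database (one keyword mapping to a set of entities), and `model` and `relevance_lookup` are hypothetical callables for the business skill semantic model and the relevance source of step 232; none of these names come from the source.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical business entity database: one business keyword may map to
# several entities, e.g. one song title sung by several singers.
ENTITY_DB: Dict[str, Set[str]] = {
    "难忘今宵": {"李谷一", "董文华", "张也"},
}

# Hypothetical model signature: (keyword, relevance info or None) -> (domain, confidence).
BusinessModel = Callable[[str, Optional[dict]], Tuple[str, float]]


def predict_business_domain(
    keyword: str,
    entity: str,
    model: BusinessModel,
    relevance_lookup: Callable[[str], dict],
    entity_db: Dict[str, Set[str]] = ENTITY_DB,
) -> Tuple[str, float]:
    # Step 220: check whether this keyword and this entity are stored together.
    if entity in entity_db.get(keyword, set()):
        # Step 231: the keyword alone is passed to the semantic model.
        return model(keyword, None)
    # Step 232: otherwise obtain business relevance information for the keyword.
    relevance = relevance_lookup(keyword)
    # Step 240: keyword plus relevance information are passed to the model.
    return model(keyword, relevance)
```

With this database, "难忘今宵" plus "李谷一" takes the direct path of step 231, while "难忘今宵" plus an unlisted singer triggers the relevance lookup.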
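Fig. 3 is a flowchart of an example of the operation of determining business relevance information according to an embodiment of the present invention.
In step 310, the electronic device determines, based on a search engine, the business popularity information and the search-result business relevance indicator corresponding to the wake-up business keyword. For example, the search engine is queried with the wake-up business keyword, and either the popularity information and the indicator are obtained from the search engine directly, or the search results are obtained and analyzed to derive them. The search-result business relevance indicator reflects how relevant the search results returned by the search engine are to the business.
In some implementations, the search results corresponding to the wake-up business keyword are determined through the search engine, and the corresponding search-result business relevance indicator is determined according to a preconfigured search-result evaluation strategy. For example, a predetermined number (e.g., 10) of the top-ranked search results may be evaluated for relevance to the target business domain, and the indicator may have multiple levels, such as strongly relevant, moderately relevant, and weakly relevant.
To improve the accuracy of the indicator, several searches with query variants may be performed. Specifically, the wake-up business keyword is provided to the search engine to obtain first search results, and the wake-up business keyword together with the business name corresponding to the target business domain is provided to obtain second search results. The search-result evaluation strategy then evaluates both sets of results to determine the indicator, for example by jointly considering how relevant the first and the second search results are to the business.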
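The dual-query evaluation might be sketched as below, under explicit assumptions: `search` is a hypothetical callable returning ranked result strings, `is_relevant` is a hypothetical per-result judgment standing in for the preconfigured evaluation strategy, and the 0.7 and 0.3 thresholds that map the share of relevant results to strong, moderate, or weak are invented for illustration. The second query prefixes the business name to the keyword, e.g. "歌曲" (song) for the music domain.

```python
from typing import Callable, List


def relevance_indicator(
    keyword: str,
    business_name: str,
    search: Callable[[str], List[str]],
    is_relevant: Callable[[str], bool],
    top_n: int = 10,
) -> str:
    # First search: the wake-up business keyword on its own (top-N results).
    first = search(keyword)[:top_n]
    # Second search: business name plus keyword, e.g. "歌曲" + song title.
    second = search(business_name + keyword)[:top_n]
    results = first + second
    if not results:
        return "weak"
    # Share of top-ranked results that the hypothetical judge deems relevant.
    share = sum(1 for r in results if is_relevant(r)) / len(results)
    # Hypothetical thresholds for the multi-level indicator.
    if share >= 0.7:
        return "strong"
    if share >= 0.3:
        return "moderate"
    return "weak"
```

Pooling both result lists is one simple way to "jointly consider" the two searches; per-query weighting would be an equally valid reading of the text.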
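In step 320, the electronic device determines the business relevance information based on the business popularity information and the search-result business relevance indicator. For example, the business relevance information may simply include both the popularity information and the indicator, or the two may each be assigned a weight in computing the business relevance information.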
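The weighted option of step 320 could look like the following sketch. The level scores, the default 0.4/0.6 weights, and the assumption that popularity is already normalized to [0, 1] are all hypothetical; the source states only that the two signals may be weighted.

```python
# Hypothetical numeric scores for the multi-level search-result indicator.
LEVEL_SCORE = {"strong": 1.0, "moderate": 0.5, "weak": 0.0}


def business_relevance(heat: float, indicator: str,
                       w_heat: float = 0.4, w_search: float = 0.6) -> float:
    """Combine popularity and search-result relevance into one score in [0, 1]."""
    if not 0.0 <= heat <= 1.0:
        raise ValueError("heat is assumed to be normalized to [0, 1]")
    if abs(w_heat + w_search - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1 to keep the score in [0, 1]")
    # Weighted sum of the (assumed) normalized popularity and the level score.
    return w_heat * heat + w_search * LEVEL_SCORE[indicator]
```

Keeping the weights summing to 1 leaves the combined score on the same [0, 1] scale as the model confidences it is fed alongside.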
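Fig. 4 is a schematic flowchart of an example of a music skill voice wake-up method according to an embodiment of the present invention.
The business skill in this embodiment may target any of various businesses; the following embodiment uses the music skill only as an example.
Note that in the music domain, regular-expression matching supports a "song title + singer name" wake-up pattern, which is used in many smart speaker products. For example, when the user simply says "刘德华的忘情水" (Andy Lau's "Wang Qing Shui"), the song "忘情水" can be played directly. For ease of extension, "刘德华" is bound to a corresponding song-title lexicon and "忘情水" to a corresponding singer-name lexicon. Because both lexicons contain a large amount of information, the business skill and the knowledge skill are easily invoked by mistake.
In the current related art, the usual remedy is to delete the corresponding expressions from the song-title or singer-name lexicon. However, semantic parsing then fails when the user genuinely names that song or singer. Moreover, in some scenarios users expect music to be played based on the song information even when they get the singer wrong: for example, when the user says "刘德华的吻别" (Andy Lau's "Wen Bie"), the song and singer do not match, yet the song "吻别" should still be played.
In this embodiment, popularity information and search information for song titles may be obtained with a web crawler, so that during semantic parsing a song title carries its popularity information, search information, and confidence information. The search information may be obtained by submitting the song title directly to a search engine and checking whether the first entry is a music entry; if it is not, the search engine is queried with "歌曲" ("song") + song title (for example, "歌曲吻别") and the first entry is checked again. The search information can therefore take several search-result relevance states.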
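The two-stage search check can be sketched as below. `search` and `is_music_entry` are hypothetical callables (the source names no search API), and the three returned labels are illustrative names for the several search-result relevance states mentioned above.

```python
from typing import Callable, List


def song_search_state(
    title: str,
    search: Callable[[str], List[str]],
    is_music_entry: Callable[[str], bool],
) -> str:
    # Stage 1: search the bare song title and inspect only the first entry.
    first = search(title)
    if first and is_music_entry(first[0]):
        return "title_is_music"
    # Stage 2: retry with the prefix "歌曲" ("song"), e.g. "歌曲吻别".
    retry = search("歌曲" + title)
    if retry and is_music_entry(retry[0]):
        return "song_prefix_is_music"
    # Neither query led with a music entry.
    return "not_music"
```

The second query only runs when the first fails, so popular, unambiguous titles cost a single search.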
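In addition, this embodiment builds its own music knowledge base, in which all singers corresponding to a song title can be looked up.
When semantic parsing yields semantic slots consisting only of song title + singer name, the music knowledge base is checked for a match. If they match, the task-type (business) skill confidence is compared with the knowledge-type skill confidence. If they do not match, a new confidence is computed from the song's popularity information, search information, and confidence information, and the knowledge-type and task-type skill confidences are then compared again.
In the flow shown in Fig. 4, after the text is input, task-type skill semantic parsing and the knowledge-type skill are scheduled in parallel. The task-type skill returns semantic parsing results for multiple domains (including semantic slot information and confidence information; if a slot is a song title, popularity and search information are included as well), and the knowledge-type skill returns an answer together with its confidence and related information.
After both sets of results are obtained, it is determined whether the task-type results include the music domain (a single utterance may yield task-type parsing results for several domains).
Next, if the returned business domains include the music domain, it is determined whether the parsed semantic slots are exactly "song title + singer name".
If so, the music knowledge base is queried to determine whether the song title and singer name match.
If they match, the task-type semantic parsing result goes directly to the fusion module that schedules between task-type and knowledge-type skills.
If they do not match, the task-type skill parsing confidence is recomputed (combining the search information, popularity information, and confidence information) before the fusion module is invoked to select either the task-type or the knowledge-type skill. If a task-type skill is selected, a fusion algorithm picks one among the multiple task-type skills; if the knowledge-type skill is selected, the response protocol is assembled and returned directly.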
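The decision flow of Fig. 4 can be sketched as follows. Everything here is an assumption for illustration: the `DomainParse` fields, the slot names "song" and "singer", the 0.5/0.3/0.2 weights used to recompute the confidence on a mismatch, and the use of a plain maximum-confidence comparison as a stand-in for the fusion module and fusion algorithm, which the source does not specify.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set


@dataclass(frozen=True)
class DomainParse:
    domain: str               # business domain, e.g. "music"
    slots: Dict[str, str]     # semantic slots, e.g. {"song": ..., "singer": ...}
    confidence: float         # task-type parsing confidence in [0, 1]
    heat: float = 0.0         # song popularity, assumed normalized to [0, 1]
    search: float = 0.0       # search information, assumed mapped to [0, 1]


def rescore(p: DomainParse) -> float:
    # Hypothetical weights; the text only says the three signals are combined.
    return 0.5 * p.confidence + 0.3 * p.heat + 0.2 * p.search


def route(parses: List[DomainParse], knowledge_conf: float,
          music_kb: Dict[str, Set[str]]) -> str:
    adjusted = []
    for p in parses:
        # Music result whose slots are exactly song title + singer name.
        if p.domain == "music" and set(p.slots) == {"song", "singer"}:
            singers = music_kb.get(p.slots["song"], set())
            if p.slots["singer"] not in singers:
                # Mismatch: recompute the task-type confidence.
                p = replace(p, confidence=rescore(p))
        adjusted.append(p)
    if not adjusted:
        return "knowledge"
    # Stand-in fusion step: best task-type result vs. the knowledge skill.
    best = max(adjusted, key=lambda p: p.confidence)
    return best.domain if best.confidence > knowledge_conf else "knowledge"
```

Under these weights, a popular song named with the wrong singer (as in the "刘德华的吻别" case) keeps enough confidence to be played, while an obscure one falls back to the knowledge skill.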
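Table 1 shows experimental data before and after applying the music skill voice wake-up method of the embodiment of the present invention.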
Table 1 (image PCTCN2020123643-appb-000001)
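In Table 1, TP (true positive) denotes a positive sample, that is, one that should hit a task-type skill, correctly routed to the task-type skill; TN (true negative) denotes a negative sample correctly routed to the knowledge-type skill; FP (false positive) denotes a negative sample classified as positive; and FN (false negative) denotes a positive sample classified as negative. Recall = TP/(TP+FN); precision = TP/(TP+FP); accuracy = (TP+TN)/(TP+FP+TN+FN); F value = 2 × precision × recall/(precision + recall).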
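The four metrics defined above can be computed with the small helper below. Returning 0.0 for an empty denominator is an added convention that the source does not address.

```python
def routing_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Recall, precision, accuracy and F value for task-type vs. knowledge routing."""

    def ratio(num: float, den: float) -> float:
        # Convention (an assumption): an empty denominator yields 0.0.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # F value is the harmonic mean of precision and recall.
    f_value = ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f_value": f_value}
```

For example, with made-up counts that are not taken from Table 1, `routing_metrics(tp=8, tn=5, fp=2, fn=2)` gives recall and precision of 0.8, accuracy of about 0.765 (13/17), and an F value of about 0.8.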
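It can be seen that the F value improves by 5% after the optimization. Further tuning of the fusion algorithm or case-by-case optimization would be expected to yield even better results.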
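Fig. 5 is a structural block diagram of an example of a skill voice wake-up apparatus according to an embodiment of the present invention.
As shown in Fig. 5, the skill voice wake-up apparatus 500 includes a speech recognition unit 510, a model invocation unit 520, and a skill wake-up unit 530.
The speech recognition unit 510 is configured to recognize the wake-up text information corresponding to a voice request message to be processed. For its operation, refer to the description of step 110 in Fig. 1 above.
The model invocation unit 520 is configured to invoke a business skill semantic model to determine the target business domain corresponding to the wake-up text information and a corresponding first confidence, and to invoke a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence. For its operation, refer to the description of step 120 in Fig. 1 above.
The skill wake-up unit 530 is configured to select, according to the first confidence and the second confidence, one of the knowledge skill and the target business skill corresponding to the target business domain to wake up. For its operation, refer to the description of step 130 in Fig. 1 above.
The apparatus of the above embodiments of the present invention can be used to execute the corresponding method embodiments of the present invention and correspondingly achieves the technical effects of those method embodiments, which are not repeated here.
In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
In another aspect, an embodiment of the present invention provides a storage medium storing a computer program which, when executed by a processor, performs the steps of the skill voice wake-up method described above.
The above product can execute the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to the method. For technical details not described exhaustively in this embodiment, refer to the method provided by the embodiments of the present invention.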
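The electronic device of the embodiments of the present invention exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices feature mobile communication functions, with voice and data communication as their main purpose. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Other electronic apparatuses with data interaction functions.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objective of the solution of this embodiment.
From the above description of the implementations, those skilled in the art can clearly understand that each implementation may be realized by software plus a general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.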

Claims (10)

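  1. A skill voice wake-up method, applied to an electronic device, the method comprising:
    recognizing wake-up text information corresponding to a voice request message to be processed;
    invoking a business skill semantic model to determine a target business domain corresponding to the wake-up text information and a corresponding first confidence, and invoking a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence; and
    selecting, according to the first confidence and the second confidence, one of a knowledge skill and a target business skill corresponding to the target business domain to wake up.
  2. The method according to claim 1, wherein invoking the business skill semantic model to determine the target business domain corresponding to the wake-up text information and the corresponding first confidence comprises:
    extracting a wake-up business keyword and wake-up business entity information from the wake-up text information;
    determining whether the wake-up business keyword and the corresponding wake-up business entity information exist in a business entity database, the business entity database comprising a plurality of business keywords and corresponding business entity information; and
    when the wake-up business keyword and the corresponding wake-up business entity information exist in the business entity database, providing the wake-up business keyword to the business skill semantic model to determine the target business domain and the corresponding first confidence.
  3. The method according to claim 2, wherein, when the wake-up business keyword or the corresponding wake-up business entity information does not exist in the business entity database, the method further comprises:
    obtaining business relevance information corresponding to the wake-up business keyword; and
    providing the wake-up business keyword and the business relevance information to the business skill semantic model to determine the target business domain and the corresponding first confidence.
  4. The method according to claim 3, wherein obtaining the business relevance information corresponding to the wake-up business keyword comprises:
    determining, based on a search engine, business popularity information and a search-result business relevance indicator corresponding to the wake-up business keyword; and
    determining the business relevance information based on the business popularity information and the search-result business relevance indicator.
  5. The method according to claim 4, wherein determining, based on the search engine, the business popularity information and the search-result business relevance indicator corresponding to the wake-up business keyword comprises:
    determining, based on the search engine, search results corresponding to the wake-up business keyword; and
    determining, based on a preconfigured search-result evaluation strategy, the search-result business relevance indicator corresponding to the search results.
  6. The method according to claim 5, wherein determining, based on the search engine, the search results corresponding to the wake-up business keyword comprises:
    providing the wake-up business keyword to the search engine to determine corresponding first search results; and
    providing the wake-up business keyword and a business name corresponding to the target business domain to the search engine to determine corresponding second search results.
  7. The method according to claim 1, wherein the target business skill comprises a music skill.
  8. A skill voice wake-up apparatus, applied to an electronic device, the apparatus comprising:
    a speech recognition unit configured to recognize wake-up text information corresponding to a voice request message to be processed;
    a model invocation unit configured to invoke a business skill semantic model to determine a target business domain corresponding to the wake-up text information and a corresponding first confidence, and to invoke a knowledge skill semantic model to determine a knowledge reply answer corresponding to the wake-up text information and a corresponding second confidence; and
    a skill wake-up unit configured to select, according to the first confidence and the second confidence, one of a knowledge skill and a target business skill corresponding to the target business domain to wake up.
  9. An electronic device, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the method according to any one of claims 1-7.
  10. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.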
PCT/CN2020/123643 2019-12-31 2020-10-26 Skill voice wake-up method and apparatus WO2021135561A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20909040.6A EP4086892A4 (en) 2019-12-31 2020-10-26 VOICE WAKE-UP METHOD AND DEVICE FOR COMPETENCES
US17/758,075 US11721328B2 (en) 2019-12-31 2020-10-26 Method and apparatus for awakening skills by speech
JP2022540758A JP7436077B2 (ja) 2019-12-31 2020-10-26 Voice wake-up method and apparatus for skills

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911422397.2A CN111081225B (zh) 2019-12-31 2019-12-31 Skill voice wake-up method and apparatus
CN201911422397.2 2019-12-31

Publications (1)

Publication Number Publication Date
WO2021135561A1 true WO2021135561A1 (zh) 2021-07-08

Family

ID=70321405

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123643 WO2021135561A1 (zh) 2019-12-31 2020-10-26 Skill voice wake-up method and apparatus

Country Status (5)

Country Link
US (1) US11721328B2 (zh)
EP (1) EP4086892A4 (zh)
JP (1) JP7436077B2 (zh)
CN (1) CN111081225B (zh)
WO (1) WO2021135561A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081225B (zh) * 2019-12-31 2022-04-01 思必驰科技股份有限公司 Skill voice wake-up method and apparatus
CN111949178B (zh) * 2020-08-13 2022-02-22 百度在线网络技术(北京)有限公司 Skill switching method, apparatus, device, and storage medium
US20230008868A1 (en) * 2021-07-08 2023-01-12 Nippon Telegraph And Telephone Corporation User authentication device, user authentication method, and user authentication computer program

Citations (8)

Publication number Priority date Publication date Assignee Title
CN107316643A (zh) * 2017-07-04 2017-11-03 科大讯飞股份有限公司 Voice interaction method and apparatus
CN107657949A (zh) * 2017-04-14 2018-02-02 深圳市人马互动科技有限公司 Method and apparatus for acquiring game data
US20180052823A1 (en) * 2016-08-17 2018-02-22 Yahoo Holdings, Inc. Hybrid Classifier for Assigning Natural Language Processing (NLP) Inputs to Domains in Real-Time
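CN108694942A (zh) * 2018-04-02 2018-10-23 浙江大学 Smart home interactive question-answering system based on a home intelligent service robot
CN109658271A (zh) * 2018-12-19 2019-04-19 前海企保科技(深圳)有限公司 Intelligent customer service system and method for professional insurance scenarios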
US20190130912A1 (en) * 2011-12-08 2019-05-02 Sri International Generic virtual personal assistant platform
CN110299136A (zh) * 2018-03-22 2019-10-01 上海擎感智能科技有限公司 Processing method and system for speech recognition
CN111081225A (zh) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Skill voice wake-up method and apparatus

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
US20070050371A1 (en) * 2005-08-26 2007-03-01 Trumba Corporation Interacting with an online database through a variety of communications media
US7917368B2 (en) * 2008-02-25 2011-03-29 Mitsubishi Electric Research Laboratories, Inc. Method for interacting with users of speech recognition systems
US8359020B2 (en) * 2010-08-06 2013-01-22 Google Inc. Automatically monitoring for voice input based on context
JP2013190985A (ja) 2012-03-13 2013-09-26 Sakae Takeuchi Knowledge response system, method, and computer program
US10354650B2 (en) * 2012-06-26 2019-07-16 Google Llc Recognizing speech with mixed speech recognition models to generate transcriptions
US9058805B2 (en) * 2013-05-13 2015-06-16 Google Inc. Multiple recognizer speech recognition
JP6324249B2 (ja) 2014-07-22 2018-05-16 アルパイン株式会社 Electronic device, speech recognition system, and speech recognition program
CN105070288B (zh) 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 In-vehicle voice command recognition method and apparatus
US9871927B2 (en) 2016-01-25 2018-01-16 Conduent Business Services, Llc Complexity aware call-steering strategy in heterogeneous human/machine call-center environments
US20180054523A1 (en) 2016-08-16 2018-02-22 Rulai, Inc. Method and system for context sensitive intelligent virtual agents
US10449440B2 (en) * 2017-06-30 2019-10-22 Electronic Arts Inc. Interactive voice-controlled companion application for a video game
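CN107134279B (zh) 2017-06-30 2020-06-19 百度在线网络技术(北京)有限公司 Voice wake-up method, apparatus, terminal, and storage medium
JP7316271B2 (ja) * 2017-10-10 2023-07-27 サノフイ Medical inquiry answering device
CN107871506A (zh) * 2017-11-15 2018-04-03 北京云知声信息技术有限公司 Wake-up method and apparatus for a speech recognition function
CN108335696A (zh) * 2018-02-09 2018-07-27 百度在线网络技术(北京)有限公司 Voice wake-up method and apparatus
CN109493863A (zh) * 2018-12-26 2019-03-19 广州灵聚信息科技有限公司 Intelligent wake-up method and apparatus
CN110570861B (zh) * 2019-09-24 2022-02-25 Oppo广东移动通信有限公司 Method, apparatus, terminal device, and readable storage medium for voice wake-up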

Non-Patent Citations (1)

Title
See also references of EP4086892A4 *

Also Published As

Publication number Publication date
CN111081225A (zh) 2020-04-28
EP4086892A4 (en) 2023-05-31
JP2023506087A (ja) 2023-02-14
CN111081225B (zh) 2022-04-01
US11721328B2 (en) 2023-08-08
US20230075023A1 (en) 2023-03-09
EP4086892A1 (en) 2022-11-09
JP7436077B2 (ja) 2024-02-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909040

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022540758

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020909040

Country of ref document: EP

Effective date: 20220801