WO2021073298A1 - Speech information processing method and apparatus, and intelligent terminal and storage medium - Google Patents

Speech information processing method and apparatus, and intelligent terminal and storage medium Download PDF

Info

Publication number
WO2021073298A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
recognition information
knowledge
speech recognition
facts
Prior art date
Application number
PCT/CN2020/112928
Other languages
French (fr)
Chinese (zh)
Inventor
胡广绪
宋德超
贾巨涛
吴伟
赵鹏辉
Original Assignee
珠海格力电器股份有限公司
珠海联云科技有限公司
Priority date
Filing date
Publication date
Application filed by 珠海格力电器股份有限公司 and 珠海联云科技有限公司
Publication of WO2021073298A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation

Definitions

  • the present disclosure relates to the field of machine learning, and in particular to a method, device, smart terminal, and storage medium for processing voice information.
  • the embodiments of the present disclosure provide a voice information processing method, device, smart terminal, and storage medium to solve the problem in the related art that, when a user requests data, multiple results are often returned and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
  • embodiments of the present disclosure provide a method for processing voice information, the method including:
  • the purpose and intention of the voice recognition information is determined according to the stored relevant knowledge facts and the structured query sentence.
  • determining the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of structured query sentences for different intentions in the corresponding scenarios are obtained through deep learning method training.
  • determining the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
  • extracting relevant knowledge facts of the speech recognition information through the constructed knowledge graph model includes:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • storing the relevant knowledge facts includes:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the embodiments of the present disclosure also provide a voice information processing device, the device including:
  • the conversion module is configured to execute the conversion of the received unstructured speech recognition information into structured query sentences, wherein the unstructured speech recognition information is obtained through speech recognition;
  • the extraction module is configured to perform the extraction of relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts;
  • the determining module is configured to determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the determining module is configured to execute:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of the structured query sentence for different intentions in the corresponding scenarios are obtained through training by a deep learning apparatus.
  • the determining module is configured to execute:
  • determining, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
  • the extraction module is configured to execute:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • the extraction module is configured to execute:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the embodiments of the present disclosure also provide a smart terminal, including:
  • the memory is set to store program instructions
  • the processor is configured to call the program instructions stored in the memory and execute, according to the obtained program, the voice information processing method according to any one of the first aspect.
  • an embodiment of the present disclosure further provides a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to execute the voice information processing method according to any one of the embodiments of the present disclosure.
  • with the method, device, smart terminal, and storage medium for processing voice information provided by the embodiments of the present disclosure, the received unstructured voice recognition information is first converted into a structured query sentence, wherein the unstructured voice recognition information is obtained through speech recognition; in some embodiments, the relevant knowledge facts of the speech recognition information are then extracted through the constructed knowledge graph model and stored; finally, the purpose and intention of the voice recognition information is determined according to the stored relevant knowledge facts in combination with the structured query sentence. In this way, the problem of ambiguity among different semantic intentions in different scenarios is solved, so that the question raised by the user can be truly understood and the user's intention can be accurately recognized and understood.
  • FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the disclosure
  • FIG. 2 is a specific implementation flowchart of a method for processing voice information provided by an embodiment of the disclosure
  • FIG. 3 is a schematic structural diagram of a voice information processing apparatus provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic structural diagram of a smart terminal provided by an embodiment of the disclosure.
  • with the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting.
  • the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
  • the present disclosure provides a method for processing voice information.
  • the method is based on technologies such as knowledge graphs and machine learning; it solves the problem of ambiguous semantic intentions in different scenarios, enables accurate recognition and understanding of the user's intention, improves the user's interactive experience, and resolves the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input.
  • FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the present disclosure, including:
  • Step 101 Convert the received unstructured speech recognition information into a structured query sentence, where the unstructured speech recognition information is obtained through speech recognition.
  • the user's voice information can be obtained when the user issues a voice control command to a voice device, such as a smart air conditioner.
  • after the voice device receives the user's voice information, it uploads it to the cloud service platform, and the cloud service platform further parses and recognizes the user's voice information to obtain the speech recognition information.
  • the speech recognition information is an unstructured text sentence.
  • the user's intention can be derived from the text sentence, but the voice device cannot understand the user's intention from the unstructured speech recognition information, so the speech recognition information needs to be sent to the knowledge graph server for further processing.
  • the knowledge graph server will convert the unstructured voice recognition information into structured query sentences after receiving the voice recognition information sent by the cloud service platform.
  • with the structured query sentence corresponding to the user's voice information, the voice device can analyze the user's real intention based on the structured query sentence in combination with different scenarios and the knowledge graph.
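  • as a non-limiting illustration of this conversion step, the following minimal Python sketch maps a recognized sentence to a structured query; the stop-word filter and the SPARQL-style template are illustrative assumptions, not the grammar actually used by the knowledge graph server:

```python
# Hypothetical sketch of the conversion step: extract keywords from the
# recognized sentence and wrap them in a SPARQL-style lookup. The stop-word
# filter and the query template are illustrative stand-ins only.
STOP_WORDS = {"i", "want", "to", "an", "a", "the", "please"}

def to_structured_query(recognized_text: str) -> str:
    tokens = [t.strip(".,!?").lower() for t in recognized_text.split()]
    keywords = [t for t in tokens if t and t not in STOP_WORDS]
    values = " ".join(f'"{k}"' for k in keywords)
    return (
        "SELECT ?entity ?attribute ?value WHERE { "
        f"VALUES ?label {{ {values} }} "
        "?entity rdfs:label ?label . ?entity ?attribute ?value . }"
    )

print(to_structured_query("I want to buy an apple"))
# SELECT ?entity ?attribute ?value WHERE { VALUES ?label { "buy" "apple" } ... }
```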
  • Step 102 Extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts.
  • the knowledge graph server is built with the architecture model of the knowledge graph.
  • the knowledge graph model starts from the most primitive data (including structured query sentence information and unstructured speech recognition information) and uses a series of automatic or semi-automatic technical means to extract relevant knowledge facts from the original database and third-party databases.
  • the original database is a database that stores structured query sentence information, semi-structured speech recognition information, and unstructured speech recognition information.
  • the third-party database introduced here refers to a database that stores knowledge of a particular professional field; its role is to expand the different scenarios and different intentions corresponding to the speech recognition information, so as to ensure the accuracy of purpose and intention understanding.
  • the relevant knowledge facts include specific data information of the knowledge refined through the knowledge graph model and the voice recognition information.
  • the knowledge refined by the knowledge graph model is stored in the core of the knowledge graph model, which is the pattern layer; the specific data information of the speech recognition information is stored in the data layer.
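  • as a non-limiting illustration of this two-layer storage, the following sketch (class and field names are assumptions) keeps refined schema-level knowledge in a pattern layer and concrete recognition data in a data layer:

```python
# Illustrative two-layer store: the pattern layer holds schema-level knowledge
# refined by the knowledge graph model, the data layer holds the concrete data
# of each piece of speech recognition information.
from dataclasses import dataclass, field

@dataclass
class KnowledgeStore:
    pattern_layer: set = field(default_factory=set)   # refined (concept, relation, concept) triples
    data_layer: list = field(default_factory=list)    # concrete recognized utterances and metadata

    def store_refined_knowledge(self, triple: tuple) -> None:
        self.pattern_layer.add(triple)

    def store_recognition_data(self, record: dict) -> None:
        self.data_layer.append(record)

store = KnowledgeStore()
store.store_refined_knowledge(("apple", "is_a", "fruit"))
store.store_recognition_data({"utterance": "I want to buy an apple", "device": "smart air conditioner"})
```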
  • multiple structured query sentences of unstructured speech recognition information are combined with other information materials used to construct a knowledge graph, and combined with specific data information in the database for knowledge fusion.
  • the knowledge graph server needs to extract keywords from the speech recognition information, determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information, and the set of entity attributes corresponding to the keyword "apple" in the knowledge graph model is determined.
  • the set includes "apple (fruit of the genus Malus, family Rosaceae), Apple (Apple Inc.), Apple (Apple Inc. product), Apple (a person's name)", and the obtained set of entity attributes is used as the representation of the relevant knowledge facts of this voice control command. It should be noted that in this embodiment, multiple results are returned when the user's intention is obtained, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear; to determine the user's exact intention, step 103 is further executed.
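  • as a non-limiting illustration of this lookup, the following sketch uses a toy in-memory dictionary in place of the knowledge graph model; the entries mirror the "apple" example above:

```python
# Toy in-memory knowledge graph standing in for the knowledge graph model:
# each keyword maps to its set of candidate entity attributes (senses).
KNOWLEDGE_GRAPH = {
    "apple": {
        "fruit of the genus Malus (family Rosaceae)",
        "Apple Inc.",
        "Apple Inc. product",
        "a person's name",
    },
}

def relevant_knowledge_facts(keyword: str) -> set:
    """Return the entity attribute set that represents the keyword's knowledge facts."""
    return KNOWLEDGE_GRAPH.get(keyword, set())

# All four senses come back, i.e. the "one word, multiple meanings" ambiguity
# that step 103 resolves.
print(relevant_knowledge_facts("apple"))
```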
  • Step 103 Determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the context information of the voice recognition information and the application device information are used to determine the scene corresponding to the structured query sentence.
  • the scene corresponding to the structured query sentence may contain multiple different intentions; to determine the user's purpose and intention among these intentions, the judgment can be made according to the weights of the different intentions.
  • the weights of the different intentions of the structured query sentence in the corresponding scene are obtained, in some embodiments, by labeling them after big data mining and analysis, or, in some embodiments, by training with a deep learning method.
  • in some embodiments, according to the obtained weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined, wherein the entity attributes are included in the stored relevant knowledge facts. For example, referring to Table 1, when the voice control command issued by the user is "I want to buy an apple", the voice control command has different intentions with different weights, such as:

    Table 1
    Scene | Entity attribute | Weight
    apple | fruit of the genus Malus (family Rosaceae) | A1
    apple | Apple Inc. | A2
    apple | Apple Inc. product | A3
    apple | a person's name | A4

  • if the weight value A1 is greater than A2, A3, and A4, the entity attribute with the highest relevance after this sorting is "fruit of the genus Malus (family Rosaceae)", which means that the user's intention is to buy an "apple" to eat.
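  • as a non-limiting illustration of this weight-based disambiguation, the following sketch ranks the candidate entity attributes by scene-conditioned weights; the scene label and the numeric weights are invented placeholders for A1 to A4:

```python
# Scene-conditioned intent weights, keyed by (scene, entity attribute). The
# numeric values are placeholders; in the patent they come from big-data
# mining/labeling or from deep learning training.
INTENT_WEIGHTS = {
    ("shopping", "fruit of the genus Malus (family Rosaceae)"): 0.72,  # A1
    ("shopping", "Apple Inc."): 0.08,                                  # A2
    ("shopping", "Apple Inc. product"): 0.15,                          # A3
    ("shopping", "a person's name"): 0.05,                             # A4
}

def resolve_intent(scene: str, candidate_attributes: set) -> str:
    ranked = sorted(candidate_attributes,
                    key=lambda attr: INTENT_WEIGHTS.get((scene, attr), 0.0),
                    reverse=True)
    return ranked[0]  # entity attribute with the highest weight = inferred intention

candidates = {"fruit of the genus Malus (family Rosaceae)", "Apple Inc.",
              "Apple Inc. product", "a person's name"}
print(resolve_intent("shopping", candidates))  # -> the Malus fruit sense
```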
  • an entity attribute ranking method based on deep learning is used to determine the entity attribute with the highest relevance obtained according to the ranking method; returning the entity attribute to determine the purpose of the speech recognition information.
  • for example, in some embodiments, an entity attribute ranking method based on a CNN (Convolutional Neural Network, a deep learning algorithm) is adopted.
  • the CNN-based method trains a neural network with deep learning and performs the ranking using the weights of the different intentions under the determined scene and the word vectors of the question sequence and the entity attribute sequence.
  • the entity attribute with the highest relevance is then obtained from the ranking result, and this entity attribute is used as the purpose and intention of the speech recognition information.
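  • as a non-limiting illustration of such a ranking method, the following sketch uses a small convolutional text encoder (an assumed architecture, not the patent's exact network) to score each candidate entity attribute against the question by the similarity of their word-vector encodings:

```python
# Minimal sketch of a CNN-based matcher: encode the question and each candidate
# entity attribute with a shared Conv1d-over-embeddings encoder, then rank the
# candidates by cosine similarity. Untrained here; shown only for structure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNMatcher(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, num_filters: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 1-D convolution over the word-vector sequence, kernel spanning 3 tokens.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))         # (batch, num_filters, seq_len)
        return x.max(dim=2).values           # max-pool over time -> (batch, num_filters)

    def score(self, question_ids, attribute_ids) -> torch.Tensor:
        q = self.encode(question_ids)
        a = self.encode(attribute_ids)
        return F.cosine_similarity(q, a, dim=1)  # higher = more relevant

# Toy usage with made-up token ids; in practice the ids come from a shared vocabulary.
model = CNNMatcher(vocab_size=1000)
question = torch.tensor([[5, 17, 42, 0]])            # "I want to buy an apple"
candidates = torch.tensor([[7, 3, 0, 0],             # apple (Malus fruit)
                           [7, 9, 0, 0],             # Apple (Apple Inc.)
                           [7, 11, 0, 0]])           # Apple (Apple Inc. product)
scores = model.score(question.repeat(len(candidates), 1), candidates)
best = int(scores.argmax())                          # index of the highest-ranked attribute
```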
  • with the voice information processing method provided by the present disclosure, by combining the knowledge graph server with technologies such as machine learning, the problem of ambiguous semantic intentions in different scenarios is solved, the user's intention can be accurately recognized and understood, the user's interactive experience is improved, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input are resolved.
  • referring to FIG. 2, a flowchart of a specific implementation of a voice information processing method provided by an embodiment of the present disclosure, the specific implementation of the present disclosure is further described, including:
  • Step 201 The voice recognition module receives a voice control command issued by the user.
  • Step 202 Upload the user voice information received according to the voice control command to the cloud service platform.
  • after receiving the user's voice information, the cloud service platform performs preliminary parsing and recognition to obtain the speech recognition information.
  • Step 203 The cloud service system sends the voice recognition information to the knowledge graph server to identify the user's purpose and intention.
  • the unstructured speech recognition information is first converted into a structured query sentence, and the following processing is further performed, including:
  • Step B1 Perform knowledge extraction on semi-structured speech recognition information data and unstructured speech recognition information data.
  • the keywords in the speech recognition information can be obtained through knowledge extraction. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information.
  • Step B2 Data integration of structured speech recognition information data and third-party database data.
  • the structured query sentences of multiple pieces of unstructured speech recognition information are combined with other information materials used to construct the knowledge graph and with the specific data information in the database for knowledge fusion.
  • the entity attributes of different keywords with different intentions in different scenarios can be obtained.
  • the execution order of step B1 and step B2 is not limited.
  • Step B3 Combine the keywords obtained from the knowledge extraction with the information after data integration to obtain a representation of the relevant knowledge facts.
  • keywords can be determined after knowledge extraction, and the data integration information contains the entity attributes corresponding to the keywords. Therefore, combining the two yields the representation of the relevant knowledge facts corresponding to the keywords.
  • the relevant knowledge facts include all the entity attributes of the keywords.
  • Step B4 Perform purpose intention reasoning based on the obtained relevant knowledge facts.
  • in the purpose intention reasoning, optionally, the scene corresponding to the structured query sentence is determined from the context information of the speech recognition information and the application device information.
  • the entity attribute with the highest correlation obtained after sorting according to the weight is determined, and the entity attribute is returned to determine the purpose intention of the speech recognition information.
  • the entity attribute ranking method based on deep learning determines the entity attribute with the highest correlation obtained according to the ranking method; and returns the entity attribute determined as the purpose of the speech recognition information.
  • Step B5 Verify, evaluate and filter the acquired relevant knowledge facts through the quality verification platform.
  • the obtained relevant knowledge facts can be verified and evaluated, and the relevant knowledge facts that do not meet the specifications and requirements can be filtered out, thereby improving the accuracy of the final purpose intention.
  • Step B6 Perform timely knowledge updates on the relevant knowledge facts obtained.
  • timely updating the obtained representation of the relevant knowledge facts also helps to improve the accuracy of the finally obtained purpose intention.
  • step B5 and step B6 are optional, and are only used to improve the accuracy of the final purpose intention obtained, and are not intended to limit the specific implementation.
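  • putting steps B1 to B6 together, the overall flow on the knowledge graph server can be pictured with the following sketch; every helper function below is a stand-in stub with assumed names and signatures, not an interface defined by the patent:

```python
# Hypothetical end-to-end flow mirroring steps B1-B6; all helpers are stubs.
THIRD_PARTY_DB = {"apple": ["fruit of the genus Malus (family Rosaceae)", "Apple Inc."]}

def extract_knowledge(text: str) -> list:                      # B1: knowledge extraction
    return [w for w in text.lower().split() if w in THIRD_PARTY_DB]

def integrate_data(keywords: list) -> dict:                    # B2: data integration
    return {k: THIRD_PARTY_DB[k] for k in keywords}

def combine(keywords: list, integrated: dict) -> dict:         # B3: relevant knowledge facts
    return {k: integrated.get(k, []) for k in keywords}

def infer_intention(facts: dict) -> str:                       # B4: purpose/intention reasoning
    # placeholder: pick the first attribute of the first keyword
    return next((attrs[0] for attrs in facts.values() if attrs), "unknown")

def verify_and_filter(facts: dict) -> dict:                    # B5 (optional): quality verification
    return {k: v for k, v in facts.items() if v}

def update_knowledge(facts: dict) -> None:                     # B6 (optional): timely knowledge update
    pass

def process_recognition(recognized_text: str) -> str:
    keywords = extract_knowledge(recognized_text)
    facts = verify_and_filter(combine(keywords, integrate_data(keywords)))
    intention = infer_intention(facts)
    update_knowledge(facts)
    return intention

print(process_recognition("I want to buy an apple"))
```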
  • FIG. 3 is a voice information processing device provided by an embodiment of the present disclosure.
  • the device includes: a conversion module 301, an extraction module 302, and a determination module 303.
  • the conversion module 301 is configured to execute the conversion of the received unstructured speech recognition information into structured query sentences, where the unstructured speech recognition information is obtained through speech recognition;
  • the extraction module 302 is configured to extract relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts;
  • the determining module 303 is configured to determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the determining module 303 is configured to execute:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of the structured query sentence for different intentions in the corresponding scenarios are obtained through training by a deep learning apparatus.
  • the determining module 303 is configured to execute:
  • determining, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
  • the extraction module 302 is configured to execute:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • the extraction module 302 is configured to execute:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the smart terminal according to the present disclosure may include at least one processor and at least one memory.
  • the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps in the voice information processing method according to various exemplary embodiments of the present disclosure described above in this specification. For example, the processor may execute step 101 to step 103 as shown in FIG. 1.
  • the smart terminal 40 according to this embodiment of the present disclosure will be described below with reference to FIG. 4.
  • the smart terminal 40 shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the smart terminal 40 is represented in the form of a general smart terminal.
  • the components of the smart terminal 40 may include, but are not limited to: the aforementioned at least one processor 41, the aforementioned at least one memory 42, and a bus 43 connecting different system components (including the memory 42 and the processor 41).
  • the bus 43 represents one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a processor, or a local bus using any bus structure among multiple bus structures.
  • the memory 42 may include a readable medium in the form of a volatile memory, such as a random access memory (RAM) 421 and/or a cache memory 422, and may further include a read-only memory (ROM) 423.
  • the memory 42 may also include a program/utility tool 425 having a set of (at least one) program modules 424.
  • program modules 424 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the smart terminal 40 can also communicate with one or more external devices 44 (such as keyboards, pointing devices, etc.), and/or with any device (such as a router, modem, etc.) that enables the smart terminal 40 to communicate with one or more other smart terminals. This communication can be performed through an input/output (I/O) interface 45.
  • the smart terminal 40 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 46. As shown in the figure, the network adapter 46 communicates with other modules for the smart terminal 40 through the bus 43.
  • various aspects of the method provided in the present disclosure can also be implemented in the form of a program product, which includes a computer program.
  • when the program product runs on a computer device, the computer program is configured to cause the computer device to execute the steps of the voice information processing method described above; for example, the computer device may execute step 101 to step 103 as shown in FIG. 1.
  • the program product can adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the program product for solving the ambiguity of the semantic intention of the scene of the embodiment of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM) and include a computer program, and may be run on a smart terminal.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, in which a readable computer program is carried. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the computer program contained on the readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
  • the computer program for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the computer program can be executed entirely on the target smart terminal, partly on the target device, as an independent software package, partly on the target smart terminal and partly on a remote smart terminal, or entirely on a remote smart terminal or server.
  • the remote smart terminal can be connected to the target smart terminal through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external smart terminal (for example, via the Internet using an Internet service provider).
  • the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of computer program products implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable computer programs.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a speech information processing method and apparatus, and an intelligent terminal and a storage medium, which relate to the technical field of machine learning. The method comprises: converting received unstructured speech identification information into a structured inquiry statement, wherein the unstructured speech identification information is obtained by means of speech identification; extracting a related knowledge fact of the speech identification information by means of a constructed knowledge graph model, and storing the related knowledge fact; and determining, according to the stored related knowledge fact and in combination with the structured inquiry statement, an intention of the speech identification information. The method can solve the problem in the related art that, when a user acquires data, a plurality of results are often returned and the phenomena of "polysemy" and "one meaning to multiple words" appear, so that the questions mentioned by the user cannot be really understood and the intention of the user cannot be accurately identified.

Description

Method, device, intelligent terminal and storage medium for processing voice information
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on October 18, 2019, with application number 201910994726.4 and entitled "Method, device, intelligent terminal and storage medium for processing voice information", the entire contents of which are incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of machine learning, and in particular to a method, device, smart terminal, and storage medium for processing voice information.
Background
With the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting. However, the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
Summary of the invention
The embodiments of the present disclosure provide a method, device, smart terminal, and storage medium for processing voice information, so as to solve the problem in the related art that, when a user requests data, multiple results are often returned and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
In a first aspect, the embodiments of the present disclosure provide a method for processing voice information, the method including:
converting received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition;
extracting relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and storing the relevant knowledge facts;
determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In some embodiments, determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
determining a scene corresponding to the structured query sentence, wherein the scene contains multiple different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
returning the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query sentence in the corresponding scene are obtained by labeling, after big data mining and analysis, the weights of the different intentions of each keyword in different scenes; or
the weights of the different intentions of the structured query sentence in the corresponding scene are obtained through training with a deep learning method.
In some embodiments, determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
determining, with a deep-learning-based entity attribute ranking method, the entity attribute with the highest relevance obtained by the ranking method;
returning the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, extracting the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model includes:
performing knowledge fusion by combining the structured query sentences of multiple pieces of unstructured speech recognition information with other information materials used to construct the knowledge graph and with the specific data information in a database, wherein the database includes the database used by the speech recognition information;
extracting keywords from the speech recognition information;
determining the set of entity attributes corresponding to the keywords in the knowledge graph model, and using the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include knowledge refined through the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, storing the relevant knowledge facts includes:
storing the knowledge refined through the knowledge graph model in the pattern layer;
storing the specific data information of the speech recognition information in the data layer.
In a second aspect, the embodiments of the present disclosure further provide a device for processing voice information, the device including:
a conversion module, configured to convert received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition;
an extraction module, configured to extract relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and store the relevant knowledge facts;
a determining module, configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In some embodiments, the determining module is configured to:
determine a scene corresponding to the structured query sentence, wherein the scene contains multiple different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
determine, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
return the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query sentence in the corresponding scene are obtained by labeling, after big data mining and analysis, the weights of the different intentions of each keyword in different scenes; or
the weights of the different intentions of the structured query sentence in the corresponding scene are obtained through training by a deep learning apparatus.
In some embodiments, the determining module is configured to:
determine, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
return the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the extraction module is configured to:
perform knowledge fusion by combining the structured query sentences of multiple pieces of unstructured speech recognition information with other information materials used to construct the knowledge graph and with the specific data information in a database, wherein the database includes the database used by the speech recognition information;
extract keywords from the speech recognition information;
determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include knowledge refined through the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, the extraction module is configured to:
store the knowledge refined through the knowledge graph model in the pattern layer;
store the specific data information of the speech recognition information in the data layer.
In a third aspect, the embodiments of the present disclosure further provide a smart terminal, including:
a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to call the program instructions stored in the memory and execute, according to the obtained program, the voice information processing method according to any one of the first aspect.
In a fourth aspect, the embodiments of the present disclosure further provide a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to execute the voice information processing method according to any one of the embodiments of the present disclosure.
With the method, device, smart terminal, and storage medium for processing voice information provided by the embodiments of the present disclosure, the received unstructured speech recognition information is first converted into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition; in some embodiments, the relevant knowledge facts of the speech recognition information are then extracted through a constructed knowledge graph model and stored; finally, the purpose and intention of the speech recognition information is determined according to the stored relevant knowledge facts in combination with the structured query sentence. In this way, the problem of ambiguity among different semantic intentions in different scenes is solved, so that the question raised by the user can be truly understood and the user's intention can be accurately recognized and understood.
Other features and advantages of the present disclosure will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present disclosure. The objectives and other advantages of the present disclosure can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the disclosure;
FIG. 2 is a flowchart of a specific implementation of a method for processing voice information according to an embodiment of the disclosure;
FIG. 3 is a schematic structural diagram of a device for processing voice information according to an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a smart terminal according to an embodiment of the disclosure.
Detailed description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings of the embodiments of the present disclosure.
In the related art, with the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting. However, the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
In view of this, the present disclosure provides a method for processing voice information. Based on technologies such as knowledge graphs and machine learning, the method solves the problem of ambiguous semantic intentions in different scenes, enables accurate recognition and understanding of the user's intention, improves the user's interactive experience, and resolves the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input. Referring to FIG. 1, which is a flowchart of a method for processing voice information according to an embodiment of the present disclosure, the method includes:
Step 101: Convert received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition.
The user's voice information can be obtained when the user interacts by voice with a voice device, such as a smart air conditioner, and issues a voice control command. After the voice device receives the user's voice information, it uploads it to the cloud service platform, and the cloud service platform further parses and recognizes the user's voice information to obtain the speech recognition information. The speech recognition information is an unstructured text sentence from which the user's intention can be derived, but the voice device cannot understand the user's intention from the unstructured speech recognition information, so the speech recognition information needs to be sent to the knowledge graph server for further processing.
In order for the voice device to accurately understand the user's purpose and intention, after receiving the speech recognition information sent by the cloud service platform, the knowledge graph server converts the unstructured speech recognition information into a structured query sentence. With the structured query sentence corresponding to the user's voice information, the voice device can analyze the user's real intention based on the structured query sentence in combination with different scenes and the knowledge graph.
Step 102: Extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts.
The knowledge graph server is built with the architecture model of a knowledge graph. Starting from the most primitive data (including structured query sentence information and unstructured speech recognition information), the knowledge graph model uses a series of automatic or semi-automatic technical means to extract relevant knowledge facts from the original database and third-party databases. It should be noted that the original database is a database that stores structured query sentence information, semi-structured speech recognition information, and unstructured speech recognition information, while the third-party database introduced here refers to a database that stores knowledge of a particular professional field; its role is to expand the different scenes and different intentions corresponding to the speech recognition information, so as to ensure the accuracy of purpose and intention understanding.
In the above method, the relevant knowledge facts include the knowledge refined through the knowledge graph model and the specific data information of the speech recognition information. The knowledge refined through the knowledge graph model is stored in the core of the knowledge graph model, namely the pattern layer, and the specific data information of the speech recognition information is stored in the data layer.
In one embodiment, the structured query sentences of multiple pieces of unstructured speech recognition information are combined with other information materials used to construct the knowledge graph and with the specific data information in the database for knowledge fusion. In addition, in order to understand the user's intention, the knowledge graph server needs to extract keywords from the speech recognition information, determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information, and the set of entity attributes corresponding to the keyword "apple" in the knowledge graph model is determined to be "apple (fruit of the genus Malus, family Rosaceae), Apple (Apple Inc.), Apple (Apple Inc. product), Apple (a person's name)"; the obtained set of entity attributes is used as the representation of the relevant knowledge facts of this voice control command. It should be noted that in this embodiment, multiple results are returned when the user's intention is obtained, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear; to determine the user's exact intention, step 103 is further executed.
Step 103: Determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In one embodiment, the scene corresponding to the structured query sentence is first determined from the context information of the speech recognition information and the application device information, wherein the scene corresponding to the structured query sentence may contain multiple different intentions; to determine the user's purpose and intention among these intentions, the judgment can be made according to the weights of the different intentions. As for how the weights of the different intentions of the structured query sentence in the corresponding scene are obtained, in some embodiments they are labeled after big data mining and analysis, or, in some embodiments, they are obtained through training with a deep learning method.
In some embodiments, according to the obtained weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined, where the entity attribute is included in the stored relevant knowledge facts. For example, refer to Table 1 for the case in which the voice control command issued by the user is "I want to buy an apple"; the different intentions of this command are weighted as follows:
Table 1

    Scene    Entity attribute                       Weight value
    apple    Malus fruit of the family Rosaceae     A1
    apple    Apple Inc.                             A2
    apple    Apple Inc. product                     A3
    apple    person's name                          A4
As shown in Table 1, in one embodiment the weight value A1 is greater than A2, A3 and A4, so the entity attribute with the highest relevance after this sorting is "Malus fruit of the family Rosaceae", which indicates that the user's purpose and intention is to buy edible apples.
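The weight-based selection just described can be sketched as a simple sort; the numeric values stand in for the A1-A4 weights of Table 1 and are purely illustrative.

```python
# Sketch of weight-based ranking: sort candidate entity attributes by their
# weight in the resolved scene and return the top one as the purpose intention.
WEIGHTS = {  # stand-ins for A1..A4, with A1 the largest
    ("shopping", "Malus fruit of the family Rosaceae"): 0.80,  # A1
    ("shopping", "Apple Inc."): 0.05,                          # A2
    ("shopping", "Apple Inc. product"): 0.10,                  # A3
    ("shopping", "person's name"): 0.05,                       # A4
}

def rank_by_weight(scene: str, candidates: list[str],
                   weights: dict[tuple[str, str], float]) -> list[str]:
    # Highest weight first.
    return sorted(candidates, key=lambda attr: weights.get((scene, attr), 0.0),
                  reverse=True)

candidates = ["Malus fruit of the family Rosaceae", "Apple Inc.",
              "Apple Inc. product", "person's name"]
print(rank_by_weight("shopping", candidates, WEIGHTS)[0])
# -> "Malus fruit of the family Rosaceae": the user wants to buy edible apples
```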
In another embodiment, an entity attribute ranking method based on deep learning is used to determine the entity attribute with the highest relevance, and that entity attribute is returned as the purpose and intention of the speech recognition information. For example, in some embodiments an entity attribute ranking method based on a CNN (Convolutional Neural Network, a deep learning algorithm) is adopted; the CNN is trained with a deep learning method on the weights of the different intentions after the scene has been determined, together with the word vectors of the question sequence and the entity attribute sequence, to produce the ranking. The entity attribute with the highest relevance is then obtained from the ranking result and used as the purpose and intention of the speech recognition information.
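A minimal sketch of such a CNN-based ranker is shown below, assuming PyTorch and pre-computed word vectors; the network size, the pairwise scoring scheme and all tensor shapes are assumptions of this sketch rather than details taken from the disclosure.

```python
# Sketch of a CNN that scores (question sequence, entity attribute sequence)
# pairs over word vectors; the highest score gives the top-ranked attribute.
import torch
import torch.nn as nn

class CnnRanker(nn.Module):
    def __init__(self, emb_dim: int = 64, channels: int = 32):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.score = nn.Linear(2 * channels, 1)

    def encode(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, emb_dim) -> (batch, channels)
        x = torch.relu(self.conv(seq.transpose(1, 2)))
        return self.pool(x).squeeze(-1)

    def forward(self, question: torch.Tensor, attribute: torch.Tensor) -> torch.Tensor:
        # Higher score means the attribute is more relevant to the question.
        q, a = self.encode(question), self.encode(attribute)
        return self.score(torch.cat([q, a], dim=-1)).squeeze(-1)

ranker = CnnRanker()
question = torch.randn(1, 10, 64).repeat(4, 1, 1)  # one question vs. 4 candidates
attributes = torch.randn(4, 5, 64)                 # word vectors of 4 attribute sequences
best_index = torch.argmax(ranker(question, attributes))  # index of top-ranked attribute
```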
Through the speech information processing method provided by the present disclosure, the combination of the knowledge graph server with machine learning and other technologies resolves the ambiguity of semantic intention that arises when different intentions exist in different scenes, so that the user's intention can be accurately recognized and understood. This improves the user experience and eliminates the phenomena of "one word with multiple meanings" and "one meaning with multiple words" in voice input.
Referring to FIG. 2, a flowchart of a specific implementation of the speech information processing method provided by an embodiment of the present disclosure, the specific implementation of the present disclosure is further described as follows:
Step 201: The speech recognition module receives a voice control command issued by the user.
Step 202: The user speech information received according to the voice control command is uploaded to the cloud service platform.
After receiving the user speech information, the cloud service platform performs preliminary parsing and recognition to obtain the speech recognition information.
Step 203: The cloud service system sends the speech recognition information to the knowledge graph server for recognition of the user's purpose and intention.
In one embodiment, after the knowledge graph server receives the speech recognition information, in order to determine its purpose and intention within the knowledge graph server, the unstructured speech recognition information is first converted into a structured query statement, and the following processing is then performed:
Step B1: Perform knowledge extraction on the semi-structured and unstructured speech recognition information data.
Through knowledge extraction, the keywords in the user's speech recognition information can be obtained. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's speech information.
Step B2: Integrate the structured speech recognition information data with third-party database data.
Here, knowledge fusion is performed on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in the database. After knowledge fusion, the entity attributes of different keywords under different intentions in different scenes can be obtained.
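Knowledge fusion across these sources can be pictured as a simple merge of candidate entity attributes per keyword; the source contents below are illustrative only.

```python
# Sketch of knowledge fusion: merge and de-duplicate candidate entity
# attributes for each keyword across several sources (query statements,
# graph-building material, third-party database data).
from collections import defaultdict

def fuse(*sources: dict) -> dict:
    fused = defaultdict(set)
    for source in sources:
        for keyword, attributes in source.items():
            fused[keyword].update(attributes)       # union across sources
    return {kw: sorted(attrs) for kw, attrs in fused.items()}

queries   = {"apple": ["Malus fruit of the family Rosaceae"]}
materials = {"apple": ["Apple Inc.", "Apple Inc. product"]}
database  = {"apple": ["person's name", "Apple Inc."]}
print(fuse(queries, materials, database))
# -> {'apple': ['Apple Inc.', 'Apple Inc. product', 'Malus fruit ...', "person's name"]}
```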
It should be noted that the execution order of step B1 and step B2 is not limited.
Step B3: Combine the keywords obtained by knowledge extraction with the integrated information to obtain the representation of the relevant knowledge facts.
Keywords are determined by knowledge extraction, and the integrated data contains the entity attributes corresponding to those keywords; combining the two therefore yields the representation of the relevant knowledge facts corresponding to the keywords. The relevant knowledge facts contain all entity attributes of the keywords.
Step B4: Perform purpose-and-intention reasoning on the basis of the obtained relevant knowledge facts.
Optionally, when performing purpose-and-intention reasoning, the scene corresponding to the structured query statement is first determined, where the scene contains a plurality of different intentions; the corresponding scene is determined from the context information of the speech recognition information and the application device information. In one embodiment, according to the weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined and returned as the purpose and intention of the speech recognition information. In another embodiment, a deep-learning-based entity attribute ranking method is used to determine the entity attribute with the highest relevance, which is returned as the purpose and intention of the speech recognition information.
Step B5: Verify, evaluate and filter the obtained relevant knowledge facts through the quality verification platform.
In one embodiment, by introducing the quality verification platform, the obtained relevant knowledge facts can be verified and evaluated, and relevant knowledge facts that do not meet the specifications and requirements are filtered out, which improves the accuracy of the finally obtained purpose and intention.
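One way to picture this quality check is a rule-based filter over the collected facts; the validation rules below (non-empty fields, attribute allowed by the schema) are assumptions of this sketch, since the disclosure does not fix concrete rules.

```python
# Sketch of the quality-verification step: keep only relevant knowledge facts
# that pass simple validation rules before they are used for intent reasoning.
def passes_quality_check(fact: dict, allowed: dict) -> bool:
    keyword, attribute = fact.get("keyword"), fact.get("attribute")
    if not keyword or not attribute:
        return False                                   # malformed fact
    return attribute in allowed.get(keyword, set())    # must conform to the schema

ALLOWED = {"apple": {"Malus fruit of the family Rosaceae", "Apple Inc."}}
facts = [
    {"keyword": "apple", "attribute": "Malus fruit of the family Rosaceae"},
    {"keyword": "apple", "attribute": ""},             # filtered out
]
clean_facts = [f for f in facts if passes_quality_check(f, ALLOWED)]
```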
Step B6: Perform timely knowledge updates on the obtained relevant knowledge facts.
In one embodiment, to ensure the accuracy of the obtained user purpose and intention, the representation of the relevant knowledge facts is updated in a timely manner, which likewise helps to improve the accuracy of the finally obtained purpose and intention.
In one embodiment, step B5 and step B6 are optional; they only serve to improve the accuracy of the finally obtained purpose and intention and do not limit the specific implementation.
Based on the same concept, FIG. 3 shows a speech information processing apparatus provided by an embodiment of the present disclosure. The apparatus includes a conversion module 301, an extraction module 302 and a determination module 303.
The conversion module 301 is configured to convert the received unstructured speech recognition information into a structured query statement, where the unstructured speech recognition information is obtained through speech recognition.
The extraction module 302 is configured to extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model and to store the relevant knowledge facts.
The determination module 303 is configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
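For orientation only, the three modules can be pictured as one pipeline mirroring steps 101-103; the function bodies below are placeholders standing in for the conversion, extraction and determination logic described above, not an implementation of the apparatus.

```python
# Placeholder pipeline chaining the three modules of FIG. 3.
class SpeechInfoProcessor:
    def convert(self, recognized_text: str) -> str:
        # Conversion module 301: unstructured text -> structured query statement.
        return f"SELECT attribute WHERE keyword = '{recognized_text}'"   # placeholder

    def extract(self, recognized_text: str) -> list:
        # Extraction module 302: relevant knowledge facts from the knowledge graph.
        return ["Malus fruit of the family Rosaceae", "Apple Inc."]      # placeholder

    def determine(self, query: str, facts: list) -> str:
        # Determination module 303: pick the most relevant entity attribute.
        return facts[0]                                                  # placeholder

    def process(self, recognized_text: str) -> str:
        query = self.convert(recognized_text)
        facts = self.extract(recognized_text)
        return self.determine(query, facts)

intent = SpeechInfoProcessor().process("I want to buy an apple")
```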
In some embodiments, the determination module 303 is configured to:
determine the scene corresponding to the structured query statement, where the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
according to the weights of the different intentions, determine the entity attribute with the highest relevance after sorting by weight, where the entity attribute is included in the stored relevant knowledge facts; and
return the entity attribute as the purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning apparatus.
In some embodiments, the determination module 303 is configured to:
determine, on the basis of a deep-learning-based entity attribute ranking apparatus, the entity attribute with the highest relevance obtained by the ranking apparatus; and
return the entity attribute as the purpose and intention of the speech recognition information.
In some embodiments, the extraction module 302 is configured to:
perform knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, where the database includes the database used by the speech recognition information;
extract keywords from the speech recognition information; and
determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, the extraction module 302 is configured to:
store the knowledge refined by the knowledge graph model in the schema layer; and
store the specific data information of the speech recognition information in the data layer.
After introducing the speech information processing method and apparatus in the exemplary embodiments of the present disclosure, a smart terminal according to another exemplary embodiment of the present disclosure is introduced next.
Those skilled in the art can understand that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", a "module", or a "system".
In some possible implementations, the smart terminal according to the present disclosure may include at least one processor and at least one memory. The memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the steps of the speech information processing method according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the processor may perform steps 101 to 103 as shown in FIG. 1.
The smart terminal 40 according to this embodiment of the present disclosure is described below with reference to FIG. 4. The smart terminal 40 shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the smart terminal 40 takes the form of a general-purpose smart terminal. The components of the smart terminal 40 may include, but are not limited to, the above-mentioned at least one processor 41, the above-mentioned at least one memory 42, and a bus 43 connecting the different system components (including the memory 42 and the processor 41).
The bus 43 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.
The memory 42 may include a readable medium in the form of volatile memory, such as a random access memory (RAM) 421 and/or a cache memory 422, and may further include a read-only memory (ROM) 423.
The memory 42 may also include a program/utility 425 having a set of (at least one) program modules 424. Such program modules 424 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The smart terminal 40 may also communicate with one or more external devices 44 (such as a keyboard or a pointing device), and/or with any device (such as a router or a modem) that enables the smart terminal 40 to communicate with one or more other smart terminals. Such communication may take place through an input/output (I/O) interface 45. In addition, the smart terminal 40 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 46. As shown in the figure, the network adapter 46 communicates with the other modules of the smart terminal 40 through the bus 43. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the smart terminal 40, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In some possible implementations, various aspects of the control method of the smart terminal provided by the present disclosure may also be implemented in the form of a program product that includes a computer program. When the program product runs on a computer device, the computer program is configured to cause the computer device to perform the steps of the speech information processing method according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the computer device may perform steps 101 to 103 as shown in FIG. 1.
The program product may adopt any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for resolving scene semantic-intention ambiguity of the embodiments of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM), include a computer program, and be run on a smart terminal. However, the program product of the present disclosure is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a readable computer program is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium; the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The computer program contained on the readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
The computer program for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer program may be executed entirely on the target-object smart terminal, partly on the target-object device, as an independent software package, partly on the target-object smart terminal and partly on a remote smart terminal, or entirely on a remote smart terminal or server. In the case of a remote smart terminal, the remote smart terminal may be connected to the target-object smart terminal through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external smart terminal (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, this division is only exemplary and not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
In addition, although the operations of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing a computer-usable computer program.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present disclosure have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and modifications to the present disclosure without departing from the spirit and scope of the present disclosure. Thus, if these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalent technologies, the present disclosure is also intended to include these modifications and variations.

Claims (16)

  1. A method for processing speech information, the method comprising:
    converting received unstructured speech recognition information into a structured query statement, wherein the unstructured speech recognition information is obtained through speech recognition;
    extracting relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and storing the relevant knowledge facts; and
    determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
  2. The method according to claim 1, wherein determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement comprises:
    determining a scene corresponding to the structured query statement, wherein the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
    according to the weights of the different intentions, determining the entity attribute with the highest relevance after sorting by weight, wherein the entity attribute is included in the stored relevant knowledge facts; and
    returning the entity attribute as the purpose and intention of the speech recognition information.
  3. The method according to claim 2, wherein the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
    the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning method.
  4. The method according to claim 1, wherein determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement comprises:
    determining, on the basis of a deep-learning-based entity attribute ranking method, the entity attribute with the highest relevance obtained by the ranking method; and
    returning the entity attribute as the purpose and intention of the speech recognition information.
  5. The method according to claim 1, wherein extracting the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model comprises:
    performing knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, wherein the database includes the database used by the speech recognition information;
    extracting keywords from the speech recognition information; and
    determining the set of entity attributes corresponding to the keywords in the knowledge graph model, and using the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
  6. The method according to claim 1, wherein the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
  7. The method according to claim 6, wherein storing the relevant knowledge facts comprises:
    storing the knowledge refined by the knowledge graph model in a schema layer; and
    storing the specific data information of the speech recognition information in a data layer.
  8. An apparatus for processing speech information, the apparatus comprising:
    a conversion module, configured to convert received unstructured speech recognition information into a structured query statement, wherein the unstructured speech recognition information is obtained through speech recognition;
    an extraction module, configured to extract relevant knowledge facts of the speech recognition information through a constructed knowledge graph model and to store the relevant knowledge facts; and
    a determination module, configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
  9. The apparatus according to claim 8, wherein the determination module is configured to:
    determine a scene corresponding to the structured query statement, wherein the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
    according to the weights of the different intentions, determine the entity attribute with the highest relevance after sorting by weight, wherein the entity attribute is included in the stored relevant knowledge facts; and
    return the entity attribute as the purpose and intention of the speech recognition information.
  10. The apparatus according to claim 9, wherein the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
    the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning apparatus.
  11. The apparatus according to claim 8, wherein the determination module is configured to:
    determine, on the basis of a deep-learning-based entity attribute ranking apparatus, the entity attribute with the highest relevance obtained by the ranking apparatus; and
    return the entity attribute as the purpose and intention of the speech recognition information.
  12. The apparatus according to claim 8, wherein the extraction module is configured to:
    perform knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, wherein the database includes the database used by the speech recognition information;
    extract keywords from the speech recognition information; and
    determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
  13. The apparatus according to claim 8, wherein the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
  14. The apparatus according to claim 13, wherein the extraction module is configured to:
    store the knowledge refined by the knowledge graph model in a schema layer; and
    store the specific data information of the speech recognition information in a data layer.
  15. A smart terminal, comprising a memory and a processor, wherein:
    the memory is configured to store program instructions; and
    the processor is configured to call the program instructions stored in the memory and execute, in accordance with the obtained program, the method according to any one of claims 1-7.
  16. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the method according to any one of claims 1-7.
PCT/CN2020/112928 2019-10-18 2020-09-02 Speech information processing method and apparatus, and intelligent terminal and storage medium WO2021073298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910994726.4 2019-10-18
CN201910994726.4A CN110795532A (en) 2019-10-18 2019-10-18 Voice information processing method and device, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2021073298A1 true WO2021073298A1 (en) 2021-04-22

Family

ID=69439350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112928 WO2021073298A1 (en) 2019-10-18 2020-09-02 Speech information processing method and apparatus, and intelligent terminal and storage medium

Country Status (2)

Country Link
CN (1) CN110795532A (en)
WO (1) WO2021073298A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138930A (en) * 2021-10-23 2022-03-04 西安电子科技大学 Intention characterization system and method based on knowledge graph
CN114898751A (en) * 2022-06-15 2022-08-12 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN115827848A (en) * 2023-02-10 2023-03-21 天翼云科技有限公司 Method, device, equipment and storage medium for extracting knowledge graph events

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795532A (en) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and device, intelligent terminal and storage medium
CN111858966B (en) * 2020-08-05 2021-12-31 龙马智芯(珠海横琴)科技有限公司 Knowledge graph updating method and device, terminal equipment and readable storage medium
CN112086155A (en) * 2020-09-11 2020-12-15 北京欧应信息技术有限公司 Diagnosis and treatment information structured collection method based on voice input
CN115242569B (en) * 2021-04-23 2023-12-05 海信集团控股股份有限公司 Man-machine interaction method and server in intelligent home
CN113420124B (en) * 2021-06-25 2024-03-22 上海适享文化传播有限公司 Method for resolving conflict under multiple conditions of voice retrieval
CN113641797A (en) * 2021-08-30 2021-11-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and computer program product
CN113761927B (en) * 2021-08-31 2024-02-06 国网冀北电力有限公司 Power grid fault handling real-time auxiliary decision-making method, system, equipment and storage medium
CN114328955A (en) * 2021-12-17 2022-04-12 南京沃科电子科技有限公司 Automobile electronic knowledge map control system
CN115453897A (en) * 2022-08-18 2022-12-09 青岛海尔科技有限公司 Method and device for determining intention instruction, storage medium and electronic device
CN115356939A (en) * 2022-08-18 2022-11-18 青岛海尔科技有限公司 Control command transmission method, control device, storage medium, and electronic device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9009046B1 (en) * 2005-09-27 2015-04-14 At&T Intellectual Property Ii, L.P. System and method for disambiguating multiple intents in a natural language dialog system
US9465833B2 (en) * 2012-07-31 2016-10-11 Veveo, Inc. Disambiguating user intent in conversational interaction system for large corpus information retrieval
CN105070288B (en) * 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction identification method and device
CN107589828A (en) * 2016-07-07 2018-01-16 深圳狗尾草智能科技有限公司 The man-machine interaction method and system of knowledge based collection of illustrative plates
WO2019011356A1 (en) * 2017-07-14 2019-01-17 Cognigy Gmbh Method for conducting dialog between human and computer
CN108428447B (en) * 2018-06-19 2021-02-02 科大讯飞股份有限公司 Voice intention recognition method and device
CN109492126B (en) * 2018-11-02 2022-03-01 廊坊市森淼春食用菌有限公司 Intelligent interaction method and device
CN109635117B (en) * 2018-12-26 2021-05-14 零犀(北京)科技有限公司 Method and device for recognizing user intention based on knowledge graph
CN109785833A (en) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 Human-computer interaction audio recognition method and system for smart machine
CN109918673B (en) * 2019-03-14 2021-08-03 湖北亿咖通科技有限公司 Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN110334201B (en) * 2019-07-18 2021-09-21 中国工商银行股份有限公司 Intention identification method, device and system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164216A1 (en) * 2007-12-21 2009-06-25 General Motors Corporation In-vehicle circumstantial speech recognition
CN102880649A (en) * 2012-08-27 2013-01-16 北京搜狗信息服务有限公司 Individualized information processing method and system
CN103106287A (en) * 2013-03-06 2013-05-15 深圳市宜搜科技发展有限公司 Processing method and processing system for retrieving sentences by user
CN103268348A (en) * 2013-05-28 2013-08-28 中国科学院计算技术研究所 Method for identifying user query intention
CN109657229A (en) * 2018-10-31 2019-04-19 北京奇艺世纪科技有限公司 A kind of intention assessment model generating method, intension recognizing method and device
CN110263160A (en) * 2019-05-29 2019-09-20 中国电子科技集团公司第二十八研究所 A kind of Question Classification method in computer question answering system
CN110795532A (en) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and device, intelligent terminal and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138930A (en) * 2021-10-23 2022-03-04 西安电子科技大学 Intention characterization system and method based on knowledge graph
CN114138930B (en) * 2021-10-23 2024-02-02 西安电子科技大学 Intent characterization system and method based on knowledge graph
CN114898751A (en) * 2022-06-15 2022-08-12 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN114898751B (en) * 2022-06-15 2024-04-23 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN115827848A (en) * 2023-02-10 2023-03-21 天翼云科技有限公司 Method, device, equipment and storage medium for extracting knowledge graph events

Also Published As

Publication number Publication date
CN110795532A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021073298A1 (en) Speech information processing method and apparatus, and intelligent terminal and storage medium
US11164568B2 (en) Speech recognition method and apparatus, and storage medium
US10937413B2 (en) Techniques for model training for voice features
EP3832519A1 (en) Method and apparatus for evaluating translation quality
CN111090727B (en) Language conversion processing method and device and dialect voice interaction system
CN110083693B (en) Robot dialogue reply method and device
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
WO2021218029A1 (en) Artificial intelligence-based interview method and apparatus, computer device, and storage medium
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2020253064A1 (en) Speech recognition method and apparatus, and computer device and storage medium
CN107491436A (en) A kind of recognition methods of title party and device, server, storage medium
CN108664599B (en) Intelligent question-answering method and device, intelligent question-answering server and storage medium
US10854189B2 (en) Techniques for model training for voice features
JP2021081712A (en) Method, device, electronic apparatus, and computer readable storage media for voice interaction
WO2023130951A1 (en) Speech sentence segmentation method and apparatus, electronic device, and storage medium
JP2021081713A (en) Method, device, apparatus, and media for processing voice signal
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
CN112183051A (en) Intelligent voice follow-up method, system, computer equipment, storage medium and program product
WO2021239078A1 (en) Field recognition method, interaction method, electronic device, and storage medium
JP2015001695A (en) Voice recognition device, and voice recognition method and program
CN109918502A (en) Document explains method, apparatus, computer installation and computer readable storage medium
CN115114453A (en) Intelligent customer service implementation method and device based on knowledge graph
WO2021082570A1 (en) Artificial intelligence-based semantic identification method, device, and semantic identification apparatus
CN113553415A (en) Question and answer matching method and device and electronic equipment
CN112925889A (en) Natural language processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20876796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20876796

Country of ref document: EP

Kind code of ref document: A1