WO2021073298A1 - Speech information processing method and apparatus, and intelligent terminal and storage medium - Google Patents

Speech information processing method and apparatus, and intelligent terminal and storage medium Download PDF

Info

Publication number
WO2021073298A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
recognition information
knowledge
speech recognition
facts
Prior art date
Application number
PCT/CN2020/112928
Other languages
French (fr)
Chinese (zh)
Inventor
胡广绪
宋德超
贾巨涛
吴伟
赵鹏辉
Original Assignee
珠海格力电器股份有限公司
珠海联云科技有限公司
Priority date
Filing date
Publication date
Application filed by 珠海格力电器股份有限公司 and 珠海联云科技有限公司
Publication of WO2021073298A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation

Definitions

  • the present disclosure relates to the field of machine learning, and in particular to a method, device, smart terminal, and storage medium for processing voice information.
  • the embodiments of the present disclosure provide a voice information processing method, device, smart terminal, and storage medium to solve the problem in the related art that, when a user requests data, multiple results are often returned and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
  • embodiments of the present disclosure provide a method for processing voice information, the method including:
  • the purpose and intention of the voice recognition information is determined according to the stored relevant knowledge facts and the structured query sentence.
  • determining the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of structured query sentences for different intentions in the corresponding scenarios are obtained through deep learning method training.
  • determining the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
  • extracting relevant knowledge facts of the speech recognition information through the constructed knowledge graph model includes:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • storing the relevant knowledge facts includes:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the embodiments of the present disclosure also provide a voice information processing device, the device including:
  • the conversion module is configured to execute the conversion of the received unstructured speech recognition information into structured query sentences, wherein the unstructured speech recognition information is obtained through speech recognition;
  • the extraction module is configured to perform the extraction of relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts;
  • the determining module is configured to determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the determining module is configured to execute:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of the structured query sentence for different intentions in the corresponding scenarios are obtained through training by a deep learning apparatus.
  • the determining module is configured to execute:
  • determining, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
  • the extraction module is configured to execute:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • the extraction module is configured to execute:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the embodiments of the present disclosure also provide a smart terminal, including:
  • the memory is set to store program instructions
  • the processor is configured to call the program instructions stored in the memory and execute, according to the obtained program, the voice information processing method according to any one of the first aspect.
  • an embodiment of the present disclosure further provides a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to execute the voice information processing method according to any one of the embodiments of the present disclosure.
  • with the method, device, smart terminal, and storage medium for processing voice information provided by the embodiments of the present disclosure, the received unstructured voice recognition information is first converted into a structured query sentence, wherein the unstructured voice recognition information is obtained through speech recognition; in some embodiments, the relevant knowledge facts of the speech recognition information are then extracted through the constructed knowledge graph model and stored; finally, the purpose and intention of the voice recognition information is determined according to the stored relevant knowledge facts in combination with the structured query sentence. In this way, the problem of ambiguity among different semantic intentions in different scenarios is solved, so that the question raised by the user can be truly understood and the user's intention can be accurately recognized and understood.
  • FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the disclosure
  • FIG. 2 is a specific implementation flowchart of a method for processing voice information provided by an embodiment of the disclosure
  • FIG. 3 is a schematic structural diagram of a voice information processing apparatus provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic structural diagram of a smart terminal provided by an embodiment of the disclosure.
  • with the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting.
  • the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
  • the present disclosure provides a method for processing voice information.
  • the method is based on technologies such as knowledge graphs and machine learning; it solves the problem of ambiguous semantic intentions in different scenarios, enables accurate recognition and understanding of the user's intention, improves the user's interactive experience, and resolves the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input.
  • FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the present disclosure, including:
  • Step 101 Convert the received unstructured speech recognition information into a structured query sentence, where the unstructured speech recognition information is obtained through speech recognition.
  • the user's voice information can be obtained when the user issues a voice control command to a voice device, such as a smart air conditioner.
  • after the voice device receives the user's voice information, it uploads it to the cloud service platform, and the cloud service platform further parses and recognizes the user's voice information to obtain the speech recognition information.
  • the speech recognition information is an unstructured text sentence.
  • the user's intention can be derived from the text sentence, but the voice device cannot understand the user's intention from the unstructured speech recognition information, so the speech recognition information needs to be sent to the knowledge graph server for further processing.
  • the knowledge graph server will convert the unstructured voice recognition information into structured query sentences after receiving the voice recognition information sent by the cloud service platform.
  • with the structured query sentence corresponding to the user's voice information, the voice device can analyze the user's real intention based on the structured query sentence in combination with different scenarios and the knowledge graph.
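  • as a non-limiting illustration of this conversion step, the following minimal Python sketch maps a recognized sentence to a structured query; the stop-word filter and the SPARQL-style template are illustrative assumptions, not the grammar actually used by the knowledge graph server:

```python
# Hypothetical sketch of the conversion step: extract keywords from the
# recognized sentence and wrap them in a SPARQL-style lookup. The stop-word
# filter and the query template are illustrative stand-ins only.
STOP_WORDS = {"i", "want", "to", "an", "a", "the", "please"}

def to_structured_query(recognized_text: str) -> str:
    tokens = [t.strip(".,!?").lower() for t in recognized_text.split()]
    keywords = [t for t in tokens if t and t not in STOP_WORDS]
    values = " ".join(f'"{k}"' for k in keywords)
    return (
        "SELECT ?entity ?attribute ?value WHERE { "
        f"VALUES ?label {{ {values} }} "
        "?entity rdfs:label ?label . ?entity ?attribute ?value . }"
    )

print(to_structured_query("I want to buy an apple"))
# SELECT ?entity ?attribute ?value WHERE { VALUES ?label { "buy" "apple" } ... }
```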
  • Step 102 Extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts.
  • the knowledge graph server is built with the architecture model of the knowledge graph.
  • the knowledge graph model starts from the most primitive data (including structured query sentence information and unstructured speech recognition information) and uses a series of automatic or semi-automatic technical means to extract relevant knowledge facts from the original database and third-party databases.
  • the original database is a database that stores structured query sentence information, semi-structured speech recognition information, and unstructured speech recognition information.
  • the third-party database introduced here refers to a database that stores knowledge of a particular professional field; its role is to expand the different scenarios and different intentions corresponding to the speech recognition information, so as to ensure the accuracy of purpose and intention understanding.
  • the relevant knowledge facts include specific data information of the knowledge refined through the knowledge graph model and the voice recognition information.
  • the knowledge refined by the knowledge graph model is stored in the core of the knowledge graph model, which is the pattern layer; the specific data information of the speech recognition information is stored in the data layer.
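  • as a non-limiting illustration of this two-layer storage, the following sketch (class and field names are assumptions) keeps refined schema-level knowledge in a pattern layer and concrete recognition data in a data layer:

```python
# Illustrative two-layer store: the pattern layer holds schema-level knowledge
# refined by the knowledge graph model, the data layer holds the concrete data
# of each piece of speech recognition information.
from dataclasses import dataclass, field

@dataclass
class KnowledgeStore:
    pattern_layer: set = field(default_factory=set)   # refined (concept, relation, concept) triples
    data_layer: list = field(default_factory=list)    # concrete recognized utterances and metadata

    def store_refined_knowledge(self, triple: tuple) -> None:
        self.pattern_layer.add(triple)

    def store_recognition_data(self, record: dict) -> None:
        self.data_layer.append(record)

store = KnowledgeStore()
store.store_refined_knowledge(("apple", "is_a", "fruit"))
store.store_recognition_data({"utterance": "I want to buy an apple", "device": "smart air conditioner"})
```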
  • multiple structured query sentences of unstructured speech recognition information are combined with other information materials used to construct a knowledge graph, and combined with specific data information in the database for knowledge fusion.
  • the knowledge graph server needs to extract keywords from the speech recognition information, determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information, and the set of entity attributes corresponding to the keyword "apple" in the knowledge graph model is determined.
  • the set includes "apple (fruit of the genus Malus, family Rosaceae), Apple (Apple Inc.), Apple (Apple Inc. product), Apple (a person's name)", and the obtained set of entity attributes is used as the representation of the relevant knowledge facts of this voice control command. It should be noted that in this embodiment, multiple results are returned when the user's intention is obtained, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear; to determine the user's exact intention, step 103 is further executed.
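  • as a non-limiting illustration of this lookup, the following sketch uses a toy in-memory dictionary in place of the knowledge graph model; the entries mirror the "apple" example above:

```python
# Toy in-memory knowledge graph standing in for the knowledge graph model:
# each keyword maps to its set of candidate entity attributes (senses).
KNOWLEDGE_GRAPH = {
    "apple": {
        "fruit of the genus Malus (family Rosaceae)",
        "Apple Inc.",
        "Apple Inc. product",
        "a person's name",
    },
}

def relevant_knowledge_facts(keyword: str) -> set:
    """Return the entity attribute set that represents the keyword's knowledge facts."""
    return KNOWLEDGE_GRAPH.get(keyword, set())

# All four senses come back, i.e. the "one word, multiple meanings" ambiguity
# that step 103 resolves.
print(relevant_knowledge_facts("apple"))
```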
  • Step 103 Determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the context information of the voice recognition information and the application device information are used to determine the scene corresponding to the structured query sentence.
  • the scene corresponding to the structured query sentence may contain multiple different intentions; to determine the user's purpose and intention among these intentions, the judgment can be made according to the weights of the different intentions.
  • the weights of the different intentions of the structured query sentence in the corresponding scene are obtained, in some embodiments, by labeling them after big data mining and analysis, or, in some embodiments, by training with a deep learning method.
  • in some embodiments, according to the obtained weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined, wherein the entity attributes are included in the stored relevant knowledge facts. For example, referring to Table 1, when the voice control command issued by the user is "I want to buy an apple", the voice control command has different intentions with different weights, such as:

    Table 1
    Scene | Entity attribute | Weight
    apple | fruit of the genus Malus (family Rosaceae) | A1
    apple | Apple Inc. | A2
    apple | Apple Inc. product | A3
    apple | a person's name | A4

  • if the weight value A1 is greater than A2, A3, and A4, the entity attribute with the highest relevance after this sorting is "fruit of the genus Malus (family Rosaceae)", which means that the user's intention is to buy an "apple" to eat.
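  • as a non-limiting illustration of this weight-based disambiguation, the following sketch ranks the candidate entity attributes by scene-conditioned weights; the scene label and the numeric weights are invented placeholders for A1 to A4:

```python
# Scene-conditioned intent weights, keyed by (scene, entity attribute). The
# numeric values are placeholders; in the patent they come from big-data
# mining/labeling or from deep learning training.
INTENT_WEIGHTS = {
    ("shopping", "fruit of the genus Malus (family Rosaceae)"): 0.72,  # A1
    ("shopping", "Apple Inc."): 0.08,                                  # A2
    ("shopping", "Apple Inc. product"): 0.15,                          # A3
    ("shopping", "a person's name"): 0.05,                             # A4
}

def resolve_intent(scene: str, candidate_attributes: set) -> str:
    ranked = sorted(candidate_attributes,
                    key=lambda attr: INTENT_WEIGHTS.get((scene, attr), 0.0),
                    reverse=True)
    return ranked[0]  # entity attribute with the highest weight = inferred intention

candidates = {"fruit of the genus Malus (family Rosaceae)", "Apple Inc.",
              "Apple Inc. product", "a person's name"}
print(resolve_intent("shopping", candidates))  # -> the Malus fruit sense
```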
  • an entity attribute ranking method based on deep learning is used to determine the entity attribute with the highest relevance obtained according to the ranking method; returning the entity attribute to determine the purpose of the speech recognition information.
  • for example, in some embodiments, an entity attribute ranking method based on a CNN (Convolutional Neural Network, a deep learning algorithm) is adopted.
  • the CNN-based method trains a neural network with deep learning and performs the ranking using the weights of the different intentions under the determined scene and the word vectors of the question sequence and the entity attribute sequence.
  • the entity attribute with the highest relevance is then obtained from the ranking result, and this entity attribute is used as the purpose and intention of the speech recognition information.
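  • as a non-limiting illustration of such a ranking method, the following sketch uses a small convolutional text encoder (an assumed architecture, not the patent's exact network) to score each candidate entity attribute against the question by the similarity of their word-vector encodings:

```python
# Minimal sketch of a CNN-based matcher: encode the question and each candidate
# entity attribute with a shared Conv1d-over-embeddings encoder, then rank the
# candidates by cosine similarity. Untrained here; shown only for structure.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNMatcher(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64, num_filters: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 1-D convolution over the word-vector sequence, kernel spanning 3 tokens.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))         # (batch, num_filters, seq_len)
        return x.max(dim=2).values           # max-pool over time -> (batch, num_filters)

    def score(self, question_ids, attribute_ids) -> torch.Tensor:
        q = self.encode(question_ids)
        a = self.encode(attribute_ids)
        return F.cosine_similarity(q, a, dim=1)  # higher = more relevant

# Toy usage with made-up token ids; in practice the ids come from a shared vocabulary.
model = CNNMatcher(vocab_size=1000)
question = torch.tensor([[5, 17, 42, 0]])            # "I want to buy an apple"
candidates = torch.tensor([[7, 3, 0, 0],             # apple (Malus fruit)
                           [7, 9, 0, 0],             # Apple (Apple Inc.)
                           [7, 11, 0, 0]])           # Apple (Apple Inc. product)
scores = model.score(question.repeat(len(candidates), 1), candidates)
best = int(scores.argmax())                          # index of the highest-ranked attribute
```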
  • with the voice information processing method provided by the present disclosure, by combining the knowledge graph server with technologies such as machine learning, the problem of ambiguous semantic intentions in different scenarios is solved, the user's intention can be accurately recognized and understood, the user's interactive experience is improved, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input are resolved.
  • referring to FIG. 2, a flowchart of a specific implementation of a voice information processing method provided by an embodiment of the present disclosure, the specific implementation of the present disclosure is further described, including:
  • Step 201 The voice recognition module receives a voice control command issued by the user.
  • Step 202 Upload the user voice information received according to the voice control command to the cloud service platform.
  • after receiving the user's voice information, the cloud service platform performs preliminary parsing and recognition to obtain the speech recognition information.
  • Step 203 The cloud service system sends the voice recognition information to the knowledge graph server to identify the user's purpose and intention.
  • the unstructured speech recognition information is first converted into a structured query sentence, and the following processing is further performed, including:
  • Step B1 Perform knowledge extraction on semi-structured speech recognition information data and unstructured speech recognition information data.
  • the keywords in the speech recognition information can be obtained through knowledge extraction. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information.
  • Step B2 Data integration of structured speech recognition information data and third-party database data.
  • the structured query sentences of multiple pieces of unstructured speech recognition information are combined with other information materials used to construct the knowledge graph and with the specific data information in the database for knowledge fusion.
  • the entity attributes of different keywords with different intentions in different scenarios can be obtained.
  • the execution order of step B1 and step B2 is not limited.
  • Step B3 Combine the keywords obtained from the knowledge extraction with the information after data integration to obtain a representation of the relevant knowledge facts.
  • keywords can be determined after knowledge extraction, and the data integration information contains the entity attributes corresponding to the keywords. Therefore, combining the two yields the representation of the relevant knowledge facts corresponding to the keywords.
  • the relevant knowledge facts include all the entity attributes of the keywords.
  • Step B4 Perform purpose intention reasoning based on the obtained relevant knowledge facts.
  • in the purpose intention reasoning, optionally, the scene corresponding to the structured query sentence is determined from the context information of the speech recognition information and the application device information.
  • the entity attribute with the highest correlation obtained after sorting according to the weight is determined, and the entity attribute is returned to determine the purpose intention of the speech recognition information.
  • the entity attribute ranking method based on deep learning determines the entity attribute with the highest correlation obtained according to the ranking method; and returns the entity attribute determined as the purpose of the speech recognition information.
  • Step B5 Verify, evaluate and filter the acquired relevant knowledge facts through the quality verification platform.
  • the obtained relevant knowledge facts can be verified and evaluated, and the relevant knowledge facts that do not meet the specifications and requirements can be filtered out, thereby improving the accuracy of the final purpose intention.
  • Step B6 Perform timely knowledge updates on the relevant knowledge facts obtained.
  • timely updating the obtained representation of the relevant knowledge facts also helps to improve the accuracy of the finally obtained purpose intention.
  • step B5 and step B6 are optional, and are only used to improve the accuracy of the final purpose intention obtained, and are not intended to limit the specific implementation.
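  • putting steps B1 to B6 together, the overall flow on the knowledge graph server can be pictured with the following sketch; every helper function below is a stand-in stub with assumed names and signatures, not an interface defined by the patent:

```python
# Hypothetical end-to-end flow mirroring steps B1-B6; all helpers are stubs.
THIRD_PARTY_DB = {"apple": ["fruit of the genus Malus (family Rosaceae)", "Apple Inc."]}

def extract_knowledge(text: str) -> list:                      # B1: knowledge extraction
    return [w for w in text.lower().split() if w in THIRD_PARTY_DB]

def integrate_data(keywords: list) -> dict:                    # B2: data integration
    return {k: THIRD_PARTY_DB[k] for k in keywords}

def combine(keywords: list, integrated: dict) -> dict:         # B3: relevant knowledge facts
    return {k: integrated.get(k, []) for k in keywords}

def infer_intention(facts: dict) -> str:                       # B4: purpose/intention reasoning
    # placeholder: pick the first attribute of the first keyword
    return next((attrs[0] for attrs in facts.values() if attrs), "unknown")

def verify_and_filter(facts: dict) -> dict:                    # B5 (optional): quality verification
    return {k: v for k, v in facts.items() if v}

def update_knowledge(facts: dict) -> None:                     # B6 (optional): timely knowledge update
    pass

def process_recognition(recognized_text: str) -> str:
    keywords = extract_knowledge(recognized_text)
    facts = verify_and_filter(combine(keywords, integrate_data(keywords)))
    intention = infer_intention(facts)
    update_knowledge(facts)
    return intention

print(process_recognition("I want to buy an apple"))
```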
  • FIG. 3 is a voice information processing device provided by an embodiment of the present disclosure.
  • the device includes: a conversion module 301, an extraction module 302, and a determination module 303.
  • the conversion module 301 is configured to execute the conversion of the received unstructured speech recognition information into structured query sentences, where the unstructured speech recognition information is obtained through speech recognition;
  • the extraction module 302 is configured to extract relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts;
  • the determining module 303 is configured to determine the purpose and intention of the voice recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
  • the determining module 303 is configured to execute:
  • determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
  • the weights of different intents of the structured query sentence in the corresponding scenarios are obtained by labeling the weights of different intents of each keyword in different scenarios through big data mining analysis;
  • the weights of the structured query sentence for different intentions in the corresponding scenarios are obtained through training by a deep learning apparatus.
  • the determining module 303 is configured to execute:
  • determining, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
  • the extraction module 302 is configured to execute:
  • the set of entity attributes corresponding to the keywords in the knowledge graph model is determined, and the set of entity attributes is used as a representation of the relevant knowledge facts of the speech recognition information.
  • the relevant knowledge facts include knowledge refined through the knowledge graph model and specific data information of the speech recognition information.
  • the extraction module 302 is configured to execute:
  • the specific data information of the voice recognition information is stored in the data layer.
  • the smart terminal according to the present disclosure may include at least one processor and at least one memory.
  • the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps in the voice information processing method according to various exemplary embodiments of the present disclosure described above in this specification. For example, the processor may execute step 101 to step 103 as shown in FIG. 1.
  • the smart terminal 40 according to this embodiment of the present disclosure will be described below with reference to FIG. 4.
  • the smart terminal 40 shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the smart terminal 40 is represented in the form of a general smart terminal.
  • the components of the smart terminal 40 may include, but are not limited to: the aforementioned at least one processor 41, the aforementioned at least one memory 42, and a bus 43 connecting different system components (including the memory 42 and the processor 41).
  • the bus 43 represents one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a processor, or a local bus using any bus structure among multiple bus structures.
  • the memory 42 may include a readable medium in the form of a volatile memory, such as a random access memory (RAM) 421 and/or a cache memory 422, and may further include a read-only memory (ROM) 423.
  • the memory 42 may also include a program/utility tool 425 having a set of (at least one) program modules 424.
  • program modules 424 include but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the smart terminal 40 can also communicate with one or more external devices 44 (such as keyboards, pointing devices, etc.), and/or with any device (such as a router, modem, etc.) that enables the smart terminal 40 to communicate with one or more other smart terminals. This communication can be performed through an input/output (I/O) interface 45.
  • the smart terminal 40 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 46. As shown in the figure, the network adapter 46 communicates with other modules for the smart terminal 40 through the bus 43.
  • various aspects of the method provided in the present disclosure can also be implemented in the form of a program product, which includes a computer program.
  • when the program product runs on a computer device, the computer program is configured to cause the computer device to execute the steps of the voice information processing method described above; for example, the computer device may execute step 101 to step 103 as shown in FIG. 1.
  • the program product can adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the program product for solving the ambiguity of the semantic intention of the scene of the embodiment of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM) and include a computer program, and may be run on a smart terminal.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, device, or device.
  • the readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, in which a readable computer program is carried. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the computer program contained on the readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
  • the computer program for performing the operations of the present disclosure can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the computer program can be executed entirely on the target smart terminal, partly on the target device, as an independent software package, partly on the target smart terminal and partly on a remote smart terminal, or entirely on a remote smart terminal or server.
  • the remote smart terminal can be connected to the target smart terminal through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external smart terminal (for example, via the Internet using an Internet service provider).
  • the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of computer program products implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable computer programs.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, so as to execute on the computer or other programmable equipment.
  • the instructions provide steps for implementing the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a speech information processing method and apparatus, and an intelligent terminal and a storage medium, which relate to the technical field of machine learning. The method comprises: converting received unstructured speech identification information into a structured inquiry statement, wherein the unstructured speech identification information is obtained by means of speech identification; extracting a related knowledge fact of the speech identification information by means of a constructed knowledge graph model, and storing the related knowledge fact; and determining, according to the stored related knowledge fact and in combination with the structured inquiry statement, an intention of the speech identification information. The method can solve the problem in the related art that, when a user acquires data, a plurality of results are often returned and the phenomena of "polysemy" and "one meaning to multiple words" appear, so that the questions mentioned by the user cannot be really understood and the intention of the user cannot be accurately identified.

Description

Method, device, intelligent terminal and storage medium for processing voice information
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on October 18, 2019, with application number 201910994726.4 and entitled "Method, device, intelligent terminal and storage medium for processing voice information", the entire contents of which are incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of machine learning, and in particular to a method, device, smart terminal, and storage medium for processing voice information.
Background
With the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting. However, the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
Summary of the invention
The embodiments of the present disclosure provide a method, device, smart terminal, and storage medium for processing voice information, so as to solve the problem in the related art that, when a user requests data, multiple results are often returned and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
In a first aspect, the embodiments of the present disclosure provide a method for processing voice information, the method including:
converting received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition;
extracting relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and storing the relevant knowledge facts;
determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In some embodiments, determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
determining a scene corresponding to the structured query sentence, wherein the scene contains multiple different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
determining, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
returning the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query sentence in the corresponding scene are obtained by labeling, after big data mining and analysis, the weights of the different intentions of each keyword in different scenes; or
the weights of the different intentions of the structured query sentence in the corresponding scene are obtained through training with a deep learning method.
In some embodiments, determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence includes:
determining, with a deep-learning-based entity attribute ranking method, the entity attribute with the highest relevance obtained by the ranking method;
returning the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, extracting the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model includes:
performing knowledge fusion by combining the structured query sentences of multiple pieces of unstructured speech recognition information with other information materials used to construct the knowledge graph and with the specific data information in a database, wherein the database includes the database used by the speech recognition information;
extracting keywords from the speech recognition information;
determining the set of entity attributes corresponding to the keywords in the knowledge graph model, and using the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include knowledge refined through the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, storing the relevant knowledge facts includes:
storing the knowledge refined through the knowledge graph model in the pattern layer;
storing the specific data information of the speech recognition information in the data layer.
In a second aspect, the embodiments of the present disclosure further provide a device for processing voice information, the device including:
a conversion module, configured to convert received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition;
an extraction module, configured to extract relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and store the relevant knowledge facts;
a determining module, configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In some embodiments, the determining module is configured to:
determine a scene corresponding to the structured query sentence, wherein the scene contains multiple different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
determine, according to the weights of the different intentions, the entity attribute with the highest relevance obtained after sorting by weight, wherein the entity attributes are included in the stored relevant knowledge facts;
return the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query sentence in the corresponding scene are obtained by labeling, after big data mining and analysis, the weights of the different intentions of each keyword in different scenes; or
the weights of the different intentions of the structured query sentence in the corresponding scene are obtained through training by a deep learning apparatus.
In some embodiments, the determining module is configured to:
determine, by a deep-learning-based entity attribute ranking means, the entity attribute with the highest relevance obtained by the ranking means;
return the entity attribute as the determined purpose and intention of the speech recognition information.
In some embodiments, the extraction module is configured to:
perform knowledge fusion by combining the structured query sentences of multiple pieces of unstructured speech recognition information with other information materials used to construct the knowledge graph and with the specific data information in a database, wherein the database includes the database used by the speech recognition information;
extract keywords from the speech recognition information;
determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include knowledge refined through the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, the extraction module is configured to:
store the knowledge refined through the knowledge graph model in the pattern layer;
store the specific data information of the speech recognition information in the data layer.
In a third aspect, the embodiments of the present disclosure further provide a smart terminal, including:
a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to call the program instructions stored in the memory and execute, according to the obtained program, the voice information processing method according to any one of the first aspect.
In a fourth aspect, the embodiments of the present disclosure further provide a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to execute the voice information processing method according to any one of the embodiments of the present disclosure.
With the method, device, smart terminal, and storage medium for processing voice information provided by the embodiments of the present disclosure, the received unstructured speech recognition information is first converted into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition; in some embodiments, the relevant knowledge facts of the speech recognition information are then extracted through a constructed knowledge graph model and stored; finally, the purpose and intention of the speech recognition information is determined according to the stored relevant knowledge facts in combination with the structured query sentence. In this way, the problem of ambiguity among different semantic intentions in different scenes is solved, so that the question raised by the user can be truly understood and the user's intention can be accurately recognized and understood.
Other features and advantages of the present disclosure will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present disclosure. The objectives and other advantages of the present disclosure can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1 is a flowchart of a method for processing voice information according to an embodiment of the disclosure;
FIG. 2 is a flowchart of a specific implementation of a method for processing voice information according to an embodiment of the disclosure;
FIG. 3 is a schematic structural diagram of a device for processing voice information according to an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a smart terminal according to an embodiment of the disclosure.
Detailed description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings of the embodiments of the present disclosure.
In the related art, with the rapid development of the Internet of Things, voice interaction, being simple, fast, and highly interactive, is being adopted by more and more devices and has gradually become people's preferred way of interacting. However, the inventor found that a problem currently exists: when a user requests data, multiple results are often returned, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear, so that the question raised by the user cannot be truly understood and the user's intention cannot be accurately recognized and understood.
In view of this, the present disclosure provides a method for processing voice information. Based on technologies such as knowledge graphs and machine learning, the method solves the problem of ambiguous semantic intentions in different scenes, enables accurate recognition and understanding of the user's intention, improves the user's interactive experience, and resolves the phenomena of "one word, multiple meanings" and "one meaning, multiple words" in voice input. Referring to FIG. 1, which is a flowchart of a method for processing voice information according to an embodiment of the present disclosure, the method includes:
Step 101: Convert received unstructured speech recognition information into a structured query sentence, wherein the unstructured speech recognition information is obtained through speech recognition.
The user's voice information can be obtained when the user interacts by voice with a voice device, such as a smart air conditioner, and issues a voice control command. After the voice device receives the user's voice information, it uploads it to the cloud service platform, and the cloud service platform further parses and recognizes the user's voice information to obtain the speech recognition information. The speech recognition information is an unstructured text sentence from which the user's intention can be derived, but the voice device cannot understand the user's intention from the unstructured speech recognition information, so the speech recognition information needs to be sent to the knowledge graph server for further processing.
In order for the voice device to accurately understand the user's purpose and intention, after receiving the speech recognition information sent by the cloud service platform, the knowledge graph server converts the unstructured speech recognition information into a structured query sentence. With the structured query sentence corresponding to the user's voice information, the voice device can analyze the user's real intention based on the structured query sentence in combination with different scenes and the knowledge graph.
Step 102: Extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model, and store the relevant knowledge facts.
The knowledge graph server is built with the architecture model of a knowledge graph. Starting from the most primitive data (including structured query sentence information and unstructured speech recognition information), the knowledge graph model uses a series of automatic or semi-automatic technical means to extract relevant knowledge facts from the original database and third-party databases. It should be noted that the original database is a database that stores structured query sentence information, semi-structured speech recognition information, and unstructured speech recognition information, while the third-party database introduced here refers to a database that stores knowledge of a particular professional field; its role is to expand the different scenes and different intentions corresponding to the speech recognition information, so as to ensure the accuracy of purpose and intention understanding.
In the above method, the relevant knowledge facts include the knowledge refined through the knowledge graph model and the specific data information of the speech recognition information. The knowledge refined through the knowledge graph model is stored in the core of the knowledge graph model, namely the pattern layer, and the specific data information of the speech recognition information is stored in the data layer.
In one embodiment, the structured query sentences of multiple pieces of unstructured speech recognition information are combined with other information materials used to construct the knowledge graph and with the specific data information in the database for knowledge fusion. In addition, in order to understand the user's intention, the knowledge graph server needs to extract keywords from the speech recognition information, determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as a representation of the relevant knowledge facts of the speech recognition information. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's voice information, and the set of entity attributes corresponding to the keyword "apple" in the knowledge graph model is determined to be "apple (fruit of the genus Malus, family Rosaceae), Apple (Apple Inc.), Apple (Apple Inc. product), Apple (a person's name)"; the obtained set of entity attributes is used as the representation of the relevant knowledge facts of this voice control command. It should be noted that in this embodiment, multiple results are returned when the user's intention is obtained, and the phenomena of "one word, multiple meanings" and "one meaning, multiple words" appear; to determine the user's exact intention, step 103 is further executed.
Step 103: Determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query sentence.
In one embodiment, the scene corresponding to the structured query sentence is first determined from the context information of the speech recognition information and the application device information, wherein the scene corresponding to the structured query sentence may contain multiple different intentions; to determine the user's purpose and intention among these intentions, the judgment can be made according to the weights of the different intentions. As for how the weights of the different intentions of the structured query sentence in the corresponding scene are obtained, in some embodiments they are labeled after big data mining and analysis, or, in some embodiments, they are obtained through training with a deep learning method.
In some embodiments, according to the obtained weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined, where the entity attribute is included in the stored relevant knowledge facts. For example, refer to Table 1 for the case in which the voice control command issued by the user is "I want to buy an apple"; the different intentions of this command are weighted as follows:
Table 1

    Scene    Entity attribute                       Weight value
    apple    Malus fruit of the family Rosaceae     A1
    apple    Apple Inc.                             A2
    apple    Apple Inc. product                     A3
    apple    person's name                          A4
As shown in Table 1, in one embodiment the weight value A1 is greater than A2, A3 and A4, so the entity attribute with the highest relevance after this sorting is "Malus fruit of the family Rosaceae", which indicates that the user's purpose and intention is to buy edible apples.
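The weight-based selection just described can be sketched as a simple sort; the numeric values stand in for the A1-A4 weights of Table 1 and are purely illustrative.

```python
# Sketch of weight-based ranking: sort candidate entity attributes by their
# weight in the resolved scene and return the top one as the purpose intention.
WEIGHTS = {  # stand-ins for A1..A4, with A1 the largest
    ("shopping", "Malus fruit of the family Rosaceae"): 0.80,  # A1
    ("shopping", "Apple Inc."): 0.05,                          # A2
    ("shopping", "Apple Inc. product"): 0.10,                  # A3
    ("shopping", "person's name"): 0.05,                       # A4
}

def rank_by_weight(scene: str, candidates: list[str],
                   weights: dict[tuple[str, str], float]) -> list[str]:
    # Highest weight first.
    return sorted(candidates, key=lambda attr: weights.get((scene, attr), 0.0),
                  reverse=True)

candidates = ["Malus fruit of the family Rosaceae", "Apple Inc.",
              "Apple Inc. product", "person's name"]
print(rank_by_weight("shopping", candidates, WEIGHTS)[0])
# -> "Malus fruit of the family Rosaceae": the user wants to buy edible apples
```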
In another embodiment, an entity attribute ranking method based on deep learning is used to determine the entity attribute with the highest relevance, and that entity attribute is returned as the purpose and intention of the speech recognition information. For example, in some embodiments an entity attribute ranking method based on a CNN (Convolutional Neural Network, a deep learning algorithm) is adopted; the CNN is trained with a deep learning method on the weights of the different intentions after the scene has been determined, together with the word vectors of the question sequence and the entity attribute sequence, to produce the ranking. The entity attribute with the highest relevance is then obtained from the ranking result and used as the purpose and intention of the speech recognition information.
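A minimal sketch of such a CNN-based ranker is shown below, assuming PyTorch and pre-computed word vectors; the network size, the pairwise scoring scheme and all tensor shapes are assumptions of this sketch rather than details taken from the disclosure.

```python
# Sketch of a CNN that scores (question sequence, entity attribute sequence)
# pairs over word vectors; the highest score gives the top-ranked attribute.
import torch
import torch.nn as nn

class CnnRanker(nn.Module):
    def __init__(self, emb_dim: int = 64, channels: int = 32):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.score = nn.Linear(2 * channels, 1)

    def encode(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, emb_dim) -> (batch, channels)
        x = torch.relu(self.conv(seq.transpose(1, 2)))
        return self.pool(x).squeeze(-1)

    def forward(self, question: torch.Tensor, attribute: torch.Tensor) -> torch.Tensor:
        # Higher score means the attribute is more relevant to the question.
        q, a = self.encode(question), self.encode(attribute)
        return self.score(torch.cat([q, a], dim=-1)).squeeze(-1)

ranker = CnnRanker()
question = torch.randn(1, 10, 64).repeat(4, 1, 1)  # one question vs. 4 candidates
attributes = torch.randn(4, 5, 64)                 # word vectors of 4 attribute sequences
best_index = torch.argmax(ranker(question, attributes))  # index of top-ranked attribute
```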
Through the speech information processing method provided by the present disclosure, the combination of the knowledge graph server with machine learning and other technologies resolves the ambiguity of semantic intention that arises when different intentions exist in different scenes, so that the user's intention can be accurately recognized and understood. This improves the user experience and eliminates the phenomena of "one word with multiple meanings" and "one meaning with multiple words" in voice input.
Referring to FIG. 2, a flowchart of a specific implementation of the speech information processing method provided by an embodiment of the present disclosure, the specific implementation of the present disclosure is further described as follows:
Step 201: The speech recognition module receives a voice control command issued by the user.
Step 202: The user speech information received according to the voice control command is uploaded to the cloud service platform.
After receiving the user speech information, the cloud service platform performs preliminary parsing and recognition to obtain the speech recognition information.
Step 203: The cloud service system sends the speech recognition information to the knowledge graph server for recognition of the user's purpose and intention.
In one embodiment, after the knowledge graph server receives the speech recognition information, in order to determine its purpose and intention within the knowledge graph server, the unstructured speech recognition information is first converted into a structured query statement, and the following processing is then performed:
Step B1: Perform knowledge extraction on the semi-structured and unstructured speech recognition information data.
Through knowledge extraction, the keywords in the user's speech recognition information can be obtained. For example, if the voice control command issued by the user is "I want to buy an apple", the keyword "apple" is extracted from the user's speech information.
Step B2: Integrate the structured speech recognition information data with third-party database data.
Here, knowledge fusion is performed on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in the database. After knowledge fusion, the entity attributes of different keywords under different intentions in different scenes can be obtained.
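Knowledge fusion across these sources can be pictured as a simple merge of candidate entity attributes per keyword; the source contents below are illustrative only.

```python
# Sketch of knowledge fusion: merge and de-duplicate candidate entity
# attributes for each keyword across several sources (query statements,
# graph-building material, third-party database data).
from collections import defaultdict

def fuse(*sources: dict) -> dict:
    fused = defaultdict(set)
    for source in sources:
        for keyword, attributes in source.items():
            fused[keyword].update(attributes)       # union across sources
    return {kw: sorted(attrs) for kw, attrs in fused.items()}

queries   = {"apple": ["Malus fruit of the family Rosaceae"]}
materials = {"apple": ["Apple Inc.", "Apple Inc. product"]}
database  = {"apple": ["person's name", "Apple Inc."]}
print(fuse(queries, materials, database))
# -> {'apple': ['Apple Inc.', 'Apple Inc. product', 'Malus fruit ...', "person's name"]}
```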
It should be noted that the execution order of step B1 and step B2 is not limited.
Step B3: Combine the keywords obtained by knowledge extraction with the integrated information to obtain the representation of the relevant knowledge facts.
Keywords are determined by knowledge extraction, and the integrated data contains the entity attributes corresponding to those keywords; combining the two therefore yields the representation of the relevant knowledge facts corresponding to the keywords. The relevant knowledge facts contain all entity attributes of the keywords.
Step B4: Perform purpose-and-intention reasoning on the basis of the obtained relevant knowledge facts.
Optionally, when performing purpose-and-intention reasoning, the scene corresponding to the structured query statement is first determined, where the scene contains a plurality of different intentions; the corresponding scene is determined from the context information of the speech recognition information and the application device information. In one embodiment, according to the weights of the different intentions, the entity attribute with the highest relevance after sorting by weight is determined and returned as the purpose and intention of the speech recognition information. In another embodiment, a deep-learning-based entity attribute ranking method is used to determine the entity attribute with the highest relevance, which is returned as the purpose and intention of the speech recognition information.
Step B5: Verify, evaluate and filter the obtained relevant knowledge facts through the quality verification platform.
In one embodiment, by introducing the quality verification platform, the obtained relevant knowledge facts can be verified and evaluated, and relevant knowledge facts that do not meet the specifications and requirements are filtered out, which improves the accuracy of the finally obtained purpose and intention.
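One way to picture this quality check is a rule-based filter over the collected facts; the validation rules below (non-empty fields, attribute allowed by the schema) are assumptions of this sketch, since the disclosure does not fix concrete rules.

```python
# Sketch of the quality-verification step: keep only relevant knowledge facts
# that pass simple validation rules before they are used for intent reasoning.
def passes_quality_check(fact: dict, allowed: dict) -> bool:
    keyword, attribute = fact.get("keyword"), fact.get("attribute")
    if not keyword or not attribute:
        return False                                   # malformed fact
    return attribute in allowed.get(keyword, set())    # must conform to the schema

ALLOWED = {"apple": {"Malus fruit of the family Rosaceae", "Apple Inc."}}
facts = [
    {"keyword": "apple", "attribute": "Malus fruit of the family Rosaceae"},
    {"keyword": "apple", "attribute": ""},             # filtered out
]
clean_facts = [f for f in facts if passes_quality_check(f, ALLOWED)]
```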
Step B6: Perform timely knowledge updates on the obtained relevant knowledge facts.
In one embodiment, to ensure the accuracy of the obtained user purpose and intention, the representation of the relevant knowledge facts is updated in a timely manner, which likewise helps to improve the accuracy of the finally obtained purpose and intention.
In one embodiment, step B5 and step B6 are optional; they only serve to improve the accuracy of the finally obtained purpose and intention and do not limit the specific implementation.
Based on the same concept, FIG. 3 shows a speech information processing apparatus provided by an embodiment of the present disclosure. The apparatus includes a conversion module 301, an extraction module 302 and a determination module 303.
The conversion module 301 is configured to convert the received unstructured speech recognition information into a structured query statement, where the unstructured speech recognition information is obtained through speech recognition.
The extraction module 302 is configured to extract the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model and to store the relevant knowledge facts.
The determination module 303 is configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
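For orientation only, the three modules can be pictured as one pipeline mirroring steps 101-103; the function bodies below are placeholders standing in for the conversion, extraction and determination logic described above, not an implementation of the apparatus.

```python
# Placeholder pipeline chaining the three modules of FIG. 3.
class SpeechInfoProcessor:
    def convert(self, recognized_text: str) -> str:
        # Conversion module 301: unstructured text -> structured query statement.
        return f"SELECT attribute WHERE keyword = '{recognized_text}'"   # placeholder

    def extract(self, recognized_text: str) -> list:
        # Extraction module 302: relevant knowledge facts from the knowledge graph.
        return ["Malus fruit of the family Rosaceae", "Apple Inc."]      # placeholder

    def determine(self, query: str, facts: list) -> str:
        # Determination module 303: pick the most relevant entity attribute.
        return facts[0]                                                  # placeholder

    def process(self, recognized_text: str) -> str:
        query = self.convert(recognized_text)
        facts = self.extract(recognized_text)
        return self.determine(query, facts)

intent = SpeechInfoProcessor().process("I want to buy an apple")
```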
In some embodiments, the determination module 303 is configured to:
determine the scene corresponding to the structured query statement, where the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
according to the weights of the different intentions, determine the entity attribute with the highest relevance after sorting by weight, where the entity attribute is included in the stored relevant knowledge facts; and
return the entity attribute as the purpose and intention of the speech recognition information.
In some embodiments, the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning apparatus.
In some embodiments, the determination module 303 is configured to:
determine, on the basis of a deep-learning-based entity attribute ranking apparatus, the entity attribute with the highest relevance obtained by the ranking apparatus; and
return the entity attribute as the purpose and intention of the speech recognition information.
In some embodiments, the extraction module 302 is configured to:
perform knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, where the database includes the database used by the speech recognition information;
extract keywords from the speech recognition information; and
determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
In some embodiments, the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
In some embodiments, the extraction module 302 is configured to:
store the knowledge refined by the knowledge graph model in the schema layer; and
store the specific data information of the speech recognition information in the data layer.
After introducing the speech information processing method and apparatus in the exemplary embodiments of the present disclosure, a smart terminal according to another exemplary embodiment of the present disclosure is introduced next.
Those skilled in the art can understand that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", a "module", or a "system".
In some possible implementations, the smart terminal according to the present disclosure may include at least one processor and at least one memory. The memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the steps of the speech information processing method according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the processor may perform steps 101 to 103 as shown in FIG. 1.
The smart terminal 40 according to this embodiment of the present disclosure is described below with reference to FIG. 4. The smart terminal 40 shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the smart terminal 40 takes the form of a general-purpose smart terminal. The components of the smart terminal 40 may include, but are not limited to, the above-mentioned at least one processor 41, the above-mentioned at least one memory 42, and a bus 43 connecting the different system components (including the memory 42 and the processor 41).
The bus 43 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.
The memory 42 may include a readable medium in the form of volatile memory, such as a random access memory (RAM) 421 and/or a cache memory 422, and may further include a read-only memory (ROM) 423.
The memory 42 may also include a program/utility 425 having a set of (at least one) program modules 424. Such program modules 424 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The smart terminal 40 may also communicate with one or more external devices 44 (such as a keyboard or a pointing device), and/or with any device (such as a router or a modem) that enables the smart terminal 40 to communicate with one or more other smart terminals. Such communication may take place through an input/output (I/O) interface 45. In addition, the smart terminal 40 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 46. As shown in the figure, the network adapter 46 communicates with the other modules of the smart terminal 40 through the bus 43. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the smart terminal 40, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In some possible implementations, various aspects of the control method of the smart terminal provided by the present disclosure may also be implemented in the form of a program product that includes a computer program. When the program product runs on a computer device, the computer program is configured to cause the computer device to perform the steps of the speech information processing method according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the computer device may perform steps 101 to 103 as shown in FIG. 1.
The program product may adopt any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for resolving scene semantic-intention ambiguity of the embodiments of the present disclosure may adopt a portable compact disk read-only memory (CD-ROM), include a computer program, and be run on a smart terminal. However, the program product of the present disclosure is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a readable computer program is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The readable signal medium may also be any readable medium other than a readable storage medium; the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The computer program contained on the readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
The computer program for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer program may be executed entirely on the target-object smart terminal, partly on the target-object device, as an independent software package, partly on the target-object smart terminal and partly on a remote smart terminal, or entirely on a remote smart terminal or server. In the case of a remote smart terminal, the remote smart terminal may be connected to the target-object smart terminal through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external smart terminal (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, this division is only exemplary and not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
In addition, although the operations of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing a computer-usable computer program.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although the preferred embodiments of the present disclosure have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and modifications to the present disclosure without departing from the spirit and scope of the present disclosure. Thus, if these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalent technologies, the present disclosure is also intended to include these modifications and variations.

Claims (16)

  1. A method for processing speech information, the method comprising:
    converting received unstructured speech recognition information into a structured query statement, wherein the unstructured speech recognition information is obtained through speech recognition;
    extracting relevant knowledge facts of the speech recognition information through a constructed knowledge graph model, and storing the relevant knowledge facts; and
    determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
  2. The method according to claim 1, wherein determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement comprises:
    determining a scene corresponding to the structured query statement, wherein the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
    according to the weights of the different intentions, determining the entity attribute with the highest relevance after sorting by weight, wherein the entity attribute is included in the stored relevant knowledge facts; and
    returning the entity attribute as the purpose and intention of the speech recognition information.
  3. The method according to claim 2, wherein the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
    the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning method.
  4. The method according to claim 1, wherein determining the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement comprises:
    determining, on the basis of a deep-learning-based entity attribute ranking method, the entity attribute with the highest relevance obtained by the ranking method; and
    returning the entity attribute as the purpose and intention of the speech recognition information.
  5. The method according to claim 1, wherein extracting the relevant knowledge facts of the speech recognition information through the constructed knowledge graph model comprises:
    performing knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, wherein the database includes the database used by the speech recognition information;
    extracting keywords from the speech recognition information; and
    determining the set of entity attributes corresponding to the keywords in the knowledge graph model, and using the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
  6. The method according to claim 1, wherein the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
  7. The method according to claim 6, wherein storing the relevant knowledge facts comprises:
    storing the knowledge refined by the knowledge graph model in a schema layer; and
    storing the specific data information of the speech recognition information in a data layer.
  8. An apparatus for processing speech information, the apparatus comprising:
    a conversion module, configured to convert received unstructured speech recognition information into a structured query statement, wherein the unstructured speech recognition information is obtained through speech recognition;
    an extraction module, configured to extract relevant knowledge facts of the speech recognition information through a constructed knowledge graph model and to store the relevant knowledge facts; and
    a determination module, configured to determine the purpose and intention of the speech recognition information according to the stored relevant knowledge facts in combination with the structured query statement.
  9. The apparatus according to claim 8, wherein the determination module is configured to:
    determine a scene corresponding to the structured query statement, wherein the scene contains a plurality of different intentions, and the corresponding scene is determined from the context information of the speech recognition information and the application device information;
    according to the weights of the different intentions, determine the entity attribute with the highest relevance after sorting by weight, wherein the entity attribute is included in the stored relevant knowledge facts; and
    return the entity attribute as the purpose and intention of the speech recognition information.
  10. The apparatus according to claim 9, wherein the weights of the different intentions of the structured query statement in the corresponding scene are obtained by marking the weights of the different intentions of each keyword in different scenes after big data mining and analysis; or
    the weights of the different intentions of the structured query statement in the corresponding scene are obtained by training with a deep learning apparatus.
  11. The apparatus according to claim 8, wherein the determination module is configured to:
    determine, on the basis of a deep-learning-based entity attribute ranking apparatus, the entity attribute with the highest relevance obtained by the ranking apparatus; and
    return the entity attribute as the purpose and intention of the speech recognition information.
  12. The apparatus according to claim 8, wherein the extraction module is configured to:
    perform knowledge fusion on the structured query statements of multiple pieces of unstructured speech recognition information, together with other information material used to build the knowledge graph and the specific data information in a database, wherein the database includes the database used by the speech recognition information;
    extract keywords from the speech recognition information; and
    determine the set of entity attributes corresponding to the keywords in the knowledge graph model, and use the set of entity attributes as the representation of the relevant knowledge facts of the speech recognition information.
  13. The apparatus according to claim 8, wherein the relevant knowledge facts include the knowledge refined by the knowledge graph model and the specific data information of the speech recognition information.
  14. The apparatus according to claim 13, wherein the extraction module is configured to:
    store the knowledge refined by the knowledge graph model in a schema layer; and
    store the specific data information of the speech recognition information in a data layer.
  15. A smart terminal, comprising a memory and a processor, wherein:
    the memory is configured to store program instructions; and
    the processor is configured to call the program instructions stored in the memory and execute, in accordance with the obtained program, the method according to any one of claims 1-7.
  16. A computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the method according to any one of claims 1-7.
PCT/CN2020/112928 2019-10-18 2020-09-02 Speech information processing method and apparatus, and intelligent terminal and storage medium WO2021073298A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910994726.4 2019-10-18
CN201910994726.4A CN110795532A (en) 2019-10-18 2019-10-18 Voice information processing method and device, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2021073298A1 true WO2021073298A1 (en) 2021-04-22

Family

ID=69439350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112928 WO2021073298A1 (en) 2019-10-18 2020-09-02 Speech information processing method and apparatus, and intelligent terminal and storage medium

Country Status (2)

Country Link
CN (1) CN110795532A (en)
WO (1) WO2021073298A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138930A (en) * 2021-10-23 2022-03-04 西安电子科技大学 Intention characterization system and method based on knowledge graph
CN114898751A (en) * 2022-06-15 2022-08-12 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN115827848A (en) * 2023-02-10 2023-03-21 天翼云科技有限公司 Method, device, equipment and storage medium for extracting knowledge graph events

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795532A (en) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and device, intelligent terminal and storage medium
CN111858966B (en) * 2020-08-05 2021-12-31 龙马智芯(珠海横琴)科技有限公司 Knowledge graph updating method and device, terminal equipment and readable storage medium
CN112086155A (en) * 2020-09-11 2020-12-15 北京欧应信息技术有限公司 Diagnosis and treatment information structured collection method based on voice input
CN115242569B (en) * 2021-04-23 2023-12-05 海信集团控股股份有限公司 Man-machine interaction method and server in intelligent home
CN113420124B (en) * 2021-06-25 2024-03-22 上海适享文化传播有限公司 Method for resolving conflict under multiple conditions of voice retrieval
CN113641797A (en) * 2021-08-30 2021-11-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and computer program product
CN113761927B (en) * 2021-08-31 2024-02-06 国网冀北电力有限公司 Power grid fault handling real-time auxiliary decision-making method, system, equipment and storage medium
CN114328955A (en) * 2021-12-17 2022-04-12 南京沃科电子科技有限公司 Automobile electronic knowledge map control system
CN115453897A (en) * 2022-08-18 2022-12-09 青岛海尔科技有限公司 Method and device for determining intention instruction, storage medium and electronic device
CN115356939A (en) * 2022-08-18 2022-11-18 青岛海尔科技有限公司 Control command transmission method, control device, storage medium, and electronic device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9009046B1 (en) * 2005-09-27 2015-04-14 At&T Intellectual Property Ii, L.P. System and method for disambiguating multiple intents in a natural language dialog system
US9465833B2 (en) * 2012-07-31 2016-10-11 Veveo, Inc. Disambiguating user intent in conversational interaction system for large corpus information retrieval
CN105070288B (en) * 2015-07-02 2018-08-07 百度在线网络技术(北京)有限公司 Vehicle-mounted voice instruction identification method and device
CN107589828A (en) * 2016-07-07 2018-01-16 深圳狗尾草智能科技有限公司 The man-machine interaction method and system of knowledge based collection of illustrative plates
WO2019011356A1 (en) * 2017-07-14 2019-01-17 Cognigy Gmbh Method for conducting dialog between human and computer
CN108428447B (en) * 2018-06-19 2021-02-02 科大讯飞股份有限公司 Voice intention recognition method and device
CN109492126B (en) * 2018-11-02 2022-03-01 廊坊市森淼春食用菌有限公司 Intelligent interaction method and device
CN109635117B (en) * 2018-12-26 2021-05-14 零犀(北京)科技有限公司 Method and device for recognizing user intention based on knowledge graph
CN109785833A (en) * 2019-01-02 2019-05-21 苏宁易购集团股份有限公司 Human-computer interaction audio recognition method and system for smart machine
CN109918673B (en) * 2019-03-14 2021-08-03 湖北亿咖通科技有限公司 Semantic arbitration method and device, electronic equipment and computer-readable storage medium
CN110334201B (en) * 2019-07-18 2021-09-21 中国工商银行股份有限公司 Intention identification method, device and system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164216A1 (en) * 2007-12-21 2009-06-25 General Motors Corporation In-vehicle circumstantial speech recognition
CN102880649A (en) * 2012-08-27 2013-01-16 北京搜狗信息服务有限公司 Individualized information processing method and system
CN103106287A (en) * 2013-03-06 2013-05-15 深圳市宜搜科技发展有限公司 Processing method and processing system for retrieving sentences by user
CN103268348A (en) * 2013-05-28 2013-08-28 中国科学院计算技术研究所 Method for identifying user query intention
CN109657229A (en) * 2018-10-31 2019-04-19 北京奇艺世纪科技有限公司 A kind of intention assessment model generating method, intension recognizing method and device
CN110263160A (en) * 2019-05-29 2019-09-20 中国电子科技集团公司第二十八研究所 A kind of Question Classification method in computer question answering system
CN110795532A (en) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and device, intelligent terminal and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138930A (en) * 2021-10-23 2022-03-04 西安电子科技大学 Intention characterization system and method based on knowledge graph
CN114138930B (en) * 2021-10-23 2024-02-02 西安电子科技大学 Intent characterization system and method based on knowledge graph
CN114898751A (en) * 2022-06-15 2022-08-12 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN114898751B (en) * 2022-06-15 2024-04-23 中国电信股份有限公司 Automatic configuration method and system, storage medium and electronic equipment
CN115827848A (en) * 2023-02-10 2023-03-21 天翼云科技有限公司 Method, device, equipment and storage medium for extracting knowledge graph events

Also Published As

Publication number Publication date
CN110795532A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
WO2021073298A1 (en) Speech information processing method and apparatus, and intelligent terminal and storage medium
US11164568B2 (en) Speech recognition method and apparatus, and storage medium
US10937413B2 (en) Techniques for model training for voice features
EP3832519A1 (en) Method and apparatus for evaluating translation quality
CN111090727B (en) Language conversion processing method and device and dialect voice interaction system
CN110083693B (en) Robot dialogue reply method and device
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
WO2021218029A1 (en) Artificial intelligence-based interview method and apparatus, computer device, and storage medium
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
WO2020253064A1 (en) Speech recognition method and apparatus, and computer device and storage medium
CN107491436A (en) A kind of recognition methods of title party and device, server, storage medium
CN108664599B (en) Intelligent question-answering method and device, intelligent question-answering server and storage medium
US10854189B2 (en) Techniques for model training for voice features
JP2021081712A (en) Method, device, electronic apparatus, and computer readable storage media for voice interaction
WO2023130951A1 (en) Speech sentence segmentation method and apparatus, electronic device, and storage medium
JP2021081713A (en) Method, device, apparatus, and media for processing voice signal
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
CN112183051A (en) Intelligent voice follow-up method, system, computer equipment, storage medium and program product
WO2021239078A1 (en) Field recognition method, interaction method, electronic device, and storage medium
JP2015001695A (en) Voice recognition device, and voice recognition method and program
CN109918502A (en) Document explains method, apparatus, computer installation and computer readable storage medium
CN115114453A (en) Intelligent customer service implementation method and device based on knowledge graph
WO2021082570A1 (en) Artificial intelligence-based semantic identification method, device, and semantic identification apparatus
CN113553415A (en) Question and answer matching method and device and electronic equipment
CN112925889A (en) Natural language processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20876796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20876796

Country of ref document: EP

Kind code of ref document: A1