CN113470636B - Voice information processing method, device, equipment and medium - Google Patents

Voice information processing method, device, equipment and medium

Info

Publication number
CN113470636B
CN113470636B (application CN202010658673.1A)
Authority
CN
China
Prior art keywords
target
keywords
identification information
keyword
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010658673.1A
Other languages
Chinese (zh)
Other versions
CN113470636A (en)
Inventor
刘波
王月岭
王彦芳
刘帅帅
高雪松
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Electronic Industry Holdings Co Ltd
Original Assignee
Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Electronic Industry Holdings Co Ltd filed Critical Qingdao Hisense Electronic Industry Holdings Co Ltd
Priority to CN202010658673.1A priority Critical patent/CN113470636B/en
Publication of CN113470636A publication Critical patent/CN113470636A/en
Application granted granted Critical
Publication of CN113470636B publication Critical patent/CN113470636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a voice information processing method, device, equipment and medium, intended to solve the problem that a home central control device controls target devices inaccurately with existing voice information processing methods. When it is determined that the text information of the voice information to be recognized does not contain a keyword of every necessary type corresponding to the target intention, if historical keywords of the user who input the voice information to be recognized are determined to be stored, those historical keywords are updated into the target keywords; the controlled target device is then determined according to the updated target keywords and the target device is controlled to execute the specific content of the target operation. In this way the text information of voice information input by different users is kept separate, the historical keywords of other users are never updated into this user's target keywords, and the accuracy of subsequent control of the target device is improved.

Description

Voice information processing method, device, equipment and medium
Technical Field
The present application relates to the field of natural language understanding technologies, and in particular, to a method, an apparatus, a device, and a medium for processing voice information.
Background
With the development of artificial intelligence technology, the cost of intelligent devices keeps falling and voice interaction is applied in more and more scenarios, many of which involve several intelligent devices at once. Controlling each intelligent device in the home by voice is therefore complex for the user, and the devices are poorly coordinated. To address this, decision-making and computation for functions such as speech and semantics are now placed on a home central control device, such as a home brain or a cloud computing center, which is expected to control every intelligent device in the user's home in a unified way. How to control each intelligent device in a user's home precisely through a home central control device has become a concern in recent years.
In the prior art, after the target intention of the text information of the voice information to be recognized and the target keywords contained in the text information are identified, if the target keywords are determined not to contain a preset keyword of a necessary type required by the target intention, the home brain acquires a stored historical keyword and updates it into the target keywords. The content of the text information is thereby supplemented, the target operation corresponding to the target intention is then determined, and the target device is controlled to execute the specific content of the target operation.
This method does not distinguish between the users who input the voice information to be recognized; the content of the text information is supplemented directly according to the SessionID and the stored historical keywords. When different users input voice information to be recognized one after another and the earlier user has entered a multi-round query process, voice information input afterwards by another user is easily treated as the earlier user's reply. The specific content of the target operation that the earlier user wants executed is then determined from voice information input by two different users, which causes inaccurate control of the target device.
Disclosure of Invention
The application provides a voice information processing method, device, equipment and medium to solve the problem that a home central control device controls a target device inaccurately with existing voice information processing methods.
In a first aspect, the present application provides a voice information processing method, the method including:
determining a target intention corresponding to text information of voice information to be recognized and target keywords contained in the text information, and determining target identification information of the user who input the voice information to be recognized according to voiceprint features of the voice information to be recognized;
when the target keywords do not contain the keywords of the necessary types corresponding to the target intention, if historical keywords of the user with the target identification information are stored, updating the historical keywords into the target keywords;
if the updated target keywords comprise the keywords of the necessary types corresponding to the target intention, determining a target operation corresponding to the target intention, determining the controlled target device and the specific content of the target operation according to the updated target keywords, and controlling the target device to execute the specific content of the target operation.
In a possible implementation manner, if the historical keywords of the user of the target identification information are not saved, the method further includes:
and correspondingly storing the target keywords and the target identification information.
In a possible implementation manner, if the historical keywords of the user of the target identification information are not saved, the method further includes:
determining a first missing necessary type according to the type corresponding to the target keyword and the necessary type;
outputting prompt information for supplementing the first necessary type of keywords, updating the received first necessary type of keywords input by a user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing specific content of the target operation.
In a possible implementation manner, if the updated target keyword does not include the keyword of the necessary type corresponding to the target intention, the method further includes:
determining a second missing necessary type according to the type corresponding to the updated target keyword and the necessary type;
outputting prompt information for supplementing the keywords of the second necessary type, updating the received keywords of the second necessary type input by the user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing the specific content of the target operation.
In one possible implementation manner, before determining whether the historical keywords of the user of the target identification information are stored, the method further includes:
acquiring equipment identification information of intelligent equipment for acquiring the voice information to be identified;
and if the historical keywords of the user with the target identification information are stored, the updating of the historical keywords into the target keywords includes the following steps:
acquiring a historical keyword group corresponding to the intelligent equipment of the stored equipment identification information;
And if the historical keywords of the user of the target identification information exist in the historical keyword group, updating the historical keywords of the user of the target identification information into the target keywords.
In a possible implementation manner, if the historical keywords of the user with the target identification information are not stored, the method further includes:
and correspondingly storing the target keyword, the target identification information and the equipment identification information.
In a possible implementation manner, after the target device is controlled to execute the specific content of the target operation, the method further includes:
if the target operation is a preset closing operation, deleting the updated target keyword;
and if the target operation is a preset message leaving operation, deleting the updated target keyword.
In a second aspect, the present application also provides a voice information processing apparatus, the apparatus comprising:
the determining unit is used for determining target intention corresponding to text information of the voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized;
The first processing unit is used for updating the historical keywords into the target keywords if the historical keywords of the user with the target identification information are stored when the target keywords do not contain the keywords of the necessary types corresponding to the target intention;
and the second processing unit is used for determining target operation corresponding to the target intention if the updated target keyword contains the keyword of the necessary type corresponding to the target intention, determining controlled target equipment and specific content of the target operation according to the updated target keyword, and controlling the target equipment to execute the specific content of the target operation.
In one possible implementation manner, the first processing unit is further configured to store, if the historical keywords of the user of the target identification information are not stored, the target keywords and the target identification information correspondingly.
In a possible implementation manner, the first processing unit is further configured to determine, if the historical keyword of the user of the target identification information is not stored, a missing first necessary type according to a type corresponding to the target keyword and the necessary type; outputting prompt information for supplementing the first necessary type of keywords, updating the received first necessary type of keywords input by a user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing specific content of the target operation.
In a possible implementation manner, the second processing unit is further configured to determine, if the updated target keyword does not include a keyword of a necessary type corresponding to the target intention, a missing second necessary type according to the type corresponding to the updated target keyword and the necessary type; outputting prompt information for supplementing the keywords of the second necessary type, updating the received keywords of the second necessary type input by the user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing the specific content of the target operation.
In a possible implementation manner, the determining unit is further configured to obtain device identification information of an intelligent device that collects the voice information to be identified before determining whether the historical keywords of the user of the target identification information are stored;
the first processing unit is specifically configured to obtain a historical keyword group corresponding to the stored intelligent device of the device identification information; and if the historical keywords of the user of the target identification information exist in the historical keyword group, updating the historical keywords of the user of the target identification information into the target keywords.
In one possible implementation manner, the first processing unit is further configured to store, if the historical keywords of the user with the target identification information are not stored, the target keywords, the target identification information, and the device identification information correspondingly.
In a possible implementation manner, the second processing unit is further configured to delete the updated target keyword if the target operation is a preset closing operation after the target device has been controlled to execute the specific content of the target operation; and if the target operation is a preset message-leaving operation, to delete the updated target keyword.
In a third aspect, the present application also provides an electronic device, at least comprising a processor and a memory, the processor being configured to implement the steps of any of the above-described speech information processing methods when executing a computer program stored in the memory.
In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above-described speech information processing methods.
In the application, when it is determined that the text information of the voice information to be recognized does not contain a keyword of every necessary type corresponding to the target intention, if historical keywords of the user who input the voice information to be recognized are determined to be stored, those historical keywords are updated into the target keywords; the controlled target device is then determined according to the updated target keywords and the target device is controlled to execute the specific content of the target operation. In this way the text information of voice information input by different users is kept separate, the historical keywords of other users are never updated into this user's target keywords, and the accuracy of subsequent control of the target device is improved.
Drawings
In order to illustrate the embodiments of the application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a voice information processing process according to some embodiments of the present application;
FIG. 2 is a schematic diagram of a specific voice information processing flow according to some embodiments of the present application;
FIG. 3 is a schematic diagram of a voice information processing apparatus according to some embodiments of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
In order to improve the accuracy of control over target equipment, the application provides a voice information processing method, a device, equipment and a medium.
For the purpose of promoting an understanding of the principles and advantages of the application, reference is made below to the accompanying drawings; the embodiments illustrated are intended to explain the application rather than to limit it to the specific embodiments shown. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In practical applications, when a user wants to control an intelligent device in the home through voice, for example to turn on the air conditioner in a bedroom or to query information, the voice information to be recognized that the user inputs can be collected by an intelligent device such as a smart speaker or a smart television. The intelligent device that collects the voice information to be recognized can perform voiceprint recognition on it and intention recognition on its text information locally, for example using an AIML intention-template matching method, or it can send the voice information to be recognized to an electronic device, which then performs the voiceprint recognition and the intention recognition on the text information. After the target intention corresponding to the text information of the voice information to be recognized and each target keyword it contains are determined, subsequent processing is carried out, so that the intelligent device is controlled to execute the specific content of the target operation that the user desires.
Fig. 1 is a schematic diagram of a voice information processing procedure according to some embodiments of the present application, where the procedure includes:
s101: determining target intention corresponding to text information of voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized.
The voice information processing method provided by the application is applied to an electronic device, which may be an intelligent device such as a smart speaker or a smart television, or a home central control device such as a smart-housekeeper server or a home brain.
In the application, the text information of the voice information to be recognized obtained by the electronic device may be generated from the received voice information to be recognized, or it may be text information sent by another intelligent device. Similarly, the voiceprint feature of the voice information to be recognized obtained by the electronic device may be generated directly from the received voice information to be recognized using a locally stored voiceprint recognition model, or it may be a voiceprint feature sent by another intelligent device.
The electronic device may receive the voice information to be recognized sent by other intelligent devices, or may collect the voice information to be recognized by itself, which is not limited herein.
So that the electronic device can recognize the user who inputs the voice information to be recognized, in the present application the electronic device acquires voice information from each user in the home in advance, determines the identification information of the user to whom each piece of voice information belongs, and registers each piece of voice information through a locally stored voiceprint recognition model. Registration means that the voiceprint feature corresponding to each piece of voice information is extracted through a locally stored universal voiceprint model, and each voiceprint feature is stored together with the corresponding identification information. When the electronic device later obtains the voiceprint feature of voice information to be recognized, that feature is matched against each stored registered voiceprint feature; when a matching registered voiceprint feature is found, the identification information corresponding to it is taken as the target identification information of the user who input the voice information to be recognized.
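Purely as a reference sketch, registration and matching of voiceprint features could be organised as follows, assuming an embedding-based voiceprint model that exposes an embed() method and a cosine-similarity threshold; the class, method and parameter names are illustrative and are not part of the claimed method.

```python
import numpy as np

# Hypothetical embedding-based voiceprint registry. The patent only requires that
# registered voiceprint features can be matched against a new utterance; the model
# interface, the cosine-similarity measure and the threshold are assumptions.
class VoiceprintRegistry:
    def __init__(self, model, threshold=0.75):
        self.model = model          # any model mapping audio to a fixed-size vector
        self.threshold = threshold  # assumed similarity cutoff for a "match"
        self.registered = {}        # user identification information -> feature vector

    def register(self, user_id, audio):
        """Enroll a household member: store the feature extracted from sample audio."""
        self.registered[user_id] = self.model.embed(audio)

    def identify(self, audio):
        """Return the target identification information of the speaker, or None."""
        feature = self.model.embed(audio)
        best_id, best_score = None, 0.0
        for user_id, reference in self.registered.items():
            score = float(np.dot(feature, reference) /
                          (np.linalg.norm(feature) * np.linalg.norm(reference) + 1e-9))
            if score > best_score:
                best_id, best_score = user_id, score
        return best_id if best_score >= self.threshold else None
```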
In order to control the target device, after obtaining the text information of the voice information to be recognized, the electronic device performs intention recognition on the text information and determines each target keyword contained in it. The methods for performing intention recognition on text information and for determining the target keywords it contains belong to the prior art and are not described here.
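Since the description relies on intentions and typed keywords in the examples below, a toy template-matching parser is sketched here for orientation only; the patterns, intent names and slot names are invented, and a real system would use the prior-art methods referred to above.

```python
import re

# Toy intent-template matcher, for illustration only. The intent names, patterns
# and slot names are invented; the patent treats intent recognition and keyword
# extraction as prior art.
INTENT_TEMPLATES = {
    "play_music": re.compile(r"play (the music )?(?P<song>.+)"),
    "close":      re.compile(r"close( (?P<entity>.+))?"),
    "query_time": re.compile(
        r"how long does it take from (?P<origin>\S+) to (?P<destination>\S+)"
        r"( by (?P<transport>\S+))?"),
}

def parse(text):
    """Return (target_intention, target_keywords) for a piece of text information."""
    for intent, pattern in INTENT_TEMPLATES.items():
        match = pattern.fullmatch(text.strip().lower())
        if match:
            return intent, {k: v for k, v in match.groupdict().items() if v}
    return None, {}
```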
S102: and when the target keywords do not contain the keywords of the necessary types corresponding to the target intention, if the history keywords of the users with the target identification information are stored, updating the history keywords into the target keywords.
In order to control the intelligent device accurately according to the voice information to be recognized, the necessary types of keywords corresponding to each intention are stored in advance. For example, when the intention is to play music, the necessary type of keyword is the song name; when the intention is to book an air ticket, the necessary types of keywords are the departure point, the destination and the time.
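By way of illustration, such a pre-stored mapping from intentions to necessary keyword types, together with a helper that reports which types are still missing, could be as simple as the following sketch; the intention and type names are assumptions chosen to match the examples in this description.

```python
# Necessary keyword types stored in advance for each intention (values are
# illustrative, matching the examples used in this description).
NECESSARY_TYPES = {
    "play_music": {"song"},
    "close":      {"entity"},
    "query_time": {"origin", "destination", "transport"},
}

def missing_types(intent, keywords):
    """Return the necessary types that the target keywords do not yet cover."""
    return NECESSARY_TYPES.get(intent, set()) - set(keywords)
```

For the text "play the music Forget Water" the result would be empty, while for "close" alone it would report that the entity to be closed is still missing.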
After the target keywords included in the text information are determined based on the above embodiment, it is determined whether they include the keywords of the necessary types corresponding to the target intention. If the target keywords are determined to contain the keywords of every necessary type corresponding to the target intention, the information carried by the text is complete and the target device can be controlled accurately based on the target keywords and the target intention.
For example, the target keyword contained in the text information "play the music Forget Water" is "Forget Water", and the necessary type of keyword corresponding to the matched intention template "play music" is the song name. The target keyword "Forget Water" is therefore determined to cover the necessary type of keyword corresponding to the matched intention template, and the music is played based on the target keyword "Forget Water" and the target intention.
In an actual application scenario, because users have different speaking habits, the information contained in the voice information to be recognized may be incomplete; that is, the target keywords contained in its text information may not contain a keyword of some necessary type corresponding to the target intention. In this case, stored historical keywords could be added directly to the target keywords to fill in the missing necessary types, or the missing keywords could be obtained through multiple rounds of queries. However, historical keywords input by other users may then be picked up, or voice information to be recognized that is input by other users may be treated as the reply, which harms the accuracy with which the user controls the target device. Therefore, in order to control the target device accurately according to the voice information to be recognized, the historical keywords corresponding to the user of each piece of identification information can be stored in advance. When the target keywords are determined not to include a keyword of some necessary type corresponding to the target intention, it is determined whether historical keywords of the user who input the voice information to be recognized, that is, the user with the target identification information determined in the above embodiment, are stored. If such historical keywords are stored, they are updated directly into the target keywords.
For example, the target identification information of the user who inputs the voice information to be recognized "close" is A, and the target keyword is "close". It is determined that the target keyword lacks the entity to be closed, so it is checked whether historical keywords of the user with target identification information A are stored; if the stored historical keyword of that user is determined to be "Forget Water", the historical keyword "Forget Water" is updated into the target keywords.
S103: if the updated target keywords comprise the keywords of the necessary types corresponding to the target intentions, determining target operations corresponding to the target intentions, determining controlled target equipment and specific contents of the target operations according to the updated target keywords, and controlling the target equipment to execute the specific contents of the target operations.
After the historical keywords of the user with the target identification information have been updated into the target keywords based on the above embodiment, it is judged whether the updated target keywords contain the keywords of the necessary types corresponding to the target intention. If they do, the target operation corresponding to the target intention is determined from the correspondence between intentions and operations stored in advance, the controlled target device and the specific content of the target operation are determined according to the updated target keywords, and the target device is then controlled to execute the specific content of the target operation.
For example, if the updated target keywords are "close" and "Forget Water", they include the keywords of the necessary types corresponding to the target intention. The target operation corresponding to the target intention is determined to be closing; according to the updated target keywords "close" and "Forget Water", the controlled target device is the smart speaker that is playing "Forget Water", and the specific content of the target operation the smart speaker is controlled to execute is to stop playing the piece of music "Forget Water".
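The flow of S101-S103 can be summarised in the following sketch, which keys the stored historical keywords by the target identification information obtained from the voiceprint match; the function signature and storage structure are assumptions for illustration, not the claimed implementation.

```python
# Per-user context keyed by the target identification information from the
# voiceprint match, so one user's history never completes another user's request.
# `required` is the set of necessary keyword types stored for the target intention.
history_keywords = {}   # user_id -> {type: keyword}

def handle_utterance(user_id, intent, keywords, required, execute):
    """S101-S103 in miniature (names and structure are illustrative)."""
    missing = required - set(keywords)
    if missing and user_id in history_keywords:
        # S102: update this user's stored historical keywords into the target keywords.
        for slot_type, value in history_keywords[user_id].items():
            keywords.setdefault(slot_type, value)
        missing = required - set(keywords)
    if not missing:
        # S103: the updated target keywords cover every necessary type, so determine
        # the target device and the specific content of the target operation and execute.
        execute(intent, keywords)
        history_keywords[user_id] = dict(keywords)   # keep as history for next time
        return True
    return False   # still incomplete: store or query as in the later embodiments
```

With this keying, a "close" uttered by user A can only be completed from user A's own history, never from keywords left behind by another user.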
In the application, when it is determined that the text information of the voice information to be recognized does not contain a keyword of every necessary type corresponding to the target intention, if historical keywords of the user who input the voice information to be recognized are determined to be stored, those historical keywords are updated into the target keywords; the controlled target device is then determined according to the updated target keywords and the target device is controlled to execute the specific content of the target operation. In this way the text information of voice information input by different users is kept separate, the historical keywords of other users are never updated into this user's target keywords, and the accuracy of subsequent control of the target device is improved.
In order to achieve accurate control over the target device, based on the above embodiment, in the present application, if the history keyword of the user of the target identification information is not saved, the method further includes:
and correspondingly storing the target keywords and the target identification information.
In another possible implementation, the user may be interacting with the intelligent device by voice for the first time, in which case no historical keywords of that user are saved in the electronic device. To facilitate subsequent processing of voice information, after it is determined that no historical keywords of the user with the target identification information are stored, the target keywords contained in the currently acquired text information may be stored in correspondence with the user's target identification information. Then, when voice information to be recognized that is later input again by the user with the target identification information is incomplete, these target keywords can be used as that user's historical keywords and the processing operations of the foregoing embodiments can be performed.
For example, the target identification information of the user who inputs the voice information to be recognized "close" is A and the target keyword is "close". It is determined that the target keyword lacks the entity to be closed, but no historical keywords of the user with target identification information A are stored, so the target keyword "close" and the target identification information A are stored in correspondence.
In order to accurately control the intelligent device and improve the user experience, based on the above embodiments, in the present application, if the history keywords of the user of the target identification information are not saved, the method further includes:
determining a first missing necessary type according to the type corresponding to the target keyword and the necessary type;
outputting prompt information for supplementing the first necessary type of keywords, updating the received first necessary type of keywords input by a user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing specific content of the target operation.
The check of whether historical keywords of the user with the target identification information are stored is performed only after it has been determined that the target keywords contained in the text information do not include a keyword of every necessary type corresponding to the target intention. So when it is determined, based on the above embodiment, that no historical keywords of the user with the target identification information are stored, the current target keywords still do not cover the necessary types corresponding to the target intention, and the specific content of the target operation that the target device should execute cannot be determined accurately from them. Therefore, in order to control the intelligent device accurately and improve the user experience, multiple rounds of queries can be carried out to supplement the missing necessary types of keywords.
Specifically, the missing first necessary type is determined according to the types corresponding to the target keywords and the necessary types corresponding to the target intention.
For example, the target intention corresponding to the text information "how long does it take from Beida to Qinghua" is a travel-time query, and the target keywords are "Beida" and "Qinghua". The necessary types corresponding to this target intention include two places and one means of transport, while the types corresponding to the target keywords "Beida" and "Qinghua" cover only the two places, so the missing first necessary type is determined to be the means of transport.
After the missing first necessary type is determined, prompt information for supplementing a keyword of the first necessary type may be output so that the user supplements it promptly, for example "please input the means of transport".
The prompt information for supplementing a keyword of the first necessary type may be output as an audio broadcast, for example broadcasting the prompt "please supplement the means of transport", or displayed as text on a display interface, for example showing the prompt "please supplement the means of transport" on the screen. The two output modes can also be combined, that is, the audio prompt is broadcast while the text prompt is displayed on the display interface.
Which output mode is used can be preset according to the user's preference or chosen according to the capabilities of the electronic device; for example, some electronic devices have no display interface on which the prompt can be shown, and for such devices the prompt information is broadcast in audio form.
In an actual application scenario, after a prompt asking the user to supplement a keyword of the first necessary type is output, the user will usually reply directly with voice information containing that keyword; for example, after "please supplement the means of transport" is output, the user will usually reply with the means of transport, such as "by bus". Therefore, after newly collected voice information is obtained, if its voiceprint feature shows that the speaker is the user with the target identification information, the text information of that voice information can be used directly as the supplemented keyword of the first necessary type and updated into the target keywords. It is then judged whether the updated target keywords contain the keywords of the necessary types corresponding to the target intention; if so, the target operation corresponding to the target intention is determined, the target device is determined according to the updated target keywords, and the target device is controlled to execute the specific content of the target operation.
For example, the text information of the voice information to be recognized input by the user with target identification information B is "how long does it take from Beida to Qinghua"; the target intention is a travel-time query and the target keywords are "Beida" and "Qinghua". The necessary types corresponding to this intention include two places and one means of transport, while the target keywords cover only the two places, so the missing first necessary type is the means of transport and the prompt "please input the means of transport" is output. Voice information "subway" is then collected; if, according to its voiceprint feature, the speaker is determined to be the user with target identification information B, the text information "subway" is used directly as the keyword of the first necessary type input by that user and is updated into the target keywords "Beida" and "Qinghua". The updated target keywords "Beida", "Qinghua" and "subway" now contain the keywords of all necessary types corresponding to the target intention. The target operation corresponding to the target intention is to request the travel time from a network server and output it; according to the updated target keywords, the target device is determined to be the smartphone, and the smartphone is controlled to request from the network server the time needed to travel from Beida to Qinghua by subway and to output the result.
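A sketch of this multi-round query is shown below; ask(), listen(), identify() and execute() are assumed helper functions standing in for prompt output, voice capture with speech-to-text, voiceprint-based identification and execution of the target operation, and a reply is accepted only when it comes from the same user.

```python
def multi_round_query(user_id, intent, keywords, required,
                      ask, listen, identify, execute, max_rounds=3):
    """Sketch of the multi-round query; helpers are assumptions, not the claimed API."""
    for _ in range(max_rounds):
        missing = sorted(required - set(keywords))
        if not missing:
            break
        ask(f"Please supplement the {missing[0]}")   # e.g. "please input the means of transport"
        audio, text = listen()                       # next captured utterance and its text
        if identify(audio) == user_id:               # accept the reply only from the same user
            keywords[missing[0]] = text              # the reply text becomes the missing keyword
        # voice information from other users starts its own, separate request (not shown)
    if not (required - set(keywords)):
        execute(intent, keywords)
```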
In order to accurately control the intelligent device and improve the user experience, based on the above embodiments, in the present application, if the updated target keyword does not include a keyword of a necessary type corresponding to the target intention, the method further includes:
determining a second missing necessary type according to the type corresponding to the updated target keyword and the necessary type;
outputting prompt information for supplementing the keywords of the second necessary type, updating the received keywords of the second necessary type input by the user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing the specific content of the target operation.
In an actual application scenario, even after the historical keywords have been updated into the target keywords, the updated target keywords may still not include a keyword of every necessary type corresponding to the target intention, so the specific content of the target operation that the target device should execute cannot be determined accurately from them.
Specifically, the missing second necessary type is determined according to the types corresponding to the updated target keywords and the necessary types corresponding to the target intention, and prompt information for supplementing a keyword of the second necessary type is output. Collected voice information is then obtained in real time; if, according to its voiceprint feature, the speaker is determined to be the user with the target identification information, the text information of that voice information is used directly as the supplemented keyword of the second necessary type and is updated into the target keywords. It is then judged whether the updated target keywords contain the keywords of the necessary types corresponding to the target intention; if so, the target operation corresponding to the target intention is determined, the target device is determined according to the updated target keywords, and the target device is controlled to execute the specific content of the target operation.
The way the prompt information for supplementing a keyword of the second necessary type is output, and the processing of the voice information with which the user supplements it, are the same as described above and are not repeated here.
In order to further improve accuracy of control over the target device, on the basis of the above embodiment, before determining whether the historical keywords of the user of the target identification information are stored, in the present application, the method further includes:
acquiring equipment identification information of intelligent equipment for acquiring the voice information to be identified;
and if the historical keywords of the user with the target identification information are stored, updating the historical keywords into the target keywords, wherein the method comprises the following steps:
acquiring a historical keyword group corresponding to the intelligent equipment of the stored equipment identification information;
and if the historical keywords of the user of the target identification information exist in the historical keyword group, updating the historical keywords of the user of the target identification information into the target keywords.
In a practical application scenario, the following may occur: a smart speaker is placed in the living room of a home and a voice control panel in the kitchen. User A inputs the voice information to be recognized "turn on Two Tigers" through the smart speaker in the living room; after obtaining it, the electronic device performs the corresponding processing and controls the smart speaker to play the music "Two Tigers". User A then inputs the voice information to be recognized "play the braised fish recipe" through the voice control panel in the kitchen; after obtaining it, the electronic device performs the corresponding processing and controls the kitchen voice control panel to play a voice tutorial for making braised fish.
When user A later returns to the living room and wants to close the music being played by the smart speaker, user A inputs the voice information to be recognized "close" again through the living-room smart speaker. After obtaining this voice information, an electronic device that relies on the traditional SessionID or context-query approach would update the stored historical keyword of user A, namely "braised fish", into the target keyword "close", and would then decide to control the kitchen voice control panel to close the recipe tutorial being played. In fact user A expects the music being played by the smart speaker to be closed, so the intelligent device is not controlled according to the user's intention and the accuracy of controlling intelligent devices is reduced.
From the above scenario it can be seen that when a user controls a target device through voice information to be recognized, the target device generally has some correlation with the intelligent device that collects that voice information. For example, if the target device the user wants to control is in the living room, the user usually inputs the voice information through some intelligent device in the living room, or the target device is simply the intelligent device that collected the voice information. Therefore, to further improve the accuracy of controlling the target device, in the application the device identification information of the intelligent device that collects the voice information to be recognized can be acquired before judging whether historical keywords of the user with the target identification information are stored, and the subsequent processing is performed based on both the target identification information of the user who input the voice information and the device identification information of the device that collected it, so that the intelligent device can be controlled accurately.
The device identification information may be the MAC address or IP address of the intelligent device, or a preset character string or number such as "swad" or "007"; it can be set flexibly according to actual requirements and is not specifically limited here.
In order to control the target device more accurately according to the voice information to be recognized, the historical keywords are stored in correspondence with both the device identification information and the user identification information. After the device identification information of the intelligent device that collected the voice information to be recognized and the target identification information of the user who input it have been acquired based on the above embodiment, the stored historical keyword group corresponding to the intelligent device with that device identification information is obtained first. This historical keyword group consists of the target keywords contained in the text information of voice information previously collected by that intelligent device, together with the identification information of the users who input it. It is then determined, from the user identification information associated with the historical keyword group, whether historical keywords of the user with the target identification information exist in the group.
Specifically, if historical keywords of the user with the target identification information are determined to exist in the historical keyword group, those historical keywords are considered to be correlated with the voice information to be recognized and are updated into the target keywords.
To facilitate subsequent processing of voice information to be recognized, after the historical keywords of the user with the target identification information have been updated into the target keywords, the updated target keywords, the target identification information and the device identification information are stored in correspondence.
In another possible implementation, if it is determined that no historical keyword group corresponding to the intelligent device with the device identification information is stored, or that no historical keywords of the user with the target identification information exist in that historical keyword group, that is, no historical keywords of the user with the target identification information are stored, the target keywords, the target identification information and the device identification information are directly stored in correspondence. Specifically, if the historical keywords of the user with the target identification information are not stored, the method further includes:
And correspondingly storing the target keyword, the target identification information and the equipment identification information.
When the target keywords, the target identification information and the device identification information are stored in correspondence, mapping relations or tables can be built so that the contexts of different users on different intelligent devices are stored in a structured way. During subsequent voice information processing, the historical keywords can be queried from the established mapping or table using the identification information of the user who input the voice information to be recognized and the device identification information of the device that collected it, and the historical keywords of the user with the target identification information can thus be determined, so the target device can be controlled accurately.
In addition, to further facilitate subsequent processing of voice information to be recognized, other information can also be stored in correspondence, including at least one of the target control instruction, the target device, the specific content of the target operation, and the target application.
The target control instruction can be determined according to a pre-saved correspondence between operations and control instructions, or between intentions and control instructions.
The target application may be determined according to a pre-saved correspondence between operations and applications, or between intentions and applications.
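One possible structured storage for these correspondences is sketched below, under the assumption that a simple in-memory mapping keyed by the (device identification information, user identification information) pair suffices; the field names and functions are illustrative, not the claimed implementation.

```python
# Structured context storage keyed by the (device identification information,
# user identification information) pair, so a device's historical keyword group
# can be looked up first and then narrowed to one user. Field names are illustrative.
context_store = {}   # (device_id, user_id) -> {"keywords": ..., "operation": ..., "app": ...}

def save_context(device_id, user_id, keywords, operation=None, app=None):
    context_store[(device_id, user_id)] = {
        "keywords": dict(keywords),
        "operation": operation,
        "app": app,
    }

def lookup_history(device_id, user_id):
    """Return the historical keywords of this user on this device, or None."""
    entry = context_store.get((device_id, user_id))
    return dict(entry["keywords"]) if entry else None
```

lookup_history corresponds to querying the historical keyword group of a given device and then narrowing it to the user with the target identification information.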
Specifically, the following describes the above embodiments in detail in connection with an actual application scenario:
A smart speaker is placed in the living room of a home and a voice control panel in the kitchen. User A inputs the voice information to be recognized "turn on Two Tigers" through the smart speaker in the living room; after obtaining it, the electronic device performs the corresponding processing and controls the smart speaker to play the music "Two Tigers". User A then inputs the voice information to be recognized "play the braised fish recipe" through the voice control panel in the kitchen; after obtaining it, the electronic device performs the corresponding processing and controls the kitchen voice control panel to play a voice tutorial for making braised fish. At this point, the table below shows how the electronic device stores the target keyword, the user identification information, the device identification information, the target control instruction and the target application in correspondence.
Target keyword | User identification information | Device identification information | Target control instruction | Target application
Two Tigers | 10001 | fe80:65c0:4647:41e2:7cfb (smart speaker) | turn on music | music
braised fish | 10001 | fe80:e508:73f:e5c8:c94 (voice control panel) | play the braised fish recipe | recipe
In the above table, the device identification information is the MAC address of the intelligent device. The identification information of user A is 10001 and the device identification information of the smart speaker is fe80:65c0:4647:41e2:7cfb; the target control instruction corresponding to user A's voice information to be recognized "turn on Two Tigers" is to turn on music, the target keyword contained in its text information is "Two Tigers", and playing the music "Two Tigers" is realized through the music application. The device identification information of the voice control panel is fe80:e508:73f:e5c8:c94; the target control instruction corresponding to user A's voice information to be recognized "play the braised fish recipe" is to play the braised fish recipe, the target keyword contained in its text information is "braised fish", and the voice tutorial for making braised fish is played through the recipe application.
When user A later returns to the living room and wants to close the music being played by the smart speaker, user A inputs the voice information to be recognized "close" again through the living-room smart speaker; the smart speaker sends the voice information "close" together with its own device identification information to the electronic device. The electronic device determines that the target intention corresponding to the text information of the voice information to be recognized is closing, that the target keyword contained in the text information is "close", and, from the voiceprint feature of the voice information, that the target identification information of the user who input it is 10001. It determines that the target keyword "close" does not contain the keyword of the necessary type corresponding to the target intention, i.e. it cannot tell from the target keyword which entity to close, so it queries the stored historical keyword group of that device identification information, finds that the historical keyword of the user with target identification information 10001 in the group is "Two Tigers", and updates the historical keyword "Two Tigers" into the target keywords. It then obtains the target operation corresponding to the target intention of closing, determines from the updated target keywords that the target device is the smart speaker and that the music application should be closed, and decides to control the smart speaker to close the music.
In another possible implementation, since the target application last controlled by user A through the smart speaker has been saved, the specific content of the target operation can be determined to be closing the music directly from that target application.
In order to reduce the storage resources occupied by information such as the historical keywords, in the present application, after the target device is controlled to execute the specific content of the target operation, the method further includes:
if the target operation is a preset closing operation, deleting the updated target keyword;
and if the target operation is a preset message leaving operation, deleting the updated target keyword.
In practice, even though only the target keywords, device identification information and target identification information recorded the last time a given user controlled the target device need to be stored, the amount of information to store still grows large as the number of intelligent devices and users in a household increases. Moreover, some stored information does not help with recognizing later voice information; for example, after the user has the voice notepad store a complete message, that message is generally unrelated to the next message the user stores, and after the user has the smart television turned off, the stored information about closing the smart television has little relevance to the programme the user wants to watch the next time the television is turned on.
Therefore, in order to reduce the storage resources occupied for storing information such as history keywords, in the present application, after controlling the target device to execute specific contents of the target operation, it may be determined whether to delete the updated target keywords according to the target operation.
Specifically, according to the target operation, determining whether to delete the updated target keyword includes the following two cases:
case one: if the target operation is a preset closing operation, which indicates that the current updated target keyword has no strong correlation with the subsequent recognition of the voice information to be recognized, deleting the updated target keyword.
Case two: if the target operation is a preset message leaving operation, which indicates that the currently updated target keywords have no strong correlation with the subsequent recognition of voice information to be recognized, the updated target keywords are deleted.
It should be noted that the cases in which the updated target keywords are determined to be deleted according to the target operation are not limited to the two cases above; other cases are also possible and are not specifically limited herein.
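As a rough illustration only, the cleanup rule described above could take the following form; the operation names and the storage layout are assumptions.

```python
# Hypothetical cleanup rule: discard the saved keywords after operations that
# are unlikely to help interpret the user's next utterance.
CLEANUP_OPERATIONS = {"close", "leave_message"}   # preset closing / message leaving operations

def maybe_delete_history(history_store, device_id, user_id, target_operation):
    """Delete the updated target keywords for preset closing or
    message-leaving operations."""
    if target_operation in CLEANUP_OPERATIONS:
        history_store.pop((device_id, user_id), None)

history = {("fe80:65c0:4647:41e2:7cfb", "10001"): {"entity": "two tigers"}}
maybe_delete_history(history, "fe80:65c0:4647:41e2:7cfb", "10001", "close")
print(history)   # -> {}
```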
Specifically, the following describes the above embodiments in detail in connection with an actual application scenario:
Intelligent devices such as an intelligent sound box (in the living room) and a dressing mirror (in the bedroom) are arranged in a home. User A inputs the voice information to be recognized "leave a message for user B" through the intelligent sound box in the living room. After obtaining this voice information, the electronic device determines that the target intention corresponding to its text information is storing a message through the voice notepad and that the target keyword contained in the text information is "user B". Because the target keywords lack a keyword of the message-content type, the electronic device carries out a multi-round query. If, during this multi-round query, user C inputs the voice information to be recognized "leave a message for user B to remind him of the English spelling homework at five in the afternoon" through the dressing mirror in the bedroom, the electronic device likewise processes it and determines that the target keywords contained in its text information are "user B" and "the English spelling homework at five in the afternoon". At this time, if the voice information input by user C cannot be distinguished from that input by user A, as under the conventional SessionID or conventional contextual query manner, the electronic device may mistake the target keyword "the English spelling homework at five in the afternoon" in user C's utterance for the message content that user A was supplementing, so that the messages user A and user C intended to record become confused.
For the above scenario, if the electronic device adopts the voice information processing method provided by the present application, then during the multi-round query to user A it obtains the voice information to be recognized input by user C. When it determines that the target keywords contained in the text information of user C's voice information include the keywords of the necessary types corresponding to the target intention, and that no historical keywords are stored for the device identification information of the dressing mirror together with the identification information of user C, it correspondingly stores the device identification information of the dressing mirror, the identification information of user C, the target keyword indicating for whom to leave a message and the target keyword of the message content. The target keywords, user identification information and device identification information stored by the electronic device are shown in the following table:

| User identification information | Device identification information | Target keyword (for whom to leave a message) | Target keyword (message content) |
| --- | --- | --- | --- |
| 10001 | fe80:65c0:4647:41e2:7cfb | user B | null |
| 10002 | fe80:e508:73f:5c8:94 | user B | the English spelling homework at five in the afternoon |
In the above table, the device identification information is the MAC address of the intelligent device. The identification information of user A is 10001, the device identification information of the intelligent sound box is fe80:65c0:4647:41e2:7cfb, the target keyword indicating for whom to leave a message contained in the text information of user A's voice information "leave a message for user B" is "user B", and the target keyword of the message content contained in that text information is null. The identification information of user C is 10002, the device identification information of the dressing mirror is fe80:e508:73f:5c8:94, and in the text information of user C's voice information "leave a message for user B to remind him of the English spelling homework at five in the afternoon", the target keyword indicating for whom to leave a message is "user B" and the target keyword of the message content is "the English spelling homework at five in the afternoon".
After user A subsequently inputs the voice information "9 am on Friday, go to the library to return books" to supplement the message content, the stored information is queried; since it is determined that user A has a message lacking message content, the text information "9 am on Friday, go to the library to return books" is directly taken as the target keyword of the message content of user A's message. At this time, the target keywords, user identification information and device identification information stored by the electronic device are shown in the following table:

| User identification information | Device identification information | Target keyword (for whom to leave a message) | Target keyword (message content) |
| --- | --- | --- | --- |
| 10001 | fe80:65c0:4647:41e2:7cfb | user B | 9 am on Friday, go to the library to return books |
| 10002 | fe80:e508:73f:5c8:94 | user B | the English spelling homework at five in the afternoon |
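A minimal sketch of an in-memory layout mirroring the tables above is given below; the dictionary structure and function names are assumptions rather than the storage scheme actually claimed.

```python
# Hypothetical in-memory layout: one historical keyword group per device,
# holding each user's saved target keywords.
keyword_groups = {
    "fe80:65c0:4647:41e2:7cfb": {                      # living-room intelligent sound box
        "10001": {"recipient": "user B",
                  "content": "9 am on Friday, go to the library to return books"},
    },
    "fe80:e508:73f:5c8:94": {                          # bedroom dressing mirror
        "10002": {"recipient": "user B",
                  "content": "the English spelling homework at five in the afternoon"},
    },
}

def save_keywords(device_id, user_id, slots):
    """Store target keywords under the capturing device and the speaking user."""
    keyword_groups.setdefault(device_id, {})[user_id] = slots

def lookup_keywords(device_id, user_id):
    """Only the same user's history on the same device is ever reused."""
    return keyword_groups.get(device_id, {}).get(user_id)

# User A (10001) never spoke to the dressing mirror, so nothing is returned.
print(lookup_keywords("fe80:e508:73f:5c8:94", "10001"))   # -> None
```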
The following describes the voice information processing method provided by the present application in detail by means of a specific embodiment. Fig. 2 is a schematic diagram of a specific voice information processing flow provided by some embodiments of the present application, where the flow includes:
s201: and carrying out voice recognition on the voice information to be recognized.
The specific process of performing voice recognition on the voice information to be recognized includes: determining the target intention corresponding to the text information of the voice information to be recognized and the target keywords contained in that text information.
The target intention corresponding to the text information of the voice information to be recognized and the target keywords contained in it can be obtained in several ways. The target intention may be determined through an existing intention recognition model, and each target keyword contained in the text information determined through a keyword extraction model. Alternatively, the text information may be matched against preconfigured AIML intention templates; when a matching AIML intention template exists, the intention corresponding to the matched template is taken as the target intention of the text information, and each target keyword contained in the text information is obtained according to the character spans of the keywords in the matched template. Of course, the two approaches may also be combined: the text information is matched against preconfigured AIML intention templates to determine the target intention, and each target keyword contained in the text information is then determined through a keyword extraction model.
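The template-matching alternative can be illustrated with the following sketch, which approximates AIML-style intention templates with regular expressions; the templates, intent names and keyword types are assumptions, and a deployed system would use trained intention-recognition and keyword-extraction models as described above.

```python
import re

# Hypothetical AIML-style intention templates; wildcard slots are expressed
# here as named regular-expression groups.
TEMPLATES = [
    ("leave_message", re.compile(r"leave a message for (?P<recipient>.+)")),
    ("play",          re.compile(r"play (?P<entity>.+)")),
    ("close",         re.compile(r"close(?: (?P<entity>.+))?")),
]

def recognize(text):
    """Return (target intention, target keywords) for the recognized text."""
    for intent, pattern in TEMPLATES:
        match = pattern.fullmatch(text.strip().lower())
        if match:
            keywords = {k: v for k, v in match.groupdict().items() if v}
            return intent, keywords
    return None, {}

print(recognize("play two tigers"))   # -> ('play', {'entity': 'two tigers'})
print(recognize("close"))             # -> ('close', {})
```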
S202: and acquiring equipment identification information and target identification information.
Specifically, voiceprint features of the voice information to be recognized are matched with registered voiceprint features, and identification information corresponding to the matched registered voiceprint features is determined as target identification information of a user inputting the voice information to be recognized.
Device identification information of the intelligent device for collecting the voice information to be recognized, such as an MAC address, an IP address and the like of the intelligent device, is obtained.
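A simplified sketch of the voiceprint-matching step is given below, using cosine similarity over embedding vectors; the registered features, the threshold and the similarity measure are assumptions, not the specific matching method of the application.

```python
import numpy as np

# Hypothetical registered voiceprint features (embedding vectors) per user;
# the vectors and the threshold are made up for illustration.
registered_voiceprints = {
    "10001": np.array([0.9, 0.1, 0.3]),
    "10002": np.array([0.2, 0.8, 0.5]),
}

def identify_speaker(voiceprint, threshold=0.8):
    """Match the utterance's voiceprint against registered features and return
    the target identification information of the closest match, if any."""
    best_id, best_score = None, -1.0
    for user_id, ref in registered_voiceprints.items():
        score = float(np.dot(voiceprint, ref) /
                      (np.linalg.norm(voiceprint) * np.linalg.norm(ref)))
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None

print(identify_speaker(np.array([0.88, 0.12, 0.28])))   # -> '10001'
```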
S203: when the target keyword does not contain the keyword of the necessary type corresponding to the target intention, judging whether a history keyword group corresponding to the intelligent device with the device identification information is stored, and judging whether the history keyword group contains the history keyword of the user with the target identification information, if yes, executing S204, otherwise, executing S206.
S204: and acquiring the historical keywords of the user of the target identification information in the historical keyword groups corresponding to the intelligent equipment of the equipment identification information.
S205: the history keyword acquired in S204 is updated to the target keyword, and then S207 is executed.
S206: and correspondingly storing the target keywords, the target identification information and the equipment identification information.
S207: and judging whether the updated target keywords comprise keywords of a necessary type corresponding to the target intention, if so, executing S209, otherwise, executing S208.
S208: multiple queries.
Specifically, determine the missing second necessary type according to the types corresponding to the updated target keywords and the necessary types; output prompt information for supplementing keywords of the second necessary type, update the received keywords of the second necessary type input by the user with the target identification information into the target keywords, and then execute S209.
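The multi-round query of S208 might look roughly as follows; the keyword-type names and the prompt wording are assumptions.

```python
# Rough illustration of S208: prompt the same user (matched by target
# identification information) for each missing necessary type and merge the
# replies into the target keywords.
REQUIRED_SLOTS = {"leave_message": ["recipient", "content"]}

def multi_round_query(intent, keywords, prompt, read_reply):
    """Ask for every missing necessary type and update the answers into
    the target keywords."""
    for slot_type in REQUIRED_SLOTS.get(intent, []):
        if slot_type not in keywords:
            prompt(f"Please supplement the {slot_type} of the message.")
            keywords[slot_type] = read_reply(slot_type)
    return keywords

filled = multi_round_query(
    "leave_message",
    {"recipient": "user B"},
    prompt=print,
    read_reply=lambda slot: "9 am on Friday, go to the library to return books",
)
print(filled)
```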
S209: and correspondingly storing the target keywords, the target identification information and the equipment identification information, and controlling target equipment.
Specifically, the process of controlling the target device includes: and determining target operation corresponding to the target intention, determining the controlled target equipment and the specific content of the target operation according to the updated target keyword, and controlling the target equipment to execute the specific content of the target operation.
S210: if the target operation is a preset closing operation, deleting the updated target keyword; if the target operation is a preset message operation, deleting the updated target keyword.
The application also provides a voice information processing device. Fig. 3 is a schematic structural diagram of a voice information processing device according to some embodiments of the application, where the device includes:
A determining unit 31, configured to determine a target intention corresponding to text information of voice information to be recognized and a target keyword included in the text information, and determine target identification information of a user who inputs the voice information to be recognized according to voiceprint features of the voice information to be recognized;
a first processing unit 32, configured to, when the target keyword does not include a keyword of the necessary type corresponding to the target intention, update the stored historical keyword of the user with the target identification information into the target keyword;
and the second processing unit 33 is configured to determine a target operation corresponding to the target intention if the updated target keyword includes a keyword of a necessary type corresponding to the target intention, determine a controlled target device and specific content of the target operation according to the updated target keyword, and control the target device to execute the specific content of the target operation.
In a possible implementation manner, the first processing unit 32 is further configured to store, if the historical keywords of the user of the target identification information are not stored, the target keywords and the target identification information correspondingly.
In a possible implementation manner, the first processing unit 32 is further configured to determine, if the historical keyword of the user of the target identification information is not stored, a missing first necessary type according to the type corresponding to the target keyword and the necessary type; outputting prompt information for supplementing the first necessary type of keywords, updating the received first necessary type of keywords input by a user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing specific content of the target operation.
In a possible implementation manner, the second processing unit 33 is further configured to determine, if the updated target keyword does not include a keyword of a necessary type corresponding to the target intention, a missing second necessary type according to the type corresponding to the updated target keyword and the necessary type; outputting prompt information for supplementing the keywords of the second necessary type, updating the received keywords of the second necessary type input by the user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing the specific content of the target operation.
In a possible implementation manner, the determining unit 31 is further configured to obtain device identification information of the intelligent device that collects the voice information to be identified before determining whether the historical keywords of the user of the target identification information are stored;
the first processing unit 32 is specifically configured to obtain a historical keyword group corresponding to the stored intelligent device of the device identification information; and if the historical keywords of the user of the target identification information exist in the historical keyword group, updating the historical keywords of the user of the target identification information into the target keywords.
In a possible implementation manner, the first processing unit 32 is further configured to store, if the historical keywords of the user with the target identification information are not stored, the target keywords, the target identification information, and the device identification information correspondingly.
In a possible implementation manner, the second processing unit 33 is further configured to delete the updated target keyword if the target operation is a preset closing operation after the specific content of the target operation is executed by the control target device; and if the target operation is a preset message leaving operation, deleting the updated target keyword.
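As a reading aid only, the three units of Fig. 3 could be sketched as cooperating classes as follows; the class and variable names are assumptions and the model calls are replaced by stand-ins.

```python
# Hypothetical sketch of the apparatus as three cooperating units.
class DeterminingUnit:
    def run(self, text, voiceprint):
        intent, keywords = ("close", {})   # stand-in for intention / keyword models
        user_id = voiceprint               # stand-in for voiceprint matching
        return intent, keywords, user_id

class FirstProcessingUnit:
    def __init__(self, history):
        self.history = history             # (device id, user id) -> saved keywords
    def run(self, intent, keywords, device_id, user_id, required):
        key = (device_id, user_id)
        if any(t not in keywords for t in required.get(intent, [])):
            if key in self.history:
                keywords = {**self.history[key], **keywords}
            else:
                self.history[key] = dict(keywords)
        return keywords

class SecondProcessingUnit:
    def run(self, intent, keywords, required):
        if all(t in keywords for t in required.get(intent, [])):
            return f"execute {intent} with {keywords}"
        return "multi-round query"

required = {"close": ["entity"]}
history = {("mac-1", "10001"): {"entity": "two tigers"}}
det, first, second = DeterminingUnit(), FirstProcessingUnit(history), SecondProcessingUnit()
intent, kw, uid = det.run("close", "10001")
kw = first.run(intent, kw, "mac-1", uid, required)
print(second.run(intent, kw, required))   # -> execute close with {'entity': 'two tigers'}
```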
When the application determines that the text information of the voice information to be recognized does not contain each key word of the necessary type corresponding to the target intention, if the historical key words of the user inputting the voice information to be recognized are determined to be stored, the historical key words are updated into the target key words, so that the controlled target equipment is determined according to the updated target key words, and the target equipment is controlled to execute the specific content of the target operation, thereby realizing the distinction of the text information of the voice information to be recognized, which is input by different users, avoiding updating the historical key words of other users into the target key words of the user, and improving the accuracy of the subsequent control of the target equipment.
On the basis of the foregoing embodiments, the present application further provides an electronic device. Fig. 4 is a schematic structural diagram of an electronic device according to some embodiments of the present application. As shown in Fig. 4, the electronic device includes: the processor 41, the communication interface 42, the memory 43 and the communication bus 44, wherein the processor 41, the communication interface 42 and the memory 43 communicate with each other through the communication bus 44;
the memory 43 has stored therein a computer program which, when executed by the processor 41, causes the processor 41 to perform the steps of:
Determining target intention corresponding to text information of voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized;
when the target keywords do not contain the keywords of the necessary types corresponding to the target intention, if the history keywords of the users with the target identification information are stored, updating the history keywords into the target keywords;
if the updated target keywords comprise the keywords of the necessary types corresponding to the target intentions, determining target operations corresponding to the target intentions, determining controlled target equipment and specific contents of the target operations according to the updated target keywords, and controlling the target equipment to execute the specific contents of the target operations.
Because the principle by which the electronic device solves the problem is similar to that of the voice information processing method, the implementation of the electronic device may refer to the implementation of the method, and repeated description is omitted.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 42 is used for communication between the electronic device and other devices described above.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
When the application determines that the text information of the voice information to be recognized does not contain each key word of the necessary type corresponding to the target intention, if the historical key words of the user inputting the voice information to be recognized are determined to be stored, the historical key words are updated into the target key words, so that the controlled target equipment is determined according to the updated target key words, and the target equipment is controlled to execute the specific content of the target operation, thereby realizing the distinction of the text information of the voice information to be recognized, which is input by different users, avoiding updating the historical key words of other users into the target key words of the user, and improving the accuracy of the subsequent control of the target equipment.
On the basis of the above embodiments, the present application also provides a computer readable storage medium having stored therein a computer program executable by a processor, which when run on the processor, causes the processor to perform the steps of:
determining target intention corresponding to text information of voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized;
when the target keywords do not contain the keywords of the necessary types corresponding to the target intention, if the history keywords of the users with the target identification information are stored, updating the history keywords into the target keywords;
if the updated target keywords comprise the keywords of the necessary types corresponding to the target intentions, determining target operations corresponding to the target intentions, determining controlled target equipment and specific contents of the target operations according to the updated target keywords, and controlling the target equipment to execute the specific contents of the target operations.
Since the principle by which the computer-readable medium solves the problem is similar to that of the voice information processing method, the steps performed after the processor executes the computer program in the computer-readable medium may refer to the implementation of the method, and repeated description is omitted.
When the application determines that the text information of the voice information to be recognized does not contain each key word of the necessary type corresponding to the target intention, if the historical key words of the user inputting the voice information to be recognized are determined to be stored, the historical key words are updated into the target key words, so that the controlled target equipment is determined according to the updated target key words, and the target equipment is controlled to execute the specific content of the target operation, thereby realizing the distinction of the text information of the voice information to be recognized, which is input by different users, avoiding updating the historical key words of other users into the target key words of the user, and improving the accuracy of the subsequent control of the target equipment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for processing voice information, the method comprising:
determining target intention corresponding to text information of voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized;
when the target keywords do not contain the keywords of the necessary types corresponding to the target intention, if the history keywords of the users with the target identification information are stored, updating the history keywords into the target keywords;
If the updated target keywords comprise the keywords of the necessary types corresponding to the target intentions, determining target operations corresponding to the target intentions, determining controlled target equipment and specific contents of the target operations according to the updated target keywords, and controlling the target equipment to execute the specific contents of the target operations.
2. The method of claim 1, wherein if the historical keywords of the user of the target identification information are not saved, the method further comprises:
and correspondingly storing the target keywords and the target identification information.
3. The method of claim 1, wherein if the historical keywords of the user of the target identification information are not saved, the method further comprises:
determining a first missing necessary type according to the type corresponding to the target keyword and the necessary type;
outputting prompt information for supplementing the first necessary type of keywords, updating the received first necessary type of keywords input by a user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing specific content of the target operation.
4. The method of claim 1, wherein if the updated target keyword does not include a keyword of a type necessary for the target intention, the method further comprises:
determining a second missing necessary type according to the type corresponding to the updated target keyword and the necessary type;
outputting prompt information for supplementing the keywords of the second necessary type, updating the received keywords of the second necessary type input by the user of the target identification information into the target keywords, determining the target equipment according to the updated target keywords, and executing the specific content of the target operation.
5. The method according to claim 1, wherein before determining whether the history keyword of the user of the target identification information is stored, the method further comprises:
acquiring equipment identification information of intelligent equipment for acquiring the voice information to be identified;
and if the historical keywords of the user with the target identification information are stored, updating the historical keywords into the target keywords, wherein the method comprises the following steps:
acquiring a historical keyword group corresponding to the intelligent equipment of the stored equipment identification information;
And if the historical keywords of the user of the target identification information exist in the historical keyword group, updating the historical keywords of the user of the target identification information into the target keywords.
6. The method of claim 5, wherein if the historical keywords of the user for which the target identification information is not stored, the method further comprises:
and correspondingly storing the target keyword, the target identification information and the equipment identification information.
7. The method according to any one of claims 1 to 6, wherein after the control-target device performs the specific content of the target operation, the method further comprises:
if the target operation is a preset closing operation, deleting the updated target keyword;
and if the target operation is a preset message leaving operation, deleting the updated target keyword.
8. A speech information processing apparatus, characterized in that the apparatus comprises:
the determining unit is used for determining target intention corresponding to text information of the voice information to be recognized and target keywords contained in the text information, and determining target identification information of a user inputting the voice information to be recognized according to voiceprint characteristics of the voice information to be recognized;
The first processing unit is used for updating the historical keywords into the target keywords if the historical keywords of the users with the target identification information are stored when the target keywords do not contain the keywords of the necessary types corresponding to the target intention;
and the second processing unit is used for determining target operation corresponding to the target intention if the updated target keyword contains the keyword of the necessary type corresponding to the target intention, determining controlled target equipment and specific content of the target operation according to the updated target keyword, and controlling the target equipment to execute the specific content of the target operation.
9. An electronic device comprising at least a processor and a memory, the processor being adapted to implement the steps of the speech information processing method according to any of claims 1-7 when executing a computer program stored in the memory.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the speech information processing method according to any one of claims 1-7.
CN202010658673.1A 2020-07-09 2020-07-09 Voice information processing method, device, equipment and medium Active CN113470636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010658673.1A CN113470636B (en) 2020-07-09 2020-07-09 Voice information processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010658673.1A CN113470636B (en) 2020-07-09 2020-07-09 Voice information processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113470636A CN113470636A (en) 2021-10-01
CN113470636B true CN113470636B (en) 2023-10-27

Family

ID=77868191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010658673.1A Active CN113470636B (en) 2020-07-09 2020-07-09 Voice information processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113470636B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009151314A (en) * 2008-12-25 2009-07-09 Sony Corp Information processing device and information processing method
CN110765242A (en) * 2018-07-27 2020-02-07 优信拍(北京)信息科技有限公司 Method, device and system for providing customer service information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104428766B (en) * 2012-07-03 2017-07-11 三菱电机株式会社 Speech recognition equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009151314A (en) * 2008-12-25 2009-07-09 Sony Corp Information processing device and information processing method
CN110765242A (en) * 2018-07-27 2020-02-07 优信拍(北京)信息科技有限公司 Method, device and system for providing customer service information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Internet-oriented information processing; Wang Haifeng; Wu Hua; Liu Zhanyi; Scientia Sinica Informationis (12); full text *

Also Published As

Publication number Publication date
CN113470636A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN109326289B (en) Wake-up-free voice interaction method, device, equipment and storage medium
CN109616108B (en) Multi-turn dialogue interaction processing method and device, electronic equipment and storage medium
CN110851221B (en) Smart home scene configuration method and device
US11979437B2 (en) System and method for registering device for voice assistant service
WO2020253064A1 (en) Speech recognition method and apparatus, and computer device and storage medium
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN109979450B (en) Information processing method and device and electronic equipment
CN111354357A (en) Audio resource playing method and device, electronic equipment and storage medium
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN114913856A (en) Voice interaction method, server and storage medium
CN114822532A (en) Voice interaction method, electronic device and storage medium
CN111540355A (en) Personalized setting method and device based on voice assistant
CN108322770A (en) Video frequency program recognition methods, relevant apparatus, equipment and system
CN113470636B (en) Voice information processing method, device, equipment and medium
CN111427444B (en) Control method and device of intelligent device
CN114974248A (en) Vehicle voice interaction method, server and storage medium
CN115705378A (en) Resource recommendation method and device and electronic equipment
CN113160807A (en) Corpus updating method and system and voice control equipment
CN109635209B (en) Learning content recommendation method and family education equipment
CN107967308B (en) Intelligent interaction processing method, device, equipment and computer storage medium
CN113241067B (en) Voice interaction method and system and voice interaction equipment
CN113076444A (en) Song identification method and device, electronic equipment and storage medium
CN113486233A (en) Content recommendation method, device and medium
CN113241066B (en) Voice interaction method and system and voice interaction equipment
CN115111829B (en) Intelligent refrigerator, quality guarantee period determining method, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB02 Change of applicant information

Address after: 266555, No. 218, Bay Road, Qingdao Economic and Technological Development Zone, Shandong

Applicant after: Hisense Group Holding Co.,Ltd.

Address before: 266555, No. 218, Bay Road, Qingdao Economic and Technological Development Zone, Shandong

Applicant before: QINGDAO HISENSE ELECTRONIC INDUSTRY HOLDING Co.,Ltd.