CN113488037B

CN113488037B - Speech recognition method

Info

Publication number: CN113488037B
Application number: CN202010661084.9A
Authority: CN
Inventors: 张淯易; 高雪松; 陈维强
Original assignee: Hisense Group Holding Co Ltd
Current assignee: Hisense Group Holding Co Ltd
Priority date: 2020-07-10
Filing date: 2020-07-10
Publication date: 2024-04-12
Anticipated expiration: 2040-07-10
Also published as: CN113488037A

Abstract

The application discloses a speech recognition method that addresses the low accuracy and poor user experience of prior-art approaches to determining recommendation information. The method obtains first text information corresponding to the voice information to be recognized, together with first user attribute information corresponding to the voiceprint feature of that voice information. A pre-trained recommendation model then determines the first recommendation information from the first semantic information corresponding to the first text information and from the first user attribute information. Because the first recommendation information is determined jointly from the first semantic information and the first user attribute information, it is more accurate, better matches the actual needs of users, and improves the user experience.

Description

Speech recognition method

Technical Field

The present disclosure relates to the field of speech recognition technologies, and in particular, to a method, apparatus, system, device, and medium for speech recognition.

Background

At present, a growing number of intelligent devices provide a voice recognition function, and as the technology matures and becomes more widespread, users increasingly accept voice interaction. With the continuous improvement of voice interaction technology and artificial intelligence, voice recognition is rapidly spreading to smart speakers, smart household appliances, mobile terminal devices, and the like.

Fig. 1 is a schematic diagram of a speech recognition process provided in the prior art. As shown in Fig. 1, an intelligent device with a voice recognition function collects the voice information input by a user, performs speech understanding on the voice information, and determines the first text information corresponding to the voice information. The intelligent device sends the first text information to a server. The server determines the first semantic information corresponding to the first text information, determines the corresponding first recommendation information through a pre-trained recommendation model based on the first semantic information, and sends the first recommendation information to the intelligent device. The intelligent device then executes the first recommendation information in response to the user's voice information.

Determining the first semantic information corresponding to the first text information, and determining the corresponding first recommendation information, requires the server to have a certain amount of computing power and capacity, and a cloud server can perform semantic analysis efficiently and accurately when network conditions are good. For this reason, most manufacturers currently upload the voice information collected by the intelligent device to a cloud server to determine the recommendation information corresponding to the voice information, and only a few use a local server for this purpose.

However, whether the recommendation information is determined by a cloud server or by a local server, it is determined mainly from the first text information corresponding to the voice information. For example, whether the voice information "watch cartoon" is spoken by a child or by a middle-aged person, cartoons are recommended to the user mainly on the basis of the first text information "watch cartoon". In actual use, however, middle-aged people and children want to watch different kinds of cartoons: a child may want a currently popular cartoon such as Pleasant Goat and Big Big Wolf, whereas a middle-aged person may want a more nostalgic cartoon such as Hui Wa. Existing methods, which determine the first recommendation information mainly from the first text information corresponding to the voice information, therefore suffer from low accuracy of the determined first recommendation information and a poor user experience.

Disclosure of Invention

The application provides a voice recognition method, a voice recognition device, a voice recognition system, voice recognition equipment and voice recognition media, which are used for solving the problems of low accuracy and low user experience when corresponding recommendation information is determined in the prior art.

In a first aspect, the present application provides a method for speech recognition, the method comprising:

Acquiring first text information corresponding to voice information to be recognized and acquired by intelligent equipment, and first user attribute information corresponding to voiceprint features of the voice information;

determining first semantic information corresponding to the first text information; and determining corresponding first recommendation information through a pre-trained recommendation model based on the first semantic information and the first user attribute information, and controlling the intelligent equipment to execute the first recommendation information.

In a second aspect, the present application further provides a method for speech recognition, the method comprising:

collecting voice information and determining first text information corresponding to the voice information;

acquiring first voiceprint features corresponding to the voice information, and determining first user attribute information corresponding to the first voiceprint features according to the corresponding relation between the prestored voiceprint features and the user attribute information;

transmitting the first text information and the first user attribute information to a server;

and receiving first recommendation information sent by the server, and pushing and displaying corresponding content according to the first recommendation information.

In a third aspect, the present application further provides a speech recognition apparatus, the apparatus comprising:

an acquisition module, used for acquiring first text information corresponding to voice information to be recognized that is collected by the intelligent equipment, and first user attribute information corresponding to voiceprint features of the voice information;

the first determining module is used for determining first semantic information corresponding to the first text information; and determining corresponding first recommendation information through a pre-trained recommendation model based on the first semantic information and the first user attribute information, and controlling the intelligent equipment to execute the first recommendation information.

In a fourth aspect, the present application further provides a speech recognition apparatus, the apparatus comprising:

the acquisition module is used for acquiring voice information and determining first text information corresponding to the voice information;

the second determining module is used for acquiring first voiceprint features corresponding to the voice information and determining first user attribute information corresponding to the first voiceprint features according to the corresponding relation between the prestored voiceprint features and the user attribute information;

the first sending module is used for sending the first text information and the first user attribute information to a server;

and the receiving module is used for receiving the first recommendation information sent by the server and pushing and displaying the corresponding content according to the first recommendation information.

In a fifth aspect, the present application further provides a voice recognition system, where the system includes any of the above speech recognition apparatuses applied to a server and any of the above speech recognition apparatuses applied to an intelligent device.

In a sixth aspect, the present application provides an electronic device comprising at least a processor and a memory, the processor being configured to implement the steps of any of the above-described speech recognition methods when executing a computer program stored in the memory.

In a seventh aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor performs the steps of any of the above-described speech recognition methods.

In the present application, the first text information corresponding to the voice information to be recognized and the first user attribute information corresponding to the voiceprint feature of the voice information are obtained, and the first recommendation information corresponding to the voice information is determined through a pre-trained recommendation model based on the first semantic information corresponding to the first text information and the first user attribute information. Because the first recommendation information is determined jointly from the first semantic information and the first user attribute information, it is more accurate, better matches the actual needs of users, and improves the user experience.

Drawings

FIG. 1 is a schematic diagram of a speech recognition process according to the prior art;

FIG. 2 is a schematic diagram of a first speech recognition process according to some embodiments of the present application;

FIG. 3 is a schematic diagram of a speech understanding model and semantic understanding model training process provided in some embodiments of the present application;

FIG. 4 is a schematic diagram of a second speech recognition process according to some embodiments of the present application;

FIG. 5 is a schematic diagram of a third speech recognition process according to some embodiments of the present application;

FIG. 6 is a schematic illustration of a speech recognition device according to some embodiments of the present application;

FIG. 7 is a schematic illustration of another speech recognition device according to some embodiments of the present application;

FIG. 8 is a schematic structural diagram of an electronic device according to some embodiments of the present application;

FIG. 9 is a schematic diagram of another electronic device according to some embodiments of the present application;

FIG. 10 is a schematic diagram of a speech recognition system according to some embodiments of the present application.

Detailed Description

In order to improve accuracy of determined recommended information and improve user experience, the embodiment of the application provides a voice recognition method, device, system, equipment and medium.

For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, wherein it is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

In practical applications, the intelligent device collects the voice information input by a user, performs speech understanding on it, and determines the first text information corresponding to the voice information. In the present application, the intelligent device may also obtain the first voiceprint feature corresponding to the voice information, determine the first user attribute information corresponding to the first voiceprint feature according to a prestored correspondence between voiceprint features and user attribute information, and send the first text information and the first user attribute information to the server. After receiving the first text information and the first user attribute information, the server determines the first semantic information corresponding to the first text information, inputs the first semantic information and the first user attribute information into a pre-trained recommendation model to determine the corresponding first recommendation information, and sends the first recommendation information to the intelligent device. The intelligent device receives the first recommendation information and pushes and displays the corresponding content according to it, thereby responding to the user's voice information.

Fig. 2 is a schematic diagram of a first speech recognition process according to some embodiments of the present application, where the process includes the following steps:

S201: Acquire first text information corresponding to the voice information to be recognized that is collected by the intelligent device, and first user attribute information corresponding to the voiceprint feature of the voice information.

In the present application, the speech recognition method is applied to a server. In one possible implementation, the server acquires the first text information corresponding to the voice information to be recognized by receiving it from the intelligent device. In that case, after collecting the voice information, the intelligent device may determine the corresponding first text information through a speech understanding model stored on the intelligent device itself. The way in which the intelligent device determines the first text information is not limited and can be set flexibly according to requirements.

In one possible implementation, a server receives first user attribute information corresponding to voiceprint features of voice information sent by an intelligent device. Specifically, the intelligent device can acquire first voiceprint features corresponding to the voice information through the existing voiceprint recognition mode, and determine first user attribute information corresponding to the first voiceprint features according to the corresponding relation between the prestored voiceprint features and the user attribute information, so that the first user attribute information is sent to the server.
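The lookup from a voiceprint feature to user attribute information can be illustrated with a minimal sketch. The source only states that a prestored correspondence exists and that an existing voiceprint recognition method is used; the embedding vectors, the cosine-similarity matching, the `threshold` value and all names below are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical prestored correspondence: (voiceprint embedding, user attributes).
ENROLLED: List[Tuple[List[float], Dict[str, str]]] = [
    ([0.9, 0.1, 0.0], {"age_group": "child"}),
    ([0.1, 0.8, 0.3], {"age_group": "middle-aged"}),
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup_attributes(
    voiceprint: List[float],
    enrolled: List[Tuple[List[float], Dict[str, str]]] = ENROLLED,
    threshold: float = 0.7,  # assumed acceptance threshold
) -> Optional[Dict[str, str]]:
    """Return the attributes of the best-matching enrolled voiceprint, or None."""
    best_score, best_attrs = -1.0, None
    for reference, attrs in enrolled:
        score = cosine(voiceprint, reference)
        if score > best_score:
            best_score, best_attrs = score, attrs
    return best_attrs if best_score >= threshold else None
```

Returning `None` when no enrolled voiceprint is similar enough leaves the caller free to fall back to text-only recommendation; that fallback is a design choice of this sketch, not something the source specifies.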

In a possible implementation manner, the first user attribute information may include age, gender, hobbies, behavior habits, and the like of the user, and may be flexibly set according to requirements.

S202: determining first semantic information corresponding to the first text information; and determining corresponding first recommendation information through a pre-trained recommendation model based on the first semantic information and the first user attribute information, and controlling the intelligent equipment to execute the first recommendation information.

In order to determine the recommended information corresponding to the voice information, in the present application, the server may determine the first semantic information corresponding to the first text information, and in particular, may determine the first semantic information corresponding to the first text information through a semantic understanding model stored in the server itself. The process of determining the first semantic information corresponding to the first text information through the semantic understanding model is the prior art, and is not described herein.

In order to improve accuracy of the determined recommendation information, in the application, the server itself may store a pre-trained recommendation model, and further may determine first recommendation information corresponding to the voice information through the pre-trained recommendation model. Specifically, the first semantic information and the first user attribute information may be input into a recommendation model that is trained in advance, so that the first recommendation information corresponding to the voice information is determined through the recommendation model.

In order to control the intelligent device to execute the first recommendation information, the server can send the first recommendation information to the intelligent device, and after the intelligent device receives the first recommendation information sent by the server, the intelligent device can push and display corresponding content according to the first recommendation information.
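Steps S201 and S202 on the server can be sketched as follows. This is only an outline under stated assumptions: `understand` and `recommend` are hypothetical rule-based stand-ins for the semantic understanding model and the pre-trained recommendation model, and the `age_group` attribute and the returned strings are invented for the "watch cartoon" example from the Background.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DeviceRequest:
    text: str                        # first text information
    user_attributes: Dict[str, str]  # first user attribute information


def understand(text: str) -> Dict[str, str]:
    """Stand-in for the semantic understanding model (keyword rule only)."""
    if "cartoon" in text.lower():
        return {"intent": "watch", "category": "cartoon"}
    return {"intent": "unknown", "category": "unknown"}


def recommend(semantics: Dict[str, str], user_attributes: Dict[str, str]) -> str:
    """Stand-in for the pre-trained recommendation model."""
    if semantics["category"] == "cartoon":
        if user_attributes.get("age_group") == "child":
            return "currently popular cartoons"
        return "classic cartoons"
    return "no recommendation"


def handle_request(request: DeviceRequest) -> str:
    """S202: derive semantics from the text, then combine them with the attributes."""
    semantics = understand(request.text)
    return recommend(semantics, request.user_attributes)
```

The same text thus yields different recommendation information for different speakers, which is the effect the method is aiming for.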

In the present application, the first text information corresponding to the voice information to be recognized and the first user attribute information corresponding to the voiceprint feature of the voice information are obtained, and the first recommendation information corresponding to the voice information is determined through a pre-trained recommendation model based on the first semantic information corresponding to the first text information and the first user attribute information. Because the first recommendation information is determined jointly from the first semantic information and the first user attribute information, it is more accurate, better matches the actual requirements of users, and improves the user experience.

In order to improve the accuracy of the determined recommendation information, in the present application, before the corresponding first recommendation information is determined through the pre-trained recommendation model based on the first semantic information and the first user attribute information, the method further includes:

Acquiring first acquisition time information of the voice information and first position information of the intelligent equipment;

the determining, based on the first semantic information and the first user attribute information, corresponding first recommendation information through a recommendation model that is trained in advance includes:

and determining corresponding first recommendation information through the recommendation model based on the first semantic information, the first user attribute information, the first acquisition time information and the first position information.

In one possible implementation, the server acquires the first acquisition time information of the voice information and the first position information of the intelligent device by receiving them from the intelligent device. The intelligent device can record the first acquisition time information when it collects the voice information, for example in Java through the Calendar class in the java.util package; acquiring the first acquisition time information is prior art and is not described further herein. The intelligent device can obtain its own first position information through its GPS module; acquiring the first position information is likewise prior art and is not described further herein.

In order to improve the accuracy of the determined recommendation information, in one possible implementation, the intelligent device may send the first text information, the first user attribute information, the first acquisition time information and the first position information to the server in association with one another, as a first correspondence. The server can input the first semantic information corresponding to the first text information, the first user attribute information, the first acquisition time information and the first position information into the pre-trained recommendation model stored on the server, so that the first recommendation information corresponding to the voice information is determined through the recommendation model.
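How the four inputs might be combined into a single model input can be sketched as below. The source does not specify an encoding; the flat dictionary, the field names and the choice of hour and weekday as time features are assumptions of this sketch.

```python
from datetime import datetime
from typing import Dict, Tuple


def build_model_input(
    semantics: Dict[str, str],
    user_attributes: Dict[str, object],
    collected_at: datetime,
    position: Tuple[float, float],
) -> Dict[str, object]:
    """Merge semantics, user attributes, time and position into one record.

    All keys are hypothetical; a real recommendation model would define its own schema.
    """
    latitude, longitude = position
    return {
        "intent": semantics.get("intent"),
        "age": user_attributes.get("age"),
        "gender": user_attributes.get("gender"),
        "hour": collected_at.hour,           # from the first acquisition time information
        "weekday": collected_at.weekday(),   # Monday is 0
        "latitude": latitude,                # from the first position information
        "longitude": longitude,
    }
```

Attributes that the device could not determine simply appear as `None`, so the record keeps a fixed shape regardless of how much is known about the speaker.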

For ease of understanding, the following describes in detail the recommended procedure provided by some embodiments of the present application by way of one specific embodiment.

In one possible implementation, a user says "I want to eat". The intelligent device collects the voice information, determines the corresponding first text information, obtains the first voiceprint feature of the voice information, and determines the first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information. The intelligent device also acquires the first acquisition time information of the voice information and the first position information of the intelligent device, and sends the first text information, the first user attribute information, the first acquisition time information and the first position information to the server in association with one another. The server determines the first semantic information corresponding to the first text information and inputs the first semantic information, the first user attribute information, the first acquisition time information and the first position information into the pre-trained recommendation model. Based on these inputs, the recommendation model may determine that the first recommendation information is, for example, a restaurant suitable for breakfast or for dinner, the restaurant closest to the position given by the first position information, a restaurant suitable for men or for women, a restaurant suitable for Chinese or for American diners, a restaurant suitable for high-income or for low-income people, a restaurant suited to Shandong tastes, a restaurant suitable for people to return, a restaurant suitable for children, a restaurant suitable for the elderly, and the like.
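Two of the factors in this example, the acquisition time and the device position, can be illustrated with a small rule-based sketch. It is not the trained recommendation model: the restaurant list, the meal-period boundaries and the function names are hypothetical, and the great-circle (haversine) distance is used here simply as one reasonable way to find the closest restaurant.

```python
import math
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical candidate restaurants.
RESTAURANTS: List[Dict[str, object]] = [
    {"name": "A", "lat": 36.07, "lon": 120.38, "meals": {"breakfast"}},
    {"name": "B", "lat": 36.10, "lon": 120.40, "meals": {"lunch", "dinner"}},
    {"name": "C", "lat": 36.30, "lon": 120.60, "meals": {"breakfast", "dinner"}},
]


def meal_period(collected_at: datetime) -> str:
    """Map the acquisition time to a meal period (boundaries are assumptions)."""
    if collected_at.hour < 10:
        return "breakfast"
    if collected_at.hour < 15:
        return "lunch"
    return "dinner"


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_suitable(collected_at: datetime, lat: float, lon: float,
                     restaurants: List[Dict[str, object]] = RESTAURANTS) -> Optional[str]:
    """Name of the closest restaurant serving the current meal period, or None."""
    period = meal_period(collected_at)
    candidates = [r for r in restaurants if period in r["meals"]]
    if not candidates:
        return None
    return min(candidates, key=lambda r: haversine_km(lat, lon, r["lat"], r["lon"]))["name"]
```

In the method itself these factors are weighed by the trained model together with the user attributes rather than by fixed rules.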

Because the first recommendation information corresponding to the determined voice information can be comprehensively determined based on the first semantic information, the first user attribute information, the first acquisition time information and the first position information, the accuracy of the first recommendation information corresponding to the determined voice information is higher, the actual requirements of users are met, and the user experience is improved.

In order to improve accuracy of the determined recommendation information, on the basis of the above embodiments, in the present application, the process of training the recommendation model includes:

any sample information in a sample set and second recommendation information corresponding to the sample information are acquired, wherein the sample information comprises second semantic information, second user attribute information, second acquisition time information and second position information;

determining third recommendation information corresponding to the sample information through an original recommendation model;

and training the original recommendation model according to the second recommendation information and the third recommendation information.

In order to improve accuracy of the determined recommended information, in the application, a sample set includes a plurality of sample information, each sample information includes second semantic information, second user attribute information, second acquisition time information and second position information, and each sample information corresponds to the second recommended information.

When the original recommendation model is trained, any sample information in the sample set is acquired, together with the second recommendation information to which it corresponds. The acquired sample information is input into the original recommendation model, which outputs the third recommendation information corresponding to the sample information.

In a specific implementation, after the third recommendation information corresponding to the input sample information is determined, because the second recommendation information corresponding to the sample information is stored in advance, whether the output of the recommendation model is accurate can be determined according to whether the second recommendation information is consistent with the third recommendation information. If the two are inconsistent, the parameters of the recommendation model need to be adjusted, thereby training the recommendation model.

In a specific implementation, when the parameters of the recommendation model are adjusted, a gradient descent algorithm may be adopted, with the gradients of the parameters of the recommendation model back-propagated, so as to train the recommendation model.
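The training step can be illustrated with a deliberately small model. The source does not disclose the architecture of the recommendation model, so the sketch below assumes a linear softmax classifier trained by per-sample gradient descent on a cross-entropy loss; the feature encoding, learning rate and epoch count are likewise assumptions. For a single linear layer, back-propagation reduces to the gradient expression shown in the comment.

```python
import math
from typing import List, Tuple

# (encoded sample information, index of the second recommendation information)
Sample = Tuple[List[float], int]


def scores_for(weights: List[List[float]], x: List[float]) -> List[float]:
    """One linear score per candidate recommendation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def predict(weights: List[List[float]], x: List[float]) -> int:
    """Index of the third recommendation information, i.e. the model output."""
    scores = scores_for(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)


def train(samples: List[Sample], n_classes: int,
          lr: float = 0.5, epochs: int = 200) -> List[List[float]]:
    """Adjust the parameters by gradient descent, one sample at a time."""
    n_features = len(samples[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in samples:
            probs = softmax(scores_for(weights, x))
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score c.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * x[j]
    return weights
```

A model with hidden layers would propagate the same output gradient backwards through each layer; that part is omitted here to keep the example short.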

In one possible implementation, the above operation may be performed on each sample information in the sample set, and when a preset convergence condition is satisfied, it is determined that the training of the recommendation model is completed.

The preset convergence condition may be, for example, that the number of pieces of sample information for which the recommendation model outputs the correct recommendation information is greater than a set number, or that the number of training iterations reaches a set maximum number of iterations. The condition may be set flexibly in a specific implementation and is not particularly limited herein.
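The two example convergence conditions can be combined in a single loop, sketched below. The callback-based structure and all names are assumptions; `train_one_pass` stands for one pass over the sample set and `count_correct` for counting the samples whose output matches the stored recommendation information.

```python
from typing import Callable


def train_until_converged(
    train_one_pass: Callable[[], None],
    count_correct: Callable[[], int],
    min_correct: int,
    max_iterations: int,
) -> int:
    """Train until more than `min_correct` samples are correct or the cap is hit.

    Returns the number of passes that were actually run.
    """
    for iteration in range(1, max_iterations + 1):
        train_one_pass()
        if count_correct() > min_correct:
            return iteration
    return max_iterations
```

Checking the iteration cap through the loop bound guarantees termination even if the accuracy condition is never met.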

In one possible implementation manner, when the original recommended model is trained, sample information in the sample set can be divided into training sample information and test sample information, the original recommended model is trained based on the training sample information, and then the reliability degree of the trained recommended model is verified based on the test sample information.
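Dividing the sample set and verifying the trained model can be sketched as follows. The 80/20 split, the fixed random seed and the use of plain accuracy as the reliability measure are assumptions; the source does not specify any of them.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_samples(samples: Sequence, test_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the samples and split it into (training, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict: Callable[[object], object],
             test_samples: Sequence[Tuple[object, object]]) -> float:
    """Fraction of test samples whose prediction equals the stored label."""
    if not test_samples:
        return 0.0
    correct = sum(1 for x, label in test_samples if predict(x) == label)
    return correct / len(test_samples)
```

Keeping the test samples out of training is what makes the measured accuracy a meaningful estimate of how the model behaves on new requests.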

In addition, in the present application, the speech understanding model and the semantic understanding model may be trained using the prior art. Specifically, when the original speech understanding model is trained, any sample voice information in the sample set is acquired, where the sample voice information corresponds to first sample text information. The acquired sample voice information is input into the original speech understanding model, which outputs recognized text information corresponding to the sample voice information. Because the first sample text information corresponding to the sample voice information is stored in advance, whether the recognition result of the speech understanding model is accurate can be determined according to whether the first sample text information is consistent with the recognized text information. If the two are inconsistent, the parameters of the speech understanding model need to be adjusted, thereby training the speech understanding model.

In addition, when the original semantic understanding model is trained, any second sample text information in the sample set is acquired, where the second sample text information corresponds to sample semantic information. The acquired second sample text information is input into the original semantic understanding model, which outputs recognized semantic information corresponding to the second sample text information. Because the sample semantic information corresponding to the second sample text information is stored in advance, whether the recognition result of the semantic understanding model is accurate can be determined according to whether the sample semantic information is consistent with the recognized semantic information. If the two are inconsistent, the parameters of the semantic understanding model need to be adjusted, thereby training the semantic understanding model.

It should be noted that Fig. 3 is a schematic diagram of the training process of the speech understanding model and the semantic understanding model provided in some embodiments of the present application. As shown in Fig. 3, when the original semantic understanding model is trained, the second sample text information in the sample set may be the output of the speech understanding model, that is, the recognized text information. In one possible implementation, the second sample text information includes both the recognized text information output by the speech understanding model and manually labeled text information entered through a device such as a PC. The proportion of recognized text information in the second sample text information may be larger or smaller and can be set flexibly according to requirements; in general, it occupies the larger proportion.
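Mixing the two sources of second sample text information in a configurable proportion can be sketched as below. The 70% default share for text produced by the speech understanding model, the sample-set size and the function name are assumptions chosen only to illustrate "the larger proportion".

```python
import random
from typing import List


def build_text_samples(asr_texts: List[str], manual_texts: List[str],
                       asr_fraction: float = 0.7, total: int = 10,
                       seed: int = 0) -> List[str]:
    """Draw a mixed sample set with roughly `asr_fraction` of model-produced text."""
    rng = random.Random(seed)
    n_asr = min(len(asr_texts), round(total * asr_fraction))
    n_manual = min(len(manual_texts), total - n_asr)
    samples = rng.sample(asr_texts, n_asr) + rng.sample(manual_texts, n_manual)
    rng.shuffle(samples)  # avoid presenting the two sources in separate blocks
    return samples
```

Training the semantic understanding model mostly on recognized text exposes it to the kinds of recognition errors it will see in deployment, while the manually labeled text provides clean reference examples.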

In order to obtain the first text information corresponding to the voice information, based on the above embodiments, in this application, the obtaining the first text information corresponding to the voice information to be identified, which is collected by the intelligent device, includes:

receiving voice information sent by the intelligent equipment;

and determining first text information corresponding to the voice information according to the voice information.

In one possible implementation, after the intelligent device collects the voice information, it may send the voice information to the server. The server receives the voice information and determines the first text information corresponding to it, for example through a speech understanding model stored on the server. The way in which the server determines the first text information corresponding to the voice information is not limited and can be set flexibly according to requirements.

In order to obtain first user attribute information corresponding to a voiceprint feature of voice information, in the present application, on the basis of the foregoing embodiments, obtaining first user attribute information corresponding to a voiceprint feature of voice information includes:

Acquiring a first voiceprint feature corresponding to the voice information according to the received voice information sent by the intelligent equipment;

and determining first user attribute information corresponding to the first voiceprint feature according to the corresponding relation between the prestored voiceprint feature and the user attribute information.

In a possible implementation manner, in order to improve accuracy of the determined recommended information, after receiving the voice information sent by the intelligent device, the server may obtain, according to the received voice information, a first voiceprint feature corresponding to the voice information in an existing voiceprint recognition manner. In order to determine the first user attribute information corresponding to the first voiceprint feature, the server may store in advance a correspondence between the voiceprint feature and the user attribute information, and after acquiring the first voiceprint feature corresponding to the voice information, determine the first user attribute information corresponding to the first voiceprint feature according to the correspondence between the prestored voiceprint feature and the user attribute information.
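The lookup from a voiceprint feature to prestored user attribute information can be sketched as a nearest-match search. This is a minimal illustration only: the feature vectors, the `ENROLLED` table, the use of cosine similarity, and the `threshold` value are all assumptions, since the text leaves the voiceprint recognition method open.

```python
import math

# Hypothetical prestored correspondence: voiceprint feature -> user attribute information.
ENROLLED = [
    ([0.9, 0.1, 0.3], {"age": 8, "gender": "male", "nationality": "China"}),
    ([0.2, 0.8, 0.5], {"age": 35, "gender": "female", "nationality": "China"}),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lookup_user_attributes(voiceprint, enrolled=ENROLLED, threshold=0.8):
    """Return the attributes of the best-matching stored voiceprint, or None.

    The threshold is an illustrative assumption, not a value from the text.
    """
    best_score, best_attrs = -1.0, None
    for stored, attrs in enrolled:
        score = cosine_similarity(voiceprint, stored)
        if score > best_score:
            best_score, best_attrs = score, attrs
    return best_attrs if best_score >= threshold else None
```

A voiceprint that matches no stored entry closely enough yields `None`, in which case the caller would have no first user attribute information to pass on.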

In order to improve accuracy of the determined recommendation information, in the embodiment of the present application, the first user attribute information includes at least one of: the age of the user, the gender of the user, the nationality of the user, the income level of the user, the region where the user is located, the work unit of the user, the family member composition of the user, and the time at which the user prepares to fall asleep.

In an actual application scenario, the first recommendation information corresponding to the voice information may be determined based on the first semantic information and the first user attribute information. Generally, the more first user attribute information is available, the better the first recommendation information determined from the first semantic information and the first user attribute information matches the actual requirement of the user. In order to improve accuracy of the determined recommendation information, in the present application, the first user attribute information may include at least one of: the age of the user, the gender of the user, the nationality of the user, the income level of the user, the region where the user is located, the work unit of the user, the family member composition of the user, and the time at which the user prepares to fall asleep. For example, the first user attribute information may be: 20 years old, male, China, etc.

In one possible implementation, the age of the user may be a specific age, such as 5, 15, 30, 45, or 65 years old, or an age group, such as infant, teenager, young, middle-aged, or elderly. The gender of the user may be male, female, etc. The nationality of the user may be China, Korea, the United States, etc. The income level of the user may be a specific value of the user's annual or monthly income, or a category such as low income, medium income, or high income. The region where the user is located may include region-level information, county-level information, etc. The work unit of the user may be the specific name of the unit, or an enterprise category such as state-owned enterprise, collective enterprise, jointly operated enterprise, joint-stock enterprise, private enterprise, self-employed household, partnership, limited liability company, or joint-stock limited company. The family member composition of the user may be the number of family members, such as 2, 3, or 5, or a description such as a family with children or a family with elderly members. The time at which the user prepares to fall asleep may be a specific time, such as 22:00 or 0:00, or a category such as early sleep or late sleep. The specific content of the first user attribute information is not particularly limited and can be set flexibly according to requirements.
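The two ways of expressing an attribute, as a specific value or as a coarse category, can be related by a simple mapping. The sketch below is illustrative only: the function names and every boundary value are assumptions chosen so that the example ages and times in the text fall into the categories they appear next to.

```python
def age_group(age: int) -> str:
    """Map a specific age to a coarse age group (boundaries are assumptions)."""
    if age < 7:
        return "infant"
    if age < 18:
        return "teenager"
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "elderly"


def sleep_habit(bedtime_hour: int) -> str:
    """Classify a bedtime given as an hour of the day, 0-23 (cut-offs are assumptions)."""
    # 23:00 and the hours from midnight up to 06:00 are treated as late.
    if bedtime_hour >= 23 or bedtime_hour < 6:
        return "late sleep"
    return "early sleep"
```

Sending only the category rather than the specific value also reduces how much detail about the user leaves the device.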

The first user attribute information may be regarded as a universally unique identifier (UUID) corresponding to the user. To prevent leakage of the first user attribute information, it may be encoded to generate a corresponding user profile, so that its content is not exposed even if it is intercepted by an attacker. In general, to keep the first user attribute information secure, it may exclude sensitive information such as the user's name and identity document (ID) number.

For ease of understanding, the recommendation process provided in some embodiments of the present application is described in detail below through two specific examples.

In a possible implementation manner, a user inputs the voice information "I want to watch an animation". The intelligent device collects the voice information, determines the first text information corresponding to it, obtains the first voiceprint feature corresponding to it, and determines the first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information. The intelligent device sends the first text information and the first user attribute information to a server. The server determines the first semantic information corresponding to the first text information and inputs the first semantic information and the first user attribute information into a pre-trained recommendation model, which can determine from both that the corresponding first recommendation information is, for example, "Pleasant Goat and Big Big Wolf" or "Calabash Brothers".

In a possible implementation manner, a user inputs the voice information "I want to buy skin care products". The intelligent device collects the voice information, determines the first text information corresponding to it, obtains the first voiceprint feature corresponding to it, and determines the first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information. The intelligent device sends the first text information and the first user attribute information to a server. The server determines the first semantic information corresponding to the first text information and inputs the first semantic information and the first user attribute information into a pre-trained recommendation model. According to both, the recommendation model can determine that the corresponding first recommendation information is, for example, skin care products suitable for children, for the elderly, for men, for women, for a Chinese physique, for an American physique, for high-income people, for low-income people, for people in Hainan, for people in Shaanxi, or for people who stay up late.

Because the first recommendation information corresponding to the voice information is determined comprehensively based on the first semantic information and the first user attribute information, the accuracy of the determined first recommendation information is higher, the actual requirements of users are met, and the user experience is improved.
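To illustrate why combining semantic information with user attribute information changes the result, the sketch below uses a hand-written rule table as a stand-in for the pre-trained recommendation model. The intent label `buy_skin_care`, the function name, and the age cut-offs are all hypothetical; a trained model would learn such associations rather than hard-code them.

```python
def recommend(intent: str, attrs: dict) -> str:
    """Toy stand-in for the recommendation model (rules and cut-offs are assumptions)."""
    if intent != "buy_skin_care":
        return "no recommendation"
    age = attrs.get("age")
    if age is not None and age < 12:
        return "skin care products suitable for children"
    if age is not None and age >= 60:
        return "skin care products suitable for the elderly"
    if attrs.get("gender") == "male":
        return "skin care products suitable for men"
    # Without usable attributes, only the semantic information drives the result.
    return "skin care products"
```

The same utterance therefore yields different recommendations for different speakers, and falls back to a generic result when no attribute information is available.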

Based on the foregoing embodiments, Fig. 4 is a schematic diagram of a second speech recognition process according to some embodiments of the present application. As shown in Fig. 4, the method includes:

S401: collecting voice information and determining first text information corresponding to the voice information.

In the present application, the speech recognition method is applied to an intelligent device, which may be a PC, a mobile terminal, a smart speaker, a vehicle-mounted device, or the like, or may be an edge server or the like; the edge server may be a security gateway.

The intelligent device can collect voice information using existing techniques, which are not described in detail here. In a possible implementation manner, after collecting the voice information, the intelligent device can determine the corresponding first text information through a speech understanding model stored in the intelligent device. The method by which the intelligent device determines the first text information corresponding to the voice information is not limited and can be set flexibly according to requirements.

S402: acquiring a first voiceprint feature corresponding to the voice information, and determining first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information.

In a possible implementation manner, in order to improve accuracy of the determined recommended information, after the intelligent device collects the voice information, the first voiceprint feature corresponding to the voice information may be obtained through an existing voiceprint recognition mode. In order to determine the first user attribute information corresponding to the first voiceprint feature, the intelligent device may store in advance a correspondence between the voiceprint feature and the user attribute information, and after acquiring the first voiceprint feature corresponding to the voice information, determine the first user attribute information corresponding to the first voiceprint feature according to the correspondence between the prestored voiceprint feature and the user attribute information.

S403: sending the first text information and the first user attribute information to a server.

To improve the accuracy of the determined recommendation information, the smart device may send the first text information and the first user attribute information to the server.

In order to determine the recommended information corresponding to the voice information, in the present application, the server may determine the first semantic information corresponding to the first text information, and in particular, may determine the first semantic information corresponding to the first text information through a semantic understanding model stored in the server itself.

In order to improve accuracy of the determined recommendation information, in the application, the server itself may store a pre-trained recommendation model, and further may determine first recommendation information corresponding to the voice information through the pre-trained recommendation model. Specifically, the recommendation model may determine first recommendation information corresponding to the voice information based on the first semantic information and the first user attribute information.

S404: receiving first recommendation information sent by the server, and pushing and displaying corresponding content according to the first recommendation information.

In order to control the intelligent device to execute the first recommendation information, the server can send the first recommendation information to the intelligent device, and after the intelligent device receives the first recommendation information sent by the server, the intelligent device can push and display corresponding content according to the first recommendation information. Specifically, the intelligent device may push and display the content corresponding to the first recommendation information in the intelligent device, so as to respond to the voice information of the user.

Because the first recommendation information corresponding to the determined voice information is comprehensively determined based on the first semantic information and the first user attribute information, the accuracy of the first recommendation information corresponding to the determined voice information is higher, the actual requirements of users are met, and the user experience is improved.
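Steps S401 to S404 on the intelligent device can be sketched as a single function. This is a minimal outline under stated assumptions: the speech understanding model, the voiceprint extractor, and the server call are passed in as hypothetical callables, and the payload field names are invented for illustration.

```python
from typing import Callable


def handle_utterance(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    extract_voiceprint: Callable[[bytes], str],
    attribute_table: dict,
    send_to_server: Callable[[dict], str],
) -> str:
    """Run S401-S404 and return the first recommendation information."""
    # S401: determine the first text information from the collected voice information.
    text = speech_to_text(audio)
    # S402: obtain the voiceprint feature and look up the prestored attributes.
    voiceprint = extract_voiceprint(audio)
    attrs = attribute_table.get(voiceprint, {})
    # S403: send the text and attribute information to the server.
    recommendation = send_to_server({"text": text, "user_attributes": attrs})
    # S404: the caller pushes and displays content based on the returned value.
    return recommendation
```

Injecting the models as callables keeps the sketch independent of any particular speech or voiceprint library.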

In order to improve accuracy of the determined recommendation information, in the embodiments of the present application, before the first text information and the first user attribute information are sent to the server, the method further includes:

acquiring first acquisition time information of the voice information and first position information of the intelligent device;

the sending the first text information and the first user attribute information to a server includes:

sending a first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information to the server.

In one possible implementation, the intelligent device may record the first acquisition time information at which the voice information is collected, and may determine the first position information of the intelligent device itself using existing techniques.

In order to improve the accuracy of the determined recommendation information, in one possible implementation, the intelligent device may send the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information to the server. Through the pre-trained recommendation model stored in the server, the server can determine the first recommendation information corresponding to the voice information based on the first semantic information corresponding to the first text information, the first user attribute information, the first acquisition time information, and the first position information.

In order to ensure the security of the content of the first correspondence, in the embodiments of the present application, before the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information is sent to the server, the method further includes:

sending an assistance request to at least one other device with which a connection has been established in advance;

receiving third user attribute information and third position information sent by the other device;

for the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information, determining, in a differential privacy manner, the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information;

for the content of the first position information and the third position information, determining each piece of fourth position information in a differential privacy manner;

the sending the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information to the server includes:

sending, to the server: the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information, together with identification information of the first correspondence; a second correspondence among the first text information, the third user attribute information, the first acquisition time information, and the third position information, together with identification information of the second correspondence; and a third correspondence among the first text information, the fourth user attribute information, the first acquisition time information, and the fourth position information, together with identification information of the third correspondence.

In one possible implementation manner, when the server is a cloud server, in order to prevent the content of the first correspondence from being leaked during transmission between the intelligent device and the cloud server, the intelligent device may be provided with a security protection module. The security protection module may be disposed inside the intelligent device, or outside the intelligent device and connected to it through a network. For example, since the security protection module requires a certain amount of computing power, it may be disposed inside the intelligent device when the intelligent device has such computing power, as a vehicle-mounted device does. In addition, in a smart home scenario, the edge server has a certain amount of computing power, so the security protection module may also be disposed in the edge server.

In order to ensure the security of the content of the first correspondence, in one possible implementation manner, before the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information is sent to the server, the security protection module corresponding to the intelligent device may send an assistance request to at least one other device with which a connection has been established in advance. After receiving the assistance request, each other device can send its own third user attribute information and the third position information of its location to the security protection module corresponding to the intelligent device, which receives the third user attribute information and the third position information.

In order to ensure the security of the first user attribute information in the first correspondence, for the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information, a differential privacy manner is adopted to determine the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information.

The process of determining, in the differential privacy manner, the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information is described below, taking the sub-attribute information "gender of the user" as an example.

For convenience of description, the sub-attribute information "gender of the user" is denoted by $X_i$, the gender "male" is denoted by $a_1$, and the gender "female" is denoted by $a_2$. In the first user attribute information and the third user attribute information, the probability that the sub-attribute information content corresponding to the gender of the user is $a_1$ is $P_{real}$, and the probability that it is $a_2$ is $1-P_{real}$. When the sub-attribute information content corresponding to the gender of the user in each piece of fourth user attribute information is determined in the differential privacy manner, assume that the true sub-attribute information content is reported with probability $p$ and the other content is reported with probability $1-p$.

The probability that the sub-attribute information content corresponding to the gender of the user in the first user attribute information, the third user attribute information and the fourth user attribute information is $a_1$ is:

$$P(X_i = a_1) = P_{real}\,p + (1-P_{real})(1-p);$$

the probability that the sub-attribute information content corresponding to the gender of the user in the first user attribute information, the third user attribute information and the fourth user attribute information is $a_2$ is:

$$P(X_i = a_2) = (1-P_{real})\,p + P_{real}(1-p).$$

When the sub-attribute information content corresponding to the gender of the user in each piece of fourth user attribute information is determined in the differential privacy manner, in order to bring the probability of the sub-attribute information content of each piece of sub-attribute information in the first user attribute information, the third user attribute information and the fourth user attribute information closer to the true value ($P_{real}$) and improve the usability of the data, a likelihood function can be constructed from the statistical results to perform an unbiased estimation correction.

The likelihood function is:

$$L = \left[P_{real}\,p + (1-P_{real})(1-p)\right]^{n_1}\left[(1-P_{real})\,p + P_{real}(1-p)\right]^{n-n_1},$$

where $n$ is the total number of pieces of first user attribute information, third user attribute information, and fourth user attribute information, and $n_1$ is the number of those pieces in which the sub-attribute information content corresponding to the gender of the user is $a_1$.

After taking the logarithm of the likelihood function and setting its derivative to zero, the maximum likelihood estimate of the probability that the sub-attribute information content corresponding to the gender of the user in the first user attribute information, the third user attribute information and the fourth user attribute information is $a_1$ is:

$$\hat{P}_{real} = \frac{p-1}{2p-1} + \frac{n_1}{(2p-1)\,n},$$

where the value of $\hat{P}_{real}$ can be adjusted by adjusting the value of $p$. The specific value can be set flexibly according to requirements. In general, the closer $\hat{P}_{real}$ is to $P_{real}$, the more usable the data.

Specifically, the value of $p$ is associated with the differential privacy budget $\varepsilon$ and the confidence interval variable $\omega$; that is, the value of $p$ can be adjusted by adjusting $\varepsilon$ and $\omega$. In one possible embodiment, the relation among $\varepsilon$, $\omega$ and $p$ is:

[formula not reproduced in the source text]

Specifically, an existing differential privacy manner may be adopted to determine the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information, which is not described in detail here. For example, the most typical centralized differential privacy algorithm currently available, the Laplace mechanism, may be adopted to add fourth user attribute information that conforms to the Laplace distribution. The differential privacy manner is insensitive to a change in any single piece of information, so the privacy leakage risk that a piece of information incurs by being added to a data set is kept within a very small, acceptable range, and an attacker cannot obtain accurate individual information by observing the information content.
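The roles of $p$ and $P_{real}$ in the gender example can be illustrated with a randomized-response style perturbation, a common local differential privacy technique. This is a sketch under assumptions: it presumes that each true value is reported with probability $p$ and flipped otherwise, and it uses the estimator $\hat{P}_{real} = (p-1)/(2p-1) + n_1/((2p-1)n)$ obtained by maximizing the binomial likelihood of the reports. The function names are hypothetical, and this is not presented as the exact procedure of the embodiment.

```python
import random


def randomized_response(is_a1: bool, p: float, rng: random.Random) -> bool:
    """Report the true value with probability p, the opposite value with 1 - p."""
    return is_a1 if rng.random() < p else not is_a1


def estimate_p_real(reports, p: float) -> float:
    """Correct the observed share of a1 reports for the known flip probability."""
    if p == 0.5:
        raise ValueError("p = 0.5 carries no information about the true values")
    n = len(reports)
    n1 = sum(reports)  # number of reports equal to a1
    return (p - 1) / (2 * p - 1) + n1 / ((2 * p - 1) * n)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [True] * 6000 + [False] * 14000  # true share of a1 is 0.3
    reports = [randomized_response(t, 0.8, rng) for t in truth]
    print(estimate_p_real(reports, 0.8))
```

Values of `p` closer to 0.5 hide individual answers better but make the corrected estimate noisier, which is the trade-off between privacy and data usability described above.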

Because the differential privacy manner offers a rigorously provable privacy guarantee, it has become a commonly used method for perturbing position information. It can be understood that, when the differential privacy manner is applied to the content of the first position information and the third position information, at least one piece of the determined fourth position information has the same content as the first position information. Therefore, without any assumption about the attacker's background knowledge, the differential privacy manner makes the pieces of position information indistinguishable, so that even if an attacker intercepts the position information, the attacker cannot tell which is the real first position information or which first user attribute information corresponds to it. Specifically, to ensure the security of the first position information, each piece of fourth position information may be determined from the content of the first position information and the third position information in the differential privacy manner described above, which is not repeated here.
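One way to picture the generation of fourth position information is shown below. It is a rough sketch, not a calibrated privacy mechanism: it assumes positions are (latitude, longitude) pairs, keeps one candidate equal to the real first position as the text requires, and derives the others by adding Laplace-distributed noise to the third positions. The function names and the `scale` parameter are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def make_fourth_positions(first, thirds, scale, rng):
    """Return shuffled candidate positions, one of which equals `first`."""
    candidates = [first]  # at least one candidate matches the real position
    for lat, lon in thirds:
        candidates.append(
            (lat + laplace_noise(scale, rng), lon + laplace_noise(scale, rng))
        )
    rng.shuffle(candidates)  # ordering must not reveal which one is real
    return candidates
```

In a real deployment the noise scale would have to be derived from the privacy budget and the sensitivity of the position data, which this sketch does not attempt.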

In order to ensure the security of the content of the first correspondence, when the first correspondence among the first text information, the first user attribute information, the first acquisition time information, and the first position information is sent to the server with its identification information, the following may be sent to the server at the same time: the second correspondence among the first text information, the third user attribute information, the first acquisition time information, and the third position information, with its identification information; and the third correspondence among the first text information, the fourth user attribute information, the first acquisition time information, and the fourth position information, with its identification information.

Specifically, in one possible implementation manner, in order to ensure the security of the first text information and facilitate the server to recommend the first recommendation information for each corresponding relationship, the text information content in the second corresponding relationship and the third corresponding relationship may use the first text information. For the same reason, the first acquisition time information may also be used for the acquisition time information in the second correspondence and the third correspondence.

The first correspondence, the second correspondence, and the third correspondence are sent to the server together with their identification information. Even if this information is intercepted by an attacker on its way to the server, it contains multiple correspondences whose content has been protected in a differential privacy manner, so the attacker cannot determine which correspondence is the first correspondence, which ensures the security of the content of the first correspondence.
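The bundle of correspondences sent to the server can be sketched as a list of records that share the same text and acquisition time but differ in attribute and position information. The record layout, the field names, and the use of random UUIDs as identification information are assumptions made for illustration.

```python
import random
import uuid


def build_bundle(text, time, first, thirds, fourths, rng=None):
    """Build shuffled correspondence records and return them with the real one's ID.

    `first`, and each item of `thirds` and `fourths`, is a
    (user_attributes, position) pair. The device keeps `first_id` locally.
    """
    if rng is None:
        rng = random.Random()
    records = []
    first_id = None
    for index, (attrs, position) in enumerate([first, *thirds, *fourths]):
        record_id = uuid.uuid4().hex  # identification information of this record
        if index == 0:
            first_id = record_id
        records.append({
            "id": record_id,
            "text": text,  # every record reuses the first text information
            "time": time,  # and the first acquisition time information
            "user_attributes": attrs,
            "position": position,
        })
    rng.shuffle(records)  # position in the list must not mark the real record
    return records, first_id
```

Only the device knows which identifier belongs to the first correspondence, so it can later pick out the matching recommendation without the server learning which record was real.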

In order to respond to the voice information of the user, on the basis of the above embodiments, in the present application, the receiving the first recommendation information sent by the server includes:

receiving the first recommendation information, sent by the server, that is recommended for the correspondence identified by each piece of identification information;

the pushing and displaying the corresponding content according to the first recommendation information comprises the following steps:

and determining first recommendation information corresponding to the first corresponding relation according to the identification information of the first corresponding relation, and pushing and displaying corresponding content according to the first recommendation information.

In one possible implementation manner, after receiving the first correspondence, the second correspondence, and the third correspondence, the server recommends corresponding first recommendation information for each of the first correspondence, the second correspondence, and the third correspondence. After determining the first recommendation information corresponding to each corresponding relation, the server can send each first recommendation information to the intelligent device.

In one possible implementation manner, in order to prevent leakage of the first recommended information, the server may encrypt each piece of first recommended information with a preset encryption algorithm and send the encrypted piece of first recommended information to the intelligent device.

For convenience of description, the correspondences identified by all pieces of identification information, together with their corresponding first recommendation information, are denoted by Res'; the first recommendation information corresponding to the first correspondence is denoted by Res; and the first recommendation information corresponding to the second correspondence and the third correspondence is denoted by S. The server can encrypt Res' with a preset encryption algorithm and then send the encrypted Res' to the intelligent device.

In one possible implementation manner, in order to prevent leakage of the first recommendation information, the server may send Res', the key (cipher_key) corresponding to the encryption algorithm, the name of the company to which the server belongs (company), and the recommendation task (task) corresponding to the current voice information to the intelligent device. For example, the recommendation task may be the same as the content of the first text information: when the first text information is "watch cartoon", the recommendation task may be "watch cartoon". The task can be set flexibly according to requirements.

After receiving the encrypted Res', cipher_key, company, and task, the security protection module in the intelligent device can decrypt them with a preset decryption algorithm. After verifying that the company and task information is correct, it can regard Res' as safe and reliable, and then remove from Res' the first recommendation information S that does not correspond to the first correspondence, thereby obtaining Res. In a possible implementation manner, the first recommendation information (Res) corresponding to the first correspondence may be determined according to the identification information of the first correspondence, so that the intelligent device can push and display the corresponding content according to that first recommendation information, in response to the voice information of the user.

In the present application, even if the encrypted Res', cipher_key, company, and task are intercepted and decrypted by an attacker while the server is sending them to the intelligent device, the attacker cannot accurately locate the real first correspondence and its first recommendation information, because Res' contains the correspondences identified by multiple pieces of identification information together with their first recommendation information, and the content of those correspondences is protected in a differential privacy manner.
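The device-side check and filtering step can be sketched as follows. The sketch assumes that Res' has already been decrypted into a mapping from identification information to recommendation information; decryption itself is omitted because the text does not specify the algorithm. The function and parameter names are hypothetical.

```python
def extract_res(res_prime, first_id, company, task, expected_company, expected_task):
    """Verify company and task, then keep only the real correspondence's result.

    `res_prime` maps identification information to first recommendation
    information (assumed to be decrypted already).
    """
    if company != expected_company or task != expected_task:
        raise ValueError("company or task does not match; response rejected")
    if first_id not in res_prime:
        raise KeyError("no recommendation for the first correspondence")
    # Everything except the entry for first_id is the decoy set S and is dropped.
    return res_prime[first_id]
```

For example, with `res_prime = {"id-a": "cartoons for children", "id-b": "cartoons for adults"}` and `first_id = "id-a"`, the function returns only the first entry once the company and task checks pass.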

Fig. 5 is a schematic diagram of a third speech recognition process according to some embodiments of the present application. As shown in Fig. 5:

Firstly, the voice acquisition module in the intelligent device acquires the voice information of the user and determines the first text information (content) corresponding to the voice information. It acquires the first voiceprint feature corresponding to the voice information, and determines the first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information. It also acquires the first acquisition time information (time) of the voice information and the first position information (location) of the intelligent device.

The first user attribute information may include the user's age (age), gender (sex), nationality (address), income level (income_level), region (region), work unit (company), and the like. The first user attribute information may be regarded as a universally unique identifier (Universally Unique Identifier, UUID) corresponding to the user. To prevent leakage, the first user attribute information may be encoded to generate a corresponding user profile, so that its content is not exposed even if it is intercepted by an attacker.

It should be noted that encoding the UUID in this way only ensures that the data is transmitted securely to the server; it cannot prevent the server provider from divulging the first user attribute information. For example, large Internet server providers such as Apple and Amazon have employed temporary personnel for manual labeling in order to improve speech recognition, and some of those personnel have disclosed private data, such as user attribute information, that they heard. Therefore, in the present application, the following steps may be adopted to provide security protection at the intelligent device on the user side, so that the server provider can only provide services for the first, second and third correspondences as a whole and cannot accurately know which of them is the first correspondence.

Second, the security protection module in the intelligent device sends an assistance request to at least one other device with which a connection has been established in advance, and receives the third user attribute information and third location information sent by the other device. For the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information, it determines, in a differential privacy manner, the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information. For the content of the first location information and the third location information, it likewise determines each piece of fourth location information in a differential privacy manner.
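The application does not name a specific differential privacy mechanism at this step. One possible reading, sketched below under that assumption, uses k-ary randomized response over the values observed in the first and third user attribute information to produce each fourth sub-attribute, and Laplace noise to produce a fourth location. The function names, the `epsilon` and `sensitivity` parameters, and all attribute and coordinate values are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Location = Tuple[float, float]  # (latitude, longitude), an assumed representation


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def randomized_response(true_value: str, candidates: Sequence[str],
                        epsilon: float, rng: random.Random) -> str:
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise return another candidate uniformly."""
    others = [c for c in candidates if c != true_value]
    if not others:
        return true_value
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + len(others))
    return true_value if rng.random() < p_keep else rng.choice(others)


def make_fourth_attributes(first: Dict[str, str], thirds: List[Dict[str, str]],
                           epsilon: float, rng: random.Random) -> Dict[str, str]:
    """Build one piece of fourth user attribute information, sub-attribute by
    sub-attribute, from the values seen in the first and third attributes."""
    fourth = {}
    for key, value in first.items():
        pool = sorted({value, *(t[key] for t in thirds if key in t)})
        fourth[key] = randomized_response(value, pool, epsilon, rng)
    return fourth


def make_fourth_location(locations: Sequence[Location], sensitivity: float,
                         epsilon: float, rng: random.Random) -> Location:
    """Pick one of the first/third locations and add Laplace noise to it."""
    lat, lon = rng.choice(list(locations))
    scale = sensitivity / epsilon
    return lat + laplace_noise(scale, rng), lon + laplace_noise(scale, rng)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
first = {"age": "30-39", "sex": "female", "region": "Qingdao"}
thirds = [{"age": "20-29", "sex": "male", "region": "Jinan"}]
fourth = make_fourth_attributes(first, thirds, epsilon=1.0, rng=rng)
fourth_loc = make_fourth_location([(36.07, 120.38), (36.65, 117.12)],
                                  sensitivity=0.05, epsilon=1.0, rng=rng)
```

A smaller `epsilon` ties the fourth values less closely to the real ones. A practical design would also have to account for the privacy budget across repeated requests, which this sketch ignores.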

Third, the security protection module in the intelligent device sends the following to the server together: (1) the first correspondence of the first text information, the first user attribute information, the first acquisition time information and the first location information, with the identification information of the first correspondence; (2) the second correspondence of the first text information, the third user attribute information, the first acquisition time information and the third location information, with the identification information of the second correspondence; and (3) the third correspondence of the first text information, the fourth user attribute information, the first acquisition time information and the fourth location information, with the identification information of the third correspondence.

In one possible implementation, to keep the three correspondences and their identification information secure while they are sent to the server, a preset encryption algorithm may be used to encrypt the first, second and third correspondences together with their respective identification information before sending. After receiving the encrypted data, the server can decrypt it using a preset decryption algorithm to obtain the specific content of the first, second and third correspondences and their respective identification information.
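The three correspondences sent in the third step can be assembled as in the sketch below, which leaves the encryption out. The `Correspondence` class, the use of random UUIDs as identification information, and the shuffling of the records are assumptions for illustration; the application only requires that each correspondence carries its own identification information.

```python
import random
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple

Attributes = Dict[str, str]
Location = Tuple[float, float]


@dataclass
class Correspondence:
    ident: str                   # identification information of the correspondence
    content: str                 # first text information
    user_attributes: Attributes  # first, third or fourth user attribute information
    time: str                    # first acquisition time information
    location: Location           # first, third or fourth location information


def build_request(content: str, time: str,
                  first: Tuple[Attributes, Location],
                  third: Tuple[Attributes, Location],
                  fourth: Tuple[Attributes, Location],
                  rng: random.Random) -> Tuple[str, List[Correspondence]]:
    """Return the identifier of the first correspondence and all three records."""
    records = [
        Correspondence(uuid.uuid4().hex, content, attrs, time, loc)
        for attrs, loc in (first, third, fourth)
    ]
    first_id = records[0].ident  # remembered on the device to match the reply later
    rng.shuffle(records)         # assumption: order should not reveal the real entry
    return first_id, records


first_id, records = build_request(
    "I want to watch a film", "20:15",
    first=({"age": "30-39"}, (36.07, 120.38)),
    third=({"age": "20-29"}, (36.65, 117.12)),
    fourth=({"age": "20-29"}, (36.10, 120.30)),
    rng=random.Random(0),
)
```

All three records share the same text and acquisition time, so only the attribute and location fields differ between the real entry and the others.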

Fourth, the server determines corresponding first recommendation information for each corresponding relationship.

The server may include a semantic recognition server, where a semantic understanding model and a recommendation model are stored in the semantic recognition server, and content of the corresponding first recommendation information may be determined according to semantic information, user attribute information, acquisition time information, and location information in each corresponding relationship.
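The per-correspondence processing on the semantic recognition server can be sketched as a loop that applies the semantic understanding model and then the recommendation model to every record, keyed by its identification information. Both models are replaced here by toy stand-in functions, and the dictionary keys are assumed names; the real models are pre-trained and are not described by this sketch.

```python
from typing import Any, Callable, Dict, List


def recommend_for_each(correspondences: List[Dict[str, Any]],
                       understand: Callable[[str], str],
                       recommend: Callable[[str, Dict[str, str], str, Any], str]
                       ) -> Dict[str, str]:
    """Map each correspondence's identification information to its recommendation."""
    results = {}
    for c in correspondences:
        semantic = understand(c["content"])
        results[c["id"]] = recommend(semantic, c["user_attributes"],
                                     c["time"], c["location"])
    return results


def toy_understand(text: str) -> str:
    """Stand-in for the semantic understanding model."""
    return "play_video" if "watch" in text.lower() else "unknown"


def toy_recommend(semantic: str, attrs: Dict[str, str],
                  time: str, location: Any) -> str:
    """Stand-in for the recommendation model; ignores time and location."""
    if semantic != "play_video":
        return "no recommendation"
    return "cartoon" if attrs.get("age") == "child" else "news"


batch = [
    {"id": "a", "content": "I want to watch something",
     "user_attributes": {"age": "child"}, "time": "20:15", "location": (36.07, 120.38)},
    {"id": "b", "content": "I want to watch something",
     "user_attributes": {"age": "adult"}, "time": "20:15", "location": (36.65, 117.12)},
]
results = recommend_for_each(batch, toy_understand, toy_recommend)
```

The server treats every correspondence identically, which is why it cannot tell which identifier belongs to the first correspondence.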

In one possible implementation, the first recommendation information may include a link corresponding to its content. To obtain this link, the server may further include a storage server. After determining the first recommendation information, the semantic recognition server sends it to the storage server; the storage server determines the link corresponding to the content of the first recommendation information and sends the first recommendation information, now including the link, back to the semantic recognition server.

Specifically, in order to ensure the security in the process of sending the first recommendation information to the storage server, a preset encryption algorithm may be adopted to encrypt the first recommendation information and send the encrypted first recommendation information to the storage server. After receiving the encrypted first recommendation information, the storage server may decrypt the first recommendation information by using a preset decryption algorithm, so as to obtain the content of the first recommendation information.
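The link lookup performed by the storage server can be sketched as follows, with encryption omitted. The catalogue contents, the placeholder URLs and the function name `attach_link` are hypothetical and serve only to show the round trip between the two servers.

```python
from typing import Dict, Optional

# Hypothetical catalogue held by the storage server: content title -> link
CATALOGUE: Dict[str, str] = {
    "cartoon": "https://media.example.invalid/cartoon",
    "news": "https://media.example.invalid/news",
}


def attach_link(recommendation: Dict[str, Optional[str]],
                catalogue: Dict[str, str] = CATALOGUE) -> Dict[str, Optional[str]]:
    """Return a copy of the recommendation with its link filled in (None if unknown)."""
    enriched = dict(recommendation)
    enriched["link"] = catalogue.get(recommendation["content"])
    return enriched


# The semantic recognition server sends this; the storage server returns it enriched.
with_link = attach_link({"content": "news"})
```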

Fifth, the server sends each piece of first recommendation information to a security protection module in the intelligent device.

Finally, the security protection module in the intelligent device determines the first recommendation information corresponding to the first correspondence according to the identification information of the first correspondence.

Based on the same technical concept, the present application also provides a voice recognition device, which can implement the procedure executed by the server in the foregoing embodiment.

Fig. 6 is a schematic structural diagram of a voice recognition device according to some embodiments of the present application. As shown in fig. 6, on the basis of the foregoing embodiments, the voice recognition device provided in the present application includes:

the acquiring module 601 is configured to acquire first text information corresponding to voice information to be identified acquired by an intelligent device, and first user attribute information corresponding to voiceprint features of the voice information;

a first determining module 602, configured to determine first semantic information corresponding to the first text information; and determining corresponding first recommendation information through a pre-trained recommendation model based on the first semantic information and the first user attribute information, and controlling the intelligent equipment to execute the first recommendation information.

In some embodiments, the first determining module 602 is further configured to obtain, by using the server, first collection time information of the voice information and first location information where the intelligent device is located before determining, by using a pre-trained recommendation model, the corresponding first recommendation information based on the first semantic information and the first user attribute information; and determining corresponding first recommendation information through the recommendation model based on the first semantic information, the first user attribute information, the first acquisition time information and the first position information.

In some embodiments, the process of training the recommendation model includes:

In some embodiments, the acquiring module 601 is specifically configured to receive the voice information sent by the intelligent device, and to determine the first text information corresponding to the voice information.

In some embodiments, the acquiring module 601 is specifically configured to obtain, from the received voice information sent by the intelligent device, a first voiceprint feature corresponding to the voice information, and to determine the first user attribute information corresponding to the first voiceprint feature according to the prestored correspondence between voiceprint features and user attribute information.
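The lookup from a voiceprint feature to user attribute information can be sketched as a nearest-match search over the prestored correspondences. This passage does not state the matching rule, so cosine similarity with a threshold is assumed here; the function names, the threshold value and the feature vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_user_attributes(voiceprint: Sequence[float],
                           enrolled: List[Tuple[Sequence[float], Dict[str, str]]],
                           threshold: float = 0.8) -> Optional[Dict[str, str]]:
    """Return the attributes of the best match at or above the threshold, else None."""
    best_attrs, best_score = None, threshold
    for stored_feature, attrs in enrolled:
        score = cosine_similarity(voiceprint, stored_feature)
        if score >= best_score:
            best_attrs, best_score = attrs, score
    return best_attrs


# Prestored correspondence between voiceprint features and user attribute information
enrolled = [
    ([1.0, 0.0, 0.0], {"age": "adult", "sex": "female"}),
    ([0.0, 1.0, 0.0], {"age": "child", "sex": "male"}),
]
matched = lookup_user_attributes([0.9, 0.1, 0.0], enrolled)
unknown = lookup_user_attributes([0.0, 0.0, 1.0], enrolled)
```

Returning `None` for an unmatched speaker lets the caller fall back to a recommendation that does not depend on user attributes.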

For the concepts, explanations, detailed descriptions and other steps of the technical solutions related to this voice recognition device, refer to the foregoing method or the descriptions of the other embodiments; they are not repeated here.

Based on the same technical concept, the present application also provides a voice recognition device, which can implement the flow executed by the intelligent device in the foregoing embodiment.

Fig. 7 is a schematic structural diagram of another voice recognition device according to some embodiments of the present application. As shown in fig. 7, on the basis of the foregoing embodiments, the device is applied to an intelligent device and includes:

the acquisition module 701 is configured to acquire voice information and determine first text information corresponding to the voice information;

A second determining module 702, configured to obtain a first voiceprint feature corresponding to the voice information, and determine first user attribute information corresponding to the first voiceprint feature according to a correspondence between a prestored voiceprint feature and user attribute information;

a first sending module 703, configured to send the first text information and the first user attribute information to a server;

and the receiving module 704 is configured to receive the first recommendation information sent by the server, and push and display corresponding content according to the first recommendation information.

In certain embodiments, the apparatus further comprises:

the second sending module is used for acquiring first acquisition time information of the voice information and first position information of the intelligent equipment before sending the first text information and the first user attribute information to a server; and sending the first text information, the first user attribute information, the first acquisition time information and the first corresponding relation of the first position information to the server.

In certain embodiments, the apparatus further comprises:

the disturbance module is configured to, before the first correspondence of the first text information, the first user attribute information, the first acquisition time information and the first position information is sent to the server: send an assistance request to at least one other device with which a connection has been established in advance; receive third user attribute information and third position information sent by the other device; for the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information, determine, in a differential privacy manner, the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information; determine each piece of fourth position information in a differential privacy manner for the content of the first position information and the third position information; and send to the server the first correspondence of the first text information, the first user attribute information, the first acquisition time information and the first position information together with the identification information of the first correspondence, the second correspondence of the first text information, the third user attribute information, the first acquisition time information and the third position information together with the identification information of the second correspondence, and the third correspondence of the first text information, the fourth user attribute information, the first acquisition time information and the fourth position information together with the identification information of the third correspondence.

In some embodiments, the receiving module 704 is specifically configured to receive first recommendation information that is sent by the server and recommended for a correspondence relationship of each piece of identification information; and determining first recommendation information corresponding to the first corresponding relation according to the identification information of the first corresponding relation, and pushing and displaying corresponding content according to the first recommendation information.

Fig. 8 is a schematic structural diagram of an electronic device according to some embodiments of the present application, and on the basis of the foregoing embodiments, the present application further provides an electronic device, as shown in fig. 8, including: a processor 801, a communication interface 802, a memory 803, and a communication bus 804, wherein the processor 801, the communication interface 802, and the memory 803 complete communication with each other through the communication bus 804;

the memory 803 stores a computer program which, when executed by the processor 801, causes the processor 801 to perform the steps of the above method in which the server performs the corresponding functions.

The communication bus of the above electronic device may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface 802 is used for communication between the electronic device and other devices described above.

The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.

On the basis of the above embodiments, the present application provides a computer-readable storage medium storing a computer program executable by an electronic device; when executed, the computer program causes the electronic device to perform the procedures of the foregoing method.

The computer-readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CD, DVD, BD and HVD, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state disks (SSD).

Fig. 9 is a schematic structural diagram of an electronic device according to some embodiments of the present application, and on the basis of the foregoing embodiments, the present application further provides an electronic device, as shown in fig. 9, including: processor 901, communication interface 902, memory 903 and communication bus 904, wherein processor 901, communication interface 902, memory 903 accomplish the communication between each other through communication bus 904;

the memory 903 stores a computer program which, when executed by the processor 901, causes the processor 901 to perform the steps of the above method in which the smart device performs the corresponding functions.

The communication interface 902 is used for communication between the electronic device and other devices.

Fig. 10 is a schematic structural diagram of a speech recognition system according to some embodiments of the present application, where, based on the foregoing embodiments, the present application provides a speech recognition system, including: the voice recognition apparatus of any one of the above embodiments applied to the server 1000, and the voice recognition apparatus applied to the smart device 2000 according to any one of the above embodiments.

The server 1000 is configured to obtain first text information corresponding to voice information to be identified collected by the intelligent device 2000, and first user attribute information corresponding to voiceprint features of the voice information;

The server 1000 determines first semantic information corresponding to the first text information; and based on the first semantic information and the first user attribute information, determining corresponding first recommendation information through a pre-trained recommendation model, and controlling the intelligent device 2000 to execute the first recommendation information.

The specific functions of the server 1000 are described above, and are not described herein.

The intelligent device 2000 is an intelligent device in the prior art that may collect voice information, receive first recommendation information sent by the server 1000, and push and display corresponding content according to the first recommendation information, which is not described herein again.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A method of speech recognition, the method comprising:

acquiring first text information, first user attribute information, first acquisition time information, first corresponding relation of first position information and identification information of the first corresponding relation, second corresponding relation of first acquisition time information, third position information and identification information of the second corresponding relation, first text information, fourth user attribute information, first acquisition time information, third corresponding relation of fourth position information and identification information of the third corresponding relation, which are sent by intelligent equipment; the first text information comprises text information corresponding to voice information to be identified, which is acquired by the intelligent equipment; the first user attribute information comprises user attribute information corresponding to voiceprint features of the voice information; the first acquisition time information comprises the time when the voice information is acquired; the first position information comprises position information of the intelligent equipment; the third user attribute information and the third position information are sent to the intelligent device by the other device after the intelligent device sends an assistance request to at least one other device which establishes connection in advance; the fourth user attribute information is obtained by the intelligent device after determining sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information by adopting a differential privacy mode aiming at the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information; the fourth location information is obtained by the intelligent device in a differential privacy mode aiming at the content of the first location information and the third location information;

Determining first semantic information corresponding to the first text information; and aiming at each corresponding relation, the corresponding relation comprises text information, user attribute information, acquisition time information and position information, corresponding first recommendation information is determined through a pre-trained recommendation model based on the first semantic information and the first user attribute information, and the intelligent equipment is controlled to execute the first recommendation information corresponding to the first corresponding relation.

2. The method of claim 1, wherein prior to determining the corresponding first recommendation information by pre-training a completed recommendation model based on the first semantic information and the first user attribute information, the method further comprises:

3. The method of claim 2, wherein training the recommendation model comprises:

4. The method of claim 1, wherein the obtaining the first text information corresponding to the voice information to be recognized, which is collected by the intelligent device, includes:

receiving voice information sent by the intelligent equipment;

5. The method of claim 4, wherein obtaining first user attribute information corresponding to voiceprint features of the voice information comprises:

6. The method of claim 1, wherein the first user attribute information comprises: at least one of age of the user, sex of the user, nationality of the user, income level of the user, region of the user, work unit of the user, family member constitution of the user, and time of preparation of the user to fall asleep.

7. A method of speech recognition, the method comprising:

collecting voice information and determining first text information corresponding to the voice information; acquiring first voiceprint features corresponding to the voice information, and determining first user attribute information corresponding to the first voiceprint features according to the corresponding relation between the prestored voiceprint features and the user attribute information; acquiring first acquisition time information of the voice information and first position information of the intelligent equipment;

aiming at the sub-attribute information content corresponding to each piece of sub-attribute information contained in the first user attribute information and the third user attribute information, determining the sub-attribute information content corresponding to each piece of sub-attribute information contained in each piece of fourth user attribute information in a differential privacy mode; determining each fourth position information by adopting a differential privacy mode aiming at the contents of the first position information and the third position information;

Transmitting the first text information, the first user attribute information, the first acquisition time information, the first corresponding relation of the first position information and the identification information of the first corresponding relation, the first text information, the third user attribute information, the second corresponding relation of the first acquisition time information and the third position information and the identification information of the second corresponding relation, and the third text information, the fourth user attribute information, the first acquisition time information, the third corresponding relation of the fourth position information and the identification information of the third corresponding relation to the server;

and receiving first recommendation information sent by the server for each corresponding relation, and pushing and displaying corresponding content according to the first recommendation information corresponding to the first corresponding relation.

8. The method of claim 7, wherein the receiving the first recommendation information sent by the server for each correspondence comprises:

the pushing and displaying the corresponding content according to the first recommendation information corresponding to the first corresponding relation includes: