CN112948662A - Recommendation method and device and recommendation device

Info

Publication number: CN112948662A
Application number: CN201911260344.5A
Authority: CN
Inventors: 叶祺; 薛静静; 李正宇
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2021-06-11

Abstract

An embodiment of the invention provides a recommendation method, a recommendation apparatus, and an apparatus for recommending. The method comprises: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and the region to which the user belongs; and determining recommendation information according to the text information and the personalized tag. The embodiment of the invention can improve the accuracy of the recommendation information.

Description

Recommendation method and device and recommendation device

Technical Field

The invention relates to the technical field of computers, and in particular to a recommendation method, a recommendation apparatus, and an apparatus for recommending.

Background

With the continuous development of speech recognition technology, the application of speech recognition based recommendation is more and more extensive. The speech recognition technology is a technology which takes speech as a research object, enables a machine to automatically recognize and understand human spoken language through signal processing and recognition technology, and converts a speech signal into a corresponding text or command.

A recommendation system uses voice recognition to convert voice data input by a user into text, and recommends related information to the user according to the content of that text. However, each sentence entered by the user is typically a short text. Short texts are brief, sparse in features, irregular in form, and highly ambiguous, so recommendation information obtained from the recognized short text alone may be inaccurate and may not match the user's intention.

Disclosure of Invention

The embodiment of the invention provides a recommendation method, a recommendation apparatus, and an apparatus for recommending, which can improve the accuracy of recommendation information.

In order to solve the above problem, an embodiment of the present invention discloses a recommendation method, where the method includes:

acquiring voice information input by a user;

performing voice recognition on the voice information to obtain text information corresponding to the voice information;

performing feature extraction on the voice information to obtain voice features corresponding to the voice information;

determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and the region to which the user belongs;

and determining recommendation information according to the text information and the personalized tag.

In another aspect, an embodiment of the present invention discloses a recommendation apparatus, where the apparatus includes:

the voice acquisition module is used for acquiring voice information input by a user;

the voice recognition module is used for carrying out voice recognition on the voice information to obtain text information corresponding to the voice information;

the feature extraction module is used for extracting features of the voice information to obtain voice features corresponding to the voice information;

a tag determination module, configured to determine, according to the voice features, a personalized tag of the voice information, where the personalized tag includes at least one of: gender, mood, health condition, and the region to which the user belongs;

and the information recommendation module is used for determining recommendation information according to the text information and the personalized tag.

In yet another aspect, an embodiment of the present invention discloses an apparatus for recommending, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs including instructions for:

acquiring voice information input by a user;

In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform one or more of the recommendation methods described above.

The embodiment of the invention has the following advantages:

After voice information input by a user is acquired, voice recognition is performed on the voice information to obtain the corresponding text information; feature extraction is performed on the voice information to obtain the corresponding voice features, and the personalized tag of the voice information is determined according to the voice features; recommendation information is then determined according to the text information and the personalized tag. The personalized tag represents information about the user's own state that is carried by the voice features but cannot be expressed in the text information. For example, the personalized tag may include at least one of the following: gender, mood, health condition, and the region to which the user belongs. By combining the personalized tag of the voice information with the text information converted from it, the embodiment of the invention determines recommendation information that better matches the user's own state and therefore the user's intention, which improves the accuracy of the recommendation information.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of the steps of one preferred method embodiment of the present invention;

FIG. 2 is a block diagram of a recommendation apparatus embodiment of the present invention;

FIG. 3 is a block diagram of an apparatus 800 for recommending according to the present invention; and

FIG. 4 is a schematic diagram of a server in some embodiments of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Method embodiment

Referring to FIG. 1, a flowchart of the steps of an embodiment of a recommendation method of the present invention is shown. The method may include the following steps:

step 101, acquiring voice information input by a user;

step 102, performing voice recognition on the voice information to obtain text information corresponding to the voice information;

step 103, performing feature extraction on the voice information to obtain voice features corresponding to the voice information;

step 104, determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and the region to which the user belongs;

step 105, determining recommendation information according to the text information and the personalized tag.
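The flow of steps 101 to 105 can be sketched as a small pipeline. This is only an illustrative outline, not the implementation of the embodiment: the `recommend` function, the `Recommendation` container, and the four callables passed in are hypothetical names, and the lambdas in the usage lines are stand-ins for a real speech recognizer, feature extractor, classifier, and recommender.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Recommendation:
    text: str              # text information from voice recognition (step 102)
    tags: Dict[str, str]   # personalized tags derived from voice features (step 104)
    items: List[str]       # recommendation information (step 105)


def recommend(
    audio: List[float],
    recognize: Callable[[List[float]], str],
    extract_features: Callable[[List[float]], List[float]],
    classify: Callable[[List[float]], Dict[str, str]],
    select: Callable[[str, Dict[str, str]], List[str]],
) -> Recommendation:
    """Run steps 102-105 on already acquired voice information (step 101)."""
    text = recognize(audio)              # step 102: speech -> text
    features = extract_features(audio)   # step 103: speech -> voice features
    tags = classify(features)            # step 104: voice features -> personalized tags
    items = select(text, tags)           # step 105: text + tags -> recommendations
    return Recommendation(text, tags, items)


# Usage with placeholder components (all hypothetical stand-ins).
result = recommend(
    audio=[0.0, 0.1, -0.1],
    recognize=lambda a: "my throat hurts",
    extract_features=lambda a: [sum(abs(x) for x in a) / len(a)],
    classify=lambda f: {"health condition": "cold"},
    select=lambda text, tags: [f"{tags['health condition']} care information"],
)
print(result.items)
```

Passing the components in as callables keeps the sketch independent of any particular recognition or classification library.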

The recommendation method of the embodiment of the invention can be applied to electronic devices, including but not limited to: a server, a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, an in-vehicle computer, a desktop computer, a set-top box, a smart TV, a wearable device, and so on.

The recommendation method provided by the embodiment of the invention can be used for providing recommendation information which is more in line with the user intention for the user according to the voice information input by the user. The speech information refers to a continuous piece of speech, such as a sentence, a piece of speech, etc. It is to be understood that the source of the voice information is not limited by the embodiment of the present invention, for example, the voice information may be a voice fragment collected in real time through a recording function of the electronic device.

In an optional embodiment of the present invention, acquiring the voice information input by the user may specifically include: acquiring voice information that the user inputs, sends, or receives through an instant messaging application.

The instant messaging application is an application program for realizing online chatting and exchanging through an instant messaging technology. The voice information acquired by the embodiment of the invention can comprise: the voice information input by the user through the instant messaging application, the voice information sent by the user to the communication opposite terminal through the instant messaging application and the voice information received by the user from the communication opposite terminal through the instant messaging application. Therefore, the embodiment of the invention can recommend the recommendation information which is in accordance with the intention of the user to the user in real time according to the chat content of the user.

After the voice information input by the user is acquired, it can be preprocessed to reduce the effects of aliasing, higher-harmonic distortion, high-frequency effects, and other factors introduced by the vocal organs and the acquisition device, thereby improving the quality of the voice information. The preprocessing may include pre-emphasis, framing, windowing, and the like.
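As a rough illustration of the three preprocessing operations named above, the following sketch applies pre-emphasis, framing, and a Hamming window using only the Python standard library. The coefficient 0.97, the frame length of 400 samples, and the hop of 160 samples (25 ms and 10 ms at 16 kHz) are common choices assumed here; the embodiment does not specify them, and the function names are hypothetical.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high-frequency content."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming_window(frame: List[float]) -> List[float]:
    """Taper the frame edges with a Hamming window to reduce spectral leakage."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def preprocess(signal: List[float], frame_len: int = 400, hop: int = 160) -> List[List[float]]:
    """Pre-emphasis, then framing, then windowing of each frame."""
    emphasized = pre_emphasis(signal)
    return [hamming_window(f) for f in frame_signal(emphasized, frame_len, hop)]
```

Each returned frame can then be passed to feature extraction frame by frame.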

After the voice information is preprocessed, voice recognition can be performed on the preprocessed voice information to obtain the corresponding text information. To improve the accuracy of the recommendation information, the embodiment of the invention can, in addition to obtaining the text information, perform feature extraction on the preprocessed voice information frame by frame. This reduces the influence of information irrelevant to recognition, reduces the amount of data to be processed in the subsequent recognition stage, and produces voice features that represent the speaker information in the voice information. The voice features may include MFCC (Mel-Frequency Cepstral Coefficients), LPCC (Linear Prediction Cepstral Coefficients), filter bank features, and the like.

The extracted voice features carry important information about the user's own state that cannot be expressed by text information, such as the user's gender, mood, and physical condition; the embodiment of the present invention calls this information personalized tags. In the embodiment of the present invention, the personalized tag of the voice information may be determined according to the voice features and expresses the information about the user's own state contained in those features. The personalized tag may include at least one of the following: gender, mood, health condition, and the region to which the user belongs.

In an application example of the present invention, assume that the voice information input by the user is "I feel uncomfortable in my throat, cough …". The embodiment of the invention can perform voice recognition on the voice information to obtain the corresponding text information, perform feature extraction on the voice information to obtain the corresponding voice features, and determine the personalized tag of the voice information according to the voice features. Because the voice information contains heavy coughing, the personalized tag can be determined to be "cold" from the extracted voice features. The recommendation information can then be determined from the text information obtained by voice recognition, "I feel uncomfortable in my throat, cough …", and the personalized tag "cold".

For example, the text information indicates that the user has throat discomfort, and the personalized tag indicates that the user's current state is "cold". Therefore, medical information, drug information, and the like for treating a cold and a sore throat can be acquired as recommendation information and recommended to the user.

The embodiment of the invention determines the recommendation information by combining the personalized label of the voice information on the basis of the text information converted from the voice information. The personalized tag is determined according to the voice features extracted from the voice information, and can express important information of the user state which cannot be expressed by the text information, so that the recommendation information obtained by combining the personalized tag is more in line with the user state, further more in line with the user intention, and the accuracy of the recommendation information can be improved.

In an optional embodiment of the present invention, determining the personalized tag of the voice information according to the voice features in step 104 may specifically include:

inputting the voice features into a classification model so as to output personalized labels corresponding to the voice features through the classification model; and the classification model is obtained by training according to the voice sample data marked with the personalized label.

The embodiment of the invention can pre-train the classification model for identifying the personalized label, and input the voice characteristics extracted from the voice information into the classification model, so that the personalized label corresponding to the voice information can be output. The classification model is combined with a classification algorithm to analyze and process the voice characteristics to obtain a classification result. It is to be understood that the type of the classification algorithm is not limited by the embodiments of the present invention. The classification algorithm may include a support vector machine, a multi-layer perceptron, a recurrent neural network, a convolutional neural network, and probabilistic linear discriminant analysis, among others.

In an optional embodiment of the present invention, before the inputting the speech feature into the classification model, the method may further include:

step S11, acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;

step S12, extracting the characteristics of the voice sample data to obtain the voice sample characteristics corresponding to the voice sample data;

step S13, inputting the voice sample characteristics into an initial classification model for training;

and step S14, in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model is matched with the personalized label labeled by the voice sample data to obtain the trained classification model.
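Steps S11 to S14 can be illustrated with a very small trainable model. The sketch below uses a multi-class perceptron, chosen here only because its update rule literally adjusts parameters until the output tag matches the labeled tag; the embodiment does not prescribe this algorithm. The two-dimensional feature vectors and the tag names "cold" and "healthy" are made-up sample data.

```python
from typing import Dict, List, Tuple

# (voice sample features, labeled personalized tag); S11 and S12 are assumed done.
Sample = Tuple[List[float], str]


def predict(weights: Dict[str, List[float]], x: List[float]) -> str:
    """Return the tag whose weight vector scores highest (ties go to the first tag alphabetically)."""
    xb = x + [1.0]  # append a constant bias input
    return max(sorted(weights), key=lambda tag: sum(w * v for w, v in zip(weights[tag], xb)))


def train(samples: List[Sample], max_epochs: int = 100) -> Dict[str, List[float]]:
    dim = len(samples[0][0]) + 1
    weights = {tag: [0.0] * dim for _, tag in samples}  # S13: initial classification model
    for _ in range(max_epochs):
        errors = 0
        for x, tag in samples:
            guess = predict(weights, x)
            if guess != tag:  # S14: adjust parameters when output and label disagree
                errors += 1
                xb = x + [1.0]
                weights[tag] = [w + v for w, v in zip(weights[tag], xb)]
                weights[guess] = [w - v for w, v in zip(weights[guess], xb)]
        if errors == 0:  # every output tag matches its labeled tag
            break
    return weights


samples: List[Sample] = [
    ([0.9, 0.1], "cold"),
    ([0.8, 0.2], "cold"),
    ([0.1, 0.9], "healthy"),
    ([0.2, 0.7], "healthy"),
]
model = train(samples)
print(predict(model, [0.85, 0.15]))
```

A production system would more likely use one of the classifiers the description lists, such as a support vector machine or a neural network, trained on real voice features.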

The embodiment of the invention can collect a large amount of voice sample data, and label the corresponding personalized label for the voice sample data for training the classification model.

It should be noted that the embodiment of the present invention may group the voice sample data by the category of the labeled personalized tag and train a separate classification model for each category, such as the user's mood, health condition, and gender. For example, the mood classification model can output the user's mood, such as happy, anxious, or excited; the health condition classification model can output the user's health condition, such as cold, fever, or headache; the gender classification model can output the user's gender, such as male or female; and an age classification model can output the user's age bracket, such as elderly, middle-aged, young, or child. In application, the voice features can be input into the classification model of each personalized tag category to obtain the classification result of each model. Alternatively, a single general classification model may be trained on the voice sample data. For example, if the voice sample data used for training is labeled with two personalized tags, health condition and gender, the trained classification model can output two classification results, one for health condition and one for gender.

After voice sample data is collected, feature extraction can be performed on the voice sample data to obtain voice sample features corresponding to the voice sample data; and then inputting the characteristics of the voice samples into an initial classification model for training, and in the training process, adjusting the model parameters of the initial classification model until the personalized labels output by the initial classification model are matched with the personalized labels labeled by the voice sample data to obtain the trained classification model.

It is understood that the specific type of the classification model is not limited by the embodiment of the present invention; the classification model may include a DNN (Deep Neural Network). The deep neural network may fuse a variety of neural networks, including but not limited to at least one of the following, or a combination, superposition, or nesting of at least two of them: CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory) network, RNN (Recurrent Neural Network), attention neural network, and the like.

In an optional embodiment of the present invention, the step 105 of determining recommendation information according to the text information and the personalized tag may specifically include:

step S21, according to the entity words in the text information, candidate information is obtained through retrieval;

and step S22, determining recommendation information according to the personalized tag and the candidate information.

The entity words are nouns or pronouns with specific meanings in the text, and the entity words can include names of people, places, organizations, proper nouns and the like. For example, entity words may include: home appliance names, garment names, brand names of goods, drug names, service provision names (e.g., car wash, home appliance maintenance, home information, etc.), ticket names, disease names, etc.

The embodiment of the invention can identify the entity words in the text information, submit the identified entity words to a search engine as query words, and retrieve the candidate information. For example, if the text information is recognized to include the entity word "movie", a search can be performed with that entity word as the query, and the search results, such as information on recently released movies and on discounted movie tickets, can be used as candidate information.

Then, the candidate information may be screened according to the personalized tag to determine the recommendation information. The personalized tag may include at least one of: gender, mood, health condition, and the region to which the user belongs. For example, with a gender tag, sports news and similar information can be recommended to male users and cosmetics information to female users. With a region tag, information on goods or services (such as real estate, restaurants, and the like) can be recommended specifically to users in a given region.

Continuing the example above in which the recognized text information includes the entity word "movie", suppose the personalized tag for the user's mood is determined to be "happy" from the voice information input by the user, indicating that the user is in a good emotional state. Movies suited to a good mood can then be selected from the candidate information retrieved in step S21 as the recommendation information.
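The retrieve-then-screen flow of steps S21 and S22 can be sketched as follows. The `CATALOG` list is a hypothetical in-memory stand-in for a search engine, and the `suits` field, which records the tag values a candidate is suitable for, is an assumption made for this illustration rather than a structure defined by the embodiment.

```python
from typing import Dict, List

# Hypothetical in-memory catalog standing in for a real search engine index.
CATALOG: List[dict] = [
    {"title": "Feel-good comedy now showing", "entities": {"movie"},
     "suits": {"mood": {"happy"}}},
    {"title": "Quiet drama for a reflective evening", "entities": {"movie"},
     "suits": {"mood": {"sad", "anxious"}}},
    {"title": "Discount movie tickets", "entities": {"movie", "ticket"},
     "suits": {}},
    {"title": "Cough syrup guide", "entities": {"cough"},
     "suits": {"health condition": {"cold"}}},
]


def retrieve(entity_words: List[str], catalog: List[dict] = CATALOG) -> List[dict]:
    """Step S21: keep catalog entries that mention at least one entity word."""
    wanted = set(entity_words)
    return [doc for doc in catalog if doc["entities"] & wanted]


def screen(candidates: List[dict], tags: Dict[str, str]) -> List[str]:
    """Step S22: drop candidates whose suitability conflicts with the user's tags."""
    def compatible(doc: dict) -> bool:
        # A candidate with no constraint for a tag category is always kept.
        return all(tags[cat] in allowed
                   for cat, allowed in doc["suits"].items() if cat in tags)
    return [doc["title"] for doc in candidates if compatible(doc)]


candidates = retrieve(["movie"])
print(screen(candidates, {"mood": "happy"}))
```

In this sketch a "happy" mood tag removes the drama and keeps the comedy and the ticket offer, which carries no mood constraint.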

In another application example of the present invention, the personalized tag may also indicate a region to which the user belongs. For example, the embodiment of the invention can determine the accent of the user according to the voice characteristics in the voice information input by the user, and further can determine the region to which the user belongs. For example, according to the accent features in the voice of the user, the region to which the user belongs is determined to be Shanghai, and then recommendation information conforming to the habit of Shanghai people can be recommended to the user. For example, if the voice information input by the user contains entity words related to food, food information more conforming to the taste of the Shanghai people can be used as recommendation information in combination with the personalized tag of the user.

In summary, the voice information input by the user is converted into text information, the voice features in the voice information are extracted, and a personalized tag representing the current state of the user is determined according to the voice features. The recommendation information is then determined by combining the text information and the personalized tag, so that it matches both what the user said and the user's current state, which can improve the accuracy of information recommendation.

In an optional embodiment of the present invention, after obtaining the text information corresponding to the voice information in step 102, the method may further include: and performing intention recognition on the text information, and determining the recommendation intention of the user.

In order to further improve the accuracy of information recommendation and enable the recommended information to better accord with the intention of the user, the embodiment of the invention can perform intention recognition on the text information after performing voice recognition on the voice information input by the user and converting the voice information into the text information, and determine the recommendation intention of the user. Specifically, word segmentation processing may be performed on the text information to obtain a keyword in the text information, and according to the keyword, a recommendation intention conforming to the user is determined.

For example, voice information input by a user is converted into the following text information: "I feel uncomfortable in my throat, cough …". Segmenting the text information yields the following keywords: "I", "throat", "uncomfortable". According to the keywords, the recommendation intention of the user can be determined to relate to medical health. As another example, the text information "Do you want to eat hot pot this evening?" includes the keywords "this evening" and "eat hot pot". According to the keywords, the recommendation intention of the user can be determined to relate to restaurants.
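A minimal keyword-based sketch of this intention recognition is shown below. It assumes English text split on letters; Chinese text would need a real word segmenter instead. The `INTENT_KEYWORDS` table and the intention names are hypothetical examples, not a vocabulary defined by the embodiment.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical keyword lists per recommendation intention.
INTENT_KEYWORDS = {
    "medical health": {"throat", "cough", "uncomfortable", "fever"},
    "restaurant": {"eat", "hot", "pot", "dinner"},
}


def segment(text: str) -> List[str]:
    """Very rough word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def recognize_intent(text: str) -> Optional[str]:
    """Pick the intention whose keywords occur most often; None if nothing matches."""
    tokens = segment(text)
    votes = Counter()
    for intent, keywords in INTENT_KEYWORDS.items():
        votes[intent] = sum(1 for t in tokens if t in keywords)
    intent, score = votes.most_common(1)[0]
    return intent if score > 0 else None


print(recognize_intent("I feel uncomfortable in my throat, cough"))
print(recognize_intent("Do you want to eat hot pot this evening?"))
```

Keyword voting is the simplest possible classifier; a trained model would replace `recognize_intent` without changing how its output is used.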

The recognition of the recommendation intent can be regarded as a problem of classifying semantic expressions, and can be recognized by adopting a classification method, wherein the classification method comprises a support vector machine, a deep neural network and the like. It is to be understood that the embodiments of the invention are not limited to the particular manner in which the intent of the recommendation is identified. For example, a large number of text samples can be collected in advance, the recommendation intentions of the text samples are labeled, and a recognition model for recognizing the recommendation intentions is trained. In specific application, the text information converted from the voice information is input into the recognition model, and then the corresponding recommendation intention can be output.

After the recommendation intention of the user is determined, the recommendation information can be determined jointly by combining the recommendation intention, the personalized tag and the voice feature, so that the accuracy of the recommendation information is further improved.

The personalized tag supplements the recommendation intention of the user, for example by reflecting the user's emotion, mood, and health condition, so that the recommendation information better matches the current state of the user. In addition, the embodiment of the invention can also use the voice features when determining the recommendation information. Because the voice features are extracted from the user's original voice data, the final recommendation result no longer depends only on the output of natural language understanding but also reflects the original voice characteristics. The recommendation result therefore draws on richer features, which further improves its accuracy.

Optionally, the determining recommendation information according to the text information and the personalized tag may specifically include:

step S31, generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;

step S32, splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;

step S33, generating a fourth vector according to the candidate information corresponding to the text information;

and step S34, inputting the user feature vector and the fourth vector into a recommendation model, and outputting recommendation information through the recommendation model.

The embodiment of the invention can carry out vectorization representation on the recommendation intention, the personalized tag, the voice characteristic and the recommendation information (such as webpage information) through vectorization technology, and determine the recommendation information through the nearest neighbor technology or the deep learning-based recommendation model technology and other technologies.

For the recommendation intention and the personalized tag, vectorization representation can be performed by using a word vector model. Specifically, the recommendation intention is input into a word vector model to obtain a first vector; and inputting the personalized tag into the word vector model to obtain a second vector. The Word vector model may include Word2Vec or GloVe, etc.

For the voice features, the corresponding third vector may be a Mel-frequency cepstral coefficient (MFCC) feature vector. Specifically, a Fourier transform may be performed on the preprocessed voice information, and filtering and a DCT (Discrete Cosine Transform) may then be applied to the transformed signal, finally yielding the MFCC feature vector corresponding to the voice information as the third vector.
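The transform, filter, and DCT sequence can be sketched for a single frame with the standard library alone. The sketch assumes a triangular mel filter bank and takes the logarithm of the filter energies before the DCT, which is the usual MFCC recipe but is not spelled out in the description; the 20 filters and 13 coefficients are conventional defaults assumed here, and the naive DFT is only suitable for short frames.

```python
import cmath
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT of a real frame; returns power for bins 0..n/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    # Filter edges expressed as (fractional) FFT bin positions.
    pts = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
           for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = pts[m - 1], pts[m], pts[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= center:
                row.append((k - left) / (center - left))
            elif center < k < right:
                row.append((right - k) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values: List[float], n_out: int) -> List[float]:
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_out)]


def mfcc(frame: List[float], sample_rate: int = 16000,
         n_filters: int = 20, n_coeffs: int = 13) -> List[float]:
    spec = power_spectrum(frame)                             # Fourier transform
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                for row in bank]                             # mel filtering + log
    return dct2(energies, n_coeffs)                          # DCT
```

In practice an FFT routine and an audio library would replace `power_spectrum` and `mel_filterbank`; the per-frame coefficient vectors can then be pooled (for example averaged) into a fixed-length third vector.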

And splicing the first vector, the second vector and the third vector to obtain a user feature vector. The embodiment of the invention can calculate the correlation between the user characteristic vector and the vector of the recommendation information, and selects the recommendation information with higher correlation to recommend to the user.

Specifically, candidate information (such as web page information) to be recommended can be determined according to the text information, and vectorizing the candidate information yields the fourth vector. The user feature vector and the fourth vector are input into a pre-trained recommendation model, which calculates the correlation between them. The candidate information is ranked by correlation, and the top n (n is a positive integer) candidates are selected as the recommendation information.
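The splicing and ranking described in steps S31 to S34 can be illustrated with cosine similarity standing in for the recommendation model's correlation score. This is an assumption made for the sketch: a trained model would compute the correlation itself and would not require the candidate vectors to share the user vector's dimension, as cosine similarity does. All vectors and candidate names below are made-up values.

```python
import math
from typing import List, Tuple


def concat(*vectors: List[float]) -> List[float]:
    """Splice the first, second and third vectors into one user feature vector."""
    out: List[float] = []
    for v in vectors:
        out.extend(v)
    return out


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_n(user_vec: List[float], candidates: List[Tuple[str, List[float]]], n: int) -> List[str]:
    """Rank candidates by similarity to the user vector and keep the first n."""
    ranked = sorted(candidates, key=lambda c: cosine(user_vec, c[1]), reverse=True)
    return [name for name, _ in ranked[:n]]


intent_vec, tag_vec, voice_vec = [1.0, 0.0], [0.0, 1.0], [0.5]
user_vec = concat(intent_vec, tag_vec, voice_vec)
candidates = [
    ("A", [1.0, 0.0, 0.0, 1.0, 0.5]),
    ("B", [0.0, 1.0, 1.0, 0.0, 0.0]),
    ("C", [1.0, 0.0, 0.0, 0.0, 0.0]),
]
print(top_n(user_vec, candidates, 2))
```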

It can be understood that the structure and the training mode of the recommendation model are not limited by the embodiment of the present invention.

The recommendation model may be a deep neural network model that fuses a variety of neural networks, including but not limited to at least one of the following, or a combination, superposition, or nesting of at least two of them: CNN, RNN, attention neural networks, etc.

In an example of the present invention, the recommendation model may be trained as follows. First, historical recommendation data corresponding to a large amount of historical voice information of users is collected and labeled according to whether the user clicked on it, yielding a labeling result. Then, a historical recommendation intention is determined from the text information corresponding to the historical voice information, and a historical personalized tag is determined from the voice features corresponding to the historical voice information. Next, a historical first vector corresponding to the historical recommendation intention, a historical second vector corresponding to the historical personalized tag, a historical third vector corresponding to the voice features of the historical voice information, and a historical fourth vector corresponding to the historical candidate information are generated. Finally, the historical first, second, and third vectors are spliced into a historical user feature vector, which is input into an initial recommendation model together with the historical fourth vector. The parameters of the initial recommendation model are optimized according to the difference between the recommendation result it outputs and the labeling result, until that difference is smaller than a preset threshold, yielding the trained recommendation model.
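The click-labeled training loop can be sketched with logistic regression, used here purely as a small stand-in for the recommendation model; the embodiment leaves the model structure open. The mean cross-entropy loss plays the role of the difference between the recommendation result and the labeling result, and the learning rate, threshold, and example vectors are assumed values.

```python
import math
from typing import List, Tuple

# (historical user feature vector, historical fourth vector, clicked: 1 or 0)
Example = Tuple[List[float], List[float], int]


def score(weights: List[float], user_vec: List[float], item_vec: List[float]) -> float:
    """Predicted click probability for a user/candidate pair."""
    x = user_vec + item_vec + [1.0]  # spliced input plus bias term
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Example], lr: float = 0.5,
          threshold: float = 0.2, max_epochs: int = 5000) -> List[float]:
    dim = len(examples[0][0]) + len(examples[0][1]) + 1
    weights = [0.0] * dim  # initial recommendation model
    for _ in range(max_epochs):
        grad = [0.0] * dim
        loss = 0.0
        for user_vec, item_vec, clicked in examples:
            p = score(weights, user_vec, item_vec)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= clicked * math.log(p) + (1 - clicked) * math.log(1.0 - p)
            for i, v in enumerate(user_vec + item_vec + [1.0]):
                grad[i] += (p - clicked) * v
        if loss / len(examples) < threshold:  # difference below preset threshold
            break
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights


history: List[Example] = [
    ([1.0, 0.0], [1.0, 0.0], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([1.0, 1.0], [1.0, 0.0], 1),
    ([0.0, 1.0], [0.0, 1.0], 0),
]
model = train(history)
print([round(score(model, u, i), 2) for u, i, _ in history])
```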

In an optional embodiment of the present invention, after determining the recommendation intent of the user, the method may further include:

Step S41: acquiring slot information corresponding to the recommendation intention;

Step S42: performing slot filling on the text information according to the slot information, to obtain each slot value in the text information that corresponds to the slot information.

In the embodiment of the present invention, slot information corresponding to each recommendation intention may be set in advance. Slot information refers to the slots required in a vertical search field, and different recommendation intentions may correspond to different slot information. For example, for a shopping intention, the corresponding slot information may include: product name, brand name, model, price, and the like. As another example, for a medical health intention, the corresponding slot information may include: body part, physical condition, person, time, and the like.

If the recommendation intention of the user is determined from the text information to be a medical health intention, the preset slot information corresponding to the medical health intention can be acquired. Slot filling is then performed on the text information according to the slot information, to obtain each slot value in the text information that corresponds to the slot information. Slot filling can be treated as a sequence labeling problem: each word in the text information is matched against the preset slot information, and the slot values corresponding to the slot information are identified in the text information.
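As a rough illustration of slot filling as sequence labeling, the sketch below tags each token with a slot name or "O" (no slot) and then collects the tagged tokens as slot values. It assumes a small hand-written lexicon in place of a trained sequence labeler; `INTENT_SLOTS`, `SLOT_LEXICON`, `label_tokens`, and `fill_slots` are hypothetical names, and the vocabulary is made up for the example.

```python
# Preset slot information per recommendation intention (illustrative only).
INTENT_SLOTS = {
    "medical_health": ["body_part", "physical_condition", "person", "time"],
    "shopping": ["product_name", "brand_name", "model", "price"],
}

# Hypothetical lexicon standing in for a trained sequence-labeling model.
SLOT_LEXICON = {
    "body_part": {"throat", "head", "stomach"},
    "physical_condition": {"uncomfortable", "sore", "painful"},
    "person": {"i", "my", "he", "she"},
    "time": {"now", "today", "yesterday"},
}


def label_tokens(tokens, slot_names):
    """Assign one label per token: a slot name, or "O" for no slot."""
    labels = []
    for token in tokens:
        label = "O"
        for slot in slot_names:
            if token.lower() in SLOT_LEXICON.get(slot, ()):
                label = slot
                break
        labels.append(label)
    return labels


def fill_slots(text, intent):
    """Return {slot name: first matching value, or None} for the intent."""
    slot_names = INTENT_SLOTS[intent]
    tokens = [t.strip(".,!?…") for t in text.split()]
    tokens = [t for t in tokens if t]
    labels = label_tokens(tokens, slot_names)
    values = {slot: None for slot in slot_names}
    for token, label in zip(tokens, labels):
        if label != "O" and values[label] is None:
            values[label] = token.lower()
    return values


slots = fill_slots("My throat is uncomfortable now", "medical_health")
```

A lexicon lookup cannot resolve ambiguous words or infer slots that are not spoken; a learned tagger would be needed for that, and the embodiment does not prescribe one.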

Generating the first vector corresponding to the recommendation intention may specifically include: generating the first vector according to each slot value in the text information that corresponds to the slot information.

Because the slot values in the text information match the slot information corresponding to the recommendation intention, once the slot values have been determined they can be vectorized to obtain the first vector corresponding to the recommendation intention.

In an application example of the present invention, assume that the voice information input by the user is "I feel uncomfortable in my throat, cough …". First, voice recognition is performed on the voice information to obtain the corresponding text information. According to keywords in the text information (such as "I", "throat", and "uncomfortable"), the recommendation intention of the user can be determined to be a medical health intention. The slot information corresponding to the medical health intention (such as body part, physical condition, person, and time) can therefore be acquired, and slot filling yields the following slot values in the text information: the body part is "throat", the physical condition is "uncomfortable", the person is "I", and the time is "now". In addition, feature extraction can be performed on the voice information to obtain the corresponding voice features, and the personalized tag of the voice information is determined according to the voice features. Since the voice information contains the sound of severe coughing, the personalized tag can be determined to be "cold" according to the extracted voice features. Next, a first vector corresponding to the recommendation intention is generated by vectorizing the slot values (body part: "throat"; physical condition: "uncomfortable"; person: "I"; time: "now"); the personalized tag "cold" is vectorized to obtain a second vector; and the voice features extracted from the voice information input by the user are vectorized to obtain a third vector. Finally, the first vector, the second vector, and the third vector are spliced to obtain a user feature vector, and the user feature vector and a fourth vector corresponding to the candidate information are input into the recommendation model to obtain the recommendation information.
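The vector splicing in this example can be sketched as follows. The sketch assumes bag-of-words and one-hot encodings for the first and second vectors, two made-up voice features for the third vector, and a hand-set bilinear scorer in place of the trained recommendation model; every vocabulary, weight, and function name here is hypothetical.

```python
SLOT_VOCAB = ["throat", "uncomfortable", "i", "now", "phone"]
TAG_VOCAB = ["cold", "happy", "sad", "male", "female"]


def encode(items, vocab):
    """1.0 for each vocabulary entry present in items, else 0.0."""
    return [1.0 if v in items else 0.0 for v in vocab]


def build_user_vector(slot_values, tag, voice_features):
    first = encode(set(slot_values.values()), SLOT_VOCAB)  # intention slots
    second = encode({tag}, TAG_VOCAB)                      # personalized tag
    third = list(voice_features)                           # voice features
    return first + second + third                          # splicing


def score(user_vec, cand_vec, weights):
    """Bilinear stand-in for the model: sum of w * user[i] * cand[j]."""
    return sum(w * user_vec[i] * cand_vec[j] for (i, j), w in weights.items())


def recommend(user_vec, candidates, weights, top_k=1):
    ranked = sorted(
        candidates,
        key=lambda name: score(user_vec, candidates[name], weights),
        reverse=True,
    )
    return ranked[:top_k]


user_vec = build_user_vector(
    {"body_part": "throat", "physical_condition": "uncomfortable",
     "person": "i", "time": "now"},
    tag="cold",
    voice_features=[0.9, 0.2],  # assumed: cough energy, normalized pitch
)

# Fourth vectors: [is cold medicine, is throat lozenge, is phone accessory].
CANDIDATES = {
    "cold medicine": [1.0, 0.0, 0.0],
    "throat lozenges": [0.0, 1.0, 0.0],
    "phone case": [0.0, 0.0, 1.0],
}

# Hand-set weights keyed by (user index, candidate index); a trained model
# would learn these from click data.
WEIGHTS = {(0, 0): 0.3, (0, 1): 0.8, (5, 0): 1.0, (10, 0): 0.5, (4, 2): 1.0}

top = recommend(user_vec, CANDIDATES, WEIGHTS, top_k=2)
```

With these weights the "cold" tag and the cough-energy feature both raise the score of the cold-medicine candidate above what the text alone would give, which is the effect the embodiment attributes to the personalized tag.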

After determining the recommendation information according to the voice information input by the user, the recommendation information may be displayed to the user. Specifically, the recommendation information can be displayed to the user in a chat interface of the instant messaging application currently used by the user, so that the user can directly view the detailed content of the recommendation information by clicking the recommendation information, and the efficiency and convenience for the user to obtain the recommendation information content are improved.

Optionally, the embodiment of the present invention may further send the recommendation information to the user at the opposite end of the communication. Because the current user and the opposite-end user are chatting through the instant messaging application and may share the same recommendation intention for the current topic, the recommendation information of the current user can also be recommended to the opposite-end user, so that the opposite-end user can quickly obtain the required recommendation information.

To sum up, after acquiring voice information input by a user, the embodiment of the invention performs voice recognition on the voice information to obtain the corresponding text information; performs feature extraction on the voice information to obtain the corresponding voice features, and determines the personalized tag of the voice information according to the voice features; and then determines recommendation information according to the text information and the personalized tag. The personalized tag represents information about the user's own state that is carried by the voice features but cannot be expressed in the text information; for example, the personalized tag may include at least any one of the following: gender, mood, health condition, and the region to which the user belongs. By combining the personalized tag of the voice information with the text information converted from the voice information, the embodiment of the invention determines recommendation information that better matches the user's own state and therefore the user's intention, which improves the accuracy of the recommendation information.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present invention.

Device embodiment

Referring to fig. 2, a block diagram of a recommendation device according to an embodiment of the present invention is shown, where the device may specifically include:

a voice acquiring module 201, configured to acquire voice information input by a user;

the voice recognition module 202 is configured to perform voice recognition on the voice information to obtain text information corresponding to the voice information;

the feature extraction module 203 is configured to perform feature extraction on the voice information to obtain a voice feature corresponding to the voice information;

a tag determining module 204, configured to determine, according to the voice feature, a personalized tag of the voice information, where the personalized tag at least includes any one of: gender, mood, health condition, and the region to which the user belongs;

and the information recommendation module 205 is configured to determine recommendation information according to the text information and the personalized tag.

Optionally, the information recommending module 205 may specifically include:

the retrieval submodule is used for retrieving to obtain candidate information according to the entity words in the text information;

and the determining submodule is used for determining recommendation information according to the personalized tag and the candidate information.
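The two submodules above can be pictured as a retrieve-then-rank step. The following is a minimal sketch under stated assumptions: candidates live in a small in-memory catalog, retrieval is keyword overlap with the entity words, and the personalized tag simply moves matching candidates to the front. `CATALOG`, `retrieve`, and `rank_by_tag` are hypothetical names, not part of the described device.

```python
# Hypothetical candidate catalog; keywords and tags are made up.
CATALOG = [
    {"title": "Vocal training course", "keywords": {"throat", "voice"}, "tags": {"happy"}},
    {"title": "Throat lozenges", "keywords": {"throat"}, "tags": {"cold"}},
    {"title": "Phone case", "keywords": {"phone"}, "tags": set()},
]


def retrieve(entity_words, catalog):
    """Retrieval submodule: keep candidates sharing a keyword with the text."""
    words = {w.lower() for w in entity_words}
    return [item for item in catalog if words & item["keywords"]]


def rank_by_tag(candidates, tag):
    """Determining submodule: candidates matching the tag come first.

    sorted() is stable, so the retrieval order is kept within each group.
    """
    return sorted(candidates, key=lambda item: tag in item["tags"], reverse=True)


candidates = retrieve(["throat"], CATALOG)
recommended = [item["title"] for item in rank_by_tag(candidates, "cold")]
```

Text-only retrieval returns the vocal training course first; adding the "cold" tag promotes the lozenges, which is the kind of correction the personalized tag is meant to provide.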

Optionally, the apparatus may further include:

the intention determining module is used for carrying out intention identification on the text information and determining the recommendation intention of the user;

the information recommendation module comprises:

the first generation submodule is used for generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;

the second generation submodule is used for splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;

the third generation submodule is used for generating a fourth vector according to the candidate information corresponding to the text information;

and the information determination submodule is used for inputting the user characteristic vector and the fourth vector into a recommendation model so as to output recommendation information through the recommendation model.

Optionally, the apparatus may further include:

the slot position acquisition module is used for acquiring slot position information corresponding to the recommendation intention;

the slot filling module is used for filling slots in the text information according to the slot information to obtain slot values corresponding to the slot information in the text information;

the first generation submodule is specifically configured to generate the first vector corresponding to the recommendation intention according to each slot value in the text information that corresponds to the slot information.

Optionally, the tag determination module is specifically configured to input the speech feature into a classification model, so as to output a personalized tag corresponding to the speech feature through the classification model; and the classification model is obtained by training according to the voice sample data marked with the personalized label.

Optionally, the apparatus may further include:

the sample acquisition module is used for acquiring voice sample data, wherein the voice sample data is labeled with a corresponding personalized tag;

the characteristic acquisition module is used for extracting the characteristics of the voice sample data to obtain the voice sample characteristics corresponding to the voice sample data;

the characteristic training module is used for inputting the characteristics of the voice samples into an initial classification model for training;

and the parameter adjusting module is used for adjusting the model parameters of the initial classification model in the training process until the personalized label output by the initial classification model is matched with the personalized label labeled by the voice sample data to obtain the trained classification model.
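The four training modules above follow an ordinary supervised training loop. A minimal sketch is given below, assuming the voice sample features have already been extracted as fixed-length numeric vectors and using a multi-class perceptron as a stand-in for the classification model; the feature meanings, the toy samples, and the name `train_classifier` are assumptions made for illustration.

```python
def train_classifier(samples, max_epochs=100):
    """Adjust parameters until every predicted tag matches its labeled tag."""
    labels = sorted({label for _, label in samples})
    dim = len(samples[0][0]) + 1  # +1 for a constant bias input
    weights = {label: [0.0] * dim for label in labels}

    def predict(features):
        x = list(features) + [1.0]
        return max(
            labels,
            key=lambda lb: sum(w * xi for w, xi in zip(weights[lb], x)),
        )

    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            guess = predict(features)
            if guess != label:  # output does not match the labeled tag
                x = list(features) + [1.0]
                weights[label] = [w + xi for w, xi in zip(weights[label], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
                mistakes += 1
        if mistakes == 0:  # all outputs match the labels: training is done
            break
    return predict


# Hypothetical features: [normalized pitch, cough energy].
VOICE_SAMPLES = [
    ([0.2, 0.9], "cold"),
    ([0.3, 0.8], "cold"),
    ([0.7, 0.1], "healthy"),
    ([0.8, 0.2], "healthy"),
]

classify = train_classifier(VOICE_SAMPLES)
```

The perceptron stops only when its outputs match every labeled tag, which is the literal stopping condition in the text; it reaches that point only if the samples are linearly separable, so `max_epochs` bounds the loop otherwise.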

Optionally, the voice acquiring module is specifically configured to acquire voice information input, sent or received by the user through the instant messaging application.

Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

An embodiment of the present invention provides an apparatus for recommendation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag at least comprises any one of the following items: gender, mood, health condition, and the region to which the user belongs; and determining recommendation information according to the text information and the personalized tag.

FIG. 3 is a block diagram illustrating an apparatus 800 for recommending according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 3, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 4 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of an apparatus (a server or a terminal), enable the apparatus to perform the recommendation method shown in fig. 1.

A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of an apparatus (a server or a terminal), enable the apparatus to perform a recommendation method, the method comprising: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag at least comprises any one of the following items: gender, mood, health condition, and the region to which the user belongs; and determining recommendation information according to the text information and the personalized tag.

The embodiment of the invention discloses A1, a recommendation method, comprising the following steps:

acquiring voice information input by a user;

A2, the method of A1, wherein the determining recommendation information according to the text information and the personalized tag, comprises:

searching to obtain candidate information according to the entity words in the text information;

and determining recommendation information according to the personalized tag and the candidate information.

A3, according to the method in A1, after obtaining the text information corresponding to the voice information, the method further includes:

performing intention identification on the text information, and determining the recommendation intention of the user;

the determining recommendation information according to the text information and the personalized tag includes:

generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;

splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;

generating a fourth vector according to the candidate information corresponding to the text information;

inputting the user feature vector and the fourth vector into a recommendation model to output recommendation information through the recommendation model.

A4, the method of A3, after the determining of the recommendation intention of the user, further comprising:

acquiring slot position information corresponding to the recommendation intention;

according to the slot position information, slot position filling is carried out on the text information to obtain each slot position value corresponding to the slot position information in the text information;

the generating a first vector corresponding to the recommendation intention comprises:

and generating a first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.

A5, the method of A1, wherein the determining the personalized tag of the voice message according to the voice feature includes:

A6, the method of A5, before the inputting of the speech features into a classification model, further comprising:

acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;

performing feature extraction on the voice sample data to obtain voice sample features corresponding to the voice sample data;

inputting the voice sample characteristics into an initial classification model for training;

and in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model is matched with the personalized label labeled by the voice sample data to obtain the trained classification model.

A7, the method according to any of A1 to A6, wherein the acquiring the voice information input by the user comprises:

and acquiring voice information input or sent or received by a user through the instant messaging application.

The embodiment of the invention discloses B8, a recommendation device, comprising:

B9, the apparatus of B8, the information recommendation module comprising:

B10, the apparatus of B8, the apparatus further comprising:

the information recommendation module comprises:

B11, the apparatus of B10, the apparatus further comprising:

B12, the apparatus according to B8, the tag determination module is specifically configured to input the speech features into a classification model, so as to output personalized tags corresponding to the speech features through the classification model; and the classification model is obtained by training according to the voice sample data marked with the personalized label.

B13, the apparatus according to B12, the apparatus further comprising:

B14, the apparatus according to any of B8 to B13, wherein the voice acquiring module is specifically configured to acquire voice information input or sent or received by a user through an instant messaging application.

The embodiment of the invention discloses C15, an apparatus for recommending, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:

acquiring voice information input by a user;

C16, the apparatus according to C15, the determining recommendation information according to the text information and the personalized tag, comprising:

C17, the apparatus of C15, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

C18, the apparatus of C17, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

C19, the apparatus according to C15, the determining the personalized tag of the voice information according to the voice feature includes:

C20, the apparatus of C19, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

C21, the apparatus according to any of C15 to C20, wherein the acquiring the voice information input by the user comprises:

Embodiments of the present invention disclose D22, a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a recommendation method as described in one or more of A1 to A7.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The recommendation method, the recommendation device, and the apparatus for recommendation provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, according to the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A recommendation method, characterized in that the method comprises:

acquiring voice information input by a user;

2. The method of claim 1, wherein determining recommendation information based on the textual information and the personalized tag comprises:

3. The method according to claim 1, wherein after obtaining the text information corresponding to the voice information, the method further comprises:

4. The method of claim 3, wherein after determining the recommendation intent of the user, the method further comprises:

5. The method of claim 1, wherein determining the personalized tag for the voice message according to the voice feature comprises:

6. The method of claim 5, wherein prior to entering the speech features into a classification model, the method further comprises:

7. The method according to any one of claims 1 to 6, wherein the acquiring the voice information input by the user comprises:

8. A speech recognition apparatus, characterized in that the apparatus comprises:

9. An apparatus for recommendation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:

acquiring voice information input by a user;

10. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the recommendation method of one or more of claims 1-7.