CN112948662A - Recommendation method and device and recommendation device
- Publication number: CN112948662A
- Application number: CN201911260344.5A
- Authority: CN (China)
- Prior art keywords: information, voice, recommendation, user, vector
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9532: Query formulation (information retrieval; querying, e.g. by the use of web search engines)
- G06F16/35: Clustering; classification (information retrieval of unstructured textual data)
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech to text systems
- G10L25/63: Speech or voice analysis specially adapted for estimating an emotional state
- G10L25/66: Speech or voice analysis specially adapted for extracting parameters related to health condition
- G10L2015/223: Execution procedure of a spoken command
Abstract
Embodiments of the invention provide a recommendation method, a recommendation apparatus, and a device for recommendation. The method specifically comprises the following steps: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least any one of the following: gender, mood, health condition, and the region to which the user belongs; and determining recommendation information according to the text information and the personalized tag. Embodiments of the invention can improve the accuracy of the recommendation information.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a recommendation method, a recommendation apparatus, and a device for recommendation.
Background
With the continuous development of speech recognition technology, recommendation applications based on speech recognition are becoming more and more widespread. Speech recognition technology takes speech as its research object: through signal processing and recognition techniques, it enables a machine to automatically recognize and understand human spoken language and to convert a speech signal into corresponding text or commands.
A recommendation system converts voice data input by a user into text using speech recognition technology, and recommends related information to the user according to the content of that text. However, each sentence entered by a user is typically a short text. Short texts are short, feature-sparse, non-standard, and highly ambiguous, so recommendation information retrieved from the recognized short text alone may be insufficiently accurate and may not match the user's intention.
Disclosure of Invention
The embodiment of the invention provides a recommendation method, a recommendation apparatus, and a device for recommendation, which can improve the accuracy of recommendation information.
In order to solve the above problem, an embodiment of the present invention discloses a recommendation method, where the method includes:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least any one of the following: gender, mood, health condition, and the region to which the user belongs;
and determining recommendation information according to the text information and the personalized tag.
In another aspect, an embodiment of the present invention discloses a recommendation apparatus, where the apparatus includes:
the voice acquisition module is used for acquiring voice information input by a user;
the voice recognition module is used for carrying out voice recognition on the voice information to obtain text information corresponding to the voice information;
the feature extraction module is used for extracting features of the voice information to obtain voice features corresponding to the voice information;
a tag determination module, configured to determine, according to the voice features, a personalized tag of the voice information, where the personalized tag comprises at least any one of: gender, mood, health condition, and the region to which the user belongs;
and the information recommendation module is used for determining recommendation information according to the text information and the personalized tag.
In yet another aspect, an embodiment of the present invention discloses an apparatus for recommending, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least any one of the following: gender, mood, health condition, and the region to which the user belongs;
and determining recommendation information according to the text information and the personalized tag.
In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform one or more of the recommendation methods described above.
The embodiment of the invention has the following advantages:
after voice information input by a user is acquired, voice recognition is performed on the voice information to obtain corresponding text information; feature extraction is performed on the voice information to obtain corresponding voice features, and a personalized tag of the voice information is determined according to those voice features; recommendation information is then determined according to the text information and the personalized tag. The personalized tag can represent information about the user's own state that is carried by the voice features but cannot be expressed in the text information; for example, it may include at least any one of the following: gender, mood, health condition, and the region to which the user belongs. Therefore, by combining the text information converted from the voice information with the personalized tag of that voice information, the embodiment of the invention determines recommendation information that better matches the user's own state and therefore the user's intention, improving the accuracy of the recommendation information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a flow chart of the steps of one preferred method embodiment of the present invention;
FIG. 2 is a block diagram of a recommender embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus 800 for recommending according to the present invention; and
FIG. 4 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Method embodiment
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a recommendation method of the present invention is shown, which may specifically include the following steps:
Step 101, acquiring voice information input by a user;
Step 102, performing voice recognition on the voice information to obtain text information corresponding to the voice information;
Step 103, performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
Step 104, determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least any one of the following: gender, mood, health condition, and the region to which the user belongs;
Step 105, determining recommendation information according to the text information and the personalized tag.
The recommendation method of the embodiment of the invention can be applied to electronic equipment, including but not limited to: a server, a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a car computer, a desktop computer, a set-top box, a smart TV, a wearable device, and so on.
The recommendation method provided by the embodiment of the invention can provide, according to the voice information input by the user, recommendation information that better matches the user's intention. The voice information refers to a continuous piece of speech, such as a sentence or a longer passage. It should be understood that the embodiment of the present invention does not limit the source of the voice information; for example, the voice information may be a speech segment collected in real time through the recording function of the electronic device.
In an optional embodiment of the present invention, the acquiring the voice information input by the user may specifically include: and acquiring voice information input or sent or received by a user through the instant messaging application.
An instant messaging application is an application program for online chatting and communication based on instant messaging technology. The voice information acquired in the embodiment of the invention may include: voice information input by the user through the instant messaging application, voice information sent by the user to a communication peer through the instant messaging application, and voice information received by the user from a communication peer through the instant messaging application. In this way, the embodiment of the invention can recommend, in real time and according to the user's chat content, recommendation information that matches the user's intention.
After the voice information input by the user is acquired, the voice information may be preprocessed to eliminate the influence of factors such as aliasing, higher harmonic distortion, and high frequencies introduced by the vocal organs and the capture device, and to improve the quality of the voice information. The preprocessing may include pre-emphasis, framing, windowing, and the like.
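As a concrete illustration of this preprocessing stage, the following sketch applies pre-emphasis, framing, and Hamming windowing with NumPy. The 0.97 pre-emphasis coefficient and the 25 ms frame length with 10 ms shift are common defaults assumed for illustration; the disclosure does not fix these values.

```python
import numpy as np

def preprocess(signal: np.ndarray, sr: int = 16000,
               pre_emphasis: float = 0.97,
               frame_ms: float = 25.0, shift_ms: float = 10.0) -> np.ndarray:
    """Pre-emphasize, frame, and window a raw speech signal.

    Assumes the signal is at least one frame long.
    """
    # Pre-emphasis: boost the high frequencies attenuated during speech production.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    frame_len = int(sr * frame_ms / 1000)    # samples per frame
    frame_shift = int(sr * shift_ms / 1000)  # samples between frame starts
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)

    # Slice the signal into overlapping frames.
    frames = np.stack([
        emphasized[i * frame_shift: i * frame_shift + frame_len]
        for i in range(num_frames)
    ])
    # Apply a Hamming window to each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)
```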
After the voice information is preprocessed, voice recognition can be performed on the preprocessed voice information to obtain the corresponding text information. To improve the accuracy of the recommendation information, in addition to acquiring the text information corresponding to the voice information, the embodiment of the invention can perform frame-by-frame feature extraction on the preprocessed voice information. This reduces the influence of recognition-irrelevant information in the voice information, reduces the amount of data to be processed in the subsequent recognition stage, and produces voice features that characterize the speaker information carried by the voice information. The voice features may include MFCCs (Mel-Frequency Cepstral Coefficients), LPCCs (Linear Prediction Cepstral Coefficients), filter bank features, and the like.
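In practice such features are often computed with an audio library. The sketch below uses librosa, which is an assumption of this example rather than a dependency named by the disclosure; 13 coefficients per frame is a conventional choice, and librosa performs its own framing and windowing internally.

```python
import librosa
import numpy as np

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return an (n_frames, n_mfcc) matrix of MFCC features for one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)                 # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, n_frames)
    return mfcc.T  # one feature vector per frame
```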
The extracted voice features contain important information about the user's own state (called the personalized tag in the embodiment of the present invention) that cannot be expressed by the text information, such as the user's gender, mood, and physical condition. In the embodiment of the present invention, the personalized tag of the voice information may be determined according to the voice features and used to express the characteristic information of the user's own state contained in those features; the personalized tag may include at least any one of the following: gender, mood, health condition, and the region to which the user belongs.
In an application example of the present invention, assume the voice information input by the user is "my throat is uncomfortable, cough …". The embodiment of the invention can perform voice recognition on the voice information to obtain the corresponding text information, and can also perform feature extraction on the voice information to obtain the corresponding voice features and determine the personalized tag of the voice information from those features. Because the voice information contains severe coughing, the personalized tag can be determined to be "cold" according to the extracted voice features, and the recommendation information can then be determined from the recognized text information "my throat is uncomfortable, cough …" and the personalized tag "cold".
For example, the text information can indicate that the user has an "uncomfortable throat", and the personalized tag can indicate that the user's current state is "cold". Therefore, medical information, drug information, and the like for treating colds and sore throats can be acquired as recommendation information and recommended to the user.
The embodiment of the invention determines the recommendation information by combining the personalized tag of the voice information with the text information converted from that voice information. Because the personalized tag is determined from the voice features extracted from the voice information, it can express important information about the user's state that the text information cannot; the recommendation information obtained by incorporating the personalized tag therefore better matches the user's state and intention, which can improve the accuracy of the recommendation information.
In an optional embodiment of the present invention, the determining, in step 104, the personalized tag of the voice information according to the voice features specifically includes:
inputting the voice features into a classification model so as to output personalized labels corresponding to the voice features through the classification model; and the classification model is obtained by training according to the voice sample data marked with the personalized label.
The embodiment of the invention can pre-train a classification model for identifying personalized tags; the voice features extracted from the voice information are input into the classification model, which outputs the personalized tag corresponding to the voice information. The classification model applies a classification algorithm to analyze the voice features and produce a classification result. It is to be understood that the embodiment of the present invention does not limit the type of classification algorithm, which may include support vector machines, multi-layer perceptrons, recurrent neural networks, convolutional neural networks, probabilistic linear discriminant analysis, and others.
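As a minimal sketch of such a classification model, the example below trains a scikit-learn support vector machine (one of the classifier families listed above) on toy utterance-level features obtained by mean-pooling the frame-level MFCC matrix; the random training data and the tag labels "cold"/"healthy" are purely illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy setup: each utterance is summarized as the mean of its frame-level
# MFCC vectors; labels are personalized-tag classes (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 13))             # 200 labeled utterance vectors
y_train = rng.choice(["cold", "healthy"], 200)   # illustrative tag labels

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

def personalized_tag(mfcc_frames: np.ndarray) -> str:
    """Map one utterance's (n_frames, 13) MFCC matrix to a personalized tag."""
    utterance_vec = mfcc_frames.mean(axis=0, keepdims=True)  # mean-pooling
    return clf.predict(utterance_vec)[0]
```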
In an optional embodiment of the present invention, before the inputting the speech feature into the classification model, the method may further include:
step S11, acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;
step S12, extracting the characteristics of the voice sample data to obtain the voice sample characteristics corresponding to the voice sample data;
step S13, inputting the voice sample characteristics into an initial classification model for training;
and step S14, in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model is matched with the personalized label labeled by the voice sample data to obtain the trained classification model.
The embodiment of the invention can collect a large amount of voice sample data, and label the corresponding personalized label for the voice sample data for training the classification model.
It should be noted that the embodiment of the present invention may classify the voice sample data according to the labeled personalized tags and train a separate classification model for each personalized tag category. For example, different classification models may be trained for tag categories such as the user's mood, health condition, gender, and age: a mood classification model may output the user's mood, such as happy, anxious, or excited; a health-condition classification model may output the user's health condition, such as cold, fever, or headache; a gender classification model may output the user's gender, such as male or female; and an age classification model may output the user's age bracket, such as elderly, middle-aged, young adult, or child. In application, the voice features can be input into the classification models of the different personalized tag categories to obtain each model's classification result. Alternatively, in the embodiment of the present invention, a general classification model may be trained on the voice sample data; for example, if the voice sample data used for training carries both health-condition and gender tags, the trained classification model can output both a health-condition and a gender classification result.
After voice sample data is collected, feature extraction can be performed on the voice sample data to obtain voice sample features corresponding to the voice sample data; and then inputting the characteristics of the voice samples into an initial classification model for training, and in the training process, adjusting the model parameters of the initial classification model until the personalized labels output by the initial classification model are matched with the personalized labels labeled by the voice sample data to obtain the trained classification model.
It is understood that the specific type of the classification model is not limited by the embodiment of the present invention; the classification model may include a DNN (Deep Neural Network). The deep neural network may fuse a variety of neural networks, including but not limited to at least one of, or a combination, superposition, or nesting of at least two of, the following: a CNN (Convolutional Neural Network), an LSTM (Long Short-Term Memory) network, an RNN (Recurrent Neural Network), an attention neural network, and the like.
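A minimal sketch of the training procedure of steps S11 to S14, assuming a small feed-forward network in PyTorch standing in for the DNN variants above; the feature dimension (13), the number of tag classes (4), and the random toy batch are illustrative assumptions, not values fixed by the disclosure.

```python
import torch
import torch.nn as nn

# A small feed-forward classifier standing in for the DNN variants listed
# above; 13 input features and 4 tag classes are illustrative choices.
model = nn.Sequential(
    nn.Linear(13, 64), nn.ReLU(),
    nn.Linear(64, 4),            # logits over personalized-tag classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(features: torch.Tensor, labels: torch.Tensor) -> float:
    """One parameter update on a batch of labeled voice sample features."""
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # compare output with labeled tags
    loss.backward()                          # adjust model parameters (step S14)
    optimizer.step()
    return loss.item()

# Toy batch: 32 utterance vectors with integer tag labels in [0, 4).
features = torch.randn(32, 13)
labels = torch.randint(0, 4, (32,))
for epoch in range(10):
    train_step(features, labels)
```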
In an optional embodiment of the present invention, the step 105 of determining recommendation information according to the text information and the personalized tag may specifically include:
step S21, according to the entity words in the text information, candidate information is obtained through retrieval;
and step S22, determining recommendation information according to the personalized tag and the candidate information.
The entity words are nouns or pronouns with specific meanings in the text, and the entity words can include names of people, places, organizations, proper nouns and the like. For example, entity words may include: home appliance names, garment names, brand names of goods, drug names, service provision names (e.g., car wash, home appliance maintenance, home information, etc.), ticket names, disease names, etc.
The embodiment of the invention can identify the entity words in the text information, input the identified entity words into a search engine as query words, and retrieve the candidate information. For example, if the text information is recognized to include the entity word "movie", that entity word may be used as a query word for a search, and the search results, such as information about recently released movies and about special-price movie tickets, may be used as the candidate information.
Then, the candidate information may be filtered according to the personalized tag to determine the recommendation information. The personalized tag may include at least any one of the following: gender, mood, health condition, and the region to which the user belongs. For example, combined with a gender tag, sports news and similar information can be recommended to male users and cosmetics to female users; combined with a region tag, goods or services (such as real estate or restaurants) can be recommended to users in a given region in a targeted manner.
Still taking the example in which the recognized text information includes the entity word "movie", assume that the personalized tag corresponding to the user's mood is determined to be "happy" from the voice information input by the user, indicating that the user's emotional state is good; movies suitable for watching in a good emotional state may then be selected from the candidate information retrieved in step S21 as the recommendation information.
In another application example of the present invention, the personalized tag may also indicate a region to which the user belongs. For example, the embodiment of the invention can determine the accent of the user according to the voice characteristics in the voice information input by the user, and further can determine the region to which the user belongs. For example, according to the accent features in the voice of the user, the region to which the user belongs is determined to be Shanghai, and then recommendation information conforming to the habit of Shanghai people can be recommended to the user. For example, if the voice information input by the user contains entity words related to food, food information more conforming to the taste of the Shanghai people can be used as recommendation information in combination with the personalized tag of the user.
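The retrieval-then-filter logic of steps S21 and S22 can be sketched as follows. The `search` callable stands in for a real search-engine API and is hypothetical, and matching candidate metadata keys against tag values is one simple filtering policy assumed here, not the only one the disclosure permits.

```python
from typing import Callable, Dict, List

def recommend(entity_words: List[str], tags: Dict[str, str],
              search: Callable[[str], List[dict]]) -> List[dict]:
    """Step S21: retrieve candidates per entity word; step S22: filter by tags."""
    candidates: List[dict] = []
    for word in entity_words:
        candidates.extend(search(word))  # entity word used as the query word
    # Keep candidates whose metadata is compatible with every personalized
    # tag; candidates lacking a tag key are treated as compatible.
    return [c for c in candidates
            if all(c.get(key, value) == value for key, value in tags.items())]
```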
Thus, the voice information input by the user is converted into text information, the voice features in the voice information are extracted, a personalized tag representing the user's current state is determined from those features, and the recommendation information is determined by combining the text information, the personalized tag, and the voice features. The determined recommendation information therefore matches not only the user's textual expression but also the user's current state, which can improve the accuracy of information recommendation.
In an optional embodiment of the present invention, after obtaining the text information corresponding to the voice information in step 102, the method may further include: and performing intention recognition on the text information, and determining the recommendation intention of the user.
In order to further improve the accuracy of information recommendation and enable the recommended information to better accord with the intention of the user, the embodiment of the invention can perform intention recognition on the text information after performing voice recognition on the voice information input by the user and converting the voice information into the text information, and determine the recommendation intention of the user. Specifically, word segmentation processing may be performed on the text information to obtain a keyword in the text information, and according to the keyword, a recommendation intention conforming to the user is determined.
For example, voice information input by a user is converted into the following text information: "my throat is uncomfortable, cough …". After word segmentation, the text information can be found to include the following keywords: "I", "throat", "uncomfortable". From these keywords, the user's recommendation intention can be determined to be related to medical health. As another example, for the text message "do you want to eat hot pot tonight?", the text information includes the keywords "tonight" and "eat hot pot"; from these keywords, the user's recommendation intention can be determined to be related to dining.
The recognition of the recommendation intent can be regarded as a problem of classifying semantic expressions, and can be recognized by adopting a classification method, wherein the classification method comprises a support vector machine, a deep neural network and the like. It is to be understood that the embodiments of the invention are not limited to the particular manner in which the intent of the recommendation is identified. For example, a large number of text samples can be collected in advance, the recommendation intentions of the text samples are labeled, and a recognition model for recognizing the recommendation intentions is trained. In specific application, the text information converted from the voice information is input into the recognition model, and then the corresponding recommendation intention can be output.
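A minimal keyword-based sketch of intent recognition follows; the keyword-to-intent table is hypothetical, and a trained classifier of the kind described above would replace this overlap heuristic in practice.

```python
# Hypothetical keyword-to-intent table; a trained classifier (SVM, deep
# neural network) would be used in a real system, as noted above.
INTENT_KEYWORDS = {
    "medical_health": {"throat", "uncomfortable", "cough", "fever"},
    "dining": {"eat hot pot", "tonight", "restaurant"},
}

def recognize_intent(keywords: set) -> str:
    """Pick the intent whose keyword vocabulary overlaps the input most."""
    scores = {intent: len(keywords & vocab)
              for intent, vocab in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(recognize_intent({"I", "throat", "uncomfortable"}))  # medical_health
```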
After the recommendation intention of the user is determined, the recommendation information can be determined jointly by combining the recommendation intention, the personalized tag and the voice feature, so that the accuracy of the recommendation information is further improved.
The personalized tag can supplement the user's recommendation intention, for example by reflecting the user's mood, emotion, and health condition, so that the recommendation information better matches the user's current state. In addition, in determining the recommendation information, the embodiment of the invention can further incorporate the voice features themselves. Because the voice features are extracted from the user's original voice data, the final recommendation result does not depend solely on the analysis obtained from natural language understanding but also embodies the most original voice characteristics; the recommendation therefore depends on a richer set of features, further improving the accuracy of the recommendation result.
Optionally, the determining recommendation information according to the text information and the personalized tag may specifically include:
step S31, generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
step S32, splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
step S33, generating a fourth vector according to the candidate information corresponding to the text information;
and step S34, inputting the user feature vector and the fourth vector into a recommendation model, and outputting recommendation information through the recommendation model.
The embodiment of the invention can vectorize the recommendation intention, the personalized tag, the voice features, and the recommendation information (such as web page information), and determine the recommendation information using techniques such as nearest-neighbor search or a deep-learning-based recommendation model.
For the recommendation intention and the personalized tag, vectorization representation can be performed by using a word vector model. Specifically, the recommendation intention is input into a word vector model to obtain a first vector; and inputting the personalized tag into the word vector model to obtain a second vector. The Word vector model may include Word2Vec or GloVe, etc.
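A toy sketch of producing such vectors with gensim's Word2Vec follows; gensim and the three miniature training sentences are assumptions for illustration, and a production system would instead load pretrained Word2Vec or GloVe embeddings.

```python
from gensim.models import Word2Vec

# Train a toy word-vector model; real systems would load pretrained
# Word2Vec or GloVe vectors rather than these illustrative sentences.
sentences = [["cold", "medicine"], ["movie", "ticket"], ["hot", "pot"]]
wv_model = Word2Vec(sentences, vector_size=32, min_count=1, seed=0)

tag_vector = wv_model.wv["cold"]  # e.g. the second vector for the tag "cold"
```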
For the voice features, the corresponding third vector may be an MFCC (Mel-Frequency Cepstral Coefficient) feature vector. Specifically, a Fourier transform may be applied to the preprocessed voice information, and the transformed signal may then undergo filter-bank processing and a DCT (Discrete Cosine Transform), finally yielding the MFCC feature vector corresponding to the voice information as the third vector.
And splicing the first vector, the second vector and the third vector to obtain a user feature vector. The embodiment of the invention can calculate the correlation between the user characteristic vector and the vector of the recommendation information, and selects the recommendation information with higher correlation to recommend to the user.
Specifically, the candidate information to be recommended (such as web page information) can be determined according to the text information and vectorized to obtain the fourth vector. The user feature vector and the fourth vector are input into a pre-trained recommendation model, which can compute the correlation between them, rank the candidate information by that correlation, and select the top-n candidate information items (n being a positive integer) as the recommendation information.
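The following sketch shows the concatenation of step S32 together with a simple relevance ranking; plain cosine similarity stands in for the learned recommendation model, and the candidate (fourth) vectors are assumed to share the concatenated user vector's dimensionality.

```python
import numpy as np

def rank_candidates(intent_vec: np.ndarray, tag_vec: np.ndarray,
                    speech_vec: np.ndarray,
                    candidate_vecs: np.ndarray, top_n: int = 3) -> np.ndarray:
    """Concatenate the user-side vectors (step S32) and rank candidates.

    Cosine similarity is a stand-in for the trained recommendation model's
    learned relevance score; candidate_vecs is (m, d) with d matching the
    concatenated user vector.
    """
    user_vec = np.concatenate([intent_vec, tag_vec, speech_vec])
    # Cosine similarity between the user feature vector and each candidate.
    sims = candidate_vecs @ user_vec / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(user_vec))
    return np.argsort(sims)[::-1][:top_n]  # indices of the top-n candidates
```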
It can be understood that the structure and the training mode of the recommendation model are not limited by the embodiment of the present invention.
The recommendation model may be a deep neural network model that may fuse a variety of neural networks including, but not limited to, at least one or a combination, superposition, nesting of at least two of the following: CNN networks, RNNs, attention neural networks, etc.
In an example of the present invention, the recommendation model may be trained as follows. First, historical recommendation data corresponding to a large amount of the user's historical voice information can be collected, and each item of historical recommendation data is labeled according to whether the user clicked on it, yielding the labeling result. Then, a historical recommendation intention is determined from the text information corresponding to the historical voice information, and a historical personalized tag is determined from the voice features corresponding to the historical voice information. Next, a historical first vector corresponding to the historical recommendation intention, a historical second vector corresponding to the historical personalized tag, a historical third vector corresponding to the voice features of the historical voice information, and a historical fourth vector corresponding to the historical candidate information are generated. Finally, the historical user feature vector obtained by concatenating the historical first, second, and third vectors, together with the historical fourth vector, is input into an initial recommendation model, and the parameters of the initial recommendation model are optimized according to the difference between its output recommendation result and the labeling result, until that difference is smaller than a preset threshold, yielding the trained recommendation model.
In an optional embodiment of the present invention, after determining the recommendation intent of the user, the method may further include:
step S41, slot position information corresponding to the recommendation intention is obtained;
and step S42, slot filling is carried out on the text information according to the slot information, and each slot value corresponding to the slot information in the text information is obtained.
In the embodiment of the present invention, slot information corresponding to the recommendation intention may be set in advance. Slot information refers to the slots required in a vertical search field, and different recommendation intentions can correspond to different slot information. For example, for a shopping intent, the corresponding slot information may include: trade name, brand name, model number, price, and so on. As another example, for a medical-health intent, the corresponding slot information may include: body part, physical condition, person, time, and so on.
If the recommendation intention of the user is determined to be the medical health intention according to the text information, the preset slot position information corresponding to the medical health intention can be obtained. And then, according to the slot position information, slot position filling is carried out on the text information to obtain each slot position value corresponding to the slot position information in the text information. The slot filling can be regarded as a sequence labeling problem, and is to match each word in the text information with preset slot information and identify each slot value corresponding to the slot information in the text information.
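A minimal dictionary-matching sketch of slot filling (step S42) follows; the slot vocabularies are hypothetical stand-ins, and a real system would treat this as the sequence-labeling problem described above.

```python
# Preset slot information for the medical-health intent; the vocabulary
# entries are illustrative stand-ins for a real slot dictionary.
SLOTS = {
    "body_part": {"throat", "head", "stomach"},
    "condition": {"uncomfortable", "sore", "cough"},
    "person":    {"I", "you", "he", "she"},
    "time":      {"now", "today", "tonight"},
}

def fill_slots(tokens: list) -> dict:
    """Match each token against the preset slot information (step S42)."""
    values = {}
    for token in tokens:
        for slot, vocab in SLOTS.items():
            if token in vocab:
                values[slot] = token
    return values

# fill_slots(["I", "throat", "uncomfortable", "now"]) returns
# {"person": "I", "body_part": "throat",
#  "condition": "uncomfortable", "time": "now"}
```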
The generating of the first vector corresponding to the recommendation intent may specifically include: and generating a first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.
Because the slot values in the text information match the slot information corresponding to the recommendation intention, once the slot values corresponding to the slot information in the text information have been determined, those slot values can be vectorized to obtain the first vector corresponding to the recommendation intention.
In an application example of the present invention, assume the voice information input by the user is "my throat is uncomfortable, cough …". First, voice recognition is performed on the voice information to obtain the corresponding text information. According to the keywords in the text information (such as "I", "throat", and "uncomfortable"), the user's recommendation intention can be determined to be the medical-health intent, so the slot information corresponding to the medical-health intent (such as body part, physical condition, person, and time) can be acquired, and the slot values corresponding to that slot information in the text information are obtained through slot filling, specifically: body part = "throat", physical condition = "uncomfortable", person = "I", time = "now". In addition, feature extraction can be performed on the voice information to obtain the corresponding voice features, and the personalized tag of the voice information is determined from those features; because the voice information contains severe coughing, the personalized tag can be determined to be "cold". Next, the first vector corresponding to the recommendation intention can be generated from the slot values: vectorizing (body part = "throat", physical condition = "uncomfortable", person = "I", time = "now") yields the first vector; vectorizing the personalized tag "cold" yields the second vector; and vectorizing the voice features extracted from the voice information "my throat is uncomfortable, cough …" yields the third vector. Finally, the first, second, and third vectors are concatenated to obtain the user feature vector, and the user feature vector and the fourth vector corresponding to the candidate information are input into the recommendation model to obtain the recommendation information.
After determining the recommendation information according to the voice information input by the user, the recommendation information may be displayed to the user. Specifically, the recommendation information can be displayed to the user in a chat interface of the instant messaging application currently used by the user, so that the user can directly view the detailed content of the recommendation information by clicking the recommendation information, and the efficiency and convenience for the user to obtain the recommendation information content are improved.
Optionally, the embodiment of the present invention may further send the recommendation information to the user's communication peer. Because the current user and the peer user are chatting through the instant messaging application and may share the same recommendation intention for the current topic, the current user's recommendation information can also be recommended to the peer user, enabling the peer user to quickly obtain the needed recommendation information.
To sum up, after acquiring voice information input by a user, the embodiment of the invention performs voice recognition on the voice information to obtain corresponding text information, performs feature extraction on the voice information to obtain corresponding voice features, determines the personalized tag of the voice information according to those features, and then determines the recommendation information according to the text information and the personalized tag. The personalized tag can represent information about the user's own state that is carried by the voice features but cannot be expressed in the text information; for example, it may include at least any one of the following: gender, mood, health condition, and the region to which the user belongs. Therefore, by combining the text information converted from the voice information with the personalized tag of that voice information, the embodiment of the invention determines recommendation information that better matches the user's own state and therefore the user's intention, improving the accuracy of the recommendation information.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Device embodiment
Referring to fig. 2, a block diagram of a recommendation device according to an embodiment of the present invention is shown, where the device may specifically include:
a voice acquiring module 201, configured to acquire voice information input by a user;
the voice recognition module 202 is configured to perform voice recognition on the voice information to obtain text information corresponding to the voice information;
the feature extraction module 203 is configured to perform feature extraction on the voice information to obtain a voice feature corresponding to the voice information;
a tag determining module 204, configured to determine, according to the voice features, a personalized tag of the voice information, where the personalized tag comprises at least any one of: gender, mood, health condition, and the region to which the user belongs;
and the information recommendation module 205 is configured to determine recommendation information according to the text information and the personalized tag.
Optionally, the information recommending module 205 may specifically include:
the retrieval submodule is used for retrieving to obtain candidate information according to the entity words in the text information;
and the determining submodule is used for determining recommendation information according to the personalized tag and the candidate information.
Optionally, the apparatus may further include:
the intention determining module is used for carrying out intention identification on the text information and determining the recommendation intention of the user;
the information recommendation module comprises:
the first generation submodule is used for generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
the second generation submodule is used for splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
the third generation submodule is used for generating a fourth vector according to the candidate information corresponding to the text information;
and the information determination submodule is used for inputting the user characteristic vector and the fourth vector into a recommendation model so as to output recommendation information through the recommendation model.
Optionally, the apparatus may further include:
the slot position acquisition module is used for acquiring slot position information corresponding to the recommendation intention;
the slot filling module is used for filling slots in the text information according to the slot information to obtain slot values corresponding to the slot information in the text information;
the first vector determination submodule is specifically configured to generate a first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.
Optionally, the tag determination module is specifically configured to input the speech feature into a classification model, so as to output a personalized tag corresponding to the speech feature through the classification model; and the classification model is obtained by training according to the voice sample data marked with the personalized label.
Optionally, the apparatus may further include:
the system comprises a sample acquisition module, a data processing module and a data processing module, wherein the sample acquisition module is used for acquiring voice sample data which is marked with a corresponding personalized tag;
the characteristic acquisition module is used for extracting the characteristics of the voice sample data to obtain the voice sample characteristics corresponding to the voice sample data;
the characteristic training module is used for inputting the characteristics of the voice samples into an initial classification model for training;
and the parameter adjusting module is used for adjusting the model parameters of the initial classification model in the training process until the personalized label output by the initial classification model is matched with the personalized label labeled by the voice sample data to obtain the trained classification model.
Optionally, the voice acquiring module is specifically configured to acquire voice information input, sent or received by the user through the instant messaging application.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides a device for recommendation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least any one of the following: gender, mood, health condition, and the region to which the user belongs; and determining recommendation information according to the text information and the personalized tag.
FIG. 3 is a block diagram illustrating an apparatus 800 for recommending according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
FIG. 4 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on the server. Further, the central processor 1922 may be configured to communicate with the storage medium 1930 to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the recommendation method shown in FIG. 1.
A non-transitory computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a recommendation method, the method comprising: acquiring voice information input by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; performing feature extraction on the voice information to obtain voice features corresponding to the voice information; determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin; and determining recommendation information according to the text information and the personalized tag.
An embodiment of the present invention discloses A1, a recommendation method, the method comprising:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and determining recommendation information according to the text information and the personalized tag (an end-to-end sketch of these steps follows below).
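The four steps of A1 compose into a single pipeline. The sketch below is a minimal, non-authoritative illustration: `recognize_speech`, `classify_tag`, and `recommend` are hypothetical stand-ins for the ASR, classification, and recommendation models (the disclosure does not fix particular implementations), and MFCCs via librosa are assumed as one plausible acoustic feature.

```python
import numpy as np
import librosa

def recognize_speech(wav_path: str) -> str:
    # Hypothetical stand-in for a real ASR model.
    return "play some relaxing jazz"

def extract_voice_features(wav_path: str) -> np.ndarray:
    # MFCCs pooled over time into a fixed-size vector; one common
    # choice of acoustic feature, assumed here for illustration.
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

def classify_tag(features: np.ndarray) -> str:
    # Hypothetical stand-in for the trained classification model (A5/A6).
    return "female"

def recommend(text: str, tag: str) -> list:
    # Hypothetical stand-in for the recommendation model (A2/A3).
    return [f"item matching '{text}' for tag '{tag}'"]

def recommend_from_voice(wav_path: str) -> list:
    text = recognize_speech(wav_path)            # voice -> text
    features = extract_voice_features(wav_path)  # voice -> acoustic features
    tag = classify_tag(features)                 # features -> personalized tag
    return recommend(text, tag)                  # text + tag -> recommendations
```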
A2, the method of A1, wherein the determining recommendation information according to the text information and the personalized tag comprises:
searching to obtain candidate information according to the entity words in the text information;
and determining recommendation information according to the personalized tag and the candidate information (see the retrieval sketch below).
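A minimal sketch of this retrieve-then-filter step, assuming an invented in-memory catalog (`CATALOG`) and treating the personalized tag as a hard filter with a fallback; an actual system would rank candidates with a learned model as in A3.

```python
# Invented illustration data; not from the disclosure.
CATALOG = [
    {"title": "upbeat pop playlist", "entity": "music", "tags": {"happy", "female"}},
    {"title": "soothing piano album", "entity": "music", "tags": {"sad", "tired"}},
    {"title": "comedy sketch audio", "entity": "show", "tags": {"happy"}},
]

def retrieve_candidates(entity_word: str, catalog=CATALOG) -> list:
    # Retrieval keyed on the entity word extracted from the text.
    return [item for item in catalog if item["entity"] == entity_word]

def select_recommendation(entity_word: str, personalized_tag: str) -> list:
    candidates = retrieve_candidates(entity_word)
    # Prefer candidates whose tags match the user's personalized tag,
    # falling back to all candidates when nothing matches.
    matched = [c for c in candidates if personalized_tag in c["tags"]]
    return matched or candidates

print(select_recommendation("music", "happy"))  # -> the upbeat pop playlist
```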
A3, the method of A1, wherein after obtaining the text information corresponding to the voice information, the method further comprises:
performing intention identification on the text information, and determining the recommendation intention of the user;
the determining recommendation information according to the text information and the personalized tag includes:
generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
generating a fourth vector according to the candidate information corresponding to the text information;
inputting the user feature vector and the fourth vector into a recommendation model to output recommendation information through the recommendation model (see the scoring sketch below).
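A sketch of the vector assembly and scoring steps above, assuming arbitrary dimensions and a bilinear-plus-sigmoid scorer in place of the unspecified recommendation model; the "splicing" in the claims corresponds to vector concatenation.

```python
import numpy as np

def build_user_vector(intent_vec, tag_vec, voice_vec):
    # "Splicing" the first, second, and third vectors = concatenation.
    return np.concatenate([intent_vec, tag_vec, voice_vec])

def score(user_vec, candidate_vec, W):
    # Minimal bilinear scorer with a sigmoid; a stand-in for the
    # recommendation model, whose form the disclosure leaves open.
    return 1.0 / (1.0 + np.exp(-(user_vec @ W @ candidate_vec)))

rng = np.random.default_rng(0)
user = build_user_vector(rng.normal(size=8),   # first vector (intention)
                         rng.normal(size=4),   # second vector (tag)
                         rng.normal(size=13))  # third vector (voice features)
candidate = rng.normal(size=16)                # fourth vector (candidate info)
W = rng.normal(size=(user.size, candidate.size))  # learned in practice
print(score(user, candidate, W))               # higher score -> recommend
```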
A4, the method of A3, wherein after the determining of the recommendation intention of the user, the method further comprises:
acquiring slot position information corresponding to the recommendation intention;
performing slot position filling on the text information according to the slot position information to obtain each slot position value corresponding to the slot position information in the text information;
the generating a first vector corresponding to the recommendation intention comprises:
and generating the first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information (see the slot-filling sketch below).
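A sketch of slot filling and of encoding the slot values into the first vector, assuming an invented slot schema, a naive keyword matcher, and a hashing encoder; real systems typically use sequence labeling for the filling step.

```python
def fill_slots(text: str, slot_schema: dict) -> dict:
    # Naive keyword matcher: take the first schema value found in the
    # text for each slot (sequence labeling would be used in practice).
    return {slot: next((v for v in values if v in text), None)
            for slot, values in slot_schema.items()}

def encode_slots(slot_values: dict, dim: int = 8) -> list:
    # Hash each filled (slot, value) pair into a fixed-size vector.
    vec = [0.0] * dim
    for slot, value in slot_values.items():
        if value is not None:
            vec[hash((slot, value)) % dim] += 1.0
    return vec

schema = {"genre": ["pop", "rock", "jazz"], "mood": ["relaxing", "upbeat"]}
slots = fill_slots("play some relaxing jazz", schema)
# slots == {"genre": "jazz", "mood": "relaxing"}
first_vector = encode_slots(slots)  # encodes the recommendation intention
```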
A5, the method of A1, wherein the determining the personalized tag of the voice information according to the voice features comprises:
inputting the voice features into a classification model to output, through the classification model, the personalized label corresponding to the voice features, wherein the classification model is trained on voice sample data annotated with personalized labels (see the inference sketch below).
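A minimal inference sketch, assuming a scikit-learn SVM already fitted as in A6 below; the description elsewhere also mentions neural-network classifiers, so the SVM is only one plausible choice.

```python
import numpy as np
from sklearn.svm import SVC

def predict_tag(clf: SVC, voice_features: np.ndarray) -> str:
    # voice_features: one pooled feature vector (e.g., 13 MFCC means).
    return str(clf.predict(voice_features.reshape(1, -1))[0])
```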
A6, the method of A5, wherein before the inputting of the voice features into the classification model, the method further comprises:
acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;
performing feature extraction on the voice sample data to obtain voice sample features corresponding to the voice sample data;
inputting the voice sample characteristics into an initial classification model for training;
and, in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model matches the personalized label annotated on the voice sample data, to obtain the trained classification model (see the training sketch below).
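A training sketch under the same assumptions (SVM classifier, MFCC-style features); the toy random data stands in for annotated voice samples, and the "parameter adjustment until outputs match the labels" step is what the fitting routine performs internally.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))                # voice sample features (toy data)
y = rng.choice(["male", "female"], size=100)  # annotated personalized labels

clf = SVC(probability=True)  # initial classification model
clf.fit(X, y)                # adjusts parameters to match the annotations
print(clf.predict(X[:1]))    # trained model: features -> personalized label
```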
A7, the method according to any one of A1 to A6, wherein the obtaining the voice information input by the user comprises:
acquiring voice information input, sent, or received by a user through an instant messaging application.
An embodiment of the present invention discloses B8, a recommendation apparatus, comprising:
the voice acquisition module is used for acquiring voice information input by a user;
the voice recognition module is used for carrying out voice recognition on the voice information to obtain text information corresponding to the voice information;
the feature extraction module is used for extracting features of the voice information to obtain voice features corresponding to the voice information;
a tag determination module, configured to determine, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and the information recommendation module is used for determining recommendation information according to the text information and the personalized tag.
B9, the apparatus of B8, the information recommendation module comprising:
the retrieval submodule is used for searching to obtain candidate information according to the entity words in the text information;
and the determining submodule is used for determining recommendation information according to the personalized tag and the candidate information.
B10, the apparatus of B8, the apparatus further comprising:
the intention determining module is used for carrying out intention identification on the text information and determining the recommendation intention of the user;
the information recommendation module comprises:
the first generation submodule is used for generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
the second generation submodule is used for splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
the third generation submodule is used for generating a fourth vector according to the candidate information corresponding to the text information;
and the information determination submodule is used for inputting the user characteristic vector and the fourth vector into a recommendation model so as to output recommendation information through the recommendation model.
B11, the apparatus of B10, the apparatus further comprising:
the slot position acquisition module is used for acquiring slot position information corresponding to the recommendation intention;
the slot filling module is used for performing slot position filling on the text information according to the slot position information to obtain each slot position value corresponding to the slot position information in the text information;
the first generation submodule is specifically configured to generate the first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.
B12, the apparatus according to B8, wherein the tag determination module is specifically configured to input the voice features into a classification model to output, through the classification model, the personalized label corresponding to the voice features; and the classification model is trained on voice sample data annotated with personalized labels.
B13, the apparatus according to B12, further comprising:
the system comprises a sample acquisition module, a data processing module and a data processing module, wherein the sample acquisition module is used for acquiring voice sample data which is marked with a corresponding personalized tag;
the characteristic acquisition module is used for extracting the characteristics of the voice sample data to obtain the voice sample characteristics corresponding to the voice sample data;
the characteristic training module is used for inputting the characteristics of the voice samples into an initial classification model for training;
and the parameter adjusting module is used for adjusting the model parameters of the initial classification model in the training process until the personalized label output by the initial classification model matches the personalized label annotated on the voice sample data, to obtain the trained classification model.
B14, the apparatus according to any one of B8 to B13, wherein the voice acquisition module is specifically configured to acquire voice information input, sent, or received by a user through an instant messaging application.
An embodiment of the present invention discloses C15, an apparatus for recommendation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and determining recommendation information according to the text information and the personalized tag.
C16, the apparatus according to C15, wherein the determining recommendation information according to the text information and the personalized tag comprises:
searching to obtain candidate information according to the entity words in the text information;
and determining recommendation information according to the personalized tag and the candidate information.
C17, the apparatus of C15, wherein the one or more programs further comprise instructions executable by the one or more processors for:
performing intention identification on the text information, and determining the recommendation intention of the user;
the determining recommendation information according to the text information and the personalized tag includes:
generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
generating a fourth vector according to the candidate information corresponding to the text information;
inputting the user feature vector and the fourth vector into a recommendation model to output recommendation information through the recommendation model.
C18, the apparatus of C17, wherein the one or more programs further comprise instructions executable by the one or more processors for:
acquiring slot position information corresponding to the recommendation intention;
performing slot position filling on the text information according to the slot position information to obtain each slot position value corresponding to the slot position information in the text information;
the generating a first vector corresponding to the recommendation intention comprises:
and generating a first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.
C19, the apparatus according to C15, wherein the determining the personalized tag of the voice information according to the voice features comprises:
inputting the voice features into a classification model to output, through the classification model, the personalized label corresponding to the voice features; and the classification model is trained on voice sample data annotated with personalized labels.
C20, the apparatus of C19, wherein the one or more programs further comprise instructions executable by the one or more processors for:
acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;
performing feature extraction on the voice sample data to obtain voice sample features corresponding to the voice sample data;
inputting the voice sample characteristics into an initial classification model for training;
and, in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model matches the personalized label annotated on the voice sample data, to obtain the trained classification model.
C21, the apparatus according to any one of C15 to C20, wherein the acquiring the voice information input by the user comprises:
acquiring voice information input, sent, or received by a user through an instant messaging application.
An embodiment of the present invention discloses D22, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform a recommendation method as described in one or more of A1 to A7.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The recommendation method, the recommendation apparatus, and the device for recommendation provided by the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A recommendation method, characterized in that the method comprises:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and determining recommendation information according to the text information and the personalized tag.
2. The method of claim 1, wherein determining recommendation information based on the text information and the personalized tag comprises:
searching to obtain candidate information according to the entity words in the text information;
and determining recommendation information according to the personalized tag and the candidate information.
3. The method according to claim 1, wherein after obtaining the text information corresponding to the voice information, the method further comprises:
performing intention identification on the text information, and determining the recommendation intention of the user;
the determining recommendation information according to the text information and the personalized tag includes:
generating a first vector corresponding to the recommendation intention, a second vector corresponding to the personalized tag and a third vector corresponding to the voice feature;
splicing the first vector, the second vector and the third vector to obtain a user characteristic vector;
generating a fourth vector according to the candidate information corresponding to the text information;
inputting the user feature vector and the fourth vector into a recommendation model to output recommendation information through the recommendation model.
4. The method of claim 3, wherein after determining the recommendation intent of the user, the method further comprises:
acquiring slot position information corresponding to the recommendation intention;
performing slot position filling on the text information according to the slot position information to obtain each slot position value corresponding to the slot position information in the text information;
the generating a first vector corresponding to the recommendation intention comprises:
and generating a first vector corresponding to the recommendation intention according to each slot position value corresponding to the slot position information in the text information.
5. The method of claim 1, wherein determining the personalized tag of the voice information according to the voice features comprises:
inputting the voice features into a classification model to output, through the classification model, the personalized label corresponding to the voice features; and the classification model is trained on voice sample data annotated with personalized labels.
6. The method of claim 5, wherein prior to inputting the voice features into the classification model, the method further comprises:
acquiring voice sample data, wherein the voice sample data is marked with a corresponding personalized tag;
performing feature extraction on the voice sample data to obtain voice sample features corresponding to the voice sample data;
inputting the voice sample characteristics into an initial classification model for training;
and, in the training process, adjusting the model parameters of the initial classification model until the personalized label output by the initial classification model matches the personalized label annotated on the voice sample data, to obtain the trained classification model.
7. The method according to any one of claims 1 to 6, wherein the acquiring the voice information input by the user comprises:
acquiring voice information input, sent, or received by a user through an instant messaging application.
8. A recommendation apparatus, characterized in that the apparatus comprises:
the voice acquisition module is used for acquiring voice information input by a user;
the voice recognition module is used for carrying out voice recognition on the voice information to obtain text information corresponding to the voice information;
the feature extraction module is used for extracting features of the voice information to obtain voice features corresponding to the voice information;
a tag determination module, configured to determine, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and the information recommendation module is used for determining recommendation information according to the text information and the personalized tag.
9. An apparatus for recommendation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring voice information input by a user;
performing voice recognition on the voice information to obtain text information corresponding to the voice information;
performing feature extraction on the voice information to obtain voice features corresponding to the voice information;
determining, according to the voice features, a personalized tag of the voice information, wherein the personalized tag comprises at least one of the following: gender, mood, health condition, and region of origin;
and determining recommendation information according to the text information and the personalized tag.
10. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the recommendation method of one or more of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911260344.5A CN112948662A (en) | 2019-12-10 | 2019-12-10 | Recommendation method and device and recommendation device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911260344.5A CN112948662A (en) | 2019-12-10 | 2019-12-10 | Recommendation method and device and recommendation device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112948662A true CN112948662A (en) | 2021-06-11 |
Family
ID=76225747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911260344.5A Pending CN112948662A (en) | 2019-12-10 | 2019-12-10 | Recommendation method and device and recommendation device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112948662A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109582822A (en) * | 2018-10-19 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | A kind of music recommended method and device based on user speech |
CN109949071A (en) * | 2019-01-31 | 2019-06-28 | 平安科技(深圳)有限公司 | Products Show method, apparatus, equipment and medium based on voice mood analysis |
CN110136723A (en) * | 2019-04-15 | 2019-08-16 | 深圳壹账通智能科技有限公司 | Data processing method and device based on voice messaging |
CN110473546A (en) * | 2019-07-08 | 2019-11-19 | 华为技术有限公司 | A kind of media file recommendation method and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115187345A (en) * | 2022-09-13 | 2022-10-14 | 深圳装速配科技有限公司 | Intelligent household building material recommendation method, device, equipment and storage medium |
CN116741151A (en) * | 2023-08-14 | 2023-09-12 | 成都筑猎科技有限公司 | User call real-time monitoring system based on call center |
CN116741151B (en) * | 2023-08-14 | 2023-11-07 | 成都筑猎科技有限公司 | User call real-time monitoring system based on call center |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11966986B2 (en) | Multimodal entity and coreference resolution for assistant systems | |
US11315546B2 (en) | Computerized system and method for formatted transcription of multimedia content | |
JP6967059B2 (en) | Methods, devices, servers, computer-readable storage media and computer programs for producing video | |
US11394675B2 (en) | Method and device for commenting on multimedia resource | |
CN111638832A (en) | Information display method, device, system, electronic equipment and storage medium | |
CN111241822A (en) | Emotion discovery and dispersion method and device under input scene | |
CN111259148A (en) | Information processing method, device and storage medium | |
CN110222256B (en) | Information recommendation method and device and information recommendation device | |
CN113806588B (en) | Method and device for searching video | |
US20200044999A1 (en) | Voice forwarding in automated chatting | |
US11847419B2 (en) | Human emotion detection | |
CN109308334B (en) | Information recommendation method and device and search engine system | |
CN112185389A (en) | Voice generation method and device, storage medium and electronic equipment | |
CN109582869A (en) | A kind of data processing method, device and the device for data processing | |
CN110930978A (en) | Language identification method and device and language identification device | |
CN113705210A (en) | Article outline generation method and device for generating article outline | |
CN112948662A (en) | Recommendation method and device and recommendation device | |
Chong et al. | EmoChat: Bringing multimodal emotion detection to mobile conversation | |
CN111274389B (en) | Information processing method, device, computer equipment and storage medium | |
WO2021167732A1 (en) | Implementing automatic chatting during video displaying | |
CN113409766A (en) | Recognition method, device for recognition and voice synthesis method | |
JP2022018724A (en) | Information processing device, information processing method, and information processing program | |
CN115167733A (en) | Method and device for displaying live broadcast resources, electronic equipment and storage medium | |
CN113010768A (en) | Data processing method and device and data processing device | |
CN111971670A (en) | Generating responses in a conversation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||