CN112447179A - Voice interaction method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112447179A
CN112447179A (application CN201910806670.5A)
Authority
CN
China
Prior art keywords
message
voice
voice message
corresponding relation
sender
Prior art date
Legal status
Pending
Application number
CN201910806670.5A
Other languages
Chinese (zh)
Inventor
马建华
李青懋
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority application: CN201910806670.5A
Publication: CN112447179A
Legal status: Pending

Classifications

    • G10L 17/22 Speaker identification or verification: interactive procedures; man-machine interfaces
    • G10L 15/18 Speech recognition: speech classification or search using natural language modelling
    • G10L 15/26 Speech recognition: speech-to-text systems
    • G10L 17/06 Speaker identification or verification: decision making techniques; pattern matching strategies
    • H04R 3/00 Circuits for transducers, loudspeakers or microphones

Abstract

The invention discloses a voice interaction method, apparatus, device, and computer-readable storage medium in the field of communication technologies, aiming to solve the problem that a terminal with a speaker function cannot meet a user's need for efficient communication in certain scenarios. The method comprises the following steps: receiving a voice message from a message sender; identifying the sender's identity information from the voice message; determining a message recipient according to the sender's identity information, the voice message, and a preset user relationship map; and outputting the voice message to the recipient. With the embodiments of the invention, a user can communicate efficiently through a terminal with a speaker function.

Description

Voice interaction method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a voice interaction method, apparatus, device, and computer-readable storage medium.
Background
A smart speaker supports human-machine dialogue based on artificial intelligence, building an intelligent ecosystem for closed-loop scenarios such as the home and the office through voice input, semantic recognition, instruction execution, and the like. Current smart speakers focus on functions such as high-fidelity playback, smart calling, home control, voiceprint-based preference personalization, and everyday question-and-answer dialogue. However, today's smart speaker serves only as a channel for voice input and output, and cannot meet a user's need for efficient communication in certain application scenarios, such as interpersonal interaction.
Disclosure of Invention
The embodiments of the present invention provide a voice interaction method, apparatus, device, and computer-readable storage medium, aiming to solve the problem that a terminal with a speaker function cannot meet a user's need for efficient communication in certain scenarios.
In a first aspect, an embodiment of the present invention provides a voice interaction method, applied to a terminal with a sound box function, including:
receiving a voice message of a message sender;
according to the voice message, identifying the identity information of the message sender;
determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
outputting the voice message to the message recipient.
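The four claimed steps can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the two dictionaries stand in for the first correspondence (voice model to user identity) and the preset user relationship map, and all function and variable names are assumptions.

```python
# Toy stand-in for the first correspondence: voiceprint -> user identity.
VOICEPRINT_TO_USER = {"vp_cc": "Cc", "vp_dd": "Liu Dd"}

# Toy stand-in for the preset user relationship map:
# (identified sender, title used in the message) -> recipient.
RELATION_MAP = {("Cc", "mom"): "Liu Dd", ("Liu Dd", "mom"): "Wang Ab"}

def identify_sender(voiceprint):
    """Step 102: resolve the extracted voiceprint via the first correspondence."""
    return VOICEPRINT_TO_USER[voiceprint]

def determine_recipient(sender, text):
    """Step 103: toy 'semantic recognition' - spot the title the sender uses,
    then resolve it against the relationship map keyed on the sender."""
    for (s, title), recipient in RELATION_MAP.items():
        if s == sender and title in text.lower():
            return recipient
    raise LookupError("no recipient matched")

def route(voiceprint, text):
    """Steps 101-104: receive, identify, determine, output (here: return)."""
    sender = identify_sender(voiceprint)
    return sender, determine_recipient(sender, text)
```

For instance, `route("vp_cc", "Mom, when do you get off work?")` resolves the sender to Cc and the recipient to Liu Dd.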
Wherein, the identifying the identity information of the message sender according to the voice message comprises:
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
Wherein, the identifying the identity information of the message sender according to the voice message comprises:
acquiring information of a terminal used by the message sender for sending the voice message;
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
Wherein, the determining the message receiver according to the identity information of the message sender, the voice message and the preset user relationship map comprises:
converting the voice message into a text message;
performing semantic recognition on the text message based on an NLP (Natural Language Processing) algorithm to obtain a semantic recognition result;
and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
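The semantic-recognition step can be illustrated with a toy parser that turns the (already transcribed) text into a semantic result of the form (addressed title, remaining content). A real system would use a proper NLP pipeline; the title list and the comma-based parsing rule here are assumptions for illustration only.

```python
# Hypothetical list of family titles a message might open with.
TITLES = ("mom", "dad", "grandma", "grandpa")

def semantic_result(text):
    """Return (addressed_title, content), or (None, text) if no title opens
    the message. This stands in for the NLP semantic recognition result."""
    head, _, rest = text.partition(",")
    title = head.strip().lower()
    if title in TITLES:
        return title, rest.strip()
    return None, text
```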
Wherein the outputting the voice message to the message recipient comprises:
acquiring a sound model of the message receiver;
synthesizing the voice message and the sound model to obtain a synthesized voice message;
and broadcasting the synthesized voice message to the message receiver by using the sound box.
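The three output sub-steps can be mirrored in a minimal sketch. The `SynthesizedMessage` type and the `VOICE_MODELS` table are hypothetical; a real implementation would condition a text-to-speech engine on the stored sound model rather than merely pairing the model file with the text as done here.

```python
from dataclasses import dataclass

@dataclass
class SynthesizedMessage:
    text: str
    voice_model: str  # voice model file the speaker would use for playback

# Toy mapping from user identity to a stored sound model file.
VOICE_MODELS = {"Liu Dd": "liu_dd.model", "Cc": "cc.model"}

def synthesize(text, user):
    """Pair the message text with the stored sound model of the given user."""
    return SynthesizedMessage(text=text, voice_model=VOICE_MODELS[user])
```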
Wherein after the obtaining of the synthesized voice message, the method further comprises:
and caching the synthesized voice message.
Wherein, prior to the receiving a voice message of a message sender, the method further comprises at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
In a second aspect, an embodiment of the present invention provides a voice interaction apparatus, applied to a terminal with a speaker function, including:
the receiving module is used for receiving the voice message of the message sender;
the identification module is used for identifying the identity information of the message sender according to the voice message;
the determining module is used for determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
and the output module is used for outputting the voice message to the message receiver.
Wherein the identification module comprises:
the first extraction submodule is used for extracting the voiceprint characteristics of the voice message;
and the first identification submodule is used for identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the identity of the user.
Wherein the identification module comprises:
the first obtaining submodule is used for obtaining the information of a terminal used by the message sender for sending the voice message;
the second extraction submodule is used for extracting the voiceprint characteristics of the voice message;
and the second identification submodule is used for identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relationship, wherein the second corresponding relationship is the corresponding relationship among a voice model, the terminal information and the user identity.
Wherein the determining module comprises:
the conversion submodule is used for converting the voice message into a text message;
the recognition submodule is used for performing semantic recognition on the text message based on an NLP algorithm to obtain a semantic recognition result;
and the determining submodule is used for determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
Wherein the output module comprises:
the obtaining submodule is used for obtaining a sound model of the message receiver;
the synthesis submodule is used for synthesizing the voice message and the sound model to obtain a synthesized voice message;
and the output submodule is used for broadcasting the synthesized voice message to the message receiver by using the sound box.
Wherein the output module further comprises:
and the buffer submodule is used for buffering the synthesized voice message.
Wherein the apparatus further comprises a setting module for performing at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
In a third aspect, an embodiment of the present invention provides a communication device, including: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor;
the processor is configured to read a program in the memory to implement the steps in the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a voice interaction apparatus, applied to a terminal with a speaker function, including: a processor and a transceiver;
the transceiver is used for receiving a voice message of a message sender;
the processor is used for identifying the identity information of the message sender according to the voice message; determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
the transceiver is configured to output the voice message to the message recipient.
Wherein the processor is further configured to extract voiceprint features of the voice message; and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
The processor is further configured to acquire information of a terminal used by the message sender to send the voice message; extracting voiceprint features of the voice message; and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
Wherein the processor is further configured to convert the voice message into a text message; performing semantic recognition on the text message based on a Natural Language Processing (NLP) algorithm to obtain a semantic recognition result; and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
Wherein the processor is further configured to obtain an acoustic model of the message recipient; synthesizing the voice message and the sound model to obtain a synthesized voice message; and broadcasting the synthesized voice message to the message receiver by using the sound box.
Wherein the processor is further configured to buffer the synthesized voice message.
Wherein the processor is further configured to perform at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
In a fifth aspect, the present invention provides a computer-readable storage medium for storing a computer program, which when executed by a processor implements the steps in the method according to the first aspect.
In the embodiment of the invention, the corresponding message receiver can be determined according to the voice message of the message sender, so that the voice message can be directionally output to the message receiver. Therefore, by using the scheme of the embodiment of the invention, the user can utilize the terminal with the sound box function to carry out efficient communication.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of a voice interaction method provided by an embodiment of the invention;
fig. 2 is a structural diagram of an intelligent sound box provided in the embodiment of the present invention;
FIG. 3 is a diagram illustrating a mapping relationship established in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a relationship map in an embodiment of the invention;
FIG. 5 is a block diagram of a voice interaction apparatus according to an embodiment of the present invention;
FIG. 6 is a second block diagram of a voice interaction apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of a communication device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a voice interaction method provided by an embodiment of the present invention, applied to a terminal with a speaker function. Such terminals include, but are not limited to, smart speakers, smartphones, set-top boxes, and televisions; the method of the embodiment of the present invention is applicable to any terminal that has a speaker function. As shown in fig. 1, the voice interaction method includes the following steps:
step 101, receiving a voice message of a message sender.
The message sender can input the voice message using a terminal with a speaker function. For example, the sender can input a voice message through a mobile terminal, or through the smart speaker itself.
And step 102, identifying the identity information of the message sender according to the voice message.
The identity information may be a user name, a user tag, or the like, which identifies the user.
In the embodiment of the present invention, the identity information of the sender of the message can be identified in at least two ways.
In the first way, the voiceprint features of the voice message are extracted first. The identity information of the message sender is then identified according to the voiceprint features and a first correspondence, where the first correspondence is the correspondence between a voice model and a user identity.
In practical applications, the first corresponding relationship may be established in advance to improve processing efficiency. For example, for different users, the users are required to input voice through the smart sound box, and then the corresponding relation between the input voice and the user identity is established.
In the second way, the information of the terminal used by the message sender to send the voice message is acquired first. The voiceprint features of the voice message are then extracted, and the identity information of the message sender is identified according to the voiceprint features, the terminal information, and a second correspondence, where the second correspondence is the correspondence among a voice model, terminal information, and a user identity.
In practical applications, the second correspondence may be established in advance to improve processing efficiency. Each terminal has a unique identifier such as an SN (Serial Number) or a MAC (Media Access Control) address. Different users may be required to input voice through their terminals, after which the correspondence among the input voice, the terminal identifier, and the user identity is established.
And 103, determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map.
In an embodiment of the present invention, the voice message may first be converted into a text message. Semantic recognition is then performed on the text message based on an NLP algorithm to obtain a semantic recognition result, and finally the message recipient is determined according to the sender's identity information, the semantic recognition result, and the user relationship map. In this way, the message recipient can be located accurately.
In order to improve the processing efficiency, in the embodiment of the invention, the user relationship graph can be constructed in advance based on a knowledge graph algorithm. Wherein, the user relationship map records the relationship between different users. For example, taking a family member as an example, in the user relationship map, the relationship between different family members centering on a certain family member can be described.
And 104, outputting the voice message to the message receiver.
Here, if a voice message received from another terminal is to be played through this terminal, the voice model of the message recipient may first be obtained, and the voice message and the voice model synthesized into a synthesized voice message, which is then broadcast to the message recipient through the speaker. If the recipient is not available to listen before playback, the synthesized voice message can be cached and played once the recipient becomes available.
If a voice message is to be sent via a terminal to another terminal, the voice message can be sent via the internet to the message receiver.
In the embodiment of the invention, the corresponding message receiver can be determined according to the voice message of the message sender, so that the voice message can be directionally output to the message receiver. Therefore, by using the scheme of the embodiment of the invention, the user can utilize the terminal with the sound box function to carry out efficient communication.
Hereinafter, taking a smart speaker as an example, a method for implementing voice interaction is described. Fig. 2 is a schematic structural diagram of the smart speaker. In fig. 2, the smart speaker may include:
The smart speaker peripheral module 201: this module broadcasts received voice content and also captures voice input from an external sound source.
A voice input and transcription module 202: the voice input submodule extracts the voice and voiceprint features of the input voice information, and the voice transcription submodule converts the voice information into a text message for NLP (natural language processing).
The relational model matching module 203 recognizes relationship-map roles using the voiceprint features extracted by the voice input submodule. For voice information received from other mobile terminals, it first identifies the message sender and then matches a human voice model according to the identification result.
A voice output and synthesis module 204: the voice synthesis submodule synthesizes the voice message with the human voice model, and the voice output submodule broadcasts the voice message in the real voice of the message sender.
The message queue 205 buffers received or to-be-sent voice messages in time order, processes them one by one, and plays a pending message once its recipient is able to receive or listen to it. Messages are not limited to voice and text; pictures, short videos, and the like may also be presented.
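The behavior of message queue 205 can be sketched as a time-ordered buffer that holds each message until its recipient is present; the class and method names below are assumptions, not the patent's interfaces.

```python
from collections import deque

class MessageQueue:
    """Time-ordered buffer: messages are held until the recipient can listen."""

    def __init__(self):
        self._q = deque()

    def push(self, recipient, message):
        """Buffer a received or to-be-sent message in arrival order."""
        self._q.append((recipient, message))

    def deliver(self, present_user):
        """Pop and return, in order, all queued messages addressed to the user
        who is now able to listen; everything else stays buffered."""
        delivered = []
        remaining = deque()
        while self._q:
            recipient, message = self._q.popleft()
            if recipient == present_user:
                delivered.append(message)
            else:
                remaining.append((recipient, message))
        self._q = remaining
        return delivered
```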
The intelligent scheduling module 206 identifies the recipient of a voice message mainly through NLP natural-language and semantic processing, and completes the relationship-map calculation between the voice message and its recipient.
In fig. 2, the smart speaker and the mobile terminal exchange voice messages through an internet server. The internet server establishes the connection between the smart speaker and the APP mobile client, and forwards or stores service requests. It provides a portal service through a public network IP address, which is built into an ordinary smart speaker; once activated, the speaker registers its ID or SN. After connecting to the server, the APP mobile client can look up the smart speaker's ID or SN, and once the speaker side confirms the authentication, the client is connected to the smart speaker to receive or send messages.
A smart speaker may be connected to multiple APP mobile clients sending messages at arbitrary times; through device identification and semantic recognition, each message can still be played accurately to its intended listener.
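The registration and authentication flow described above can be sketched as follows; the `Server` class and its methods are hypothetical names for illustration, assuming that a client may exchange messages only after the speaker side has confirmed it.

```python
class Server:
    """Toy internet server mediating speakers and APP mobile clients."""

    def __init__(self):
        self.registered = set()   # SNs of activated smart speakers
        self.authorized = set()   # (sn, client) pairs confirmed by the speaker

    def register_speaker(self, sn):
        """An activated speaker registers its ID or SN with the server."""
        self.registered.add(sn)

    def request_connect(self, sn, client):
        """A client looks up a speaker by SN; True means found, not yet authorized."""
        return sn in self.registered

    def confirm(self, sn, client):
        """The speaker side confirms the authentication for this client."""
        self.authorized.add((sn, client))

    def can_message(self, sn, client):
        """Only confirmed clients may receive or send messages."""
        return (sn, client) in self.authorized
```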
Based on this structure, a voice model library is first configured in the smart speaker system in a user-defined way. Second, the smart speaker binds the unique identifier (such as the SN or MAC address) of each accessed mobile terminal device and defines its member role. The mapping between each voice model and its access device is then defined, and a relationship map between the access members is established. Finally, the smart speaker's intelligent message scheduling, pushing, and interaction system is constructed, which: recognizes the voice role of a sound source input on the smart speaker side; transcribes the voice, applies NLP, and identifies message keywords to complete semantic recognition; schedules and pushes messages to the connected mobile terminals; and, on receiving instant messages from a mobile terminal device or account, invokes the corresponding role's voice model for voice playback.
Through the above processing, a multi-party voice-synthesis interaction process is finally realized among family members or a user-defined relationship circle via the smart speaker, meeting the need for efficient and convenient communication in such scenarios.
Hereinafter, each of the above-described processes will be described in detail.
First, a human voice model library can be configured in the smart speaker system in a user-defined way. Representative speech is recorded through a mobile terminal, a computer microphone, or the smart speaker; the voiceprint features of the voice are extracted by a machine learning algorithm; and a label is defined for the resulting voice model file. Taking a family smart speaker as an example, the voice model file labels are mapped to the real names of the family members; that is, a correspondence between family members' real names and voice model files is established.
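The enrollment step above can be illustrated with a toy extractor that averages several feature vectors from one member's recordings into a single "voice model" labeled with the member's real name. The averaging stands in for a real machine-learning voiceprint extractor, and all names are assumptions.

```python
def enroll(samples):
    """Average equal-length feature vectors from one speaker into one model.
    (A toy stand-in for a machine-learning voiceprint extractor.)"""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

# Voice model library: member real name -> voice model (labeled model file).
VOICE_MODEL_LIBRARY = {}

def register_member(name, samples):
    """Label the enrolled voice model with the family member's real name."""
    VOICE_MODEL_LIBRARY[name] = enroll(samples)
```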
In practical applications, the smart speaker can also bind the unique identifier, such as the SN or MAC address, of each accessed mobile terminal device. When a mobile terminal connects to the smart speaker over the network, a secure, authorized connection can be established by scanning the speaker's SN or a verification code. The connected mobile terminal is likewise mapped to a family member's real name. Fig. 3 shows the resulting correspondence among family members, voice models, and mobile terminals.
Second, the roles of family members or a relationship circle are defined, and a relationship map between the members is established. The map can be built by taking any member's role as the central point to establish the main relations, after which an inference algorithm over the knowledge graph automatically generates the remaining relations. The generated relations can then be confirmed manually.
As shown in fig. 4, with member Dd as the center, direct relations to the family members are established through labels (solid lines in the figure); relations among the other members are completed automatically by the knowledge graph algorithm, yielding an overall relationship map of the family (dotted lines in the figure).
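The inference step of fig. 4 can be sketched as composing relation labels over the direct edges anchored on the central member. The tiny composition table below is a toy stand-in for a knowledge-graph inference algorithm, and the relation names are assumptions consistent with the family example.

```python
# Direct relations centered on member "Dd" (the solid lines of fig. 4):
# (center, other) -> how "other" relates to "center".
DIRECT = {
    ("Dd", "Wang Ab"): "mother",   # Wang Ab is Dd's mother
    ("Dd", "Cc"): "daughter",      # Cc is Dd's daughter
}

# Toy composition rule: my daughter's view of my mother is "grandmother".
COMPOSE = {("daughter", "mother"): "grandmother"}

def infer(direct):
    """Add composed edges (the dotted lines) to the direct relation map."""
    full = dict(direct)
    for (a, b), r1 in direct.items():
        for (c, d), r2 in direct.items():
            if a == c and b != d and (r1, r2) in COMPOSE:
                # b is a's r1 and d is a's r2, so d is b's COMPOSE[(r1, r2)].
                full[(b, d)] = COMPOSE[(r1, r2)]
    return full
```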
And finally, constructing an intelligent message scheduling module of the intelligent sound box.
With reference to fig. 2, for the voice message input by the user and received by the smart speaker, the smart speaker performs certain processing and then sends the processed voice message to the corresponding mobile terminal.
An input voice message is finally delivered to its recipient through human voice model recognition, voice transcription, message semantic recognition, calculation of the relationship map between the message sender and receiver, determination of the recipient's identity, and message sending.
Specifically, the smart speaker recognizes the input voice message and determines a corresponding voice model. And converting the voice message into a text message, and performing semantic recognition. And determining a message receiver according to the human voice model, the semantic recognition result and the corresponding relation map. The voice message is then sent to the message recipient.
For example, the voice message input by Cc is: "Mom, did dad make fish soup today, and when do you get off work?" After analysis, the smart speaker matches the voice model to Cc, the relationship map computes this as a daughter-to-mother conversation, the recipient is determined to be her mother, Liu Dd, and the message is pushed to Liu Dd.
The voice message input by Dd is: "Mom, dad has an upset stomach; when you head back after buying groceries, if you have time please pick up some stomach-soothing medicine at the pharmacy." After analysis, the smart speaker matches the voice model to Dd, the relationship map computes this as a daughter-to-mother conversation, the recipient is determined to be Dd's mother, Wang Ab, and the message is pushed to Wang Ab.
In this way, voiceprint recognition of the family members confirms each speaker's identity, so that even when different members use the same form of address (here, both say "mom"), the relationship map ensures the message is accurately pushed to the correct receiving end.
With reference to fig. 2, for the received voice message sent by another mobile terminal, the smart speaker performs certain processing and then plays the voice message to the corresponding user.
The voice message sent by the mobile terminal is played to the corresponding user through the processes of message sender identity recognition, semantic recognition, message sender and receiver relation map calculation, receiver identity determination, voice synthesis and the like.
Specifically, the intelligent sound box identifies the voice message sent by the mobile terminal, identifies the identity of the message sender, and determines the corresponding voice model. And converting the voice message into a text message, and performing semantic recognition. And determining a message receiver according to the human voice model, the semantic recognition result and the corresponding relation map. Then, the voice message and the human voice model are synthesized, and the synthesized voice message is played for the corresponding user.
For example, the voice message sent from the mobile client (Liu Dd) is: "Mom is working overtime today and will be back late; finish your homework and go to bed early". Through processing by the smart speaker, the message sender is determined to be Liu Dd, and semantic recognition through natural language processing determines that the sender speaks in the role of the mother; calculation on the relationship graph determines that the voice message should be delivered to the daughter, Cc. Then, only after Cc wakes up the smart speaker and passes voiceprint recognition is the message broadcast to notify Cc.
In another case, the voice message sent from the mobile client (Liu Dd) is: "I swapped shifts today and will be back late; after Cc finishes her homework, have her rest early". Through processing by the smart speaker, the message sender is determined to be Liu Dd; through semantic recognition by natural language processing and calculation on the relationship graph, the message should be delivered to Wang Ab. Then, only after Wang Ab wakes up the smart speaker and passes voiceprint recognition is the message broadcast to notify Wang Ab, and not the daughter or anyone else.
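The "broadcast only to the right person" behavior in these two examples amounts to caching each synthesized message per receiver and releasing it only when the waking user's voiceprint matches that receiver. A minimal sketch follows; the class and method names are assumptions for illustration.

```python
from collections import defaultdict, deque

class MessageBox:
    """Caches synthesized voice messages per receiver; a message is broadcast
    only when the user who wakes the speaker is voiceprint-matched to it."""
    def __init__(self):
        self._pending = defaultdict(deque)

    def push(self, receiver, synthesized_audio):
        # Cache the synthesized voice message until the receiver picks it up.
        self._pending[receiver].append(synthesized_audio)

    def on_wake(self, recognized_identity):
        # Called after wake-up voiceprint recognition; returns only this
        # user's messages, leaving everyone else's untouched.
        out = list(self._pending[recognized_identity])
        self._pending[recognized_identity].clear()
        return out
```

A user who is not the receiver gets nothing back from `on_wake`, which is exactly the privacy property the examples describe.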
As can be seen from the above description, in the embodiment of the present invention, to implement synthesized voice interaction, the smart speaker identifies and confirms members by voiceprint, searches the relationship graph, and resolves each personalized voice's role within the family or relationship circle. The input voice message is converted to text for semantic recognition and is pushed, through intelligent matching, accurately to the message receiver. Therefore, the scheme of the embodiment of the present invention solves the problem that, when multiple people interactively and randomly input voice commands at the smart speaker side, the receiver of each message cannot be distinguished. Meanwhile, in the embodiment of the present invention, an interpersonal relationship graph is established, and multi-party online or offline interaction is realized through voice-model matching.
The embodiment of the invention also provides a voice interaction device. Referring to fig. 5, fig. 5 is a structural diagram of a voice interaction apparatus according to an embodiment of the present invention. Because the principle of the voice interaction device for solving the problem is similar to the voice interaction method in the embodiment of the present invention, the implementation of the voice interaction device can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 5, the voice interaction apparatus includes:
a receiving module 501, configured to receive a voice message of a message sender;
an identifying module 502, configured to identify, according to the voice message, identity information of the message sender;
a determining module 503, configured to determine a message receiver according to the identity information of the message sender, the voice message, and a preset user relationship map;
an output module 504, configured to output the voice message to the message recipient.
Optionally, the identifying module 502 includes: the first extraction submodule is used for extracting the voiceprint characteristics of the voice message; and the first identification submodule is used for identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the identity of the user.
Optionally, the identifying module 502 includes: the first obtaining submodule is used for obtaining the information of a terminal used by the message sender for sending the voice message; the second extraction submodule is used for extracting the voiceprint characteristics of the voice message; and the second identification submodule is used for identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relationship, wherein the second corresponding relationship is the corresponding relationship among a voice model, the terminal information and the user identity.
Optionally, the determining module 503 includes: the conversion submodule is used for converting the voice message into a text message; the recognition submodule is used for carrying out semantic recognition on the character message based on an NLP algorithm to obtain a semantic recognition result; and the determining submodule is used for determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
Optionally, the output module 504 includes: the obtaining submodule is used for obtaining a sound model of the message receiver; the synthesis submodule is used for synthesizing the voice message and the sound model to obtain a synthesized voice message; and the output submodule is used for broadcasting the synthesized voice message to the message receiver by using the sound box.
Wherein the output module 504 further comprises: and the buffer submodule is used for buffering the synthesized voice message.
Optionally, the apparatus further includes a setting module, configured to perform at least one of the following:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity; and constructing a user relation graph based on a knowledge graph algorithm.
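The setup phase above establishes two correspondences (voice model to user identity, and voice model plus terminal information to user identity) and builds the user relationship graph. A minimal sketch of those data structures, with hypothetical names and a triple-based graph in the spirit of knowledge-graph construction:

```python
class SetupTables:
    """Holds the two correspondences from the setup phase:
    first:  voice model id -> user identity
    second: (voice model id, terminal id) -> user identity"""
    def __init__(self):
        self.first = {}
        self.second = {}

    def enroll(self, model_id, identity, terminal_id=None):
        self.first[model_id] = identity
        if terminal_id is not None:
            self.second[(model_id, terminal_id)] = identity

    def lookup(self, model_id, terminal_id=None):
        # Prefer the second correspondence when terminal info is available.
        if terminal_id is not None and (model_id, terminal_id) in self.second:
            return self.second[(model_id, terminal_id)]
        return self.first.get(model_id)

def build_relation_graph(triples):
    """User relationship graph from (subject, relation, object) triples."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, {})[rel] = obj
    return graph
```

A production system would back these tables with persistent storage and a real knowledge-graph engine; the lookup order (terminal-qualified match first, voice model alone as fallback) is the point of the sketch.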
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides a voice interaction device. Referring to fig. 6, fig. 6 is a structural diagram of a voice interaction apparatus according to an embodiment of the present invention. Because the principle of the voice interaction device for solving the problem is similar to the voice interaction method in the embodiment of the present invention, the implementation of the voice interaction device can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 6, the voice interaction apparatus includes: a processor 601 and a transceiver 602.
Wherein the transceiver 602 is configured to receive a voice message of a message sender;
the processor 601 is configured to identify, according to the voice message, identity information of the message sender; determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
the transceiver 602 is configured to output the voice message to the message recipient.
Wherein the processor 601 is further configured to extract a voiceprint feature of the voice message; and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
The processor 601 is further configured to obtain information of a terminal used by the message sender to send the voice message; extracting voiceprint features of the voice message; and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
Wherein, the processor 601 is further configured to convert the voice message into a text message; performing semantic recognition on the text message based on a Natural Language Processing (NLP) algorithm to obtain a semantic recognition result; and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
Wherein, the processor 601 is further configured to obtain an acoustic model of the message recipient; synthesizing the voice message and the sound model to obtain a synthesized voice message; and broadcasting the synthesized voice message to the message receiver by using the sound box.
Wherein the processor 601 is further configured to buffer the synthesized voice message.
Wherein the processor 601 is further configured to perform at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
As shown in fig. 7, the communication device according to the embodiment of the present invention is applied to a terminal having a speaker function, and includes:
the processor 700, which is used to read the program in the memory 720, executes the following processes:
receiving a voice message of a message sender through the transceiver 710; according to the voice message, identifying the identity information of the message sender; determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map; outputting the voice message to the message recipient.
A transceiver 710 for receiving and transmitting data under the control of the processor 700.
In fig. 7, the bus architecture may include any number of interconnected buses and bridges linking together various circuits, specifically one or more processors represented by processor 700 and memory represented by memory 720. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface. The transceiver 710 may comprise a number of elements, including a transmitter and a receiver, that provide a means for communicating with various other apparatus over a transmission medium. For different user devices, the user interface 730 may also be an interface capable of connecting the required equipment, including but not limited to a keypad, display, speaker, microphone, joystick, etc.
The processor 700 is responsible for managing the bus architecture and general processing, and the memory 720 may store data used by the processor 700 in performing operations.
The processor 700 is further configured to read the computer program and perform the following steps:
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
The processor 700 is further configured to read the computer program and perform the following steps:
acquiring information of a terminal used by the message sender for sending the voice message;
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
The processor 700 is further configured to read the computer program and perform the following steps:
converting the voice message into a text message;
performing semantic recognition on the text message based on a Natural Language Processing (NLP) algorithm to obtain a semantic recognition result;
and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
The processor 700 is further configured to read the computer program and perform the following steps:
acquiring a sound model of the message receiver;
synthesizing the voice message and the sound model to obtain a synthesized voice message;
and broadcasting the synthesized voice message to the message receiver by using the sound box.
The processor 700 is further configured to read the computer program and perform the following steps:
and caching the synthesized voice message.
The processor 700 is further configured to read the computer program and perform the following steps:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
Furthermore, a computer-readable storage medium of an embodiment of the present invention stores a computer program executable by a processor to implement:
receiving a voice message of a message sender;
according to the voice message, identifying the identity information of the message sender;
determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
outputting the voice message to the message recipient.
Wherein, the identifying the identity information of the message sender according to the voice message comprises:
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
Wherein, the identifying the identity information of the message sender according to the voice message comprises:
acquiring information of a terminal used by the message sender for sending the voice message;
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
Wherein, the determining the message receiver according to the identity information of the message sender, the voice message and the preset user relationship map comprises:
converting the voice message into a text message;
performing semantic recognition on the text message based on a Natural Language Processing (NLP) algorithm to obtain a semantic recognition result;
and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
Wherein the outputting the voice message to the message recipient comprises:
acquiring a sound model of the message receiver;
synthesizing the voice message and the sound model to obtain a synthesized voice message;
and broadcasting the synthesized voice message to the message receiver by using the sound box.
Wherein after the obtaining of the synthesized voice message, the method further comprises:
and caching the synthesized voice message.
Wherein, prior to receiving the voice message of the message sender, the method further comprises at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity; and constructing a user relation graph based on a knowledge graph algorithm.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A voice interaction method, applied to a terminal having a speaker function, characterized by comprising the following steps:
receiving a voice message of a message sender;
according to the voice message, identifying the identity information of the message sender;
determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
outputting the voice message to the message recipient.
2. The method of claim 1, wherein the identifying identity information of the sender of the message based on the voice message comprises:
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics and a first corresponding relation, wherein the first corresponding relation is the corresponding relation between a voice model and the user identity.
3. The method of claim 1, wherein the identifying identity information of the sender of the message based on the voice message comprises:
acquiring information of a terminal used by the message sender for sending the voice message;
extracting voiceprint features of the voice message;
and identifying the identity information of the message sender according to the voiceprint characteristics, the information of the terminal and a second corresponding relation, wherein the second corresponding relation is the corresponding relation among a voice model, the terminal information and the user identity.
4. The method of claim 1, wherein determining a message recipient according to the identity information of the message sender, the voice message and a preset user relationship map comprises:
converting the voice message into a text message;
performing semantic recognition on the text message based on a Natural Language Processing (NLP) algorithm to obtain a semantic recognition result;
and determining a message receiver according to the identity information of the message sender, the semantic recognition result and the user relationship map.
5. The method of claim 1, wherein outputting the voice message to the message recipient comprises:
acquiring a sound model of the message receiver;
synthesizing the voice message and the sound model to obtain a synthesized voice message;
and broadcasting the synthesized voice message to the message receiver by using the sound box.
6. The method of claim 5, wherein after the obtaining the synthesized voice message, the method further comprises:
and caching the synthesized voice message.
7. The method of claim 1, wherein, prior to receiving the voice message of the message sender, the method further comprises at least one of:
establishing a first corresponding relation or a second corresponding relation, wherein the first corresponding relation is a corresponding relation between a voice model and a user identity, and the second corresponding relation is a corresponding relation between the voice model, terminal information and the user identity;
and constructing a user relation graph based on a knowledge graph algorithm.
8. A voice interaction apparatus, applied to a terminal having a speaker function, characterized by comprising:
the receiving module is used for receiving the voice message of the message sender;
the identification module is used for identifying the identity information of the message sender according to the voice message;
the determining module is used for determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
and the output module is used for outputting the voice message to the message receiver.
9. A communication device, comprising: a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor; characterized in that,
the processor for reading the program in the memory to implement the steps in the method of any one of claims 1 to 7.
10. A voice interaction apparatus, applied to a terminal having a speaker function, characterized by comprising: a processor and a transceiver;
the transceiver is used for receiving a voice message of a message sender;
the processor is used for identifying the identity information of the message sender according to the voice message; determining a message receiver according to the identity information of the message sender, the voice message and a preset user relation map;
the transceiver is configured to output the voice message to the message recipient.
11. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the method of any one of claims 1 to 7.
CN201910806670.5A 2019-08-29 2019-08-29 Voice interaction method, device, equipment and computer readable storage medium Pending CN112447179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806670.5A CN112447179A (en) 2019-08-29 2019-08-29 Voice interaction method, device, equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112447179A true CN112447179A (en) 2021-03-05

Family

ID=74740740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806670.5A Pending CN112447179A (en) 2019-08-29 2019-08-29 Voice interaction method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112447179A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436625A (en) * 2021-06-25 2021-09-24 安徽淘云科技股份有限公司 Man-machine interaction method and related equipment thereof
CN114124605A (en) * 2021-11-25 2022-03-01 珠海格力电器股份有限公司 Control method of smart home, smart home equipment, nonvolatile storage medium and processor

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010091677A (en) * 2000-03-17 2001-10-23 최승현 selective on-line interactive system using voice synthesis and method therefore
US6507643B1 (en) * 2000-03-16 2003-01-14 Breveon Incorporated Speech recognition system and method for converting voice mail messages to electronic mail messages
US20180013718A1 (en) * 2015-11-17 2018-01-11 Tencent Technology (Shenzhen) Company Limited Account adding method, terminal, server, and computer storage medium
CN107770047A (en) * 2017-10-12 2018-03-06 上海斐讯数据通信技术有限公司 Intelligent sound box, the system and method for realizing based on intelligent sound box social functions
CN109379499A (en) * 2018-11-20 2019-02-22 北京千丁互联科技有限公司 A kind of voice call method and device
CN110866410A (en) * 2019-11-15 2020-03-06 深圳市赛为智能股份有限公司 Multi-language conversion method, device, computer equipment and storage medium
CN114495921A (en) * 2020-11-11 2022-05-13 上海擎感智能科技有限公司 Voice processing method and device and computer storage medium
CN116052666A (en) * 2023-02-21 2023-05-02 之江实验室 Voice message processing method, device, system, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination