CN111312247A - Voice interaction method and device - Google Patents

Voice interaction method and device Download PDF

Info

Publication number
CN111312247A
Authority
CN
China
Prior art keywords
user
query
search result
voice
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010101353.6A
Other languages
Chinese (zh)
Inventor
侯柏岑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010101353.6A
Publication of CN111312247A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/24 - Speech recognition using non-acoustical features
    • G10L 15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L 25/54 - Speech or voice analysis techniques specially adapted for comparison or discrimination for retrieval
    • G10L 25/63 - Speech or voice analysis techniques specially adapted for comparison or discrimination for estimating an emotional state
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application disclose a voice interaction method and device. One embodiment of the method comprises: determining query text and a query intent based on a query voice input by a user; searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user; determining emotion information of the user based on the primary search result; and correcting the primary search result based on the emotion information to obtain a secondary search result, and replying the secondary search result to the user. In this embodiment, active correction and an active reply are performed based on the user's emotion information, which reduces the user's voice interaction cost and improves the accuracy with which the user obtains information.

Description

Voice interaction method and device
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a voice interaction method and device.
Background
With the rapid development of artificial intelligence, applications based on artificial intelligence have emerged in large numbers and bring great convenience to people's lives. For example, a conversational artificial intelligence secretary integrates speech recognition, natural language processing, and machine learning: a user can communicate with it one-on-one using voice, text, or pictures, and the conversational artificial intelligence secretary can understand the user's various needs during the conversation and then provide high-quality services on the basis of a broad index of real-world services and information.
In the voice interaction process of a conventional conversational artificial intelligence secretary, the query voice is processed only when the user inputs it, and the processing result is replied to the user. However, the voice interaction process involves converting speech to text and text to speech, and errors inevitably occur during conversion, so the search result may not meet the user's needs. In that case, the user has to input the query voice again to trigger a second search.
Disclosure of Invention
The embodiment of the application provides a voice interaction method and device.
In a first aspect, an embodiment of the present application provides a voice interaction method, including: determining query text and a query intent based on a query voice input by a user; searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user; determining emotion information of the user based on the primary search result; and correcting the primary search result based on the emotion information to obtain a secondary search result, and replying the secondary search result to the user.
In some embodiments, determining the emotion information of the user based on the primary search result comprises: collecting a facial image of the user; extracting expression response information of the user from the facial image; and determining the emotion information of the user based on the expression response information.
In some embodiments, determining the emotion information of the user based on the primary search result comprises: collecting a response voice of the user; extracting voice response information of the user from the response voice, wherein the voice response information comprises at least one of the following: silence duration, voice content, voice tone; and determining the emotion information of the user based on the voice response information.
In some embodiments, correcting the primary search result based on the emotion information to obtain the secondary search result comprises: performing text correction on the query text to obtain corrected text; and searching based on the corrected text and the query intent to obtain the secondary search result.
In some embodiments, correcting the primary search result based on the emotion information to obtain the secondary search result comprises: performing intent correction on the query intent to obtain a corrected intent; and searching based on the query text and the corrected intent to obtain the secondary search result.
In a second aspect, an embodiment of the present application provides a voice interaction apparatus, including: a first determination unit configured to determine query text and a query intent based on a query voice input by a user; a search unit configured to search based on the query text and the query intent, obtain a primary search result, and reply the primary search result to the user; a second determination unit configured to determine emotion information of the user based on the primary search result; and a correction unit configured to correct the primary search result based on the emotion information, obtain a secondary search result, and reply the secondary search result to the user.
In some embodiments, the second determination unit comprises: a first acquisition subunit configured to acquire a facial image of the user; a first extraction subunit configured to extract expression response information of the user from the facial image; and a first determining subunit configured to determine the emotion information of the user based on the expression response information.
In some embodiments, the second determination unit comprises: a second acquisition subunit configured to acquire a response voice of the user; a second extraction subunit configured to extract voice response information of the user from the response voice, wherein the voice response information includes at least one of: silence duration, voice content, voice tone; and a second determining subunit configured to determine the emotion information of the user based on the voice response information.
In some embodiments, the correction unit comprises: a first correction subunit configured to perform text correction on the query text to obtain corrected text; and a first search subunit configured to search based on the corrected text and the query intent to obtain the secondary search result.
In some embodiments, the correction unit comprises: a second correction subunit configured to perform intent correction on the query intent to obtain a corrected intent; and a second search subunit configured to search based on the query text and the corrected intent to obtain the secondary search result.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the voice interaction method and apparatus provided by the embodiments of the present application, query text and a query intent are first determined based on a query voice input by a user; a search is then performed based on the query text and the query intent to obtain a primary search result, which is replied to the user; emotion information of the user is then determined based on the primary search result; and finally the primary search result is corrected based on the emotion information to obtain a secondary search result, which is replied to the user. Active correction and an active reply based on the user's emotion information reduce the user's voice interaction cost and improve the user's satisfaction in obtaining information.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a voice interaction method according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a voice interaction method according to the present application;
FIG. 4 is a flow diagram of another embodiment of a voice interaction method according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of a voice interaction apparatus according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other as long as there is no conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the voice interaction method or voice interaction apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a conversational artificial intelligence secretary or the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting voice interaction, including but not limited to smart phones, tablet computers, smart speakers, wearable devices, and so on. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited here.
The server 105 may be a server providing various services, such as a backend server of a search engine, which may analyze and process data such as query text and query intention received from the terminal devices 101, 102, 103, and feed back processing results (e.g., secondary search results) to the terminal devices 101, 102, 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed cluster composed of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
It should be noted that the voice interaction method provided in the embodiment of the present application is generally executed by the terminal devices 101, 102, and 103, and accordingly, the voice interaction apparatus is generally disposed in the terminal devices 101, 102, and 103.
With continued reference to FIG. 2, a flow 200 of one embodiment of a voice interaction method according to the present application is shown. The voice interaction method comprises the following steps:
Step 201, determining query text and a query intent based on a query voice input by a user.
In this embodiment, an execution body of the voice interaction method (e.g., the terminal devices 101, 102, 103 shown in fig. 1) may collect a query voice input by a user and determine query text and a query intent based on it.
Generally, the execution body has a voice assistant function. Specifically, the execution body may be a terminal device equipped with a voice input module such as a microphone, for example, a smart phone, a tablet computer, a smart speaker, or a wearable device. The user speaks the query voice to the execution body; the voice input module collects the query voice, performs speech recognition on it, and converts it into written-language content, namely the query text. The intent may be a search result type, including but not limited to encyclopedia, video, game, image, and the like. If the query text contains words that clearly express an intent, the execution body may determine the query intent from those words. If it does not, the query text may correspond to at least one candidate intent, and the execution body may select one of them as the query intent. For example, the execution body may choose one candidate intent at random, or it may choose the candidate intent most frequently searched by users.
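Purely as an illustration of this step, the following Python sketch shows one possible way to pick an intent from recognized text; the speech_to_text recognizer, the keyword table, and the intent_popularity statistics are hypothetical names and are not defined by this application.

    INTENT_KEYWORDS = {
        "encyclopedia": ["who is", "what is"],
        "video": ["play", "episode", "watch"],
        "game": ["game"],
        "image": ["picture", "photo"],
    }

    def determine_query(query_speech, speech_to_text, intent_popularity):
        """Convert the query speech to query text and choose a query intent."""
        query_text = speech_to_text(query_speech)   # speech recognition step
        lowered = query_text.lower()
        # If the text explicitly expresses an intent, use that intent directly.
        for intent, keywords in INTENT_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                return query_text, intent
        # Otherwise pick one candidate intent, e.g. the one most users search for.
        candidates = list(INTENT_KEYWORDS)
        return query_text, max(candidates, key=lambda i: intent_popularity.get(i, 0))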
Step 202, searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user.
In this embodiment, the execution body may perform a search based on the query text and the query intent, obtain a primary search result, and reply the primary search result to the user. Specifically, the execution body may send the query text to a backend server of a search engine (e.g., the server 105 shown in fig. 1), and the backend server may search the internet with the query text to obtain search results corresponding to the query text. The execution body can then select, from those search results, a result that satisfies the query intent and play or display it to the user as the primary search result.
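As a rough sketch only (the backend_search call and the "type" field are assumptions, not names given by this application), the selection of a result matching the intent could look as follows:

    def primary_search(query_text, query_intent, backend_search):
        """Search with the query text and keep the first result that matches the intent."""
        results = backend_search(query_text)            # request to the search backend
        for result in results:
            if result.get("type") == query_intent:      # result satisfying the query intent
                return result                           # played or displayed to the user
        return results[0] if results else None          # fall back to the top result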
Step 203, determining emotion information of the user based on the primary search result.
In this embodiment, the execution body may determine the emotion information of the user based on the primary search result. Specifically, while the primary search result is being played or displayed, the execution body may collect information that expresses the user's emotion and analyze it to determine the emotion information. The information expressing the user's emotion may include, but is not limited to, a facial image of the user, a response voice of the user, and the like.
In some optional implementations of this embodiment, the execution body may capture a facial image of the user, extract expression response information of the user from the facial image, and determine the emotion information of the user based on the expression response information. The expression response information may include, but is not limited to, frowning, turning the face away, rolling the eyes, and the like.
In some optional implementations of this embodiment, the execution body may collect a response voice of the user, extract voice response information of the user from the response voice, and determine the emotion information of the user based on the voice response information. The voice response information may include, but is not limited to, silence duration, voice content, voice tone, and the like.
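The following minimal sketch combines both optional implementations; detect_expression and analyze_response are assumed analyzers, and the expression labels and thresholds are illustrative values only, not specified by this application.

    def determine_emotion(face_image=None, response_audio=None,
                          detect_expression=None, analyze_response=None):
        """Return 'dissatisfied' or 'satisfied' from expression and/or voice response cues."""
        if face_image is not None and detect_expression is not None:
            expression = detect_expression(face_image)          # e.g. an expression classifier
            if expression in {"frown", "face_turned_away", "eye_roll"}:
                return "dissatisfied"
        if response_audio is not None and analyze_response is not None:
            silence_s, content, tone = analyze_response(response_audio)
            if silence_s > 5.0 or "not" in content.lower() or tone == "annoyed":
                return "dissatisfied"
        return "satisfied"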
Step 204, correcting the primary search result based on the emotion information to obtain a secondary search result, and replying the secondary search result to the user.
In this embodiment, the execution body may correct the primary search result based on the emotion information, obtain a secondary search result, and reply the secondary search result to the user. Specifically, the execution body may determine, based on the emotion information, whether the user is satisfied with the primary search result. If the user is not satisfied, the execution body may obtain a search result that is related to the query text or the query intent but different from the primary search result, and play or display it to the user as the secondary search result.
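A simple sketch of this decision, under the assumption that the emotion information has already been reduced to a satisfied/dissatisfied label and that backend_search is available, might be:

    def secondary_search(primary_result, query_text, emotion, backend_search):
        """If the user seems dissatisfied, return a related but different result."""
        if emotion != "dissatisfied":
            return None                              # the primary result was accepted
        for result in backend_search(query_text):    # results related to the query text
            if result != primary_result:             # must differ from the primary result
                return result                        # replied to the user as the secondary result
        return None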
In the voice interaction method provided by the embodiment of the present application, query text and a query intent are first determined based on a query voice input by a user; a search is then performed based on the query text and the query intent to obtain a primary search result, which is replied to the user; emotion information of the user is then determined based on the primary search result; and finally the primary search result is corrected based on the emotion information to obtain a secondary search result, which is replied to the user. Active correction and an active reply based on the user's emotion information reduce the user's voice interaction cost and improve the user's satisfaction in obtaining information.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a voice interaction method according to the present application is shown. The voice interaction method comprises the following steps:
Step 301, determining query text and a query intent based on a query voice input by a user.
Step 302, searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user.
Step 303, determining emotion information of the user based on the primary search result.
In this embodiment, the specific operations of steps 301-303 are substantially the same as those of steps 201-203 in the embodiment shown in fig. 2 and are not repeated here.
Step 304, performing text correction on the query text to obtain corrected text.
In this embodiment, the execution body of the voice interaction method (for example, the terminal devices 101, 102, 103 shown in fig. 1) may perform text correction on the query text to obtain corrected text. Specifically, the execution body may correct the query text into text with a similar pronunciation, which serves as the corrected text.
Step 305, searching based on the corrected text and the query intent to obtain a secondary search result, and replying the secondary search result to the user.
In this embodiment, the execution body may perform a search based on the corrected text and the query intent, obtain a secondary search result, and reply the secondary search result to the user. Specifically, the execution body may send the corrected text to a backend server of a search engine (e.g., the server 105 shown in fig. 1), and the backend server may search the internet with the corrected text to obtain search results corresponding to it. The execution body can then select, from those results, a result that satisfies the query intent and play or display it to the user as the secondary search result.
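One possible sketch of steps 304-305 is given below; the to_phonetic converter (e.g. a pinyin transcriber for Chinese queries), the candidate lexicon, and the backend_search call are assumptions rather than elements defined by this application.

    import difflib

    def correct_text_and_search(query_text, query_intent, to_phonetic, lexicon, backend_search):
        """Correct the query text to a similarly pronounced candidate, then search again."""
        target = to_phonetic(query_text)                 # phonetic form of the original text
        corrected_text = max(
            lexicon,
            key=lambda word: difflib.SequenceMatcher(None, to_phonetic(word), target).ratio(),
        )
        results = backend_search(corrected_text)         # secondary search with corrected text
        return corrected_text, next(
            (r for r in results if r.get("type") == query_intent), None
        )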
For ease of understanding, an application scenario of the voice interaction method is described below.
First, the user speaks a query voice to the smart speaker: Who is Wilde?
Then, the smart speaker converts the query voice into query text: Who is Mardel?
Next, the smart speaker searches based on the query text and plays a voice reply that does not answer the question: You can ask me which license-plate tail numbers are restricted today.
The smart speaker then captures the user's frown through its camera.
Next, the smart speaker corrects the query text to corrected text: Who is Wilde?
Finally, the smart speaker searches based on the corrected text and plays a voice reply to the user: Did you mean Wilde? I know of one: Oscar Wilde, a great writer and artist in 19th-century Britain ...
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the voice interaction method in this embodiment highlights the correction step. The scheme described in this embodiment therefore corrects the query text based on the emotion information of the user, performs a secondary search based on the corrected text, and actively replies to the user. By correcting the query text and initiating the search again, the user does not need to input the query voice a second time, which reduces the user's voice interaction cost and improves the user's satisfaction in obtaining information.
With further reference to FIG. 4, a flow 400 of another embodiment of a voice interaction method according to the present application is shown. The voice interaction method comprises the following steps:
Step 401, determining query text and a query intent based on a query voice input by a user.
Step 402, searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user.
Step 403, determining emotion information of the user based on the primary search result.
In this embodiment, the specific operations of steps 401-403 are substantially the same as those of steps 201-203 in the embodiment shown in fig. 2 and are not repeated here.
Step 404, performing intent correction on the query intent to obtain a corrected intent.
In this embodiment, the execution body of the voice interaction method (for example, the terminal devices 101, 102, 103 shown in fig. 1) may perform intent correction on the query intent to obtain a corrected intent. Specifically, the execution body may select, from the at least one candidate intent corresponding to the query text, an intent different from the query intent as the corrected intent.
Step 405, searching based on the query text and the corrected intent to obtain a secondary search result, and replying the secondary search result to the user.
In this embodiment, the execution body may perform a search based on the query text and the corrected intent, obtain a secondary search result, and reply the secondary search result to the user. Specifically, the execution body may select, from the search results corresponding to the query text, a result that satisfies the corrected intent and play or display it to the user as the secondary search result.
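A minimal sketch of steps 404-405, with the candidate intent list and the backend_search call as assumptions, could look like this:

    def correct_intent_and_search(query_text, query_intent, candidate_intents, backend_search):
        """Pick a candidate intent that differs from the original one and re-filter the results."""
        results = backend_search(query_text)
        for intent in candidate_intents:
            if intent == query_intent:
                continue                              # the corrected intent must differ
            for result in results:
                if result.get("type") == intent:
                    return intent, result             # secondary search result for the user
        return query_intent, None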
For ease of understanding, an application scenario of the voice interaction method is described below.
First, the user speaks a query voice to the smart speaker: Yin Yang Shi.
Then, the smart speaker converts the query voice into query text: Yin Yang Shi.
Next, the smart speaker searches based on the query text and plays a voice reply to the user: I found the TV drama "Yin Yang Shi" for you; you can tell me to play episode 1.
Then, the smart speaker collects the user's response voice through the microphone: That is not what I want.
The smart speaker then revises the query intent to a corrected intent: game.
Finally, the smart speaker plays a voice reply to the user: I also found the game "Yin Yang Shi" for you.
Alternatively, if the smart speaker captures the user frowning through its camera, it may revise the query intent to a different corrected intent: encyclopedia, and play a voice reply to the user: I also know that a Yin Yang Shi is a practitioner who masters the way of yin and yang, reads the stars and people's faces, and can also tell directions, foretell fortune and misfortune, draw talismans, and chant incantations.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the voice interaction method in this embodiment highlights the correction step. The scheme described in this embodiment therefore corrects the query intent based on the emotion information of the user, performs a secondary search based on the corrected intent, and actively replies to the user. By re-interpreting the intent of the query voice and initiating the search again, the user does not need to input the query voice a second time, which reduces the user's voice interaction cost and improves the user's satisfaction in obtaining information.
With further reference to fig. 5, as an implementation of the method shown in the above-mentioned figures, the present application provides an embodiment of a voice interaction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 5, the voice interaction apparatus 500 of the present embodiment may include: a first determination unit 501, a search unit 502, a second determination unit 503, and a correction unit 504. The first determination unit 501 is configured to determine query text and a query intent based on a query voice input by a user; the search unit 502 is configured to search based on the query text and the query intent, obtain a primary search result, and reply the primary search result to the user; the second determination unit 503 is configured to determine emotion information of the user based on the primary search result; and the correction unit 504 is configured to correct the primary search result based on the emotion information, obtain a secondary search result, and reply the secondary search result to the user.
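Purely for illustration, the four units could be composed as in the following Python sketch; the unit interfaces (determine, search, correct) are assumed names rather than definitions from this application. Each concrete unit could in turn wrap the step-level sketches given above.

    class VoiceInteractionApparatus:
        """Composes the four units; each unit is assumed to expose a single method."""

        def __init__(self, first_determination_unit, search_unit,
                     second_determination_unit, correction_unit):
            self.first_determination_unit = first_determination_unit
            self.search_unit = search_unit
            self.second_determination_unit = second_determination_unit
            self.correction_unit = correction_unit

        def interact(self, query_speech):
            text, intent = self.first_determination_unit.determine(query_speech)
            primary = self.search_unit.search(text, intent)           # replied to the user
            emotion = self.second_determination_unit.determine(primary)
            # Correct the primary result based on the emotion information.
            return self.correction_unit.correct(primary, text, intent, emotion)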
In this embodiment, for the detailed processing and technical effects of the first determination unit 501, the search unit 502, the second determination unit 503, and the correction unit 504 in the voice interaction apparatus 500, reference may be made to the related descriptions of steps 201-204 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the second determining unit 503 includes: a first acquisition subunit (not shown in the figure) configured to acquire a facial image of the user; a first extraction subunit (not shown in the figure) configured to extract expression response information of the user from the facial image; and a first determining subunit (not shown in the figure) configured to determine emotion information of the user based on the expression response information.
In some optional implementations of this embodiment, the second determining unit 503 includes: a second acquiring subunit (not shown in the figure) configured to acquire a response voice of the user; a second extraction subunit (not shown in the figure) configured to extract voice response information of the user from the response voice, wherein the voice response information includes at least one of: silence duration, voice content, voice tone; a second determining subunit (not shown in the figure) configured to determine emotion information of the user based on the voice response information.
In some optional implementations of this embodiment, the correction unit 504 includes: a first correction subunit (not shown in the figure) configured to perform text correction on the query text to obtain corrected text; and a first search subunit (not shown in the figure) configured to search based on the corrected text and the query intent to obtain a secondary search result.
In some optional implementations of this embodiment, the correction unit 504 includes: a second correction subunit (not shown in the figure) configured to perform intent correction on the query intent to obtain a corrected intent; and a second search subunit (not shown in the figure) configured to search based on the query text and the corrected intent to obtain a secondary search result.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing an electronic device (e.g., terminal devices 101, 102, 103 shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first determination unit, a search unit, a second determination unit, and a correction unit. Where the names of these units do not constitute a limitation on the units themselves in this case, for example, the first determination unit may also be described as a "unit that determines a query word and a query intention based on a query speech input by a user".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine query text and a query intent based on a query voice input by a user; search based on the query text and the query intent to obtain a primary search result, and reply the primary search result to the user; determine emotion information of the user based on the primary search result; and correct the primary search result based on the emotion information to obtain a secondary search result, and reply the secondary search result to the user.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A voice interaction method, comprising:
determining query text and a query intent based on a query voice input by a user;
searching based on the query text and the query intent to obtain a primary search result, and replying the primary search result to the user;
determining emotion information of the user based on the primary search result;
and correcting the primary search result based on the emotion information to obtain a secondary search result, and replying the secondary search result to the user.
2. The method of claim 1, wherein the determining emotion information of the user based on the primary search result comprises:
acquiring a facial image of the user;
extracting expression response information of the user from the facial image;
determining emotion information of the user based on the expression response information.
3. The method of claim 1 or 2, wherein the determining emotion information of the user based on the primary search result comprises:
collecting the response voice of the user;
extracting voice response information of the user from the response voice, wherein the voice response information comprises at least one of the following: silence duration, voice content, voice tone;
determining emotion information of the user based on the voice response information.
4. The method of claim 1, wherein the correcting the primary search result based on the emotion information to obtain a secondary search result comprises:
performing text correction on the query text to obtain corrected text;
and searching based on the corrected text and the query intent to obtain a secondary search result.
5. The method of claim 1 or 4, wherein the correcting the primary search result based on the emotion information to obtain a secondary search result comprises:
performing intent correction on the query intent to obtain a corrected intent;
and searching based on the query text and the corrected intent to obtain a secondary search result.
6. A voice interaction device, comprising:
a first determination unit configured to determine query text and a query intent based on a query voice input by a user;
a search unit configured to search based on the query text and the query intent, obtain a primary search result, and reply the primary search result to the user;
a second determination unit configured to determine emotion information of the user based on the primary search result;
and a correction unit configured to correct the primary search result based on the emotion information, obtain a secondary search result, and reply the secondary search result to the user.
7. The apparatus of claim 6, wherein the second determination unit comprises:
a first acquisition subunit configured to acquire a facial image of the user;
a first extraction subunit configured to extract expression response information of the user from the facial image;
a first determining subunit configured to determine emotion information of the user based on the expression response information.
8. The apparatus of claim 6 or 7, wherein the second determination unit comprises:
a second acquisition subunit configured to acquire a response voice of the user;
a second extraction subunit configured to extract voice response information of the user from the response voice, wherein the voice response information includes at least one of: silence duration, voice content, voice tone;
a second determining subunit configured to determine emotion information of the user based on the voice response information.
9. The apparatus of claim 6, wherein the correction unit comprises:
a first correction subunit configured to perform text correction on the query text to obtain corrected text;
and a first search subunit configured to search based on the corrected text and the query intent to obtain a secondary search result.
10. The apparatus according to claim 6 or 9, wherein the correction unit comprises:
a second correction subunit configured to perform intent correction on the query intent to obtain a corrected intent;
and a second search subunit configured to search based on the query text and the corrected intent to obtain a secondary search result.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202010101353.6A 2020-02-19 2020-02-19 Voice interaction method and device Pending CN111312247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010101353.6A CN111312247A (en) 2020-02-19 2020-02-19 Voice interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010101353.6A CN111312247A (en) 2020-02-19 2020-02-19 Voice interaction method and device

Publications (1)

Publication Number Publication Date
CN111312247A true CN111312247A (en) 2020-06-19

Family

ID=71148389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010101353.6A Pending CN111312247A (en) 2020-02-19 2020-02-19 Voice interaction method and device

Country Status (1)

Country Link
CN (1) CN111312247A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239459A (en) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Voice search method, voice search device and voice search system
CN104407834A (en) * 2014-11-13 2015-03-11 腾讯科技(成都)有限公司 Message input method and device
CN105549841A (en) * 2015-12-02 2016-05-04 小天才科技有限公司 Voice interaction method, device and equipment
US20180089197A1 (en) * 2016-09-29 2018-03-29 International Business Machines Corporation Internet search result intention
CN106777073A (en) * 2016-12-13 2017-05-31 深圳爱拼信息科技有限公司 The automatic method for correcting of wrong word and server in a kind of search engine
CN106649843A (en) * 2016-12-30 2017-05-10 上海博泰悦臻电子设备制造有限公司 Media file recommending method and system based on vehicle-mounted terminal and vehicle-mounted terminal
CN106959999A (en) * 2017-02-06 2017-07-18 广东小天才科技有限公司 A kind of method and device of phonetic search
CN108682419A (en) * 2018-03-30 2018-10-19 京东方科技集团股份有限公司 Sound control method and equipment, computer readable storage medium and equipment
WO2019222043A1 (en) * 2018-05-17 2019-11-21 Qualcomm Incorporated User experience evaluation
CN109065035A (en) * 2018-09-06 2018-12-21 珠海格力电器股份有限公司 Information interacting method and device
CN109241924A (en) * 2018-09-18 2019-01-18 宁波众鑫网络科技股份有限公司 Multi-platform information interaction system Internet-based
CN109885828A (en) * 2019-01-14 2019-06-14 平安科技(深圳)有限公司 Word error correction method, device, computer equipment and medium based on language model

Similar Documents

Publication Publication Date Title
US20190378494A1 (en) Method and apparatus for outputting information
US20200126566A1 (en) Method and apparatus for voice interaction
CN110517689B (en) Voice data processing method, device and storage medium
US10824664B2 (en) Method and apparatus for providing text push information responsive to a voice query request
CN110808034A (en) Voice conversion method, device, storage medium and electronic equipment
CN112037792B (en) Voice recognition method and device, electronic equipment and storage medium
CN110536166B (en) Interactive triggering method, device and equipment of live application program and storage medium
CN107705782B (en) Method and device for determining phoneme pronunciation duration
CN110136715B (en) Speech recognition method and device
CN112364144B (en) Interaction method, device, equipment and computer readable medium
CN113257218B (en) Speech synthesis method, device, electronic equipment and storage medium
CN111354362A (en) Method and device for assisting hearing-impaired communication
CN110347869B (en) Video generation method and device, electronic equipment and storage medium
CN114882861A (en) Voice generation method, device, equipment, medium and product
US20230326369A1 (en) Method and apparatus for generating sign language video, computer device, and storage medium
CN111415662A (en) Method, apparatus, device and medium for generating video
CN112383721A (en) Method and apparatus for generating video
CN116756285A (en) Virtual robot interaction method, device and storage medium
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN111312247A (en) Voice interaction method and device
CN115171645A (en) Dubbing method and device, electronic equipment and storage medium
CN113763925B (en) Speech recognition method, device, computer equipment and storage medium
JP2019203998A (en) Conversation device, robot, conversation device control method and program
CN112562733A (en) Media data processing method and device, storage medium and computer equipment
CN112487833A (en) Machine translation method and translation system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20210512
Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing
Applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.
Applicant after: Shanghai Xiaodu Technology Co.,Ltd.
Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing
Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20200619