CN111739529A - Interaction method and device, earphone and server

Info

Publication number
CN111739529A
CN111739529A (application CN202010507540.4A)
Authority
CN
China
Prior art keywords
user
recognition result
voice
information
earphone
Prior art date
Legal status
Pending
Application number
CN202010507540.4A
Other languages
Chinese (zh)
Inventor
崔文华 (Cui Wenhua)
赵楠 (Zhao Nan)
Current Assignee
Beijing Sogou Intelligent Technology Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202010507540.4A
Publication of CN111739529A
PCT application filed as PCT/CN2021/074916 (published as WO2021244059A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1091 Details not provided for in groups H04R1/1008 - H04R1/1083
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the invention provides an interaction method, an interaction device, an earphone and a server, wherein the earphone is in communication connection with the server and is provided with an interaction assistant. The method comprises the following steps: the earphone sends user voice to the server and obtains a voice recognition result of the user voice from the server; and the interaction assistant is called to execute an interactive operation according to the voice recognition result. The user does not need to operate the earphone by hand, and various interactive functions of the earphone are realized.

Description

Interaction method and device, earphone and server
Technical Field
The present invention relates to the field of electronic device technologies, and in particular, to an interaction method, an interaction apparatus, an earphone, and a server.
Background
With the continuous development of science and technology, electronic technology has also developed rapidly, the variety of electronic devices has grown, and people are increasingly accustomed to using multiple electronic devices in daily life.
However, in some scenarios the operation of electronic devices is still limited, which is unfavorable for the user. For example, while driving, riding, or running, it is inconvenient for a user to operate a handheld electronic device.
Disclosure of Invention
The embodiment of the invention provides an interaction method, an interaction device, an earphone and a server.
The embodiment of the invention discloses an interaction method, which is applied to an earphone, wherein the earphone is in communication connection with a server, the earphone is provided with an interaction assistant, and the method comprises the following steps:
the earphone sends user voice to the server and obtains a voice recognition result of the user voice from the server;
and calling the interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
awakening the interactive assistant according to the voice recognition result;
acquiring a user state;
and calling the interactive assistant to recommend songs or play songs according to the user state.
Optionally, the headset has a gravity sensor, and the acquiring the user status includes:
and acquiring sensing data detected by the gravity sensor, and determining the state of the user according to the sensing data.
Optionally, the invoking the interactive assistant to recommend songs according to the user status includes:
sending the user status to the server;
and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
Optionally, the invoking the interactive assistant to play a song according to the user status includes:
sending the user status to the server;
receiving a preset song with adjusted sound effect sent by the server, and calling the interaction assistant to play it; wherein the preset song with adjusted sound effect is a preset song that the server has adjusted to a sound effect determined to match the user state.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
and calling the interactive assistant to recognize information from the user voice according to the voice recognition result and record the information, or acquiring the recorded information according to the voice recognition result and playing the information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquiring preset memo information according to the voice recognition result and playing the preset memo information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquiring and playing the recorded target voice according to the voice recognition result.
Optionally, the method further comprises:
and sending the recorded information to the server, and/or acquiring and recording the recorded information of the server.
Optionally, the method further comprises:
after the memo information is recorded, a reminding event for the memo information is generated.
Optionally, the method further comprises:
and acquiring a preset reminding event aiming at the memo information from the server.
Optionally, the method further comprises:
and when the trigger condition of a preset reminding event is met, calling the interactive assistant to acquire and play the memo information corresponding to the preset reminding event.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
searching information matched with the voice recognition result from preset memo information;
and calling the interactive assistant to play the information matched with the voice recognition result.
Optionally, the method further comprises:
obtaining a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
and when the voice recognition result indicates a requirement to search for memo information with target tag information, calling the interaction assistant to search for preset memo information matched with the target tag information and play it.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
obtaining a dialogue statement from the voice recognition result;
and calling the interactive assistant to generate and play a reply sentence matched with the conversation sentence.
Optionally, the invoking the interactive assistant, generating and playing a reply sentence matched with the dialogue sentence includes:
acquiring user azimuth information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user direction information and the dialogue sentence and play the reply sentence.
Optionally, the headset has an orientation sensor, and the acquiring of the user orientation information includes:
and acquiring the user orientation information detected by the orientation sensor.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
acquiring user geographical position information;
and calling the interactive assistant to generate a reply sentence for voice navigation and play the reply sentence according to the user direction information, the conversation sentence and the user geographical position information.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
sending navigation inquiry information to the server; the navigation query information comprises the user orientation information and the dialogue sentences;
and receiving and playing a reply sentence for voice navigation sent by the server, wherein the reply sentence for voice navigation is generated by the server according to the user direction information and the dialogue sentence query.
The embodiment of the invention discloses an interaction method, which is applied to a server, wherein the server is in communication connection with an earphone, the earphone is provided with an interaction assistant, and the method comprises the following steps:
the server receives the user voice sent by the earphone and identifies the user voice to obtain a voice identification result;
and sending the voice recognition result to the earphone, wherein the earphone is used for calling the interaction assistant to execute interaction operation according to the voice recognition result.
Optionally, the method further comprises:
acquiring a user state detected by the earphone;
searching for a recommended song matched with the user state, and sending the recommended song to the earphone; the earphone is used for calling the interaction assistant to recommend the recommended song to the user.
Optionally, the method further comprises:
acquiring a user state detected by the earphone;
determining sound effects matched with the user state;
and adjusting a preset song to the sound effect, and sending the preset song after the sound effect is adjusted to the earphone, wherein the earphone is used for calling the interactive assistant to play the preset song after the sound effect is adjusted.
Optionally, the method further comprises:
and identifying and recording information from the user voice according to the voice identification result, or acquiring and sending the recorded information to the earphone according to the voice identification result, wherein the earphone is used for playing the recorded information.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and identifying and recording memo information from the voice recognition result according to the voice recognition result, or acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and recognizing and recording a target voice from the user voice according to the voice recognition result, or acquiring and sending the recorded target voice to the earphone according to the voice recognition result.
Optionally, the method further comprises:
and sending information recognized from the user voice to the earphone, and/or acquiring and recording the recorded information of the earphone.
Optionally, the method further comprises:
after recording the memo information, generating a reminding event aiming at the memo information;
and the earphone is used for calling the interaction assistant to acquire and play memo information corresponding to a preset reminding event when a triggering condition of the preset reminding event is met.
Optionally, the method further comprises:
performing semantic analysis on the memo information to obtain a semantic analysis result;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining preset memo information according to the voice recognition result and sending the preset memo information to the headset includes:
and when the voice recognition result indicates a requirement to search for memo information with target tag information, searching for preset memo information matched with the target tag information and sending it to the earphone.
Optionally, the method further comprises:
obtaining a dialogue statement from the voice recognition result;
and generating a reply sentence matched with the conversation sentence and sending the reply sentence to the earphone, wherein the earphone is used for playing the reply sentence matched with the conversation sentence.
Optionally, the generating and sending a reply sentence matched with the dialogue sentence to the headset includes:
acquiring user direction information detected by the earphone;
and generating a reply sentence for voice navigation according to the user direction information and the dialogue sentence, and sending the reply sentence to the earphone.
The embodiment of the invention discloses an interactive device, which is applied to an earphone, wherein the earphone is in communication connection with a server, the earphone is provided with an interactive assistant, and the device comprises:
the voice recognition result acquisition module is used for sending the user voice to the server and acquiring the voice recognition result of the user voice from the server;
and the first interactive module is used for calling the interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the first interaction module includes:
the awakening sub-module is used for awakening the interactive assistant according to the voice recognition result;
the user state acquisition submodule is used for acquiring a user state;
and the song interaction sub-module is used for calling the interaction assistant to recommend songs or play songs according to the user state.
Optionally, the earphone has a gravity sensor, and the user state acquisition sub-module is configured to acquire sensing data detected by the gravity sensor, and determine the user state according to the sensing data.
Optionally, the song interaction sub-module is configured to send the user status to the server; and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
Optionally, the song interaction sub-module is configured to send the user status to the server, and to receive a preset song with adjusted sound effect sent by the server and call the interaction assistant to play it; wherein the preset song with adjusted sound effect is a preset song that the server has adjusted to a sound effect determined to match the user state.
Optionally, the first interaction module includes:
and the first recording interaction sub-module is used for calling the interaction assistant to recognize information from the user voice according to the voice recognition result and record the information, or acquiring the recorded information according to the voice recognition result and playing the information.
Optionally, the first recording interaction sub-module is configured to invoke the interaction assistant to recognize memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquire preset memo information according to the voice recognition result and play the preset memo information.
Optionally, the first recording interaction sub-module is configured to invoke the interaction assistant to recognize a target voice from the user voice according to the voice recognition result and record the target voice, or acquire the recorded target voice according to the voice recognition result and play the target voice.
Optionally, the method further comprises:
and the first recording information transmission module is used for sending the recorded information to the server and/or acquiring and recording the recorded information of the server.
Optionally, the method further comprises:
and the first reminding event generating module is used for generating reminding events aiming at the memo information after the memo information is recorded.
Optionally, the method further comprises:
and the first reminding event acquisition module is used for acquiring a preset reminding event aiming at the memo information from the server.
Optionally, the method further comprises:
and the first reminding event triggering module is used for calling the interaction assistant to acquire and play the memo information corresponding to the preset reminding event when the triggering condition of the preset reminding event is met.
Optionally, the first recording interaction sub-module is configured to search for information matching the voice recognition result from preset memo information; and calling the interactive assistant to play the information matched with the voice recognition result.
Optionally, the method further comprises:
the first semantic analysis module is used for acquiring a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
and the first tag generation module is used for generating tag information for the memo information according to a semantic analysis result.
Optionally, the first recording interaction sub-module is configured to, when the voice recognition result indicates a requirement to search for memo information with target tag information, invoke the interaction assistant to search for preset memo information matched with the target tag information and play it.
Optionally, the first interaction module includes:
a first spoken sentence acquisition submodule for acquiring a spoken sentence from the speech recognition result;
and the first dialogue interaction sub-module is used for calling the interaction assistant to generate a reply sentence matched with the dialogue sentence and playing the reply sentence.
Optionally, the first dialogue interaction submodule is configured to obtain user orientation information; and calling the interactive assistant to generate a reply sentence for voice navigation according to the user direction information and the dialogue sentence and play the reply sentence.
Optionally, the headset has an orientation sensor, and the first dialogue interaction submodule is configured to acquire user orientation information detected by the orientation sensor.
Optionally, the first dialogue interaction submodule is configured to obtain user geographical location information; and calling the interactive assistant to generate a reply sentence for voice navigation and play the reply sentence according to the user direction information, the conversation sentence and the user geographical position information.
Optionally, the first dialogue interaction sub-module is configured to send navigation query information to the server; the navigation query information comprises the user orientation information and the dialogue sentences; and receiving and playing a reply sentence for voice navigation sent by the server, wherein the reply sentence for voice navigation is generated by the server according to the user direction information and the dialogue sentence query.
The embodiment of the invention discloses an interactive device, which is applied to a server, wherein the server is in communication connection with an earphone, the earphone is provided with an interactive assistant, and the device comprises:
the voice recognition module is used for receiving the user voice sent by the earphone and recognizing the user voice to obtain a voice recognition result;
and the voice recognition result sending module is used for sending the voice recognition result to the earphone, and the earphone is used for calling the interaction assistant to execute interaction operation according to the voice recognition result.
Optionally, the method further comprises:
the first user state acquisition module is used for acquiring the user state detected by the earphone;
the first song sending module is used for searching for the recommended song matched with the user state and sending the recommended song to the earphone; the earphone is used for calling the interaction assistant to recommend the recommended song to the user.
Optionally, the method further comprises:
the second user state acquisition module is used for acquiring the user state detected by the earphone;
the sound effect determining module is used for determining the sound effect matched with the user state;
and the second song sending module is used for adjusting a preset song into the sound effect and sending the preset song with the adjusted sound effect to the earphone, and the earphone is used for calling the interactive assistant to play the preset song with the adjusted sound effect.
Optionally, the method further comprises:
and the recorded information processing module is used for identifying and recording information from the user voice according to the voice identification result, or acquiring and sending the recorded information to the earphone according to the voice identification result, wherein the earphone is used for playing the recorded information.
Optionally, the recording information processing module includes:
and the memo information processing submodule is used for identifying and recording memo information from the voice recognition result according to the voice recognition result, or acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone.
Optionally, the recording information processing module includes:
and the voice processing submodule is used for recognizing and recording the target voice from the user voice according to the voice recognition result, or acquiring the recorded target voice according to the voice recognition result and sending the target voice to the earphone.
Optionally, the method further comprises:
and the second recorded information transmission module is used for sending the information recognized from the user voice to the earphone and/or acquiring and recording the recorded information of the earphone.
Optionally, the method further comprises:
the second reminding event generating module is used for generating reminding events aiming at the memo information after the memo information is recorded;
and the earphone is used for calling the interaction assistant to acquire and play memo information corresponding to a preset reminding event when a triggering condition of the preset reminding event is met.
Optionally, the method further comprises:
the second semantic analysis module is used for performing semantic analysis on the memo information to obtain a semantic analysis result;
and the second label generating module is used for generating label information for the memo information according to the semantic analysis result.
Optionally, the memo information processing sub-module is configured to, when the voice recognition result indicates a requirement to search for memo information with target tag information, search for preset memo information matched with the target tag information and send it to the headset.
Optionally, the method further comprises:
a dialogue sentence acquisition module, configured to acquire a dialogue sentence from the speech recognition result;
and the reply sentence sending module is used for generating a reply sentence matched with the conversation sentence and sending the reply sentence to the earphone, and the earphone is used for playing the reply sentence matched with the conversation sentence.
Optionally, the reply sentence sending module is configured to obtain user direction information detected by the headset; and generating a reply sentence for voice navigation according to the user direction information and the dialogue sentence, and sending the reply sentence to the earphone.
An embodiment of the present invention discloses a headset, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
sending user voice to a server, and acquiring a voice recognition result of the user voice from the server;
and calling an interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
awakening the interactive assistant according to the voice recognition result;
acquiring a user state;
and calling the interactive assistant to recommend songs or play songs according to the user state.
Optionally, the headset has a gravity sensor, and the acquiring the user status includes:
and acquiring sensing data detected by the gravity sensor, and determining the state of the user according to the sensing data.
Optionally, the invoking the interactive assistant to recommend songs according to the user status includes:
sending the user status to the server;
and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
Optionally, the invoking the interactive assistant to play a song according to the user status includes:
sending the user status to the server;
receiving a preset song with adjusted sound effect sent by the server, and calling the interaction assistant to play it; wherein the preset song with adjusted sound effect is a preset song that the server has adjusted to a sound effect determined to match the user state.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
and calling the interactive assistant to recognize information from the user voice according to the voice recognition result and record the information, or acquiring the recorded information according to the voice recognition result and playing the information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquiring preset memo information according to the voice recognition result and playing the preset memo information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquiring and playing the recorded target voice according to the voice recognition result.
Optionally, further comprising instructions for:
and sending the recorded information to the server, and/or acquiring and recording the recorded information of the server.
Optionally, further comprising instructions for:
after the memo information is recorded, a reminding event for the memo information is generated.
Optionally, further comprising instructions for:
and acquiring a preset reminding event aiming at the memo information from the server.
Optionally, further comprising instructions for:
and when the trigger condition of a preset reminding event is met, calling the interactive assistant to acquire and play the memo information corresponding to the preset reminding event.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
searching information matched with the voice recognition result from preset memo information;
and calling the interactive assistant to play the information matched with the voice recognition result.
Optionally, further comprising instructions for:
obtaining a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
and when the voice recognition result indicates a requirement to search for memo information with target tag information, calling the interaction assistant to search for preset memo information matched with the target tag information and play it.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
obtaining a dialogue statement from the voice recognition result;
and calling the interactive assistant to generate and play a reply sentence matched with the conversation sentence.
Optionally, the invoking the interactive assistant, generating and playing a reply sentence matched with the dialogue sentence includes:
acquiring user azimuth information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user direction information and the dialogue sentence and play the reply sentence.
Optionally, the headset has an orientation sensor, and the acquiring of the user orientation information includes:
and acquiring the user orientation information detected by the orientation sensor.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
acquiring user geographical position information;
and calling the interactive assistant to generate a reply sentence for voice navigation and play the reply sentence according to the user direction information, the conversation sentence and the user geographical position information.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
sending navigation inquiry information to the server; the navigation query information comprises the user orientation information and the dialogue sentences;
and receiving and playing a reply sentence for voice navigation sent by the server, wherein the reply sentence for voice navigation is generated by the server according to the user direction information and the dialogue sentence query.
The embodiment of the invention discloses a server, which comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs are configured to be executed by one or more processors and comprise instructions for:
receiving user voice sent by an earphone, and identifying the user voice to obtain a voice identification result;
and sending the voice recognition result to the earphone, wherein the earphone is used for calling the interaction assistant to execute interaction operation according to the voice recognition result.
Optionally, further comprising instructions for:
acquiring a user state detected by the earphone;
searching for a recommended song matched with the user state, and sending the recommended song to the earphone; the earphone is used for calling the interaction assistant to recommend the recommended song to the user.
Optionally, further comprising instructions for:
acquiring a user state detected by the earphone;
determining sound effects matched with the user state;
and adjusting a preset song to the sound effect, and sending the preset song after the sound effect is adjusted to the earphone, wherein the earphone is used for calling the interactive assistant to play the preset song after the sound effect is adjusted.
Optionally, further comprising instructions for:
and identifying and recording information from the user voice according to the voice identification result, or acquiring and sending the recorded information to the earphone according to the voice identification result, wherein the earphone is used for playing the recorded information.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and identifying and recording memo information from the voice recognition result according to the voice recognition result, or acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and recognizing and recording a target voice from the user voice according to the voice recognition result, or acquiring and sending the recorded target voice to the earphone according to the voice recognition result.
Optionally, further comprising instructions for:
and sending information recognized from the user voice to the earphone, and/or acquiring and recording the recorded information of the earphone.
Optionally, further comprising instructions for:
after recording the memo information, generating a reminding event aiming at the memo information;
and the earphone is used for calling the interaction assistant to acquire and play memo information corresponding to a preset reminding event when a triggering condition of the preset reminding event is met.
Optionally, further comprising instructions for:
performing semantic analysis on the memo information to obtain a semantic analysis result;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining preset memo information according to the voice recognition result and sending the preset memo information to the headset includes:
and when the voice recognition result indicates a requirement to search for memo information with target tag information, searching for preset memo information matched with the target tag information and sending it to the earphone.
Optionally, further comprising instructions for:
obtaining a dialogue statement from the voice recognition result;
and generating a reply sentence matched with the conversation sentence and sending the reply sentence to the earphone, wherein the earphone is used for playing the reply sentence matched with the conversation sentence.
Optionally, the generating and sending a reply sentence matched with the dialogue sentence to the headset includes:
acquiring user direction information detected by the earphone;
and generating a reply sentence for voice navigation according to the user direction information and the dialogue sentence, and sending the reply sentence to the earphone.
The embodiment of the invention discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the interaction method are realized.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, the earphone can acquire the voice recognition result of the user voice from the server, and the interaction assistant of the earphone can perform interactive operations according to the voice recognition result without the user operating the earphone by hand, thereby realizing various interactive functions of the earphone.
Drawings
FIG. 1 is a flowchart illustrating the steps of a first embodiment of an interaction method of the present invention;
FIG. 2 is a flowchart illustrating the steps of a second embodiment of an interaction method of the present invention;
FIG. 3 is a flowchart illustrating the steps of a third embodiment of an interaction method of the present invention;
FIG. 4 is a flowchart illustrating the steps of a fourth embodiment of an interaction method of the present invention;
FIG. 5 is a flowchart illustrating the steps of a fifth embodiment of an interaction method of the present invention;
FIG. 6 is a block diagram of an interactive apparatus according to a first embodiment of the present invention;
FIG. 7 is a block diagram of a second embodiment of an interactive apparatus according to the present invention;
FIG. 8 is a block diagram of a third embodiment of an interactive apparatus according to the present invention;
FIG. 9 is a block diagram of a fourth embodiment of an interactive apparatus according to the present invention;
FIG. 10 is a block diagram of a headset for interaction in accordance with an exemplary embodiment;
fig. 11 is a schematic structural diagram of a server for interaction according to another exemplary embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of a first embodiment of an interaction method according to the present invention is shown, where the method is applied to a headset, the headset is in communication connection with a server, the headset has an interaction assistant, and the method specifically includes the following steps:
Step 101, the earphone sends user voice to the server, and obtains a voice recognition result of the user voice from the server.
The earphone is a portable electronic device frequently used in daily life, and may have a playing function, a sound pickup function and a communication function. The user can listen to songs or make phone calls using the headset.
The server has a voice recognition function and can perform voice recognition on the user voice collected by the earphone.
Step 102, calling the interactive assistant to execute an interactive operation according to the voice recognition result.
The earphone is provided with an interactive assistant, and the interactive assistant can be a program which is arranged in the earphone and runs independently, and can provide various interactive functions. The interaction assistant can execute interaction operation according to the voice recognition result so as to realize various interaction functions of the headset.
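As a minimal illustration of steps 101 and 102, the following Python sketch shows a headset-side flow that uploads captured audio to the server and hands the recognition result to the interaction assistant for dispatch. The endpoint URL, message format, and trigger phrases are illustrative assumptions, not details given by this disclosure.

import json
import urllib.request

ASR_ENDPOINT = "https://example-server/asr"  # hypothetical server address

def recognize(audio_bytes: bytes) -> str:
    # Step 101: send user voice to the server, get the recognition result back.
    req = urllib.request.Request(
        ASR_ENDPOINT, data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]  # assumed response field

class InteractionAssistant:
    # Step 102: execute an interactive operation according to the result.
    def handle(self, text: str) -> None:
        if "song" in text:
            print("-> song recommendation / playback flow")
        elif text.startswith("note down for me"):
            print("-> memo recording flow")
        else:
            print("-> dialogue / question-answer flow")

# Usage: InteractionAssistant().handle(recognize(captured_audio))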
In the embodiment of the invention, the earphone may be communicatively connected to a mobile terminal. The mobile terminal may be installed with an APP matched with the interaction assistant of the earphone, and the user can control the interaction assistant on the interface of the APP.
The interaction assistant may be woken up in a particular manner, such as by a particular voice command. Some interactive functions of the interaction assistant are performed after it is woken up, while others may be performed without waking it.
In the embodiment of the invention, the earphone can acquire the voice recognition result of the user voice from the server, and the interaction assistant of the earphone can perform interactive operations according to the voice recognition result without the user operating the earphone by hand, thereby realizing various interactive functions of the earphone.
In running, driving, and similar scenarios, it is inconvenient for a user to take out a mobile phone to search for songs. To facilitate song search, in an embodiment, the interaction functions of the headset may include a song recommendation function.
Referring to fig. 2, a flowchart of steps of a second embodiment of an interaction method according to the present invention is shown, where the method is applied to a headset, the headset is in communication connection with a server, the headset has an interaction assistant, and the method specifically includes the following steps:
step 201, the earphone sends the user voice to the server, and obtains the voice recognition result of the user voice from the server.
Step 202, waking up the interactive assistant according to the voice recognition result.
When the voice recognition result includes information indicating that the user needs to find a suitable song or to adjust the sound effect of a song, the song recommendation function of the interactive assistant is woken up. For example, the voice recognition result may include: "play a song suitable for running", or "switch to a sound effect suitable for running".
Step 203, acquiring the user state.
The user state, i.e., the state in which the user is located, may include a sedentary state, a walking state, a running state, a driving state, a riding state, and the like.
In one example, the headset may have a gravity sensor that can detect the user's state. The step of acquiring the user status may include: and acquiring sensing data detected by the gravity sensor, and determining the state of the user according to the sensing data. Specifically, the user state may be determined by an algorithm for detecting the user state according to the sensing data of the gravity sensor.
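As a toy example of this state-detection step, the sketch below maps the variability of gravity-sensor magnitude samples to a coarse user state. The sampling assumptions and thresholds are invented for illustration; the patent does not specify the detection algorithm.

from statistics import pstdev

def classify_user_state(accel_magnitudes: list[float]) -> str:
    # Higher variability in acceleration magnitude (m/s^2) suggests more motion.
    jitter = pstdev(accel_magnitudes)
    if jitter < 0.3:
        return "sedentary state"
    if jitter < 1.5:
        return "walking state"
    return "running state"

print(classify_user_state([9.80, 9.81, 9.79, 9.80]))  # sedentary state
print(classify_user_state([9.0, 11.5, 8.2, 12.4]))    # running state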
Step 204, invoking the interactive assistant to recommend songs or play songs according to the user state.
In the embodiment of the invention, the interactive assistant can recommend songs to the user according to the user state, with the user deciding whether to play them; it can also directly play songs adapted to the user state.
Specifically, the songs in a preset song list may be configured with a plurality of tags or categories, such as "rock", "pop", "jazz", "ballad", "pure music", "strong rhythmicity", "passionate", "lyric", "quiet", "dynamic", and the like. The interaction assistant may look for songs whose tags match the user state. For example, when the user state is the "running state", a song tagged "strong rhythmicity" may be recommended.
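A minimal sketch of this tag matching, with an invented state-to-tag table and song list (the patent names the tags but not the matching rule):

STATE_TAGS = {  # assumed mapping from user state to preferred song tags
    "running state": {"strong rhythmicity", "dynamic"},
    "sedentary state": {"quiet", "pure music"},
}

SONGS = [
    {"title": "Song A", "tags": {"rock", "strong rhythmicity"}},
    {"title": "Song B", "tags": {"quiet", "pure music"}},
]

def recommend(user_state: str) -> list[str]:
    # Recommend any song whose tags overlap the tags matched to the state.
    wanted = STATE_TAGS.get(user_state, set())
    return [s["title"] for s in SONGS if s["tags"] & wanted]

print(recommend("running state"))  # ['Song A']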
In one example, the interactive assistant may be invoked to find recommended songs that match the user's state and recommend to the user.
In another example, the recommended songs may be looked up by the server. The step of invoking the interaction assistant to recommend a song according to the user status may comprise: sending the user status to the server; and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
In the embodiment of the invention, the interactive assistant can play songs with a sound effect matched to the user state. Specifically, the interactive assistant may adjust the sound effect of a song via a sound effect algorithm. The sound effect algorithm may adjust the song to various types of sound effects, such as "quiet", "leisurely", "rock", and so on. When the user is sitting still, the preset song can be adjusted to a "quiet" sound effect.
In one example, the sound effect of a song may be adjusted by the interaction assistant, and the step of invoking the interaction assistant to play a song according to the user state may include: calling the interactive assistant to determine a sound effect matched with the user state and adjust a preset song to that sound effect; and playing the preset song after the sound effect is adjusted.
In another example, the sound effect of a song may be adjusted by the server, and the step of invoking the interaction assistant to play a song according to the user state may include: sending the user state to the server; and receiving a preset song with adjusted sound effect sent by the server and calling the interaction assistant to play it, wherein the preset song with adjusted sound effect is a preset song that the server has adjusted to a sound effect determined to match the user state. The preset song may be a song playing in a playlist of the headset.
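A server-side sketch of this sound-effect flow under stated assumptions: the state-to-effect table and the EQ-preset representation below are invented stand-ins, since a real sound effect algorithm would process the audio samples themselves.

STATE_EFFECT = {"sedentary state": "quiet", "running state": "rock",
                "walking state": "leisurely"}  # assumed mapping

EQ_PRESETS = {  # hypothetical 3-band gains in dB: (bass, mid, treble)
    "quiet": (-2.0, 0.0, -1.0),
    "rock": (4.0, 1.0, 3.0),
    "leisurely": (1.0, 0.0, 0.0),
}

def adjust_preset_song(song: dict, user_state: str) -> dict:
    # Determine the sound effect matched with the user state, then tag the
    # song with that effect's EQ preset before sending it to the earphone.
    effect = STATE_EFFECT.get(user_state, "quiet")
    return {**song, "effect": effect, "eq": EQ_PRESETS[effect]}

print(adjust_preset_song({"title": "Song C"}, "running state"))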
In the embodiment of the invention, the earphone can acquire the voice recognition result of the user voice from the server, wake up the interactive assistant according to the voice recognition result, acquire the user state, and call the interactive assistant to recommend or play songs according to the user state. The embodiment of the invention thus enables the earphone to recommend or play songs without the user operating the earphone by hand, simplifying the user's operation.
In running, driving, and similar scenarios, it is inconvenient for the user to take out a mobile phone to record or look up memo information. To facilitate use of memos, in an embodiment, the interaction functions of the headset may include an information recording function.
Referring to fig. 3, a flowchart illustrating steps of a third embodiment of an interaction method according to the present invention is shown, where the method is applied to a headset, the headset is in communication connection with a server, the headset has an interaction assistant, and the method specifically includes the following steps:
step 301, the earphone sends a user voice to the server, and obtains a voice recognition result of the user voice from the server.
Step 302, the interactive assistant is called to recognize information from the user voice according to the voice recognition result and record the information, or the recorded information is obtained according to the voice recognition result and played.
In the embodiment of the invention, the earphone can send the recorded information to the server, and can also acquire and record the information recorded by the server.
In an embodiment of the present invention, the invoking the interactive assistant to recognize information from the user speech according to the speech recognition result and record the information, or acquiring recorded information according to the speech recognition result and playing the information includes: and calling the interactive assistant to recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquiring and playing the recorded target voice according to the voice recognition result.
When the speech recognition result includes information indicating that speech needs to be recorded, the interactive assistant may recognize the target speech from the collected audio. For example, if the user says "record a voice", the interactive assistant may record the speech collected afterwards. The user can set a recording mode to filter which sound is recorded. For example, in a conference, where the user wishes to record the voices of the individual participants, the headset may pick up speech omnidirectionally; in a classroom, where the user wishes to record the teacher's lecture, the headset may pick up speech from a specified direction.
When the voice recognition result includes information indicating that a recorded target voice needs to be played, the interaction assistant can search for the target voice and play it.
In an embodiment of the present invention, the invoking the interactive assistant to recognize information from the user speech according to the speech recognition result and record the information, or acquiring recorded information according to the speech recognition result and playing the information includes: and calling the interactive assistant to recognize the memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquiring preset memo information according to the voice recognition result and playing the preset memo information.
When the voice recognition result includes information indicating that memo information needs to be recorded, the interactive assistant may extract the relevant content from the voice recognition result and record it as memo information. The interactive assistant supports spoken instructions; for example, a voice instruction may take the form "note down for me + memo content". If the voice recognition result is "note down for me: a meeting with sales at 10 am tomorrow in the third-floor conference room", then "note down for me" indicates that memo information needs to be recorded, and the interactive assistant records "a meeting with sales at 10 am tomorrow in the third-floor conference room" as memo information. Content indicating that memo information needs to be recorded may also take forms such as "note down a few things for me", "note down an expense for me", "note down something for my parents", or "note down a parking space for me"; a voice recognition model may be trained in advance to recognize memo information that needs to be recorded.
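A sketch of extracting memo content from a recognized instruction of the form "trigger phrase + memo content"; the English trigger phrases stand in for the patent's Chinese examples, and a trained recognition model would replace this simple prefix match.

TRIGGERS = ("note down for me", "note down an expense for me",
            "note down a parking space for me")

def extract_memo(text: str) -> str | None:
    # Return the memo content if the utterance starts with a trigger phrase.
    lowered = text.lower()
    for trigger in TRIGGERS:
        if lowered.startswith(trigger):
            return text[len(trigger):].strip(" ,:")
    return None  # not a memo-recording instruction

print(extract_memo("note down for me: a meeting with sales at 10 am tomorrow"))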
When the voice recognition result includes information indicating that the memo information needs to be queried, the interactive assistant may acquire the preset memo information and play it.
In the embodiment of the invention, the interaction assistant can locally acquire the memo information from the earphone and also can acquire the memo information from the server.
In this embodiment of the present invention, the step of obtaining the preset memo information according to the voice recognition result and playing the preset memo information may include: searching information matched with the voice recognition result from preset memo information; and calling the interactive assistant to play the information matched with the voice recognition result.
Specifically, the interactive assistant may retrieve specific information from the memo information, such as information matching keywords, time, place, or category in the user voice. For example, if the speech recognition result is "What time is the meeting with sales tomorrow morning?", the interactive assistant replies "10 o'clock".
In the embodiment of the invention, the earphone can perform semantic analysis on the memo information to obtain a semantic analysis result. Alternatively, the earphone can obtain from the server a semantic analysis result produced by the server performing semantic analysis on the memo information. The earphone can then generate tag information for the memo information according to the semantic analysis result.
Specifically, the server may perform semantic analysis on the speech recognition result by using an algorithm for natural language understanding to obtain a semantic analysis result.
In this embodiment of the present invention, the step of obtaining the preset memo information according to the voice recognition result and playing it may also include: when the voice recognition result indicates a requirement to search for memo information with target tag information, calling the interaction assistant to search for preset memo information matched with the target tag information and play it.
The tag information may include classification tags, attribute tags, and the like. The interactive assistant may generate corresponding tag information according to the semantic analysis result, and on that basis classify or tag memo information. For example, when the user says "Note this down, a meeting with sales at 10 a.m. tomorrow in the third-floor conference room", semantic analysis places this memo in the to-do category. Besides searching by keyword, the user may then search by tag information; for example, if the user's voice is "What are my to-dos for tomorrow?", the interactive assistant looks up the memo information tagged "to-do".
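The tag-based lookup described above might be sketched as follows; the Memo structure, the "to-do" category name and the rule mapping a semantic-analysis category to a tag are illustrative assumptions:

```python
# Illustrative tag generation and tag-based lookup for memo information.
from dataclasses import dataclass, field

@dataclass
class Memo:
    content: str
    tags: set[str] = field(default_factory=set)

def tag_memo(memo: Memo, semantic_category: str) -> None:
    # The category produced by semantic analysis becomes a classification tag.
    memo.tags.add(semantic_category)

def find_by_tag(memos: list[Memo], target_tag: str) -> list[Memo]:
    return [m for m in memos if target_tag in m.tags]

memo = Memo("a meeting with sales at 10 a.m. tomorrow, third-floor conference room")
tag_memo(memo, "to-do")  # semantic analysis classified the memo as a to-do
print([m.content for m in find_by_tag([memo], "to-do")])
```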
In the embodiment of the invention, the interactive assistant may generate a reminder event for the memo information after recording the memo information. A reminder event generated by the server for the memo information may also be obtained from the server. The reminder event may include reminder content, i.e., the memo information, and a trigger condition, i.e., the condition that triggers the reminder event, for example a set time being reached.
In the embodiment of the invention, when the trigger condition of a preset reminder event is met, the interactive assistant may be invoked to acquire and play the memo information corresponding to the preset reminder event. For example, if the trigger condition of the reminder event is "the time reaches 9:45", the earphone plays the memo information corresponding to the reminder event, reminding the user: "You have a meeting with sales at 10 o'clock in the third-floor conference room; please prepare in advance."
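A minimal sketch of such a time-triggered reminder event is shown below; the ReminderEvent class, the 15-minute lead time and the playback hook are assumptions for illustration:

```python
# Illustrative time-triggered reminder event for a recorded memo.
import datetime as dt

class ReminderEvent:
    def __init__(self, memo: str, remind_at: dt.datetime):
        self.memo = memo            # reminder content: the memo information
        self.remind_at = remind_at  # trigger condition: a set time is reached

    def check_and_fire(self, now: dt.datetime, play) -> bool:
        """Play the memo through the earphone once the set time is reached."""
        if now >= self.remind_at:
            play(f"Reminder: {self.memo}")
            return True
        return False

event = ReminderEvent(
    "a meeting with sales at 10:00 in the third-floor conference room",
    remind_at=dt.datetime(2020, 6, 9, 9, 45),  # remind 15 minutes in advance
)
event.check_and_fire(dt.datetime(2020, 6, 9, 9, 45), play=print)
```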
In the embodiment of the invention, the earphone may acquire the voice recognition result of the user voice from the server, and invoke the interactive assistant to recognize information from the user voice according to the voice recognition result and record the information, or to acquire recorded information according to the voice recognition result and play it. The embodiment of the invention thus enables the user to record information, or have recorded information played back, through the earphone without operating the earphone by hand, simplifying the user's operation.
In scenarios such as walking or cycling, it is inconvenient for the user to take out a mobile phone to make a query. To facilitate such queries, in an embodiment, the interaction function of the headset may include a question-and-answer interaction function.
Referring to fig. 4, a flowchart illustrating steps of a fourth embodiment of an interaction method according to the present invention is shown, where the method is applied to a headset, the headset is in communication connection with a server, the headset has an interaction assistant, and the method specifically includes the following steps:
step 401, the earphone sends a user voice to the server, and obtains a voice recognition result of the user voice from the server.
Step 402, acquiring a dialogue sentence from the voice recognition result.
In the embodiment of the invention, the interactive assistant of the headset can have a conversation with the user, and the conversation sentence of the user can be obtained from the voice recognition result.
Step 403, invoking the interactive assistant to generate a reply sentence matching the dialogue sentence and play the reply sentence.
The interactive assistant can generate and play the matched reply sentence according to the dialogue sentence of the user, so as to perform voice question and answer with the user.
In this embodiment of the present invention, the step of invoking the interactive assistant to generate a reply sentence matching the dialogue sentence and play the reply sentence may include: acquiring user orientation information; and invoking the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence and play the reply sentence.
The user orientation information refers to the direction the user is facing. In the embodiment of the invention, the earphone may be provided with an orientation sensor, which can detect the user orientation information in real time while the user wears the earphone. The interactive assistant may obtain the user orientation information detected by the orientation sensor.
In one example, the step of invoking the interactive assistant to generate and play a reply sentence for voice navigation according to the user orientation information and the dialogue sentence may include: acquiring user geographical location information; and invoking the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information, the dialogue sentence and the user geographical location information and play the reply sentence.
The headset may obtain the current user geographical location information, for example by detecting it itself. As another example, the headset may be communicatively coupled to a mobile device; the mobile device may be equipped with a positioning capability, such as a GPS module or positioning through communication with base stations, and the headset may obtain the current user geographical location information detected by the mobile device.
In another example, the step of invoking the interactive assistant to generate and play a reply sentence for voice navigation according to the user orientation information and the dialogue sentence may include: sending navigation query information to the server, the navigation query information including the user orientation information and the dialogue sentence; and receiving and playing a reply sentence for voice navigation sent by the server, the reply sentence being generated by the server according to the user orientation information and the dialogue sentence.
The server may generate the reply sentence for voice navigation according to the user orientation information, the dialogue sentence and the user geographical location information, and then send the reply sentence back to the headset.
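The navigation query exchanged between the earphone and the server might be shaped as in the following sketch; the field names are assumptions, since the embodiment only specifies that the query carries the user orientation information and the dialogue sentence (optionally with the user geographical location information):

```python
# Hypothetical shape of the navigation query sent from the earphone to the
# server; all field names are assumptions for illustration.
import json

navigation_query = {
    "dialogue_sentence": "Is there any good food nearby?",
    "user_orientation": {"heading_degrees": 87.5},       # from the orientation sensor
    "user_location": {"lat": 39.9042, "lng": 116.4074},  # e.g. from the phone's GPS
}
payload = json.dumps(navigation_query)

# A server reply might carry a single reply sentence for voice navigation:
reply = json.loads('{"reply_sentence": "Walk about 200 meters toward the tall red building."}')
print(reply["reply_sentence"])
```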
The interaction assistant of the earphone can thus conduct voice navigation interaction with the user; during the interaction, navigation voice can be played according to the real-time user orientation information, the real-time user geographical location information and the voice the user continues to speak.
For example, the user: Is there any good food nearby?
The interaction assistant: What cuisine would you like?
The user: Sichuan cuisine.
The interaction assistant: There is a well-reviewed "Meizhou Dongpo" about 800 meters away. Would you like to consider it?
The user: Sounds good.
The interaction assistant: Now navigating you to "Meizhou Dongpo". Do you see a tall red building ahead?
The user: I see it.
The interaction assistant: Walk about 200 meters toward the tall red building.
It is determined from the real-time user geographical location information that the user has started walking toward the tall red building.
The interaction assistant: There is a barber shop at the foot of the tall red building. After you pass the barber shop, turn right.
It is determined from the real-time user orientation information that the user has started to turn right.
The interaction assistant: You are now 600 meters from the destination.
The interaction assistant: Do you see an intersection ahead?
The user: I see it.
The interaction assistant: After you reach the intersection, turn left.
It is determined from the real-time user geographical location information and user orientation information that the user turned left after reaching the intersection.
The interaction assistant: Keep going straight; "Meizhou Dongpo" is 100 meters ahead.
It is determined from the real-time user geographical location information that the user continues straight ahead.
The interaction assistant: "Meizhou Dongpo" is on your left. Navigation is complete. Enjoy your meal.
In the embodiment of the invention, the earphone may acquire the voice recognition result of the user voice from the server, acquire a dialogue sentence from the voice recognition result, and invoke the interactive assistant to generate and play a reply sentence matching the dialogue sentence. The embodiment of the invention thus enables the earphone to conduct question-and-answer based on the user's voice without the user operating the earphone by hand, simplifying the user's operation.
The following description is from the perspective of a server.
Referring to fig. 5, a flowchart illustrating steps of a fifth embodiment of an interaction method according to the present invention is shown. The method is applied to a server, the server is communicatively connected with a headset, the headset has an interactive assistant, and the method includes:
Step 501, the server receives the user voice sent by the earphone, and recognizes the user voice to obtain a voice recognition result.
In the embodiment of the invention, the server has a voice recognition function, and can perform voice recognition on the user voice collected by the earphone to obtain a voice recognition result.
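As a hedged sketch of this server-side step, the fragment below receives audio over HTTP and returns a transcript; the HTTP framing and the recognize() placeholder are assumptions, since the embodiment names neither a transport protocol nor a particular speech recognition engine:

```python
# Server-side sketch: receive user voice over HTTP and return the transcript.
# The HTTP framing and the recognize() placeholder are assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

def recognize(audio_bytes: bytes) -> str:
    # Placeholder standing in for a real speech recognition engine.
    return "note this down, a meeting with sales at 10 a.m. tomorrow"

class SpeechHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        audio = self.rfile.read(int(self.headers["Content-Length"]))
        transcript = recognize(audio)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(transcript.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), SpeechHandler).serve_forever()
```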
Step 502, sending the voice recognition result to the headset, where the headset is used to invoke the interactive assistant to execute an interactive operation according to the voice recognition result.
The headset is equipped with an interactive assistant, which may be a program installed in the headset and may provide a variety of interactive functions. After the earphone receives the voice recognition result sent by the server, the interaction assistant can be called to execute interaction operation according to the voice recognition result so as to realize various interaction functions of the earphone.
The interaction assistant may be awakened in a particular manner, for example by a particular voice command. Some interactive functions of the interactive assistant are performed only after it has been woken up, while others may be performed without waking it.
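A minimal sketch of this wake-word gating is given below; the wake phrase and the assignment of functions to the always-on and wake-required sets are assumptions:

```python
# Illustrative wake-word gating: some interaction functions require the
# assistant to be awake, others run regardless. Names are assumptions.
WAKE_PHRASE = "hello assistant"
ALWAYS_ON = {"record_memo"}                                  # no wake-up required
REQUIRES_WAKE = {"song_recommendation", "voice_navigation"}  # wake-up required

class AssistantGate:
    def __init__(self) -> None:
        self.awake = False

    def allow(self, transcript: str, function: str) -> bool:
        """Wake on the wake phrase, then gate functions by wake state."""
        if WAKE_PHRASE in transcript.lower():
            self.awake = True
        return function in ALWAYS_ON or (self.awake and function in REQUIRES_WAKE)

gate = AssistantGate()
print(gate.allow("note this down, buy milk", "record_memo"))        # True
print(gate.allow("what song fits my run?", "song_recommendation"))  # False: not awake
print(gate.allow("hello assistant, recommend a song", "song_recommendation"))  # True
```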
In the embodiment of the invention, the server may send the voice recognition result of the user voice to the earphone, and the interaction assistant of the earphone may perform interactive operations according to the voice recognition result without the user operating the earphone by hand, thereby realizing various interaction functions of the earphone.
In one embodiment, the interactive functionality of the headset may include a song recommendation function. The headset may invoke an interactive assistant to obtain the user state based on the speech recognition result. When the speech recognition result includes information that characterizes the user's need to find a suitable song or adjust the sound effect of a song, the interactive assistant of the headset may request the server to recommend the song or adjust the sound effect of the song.
In one example, the server may obtain a user status detected by the headset; and searching a recommended song matched with the state of the user, and sending the recommended song to the headset so that the headset calls the interaction assistant to recommend the recommended song to the user. In another example, the server may obtain a user status detected by the headset; determining a sound effect matched with the user state; and adjusting the preset song to the sound effect, and sending the preset song after the sound effect is adjusted to the earphone so that the earphone calls the interactive assistant to play the preset song after the sound effect is adjusted.
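The two server-side behaviors might be sketched as follows; the user-state names and the state-to-song and state-to-sound-effect tables are illustrative assumptions:

```python
# Illustrative server-side song recommendation and sound-effect adjustment.
SONGS_BY_STATE = {
    "running": ["uptempo_track.mp3"],
    "resting": ["ambient_track.mp3"],
}
EFFECT_BY_STATE = {"running": "bass_boost", "resting": "soft_equalizer"}

def recommend_songs(user_state: str) -> list[str]:
    """Look up songs whose style matches the detected user state."""
    return SONGS_BY_STATE.get(user_state, [])

def adjust_sound_effect(preset_song: str, user_state: str) -> str:
    """Label the preset song with the sound effect matching the user state."""
    effect = EFFECT_BY_STATE.get(user_state, "default")
    # A real server would re-render the audio; this sketch only tags it.
    return f"{preset_song} [effect: {effect}]"

print(recommend_songs("running"))
print(adjust_sound_effect("preset_song.mp3", "running"))
```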
In another embodiment, the interactive functionality of the headset may comprise an information recording interactive functionality. The server can recognize information from the user voice according to the voice recognition result and record the information, or acquire the recorded information according to the voice recognition result and send the information to the earphone so that the earphone plays the recorded information.
The server may send recorded information to the earphone, and may also acquire and store information recorded by the earphone.
In one example, the server may recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquire the recorded target voice according to the voice recognition result and send the target voice to the headset so that the headset plays the target voice. In another example, the server may identify memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquire preset memo information according to the voice recognition result and send the preset memo information to the headset, so that the headset plays the preset memo information.
The server can generate a reminding event aiming at the memo information after recording the memo information; and sending the reminding event to the earphone, so that when the triggering condition of the preset reminding event is met, the earphone can call an interactive assistant to obtain and play the memo information corresponding to the preset reminding event.
The server may also perform semantic analysis on the memo information to obtain a semantic analysis result, and generate tag information for the memo information according to the semantic analysis result. When the voice recognition result includes content indicating a need to search for memo information with target tag information, the server may search for preset memo information matching the target tag information and send it to the earphone, so that the earphone plays the preset memo information matching the target tag information.
In another embodiment, the interactive functionality of the headset may include a question-and-answer interaction function. The server acquires a dialogue sentence from the voice recognition result, generates a reply sentence matching the dialogue sentence, and sends the reply sentence to the earphone, so that the earphone plays the reply sentence matching the dialogue sentence to realize question-and-answer interaction with the user.
In one example, the question-and-answer interaction function may include voice navigation. The server may acquire user orientation information detected by the headset, generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence, and send the reply sentence to the earphone so that the earphone plays the reply sentence for voice navigation.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 6, a block diagram of a first embodiment of an interactive apparatus according to the present invention is shown, where the interactive apparatus is applied to a headset, the headset is communicatively connected to a server, the headset has an interaction assistant, and the apparatus may specifically include the following modules:
a voice recognition result obtaining module 601, configured to send a user voice to the server, and obtain a voice recognition result of the user voice from the server;
a first interaction module 602, configured to invoke the interaction assistant to perform an interaction operation according to the speech recognition result.
Referring to fig. 7, a block diagram of a second embodiment of an interactive apparatus according to the present invention is shown, where the interactive apparatus is applied to a headset, the headset is communicatively connected to a server, the headset has an interaction assistant, and the apparatus may specifically include the following modules:
a voice recognition result obtaining module 701, configured to send a user voice to the server, and obtain a voice recognition result of the user voice from the server;
a first interaction module 702, configured to invoke the interaction assistant to perform an interaction operation according to the voice recognition result.
In this embodiment of the present invention, the first interaction module 702 may include:
a wake-up sub-module 7021, configured to wake up the interactive assistant according to the voice recognition result;
a user state obtaining sub-module 7022, configured to obtain a user state;
and the song interaction sub-module 7023 is used for calling the interaction assistant to recommend or play songs according to the user state.
In the embodiment of the present invention, the earphone has a gravity sensor, and the user status obtaining sub-module 7022 is configured to obtain sensing data detected by the gravity sensor, and determine the user status according to the sensing data.
In this embodiment of the present invention, the song interaction submodule 7023 is configured to send the user status to the server; and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
In this embodiment of the present invention, the song interaction submodule 7023 is configured to send the user state to the server, and to receive the sound-effect-adjusted preset song sent by the server and invoke the interaction assistant to play it; the sound-effect-adjusted preset song is the preset song adjusted to a sound effect that the server determines to match the user state.
In this embodiment of the present invention, the first interaction module 702 may include:
and the first recording interaction sub-module 7024 is configured to invoke the interaction assistant to recognize and record information from the user voice according to the voice recognition result, or acquire and play recorded information according to the voice recognition result.
In the embodiment of the present invention, the first recording interaction sub-module 7024 is configured to invoke the interaction assistant to identify and record memo information from the voice recognition result according to the voice recognition result, or to obtain preset memo information according to the voice recognition result and play the memo information.
In this embodiment of the present invention, the first recording interaction sub-module 7024 is configured to invoke the interaction assistant to recognize a target voice from the user voice according to the voice recognition result and record the target voice, or to obtain the recorded target voice according to the voice recognition result and play the target voice.
In this embodiment of the present invention, the interaction apparatus may further include:
the first recording information transmission module 703 is configured to send recorded information to the server, and/or obtain and record information recorded by the server.
In this embodiment of the present invention, the interaction apparatus may further include:
a first reminding event generating module 704, configured to generate a reminding event for the memo information after recording the memo information.
In this embodiment of the present invention, the interaction apparatus may further include:
a first reminding event obtaining module 705, configured to obtain a preset reminding event for the memo information from the server.
In this embodiment of the present invention, the interaction apparatus may further include:
and the first reminding event triggering module 706 is configured to, when a triggering condition of a preset reminding event is met, call the interaction assistant to obtain memo information corresponding to the preset reminding event and play the memo information.
In the embodiment of the present invention, the first recording interaction submodule 7024 is configured to search for information matching the voice recognition result from preset memo information; and calling the interactive assistant to play the information matched with the voice recognition result.
In this embodiment of the present invention, the interaction apparatus may further include:
a first semantic analysis module 707, configured to obtain a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
the first tag generating module 708 is configured to generate tag information for the memo information according to a semantic analysis result.
In this embodiment of the present invention, the first record interaction sub-module 7024 is configured to, when the voice recognition result includes content indicating a need to search for memo information with target tag information, invoke the interaction assistant to search for preset memo information matching the target tag information and play the preset memo information.
In this embodiment of the present invention, the first interaction module 702 may include:
a first dialogue sentence acquisition submodule 7025 configured to acquire a dialogue sentence from the speech recognition result;
and the first dialogue interaction sub-module 7026 is used for calling the interaction assistant to generate a reply sentence matched with the dialogue sentence and playing the reply sentence.
In the embodiment of the present invention, the first dialogue interaction submodule 7026 is configured to obtain user orientation information, and to invoke the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence and play the reply sentence.
In the embodiment of the present invention, the headset has an orientation sensor, and the first dialogue interaction submodule 7026 is configured to obtain the user orientation information detected by the orientation sensor.
In the embodiment of the present invention, the first dialogue interaction submodule 7026 is configured to obtain user geographical location information, and to invoke the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information, the dialogue sentence and the user geographical location information and play the reply sentence.
In this embodiment of the present invention, the first dialogue interaction submodule 7026 is configured to send navigation query information to the server, the navigation query information including the user orientation information and the dialogue sentence, and to receive and play a reply sentence for voice navigation sent by the server, the reply sentence being generated by the server according to the user orientation information and the dialogue sentence.
Referring to fig. 8, a block diagram of a third embodiment of an interactive apparatus according to the present invention is shown, where the interactive apparatus is applied to a server, the server is communicatively connected to a headset, the headset has an interaction assistant, and the apparatus may specifically include the following modules:
the voice recognition module 801 is configured to receive the user voice sent by the earphone, and recognize the user voice to obtain a voice recognition result;
a voice recognition result sending module 802, configured to send the voice recognition result to the headset, where the headset is configured to invoke the interaction assistant to perform an interaction operation according to the voice recognition result.
Referring to fig. 9, a block diagram of a fourth embodiment of an interactive apparatus according to the present invention is shown, where the interactive apparatus is applied to a server, the server is communicatively connected to a headset, the headset has an interaction assistant, and the apparatus may specifically include the following modules:
the voice recognition module 901 is configured to receive the user voice sent by the earphone, and recognize the user voice to obtain a voice recognition result;
a voice recognition result sending module 902, configured to send the voice recognition result to the headset, where the headset is configured to invoke the interaction assistant to perform an interaction operation according to the voice recognition result.
In this embodiment of the present invention, the interaction apparatus may further include:
a first user state obtaining module 903, configured to obtain a user state detected by the earphone;
a first song sending module 904, configured to search for a recommended song matching the user status, and send the recommended song to the headset; the earphone is used for calling the interaction assistant to recommend the recommended song to the user.
In this embodiment of the present invention, the interaction apparatus may further include:
a second user state obtaining module 905, configured to obtain a user state detected by the earphone;
a sound effect determining module 906, configured to determine a sound effect matching the user state;
a second song sending module 907, configured to adjust a preset song to the sound effect, and send the preset song after the sound effect is adjusted to the earphone, where the earphone is configured to invoke the interactive assistant to play the preset song after the sound effect is adjusted.
In this embodiment of the present invention, the interaction apparatus may further include:
a recorded information processing module 908, configured to recognize and record information from the user voice according to the voice recognition result, or obtain and send recorded information to the headset according to the voice recognition result, where the headset is configured to play the recorded information.
In an embodiment of the present invention, the recorded information processing module 908 may include:
and the memo information processing submodule 9081 is configured to recognize and record memo information from the voice recognition result according to the voice recognition result, or acquire preset memo information according to the voice recognition result and send the preset memo information to the headset.
In an embodiment of the present invention, the recorded information processing module 908 may include:
and the voice processing sub-module 9082 is configured to recognize and record a target voice from the user voice according to the voice recognition result, or acquire and send a recorded target voice to the headset according to the voice recognition result.
In this embodiment of the present invention, the interaction apparatus may further include:
a second recorded information transmission module 909 for transmitting information recognized from the user's voice to the headset and/or acquiring and recording the information recorded by the headset.
In this embodiment of the present invention, the interaction apparatus may further include:
a second reminding event generating module 910, configured to generate a reminding event for the memo information after recording the memo information;
and a reminding event sending module 911, configured to send the reminding event to the headset, where the headset is configured to call the interaction assistant to obtain memo information corresponding to a preset reminding event and play the memo information when a trigger condition of the preset reminding event is met.
In this embodiment of the present invention, the interaction apparatus may further include:
a second semantic analysis module 912, configured to perform semantic analysis on the memo information to obtain a semantic analysis result;
and a second tag generating module 913, configured to generate tag information for the memo information according to a semantic analysis result.
In this embodiment of the present invention, the memo information processing sub-module 9081 is configured to, when the voice recognition result includes content indicating a need to search for memo information with target tag information, search for preset memo information matching the target tag information and send the preset memo information to the headset.
In this embodiment of the present invention, the interaction apparatus may further include:
a dialogue sentence acquisition module 914, configured to acquire a dialogue sentence from the speech recognition result;
a reply sentence sending module 915, configured to generate a reply sentence matched with the conversation sentence and send the reply sentence to the headset, where the headset is configured to play the reply sentence matched with the conversation sentence.
In this embodiment of the present invention, the reply sentence sending module 915 is configured to obtain the user orientation information detected by the earphone, and to generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence and send the reply sentence to the earphone.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Fig. 10 is a block diagram illustrating a structure of a headset 1000 for interaction according to an exemplary embodiment. Referring to fig. 10, the headset 1000 may include one or more of the following components: processing component 1002, memory 1004, power component 1006, multimedia component 1008, audio component 1010, input/output (I/O) interface 1012, sensor component 1014, and communications component 1016.
The processing component 1002 generally controls the overall operation of the headset 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 1002 may include one or more processors 1020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components. For example, the processing component 1002 can include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation at the headset 1000. Examples of such data include instructions for any application or method operating on the headset 1000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1004 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 1006 provides power to the various components of the headset 1000. The power components 1006 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the headset 1000.
The multimedia component 1008 includes a screen that provides an output interface between the headset 1000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1008 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the headset 1000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a Microphone (MIC) configured to receive external audio signals when the headset 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1014 includes one or more sensors for providing various aspects of state assessment for the headset 1000. For example, the sensor assembly 1014 may detect an open/closed state of the headset 1000, the relative positioning of the components, such as a display and keypad of the headset 1000, the sensor assembly 1014 may also detect a change in position of the headset 1000 or one of the components of the headset 1000, the presence or absence of user contact with the headset 1000, orientation or acceleration/deceleration of the headset 1000, and a change in temperature of the headset 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the headset 1000 and other devices. The headset 1000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the headset 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1004 comprising instructions, executable by the processor 1020 of the headset 1000 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A headset comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
sending user voice to a server, and acquiring a voice recognition result of the user voice from the server;
and calling an interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
awakening the interactive assistant according to the voice recognition result;
acquiring a user state;
and calling the interactive assistant to recommend songs or play songs according to the user state.
Optionally, the headset has a gravity sensor, and the acquiring the user status includes:
and acquiring sensing data detected by the gravity sensor, and determining the state of the user according to the sensing data.
Optionally, the invoking the interactive assistant to recommend songs according to the user status includes:
sending the user status to the server;
and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
Optionally, the invoking the interactive assistant to play a song according to the user status includes:
sending the user status to the server;
receiving the sound-effect-adjusted preset song sent by the server and invoking the interaction assistant to play it; the sound-effect-adjusted preset song is the preset song adjusted to a sound effect that the server determines to match the user state.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
and calling the interactive assistant to recognize information from the user voice according to the voice recognition result and record the information, or acquiring the recorded information according to the voice recognition result and playing the information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquiring preset memo information according to the voice recognition result and playing the preset memo information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquiring and playing the recorded target voice according to the voice recognition result.
Optionally, the method further includes performing the following operations: and sending the recorded information to the server, and/or acquiring and recording the recorded information of the server.
Optionally, the method further includes performing the following operations: after the memo information is recorded, a reminding event for the memo information is generated.
Optionally, the method further includes performing the following operations: and acquiring a preset reminding event aiming at the memo information from the server.
Optionally, the method further includes performing the following operations: and when the trigger condition of a preset reminding event is met, calling the interactive assistant to acquire and play the memo information corresponding to the preset reminding event.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
searching information matched with the voice recognition result from preset memo information;
and calling the interactive assistant to play the information matched with the voice recognition result.
Optionally, the method further includes performing the following operations: obtaining a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
when the voice recognition result includes content indicating a need to search for memo information with target tag information, invoking the interaction assistant to search for preset memo information matching the target tag information and playing the preset memo information.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
obtaining a dialogue statement from the voice recognition result;
and calling the interactive assistant to generate and play a reply sentence matched with the conversation sentence.
Optionally, the invoking the interactive assistant, generating and playing a reply sentence matched with the dialogue sentence includes:
acquiring user orientation information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence and play the reply sentence.
Optionally, the headset has an orientation sensor, and the acquiring of the user orientation information includes:
and acquiring the user orientation information detected by the orientation sensor.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
acquiring user geographical position information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information, the dialogue sentence and the user geographical location information and play the reply sentence.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
sending navigation inquiry information to the server; the navigation query information comprises the user orientation information and the dialogue sentences;
and receiving and playing a reply sentence for voice navigation sent by the server, wherein the reply sentence for voice navigation is generated by the server according to the user orientation information and the dialogue sentence.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a headset, enable the headset to perform a method of interaction, the method comprising:
sending user voice to a server, and acquiring a voice recognition result of the user voice from the server;
and calling an interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
awakening the interactive assistant according to the voice recognition result;
acquiring a user state;
and calling the interactive assistant to recommend songs or play songs according to the user state.
Optionally, the headset has a gravity sensor, and the acquiring the user status includes:
and acquiring sensing data detected by the gravity sensor, and determining the state of the user according to the sensing data.
Optionally, the invoking the interactive assistant to recommend songs according to the user status includes:
sending the user status to the server;
and receiving a recommended song sent by the server and calling the interactive assistant to recommend the recommended song to a user, wherein the recommended song is a song searched by the server and matched with the user state.
Optionally, the invoking the interactive assistant to play a song according to the user status includes:
sending the user status to the server;
receiving the sound-effect-adjusted preset song sent by the server and invoking the interaction assistant to play it; the sound-effect-adjusted preset song is the preset song adjusted to a sound effect that the server determines to match the user state.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
and calling the interactive assistant to recognize information from the user voice according to the voice recognition result and record the information, or acquiring the recorded information according to the voice recognition result and playing the information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the memo information from the voice recognition result according to the voice recognition result and record the memo information, or acquiring preset memo information according to the voice recognition result and playing the preset memo information.
Optionally, the invoking the interactive assistant to recognize and record information from the user voice according to the voice recognition result, or to acquire and play recorded information according to the voice recognition result, includes:
and calling the interactive assistant to recognize the target voice from the user voice according to the voice recognition result and record the target voice, or acquiring and playing the recorded target voice according to the voice recognition result.
Optionally, the method further comprises: and sending the recorded information to the server, and/or acquiring and recording the recorded information of the server.
Optionally, the method further comprises: after the memo information is recorded, a reminding event for the memo information is generated.
Optionally, the method further comprises: and acquiring a preset reminding event aiming at the memo information from the server.
Optionally, the method further comprises: and when the trigger condition of a preset reminding event is met, calling the interactive assistant to acquire and play the memo information corresponding to the preset reminding event.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
searching information matched with the voice recognition result from preset memo information;
and calling the interactive assistant to play the information matched with the voice recognition result.
Optionally, the method further comprises: obtaining a semantic analysis result obtained by performing semantic analysis on the memo information by the server;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining and playing preset memo information according to the voice recognition result includes:
when the voice recognition result includes content indicating a need to search for memo information with target tag information, invoking the interaction assistant to search for preset memo information matching the target tag information and playing the preset memo information.
Optionally, the invoking the interactive assistant to perform an interactive operation according to the voice recognition result includes:
obtaining a dialogue statement from the voice recognition result;
and calling the interactive assistant to generate and play a reply sentence matched with the conversation sentence.
Optionally, the invoking the interactive assistant, generating and playing a reply sentence matched with the dialogue sentence includes:
acquiring user orientation information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information and the dialogue sentence and play the reply sentence.
Optionally, the headset has an orientation sensor, and the acquiring of the user orientation information includes:
and acquiring the user orientation information detected by the orientation sensor.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
acquiring user geographical position information;
and calling the interactive assistant to generate a reply sentence for voice navigation according to the user orientation information, the dialogue sentence and the user geographical location information and play the reply sentence.
Optionally, the invoking the interactive assistant generates and plays a reply sentence for voice navigation according to the user orientation information and the dialog sentence, including:
sending navigation inquiry information to the server; the navigation query information comprises the user orientation information and the dialogue sentences;
and receiving and playing a reply sentence for voice navigation sent by the server, wherein the reply sentence for voice navigation is generated by the server according to the user orientation information and the dialogue sentence.
Fig. 11 is a schematic structural diagram illustrating a server 1100 for interaction according to another exemplary embodiment of the present invention. The servers may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 1122 (e.g., one or more processors) and memory 1132, one or more storage media 1130 (e.g., one or more mass storage devices) storing applications 1142 or data 1144. Memory 1132 and storage media 1130 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 1122 may be provided in communication with the storage medium 1130 to execute a series of instruction operations in the storage medium 1130 on the server.
The server may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, one or more keyboards 1156, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A server comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
receiving user voice sent by an earphone, and identifying the user voice to obtain a voice identification result;
and sending the voice recognition result to the earphone, wherein the earphone is used for calling an interactive assistant to execute interactive operation according to the voice recognition result.
Optionally, the method further includes performing the following operations: acquiring a user state detected by the earphone;
searching for a recommended song matched with the user state, and sending the recommended song to the earphone; the earphone is used for calling the interaction assistant to recommend the recommended song to the user.
Optionally, the method further includes performing the following operations: acquiring a user state detected by the earphone;
determining sound effects matched with the user state;
and adjusting a preset song to the sound effect, and sending the preset song after the sound effect is adjusted to the earphone, wherein the earphone is used for calling the interactive assistant to play the preset song after the sound effect is adjusted.
Optionally, the method further includes performing the following operations: and identifying and recording information from the user voice according to the voice identification result, or acquiring and sending the recorded information to the earphone according to the voice identification result, wherein the earphone is used for playing the recorded information.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and identifying and recording memo information from the voice recognition result according to the voice recognition result, or acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone.
Optionally, the recognizing and recording information from the user speech according to the speech recognition result, or acquiring recorded information according to the speech recognition result and sending the recorded information to the headset includes:
and recognizing and recording a target voice from the user voice according to the voice recognition result, or acquiring and sending the recorded target voice to the earphone according to the voice recognition result.
Optionally, the method further includes performing the following operations: and sending information recognized from the user voice to the earphone, and/or acquiring and recording the recorded information of the earphone.
Optionally, the method further includes performing the following operations: after recording the memo information, generating a reminding event aiming at the memo information;
and the earphone is used for calling the interaction assistant to acquire and play memo information corresponding to a preset reminding event when a triggering condition of the preset reminding event is met.
Optionally, the method further includes performing the following operations: performing semantic analysis on the memo information to obtain a semantic analysis result;
and generating label information for the memo information according to the semantic analysis result.
Optionally, the obtaining preset memo information according to the voice recognition result and sending the preset memo information to the headset includes:
and when the voice recognition result includes content indicating a need to search for memo information with target tag information, searching for preset memo information matching the target tag information and sending the preset memo information to the earphone.
Optionally, the method further includes performing the following operations: obtaining a dialogue statement from the voice recognition result;
and generating a reply sentence matched with the conversation sentence and sending the reply sentence to the earphone, wherein the earphone is used for playing the reply sentence matched with the conversation sentence.
Optionally, the generating and sending a reply sentence matched with the dialogue sentence to the headset includes:
acquiring user orientation information detected by the earphone;
and generating a reply sentence for voice navigation according to the user orientation information and the dialogue sentence, and sending the reply sentence to the earphone.
A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform an interaction method, the method comprising:
receiving user voice sent by an earphone, and recognizing the user voice to obtain a voice recognition result;
and sending the voice recognition result to the earphone, wherein the earphone is configured to invoke an interaction assistant to perform an interactive operation according to the voice recognition result.
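A minimal sketch of this server-side round trip follows, assuming a recognize stub in place of a real speech recognition engine and a send_to_earphone callback in place of the actual transport; both names are introduced for the example only.

```python
def recognize(audio_bytes):
    # Stand-in for the server-side speech recognition engine.
    return "play some relaxing music"


def handle_user_voice(audio_bytes, send_to_earphone):
    """Server half of the round trip: recognize the received user voice, then
    return the recognition result for the earphone-side assistant to act on."""
    result = recognize(audio_bytes)
    send_to_earphone({"type": "recognition_result", "text": result})


handle_user_voice(b"<pcm frames>", print)
```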
Optionally, the method further comprises:
acquiring a user state detected by the earphone;
and searching for a recommended song matching the user state, and sending the recommended song to the earphone, wherein the earphone is configured to invoke the interaction assistant to recommend the recommended song to the user.
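As a non-limiting illustration, the sketch below matches a detected user state to a recommended song; the states and the state-to-playlist mapping are invented for the example, and a real system might derive the state from motion or heart-rate sensors on the earphone.

```python
STATE_TO_PLAYLIST = {
    "running": ["High-tempo track A", "High-tempo track B"],
    "resting": ["Ambient track C"],
}


def recommend_song(user_state):
    """Pick the first song from the playlist matching the detected user state."""
    playlist = STATE_TO_PLAYLIST.get(user_state, [])
    return playlist[0] if playlist else None


print(recommend_song("running"))  # High-tempo track A
```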
Optionally, the method further comprises:
acquiring a user state detected by the earphone;
determining a sound effect matching the user state;
and adjusting a preset song to the sound effect, and sending the adjusted preset song to the earphone, wherein the earphone is configured to invoke the interaction assistant to play the adjusted preset song.
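As a non-limiting illustration of sound-effect adjustment, the sketch below scales normalized audio samples by a state-dependent gain; a single gain factor stands in for whatever equalizer or effect profile an implementation might actually use, and the mapping and clipping range are assumptions.

```python
EQ_GAIN_BY_STATE = {
    "commuting": 0.8,   # softer playback
    "exercising": 1.2,  # punchier playback
}


def adjust_sound_effect(samples, user_state):
    """Scale normalized PCM samples by a state-dependent gain, with clipping."""
    gain = EQ_GAIN_BY_STATE.get(user_state, 1.0)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


print(adjust_sound_effect([0.1, -0.5, 0.9], "exercising"))
# [0.12, -0.6, 1.0] (the last sample clips at 1.0)
```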
Optionally, the method further comprises:
recognizing and recording information from the user voice according to the voice recognition result, or acquiring recorded information according to the voice recognition result and sending the recorded information to the earphone, wherein the earphone is configured to play the recorded information.
Optionally, recognizing and recording information from the user voice according to the voice recognition result, or acquiring recorded information according to the voice recognition result and sending the recorded information to the earphone, includes:
recognizing and recording memo information from the voice recognition result, or acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone.
Optionally, recognizing and recording information from the user voice according to the voice recognition result, or acquiring recorded information according to the voice recognition result and sending the recorded information to the earphone, includes:
recognizing and recording a target voice from the user voice according to the voice recognition result, or acquiring a recorded target voice according to the voice recognition result and sending the recorded target voice to the earphone.
Optionally, the method further comprises:
sending information recognized from the user voice to the earphone, and/or acquiring and recording information recorded by the earphone.
Optionally, the method further comprises:
after recording the memo information, generating a reminder event for the memo information;
the earphone is configured to, when a trigger condition of a preset reminder event is met, invoke the interaction assistant to acquire and play the memo information corresponding to the preset reminder event.
Optionally, the method further comprises:
performing semantic analysis on the memo information to obtain a semantic analysis result;
and generating tag information for the memo information according to the semantic analysis result.
Optionally, acquiring preset memo information according to the voice recognition result and sending the preset memo information to the earphone includes:
when the voice recognition result indicates a request to search for memo information carrying target tag information, searching for preset memo information matching the target tag information and sending the preset memo information to the earphone.
Optionally, the method further comprises:
obtaining a dialogue sentence from the voice recognition result;
and generating a reply sentence matching the dialogue sentence and sending the reply sentence to the earphone, wherein the earphone is configured to play the reply sentence matching the dialogue sentence.
Optionally, generating a reply sentence matching the dialogue sentence and sending the reply sentence to the earphone includes:
acquiring user direction information detected by the earphone;
and generating a reply sentence for voice navigation according to the user direction information and the dialogue sentence, and sending the reply sentence to the earphone.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts, the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, those skilled in the art may make additional alterations and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all alterations and modifications falling within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between these entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
The interaction method and device, earphone, and server provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be changes in the specific implementations and the application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. An interaction method, applied to an earphone in communication connection with a server, the earphone having an interaction assistant, the method comprising:
sending, by the earphone, user voice to the server, and obtaining a voice recognition result of the user voice from the server;
and invoking the interaction assistant to perform an interactive operation according to the voice recognition result.
2. The method of claim 1, wherein invoking the interaction assistant to perform an interactive operation according to the voice recognition result comprises:
waking up the interaction assistant according to the voice recognition result;
acquiring a user state;
and invoking the interaction assistant to recommend or play songs according to the user state.
3. An interaction method, applied to a server in communication connection with an earphone, the earphone having an interaction assistant, the method comprising:
receiving, by the server, user voice sent by the earphone, and recognizing the user voice to obtain a voice recognition result;
and sending the voice recognition result to the earphone, wherein the earphone is configured to invoke the interaction assistant to perform an interactive operation according to the voice recognition result.
4. The method of claim 3, further comprising:
acquiring a user state detected by the earphone;
and searching for a recommended song matching the user state, and sending the recommended song to the earphone, wherein the earphone is configured to invoke the interaction assistant to recommend the recommended song to the user.
5. An interaction device, applied to an earphone in communication connection with a server, the earphone having an interaction assistant, the device comprising:
a voice recognition result acquisition module, configured to send user voice to the server and acquire a voice recognition result of the user voice from the server;
and a first interaction module, configured to invoke the interaction assistant to perform an interactive operation according to the voice recognition result.
6. An interaction device, applied to a server in communication connection with an earphone, the earphone having an interaction assistant, the device comprising:
a voice recognition module, configured to receive user voice sent by the earphone and recognize the user voice to obtain a voice recognition result;
and a voice recognition result sending module, configured to send the voice recognition result to the earphone, wherein the earphone is configured to invoke the interaction assistant to perform an interactive operation according to the voice recognition result.
7. An earphone, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
sending user voice to a server, and acquiring a voice recognition result of the user voice from the server;
and invoking an interaction assistant to perform an interactive operation according to the voice recognition result.
8. A server, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
receiving user voice sent by an earphone, and recognizing the user voice to obtain a voice recognition result;
and sending the voice recognition result to the earphone, wherein the earphone is configured to invoke an interaction assistant to perform an interactive operation according to the voice recognition result.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the interaction method according to claim 1 or 2.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the interaction method according to claim 3 or 4.
CN202010507540.4A 2020-06-05 2020-06-05 Interaction method and device, earphone and server Pending CN111739529A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010507540.4A CN111739529A (en) 2020-06-05 2020-06-05 Interaction method and device, earphone and server
PCT/CN2021/074916 WO2021244059A1 (en) 2020-06-05 2021-02-02 Interaction method and device, earphone, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010507540.4A CN111739529A (en) 2020-06-05 2020-06-05 Interaction method and device, earphone and server

Publications (1)

Publication Number Publication Date
CN111739529A true CN111739529A (en) 2020-10-02

Family

ID=72648376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507540.4A Pending CN111739529A (en) 2020-06-05 2020-06-05 Interaction method and device, earphone and server

Country Status (2)

Country Link
CN (1) CN111739529A (en)
WO (1) WO2021244059A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112331179A (en) * 2020-11-11 2021-02-05 北京搜狗科技发展有限公司 Data processing method and earphone accommodating device
CN112749349A (en) * 2020-12-31 2021-05-04 北京搜狗科技发展有限公司 Interaction method and earphone equipment
WO2021244059A1 (en) * 2020-06-05 2021-12-09 北京搜狗智能科技有限公司 Interaction method and device, earphone, and server

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006251376A (en) * 2005-03-10 2006-09-21 Yamaha Corp Musical sound controller
US20090274317A1 (en) * 2008-04-30 2009-11-05 Philippe Kahn Headset
CN103714836A (en) * 2012-09-29 2014-04-09 联想(北京)有限公司 Method for playing audio information and electronic equipment
CN104535074A (en) * 2014-12-05 2015-04-22 惠州Tcl移动通信有限公司 Bluetooth earphone-based voice navigation method, system and terminal
CN105263075A (en) * 2015-10-12 2016-01-20 深圳东方酷音信息技术有限公司 Earphone equipped with directional sensor and 3D sound field restoration method thereof
CN107071605A (en) * 2015-12-30 2017-08-18 杭州赛泫科技有限公司 Intelligent 3D earphones
CN206490796U (en) * 2016-08-16 2017-09-12 北京金锐德路科技有限公司 Acoustic control intelligent earphone
CN107478239A (en) * 2017-08-15 2017-12-15 上海摩软通讯技术有限公司 Air navigation aid, navigation system and audio reproducing apparatus based on audio reproducing apparatus
CN107515007A (en) * 2016-06-16 2017-12-26 北京小米移动软件有限公司 Air navigation aid and device
CN107569217A (en) * 2017-08-29 2018-01-12 上海展扬通信技术有限公司 A kind of control method of intelligent earphone and the intelligent earphone
CN108710486A (en) * 2018-05-28 2018-10-26 Oppo广东移动通信有限公司 Audio frequency playing method, device, earphone and computer readable storage medium
CN108958846A (en) * 2018-09-27 2018-12-07 出门问问信息科技有限公司 A kind of creation method and device of notepad item
CN110139178A (en) * 2018-02-02 2019-08-16 中兴通讯股份有限公司 A kind of method, apparatus, equipment and the storage medium of determining terminal moving direction
CN110136705A (en) * 2019-04-10 2019-08-16 华为技术有限公司 A kind of method and electronic equipment of human-computer interaction
CN111010641A (en) * 2019-12-20 2020-04-14 联想(北京)有限公司 Information processing method, earphone and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315561A (en) * 2017-06-30 2017-11-03 联想(北京)有限公司 A kind of data processing method and electronic equipment
CN108280067B (en) * 2018-02-26 2023-04-18 深圳市百泰实业股份有限公司 Earphone translation method and system
CN108900945A (en) * 2018-09-29 2018-11-27 上海与德科技有限公司 Bluetooth headset box and audio recognition method, server and storage medium
CN109785837A (en) * 2019-01-28 2019-05-21 上海与德通讯技术有限公司 Sound control method, device, TWS bluetooth headset and storage medium
CN111739529A (en) * 2020-06-05 2020-10-02 北京搜狗科技发展有限公司 Interaction method and device, earphone and server

Also Published As

Publication number Publication date
WO2021244059A1 (en) 2021-12-09

Similar Documents

Publication Publication Date Title
CN110634483B (en) Man-machine interaction method and device, electronic equipment and storage medium
WO2021244057A1 (en) Interaction method and apparatus, earphone, and earphone accommodation apparatus
US9990176B1 (en) Latency reduction for content playback
CN107644646B (en) Voice processing method and device for voice processing
CN113744733B (en) Voice trigger of digital assistant
WO2021244059A1 (en) Interaction method and device, earphone, and server
WO2018018482A1 (en) Method and device for playing sound effects
JP2019117623A (en) Voice dialogue method, apparatus, device and storage medium
CN114041283A (en) Automated assistant engaged with pre-event and post-event input streams
WO2016165325A1 (en) Audio information recognition method and apparatus
CN105580071B (en) Method and apparatus for training a voice recognition model database
WO2021031308A1 (en) Audio processing method and device, and storage medium
CN107666536B (en) Method and device for searching terminal
KR20160106075A (en) Method and device for identifying a piece of music in an audio stream
WO2021051588A1 (en) Data processing method and apparatus, and apparatus used for data processing
CN111640434A (en) Method and apparatus for controlling voice device
CN112068711A (en) Information recommendation method and device of input method and electronic equipment
CN111739528A (en) Interaction method and device and earphone
CN110415703A (en) Voice memos information processing method and device
CN112988956B (en) Method and device for automatically generating dialogue, and method and device for detecting information recommendation effect
CN106098066B (en) Voice recognition method and device
EP4276818A1 (en) Speech operation method for device, apparatus, and electronic device
CN107068125B (en) Musical instrument control method and device
CN112015280B (en) Data processing method and device and electronic equipment
CN111741405B (en) Reminding method and device, earphone and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210705

Address after: 100084 Room 802, 8th floor, building 9, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: Beijing Sogou Intelligent Technology Co.,Ltd.

Address before: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.
