CN114822551A - Interaction method based on intelligent earphone - Google Patents

Interaction method based on intelligent earphone

Info

Publication number
CN114822551A
CN114822551A
Authority
CN
China
Prior art keywords
user
words
sentences
playing
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210447421.3A
Other languages
Chinese (zh)
Inventor
丹尼尔·吴
安德鲁·吴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinxing Technology Hangzhou Co ltd
Original Assignee
Xinxing Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinxing Technology Hangzhou Co ltd
Priority to CN202210447421.3A
Publication of CN114822551A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1041 Mechanical or electronic switches, or control elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An interaction method based on a smart headset, the method comprising the steps of: step 1), a user activates the smart headset with a wake-up phrase or with a wireless device and interacts with it mainly through natural language, the wake-up phrases including non-generic wake-up phrases; step 2), after the smart headset is activated, it recognizes the wake-up phrase, infers the user's intent from the words the user speaks, and queries and returns the relevant information; and step 3), the sound control and playback module of the smart headset determines the playback mode and speed according to the different user intents. The invention provides an interaction method based on a smart headset that uses non-generic wake-up phrases or a wireless device to reduce the user's embarrassment in front of an audience, to form a natural language flow, and to change the playback mode and speed of the smart headset according to different user intents.

Description

Interaction method based on intelligent earphone
Technical Field
The invention relates to the technical field of smart headset interaction, and in particular to an interaction method based on a smart headset.
Background
Here, a smart headset mainly means either 1) a Bluetooth headset comprising a miniature microphone and a miniature loudspeaker together with the terminal system (such as a smartphone) connected to it, or 2) a miniature smartphone in which the miniature microphone, miniature loudspeaker, and all other phone components are built into an earphone-shaped housing placed in the user's external auditory canal, enabling voice interaction with the user.
The main limitations of current smart headset technology and of the major products on the market are as follows:
1) Activation of the smart headset. Here, activation means that, after power-on, the smart headset waits for a particular sound or signal from the user before it starts accepting the user's voice input. Current smart headsets and the major products on the market are activated in one of two ways: 1A) the user activates the headset manually, by operating the phone or pressing a button on the headset, or 1B) the user speaks a generic wake-up word (e.g., "Xiaodu").
For 1A, the user has to operate the device by hand, which is inconvenient.
For 1B, with the spread of voice-interaction assistants on phones from major vendors (e.g., Siri on Apple phones, Xiao Ai on Xiaomi phones, and similar assistants), the generic wake-up words are usually familiar to other listeners near the user. When the user speaks such a generic wake-up word in front of an audience, the user may feel a certain degree of embarrassment and cannot hide the fact that an intelligent assistant is being used to obtain the information. In many practical situations, however, the user wants to query information without the listeners realizing that an intelligent assistant is being used, for example when telling the listeners a story, telling jokes, reciting poems, or singing.
2) Playback mode and speed. Current smart headsets simply play the content the user asked for (e.g., the query result), and the playback speed is the same regardless of the user's current situation. This is problematic in several important scenarios: when the user queries information only to listen to it, a normal playback speed is fine, but if the user needs to repeat the heard information (such as jokes or stories) to an audience in real time, the user needs a pause between the sentences played by the headset so that there is time to repeat each sentence to the audience. Existing smart headset interaction does not take this into account.
3) Playback content. Current smart headsets generally play only existing information; the user cannot customize or pre-upload content and then listen to the uploaded content.
Disclosure of Invention
To overcome the shortcomings of existing smart headset interaction methods, the invention provides an interaction method based on a smart headset that uses non-generic wake-up phrases or a wireless device to reduce the user's embarrassment in front of an audience, to form a natural language flow, and to change the playback mode and speed of the smart headset according to different user intents.
The technical scheme adopted by the invention is as follows:
An interaction method based on a smart headset, the method comprising the steps of:
Step 1), the user activates the smart headset with a wake-up phrase or with a wireless device and interacts with it mainly through natural language; the wake-up phrases include non-generic wake-up phrases; "natural language" here means that, in front of an audience, the user activates the smart headset with a non-generic wake-up phrase, which avoids the embarrassment caused by using a generic wake-up word in front of the audience;
Step 2), after the smart headset is activated, it recognizes the wake-up phrase, infers the user's intent from the words the user speaks, and queries and returns the relevant information;
Step 2.1), after the smart headset is activated, its voice activation and recognition module recognizes the wake-up phrase and records the wake-up mode, i.e. whether activation was by voice or by the wireless device; at the same time, the sound control and playback module plays a feedback sound or feedback speech to the user, prompting the user that the headset has received the wake-up phrase and is waiting for the user's input, for example "I am here, please say what you need played."
Step 2.2), the user speaks a phrase containing a keyword, for example "a poem by Li Bai";
Step 2.3), the voice activation and recognition module recognizes the user's words and extracts the keywords from them;
Step 2.4), the user intention inference and information base service interaction module of the smart headset searches the relevant information base contents, including existing information bases and information bases built by the user, according to the keywords obtained in step 2.3), and at the same time plays a feedback sound or feedback speech such as "querying". If matching content is found, the queried content and the activated information base are returned; if not, the user is told that no information was found. The module activates and queries the relevant information base according to the user's keywords: for example, if the keyword is "joke", the joke library is activated and the query result and the name of the activated library are returned; if the keywords contain song-related information, such as the name of a song, a singer's name, or a line of lyrics, the information base containing that song is activated and the query result and the name of the activated library are returned. An information base here is a collection of text or song audio; it may be a pre-established base or one built by the user. Information bases fall into two categories: the song library, which contains song information such as lyrics or song audio, and non-song libraries, which contain no song information. The non-song libraries may include, but are not limited to: jokes, stories, poetry, riddles, brain teasers, romantic phrases, food and cooking, encyclopedia entries, safety and emergency treatment, problem solving, specialized knowledge and dictionaries, life experience, famous quotations, conversation skills and advice, travel information, health and disease diagnosis, beauty and slimming, movies and TV, pets, celebrities, entertainment, sports, science and technology, home furnishing, constellations, art, history, geography, military affairs, news, etc., as well as user-built information.
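To make the lookup in step 2.4) concrete, the following is a minimal Python sketch of how extracted keywords could be matched against a song library and several non-song libraries; the library names, contents, and lookup structure are illustrative assumptions, not part of the patented method.

```python
# Minimal sketch of step 2.4): match user keywords against information bases
# and return the queried content plus the name of the activated library.
# Library names and contents below are illustrative assumptions.

NON_SONG_LIBRARIES = {
    "joke":  ["Why did the scarecrow win an award? It was outstanding in its field."],
    "story": ["The story takes place in the far Middle East ..."],
    "poem":  ["Quiet Night Thoughts, by Li Bai ..."],
}
SONG_LIBRARY = {
    "twinkle twinkle little star": {"lyrics": ["Twinkle, twinkle, little star",
                                               "How I wonder what you are"]},
}

def query_information_base(keywords):
    """Return (activated_library, content), or (None, None) if nothing matches."""
    for kw in keywords:
        if kw in NON_SONG_LIBRARIES:                  # non-song library hit
            return kw, NON_SONG_LIBRARIES[kw]
        for title, record in SONG_LIBRARY.items():    # song-related keyword hit
            if kw in title:
                return "song library", record
    return None, None                                 # "no information found" branch

print(query_information_base(["joke"]))   # ('joke', [...])
```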
Step 2.5), the step and step 2.4) are carried out simultaneously, the user intention presumption and information base service interaction module of the intelligent earphone presumes the intention of the user according to the mode (through voice activation or through some wireless equipment) and the words (non-universal or universal words) of awakening the intelligent earphone of the user:
intention 1: if the user speaks one of the five types of non-generic wake-up phrases or presses a wireless device, presuming that the user's intent is to extract the content of an existing or self-established information base to speak to or sing to the listener; the user intention presumption and information base service interaction module sends the intention to the sound control and playing module;
intention 2: if the user speaks a general awakening word and sentence, such as the name of the intelligent earphone, the intention of the user is presumed that the user inquires information for listening; the user intention presumption and information base service interaction module sends the intention to the sound control and playing module;
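The intent inference in step 2.5) reduces to a small decision rule. The sketch below assumes example wake phrases and a headset name purely for illustration; the actual phrase sets come from the five classes defined further below and from the user customization module.

```python
# Sketch of step 2.5): infer the user's intent from the wake-up mode and the
# wake-up phrase.  The phrase sets are illustrative assumptions.

GENERIC_WAKE_PHRASES = {"hey headset"}            # assumed headset name
NON_GENERIC_WAKE_PHRASES = {"let me think", "i may know", "do you know",
                            "let me tell you"}    # plus user-defined phrases

def infer_intent(wake_mode, wake_phrase):
    """Return "intent 1" (speak/sing to an audience) or "intent 2" (listen only)."""
    if wake_mode == "wireless_device":
        return "intent 1"
    if wake_phrase.lower() in NON_GENERIC_WAKE_PHRASES:
        return "intent 1"
    return "intent 2"   # generic wake-up phrase, e.g. the headset's name

print(infer_intent("voice", "let me think"))   # intent 1
print(infer_intent("voice", "hey headset"))    # intent 2
```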
Step 3), the sound control and playback module of the smart headset determines the playback mode and speed according to the different user intents and the activated information base. According to the result of step 2.5): for intent 1, i.e. the user wants to speak or sing to the audience, go to step 3.1); for intent 2, go to step 3.4);
Step 3.1), judge from the user's keywords in step 2.4) whether the activated information base is the song library or a non-song library; if a non-song library is activated, go to step 3.2); if the song library is activated, go to step 3.3);
Step 3.2), if the song library is not activated, the sound control and playback module of the smart headset processes the queried content returned in step 2.4) as follows:
Step 1, segment paragraphs and sentences into short clauses according to the punctuation marks of the text, which serve as the clause delimiters;
Step 2, check the length of each clause; if a clause exceeds the single-clause maximum word count parameter X, analyze the grammatical structure of the sentence and automatically split the clause into several shorter clauses, ensuring that the split clauses satisfy: 1) the subject, predicate, and object are kept as complete as possible; 2) pronouns, nouns, verbs, adjectives, adverbs, and phrases (such as idioms) are not cut apart; 3) the total word count of each clause is less than the maximum word count parameter X;
Step 3, play each clause and then intentionally pause for N seconds, where N is the word count of the clause multiplied by C. The parameter C is the intentional-pause time parameter after a single clause is played (in seconds per word). When the user first uses the smart headset, C has a default value, for example C = 0.4 seconds per word, and the user can modify it through natural language (see step 4)) or through the user customization and input module of the smart headset.
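The following is a runnable Python sketch of the clause segmentation and pause logic in step 3.2). It splits on punctuation, naively breaks over-long clauses, and pauses for N = (word count) × C seconds after each clause; the values of X and C follow the defaults mentioned in the text, and play_audio() is a stand-in for the headset's actual playback, since the description does not prescribe an implementation.

```python
import re
import time

# Sketch of step 3.2): split text into short clauses and pause N = words * C
# seconds after playing each one.  X and C use the defaults from the text;
# play_audio() is a placeholder, and counting words by whitespace suits
# English text (Chinese text would be counted by characters instead).

X = 12    # single-clause maximum word count (assumed default)
C = 0.4   # intentional pause after each clause, in seconds per word

def play_audio(clause):
    print(clause)                      # stand-in for real audio playback

def split_into_clauses(text, max_words=X):
    clauses = [c.strip() for c in re.split(r"[,.;!?，。；！？]", text) if c.strip()]
    result = []
    for clause in clauses:
        words = clause.split()
        # Naive length split; a real system would split on grammatical
        # structure so that words and set phrases are not cut apart.
        while len(words) > max_words:
            result.append(" ".join(words[:max_words]))
            words = words[max_words:]
        result.append(" ".join(words))
    return result

def play_with_pauses(text, pause_per_word=C):
    for clause in split_into_clauses(text):
        play_audio(clause)
        time.sleep(len(clause.split()) * pause_per_word)   # N = word count * C

play_with_pauses("The story takes place in the far Middle East. The protagonist sets out on a long journey.")
```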
Step 3.3), if the song library is activated, the user words and sentences and the query result in the song library obtained in step 2.4) are divided into two conditions:
case 1: if the audio of the song can be inquired, playing the audio;
case 2: if only the lyrics of the song can be queried: case 2A) if the voice activation and recognition module of the smart headset can accurately recognize the current singing progress of the user, the voice control and playing module will play the next lyric of the song to the user M seconds in advance. M is the advance time for playing lyrics, the unit is second, the intelligent earphone has a default value (such as 1 second) when leaving a factory, and a user can adjust the intelligent earphone through the user definition and the input module of the intelligent earphone; case 2B) if the intelligent headset cannot recognize the singing progress of the user or the recognition error is high, playing the lyrics according to the playing time of each sentence of lyrics contained in the lyric playing table of the song.
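As an illustration of case 2B), the sketch below plays lyric lines according to a per-line timing table when the user's singing progress cannot be tracked; case 2A) would instead start each line M seconds before the recognized progress reaches it. The timing table and the value of M are illustrative assumptions.

```python
import time

# Sketch of step 3.3), case 2B): play each lyric line at the offset given in
# the song's lyric timing table.  The schedule and M are assumptions.

M = 1.0   # lyric lead time in seconds, used by case 2A (not by this sketch)

lyric_schedule = [
    (0.0, "Twinkle, twinkle, little star"),
    (4.0, "How I wonder what you are"),
    (8.0, "Up above the world so high"),
]

def play_lyrics_by_schedule(schedule):
    start = time.time()
    for offset, line in schedule:
        delay = offset - (time.time() - start)   # wait until this line is due
        if delay > 0:
            time.sleep(delay)
        print(line)                              # stand-in for playing the line

play_lyrics_by_schedule(lyric_schedule)
```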
Step 3.4): if the awakening words and sentences of the user are the name of the intelligent earphone, namely the universal awakening words, the intention of the user is presumed that the user inquires information for listening by the user; the smart headset will play these sounds at normal speed: if the non-song library is activated, the time parameter C of the intentional pause after the single sentence is played is 0; if the song is played in the song library, M is 0.
Further, the interaction method also comprises: step 4), the user controls the playback of the smart headset with keywords:
The user can control the playback of the smart headset through the following natural voice interactions, which avoid embarrassment, especially when an audience is present.
Case 1: if the user is not satisfied with the content played by the smart headset and wants to replace it, or move on to the next item or the next song:
Step 4.1.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.1.2): the user speaks a keyword naming the required information base (e.g., story, joke), or a phrase meaning "change it" (e.g., again, next one, another one), such as "Then let me tell you a story."
Step 4.1.3): after the voice activation and recognition module obtains the keywords, the smart headset repeats steps 2) and 3).
Case 2: if the user feels that the pauses between the sentences played by the smart headset are too short, i.e. the playback speed is too fast:
Step 4.2.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.2.2): the user says that he or she is speaking too fast, or a synonymous phrase, such as "I may be speaking too fast."
Step 4.2.3): the voice activation and recognition module recognizes the keyword "too fast" or a synonym in the user's words, the intention inference and information base service interaction module recognizes the user's current intent, and the sound control and playback module then lengthens the pauses between sentences, i.e. increases the value of the intentional-pause parameter C after each clause. At the same time, the smart headset gives the user feedback about the playback, for example "Understood, I will slow down the playback speed."
Case 3: if the user feels that the pauses between the sentences played by the smart headset are too long, i.e. the playback speed is too slow:
Step 4.3.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.3.2): the user says that he or she is speaking too slowly, or a synonymous phrase, such as "I may be speaking too slowly."
Step 4.3.3): the voice activation and recognition module recognizes the keyword "too slow" or a synonym in the user's words, the intention inference and information base service interaction module recognizes the user's current intent, and the sound control and playback module then shortens the pauses between sentences, i.e. decreases the value of the intentional-pause parameter C after each clause. At the same time, the smart headset gives the user feedback about the playback, for example "Understood, I will speed up the playback."
Case 4: if the user wants to pause the playback of the smart headset:
step 4.4.1): condition 1) the intention inference and information base service interaction module of the intelligent headset extracts the intention of the current user to extract the content of the existing or self-established information base so as to listen to or sing to the audience; and simultaneously meeting the condition 2) the voice activation and recognition module does not acquire any sound of the user within the time range of D seconds, wherein the sound comprises a awakening word or singing voice or other words. Wherein D is a time threshold parameter for the intelligent headset to pause without a signal, and when the intelligent headset leaves the factory, a default value (e.g., 10 seconds) is provided, and the user can also modify the value of the parameter through natural language or user-defined and input modules of the intelligent headset. When the condition 1) and the condition 2) are simultaneously satisfied, the intention presumption and information base service interaction module judges that the intention of the user is to pause the playing of the earphone and sends the intention to the sound control and playing module.
Step 4.4.2): the sound control and playing module receives the intention of pausing the playing of the earphone and pauses the playing of sound; meanwhile, playing pause is fed back to the user and the user is prompted to activate the intelligent earphone, for example, the user can say that the user wants to think about a woken-up word and reactivate the user.
The user can also change the content, speed up playback, slow down playback, and pause through multiple buttons on the wireless device. In addition, when the user's intent is to play content for the user's own listening, i.e. after step 3.4) has been executed, the user can also control the playback of the smart headset with conventional keywords such as pause, close, and stop playing.
Still further, the non-generic wake-up phrases include the following five classes:
Class 1: phrases, spoken in the first person, in which the user says that he or she needs to think or recall;
Class 2: phrases, spoken in the first person, in which the user says that he or she may know the answer;
Class 3: phrases in which the user asks others;
Class 4: phrases in which the user says that he or she is about to speak or tell others something;
Class 5: user-defined wake-up phrases.
The generic wake-up phrase is typically the name of the smart headset.
Still further, the first class of non-generic wake-up phrases may use the following: "let me think", "let me think about it", "let me think for a moment", "let me recall", "let me consider", or synonyms of these phrases;
the second class of non-generic wake-up phrases may use the following: "I may know", "this I may know", "that I may know", or synonyms of these phrases;
the third class of non-generic wake-up phrases may use the following: "do you know", "I want to ask you", "I want to know", "may I ask", "let me ask you", or synonyms of these phrases;
the fourth class of non-generic wake-up phrases may use the following: "let me say it now", "let me report", "here I go", "let me say it once", "I will start now", "I want to say something", "I want to report", "I want to tell everyone", "I want to tell you", or synonyms of these phrases.
The fifth class of non-generic wake-up phrases consists of user-defined wake-up phrases.
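Matching an utterance against the five non-generic classes and the generic headset name can be done with simple phrase lists. In the sketch below the English phrases are loose renderings of the examples above and the headset name is assumed; a real implementation would use the user-configurable phrase sets.

```python
# Sketch of wake-phrase classification.  Phrase lists are loose renderings of
# the examples in the text plus an assumed headset name.

WAKE_CATEGORIES = {
    "class 1 (thinking/recalling)": ["let me think", "let me recall"],
    "class 2 (I may know)":         ["i may know", "this i may know"],
    "class 3 (asking others)":      ["do you know", "i want to ask you"],
    "class 4 (about to tell)":      ["i want to tell you", "i want to report"],
    "class 5 (user-defined)":       [],          # filled by the customization module
    "generic (headset name)":       ["hey headset"],
}

def classify_wake_phrase(utterance):
    text = utterance.lower()
    for category, phrases in WAKE_CATEGORIES.items():
        if any(p in text for p in phrases):
            return category
    return None   # not recognized as a wake-up phrase

print(classify_wake_phrase("Hmm, let me think for a moment"))   # class 1
```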
The user customization and input module has the following functions:
1) displaying the list of information bases and the activation method of each information base, allowing the user to modify, add, or remove activation methods, and allowing the user to modify, add to, or remove the existing information bases and their contents; "displaying" includes presenting the relevant information visually and also informing the user by voice or other means.
2) Displaying, and allowing the user to modify, add, or remove, all wake-up phrases of the smart headset, and allowing the user to modify the feedback sounds of the smart headset;
3) displaying all parameters of the smart headset and their factory default values, and allowing the user to customize or adjust them; the parameters include the single-clause maximum word count X, the intentional-pause time parameter C after each clause, the lyric lead-time parameter M, and the no-signal pause time threshold D;
4) allowing the user to input and modify information to be played, i.e. to build information bases, and to specify one or more activation keywords corresponding to each self-built information base.
The module may specifically be software or a web page on the terminal to which the smart headset is connected.
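The adjustable parameters X, C, M, and D and the self-built information bases can be pictured as a small configuration object managed by this module. The sketch below uses the defaults mentioned in the description; the data structure itself is an illustrative assumption.

```python
from dataclasses import dataclass, field

# Sketch of the settings managed by the user customization and input module.
# Defaults follow the description; the structure is an assumption.

@dataclass
class HeadsetConfig:
    max_words_per_clause: int = 12       # X, single-clause maximum word count
    pause_per_word: float = 0.4          # C, seconds of intentional pause per word
    lyric_lead_time: float = 1.0         # M, seconds to start a lyric line early
    no_signal_pause: float = 10.0        # D, seconds of silence before auto-pause
    custom_libraries: dict = field(default_factory=dict)   # name -> phrases + content

config = HeadsetConfig()
config.custom_libraries["today's report"] = {
    "activation_phrases": ["today's report content"],
    "content": ["In this bright spring, ..."],
}
print(config.pause_per_word)   # 0.4
```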
The invention has the following beneficial effects: the user can activate the smart headset with non-generic wake-up phrases or with a wireless device, which avoids the embarrassment caused by waking an intelligent assistant with a generic wake-up word in front of an audience; the invention forms a relatively natural language flow between the user and the smart headset and can adjust the playback mode and speed of the sound played by the smart headset according to the user's intent, which effectively supports the user in reciting text or singing songs to an audience.
Drawings
FIG. 1 is a flow chart of an interaction method of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the main functional modules of the smart headset include a voice activation and recognition module, a user intention inference and information base service interaction module, and a sound control and playback module. Optional modules are the user customization and input module and a wireless device.
The voice activation and recognition module receives the user's wake-up phrases and recognizes the user's speech, converting the speech into text.
The user intention inference and information base service interaction module infers the user's intent from the wake-up phrase and the words that follow it, sends the recognized information to the relevant information base for querying, and passes the returned query results to the sound control and playback module. The information bases may be stored in local memory on the smart headset or on a cloud server. The user may also build an information base; if it is in text form it is classified as a non-song library, and if it is in song form it is classified in the song library.
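The classification of a self-built information base into the song library or a non-song library can be as simple as checking whether its entries carry lyrics or audio. The sketch below is an illustrative assumption about how that check might look; the description does not prescribe a specific rule.

```python
# Sketch of classifying a user-built information base as song or non-song.
# The entry format and the check are illustrative assumptions.

def classify_user_library(entries):
    """entries: list of dicts, e.g. {"text": ...} or {"lyrics": [...], "audio": ...}."""
    if any("lyrics" in e or "audio" in e for e in entries):
        return "song library"
    return "non-song library"

print(classify_user_library([{"text": "Today's report content ..."}]))   # non-song library
print(classify_user_library([{"lyrics": ["line 1", "line 2"]}]))         # song library
```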
The sound control and playback module plays and controls the sound heard by the user; it comprises a sound control component and a micro loudspeaker connected by wire or wirelessly; the sound control component may be part of a playback program or software.
The user customization and input module has the following functions: 1) displaying the list of information bases and the activation method of each information base (for example, the poetry library may be activated by an author's name, a line of a poem, a type of poem, and so on), allowing the user to modify, add, or remove activation methods, and allowing the user to modify, add to, or remove the existing information bases and their contents; "displaying" includes presenting the relevant information visually and also informing the user by voice or other means. 2) Displaying, and allowing the user to modify, add, or remove, all wake-up phrases of the smart headset, and allowing the user to modify its feedback sounds. 3) Displaying the parameters of the smart headset and their factory defaults, and allowing the user to customize or adjust them, including the single-clause maximum word count X (unit: words), the intentional-pause time parameter C (unit: seconds per word), the lyric lead-time parameter M (unit: seconds), and the no-signal pause time threshold D (unit: seconds). 4) Allowing the user to input and modify information to be played, i.e. to build information bases, and to specify one or more activation keywords corresponding to each self-built information base, such as "today's report". The module may specifically be software or a web page on the terminal connected to the smart headset, such as a mobile APP, a web page, a WeChat applet, or computer software.
Also included is a wireless device for activating the smart headset: any device that can communicate wirelessly with the smart headset or its terminal and activate the smart headset; it may be a wireless device independent of the Bluetooth headset, such as a remote control.
There are many hardware solutions for implementing these functional modules; the specific hardware, such as the processor, memory, battery, and buttons, is not listed here.
As with existing smart headsets, the above modules can be realized in multiple ways. For example, mode 1: the microphone of the voice activation and recognition module and the micro loudspeaker of the sound control and playback module are built into a Bluetooth headset placed in the user's external auditory canal and connected, through wireless communication such as Bluetooth, to a terminal (e.g., a smartphone) containing the other modules and components. The modules in the terminal, including the voice wake-up and recognition components of the voice activation and recognition module, the user intention inference and information base service interaction module, the sound control component, and the optional user customization and input module, can be realized through an APP, a web page, an applet, or voice interaction without visual display. In addition, the voice wake-up component of the voice activation and recognition module can be embedded in the Bluetooth headset or in the terminal (e.g., the smartphone).
Another example is mode 2: the voice activation and recognition module, the user intention inference and information base service interaction module, and the sound control and playback module are all integrated into a single piece of hardware shaped as an earphone and placed in the user's external auditory canal. The optional user customization and input module can be realized through an APP, a web page, a WeChat applet, or voice interaction without visual display, and is connected to the integrated hardware wirelessly, for example via Bluetooth.
An interaction method based on smart headsets, the method comprising the steps of:
Step 1), the user activates the smart headset with a wake-up phrase or with a wireless device and interacts with it mainly through natural language; "natural language" here means that, in front of an audience, the user activates the smart headset with a non-generic wake-up phrase, which avoids the embarrassment caused by using a generic wake-up word in front of the audience;
Step 2), after the smart headset is activated, it recognizes the wake-up phrase, infers the user's intent from the words the user speaks, and queries and returns the relevant information.
Step 2.1), after the smart headset is activated, its voice activation and recognition module recognizes the wake-up phrase and records the wake-up mode, i.e. whether activation was by voice or by the wireless device; at the same time, the sound control and playback module plays a feedback sound or feedback speech to the user, prompting the user that the headset has received the wake-up phrase and is waiting for the user's input, for example "I am here, please say what you need played."
Step 2.2), the user speaks a phrase containing a keyword, for example "a poem by Li Bai".
Step 2.3), the voice activation and recognition module recognizes the user's words and extracts the keywords from them;
Step 2.4), the user intention inference and information base service interaction module of the smart headset searches the relevant information base contents, including existing information bases and information bases built by the user, according to the keywords obtained in step 2.3), and at the same time plays a feedback sound or feedback speech such as "querying". If matching content is found, the queried content and the activated information base are returned; if not, the user is told that no information was found. The module judges which information base to activate according to the user's keywords: for example, if the keyword is "joke", the joke library is activated, and the query result and the name of the activated library are returned. If the keywords contain song-related information, such as the name of a song, a singer's name, or a line of lyrics, the song library is activated, and the query result and the name of the activated library are returned. An information base here is a collection of text or song audio; it may be a pre-established base or one built by the user. Information bases fall into two categories: the song library, which contains song information such as lyrics or song audio, and non-song libraries, which contain no song information. The non-song libraries may include, but are not limited to: jokes, stories, poetry, riddles, brain teasers, romantic phrases, food and cooking, encyclopedia entries, safety and emergency treatment, problem solving, specialized knowledge and dictionaries, life experience, famous quotations, conversation skills and advice, travel information, health and disease diagnosis, beauty and slimming, movies and TV, pets, celebrities, entertainment, sports, science and technology, home furnishing, constellations, art, history, geography, military affairs, news, etc., as well as user-built information;
Step 2.5), this step is carried out simultaneously with step 2.4); the user intention inference and information base service interaction module of the smart headset infers the user's intent from the way the smart headset was woken (voice activation or the wireless device) and from the wake-up phrase used (non-generic or generic):
Intent 1: if the user spoke one of the five classes of non-generic wake-up phrases or pressed the wireless device, the user's intent is presumed to be to retrieve content from an existing or self-built information base in order to speak or sing it to the audience. The user intention inference and information base service interaction module sends this intent to the sound control and playback module.
Intent 2: if the user spoke a generic wake-up phrase, such as the name of the smart headset, the user's intent is presumed to be to query information for the user's own listening. The user intention inference and information base service interaction module sends this intent to the sound control and playback module.
Step 3), the sound control and playback module of the smart headset determines the playback mode and speed according to the different user intents and the activated information base. According to the result of step 2.5): for intent 1, i.e. the user wants to speak or sing to the audience, go to step 3.1); for intent 2, go to step 3.4).
Step 3.1), judge from the user's keywords in step 2.4) whether the activated information base is the song library or a non-song library; if a non-song library is activated, go to step 3.2); if the song library is activated, go to step 3.3).
Step 3.2), if the song library is not activated, the sound control and playback module of the smart headset processes the queried content returned in step 2.4) as follows:
Step 1, segment paragraphs and sentences into short clauses according to the punctuation marks of the text, which serve as the clause delimiters;
Step 2, check the length of each clause; if a clause exceeds the single-clause maximum word count parameter X, analyze the grammatical structure of the sentence and automatically split the clause into several shorter clauses, ensuring that the split clauses satisfy: 1) the subject, predicate, and object are kept as complete as possible; 2) pronouns, nouns, verbs, adjectives, adverbs, and phrases (e.g., idioms) are not cut apart; 3) the total word count of each clause is less than the maximum word count parameter X;
Step 3, play each clause and then intentionally pause for N seconds, where N is the word count of the clause multiplied by C. The parameter C is the intentional-pause time parameter after a single clause is played (in seconds per word). When the user first uses the smart headset, C has a default value, for example C = 0.4 seconds per word, and the user can modify it through natural language (see step 4)) or through the user customization and input module of the smart headset;
Step 3.3), if the song library is activated, the user's words and the query result from the song library obtained in step 2.4) fall into two cases:
Case 1: if the audio of the song is found, play the audio;
Case 2: if only the lyrics of the song are found: case 2A) if the voice activation and recognition module of the smart headset can accurately recognize the user's current singing progress, the sound control and playback module plays the next line of lyrics to the user M seconds in advance, where M is the lead time for playing lyrics, in seconds; the smart headset has a factory default value (e.g., 1 second), and the user can adjust it through the user customization and input module. Case 2B) if the smart headset cannot recognize the user's singing progress, or the recognition error is large, play the lyrics according to the playback time of each line of lyrics contained in the song's lyric timing table.
Step 3.4): if the user's wake-up phrase is the name of the smart headset, i.e. a generic wake-up word, the user's intent is presumed to be to query information for the user's own listening; the smart headset then plays the content at normal speed: if a non-song library is activated, the intentional-pause parameter C after each clause is set to 0; if content from the song library is played, M is set to 0.
Further, the interaction method also comprises: step 4), the user controls the playback of the smart headset with keywords:
The user can control the playback of the smart headset through the following natural voice interactions, which avoid embarrassment, especially when an audience is present.
Case 1: if the user is not satisfied with the content played by the smart headset and wants to replace it, or move on to the next item or the next song:
Step 4.1.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.1.2): the user speaks a keyword naming the required information base (e.g., story, joke), or a phrase meaning "change it" (e.g., again, next one, another one), such as "Then let me tell you a story."
Step 4.1.3): after the voice activation and recognition module obtains the keywords, the smart headset repeats steps 2) and 3).
Case 2: if the user feels that the pauses between the sentences played by the smart headset are too short, i.e. the playback speed is too fast:
Step 4.2.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.2.2): the user says that he or she is speaking too fast, or a synonymous phrase, such as "I may be speaking too fast."
Step 4.2.3): the voice activation and recognition module recognizes the keyword "too fast" or a synonym in the user's words, the intention inference and information base service interaction module recognizes the user's current intent, and the sound control and playback module then lengthens the pauses between sentences, i.e. increases the value of the intentional-pause parameter C after each clause. At the same time, the smart headset gives the user feedback about the playback, for example "Understood, I will slow down the playback speed."
Case 3: if the user feels that the pauses between the sentences played by the smart headset are too long, i.e. the playback speed is too slow:
Step 4.3.1): the user reactivates the smart headset with any class 1 to class 5 non-generic wake-up phrase or with the wireless device.
Step 4.3.2): the user says that he or she is speaking too slowly, or a synonymous phrase, such as "I may be speaking too slowly."
Step 4.3.3): the voice activation and recognition module recognizes the keyword "too slow" or a synonym in the user's words, the intention inference and information base service interaction module recognizes the user's current intent, and the sound control and playback module then shortens the pauses between sentences, i.e. decreases the value of the intentional-pause parameter C after each clause. At the same time, the smart headset gives the user feedback about the playback, for example "Understood, I will speed up the playback."
Case 4: if the user wants to pause the playback of the smart headset:
Step 4.4.1): condition 1) the intention inference and information base service interaction module has determined that the current user's intent is to retrieve content from an existing or self-built information base in order to speak or sing it to the audience; and condition 2) the voice activation and recognition module has not received any sound from the user, including wake-up phrases, singing, or other words, within a time window of D seconds. Here D is the no-signal pause time threshold of the smart headset; it has a factory default value (e.g., 10 seconds), and the user can modify it through natural language or through the user customization and input module. When conditions 1) and 2) are both satisfied, the intention inference and information base service interaction module judges that the user's intent is to pause playback and sends this intent to the sound control and playback module.
Step 4.4.2): the sound control and playback module receives the pause intent and pauses the sound; at the same time, it tells the user that playback is paused and prompts the user how to reactivate the smart headset, for example "You can say a wake-up phrase such as 'let me think' to reactivate me."
The user can also change the content, speed up playback, slow down playback, and pause through multiple buttons on the wireless device. In addition, when the user's intent is to play content for the user's own listening, i.e. after step 3.4) has been executed, the user can also control the playback of the smart headset with conventional keywords such as pause, close, and stop playing.
Still further, the wake-up phrases of step 1) include non-generic wake-up phrases and generic wake-up phrases, and the non-generic wake-up phrases include the following five classes:
Class 1: phrases, spoken in the first person, in which the user says that he or she needs to think or recall;
Class 2: phrases, spoken in the first person, in which the user says that he or she may know the answer;
Class 3: phrases in which the user asks others;
Class 4: phrases in which the user says that he or she is about to speak or tell others something;
Class 5: user-defined wake-up phrases.
The generic wake-up phrase is typically the name of the smart headset.
The first class of non-generic wake-up phrases may use the following: "let me think", "let me think about it", "let me think for a moment", "let me recall", "let me consider", or synonyms of these phrases.
The second class of non-generic wake-up phrases may use the following: "I may know", "this I may know", "that I may know", or synonyms of these phrases;
the third class of non-generic wake-up phrases may use the following: "do you know", "I want to ask you", "I want to know", "may I ask", "let me ask you", or synonyms of these phrases;
the fourth class of non-generic wake-up phrases may use the following: "let me say it now", "let me report", "here I go", "let me say it once", "I will start now", "I want to say something", "I want to report", "I want to tell everyone", "I want to tell you", or synonyms of these phrases.
The fifth class of non-generic wake-up phrases consists of user-defined wake-up phrases.
An example of the natural language flow in which a user activates the smart headset with a non-generic wake-up phrase and interacts with it is as follows:
Example 1) the user queries existing information base content and speaks it to the audience:
1. The user says to the listeners: "Let me think for a moment";
2. The user hears the smart headset say: "I am here, please say what you need played";
3. The user says to the listeners: "Let me tell you a good story";
4. The user hears the smart headset say: "Querying ... The story takes place in the far Middle East", the headset pauses for N seconds, where N is the word count of that sentence multiplied by C, and then continues with the next sentence introducing the protagonist of the story;
5. The user says to the listeners: "The story takes place in the far Middle East" ...
If the user feels the playback speed is too slow, the user says to the listeners: "Sorry, I may be speaking too slowly"; the smart headset then shortens the pauses between sentences, and at the same time the user hears the smart headset say: "I will speed up the playback."
If condition 1) is satisfied, i.e. the current user's intent is to retrieve content from an existing or self-built information base to speak or sing it to the audience, and condition 2) is also satisfied, i.e. the voice activation and recognition module receives no sound from the user, including wake-up phrases, singing, or other words, within D seconds, the smart headset pauses playback and the user hears it say: "I will rest now; you can say a wake-up phrase such as 'let me think' to reactivate me."
Example 2) the user builds an information base and speaks its content to the audience:
First step: before speaking to the audience, the user enters the information to be played and its activation keyword, such as "today's report content", into the user intention inference and information base service interaction module of the smart headset through the user customization and input module.
Second step: the user interacts with the smart headset and speaks to the audience as follows:
1. The user says to the listeners: "Let me give a report";
2. The user hears the smart headset play a ding-dong sound;
3. The user says to the listeners: "Today's report content";
4. The user hears the smart headset start playing "In this bright spring", the headset pauses for N seconds, where N is the word count of that sentence multiplied by C, then "we are happy to gather to celebrate the successful sale of our 100th product", the headset pauses for N seconds, where N is the word count of that sentence multiplied by C, and so on ...
5. The user says to the listeners: "In this bright spring" ...
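Example 2 can be pictured end to end as registering a self-built information base, binding it to an activation phrase, and then playing it back clause by clause with the intentional pause N = word count × C. The sketch below is self-contained and illustrative only; the activation phrase, the report text, and the value of C follow the example in the description, while the function and field names are assumptions.

```python
import re
import time

# Minimal end-to-end sketch of example 2: a self-built "today's report"
# information base bound to an activation phrase, played back clause by
# clause with a pause of N = word count * C seconds after each clause.

C = 0.4  # seconds of pause per word (default per the description)

custom_libraries = {
    "today's report": {
        "activation_phrases": ["today's report content"],
        "content": ["In this bright spring, we are happy to gather to celebrate "
                    "the successful sale of our 100th product."],
    },
}

def play_clause_by_clause(text, pause_per_word=C):
    for clause in (c.strip() for c in re.split(r"[,.;!?]", text) if c.strip()):
        print(clause)                                   # stand-in for playback
        time.sleep(len(clause.split()) * pause_per_word)

def handle_utterance(utterance):
    """Play the matching self-built library, if any, and return its name."""
    for name, lib in custom_libraries.items():
        if any(p in utterance.lower() for p in lib["activation_phrases"]):
            for text in lib["content"]:
                play_clause_by_clause(text)
            return name
    return None

handle_utterance("Today's report content")
```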

Claims (8)

1. An interaction method based on a smart headset, characterized in that the method comprises the following steps:
step 1), the user activates the smart headset with a wake-up phrase or with a wireless device and interacts with it mainly through natural language; the wake-up phrases include non-generic wake-up phrases; "natural language" here means that, in front of an audience, the user activates the smart headset with a non-generic wake-up phrase, which avoids the embarrassment caused by using a generic wake-up word in front of the audience;
step 2), after the smart headset is activated, it recognizes the wake-up phrase, infers the user's intent from the words the user speaks, and queries and returns the relevant information;
step 2.1), after the smart headset is activated, its voice activation and recognition module recognizes the wake-up phrase and records the wake-up mode, i.e. whether activation was by voice or by the wireless device; at the same time, the sound control and playback module plays a feedback sound or feedback speech to the user, prompting the user that the headset has received the wake-up phrase and is waiting for the user's input;
step 2.2), the user speaks a phrase containing a keyword;
step 2.3), the voice activation and recognition module recognizes the user's words and extracts the keywords from them;
step 2.4), the user intention inference and information base service interaction module of the smart headset searches the relevant information base contents, including existing information bases and information bases built by the user, according to the keywords obtained in step 2.3), and at the same time plays a feedback sound or feedback speech; if matching content is found, the queried content and the activated information base are returned; if not, the user is told that no information was found; the module activates and queries the relevant information base according to the user's keywords and returns the query result and the name of the activated information base;
step 2.5), this step is carried out simultaneously with step 2.4); the user intention inference and information base service interaction module of the smart headset infers the user's intent from the way the smart headset was woken and from the wake-up phrase used;
and step 3), the sound control and playback module of the smart headset determines the playback mode and speed according to the different user intents and the activated information base.
2. The smart-headset-based interaction method of claim 1, wherein the non-generic wake-up phrases include the following five classes:
class 1: phrases, spoken in the first person, in which the user says that he or she needs to think or recall;
class 2: phrases, spoken in the first person, in which the user says that he or she may know the answer;
class 3: phrases in which the user asks others;
class 4: phrases in which the user says that he or she is about to speak or tell others something;
class 5: user-defined wake-up phrases.
3. The smart-headset-based interaction method of claim 2, wherein step 2.5) further comprises: the user's intents include the following two:
intent 1: if the user spoke a non-generic wake-up phrase or pressed the wireless device, the user's intent is presumed to be to retrieve content from an existing or self-built information base in order to speak or sing it to the audience; the user intention inference and information base service interaction module sends this intent to the sound control and playback module;
intent 2: if the user spoke a generic wake-up phrase, the user's intent is presumed to be to query information for the user's own listening; the user intention inference and information base service interaction module sends this intent to the sound control and playback module.
4. The intelligent headset-based interaction method of claim 3, wherein: the step 3) further comprises the following steps: according to the result of step 2.5), if it is intention 1: if the user wants to speak or sing to the listener, step 3.1) is entered; if intention 2, go to step 3.4);
step 3.1), judging whether the activated information library is a song library or a non-song library through the keywords of the user in the step 2.4); if the non-song library is activated, entering step 3.2); if the song library is activated, step 3.3) is entered;
step 3.2) if the song library is not activated, processing the sound control and playing module of the intelligent earphone according to the inquired content returned in the step 2.4) according to the following steps:
sub-step 1, segmenting the paragraphs and sentences into a plurality of short clauses according to the punctuation marks of the text, which serve as the clause-segmentation marks;
sub-step 2, checking the length of each short clause; if the length of a clause exceeds the maximum-word-count parameter X for a single short clause, analysing the grammatical structure of the clause and automatically cutting it into several shorter clauses, ensuring that the cut clauses satisfy the following requirements: 1) subject, predicate and object are kept as complete as possible; 2) pronouns, nouns, verbs, adjectives, adverbs and phrases are not split apart; 3) the total word count of each clause is less than the maximum-word-count parameter X;
sub-step 3, after playing each short clause, intentionally pausing for N seconds, where N is the clause's word count multiplied by C; the parameter C is the time parameter for the intentional pause after a single short clause is played, in seconds per word; when the user uses the smart headset for the first time, parameter C takes a default value, and the user can modify its value through natural language or through the user customization and input module of the smart headset.
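Sub-steps 1 to 3 can be illustrated with the minimal sketch below; the punctuation set, the default values of X and C, and the plain word-count split (used in place of the grammar-aware cutting of sub-step 2) are simplifying assumptions.

```python
# Illustrative sketch of step 3.2): split text into short clauses at punctuation,
# enforce a maximum word count X per clause, then pause N = word_count * C seconds
# after each clause.
import re
import time

X = 12   # maximum words per short clause (parameter X); assumed default
C = 0.5  # intentional pause in seconds per word (parameter C); assumed default


def split_into_clauses(text):
    clauses = [c.strip() for c in re.split(r"[,.;:!?，。；：！？]", text) if c.strip()]
    result = []
    for clause in clauses:
        words = clause.split()
        if len(words) <= X:
            result.append(clause)
        else:
            # Simplified cut: the claimed method analyses grammar and keeps phrases intact.
            for i in range(0, len(words), X):
                result.append(" ".join(words[i:i + X]))
    return result


def play_with_pauses(text, play=print):
    for clause in split_into_clauses(text):
        play(clause)                           # stand-in for the sound control and playing module
        time.sleep(len(clause.split()) * C)    # intentional pause of N seconds


play_with_pauses("Twinkle twinkle little star, how I wonder what you are.")
```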
step 3.3), if a song library is activated, then based on the user's words and sentences and the query result from the song library obtained in step 2.4), there are two cases, a sketch of which is given after Case 2:
Case 1: if the audio of the song can be found, playing the audio;
Case 2: if only the lyrics of the song can be found: Case 2A) if the voice activation and recognition module of the smart headset can accurately recognize the user's current singing progress, the sound control and playing module plays the next line of lyrics to the user M seconds in advance, where M is the advance time for playing lyrics, in seconds; the smart headset has a factory default value for M, and the user can adjust it through the user customization and input module of the smart headset; Case 2B) if the smart headset cannot recognize the user's singing progress, or the recognition error is high, playing the lyrics according to the playing time of each line of lyrics contained in the song's lyric playing schedule;
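Case 2 of step 3.3) can be illustrated with the sketch below; the schedule format, the default value of M and the progress-tracking interface are assumptions for illustration.

```python
# Illustrative sketch of step 3.3), Case 2: play the next line of lyrics either
# M seconds before the user reaches it (Case 2A) or strictly on a per-line
# schedule (Case 2B).
import time

M = 2.0  # advance time in seconds for prompting the next lyric line (parameter M); assumed default

# Hypothetical lyric playing schedule: (start_time_in_seconds, line_of_lyrics)
schedule = [(0.0, "Line one of the song"), (3.0, "Line two of the song")]


def play_by_schedule(schedule, play=print, start=0.0):
    """Case 2B: no reliable progress tracking; follow the stored schedule."""
    for t, line in schedule:
        time.sleep(max(0.0, t - start))
        play(line)
        start = t


def prompt_next_line(current_index, seconds_until_next_line, play=print):
    """Case 2A: singing progress is recognized; prompt the next line M seconds early."""
    time.sleep(max(0.0, seconds_until_next_line - M))
    play(schedule[current_index + 1][1])


play_by_schedule(schedule)
```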
step 3.4), if the user's wake-up words and sentences are the name of the smart headset, i.e. universal wake-up words, it is inferred that the user's intention is to query information to listen to himself/herself; the smart headset then plays the sound at normal speed: if a non-song library is activated, the intentional-pause time parameter C after a single short clause is set to 0; if a song from a song library is played, M is set to 0.
5. The smart-headset-based interaction method of claim 4, wherein the interaction method further comprises: step 4), the user controls the playing of the smart headset through keywords:
the user can control the playing of the smart headset through the following natural-voice interactions, a sketch of which is given after Case 4:
Case 1: if the user is not satisfied with the content played by the smart headset and wants to replace it, or move on to the next piece of information or the next song:
step 4.1.1), the user reactivates the smart headset using any Class 1 to Class 5 non-universal wake-up word or sentence, or a wireless device;
step 4.1.2), the user speaks a keyword of the required information base, or speaks a word or sentence meaning that the content should be changed, or a synonym thereof;
step 4.1.3), after the voice activation and recognition module obtains the keywords of these words and sentences, the smart headset repeats step 2) and step 3);
Case 2: if the user feels that the pause time between clauses played by the smart headset is too short, i.e. the playing speed is too fast:
step 4.2.1), the user reactivates the smart headset using any Class 1 to Class 5 non-universal wake-up word or sentence, or a wireless device;
step 4.2.2), the user speaks words and sentences containing "too fast" or a synonym thereof;
step 4.2.3), the voice activation and recognition module recognizes the keyword "too fast" or a synonym thereof in the words and sentences, the intention inference and information base service interaction module recognizes the user's current intention, and the sound control and playing module then prolongs the pause time between clauses, i.e. increases the value of the intentional-pause time parameter C after a single short clause is played; meanwhile, the smart headset plays related feedback to the user;
Case 3: if the user feels that the pause time between clauses played by the smart headset is too long, i.e. the playing speed is too slow:
step 4.3.1), the user reactivates the smart headset using any Class 1 to Class 5 non-universal wake-up word or sentence, or a wireless device;
step 4.3.2), the user speaks words and sentences containing "too slow" or a synonym thereof;
step 4.3.3), the voice activation and recognition module recognizes the keyword "too slow" or a synonym thereof in the words and sentences, the intention inference and information base service interaction module recognizes the user's current intention, and the sound control and playing module then shortens the pause time between clauses, i.e. decreases the value of the intentional-pause time parameter C after a single short clause is played; meanwhile, the smart headset plays related feedback to the user;
Case 4: if the user wants to pause the playing of the smart headset:
step 4.4.1), Condition 1): the intention inference and information base service interaction module of the smart headset has inferred that the current user's intention is to extract the contents of an existing or self-built information base in order to speak or sing to an audience; and Condition 2): the voice activation and recognition module has not captured any sound from the user within a time range of D seconds, including wake-up words and sentences, singing, or other words and sentences; D is the no-signal pause time threshold parameter of the smart headset, which has a factory default value, and the user can modify its value through natural language or through the user customization and input module of the smart headset; when Conditions 1) and 2) are both satisfied, the intention inference and information base service interaction module determines that the user's intention is to pause the playing of the headset and sends this intention to the sound control and playing module;
step 4.4.2), the sound control and playing module receives the pause intention and pauses the playing of sound; at the same time, it plays pause feedback to the user and prompts the user with a method of reactivating the smart headset.
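The keyword-driven control of step 4) can be illustrated with the sketch below; the adjustment step size for C, the keyword lists and the default values of C and D are assumptions for illustration.

```python
# Illustrative sketch of step 4): adjust the pause parameter C on "too fast" /
# "too slow", and pause playback after D seconds of silence (claim 5).
import time


class PlaybackController:
    def __init__(self, C=0.5, D=8.0, step=0.2):
        self.C = C            # intentional pause, seconds per word
        self.D = D            # no-signal pause threshold, seconds
        self.step = step      # assumed adjustment step for C
        self.last_heard = time.monotonic()

    def on_user_phrase(self, phrase):
        self.last_heard = time.monotonic()
        if "too fast" in phrase:
            self.C += self.step                      # lengthen pauses (step 4.2.3)
            return f"Slowing down, pause is now {self.C:.1f} s/word"
        if "too slow" in phrase:
            self.C = max(0.0, self.C - self.step)    # shorten pauses (step 4.3.3)
            return f"Speeding up, pause is now {self.C:.1f} s/word"
        return None

    def should_pause(self, intent_is_speak_or_sing):
        silent_for = time.monotonic() - self.last_heard
        return intent_is_speak_or_sing and silent_for >= self.D   # step 4.4.1)


ctrl = PlaybackController()
print(ctrl.on_user_phrase("that was too fast"))
print(ctrl.should_pause(intent_is_speak_or_sing=True))
```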
6. A smart-headset-based interaction method according to any one of claims 2 to 5, characterized in that: Class 1 non-universal wake-up words and sentences may employ the following words and sentences: "let me think", "let me think about it", "let me think it over", "let me recall", "let me try to remember", or synonyms of the above words and sentences;
Class 2 non-universal wake-up words and sentences may employ the following words and sentences: "I may know", "I may know this", "I may know that", or synonyms of the above words and sentences;
Class 3 non-universal wake-up words and sentences may employ the following words and sentences: "do you know", "I want to ask you", "I want to know", "I'd like to ask", "may I ask you", "do you know whether", or synonyms of the above words and sentences;
Class 4 non-universal wake-up words and sentences may employ the following words and sentences: "I'll say it now", "I'll start reporting", "let me say it", "I'll say it once", "I'll start now", "I want to say something", "I want to give a report", "I want to tell everybody", "I want to say something to you", "I want to report to everybody", "I want to say it to everybody", or synonyms of the above words and sentences;
Class 5 non-universal wake-up words and sentences are non-universal wake-up words and sentences defined by the user; a minimal sketch of matching a spoken phrase to one of these classes is given below.
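Matching a spoken phrase to one of the five classes could, for example, be done by a simple lookup over example phrases, as in the sketch below; the English phrase lists are rough approximations of the claimed examples and are assumptions here.

```python
# Illustrative sketch of classifying a wake-up phrase into Class 1-5 (claims 2 and 6).
# Phrase lists are rough English approximations; Class 5 phrases would come from the
# user customization and input module (they are user-defined).
WAKE_PHRASES = {
    1: ["let me think", "let me recall", "let me think about it"],
    2: ["i may know", "i may know this"],
    3: ["do you know", "i want to ask you", "i want to know"],
    4: ["i'll say it now", "i want to tell everybody", "i'll start reporting"],
    5: [],   # user-defined phrases are appended at runtime
}


def classify_wake_phrase(utterance):
    text = utterance.lower()
    for cls, phrases in WAKE_PHRASES.items():
        if any(p in text for p in phrases):
            return cls
    return None   # not a non-universal wake-up phrase (possibly the universal one)


print(classify_wake_phrase("Hmm, let me think about it"))   # -> 1
```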
7. A smart-headset-based interaction method according to claim 4 or 5, characterized in that: the user customization and input module comprises the following functions:
1) displaying the list of information bases and the method of activating each information base, allowing the user to modify, add or remove methods of activating the information bases, and also allowing the user to modify, add or remove the existing information bases and their contents;
2) displaying, and allowing the user to modify, add or remove, all wake-up words and sentences of the smart headset, and also allowing the user to modify the feedback sounds of the smart headset;
3) displaying all parameters of the smart headset and their factory default values, and allowing the user to define or adjust these parameters, including the maximum-word-count parameter X of a single short clause, the intentional-pause time parameter C after a single short clause is played, the lyric-playing advance time parameter M, and the no-signal pause time threshold parameter D;
4) allowing the user to input and modify information to be played, i.e. to build information bases, and allowing the user to assign one or more activation keywords or sentences to each self-built information base;
this module can be realized through software on a terminal connected to the smart headset, through a web page, or through voice interaction; a minimal sketch of the parameter store of this module is given below.
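The adjustable parameters of claim 7 could be held in a simple settings object, as in the sketch below; the field names and default values are assumptions and not values fixed by the claims.

```python
# Illustrative sketch of the user customization and input module's parameter store
# (claim 7): parameters X, C, M and D with factory defaults and user overrides.
from dataclasses import dataclass


@dataclass
class HeadsetSettings:
    X: int = 12     # maximum word count of a single short clause
    C: float = 0.5  # intentional pause after a clause, seconds per word
    M: float = 2.0  # advance time for playing the next lyric line, seconds
    D: float = 8.0  # no-signal pause threshold, seconds

    def update(self, **overrides):
        for name, value in overrides.items():
            if not hasattr(self, name):
                raise KeyError(f"unknown parameter: {name}")
            setattr(self, name, value)


settings = HeadsetSettings()
settings.update(C=0.8, D=10.0)   # user adjustment via app, web page or voice
print(settings)
```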
8. A smart-headset-based interaction method according to any one of claims 1 to 5, characterized in that: the information base refers to any of various information bases, in text or audio form, including song libraries; it can be an existing information base or an information base built by the user; information bases are divided into two categories: song libraries containing song information and non-song libraries containing no song information; the non-song libraries may specifically include, but are not limited to: jokes, stories, poems, riddles, brain teasers, sweet talk, cuisine and cooking, encyclopedias, safety and emergency handling, problem solving, various professional knowledge and dictionaries, life experience, famous quotes and good sentences, conversation skills and advice, travel information, health and disease treatment, beauty and slimming, movies and TV, pets, celebrities, entertainment, sports, science and technology, home furnishing, constellations, art, history, geography, military affairs, news, and user-built information.