CN110930998A - Voice interaction method and device and vehicle - Google Patents
Voice interaction method and device and vehicle
- Publication number
- CN110930998A (publication number); CN201811095748.9A (application number)
- Authority
- CN
- China
- Prior art keywords
- voice
- content
- played
- determining
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
Abstract
The application relates to a voice interaction method, a voice interaction device, and a vehicle. The voice interaction method includes the following steps: receiving voice information from a user; determining the content to be played and the current interactive scene according to the voice information; and playing the content to be played using a preset voice packet corresponding to the interactive scene. In this way, the method and device can interact with the user using different voices in different interaction scenes, providing a good user experience.
Description
Technical Field
The application relates to the technical field of voice interaction, and in particular to a voice interaction method, a voice interaction device, and a vehicle.
Background
With the rapid development of the automobile industry and the rising living standards in China, household car ownership has grown rapidly, and the automobile has gradually become an indispensable means of transportation in daily life.
Advances in intelligent technology have made users' demand for in-vehicle intelligence increasingly apparent. More and more vehicle-mounted units now offer intelligent networking functions, and in-vehicle intelligent devices such as voice assistants, air purifiers, and portable power banks have flourished to meet users' growing needs while driving. The in-vehicle voice assistant is an intelligent product integrating functions such as intelligent voice, one-key calling, online music, customized radio stations, and voice navigation, and can provide users with a better driving experience.
However, existing in-vehicle voice assistants cannot interact with the user using different voices in different scenes; the single voice they use is monotonous, making users reluctant to keep listening or to use the assistant, which results in a poor user experience.
Disclosure of Invention
An object of the present application is to provide a voice interaction method, device, and vehicle that solve the above technical problem by interacting with the user using different voices in different interaction scenes, thereby providing a good user experience.
In order to solve the above technical problem, the present application provides a voice interaction method, including:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
Wherein determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
Before receiving the voice information of the user, the method further includes:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
After the voice packet selected by the user is set as the preset voice packet corresponding to the interactive scene, the method further includes:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or optional voice packet by using the updated voice packet.
Wherein the method further comprises:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
The present application further provides a voice interaction device including a processor, wherein the processor is configured to execute program instructions to implement steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
Wherein the step, executed by the processor, of determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein the step, executed by the processor, of determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
The application also provides a vehicle, which comprises the voice interaction device.
According to the voice interaction method, the voice interaction device, and the vehicle of the present application, after the user's voice information is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played using the preset voice packet corresponding to the interactive scene. In this way, the application can interact with the user using different voices in different interaction scenes, providing a good user experience.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of the present application more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment.
Detailed Description
To further explain the technical means adopted by the present application to achieve its intended purpose and their effects, the specific embodiments, methods, steps, structures, features, and effects of the voice interaction method, device, and vehicle of the present application are described in detail below with reference to the accompanying drawings and preferred embodiments.
The foregoing and other technical features and effects of the present application will become apparent from the following detailed description of preferred embodiments, read in conjunction with the accompanying drawings. The specific embodiments are intended to illustrate the general principles of the application, not to limit its scope.
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment. Referring to fig. 1, the voice interaction method of the present embodiment includes, but is not limited to, the following steps:
Step 110, receiving voice information of a user. When a user needs to interact with the voice assistant, the user speaks a voice instruction, such as "play the weather forecast" or "introduce nearby scenery". The vehicle receives the user's voice information and performs voice recognition to obtain its voice content, which is usually converted into corresponding text information.
And step 120, determining the content to be played and the current interactive scene according to the voice information.
The content to be played may be a segment of speech derived from the voice information, such as a tourist guide narration, weather forecast content, or a piece of news; it may also be a preset reply derived from the voice information, such as "What kind of news would you like to listen to?" or "Here is the local weather forecast." The interactive scene is a scene, determined from the characteristics of the current user or the theme of the current content to be played, that matches the user's mood during the interaction, such as an age scene, a gender scene, or an interactive-theme scene.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is the sound-wave spectrum, displayed by electro-acoustic instruments, that carries speech information; it is specific to a speaker and relatively stable. Voiceprint features can therefore be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences; for example, children prefer cheerful voices and women prefer soft voices. Accordingly, different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene, and a male conversation scene.
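The mapping from voiceprint-derived user characteristics to an interactive scene can be sketched as follows. This is an illustrative example only: the `VoiceprintResult` structure, the function names, and the specific scene rules are assumptions, since the patent does not specify data structures or thresholds.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VoiceprintResult:
    """Hypothetical output of voiceprint analysis (the patent names no format)."""
    gender: Optional[str] = None  # "male" / "female", if analysis yields it
    age: Optional[int] = None     # estimated age in years, if analysis yields it

def scene_from_user_features(vp: VoiceprintResult) -> str:
    """Pick an interactive scene from the user characteristics, following the
    examples in the text (child / female / male conversation scenes)."""
    if vp.age is not None and vp.age < 12:
        return "child conversation scene"
    if vp.gender == "female":
        return "female conversation scene"
    if vp.gender == "male":
        return "male conversation scene"
    return "default conversation scene"
```

The age cutoff of 12 is an arbitrary illustrative choice; a real system would derive such boundaries from the voiceprint model itself.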
During speech recognition, speech feature information is extracted from the voice information frame by frame, and a recognition result, namely the voice content of the voice information, is generated. Semantic analysis is then performed on this content: useful knowledge is extracted from the recognized sentences, and the meaning of the voice information is inferred from information such as the syntactic structure of the sentences and the meanings of the words in them, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and used to match the content to be played.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
During speech recognition, speech feature information is extracted from the voice information frame by frame, and a recognition result, namely the voice content of the voice information, is generated. Semantic analysis is then performed on this content: useful knowledge is extracted from the recognized sentences, and the meaning of the voice information is inferred from information such as the syntactic structure of the sentences and the meanings of the words in them, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and used to match the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. The interactive scene includes, but is not limited to, weather broadcast, tour guide, financial news, and science and technology news. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots", and "play financial news" are "weather forecast", "scenic spots", and "financial news" respectively, and the corresponding interactive scenes may be "weather broadcast", "tour guide", and "financial news". As another embodiment, the current interactive scene may be determined from the theme type of the content to be played. For example, if the obtained content to be played is a scenic-spot guide narration, weather forecast content, or a piece of financial news, its theme type can be determined as scenic-spot introduction, weather forecast, or financial news respectively, and the corresponding interactive scene can then be determined as "tour guide", "weather broadcast", or "financial news".
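The keyword-based scene determination described above can be sketched as a simple lookup. The keyword table below is an assumption drawn only from the examples in the text ("weather broadcast", "tour guide", "financial news"); a real system would use a far richer matching scheme.

```python
# Hypothetical keyword-to-scene table, based on the examples in the description.
SCENE_KEYWORDS = {
    "weather": "weather broadcast",
    "scenery": "tour guide",
    "scenic": "tour guide",
    "financial": "financial news",
    "technology": "science and technology news",
}

def scene_from_voice_content(voice_content: str) -> str:
    """Determine the interactive scene from keywords in the recognized text,
    falling back to a default scene when no keyword matches."""
    text = voice_content.lower()
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in text:
            return scene
    return "default"
```

The same lookup shape applies when the scene is derived from the theme type of the content to be played instead of the user's utterance.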
And step 130, playing the content to be played according to a preset voice packet corresponding to the interactive scene.
The preset voice packets may be the default voice packets of each interactive scene, or voice packets set in advance by the user for different interactive scenes. Each interactive scene may have multiple default voice packets, for example male, female, and young voices; during playback, one can be used at random or selected automatically according to the content to be played. In this embodiment, a voice packet is a recorded human voice packet, that is, a voice library (also called a speaker): a repository of sounds, usually recorded by real people word by word or phrase group by phrase group and then stored centrally in a database.
When the preset voice packets are the defaults of each interactive scene, they are set according to general rules: for example, a teacher's voice for conversations with children, the voice of a television weather forecaster for weather forecasts, a tour guide's voice for introducing historical sights, a fast-paced voice for science and technology news, and the recorded voice of a financial presenter for investment news.
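The default-packet rules above can be sketched as a scene-to-packet lookup with random selection when a scene has several packets. The packet names below merely echo the examples in the preceding paragraph and are illustrative assumptions, not values from the patent.

```python
import random

# Hypothetical default mapping from interactive scene to candidate voice packets.
DEFAULT_PACKETS = {
    "child conversation scene": ["teacher_voice"],
    "weather broadcast": ["tv_forecaster_voice"],
    "tour guide": ["tour_guide_voice"],
    "science and technology news": ["fast_paced_voice"],
    "financial news": ["finance_presenter_voice", "calm_male_voice"],
}

def pick_preset_packet(scene: str, fallback: str = "standard_voice") -> str:
    """Select a preset voice packet for the scene; when several defaults exist,
    one is chosen at random, as the description allows."""
    packets = DEFAULT_PACKETS.get(scene)
    if not packets:
        return fallback
    return random.choice(packets)
```

Selection "according to the content to be played" (the other option the text mentions) would replace `random.choice` with a content-aware rule.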
When the preset voice packet is a voice packet that is preset by the user for different interactive scenes, in an embodiment, before receiving the voice information of the user in step 110, the method may further include the following steps:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
When a user wants to set a voice packet for a certain interactive scene, the user triggers a voice packet setting instruction through the voice assistant's operation interface to enter the settings interface and selects the interactive scene to be configured. The interface then displays the selectable voice packets of the currently selected scene, which include at least one of gender-associated, age-associated, speed-associated, and preference-associated voice packets. Gender-associated voice packets contain recordings by speakers of different genders; age-associated voice packets contain recordings by speakers of different age groups; speed-associated voice packets contain recordings at different speaking speeds; and preference-associated voice packets are recommended according to the user's preferred voice types. Different voice packets can thus be selected for each interactive scene; for financial news, for example, the user can choose among male, female, young, and other recorded voices. When the user selects a selectable voice packet, it can be auditioned by playing it; if the audition is satisfactory, the packet can be selected and set as the preset voice packet of the currently selected interactive scene.
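The setting flow above can be sketched as follows. All scene labels, packet names, and function names here are illustrative assumptions; the patent describes the flow only at the interface level.

```python
# Hypothetical catalog of selectable packets, grouped by the four categories
# the description names (gender-, age-, speed-, preference-associated).
SELECTABLE_PACKETS = {
    "financial news": {
        "gender-associated": ["male_presenter", "female_presenter"],
        "age-associated": ["young_voice", "mature_voice"],
        "speed-associated": ["fast_voice", "slow_voice"],
    },
}

preset_packets = {}  # interactive scene -> the packet the user settled on

def show_selectable_packets(scene: str) -> dict:
    """Return the selectable voice packets for the chosen interactive scene."""
    return SELECTABLE_PACKETS.get(scene, {})

def set_preset_packet(scene: str, packet: str) -> None:
    """After a satisfactory audition, store the packet as the scene's preset."""
    preset_packets[scene] = packet
```

The audition step itself (playing a sample of the packet) is omitted, as the patent describes it only as voice playback.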
For obtaining the user preferences, in an embodiment, the voice interaction method of the present application may further include the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used in each interactive scene, such as age, gender, speed, and intonation, is recorded. Deep learning over the sound types of the historically used preset voice packets yields the user's sound preferences in different interactive scenes. Once these preferences are obtained, voice packets suited to the different interactive scenes are recommended to the user according to the preferences, or the selectable voice packets of each interactive scene are updated according to the preferences, so that the user has more choices matching their tastes when setting voice packets.
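The history-based preference step can be sketched as follows. The patent calls for deep learning over the historically used sound types; as a hedged stand-in, this sketch infers each scene's preference by a simple frequency count, which is a deliberate simplification and not the patent's method.

```python
from collections import Counter, defaultdict

# interactive scene -> list of sound types the user has chosen historically
usage_history = defaultdict(list)

def record_packet_use(scene: str, sound_type: str) -> None:
    """Record the sound type of the preset packet set for a scene."""
    usage_history[scene].append(sound_type)

def sound_preferences() -> dict:
    """Return the most frequently used sound type per interactive scene,
    standing in for the learned per-scene preference."""
    return {scene: Counter(types).most_common(1)[0][0]
            for scene, types in usage_history.items() if types}
```

A deep-learning implementation would replace the frequency count with a model trained on the recorded (scene, sound type) history, but the interface of recording choices and querying per-scene preferences would be similar.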
In an embodiment, after the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method may further include the following steps:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the selectable voice packets, can be automatically updated and upgraded online. In practice, the server sends a voice packet update message to the vehicle when a packet is updated; if the user has enabled automatic updates or chooses to update, the vehicle obtains the updated voice packet from the server and uses it to replace the corresponding preset or selectable voice packet, making the content played with the packet richer and the voice more realistic.
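The replacement step of the update flow can be sketched as follows. The data layout (packet identifiers keyed by scene) and the function name are assumptions for illustration; fetching the packet from the server is omitted.

```python
def apply_packet_update(preset: dict, selectable: dict,
                        old_id: str, new_id: str) -> None:
    """Replace old_id with new_id wherever it appears, both as a scene's
    preset packet and in every scene's list of selectable packets."""
    for scene, packet in preset.items():
        if packet == old_id:
            preset[scene] = new_id
    for scene, packets in selectable.items():
        selectable[scene] = [new_id if p == old_id else p for p in packets]
```

In a full implementation this would run after the vehicle has downloaded the updated packet announced by the server's update message.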
According to the voice interaction method, after the user's voice information is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played using the preset voice packet corresponding to the interactive scene. In this way, the method can interact with the user using different voices in different interaction scenes, providing a good user experience.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment. Referring to fig. 2, the voice interaction apparatus of this embodiment includes a memory 210 and a processor 220. The memory 210 stores at least one program instruction, and the processor 220, by loading and executing the at least one program instruction, implements the following steps:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to the preset voice packet corresponding to the interactive scene.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is the sound-wave spectrum, displayed by electro-acoustic instruments, that carries speech information; it is specific to a speaker and relatively stable. Voiceprint features can therefore be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences; for example, children prefer cheerful voices and women prefer soft voices. Accordingly, different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene, and a male conversation scene.
During speech recognition, speech feature information is extracted from the voice information frame by frame, and a recognition result, namely the voice content of the voice information, is generated. Semantic analysis is then performed on this content: useful knowledge is extracted from the recognized sentences, and the meaning of the voice information is inferred from information such as the syntactic structure of the sentences and the meanings of the words in them, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and used to match the content to be played.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. The interactive scene includes, but is not limited to, weather broadcast, tour guide, financial news, and science and technology news. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots", and "play financial news" are "weather forecast", "scenic spots", and "financial news" respectively, and the corresponding interactive scenes may be "weather broadcast", "tour guide", and "financial news". As another embodiment, the current interactive scene may be determined from the theme type of the content to be played. For example, if the obtained content to be played is a scenic-spot guide narration, weather forecast content, or a piece of financial news, its theme type can be determined as scenic-spot introduction, weather forecast, or financial news respectively, and the corresponding interactive scene can then be determined as "tour guide", "weather broadcast", or "financial news".
In one embodiment, before the processor 220 performs the step of receiving the voice information of the user, the following steps are further performed:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
In an embodiment, after the processor 220 performs the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the following steps are further performed:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the selectable voice packets, can be automatically updated and upgraded online. In practice, the server sends a voice packet update message to the vehicle when a packet is updated; if the user has enabled automatic updates or chooses to update, the vehicle obtains the updated voice packet from the server and uses it to replace the corresponding preset or selectable voice packet, making the content played with the packet richer and the voice more realistic.
In one embodiment, the processor 220 is further configured to perform the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used in each interactive scene, such as age, gender, speed, and intonation, is recorded. Deep learning over the sound types of the historically used preset voice packets yields the user's sound preferences in different interactive scenes. Once these preferences are obtained, voice packets suited to the different interactive scenes are recommended to the user according to the preferences, or the selectable voice packets of each interactive scene are updated according to the preferences, so that the user has more choices matching their tastes when setting voice packets.
For the detailed working process and steps of the processor 220 in the voice interaction apparatus of the present embodiment, please refer to the description of the embodiment shown in fig. 1, which is not described herein again.
The application also provides a vehicle, which comprises the voice interaction device.
With the voice interaction apparatus and the vehicle of the present application, after the user's voice information is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played using the preset voice packet corresponding to the interactive scene. In this way, the apparatus can interact with the user in different voices for different interactive scenes, which improves the user experience.
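The overall flow summarized above (receive voice information, determine the content to be played and the interactive scene, then play with the scene's preset voice packet) might be sketched as follows. The keyword-based scene detection, the packet names, and the string placeholders for recognition and playback are illustrative assumptions, not the patent's method.

```python
# Sketch of the end-to-end flow: recognized text -> (content, scene) ->
# playback with the scene's preset voice packet. All names and the
# keyword table are assumptions for illustration.

PRESET_PACKETS = {"navigation": "calm-female",
                  "entertainment": "lively-male",
                  "default": "neutral"}

SCENE_KEYWORDS = {"navigate": "navigation", "route": "navigation",
                  "play": "entertainment", "song": "entertainment"}

def determine(voice_text):
    """Determine (content_to_play, interactive_scene) from recognized text."""
    scene = "default"
    for keyword, s in SCENE_KEYWORDS.items():
        if keyword in voice_text.lower():
            scene = s
            break
    content = f"Response to: {voice_text}"   # placeholder for real dialogue logic
    return content, scene

def play(voice_text):
    content, scene = determine(voice_text)
    packet = PRESET_PACKETS.get(scene, PRESET_PACKETS["default"])
    return f"[{packet}] {content}"           # stands in for TTS playback

result = play("Navigate to the airport")
```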
Although the present application has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.
Claims (10)
1. A method for voice interaction, comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
2. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
3. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
4. The method of claim 1, wherein before receiving the voice information of the user, the method further comprises:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
5. The method according to claim 4, wherein after the setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method further comprises:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or selectable voice packet with the updated voice packet.
6. The method of claim 1, further comprising:
performing deep learning on the sound types of historically used preset voice packets to obtain sound preferences of the user in different interactive scenes;
and recommending voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
7. A voice interaction device comprising a processor, wherein the processor is configured to execute program instructions to perform the steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
8. The apparatus as claimed in claim 7, wherein the step, performed by the processor, of determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
9. The apparatus as claimed in claim 7, wherein the step, performed by the processor, of determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
10. A vehicle, characterized in that it comprises a voice interaction device according to any one of claims 7 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811095748.9A CN110930998A (en) | 2018-09-19 | 2018-09-19 | Voice interaction method and device and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110930998A (en) | 2020-03-27 |
Family
ID=69855236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811095748.9A Pending CN110930998A (en) | 2018-09-19 | 2018-09-19 | Voice interaction method and device and vehicle |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110930998A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002041084A (en) * | 2000-07-26 | 2002-02-08 | Victor Co Of Japan Ltd | Interactive speech processing system |
JP2004163541A (en) * | 2002-11-11 | 2004-06-10 | Mitsubishi Electric Corp | Voice response device |
JP2010078763A (en) * | 2008-09-25 | 2010-04-08 | Brother Ind Ltd | Voice processing device, voice processing program, and intercom system |
JP2012037790A (en) * | 2010-08-10 | 2012-02-23 | Toshiba Corp | Voice interaction device |
CN102426591A (en) * | 2011-10-31 | 2012-04-25 | 北京百度网讯科技有限公司 | Method and device for operating corpus used for inputting contents |
CN103236259A (en) * | 2013-03-22 | 2013-08-07 | 乐金电子研发中心(上海)有限公司 | Voice recognition processing and feedback system, voice response method |
US20150032814A1 (en) * | 2013-07-23 | 2015-01-29 | Rabt App Limited | Selecting and serving content to users from several sources |
CN105654950A (en) * | 2016-01-28 | 2016-06-08 | 百度在线网络技术(北京)有限公司 | Self-adaptive voice feedback method and device |
US20160240195A1 (en) * | 2015-02-15 | 2016-08-18 | Lenovo (Beijing) Co., Ltd. | Information processing method and electronic device |
CN106648082A (en) * | 2016-12-09 | 2017-05-10 | 厦门快商通科技股份有限公司 | Intelligent service device capable of simulating human interactions and method |
CN108153875A (en) * | 2017-12-26 | 2018-06-12 | 广州蓝豹智能科技有限公司 | Language material processing method, device, intelligent sound box and storage medium |
US20180174577A1 (en) * | 2016-12-19 | 2018-06-21 | Microsoft Technology Licensing, Llc | Linguistic modeling using sets of base phonetics |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113746875A (en) * | 2020-05-27 | 2021-12-03 | 百度在线网络技术(北京)有限公司 | Voice packet recommendation method, device, equipment and storage medium |
CN113746875B (en) * | 2020-05-27 | 2022-08-02 | 百度在线网络技术(北京)有限公司 | Voice packet recommendation method, device, equipment and storage medium |
JP2022537860A (en) * | 2020-05-27 | 2022-08-31 | バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド | Voice packet recommendation method, device, electronic device and program |
JP7337172B2 (en) | 2020-05-27 | 2023-09-01 | バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド | Voice packet recommendation method, device, electronic device and program |
CN113160832A (en) * | 2021-04-30 | 2021-07-23 | 合肥美菱物联科技有限公司 | Voice washing machine intelligent control system and method supporting voiceprint recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949783B (en) | Song synthesis method and system | |
CN108962217A (en) | Phoneme synthesizing method and relevant device | |
CN107895578A (en) | Voice interactive method and device | |
CN107657017A (en) | Method and apparatus for providing voice service | |
US20110144997A1 (en) | Voice synthesis model generation device, voice synthesis model generation system, communication terminal device and method for generating voice synthesis model | |
CN108962219A (en) | Method and apparatus for handling text | |
CN111667812A (en) | Voice synthesis method, device, equipment and storage medium | |
JP2020034895A (en) | Responding method and device | |
CN110930998A (en) | Voice interaction method and device and vehicle | |
CN110475170A (en) | Control method, device, mobile terminal and the storage medium of earphone broadcast state | |
CN104471512A (en) | Content customization | |
CN107247769A (en) | Method for ordering song by voice, device, terminal and storage medium | |
CN108415942B (en) | Personalized teaching and singing scoring two-dimensional code generation method, device and system | |
CN108877803B (en) | Method and apparatus for presenting information | |
CN109036372A (en) | A kind of voice broadcast method, apparatus and system | |
CN110930999A (en) | Voice interaction method and device and vehicle | |
WO2018038235A1 (en) | Auditory training device, auditory training method, and program | |
CN108804667A (en) | The method and apparatus of information for rendering | |
US11790913B2 (en) | Information providing method, apparatus, and storage medium, that transmit related information to a remote terminal based on identification information received from the remote terminal | |
CN116403583A (en) | Voice data processing method and device, nonvolatile storage medium and vehicle | |
CN111861666A (en) | Vehicle information interaction method and device | |
CN114120943B (en) | Virtual concert processing method, device, equipment and storage medium | |
CN106653003A (en) | Voice recognition method and device | |
CN111427444A (en) | Control method and device of intelligent device | |
KR20190070682A (en) | System and method for constructing and providing lecture contents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| CB03 | Change of inventor or designer information | Inventor after: Ying Zhenkai; Inventor before: Ying Yilun |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.; Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.; Address (both): Room 208, building 4, 1411 Yecheng Road, Jiading District, Shanghai, 201821 |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200327 |