CN110930998A - Voice interaction method and device and vehicle - Google Patents

Voice interaction method and device and vehicle Download PDF

Info

Publication number
CN110930998A
CN110930998A (application CN201811095748.9A)
Authority
CN
China
Prior art keywords
voice
content
played
determining
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811095748.9A
Other languages
Chinese (zh)
Inventor
应宜伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Original Assignee
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pateo Electronic Equipment Manufacturing Co Ltd filed Critical Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority to CN201811095748.9A priority Critical patent/CN110930998A/en
Publication of CN110930998A publication Critical patent/CN110930998A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 — Speech to text systems
    • G10L 17/00 — Speaker identification or verification techniques
    • G10L 17/22 — Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a voice interaction method, a voice interaction device and a vehicle. The voice interaction method comprises the following steps: receiving voice information of a user; determining the content to be played and the current interactive scene according to the voice information; and playing the content to be played according to a preset voice packet corresponding to the interactive scene. In this way, the method and the device can interact with the user using different voices for different interaction scenes, resulting in a good user experience.

Description

Voice interaction method and device and vehicle
Technical Field
The application relates to the technical field of voice interaction, in particular to a voice interaction method, a voice interaction device and a vehicle.
Background
With the rapid development of China's automobile industry and the improvement of people's living standards, household car ownership has increased rapidly, and the automobile has gradually become an indispensable means of transportation in daily life.
Advances in intelligent technology have made users' demand for in-vehicle intelligence increasingly evident. More and more vehicle-mounted units are equipped with intelligent networking functions, and vehicle-mounted devices such as in-vehicle voice assistants, in-vehicle air purifiers and in-vehicle power banks have emerged amid the flourishing development of in-vehicle smart equipment, meeting users' ever-growing needs while driving. The in-vehicle voice assistant is an intelligent vehicle-mounted product that integrates functions such as intelligent voice, one-key calling, online music, customized radio stations and voice navigation, and can provide users with a better driving experience.
However, existing in-vehicle voice assistants cannot interact with the user using different voices in different scenes. The single voice they use is monotonous, so users become unwilling to listen to it or use it, and the user experience is poor.
Disclosure of Invention
An object of the present application is to provide a voice interaction method, device and vehicle that solve the above technical problem: they can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
In order to solve the above technical problem, the present application provides a voice interaction method, including:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
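The three claimed steps can be sketched in code. This is an illustrative sketch only: the function names, the keyword-to-scene mapping and the voice packet identifiers are assumptions for exposition, not part of the patent.

```python
# Minimal sketch of the claimed three-step flow. All identifiers, the
# keyword -> scene mapping and the packet names are illustrative
# assumptions, not taken from the patent.

PRESET_VOICE_PACKS = {  # interactive scene -> preset voice packet (assumed)
    "weather broadcast": "tv_forecaster_voice",
    "tour guide": "tour_guide_voice",
    "financial news": "finance_announcer_voice",
}

SCENE_KEYWORDS = {  # keyword in voice content -> interactive scene (assumed)
    "weather": "weather broadcast",
    "scenic": "tour guide",
    "financial": "financial news",
}

def determine_scene(voice_text: str) -> str:
    """Step 2 (scene half): pick the interactive scene via keywords."""
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in voice_text:
            return scene
    return "default"

def handle_voice_interaction(voice_text: str) -> tuple[str, str]:
    """Steps 1-3; recognized text stands in for the voice information."""
    content = f"content for: {voice_text}"   # step 2: content to be played
    scene = determine_scene(voice_text)      # step 2: interactive scene
    pack = PRESET_VOICE_PACKS.get(scene, "default_voice")
    return content, pack                     # step 3: play(content, pack)
```

A real implementation would replace the text stand-in with speech recognition and route the returned pair to a playback engine.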
Wherein, the determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein, the determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
Before receiving the voice information of the user, the method further includes:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
After the voice packet selected by the user is set as the preset voice packet corresponding to the interactive scene, the method further includes:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or optional voice packet by using the updated voice packet.
Wherein the method further comprises:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
The present application further provides a voice interaction device comprising a processor, wherein the processor is configured to execute program instructions to implement steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
Wherein the processor executing the step of determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein the processor executing the step of determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
The application also provides a vehicle, which comprises the voice interaction device.
According to the voice interaction method, the voice interaction device and the vehicle, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played according to the preset voice packet corresponding to the interactive scene. In this way, the method and the device can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer, so that they can be implemented in accordance with the contents of the description, and to make the above and other objects, features and advantages of the present application more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment.
Detailed Description
To further illustrate the technical means adopted by the present application to achieve its intended purpose and their effects, the voice interaction method, device and vehicle of the present application are described in detail below in terms of specific embodiments, methods, steps, structures, features and effects, in combination with the accompanying drawings and preferred embodiments.
The foregoing and other technical matters, features and effects of the present application will be apparent from the following detailed description of preferred embodiments, read in conjunction with the accompanying drawings. The specific embodiments are referenced for the purpose of illustrating the general principles of the application and are not intended to limit its scope.
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment. Referring to fig. 1, the voice interaction method of the present embodiment includes, but is not limited to, the following steps:
step 110, receiving the voice information of the user.
When the user needs to interact with the voice assistant, he or she speaks a voice instruction, such as "play the weather forecast" or "introduce nearby scenery". The vehicle receives the user's voice information and performs voice recognition to obtain its voice content, which is usually converted into corresponding text information.
And step 120, determining the content to be played and the current interactive scene according to the voice information.
The content to be played may be a segment of speech obtained according to the voice information, such as a tourist guide commentary, weather forecast content or a piece of news. It may also be a preset reply obtained according to the voice information, such as "what kind of news do you want to listen to" or "what is the local weather forecast". The interactive scene is a scene, determined by the user characteristics of the current user or by the theme of the current content to be played, that matches the user's mood during the interaction, such as an age scene, a gender scene or an interactive theme scene.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is a spectrum of sound waves, displayed by electro-acoustic instruments, that carries speech information. Because voiceprints are distinctive and relatively stable, voiceprint features can be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences — for example, children prefer cheerful voices and women prefer soft voices — so different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene and a male conversation scene.
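The mapping just described, from voiceprint-derived user characteristics to a conversation scene, could look like the following sketch; the age threshold and the scene labels are assumptions for illustration, not values given in the patent.

```python
# Hypothetical mapping from voiceprint-derived user characteristics
# (age, gender) to a conversation scene; the age threshold (12) is
# an assumption.

def scene_from_characteristics(age: int, gender: str) -> str:
    if age < 12:
        return "child conversation scene"    # e.g. cheerful voices
    if gender == "female":
        return "female conversation scene"   # e.g. soft voices
    return "male conversation scene"
```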
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. Interactive scenes include, but are not limited to, weather broadcast, tour guide, financial news, science and technology news, and the like. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots" and "play financial news" are "weather forecast", "scenic spots" and "financial news" respectively, and the corresponding interactive scenes can be "weather broadcast", "tour guide" and "financial news". In another embodiment, the current interactive scene may be determined according to the theme type of the content to be played: for example, if the obtained content to be played is a scenic spot guide commentary, weather forecast content or a piece of financial news, the theme type of the content can be determined as scenic spot introduction, weather forecast or financial news respectively, and the corresponding interactive scene can in turn be determined as "tour guide", "weather broadcast" or "financial news".
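The second route described above, determining the scene from the theme type of the content to be played, can be sketched as a lookup; the mapping entries are illustrative assumptions.

```python
# Sketch of scene determination via the theme type of the content to
# be played; the mapping is an illustrative assumption.

THEME_TO_SCENE = {
    "weather forecast": "weather broadcast",
    "scenic spot introduction": "tour guide",
    "financial news": "financial news",
}

def scene_from_theme(theme_type: str) -> str:
    return THEME_TO_SCENE.get(theme_type, "default")
```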
And step 130, playing the content to be played according to a preset voice packet corresponding to the interactive scene.
The preset voice packets may be the default voice packets of each interactive scene, or voice packets set in advance by the user for different interactive scenes. Each interactive scene may have multiple default voice packets — for example, male, female and young voices — and during playback a packet can be used at random or selected automatically according to the content to be played. In this embodiment, a voice packet is a real-person voice packet, i.e., a voice library (also called a speaker): a repository of sounds, usually recorded by real people word by word or phrase group by phrase group and then stored centrally in a database.
When the preset voice packets are the default voice packets of each interactive scene, they are set by default according to general rules: for example, a teacher's voice is used for conversations with children, the voice of a television weather forecaster for weather forecasts, a tour guide's voice for introductions of historical scenery, a fast-paced high-tech voice for science and technology news, the real voice of a financial announcer for investment news, and so on.
When the preset voice packet is a voice packet that is preset by the user for different interactive scenes, in an embodiment, before receiving the voice information of the user in step 110, the method may further include the following steps:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
When the user needs to set a voice packet for a certain interactive scene, he or she triggers a voice packet setting instruction through the operation interface of the voice assistant to enter a setting interface and selects the interactive scene to be set. The interface then displays the optional voice packets of the currently selected interactive scene, which include at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets. Gender-related voice packets are real-person voice packets of different genders; age-related voice packets are real-person voice packets of different age groups; speed-related voice packets are real-person voice packets of different speaking speeds; and preference-related voice packets are voice packets recommended according to the user's preference for sound types. Different voice packets can thus be selected for each interactive scene; for financial news, for example, the user can choose among various real-person voices such as male, female and young voices. When the user selects an optional voice packet, it can be auditioned by voice playback; if the audition is satisfactory, the voice packet can be chosen and set as the preset voice packet of the currently selected interactive scene.
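The setting flow above — display the optional packets for a scene, let the user pick one, and record it as that scene's preset — reduces to a small amount of state. A sketch, with the data structures and packet names assumed:

```python
# Sketch of the voice packet setting flow: the user picks an optional
# packet for a scene, and it becomes the scene's preset packet. The
# optional packet lists and names are illustrative assumptions.

OPTIONAL_PACKS = {  # scene -> auditionable real-person voice packets
    "financial news": ["male_voice", "female_voice", "young_voice"],
}
preset_packs: dict[str, str] = {}

def set_voice_pack(scene: str, chosen: str) -> bool:
    """Return True if the chosen packet was valid and set as preset."""
    if chosen not in OPTIONAL_PACKS.get(scene, []):
        return False                  # not an optional packet for this scene
    preset_packs[scene] = chosen      # becomes the scene's preset packet
    return True
```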
To obtain the user's preferences, in an embodiment, the voice interaction method of the present application may further include the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used by the user in each interactive scene is recorded, such as age, gender, speed and intonation. By performing deep learning on the sound types of the historically used preset voice packets, the user's sound preferences in different interactive scenes can be obtained. Once obtained, these preferences are used to recommend voice packets suitable for the different interactive scenes to the user, or to update the optional voice packets of each interactive scene, so that the user has more choices matching his or her preferences when setting voice packets.
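As a simple stand-in for the deep learning step described above, a frequency count over the recorded sound types already yields a per-scene preference; a real implementation could substitute a learned model. The record shape is an assumption.

```python
from collections import Counter

# Frequency-count stand-in for learning sound preferences from the
# history of (scene, sound type) usage records; the record shape is
# an assumption, and a trained model could replace this.

def sound_preferences(history: list[tuple[str, str]]) -> dict[str, str]:
    """Return each scene's most frequently used sound type."""
    per_scene: dict[str, Counter] = {}
    for scene, sound_type in history:
        per_scene.setdefault(scene, Counter())[sound_type] += 1
    return {s: c.most_common(1)[0][0] for s, c in per_scene.items()}
```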
In an embodiment, after the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method may further include the following steps:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the optional voice packets, can be automatically updated and upgraded online. In actual implementation, the server sends a voice packet update message to the vehicle when a voice packet is updated. If the user has enabled automatic updates or chooses to update, the vehicle acquires the updated voice packet from the server and replaces the corresponding preset or optional voice packet with it, so that the content played using the voice packets is richer and the sound is more realistic.
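The update flow above amounts to: on receiving an update message, fetch the new packet from the server and replace the local copy. A sketch, with the message format and the fetch callback assumed:

```python
# Sketch of online voice packet updating; the update message format
# ({"pack_id": ...}) and the fetch callback are assumptions.

def apply_update(local_packs: dict[str, bytes], update_msg: dict,
                 fetch) -> bool:
    """fetch(pack_id) downloads the updated packet from the server."""
    pack_id = update_msg.get("pack_id")
    if pack_id not in local_packs:   # applies to preset or optional packs
        return False
    local_packs[pack_id] = fetch(pack_id)
    return True
```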
According to the voice interaction method described above, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played according to the preset voice packet corresponding to the interactive scene. In this way, the method can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment. Referring to fig. 2, the voice interaction apparatus of the present embodiment includes a memory 210 and a processor 220, where the memory 210 stores at least one program instruction, and the steps that the processor 220 implements by loading and executing the at least one program instruction include:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to the preset voice packet corresponding to the interactive scene.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is a spectrum of sound waves, displayed by electro-acoustic instruments, that carries speech information. Because voiceprints are distinctive and relatively stable, voiceprint features can be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences — for example, children prefer cheerful voices and women prefer soft voices — so different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene and a male conversation scene.
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. Interactive scenes include, but are not limited to, weather broadcast, tour guide, financial news, science and technology news, and the like. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots" and "play financial news" are "weather forecast", "scenic spots" and "financial news" respectively, and the corresponding interactive scenes can be "weather broadcast", "tour guide" and "financial news". In another embodiment, the current interactive scene may be determined according to the theme type of the content to be played: for example, if the obtained content to be played is a scenic spot guide commentary, weather forecast content or a piece of financial news, the theme type of the content can be determined as scenic spot introduction, weather forecast or financial news respectively, and the corresponding interactive scene can in turn be determined as "tour guide", "weather broadcast" or "financial news".
In one embodiment, before the processor 220 performs the step of receiving the voice information of the user, the following steps are further performed:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
In an embodiment, after the processor 220 performs the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the following steps are further performed:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the optional voice packets, can be automatically updated and upgraded online. In actual implementation, the server sends a voice packet update message to the vehicle when a voice packet is updated. If the user has enabled automatic updates or chooses to update, the vehicle acquires the updated voice packet from the server and replaces the corresponding preset or optional voice packet with it, so that the content played using the voice packets is richer and the sound is more realistic.
In one embodiment, the processor 220 is further configured to perform the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used by the user in each interactive scene is recorded, such as age, gender, speed and intonation. By performing deep learning on the sound types of the historically used preset voice packets, the user's sound preferences in different interactive scenes can be obtained. Once obtained, these preferences are used to recommend voice packets suitable for the different interactive scenes to the user, or to update the optional voice packets of each interactive scene, so that the user has more choices matching his or her preferences when setting voice packets.
For the detailed working process and steps of the processor 220 in the voice interaction apparatus of the present embodiment, please refer to the description of the embodiment shown in fig. 1, which is not repeated here.
The application also provides a vehicle, which comprises the voice interaction device.
With the voice interaction device and the vehicle described above, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played using the preset voice packet corresponding to the interactive scene. In this way, the device can interact with the user in different voices for different interactive scenes, providing a better user experience.
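The overall flow summarized above can be sketched in a few lines. The scene keywords, packet names, and helper functions below are illustrative assumptions, not the patent's actual mappings; a real system would use the recognition and scene-determination steps described in the embodiments.

```python
# Minimal sketch of the core flow: recognized speech -> current
# interactive scene -> playback using that scene's preset voice packet.

# Map keywords in the recognized speech to an interactive scene
# (hypothetical mapping for illustration).
SCENE_KEYWORDS = {
    "navigate": "navigation",
    "song": "music",
    "weather": "information",
}

# One preset voice packet (a voice style) per interactive scene.
PRESET_PACKETS = {
    "navigation": "calm_female_adult",
    "music": "upbeat_male_young",
    "information": "neutral_female_adult",
}


def determine_scene(speech_text: str) -> str:
    """Pick the current interactive scene from the recognized content."""
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in speech_text.lower():
            return scene
    return "information"  # fallback scene


def handle_voice_input(speech_text: str) -> tuple:
    """Return (content_to_play, voice_packet) for a recognized utterance."""
    scene = determine_scene(speech_text)
    content = f"Response to: {speech_text}"  # stand-in for real content lookup
    return content, PRESET_PACKETS[scene]


content, packet = handle_voice_input("Play me a song")
```

The embodiments also allow the scene to be determined from voiceprint-derived user features (age, gender) or from the theme type of the content to be played; only the `determine_scene` step would change.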
Although the present application has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application, and all such changes, substitutions and alterations are intended to be included within the scope of the application.

Claims (10)

1. A method for voice interaction, comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
2. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
3. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
4. The method of claim 1, wherein before receiving the voice information of the user, the method further comprises:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
5. The method according to claim 4, wherein after the setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method further comprises:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or optional voice packet with the updated voice packet.
6. The method of claim 1, further comprising:
performing deep learning on the sound types of the historically used preset voice packets to obtain the user's sound preferences in different interactive scenes;
and recommending voice packets to the user according to the sound preferences, or updating the selectable voice packets of each interactive scene according to the sound preferences.
7. A voice interaction device comprising a processor, wherein the processor is configured to execute program instructions to perform the steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
8. The apparatus as claimed in claim 7, wherein the processor performs the step of determining the current interactive scene and the content to be played according to the voice information, comprising:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
9. The apparatus as claimed in claim 7, wherein the processor performs the step of determining the current interactive scene and the content to be played according to the voice information, comprising:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
10. A vehicle, characterized in that it comprises a voice interaction device according to any one of claims 7 to 9.
CN201811095748.9A 2018-09-19 2018-09-19 Voice interaction method and device and vehicle Pending CN110930998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811095748.9A CN110930998A (en) 2018-09-19 2018-09-19 Voice interaction method and device and vehicle

Publications (1)

Publication Number Publication Date
CN110930998A true CN110930998A (en) 2020-03-27

Family

ID=69855236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811095748.9A Pending CN110930998A (en) 2018-09-19 2018-09-19 Voice interaction method and device and vehicle

Country Status (1)

Country Link
CN (1) CN110930998A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002041084A (en) * 2000-07-26 2002-02-08 Victor Co Of Japan Ltd Interactive speech processing system
JP2004163541A (en) * 2002-11-11 2004-06-10 Mitsubishi Electric Corp Voice response device
JP2010078763A (en) * 2008-09-25 2010-04-08 Brother Ind Ltd Voice processing device, voice processing program, and intercom system
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN102426591A (en) * 2011-10-31 2012-04-25 北京百度网讯科技有限公司 Method and device for operating corpus used for inputting contents
CN103236259A (en) * 2013-03-22 2013-08-07 乐金电子研发中心(上海)有限公司 Voice recognition processing and feedback system, voice response method
US20150032814A1 (en) * 2013-07-23 2015-01-29 Rabt App Limited Selecting and serving content to users from several sources
CN105654950A (en) * 2016-01-28 2016-06-08 百度在线网络技术(北京)有限公司 Self-adaptive voice feedback method and device
US20160240195A1 (en) * 2015-02-15 2016-08-18 Lenovo (Beijing) Co., Ltd. Information processing method and electronic device
CN106648082A (en) * 2016-12-09 2017-05-10 厦门快商通科技股份有限公司 Intelligent service device capable of simulating human interactions and method
CN108153875A (en) * 2017-12-26 2018-06-12 广州蓝豹智能科技有限公司 Language material processing method, device, intelligent sound box and storage medium
US20180174577A1 (en) * 2016-12-19 2018-06-21 Microsoft Technology Licensing, Llc Linguistic modeling using sets of base phonetics

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746875A (en) * 2020-05-27 2021-12-03 百度在线网络技术(北京)有限公司 Voice packet recommendation method, device, equipment and storage medium
CN113746875B (en) * 2020-05-27 2022-08-02 百度在线网络技术(北京)有限公司 Voice packet recommendation method, device, equipment and storage medium
JP2022537860A (en) * 2020-05-27 2022-08-31 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Voice packet recommendation method, device, electronic device and program
JP7337172B2 (en) 2020-05-27 2023-09-01 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Voice packet recommendation method, device, electronic device and program
CN113160832A (en) * 2021-04-30 2021-07-23 合肥美菱物联科技有限公司 Voice washing machine intelligent control system and method supporting voiceprint recognition

Similar Documents

Publication Publication Date Title
CN109949783B (en) Song synthesis method and system
CN107833574A (en) Method and apparatus for providing voice service
CN108962217A (en) Phoneme synthesizing method and relevant device
CN107895578A (en) Voice interactive method and device
CN107657017A (en) Method and apparatus for providing voice service
US20110144997A1 (en) Voice synthesis model generation device, voice synthesis model generation system, communication terminal device and method for generating voice synthesis model
CN108962219A (en) Method and apparatus for handling text
CN111667812A (en) Voice synthesis method, device, equipment and storage medium
JP2020034895A (en) Responding method and device
CN110930998A (en) Voice interaction method and device and vehicle
CN110475170A (en) Control method, device, mobile terminal and the storage medium of earphone broadcast state
CN104471512A (en) Content customization
CN107247769A (en) Method for ordering song by voice, device, terminal and storage medium
CN108415942B (en) Personalized teaching and singing scoring two-dimensional code generation method, device and system
CN108877803B (en) Method and apparatus for presenting information
CN109036372A (en) A kind of voice broadcast method, apparatus and system
CN110930999A (en) Voice interaction method and device and vehicle
WO2018038235A1 (en) Auditory training device, auditory training method, and program
CN108804667A (en) The method and apparatus of information for rendering
CN114694651A (en) Intelligent terminal control method and device, electronic equipment and storage medium
CN107767862B (en) Voice data processing method, system and storage medium
US11790913B2 (en) Information providing method, apparatus, and storage medium, that transmit related information to a remote terminal based on identification information received from the remote terminal
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN111861666A (en) Vehicle information interaction method and device
CN114120943B (en) Virtual concert processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Ying Zhenkai

Inventor before: Ying Yilun

SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 208, building 4, 1411 Yecheng Road, Jiading District, Shanghai, 201821

Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.

Address before: Room 208, building 4, 1411 Yecheng Road, Jiading District, Shanghai, 201821

Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200327
