CN110930998A - Voice interaction method and device and vehicle - Google Patents

Voice interaction method and device and vehicle Download PDF

Info

Publication number
CN110930998A
CN110930998A (application CN201811095748.9A)
Authority
CN
China
Prior art keywords
voice
content
played
determining
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811095748.9A
Other languages
Chinese (zh)
Inventor
应宜伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Original Assignee
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pateo Electronic Equipment Manufacturing Co Ltd filed Critical Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority to CN201811095748.9A priority Critical patent/CN110930998A/en
Publication of CN110930998A publication Critical patent/CN110930998A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 — Speech to text systems
    • G10L 17/00 — Speaker identification or verification techniques
    • G10L 17/22 — Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a voice interaction method, a voice interaction device and a vehicle. The voice interaction method comprises the following steps: receiving voice information of a user; determining the content to be played and the current interactive scene according to the voice information; and playing the content to be played according to a preset voice packet corresponding to the interactive scene. In this way, the method and the device can interact with the user using different voices for different interaction scenes, resulting in a good user experience.

Description

Voice interaction method and device and vehicle
Technical Field
The application relates to the technical field of voice interaction, in particular to a voice interaction method, a voice interaction device and a vehicle.
Background
With the rapid development of China's automobile industry and the improvement of people's living standards, household car ownership has increased rapidly, and the automobile has gradually become an indispensable means of transportation in daily life.
Advances in intelligent technology have made users' demand for in-vehicle intelligence increasingly evident. More and more vehicle-mounted units are equipped with intelligent networking functions, and vehicle-mounted devices such as in-vehicle voice assistants, in-vehicle air purifiers and in-vehicle power banks have emerged amid the flourishing development of in-vehicle smart equipment, meeting users' ever-growing needs while driving. The in-vehicle voice assistant is an intelligent vehicle-mounted product that integrates functions such as intelligent voice, one-key calling, online music, customized radio stations and voice navigation, and can provide users with a better driving experience.
However, existing in-vehicle voice assistants cannot interact with the user using different voices in different scenes. The single voice they use is monotonous, so users become unwilling to listen to it or use it, and the user experience is poor.
Disclosure of Invention
An object of the present application is to provide a voice interaction method, device and vehicle that solve the above technical problem: they can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
In order to solve the above technical problem, the present application provides a voice interaction method, including:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
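The three claimed steps can be sketched in code. This is an illustrative sketch only: the function names, the keyword-to-scene mapping and the voice packet identifiers are assumptions for exposition, not part of the patent.

```python
# Minimal sketch of the claimed three-step flow. All identifiers, the
# keyword -> scene mapping and the packet names are illustrative
# assumptions, not taken from the patent.

PRESET_VOICE_PACKS = {  # interactive scene -> preset voice packet (assumed)
    "weather broadcast": "tv_forecaster_voice",
    "tour guide": "tour_guide_voice",
    "financial news": "finance_announcer_voice",
}

SCENE_KEYWORDS = {  # keyword in voice content -> interactive scene (assumed)
    "weather": "weather broadcast",
    "scenic": "tour guide",
    "financial": "financial news",
}

def determine_scene(voice_text: str) -> str:
    """Step 2 (scene half): pick the interactive scene via keywords."""
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in voice_text:
            return scene
    return "default"

def handle_voice_interaction(voice_text: str) -> tuple[str, str]:
    """Steps 1-3; recognized text stands in for the voice information."""
    content = f"content for: {voice_text}"   # step 2: content to be played
    scene = determine_scene(voice_text)      # step 2: interactive scene
    pack = PRESET_VOICE_PACKS.get(scene, "default_voice")
    return content, pack                     # step 3: play(content, pack)
```

A real implementation would replace the text stand-in with speech recognition and route the returned pair to a playback engine.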
Wherein, the determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein, the determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
Before receiving the voice information of the user, the method further includes:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
After the voice packet selected by the user is set as the preset voice packet corresponding to the interactive scene, the method further includes:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or optional voice packet by using the updated voice packet.
Wherein the method further comprises:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
The present application further provides a voice interaction device comprising a processor, wherein the processor is configured to execute program instructions to implement steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
Wherein the processor executing the step of determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
Wherein the processor executing the step of determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
The application also provides a vehicle, which comprises the voice interaction device.
According to the voice interaction method, the voice interaction device and the vehicle, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played according to the preset voice packet corresponding to the interactive scene. In this way, the method and the device can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer, so that they can be implemented in accordance with the contents of the description, and to make the above and other objects, features and advantages of the present application more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment.
Detailed Description
To further illustrate the technical means adopted by the present application to achieve its intended purpose and their effects, the voice interaction method, device and vehicle of the present application are described in detail below in terms of specific embodiments, methods, steps, structures, features and effects, in combination with the accompanying drawings and preferred embodiments.
The foregoing and other technical matters, features and effects of the present application will be apparent from the following detailed description of preferred embodiments, read in conjunction with the accompanying drawings. The specific embodiments are referenced for the purpose of illustrating the general principles of the application and are not intended to limit its scope.
FIG. 1 is a flow diagram illustrating a method of voice interaction, according to an example embodiment. Referring to fig. 1, the voice interaction method of the present embodiment includes, but is not limited to, the following steps:
step 110, receiving the voice information of the user.
When the user needs to interact with the voice assistant, he or she speaks a voice instruction, such as "play the weather forecast" or "introduce nearby scenery". The vehicle receives the user's voice information and performs voice recognition to obtain its voice content, which is usually converted into corresponding text information.
And step 120, determining the content to be played and the current interactive scene according to the voice information.
The content to be played may be a segment of speech obtained according to the voice information, such as a tourist guide commentary, weather forecast content or a piece of news. It may also be a preset reply obtained according to the voice information, such as "what kind of news do you want to listen to" or "what is the local weather forecast". The interactive scene is a scene, determined by the user characteristics of the current user or by the theme of the current content to be played, that matches the user's mood during the interaction, such as an age scene, a gender scene or an interactive theme scene.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is a spectrum of sound waves, displayed by electro-acoustic instruments, that carries speech information. Because voiceprints are distinctive and relatively stable, voiceprint features can be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences — for example, children prefer cheerful voices and women prefer soft voices — so different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene and a male conversation scene.
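The mapping just described, from voiceprint-derived user characteristics to a conversation scene, could look like the following sketch; the age threshold and the scene labels are assumptions for illustration, not values given in the patent.

```python
# Hypothetical mapping from voiceprint-derived user characteristics
# (age, gender) to a conversation scene; the age threshold (12) is
# an assumption.

def scene_from_characteristics(age: int, gender: str) -> str:
    if age < 12:
        return "child conversation scene"    # e.g. cheerful voices
    if gender == "female":
        return "female conversation scene"   # e.g. soft voices
    return "male conversation scene"
```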
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
In one embodiment, determining the content to be played and the current interactive scene according to the voice information includes:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. Interactive scenes include, but are not limited to, weather broadcast, tour guide, financial news, science and technology news, and the like. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots" and "play financial news" are "weather forecast", "scenic spots" and "financial news" respectively, and the corresponding interactive scenes can be "weather broadcast", "tour guide" and "financial news". In another embodiment, the current interactive scene may be determined according to the theme type of the content to be played: for example, if the obtained content to be played is a scenic spot guide commentary, weather forecast content or a piece of financial news, the theme type of the content can be determined as scenic spot introduction, weather forecast or financial news respectively, and the corresponding interactive scene can in turn be determined as "tour guide", "weather broadcast" or "financial news".
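The second route described above, determining the scene from the theme type of the content to be played, can be sketched as a lookup; the mapping entries are illustrative assumptions.

```python
# Sketch of scene determination via the theme type of the content to
# be played; the mapping is an illustrative assumption.

THEME_TO_SCENE = {
    "weather forecast": "weather broadcast",
    "scenic spot introduction": "tour guide",
    "financial news": "financial news",
}

def scene_from_theme(theme_type: str) -> str:
    return THEME_TO_SCENE.get(theme_type, "default")
```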
And step 130, playing the content to be played according to a preset voice packet corresponding to the interactive scene.
The preset voice packets may be the default voice packets of each interactive scene, or voice packets set in advance by the user for different interactive scenes. Each interactive scene may have multiple default voice packets — for example, male, female and young voices — and during playback a packet can be used at random or selected automatically according to the content to be played. In this embodiment, a voice packet is a real-person voice packet, i.e., a voice library (also called a speaker): a repository of sounds, usually recorded by real people word by word or phrase group by phrase group and then stored centrally in a database.
When the preset voice packets are the default voice packets of each interactive scene, they are set by default according to general rules: for example, a teacher's voice is used for conversations with children, the voice of a television weather forecaster for weather forecasts, a tour guide's voice for introductions of historical scenery, a fast-paced high-tech voice for science and technology news, the real voice of a financial announcer for investment news, and so on.
When the preset voice packet is a voice packet that is preset by the user for different interactive scenes, in an embodiment, before receiving the voice information of the user in step 110, the method may further include the following steps:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
When the user needs to set a voice packet for a certain interactive scene, he or she triggers a voice packet setting instruction through the operation interface of the voice assistant to enter a setting interface and selects the interactive scene to be set. The interface then displays the optional voice packets of the currently selected interactive scene, which include at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets. Gender-related voice packets are real-person voice packets of different genders; age-related voice packets are real-person voice packets of different age groups; speed-related voice packets are real-person voice packets of different speaking speeds; and preference-related voice packets are voice packets recommended according to the user's preference for sound types. Different voice packets can thus be selected for each interactive scene; for financial news, for example, the user can choose among various real-person voices such as male, female and young voices. When the user selects an optional voice packet, it can be auditioned by voice playback; if the audition is satisfactory, the voice packet can be chosen and set as the preset voice packet of the currently selected interactive scene.
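The setting flow above — display the optional packets for a scene, let the user pick one, and record it as that scene's preset — reduces to a small amount of state. A sketch, with the data structures and packet names assumed:

```python
# Sketch of the voice packet setting flow: the user picks an optional
# packet for a scene, and it becomes the scene's preset packet. The
# optional packet lists and names are illustrative assumptions.

OPTIONAL_PACKS = {  # scene -> auditionable real-person voice packets
    "financial news": ["male_voice", "female_voice", "young_voice"],
}
preset_packs: dict[str, str] = {}

def set_voice_pack(scene: str, chosen: str) -> bool:
    """Return True if the chosen packet was valid and set as preset."""
    if chosen not in OPTIONAL_PACKS.get(scene, []):
        return False                  # not an optional packet for this scene
    preset_packs[scene] = chosen      # becomes the scene's preset packet
    return True
```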
To obtain the user's preferences, in an embodiment, the voice interaction method of the present application may further include the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used by the user in each interactive scene is recorded, such as age, gender, speed and intonation. By performing deep learning on the sound types of the historically used preset voice packets, the user's sound preferences in different interactive scenes can be obtained. Once obtained, these preferences are used to recommend voice packets suitable for the different interactive scenes to the user, or to update the optional voice packets of each interactive scene, so that the user has more choices matching his or her preferences when setting voice packets.
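As a simple stand-in for the deep learning step described above, a frequency count over the recorded sound types already yields a per-scene preference; a real implementation could substitute a learned model. The record shape is an assumption.

```python
from collections import Counter

# Frequency-count stand-in for learning sound preferences from the
# history of (scene, sound type) usage records; the record shape is
# an assumption, and a trained model could replace this.

def sound_preferences(history: list[tuple[str, str]]) -> dict[str, str]:
    """Return each scene's most frequently used sound type."""
    per_scene: dict[str, Counter] = {}
    for scene, sound_type in history:
        per_scene.setdefault(scene, Counter())[sound_type] += 1
    return {s: c.most_common(1)[0][0] for s, c in per_scene.items()}
```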
In an embodiment, after the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method may further include the following steps:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the optional voice packets, can be automatically updated and upgraded online. In actual implementation, the server sends a voice packet update message to the vehicle when a voice packet is updated. If the user has enabled automatic updates or chooses to update, the vehicle acquires the updated voice packet from the server and replaces the corresponding preset or optional voice packet with it, so that the content played using the voice packets is richer and the sound is more realistic.
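The update flow above amounts to: on receiving an update message, fetch the new packet from the server and replace the local copy. A sketch, with the message format and the fetch callback assumed:

```python
# Sketch of online voice packet updating; the update message format
# ({"pack_id": ...}) and the fetch callback are assumptions.

def apply_update(local_packs: dict[str, bytes], update_msg: dict,
                 fetch) -> bool:
    """fetch(pack_id) downloads the updated packet from the server."""
    pack_id = update_msg.get("pack_id")
    if pack_id not in local_packs:   # applies to preset or optional packs
        return False
    local_packs[pack_id] = fetch(pack_id)
    return True
```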
According to the voice interaction method described above, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played according to the preset voice packet corresponding to the interactive scene. In this way, the method can interact with the user using different voices for different interaction scenes, resulting in a good user experience.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an exemplary embodiment. Referring to fig. 2, the voice interaction apparatus of the present embodiment includes a memory 210 and a processor 220, where the memory 210 stores at least one program instruction, and the steps that the processor 220 implements by loading and executing the at least one program instruction include:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to the preset voice packet corresponding to the interactive scene.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the voiceprint recognition result, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
A voiceprint is a spectrum of sound waves, displayed by electro-acoustic instruments, that carries speech information. Because voiceprints are distinctive and relatively stable, voiceprint features can be used for gender or age analysis to determine user characteristics such as the gender and age of the current user. Different genders and ages imply different sound preferences — for example, children prefer cheerful voices and women prefer soft voices — so different user characteristics correspond to different interaction scenes, such as a child conversation scene, a female conversation scene and a male conversation scene.
During speech recognition, speech feature information is extracted from the voice information frame by frame to generate a recognition result, i.e., the voice content of the voice information. Semantic analysis is then performed on this content: effective knowledge is extracted from the sentences of the recognition result, and the meaning of the voice information is inferred from information such as sentence syntax and the word senses of the words in the sentences, yielding a semantic analysis result from which the corresponding content to be played can be obtained. For simpler voice information, keywords can be extracted directly from the recognized content and matched against the content to be played.
In one embodiment, the processor 220 performs the step of determining the content to be played and the current interactive scene according to the voice information, including:
carrying out voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
After the content of the voice information is recognized, the current interactive scene can be determined from keywords in the voice content. Interactive scenes include, but are not limited to, weather broadcast, tour guide, financial news, science and technology news, and the like. For example, the keywords of the voice instructions "play the weather forecast", "introduce nearby scenic spots" and "play financial news" are "weather forecast", "scenic spots" and "financial news" respectively, and the corresponding interactive scenes can be "weather broadcast", "tour guide" and "financial news". In another embodiment, the current interactive scene may be determined according to the theme type of the content to be played: for example, if the obtained content to be played is a scenic spot guide commentary, weather forecast content or a piece of financial news, the theme type of the content can be determined as scenic spot introduction, weather forecast or financial news respectively, and the corresponding interactive scene can in turn be determined as "tour guide", "weather broadcast" or "financial news".
In one embodiment, before the processor 220 performs the step of receiving the voice information of the user, the following steps are further performed:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-related voice packets, age-related voice packets, speed-related voice packets and preference-related voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
In an embodiment, after the processor 220 performs the step of setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the following steps are further performed:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or the optional voice packet by the updated voice packet.
Existing voice packets, including the preset voice packets and the optional voice packets, can be automatically updated and upgraded online. In actual implementation, the server sends a voice packet update message to the vehicle when a voice packet is updated. If the user has enabled automatic updates or chooses to update, the vehicle acquires the updated voice packet from the server and replaces the corresponding preset or optional voice packet with it, so that the content played using the voice packets is richer and the sound is more realistic.
In one embodiment, the processor 220 is further configured to perform the following steps:
deep learning is carried out on the sound types of the preset voice packets used historically so as to obtain the sound preferences of the user in different interactive scenes;
and recommending the voice packets to the user according to the voice preference, or updating the selectable voice packets of each interactive scene according to the voice preference.
Each time the user sets a voice packet, the sound type of the preset voice packet used by the user in each interactive scene is recorded, such as age, gender, speed and intonation. By performing deep learning on the sound types of the historically used preset voice packets, the user's sound preferences in different interactive scenes can be obtained. Once obtained, these preferences are used to recommend voice packets suitable for the different interactive scenes to the user, or to update the optional voice packets of each interactive scene, so that the user has more choices matching his or her preferences when setting voice packets.
For the detailed working process and steps of the processor 220 in the voice interaction apparatus of the present embodiment, please refer to the description of the embodiment shown in fig. 1, which is not repeated here.
The application also provides a vehicle, which comprises the voice interaction device.
With the voice interaction device and the vehicle described above, after the voice information of the user is received, the content to be played and the current interactive scene are determined according to the voice information, and the content to be played is then played using the preset voice packet corresponding to the interactive scene. In this way, the device can interact with the user in different voices for different interactive scenes, providing a better user experience.
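The overall flow summarized above can be sketched in a few lines. The scene keywords, packet names, and helper functions below are illustrative assumptions, not the patent's actual mappings; a real system would use the recognition and scene-determination steps described in the embodiments.

```python
# Minimal sketch of the core flow: recognized speech -> current
# interactive scene -> playback using that scene's preset voice packet.

# Map keywords in the recognized speech to an interactive scene
# (hypothetical mapping for illustration).
SCENE_KEYWORDS = {
    "navigate": "navigation",
    "song": "music",
    "weather": "information",
}

# One preset voice packet (a voice style) per interactive scene.
PRESET_PACKETS = {
    "navigation": "calm_female_adult",
    "music": "upbeat_male_young",
    "information": "neutral_female_adult",
}


def determine_scene(speech_text: str) -> str:
    """Pick the current interactive scene from the recognized content."""
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in speech_text.lower():
            return scene
    return "information"  # fallback scene


def handle_voice_input(speech_text: str) -> tuple:
    """Return (content_to_play, voice_packet) for a recognized utterance."""
    scene = determine_scene(speech_text)
    content = f"Response to: {speech_text}"  # stand-in for real content lookup
    return content, PRESET_PACKETS[scene]


content, packet = handle_voice_input("Play me a song")
```

The embodiments also allow the scene to be determined from voiceprint-derived user features (age, gender) or from the theme type of the content to be played; only the `determine_scene` step would change.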
Although the present application has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application, and all such changes, substitutions and alterations are intended to be included within the scope of the application.

Claims (10)

1. A method for voice interaction, comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
2. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
3. The method of claim 1, wherein the determining the content to be played and the current interactive scene according to the voice information comprises:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
4. The method of claim 1, wherein before receiving the voice information of the user, the method further comprises:
when a voice packet setting instruction is received, displaying selectable voice packets of each interactive scene, wherein the selectable voice packets comprise at least one of gender-associated voice packets, age-associated voice packets, speed-associated voice packets and preference-associated voice packets;
and setting the voice packet selected by the user as a preset voice packet corresponding to the interactive scene.
5. The method according to claim 4, wherein after the setting the voice packet selected by the user as the preset voice packet corresponding to the interactive scene, the method further comprises:
when a voice packet updating message sent by a server is received, acquiring an updated voice packet from the server;
and replacing the corresponding preset voice packet or optional voice packet with the updated voice packet.
6. The method of claim 1, further comprising:
performing deep learning on the sound types of the historically used preset voice packets to obtain the user's sound preferences in different interactive scenes;
and recommending voice packets to the user according to the sound preferences, or updating the selectable voice packets of each interactive scene according to the sound preferences.
7. A voice interaction device comprising a processor, wherein the processor is configured to execute program instructions to perform the steps comprising:
receiving voice information of a user;
determining the content to be played and the current interactive scene according to the voice information;
and playing the content to be played according to a preset voice packet corresponding to the interactive scene.
8. The apparatus as claimed in claim 7, wherein the processor performs the step of determining the current interactive scene and the content to be played according to the voice information, comprising:
respectively carrying out voiceprint recognition and voice recognition on the voice information;
determining user characteristics according to the result of voiceprint recognition, wherein the user characteristics comprise at least one of age and gender;
determining a current interactive scene according to the user characteristics;
and determining the content to be played corresponding to the voice information according to the voice content obtained by voice recognition.
9. The apparatus as claimed in claim 7, wherein the processor performs the step of determining the current interactive scene and the content to be played according to the voice information, comprising:
performing voice recognition on the voice information;
determining the content to be played according to the voice content obtained by voice recognition;
and determining the current interactive scene according to the voice content or the theme type of the content to be played.
10. A vehicle, characterized in that it comprises a voice interaction device according to any one of claims 7 to 9.
CN201811095748.9A 2018-09-19 2018-09-19 Voice interaction method and device and vehicle Pending CN110930998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811095748.9A CN110930998A (en) 2018-09-19 2018-09-19 Voice interaction method and device and vehicle

Publications (1)

Publication Number Publication Date
CN110930998A true CN110930998A (en) 2020-03-27

Family

ID=69855236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811095748.9A Pending CN110930998A (en) 2018-09-19 2018-09-19 Voice interaction method and device and vehicle

Country Status (1)

Country Link
CN (1) CN110930998A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002041084A (en) * 2000-07-26 2002-02-08 Victor Co Of Japan Ltd Interactive speech processing system
JP2004163541A (en) * 2002-11-11 2004-06-10 Mitsubishi Electric Corp Voice response device
JP2010078763A (en) * 2008-09-25 2010-04-08 Brother Ind Ltd Voice processing device, voice processing program, and intercom system
JP2012037790A (en) * 2010-08-10 2012-02-23 Toshiba Corp Voice interaction device
CN102426591A (en) * 2011-10-31 2012-04-25 北京百度网讯科技有限公司 Method and device for operating corpus used for inputting contents
CN103236259A (en) * 2013-03-22 2013-08-07 乐金电子研发中心(上海)有限公司 Voice recognition processing and feedback system, voice response method
US20150032814A1 (en) * 2013-07-23 2015-01-29 Rabt App Limited Selecting and serving content to users from several sources
CN105654950A (en) * 2016-01-28 2016-06-08 百度在线网络技术(北京)有限公司 Self-adaptive voice feedback method and device
US20160240195A1 (en) * 2015-02-15 2016-08-18 Lenovo (Beijing) Co., Ltd. Information processing method and electronic device
CN106648082A (en) * 2016-12-09 2017-05-10 厦门快商通科技股份有限公司 Intelligent service device capable of simulating human interactions and method
CN108153875A (en) * 2017-12-26 2018-06-12 广州蓝豹智能科技有限公司 Language material processing method, device, intelligent sound box and storage medium
US20180174577A1 (en) * 2016-12-19 2018-06-21 Microsoft Technology Licensing, Llc Linguistic modeling using sets of base phonetics

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746875A (en) * 2020-05-27 2021-12-03 百度在线网络技术(北京)有限公司 Voice packet recommendation method, device, equipment and storage medium
CN113746875B (en) * 2020-05-27 2022-08-02 百度在线网络技术(北京)有限公司 Voice packet recommendation method, device, equipment and storage medium
JP2022537860A (en) * 2020-05-27 2022-08-31 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Voice packet recommendation method, device, electronic device and program
JP7337172B2 (en) 2020-05-27 2023-09-01 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Voice packet recommendation method, device, electronic device and program
CN113160832A (en) * 2021-04-30 2021-07-23 合肥美菱物联科技有限公司 Voice washing machine intelligent control system and method supporting voiceprint recognition

Similar Documents

Publication Publication Date Title
CN109949783B (en) Song synthesis method and system
CN107833574A (en) Method and apparatus for providing voice service
CN108962217A (en) Phoneme synthesizing method and relevant device
CN107895578A (en) Voice interactive method and device
CN107657017A (en) Method and apparatus for providing voice service
US20110144997A1 (en) Voice synthesis model generation device, voice synthesis model generation system, communication terminal device and method for generating voice synthesis model
CN108962219A (en) Method and apparatus for handling text
CN111667812A (en) Voice synthesis method, device, equipment and storage medium
JP2020034895A (en) Responding method and device
CN110930998A (en) Voice interaction method and device and vehicle
CN110475170A (en) Control method, device, mobile terminal and the storage medium of earphone broadcast state
CN104471512A (en) Content customization
CN107247769A (en) Method for ordering song by voice, device, terminal and storage medium
CN108415942B (en) Personalized teaching and singing scoring two-dimensional code generation method, device and system
CN108877803B (en) Method and apparatus for presenting information
CN109036372A (en) A kind of voice broadcast method, apparatus and system
CN110930999A (en) Voice interaction method and device and vehicle
WO2018038235A1 (en) Auditory training device, auditory training method, and program
CN108804667A (en) The method and apparatus of information for rendering
CN114694651A (en) Intelligent terminal control method and device, electronic equipment and storage medium
CN107767862B (en) Voice data processing method, system and storage medium
US11790913B2 (en) Information providing method, apparatus, and storage medium, that transmit related information to a remote terminal based on identification information received from the remote terminal
CN116403583A (en) Voice data processing method and device, nonvolatile storage medium and vehicle
CN111861666A (en) Vehicle information interaction method and device
CN114120943B (en) Virtual concert processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Ying Zhenkai

Inventor before: Ying Yilun

SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 208, building 4, 1411 Yecheng Road, Jiading District, Shanghai, 201821

Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.

Address before: Room 208, building 4, 1411 Yecheng Road, Jiading District, Shanghai, 201821

Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200327
