WO2024022027A1 - Intelligent voice interaction method and apparatus, device, and storage medium

Info

Publication number: WO2024022027A1
Application number: PCT/CN2023/104740
Authority: WO
Inventors: 赵默涵; 孙雪迪; 郑红丽; 郑琦; 芦聪; 祝威
Original assignee: 中国第一汽车股份有限公司
Priority date: 2022-07-26
Filing date: 2023-06-30
Publication date: 2024-02-01
Also published as: CN115273865A

Abstract

The present application discloses an intelligent voice interaction method and apparatus, a device, and a storage medium. The method comprises: determining a target working mode of a vehicle according to sensing data detected by a sensor in the vehicle; determining a target image parameter, a target action parameter, and a target voice parameter of an intelligent voice assistant according to the target working mode and/or a target scenario; generating a dynamic intelligent image according to the target image parameter, the target action parameter, and the target voice parameter; and interacting with a user on the basis of the dynamic intelligent image.

Description

Intelligent voice interaction method, device, equipment and storage medium

This application claims priority to the Chinese patent application with application number 202210883395.9, which was filed with the China Patent Office on July 26, 2022. The entire content of that application is incorporated into this application by reference.

Technical field

This application relates to the field of intelligent vehicles, for example, to an intelligent voice interaction method, device, equipment and storage medium.

Background technique

With the development of artificial intelligence, voice intelligence is gradually applied to all aspects of life. The voice intelligent interaction methods used in autonomous vehicles on the market are too simple and cannot meet the personalized needs of users. Therefore, how to meet users' personalized needs for voice intelligent interaction and improve users' vehicle experience is a problem that needs to be solved.

Contents of the invention

This application provides an intelligent voice interaction method, device, equipment and storage medium, which can improve the interaction between users and vehicles, meet users' personalized needs for voice intelligent interaction methods, and improve users' vehicle use experience.

According to one aspect of this application, an intelligent voice interaction method is provided, including:

Determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle;

Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario;

Generate a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters;

Based on the dynamic intelligent image, interact with the user.

According to another aspect of the present application, an intelligent voice interaction device is provided, which device includes:

A target working mode determination module configured to determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle;

A parameter determination module configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario;

A dynamic intelligent image generation module configured to generate a dynamic intelligent image based on the target image parameters, the target action parameters and the target voice parameters;

An interaction module configured to interact with the user based on the dynamic intelligent image.

According to another aspect of the present application, an electronic device is provided, the electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

The memory stores a computer program that can be executed by the at least one processor, and the computer program is executed by the at least one processor, so that the at least one processor can execute the intelligent voice interaction method described in any embodiment of the present application.

According to another aspect of the present application, a computer-readable storage medium is provided. The computer-readable storage medium stores computer instructions, and the computer instructions, when executed by a processor, are used to implement the intelligent voice interaction method described in any embodiment of the present application.

Description of drawings

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed to be used in the description of the embodiments will be briefly introduced below.

Figure 1 is a flow chart of an intelligent voice interaction method provided by an embodiment;

Figure 2 is a flow chart of an intelligent voice interaction method provided by another embodiment;

Figure 3 is a flow chart of an intelligent voice interaction method provided by another embodiment;

Figure 4 is a schematic structural diagram of an intelligent voice interaction device according to an embodiment;

Figure 5 is a schematic structural diagram of an electronic device according to an embodiment.

Detailed description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

The terms "current", "target", etc. in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. Furthermore, the terms "including" and "etc." and any variations thereof are intended to cover non-exclusive inclusions, e.g., processes, methods, systems, products, or devices that comprise a series of steps or units need not be limited to those explicitly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or device.

Embodiment 1

Figure 1 is a flow chart of an intelligent voice interaction method provided in Embodiment 1 of the present application. This embodiment can be applied to situations where intelligent voice interaction is performed with a user through an intelligent voice assistant. The method can be performed by an intelligent voice interaction device, which can be implemented in the form of hardware and/or software. The intelligent voice interaction device can be configured in an electronic device, such as an intelligent voice service system of the electronic device. The intelligent voice service system includes: sensors, an image acquisition device and an intelligent voice assistant. The intelligent voice assistant can help users solve problems through intelligent conversation and instant question answering.

As shown in Figure 1, the method includes:

S110. Determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.

The sensor in the vehicle is the input device of the vehicle's computer system. It converts working-condition information during vehicle operation, such as vehicle speed, the temperature of various media, and engine operating conditions, into electrical signals and transmits them to the computer. The target working mode refers to the current working mode of the vehicle, determined by the vehicle based on the sensing data. The working modes of the vehicle include welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.

The welcome mode refers to a mode that welcomes passengers to ride or drive the vehicle when the sensor detects that a passenger has entered the vehicle cabin. Chat mode is a mode that enables voice communication with users in the vehicle through the intelligent voice assistant installed on the vehicle. Autonomous driving mode is the mode in which the vehicle turns on its autonomous driving function. The target scene mode refers to the mode corresponding to the scene of the vehicle's destination, or to the scene in which the user is located as determined by the user's selection. The scene where the user is located can be a virtual scene or a real scene. The health monitoring mode refers to the mode used to perform health testing on users so that users can understand their own health status. Different vehicle working modes correspond to different dynamic intelligent images.

In this embodiment, the sensing data transmitted by the vehicle sensors is obtained in real time and analyzed. According to the analysis results, the usage of the vehicle cockpit and the instruction information issued by the user are determined. The target working mode of the vehicle is then determined according to the usage of the vehicle cockpit and/or the instruction information issued by the user. The instruction information issued by the user may be one or more of voice information, text information, click information, and gesture information.

For example, the target working mode of the vehicle can be determined based on seat usage, face detection results and instruction information. Specifically, the target working mode of the vehicle can be determined based on the sensing data detected by the sensors in the vehicle through the following sub-steps:

S1101. If it is determined that the seat is in use according to the sensing data of the seat sensor in the vehicle, perform face detection on the user on the seat to obtain the face detection result.

The seat sensor can obtain the signal changes of the seat in the vehicle and determine whether there is a user in the vehicle cockpit.

In this embodiment, the signal changes of the seat are determined based on the sensing data transmitted by the seat sensor, and whether a user has entered the vehicle cockpit is determined based on those signal changes. If it is determined that the user has entered the vehicle cockpit, it is determined that the vehicle cockpit is in use, and the user's image information is further collected through the image collection device in the vehicle to perform face detection on the user based on the user's image information.

S1102. Determine the target working mode of the vehicle based on the face detection results and the instruction information issued by the user.

The face detection results include the user's identity information and age information. Different optional working modes can be provided for users of different age groups, and the target working mode can be determined from the optional working modes.

For example, if the user is determined to be a minor based on the age information in the face detection results, the optional working modes are welcome mode and health detection mode, and the autonomous driving mode is not offered to minor users.

Obtain the user's identity information based on the facial detection results, determine the optional working modes corresponding to the identity information based on the user's identity information, and then determine the target working mode of the vehicle from the optional working modes based on the instruction information sent by the user.

The target working mode of the vehicle can be determined as follows: verify the user's identity information based on the face detection results; if the identity information is verified, obtain the instruction information issued by the user; and determine the target working mode from the set working modes based on the instruction information.

The set working mode refers to the preset working mode of the vehicle that can be provided to the user. The set working modes can include: welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.

The identity information of users with vehicle use authority can be stored in the intelligent voice service system in advance, and the identity information includes the facial feature information of those users. After the face detection result of the user currently in the vehicle cockpit is obtained, it is compared with the stored facial feature information of the users with vehicle use authority. Based on the comparison result, it is verified whether the user currently in the vehicle cockpit is a user with vehicle use authority. If so, the identity of the user currently in the vehicle cockpit is verified.

When the user's identity information is verified, the instruction information sent by the user is obtained, the target working mode expected by the user is determined based on the instruction information sent by the user, and the target working mode of the vehicle is determined among the optional working modes.

Optionally, if the user in the current vehicle cockpit does not have the authority to use the vehicle, an early warning message will be issued.

It can be understood that by determining the target working mode of the vehicle through the above steps, the user can be provided with the target working mode corresponding to the identity information according to the user's identity information. Or it can be determined based on the user's identity information whether the user has the permission to obtain the target working mode. This avoids information leakage and ensures vehicle safety.
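Sub-steps S1101 and S1102 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `FaceResult`, `optional_modes` and `determine_target_mode`, the string labels for the modes, and the choice to signal the early warning with a `PermissionError` are all assumptions made for illustration. The restriction for minors follows the example given above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

# Set working modes named in the text (string labels are illustrative).
ALL_MODES: FrozenSet[str] = frozenset(
    {"welcome", "chat", "autonomous_driving", "target_scene", "health_detection"}
)
# Per the example in the text, minors are offered only these two modes.
MINOR_MODES: FrozenSet[str] = frozenset({"welcome", "health_detection"})


@dataclass(frozen=True)
class FaceResult:
    """Hypothetical face detection result: identity plus an age flag."""
    user_id: str
    is_minor: bool


def optional_modes(face: FaceResult) -> FrozenSet[str]:
    """Return the working modes that may be offered to this user."""
    return MINOR_MODES if face.is_minor else ALL_MODES


def determine_target_mode(
    seat_occupied: bool,
    face: Optional[FaceResult],
    authorized_ids: Set[str],
    requested_mode: str,
) -> Optional[str]:
    """S1101/S1102: seat check, identity check, then mode selection."""
    if not seat_occupied or face is None:
        return None  # nobody detected on the seat, nothing to decide
    if face.user_id not in authorized_ids:
        # Stand-in for the early warning issued for unauthorized users.
        raise PermissionError("user has no authority to use the vehicle")
    # The instruction only takes effect if it names an optional mode.
    return requested_mode if requested_mode in optional_modes(face) else None
```

In this sketch an instruction that asks for a mode outside the optional working modes simply yields no target mode; a real system would presumably prompt the user instead.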

S120. Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.

The target scene refers to the scene of the vehicle's destination or a metaverse scene. The metaverse scene is a virtual scene constructed through virtual reality technology. The target image parameters refer to the parameter information used to construct the external image of the intelligent voice assistant, such as color information, size information, expression information, etc.; the target action parameters refer to the parameter information used to construct the actions to be performed by the intelligent voice assistant; the target voice parameters refer to the parameter information used to generate the voice information to be emitted by the intelligent voice assistant.

The correspondence between the target image parameters, target action parameters and target voice parameters and the target working mode is stored in the intelligent voice service system; likewise, the correspondence between these parameters and the target scene is stored in the intelligent voice service system. The external images of the intelligent voice assistant that can be constructed from the target image parameters include: doctor image, diving image, pilot image, astronaut image and default image.

The actions of the intelligent voice assistant constructed through the target action parameters include: display actions for displaying information on the display or displaying its own image, listening actions when the user is chatting, viewing actions when the user is operating the vehicle, questioning actions when the user's voice message cannot be recognized, greeting actions to welcome the user, scene adaptive actions corresponding to the target scene, and standby actions in standby mode. Furthermore, scene adaptive actions include playing musical instruments, etc.; standby actions include: reading actions, dancing actions, weightless floating actions, etc.

For example, if the target working mode is the welcome mode, the target image parameters are the default image parameters, the target action parameters include the welcome appearance action parameters and the display action parameters, and the target voice parameters include the welcome voice parameters and the vehicle start voice parameters.

If it is determined that the target working mode is the welcome mode, then the target image parameters are determined to be the default image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the default image. The target action parameters are the welcome appearance action parameters and the display action parameters. The welcome appearance action parameters can construct the appearance action and appearance time of the intelligent voice assistant. For example, the appearance action can be a waving gesture, and the appearance time can be three seconds. The display action parameters can construct the display actions made by the intelligent voice assistant when introducing itself. The display actions can be turning in circles and/or spreading the arms. The welcome voice parameters in the target voice parameters can generate the welcome voice that the intelligent voice assistant emits when welcoming the user, and the vehicle start voice parameters can generate the prompt voice with which the intelligent voice assistant prompts the user to start the vehicle.

For example, the welcome voice can be: "The flower path has never been swept by customers, and the door is now open for you. Hello passengers, I am your smart voice assistant, my name is Xiaoqi, I am very happy to welcome you on the next trip. We are here to serve you." The prompt voice can be: "Now you click on the startup option, and we can set off."

If the target working mode is a health detection mode, the target image parameters are doctor image parameters, the target action parameters include detection process prompt action parameters, and the target voice parameters include detection process prompt voice parameters.

If it is determined that the target working mode is the health detection mode, then the target image parameters are determined to be the doctor image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the doctor image. The target action parameter is the detection process prompt action parameter, and the detection process prompt action parameter can construct the detection prompt action of the intelligent voice assistant during the detection process. For example, the detection prompt action may be the action of picking up and hanging up the stethoscope. The detection process prompt voice parameter in the target voice parameter can generate a health detection prompt voice issued by the intelligent voice assistant to the user when performing health detection on the user.

For example, the health check prompt voice can be: "Please look at the camera position in front, and I will conduct daily health checks for you." At the end of the health test, the health test prompt voice can be: "The physical examination results will be sent to you at the end." Optionally, if the user does not look at the camera during the health check, the health check prompt voice can be: "Health check is in progress, please look at the camera."

It can be understood that by determining multiple parameters of the intelligent voice assistant according to the target working mode and/or the target scenario, the parameters obtained for the intelligent voice assistant can be made to conform to the current working mode of the vehicle and/or the target scenario, thereby constructing a dynamic intelligent image that meets user needs and is easy for users to understand.
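The stored correspondence between working modes and assistant parameters in S120 can be pictured as a lookup table. The following Python sketch is only an illustration under assumptions: `AssistantParams`, `MODE_PARAMS`, `params_for_mode` and the string labels are hypothetical names, and only the two modes whose parameters are spelled out in the text (welcome and health detection) are filled in.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AssistantParams:
    """Hypothetical bundle of the three parameter groups from S120."""
    image: str
    actions: Tuple[str, ...]
    voices: Tuple[str, ...]


# Correspondence between working mode and parameters, as it might be
# pre-stored in the intelligent voice service system (labels are illustrative).
MODE_PARAMS: Dict[str, AssistantParams] = {
    "welcome": AssistantParams(
        image="default",
        actions=("welcome_appearance", "display"),
        voices=("welcome", "vehicle_start"),
    ),
    "health_detection": AssistantParams(
        image="doctor",
        actions=("detection_process_prompt",),
        voices=("detection_process_prompt",),
    ),
}


def params_for_mode(mode: str) -> AssistantParams:
    """Look up the target parameters for a target working mode.

    Raises KeyError for modes that have no stored correspondence.
    """
    return MODE_PARAMS[mode]
```

A table keeps the mode-to-parameter mapping as data, so adding a mode does not require touching the lookup logic; whether the described system stores it this way is not stated in the text.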

S130. Generate a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters.

Dynamic intelligent image refers to an intelligent model image of the intelligent voice assistant that can make actions and speak.

Determine the target external image of the intelligent voice assistant based on the target image parameters; determine the target actions to be displayed by the intelligent voice assistant based on the target action parameters; determine the target voice information to be emitted by the intelligent voice assistant based on the target voice parameters. Generate a dynamic intelligent image based on the target external image of the intelligent voice assistant, the target actions to be displayed, and the target voice information to be emitted.
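As a rough illustration of S130, the three parameter groups can each be resolved separately and then combined into one object. This Python sketch rests on assumptions: `DynamicImage`, `generate_dynamic_image` and the three catalogs are hypothetical, and the catalog entries merely reuse the examples given earlier in this embodiment (waving for three seconds, turning in a circle, the vehicle start prompt).

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical catalogs that resolve parameter labels to concrete content.
IMAGES: Dict[str, str] = {"default": "default image", "doctor": "doctor image"}
ACTIONS: Dict[str, str] = {
    "welcome_appearance": "wave for three seconds",
    "display": "turn in a circle",
}
VOICES: Dict[str, str] = {
    "vehicle_start": "Now you click on the startup option, and we can set off.",
}


@dataclass(frozen=True)
class DynamicImage:
    """An assistant image that can both act and speak."""
    external_image: str
    actions: Tuple[str, ...]
    utterances: Tuple[str, ...]


def generate_dynamic_image(
    image_param: str,
    action_params: Sequence[str],
    voice_params: Sequence[str],
) -> DynamicImage:
    """Resolve each parameter group, then combine the results."""
    return DynamicImage(
        external_image=IMAGES[image_param],
        actions=tuple(ACTIONS[a] for a in action_params),
        utterances=tuple(VOICES[v] for v in voice_params),
    )
```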

S140. Interact with users based on dynamic intelligent images.

After obtaining the dynamic intelligent image corresponding to the target working mode and/or the target scene, information interaction with the user can be carried out through the intelligent voice assistant based on the dynamic intelligent image.

Optionally, an Easter egg can also be set up in the interaction with the user. That is, during the interaction, if the user triggers the condition for sending an Easter egg, the intelligent voice assistant will emit the Easter egg voice through the dynamic intelligent image and send an e-card to the user. The Easter egg voice can be: "I am sending you a postcard from the sky, please accept it quickly."

For example, when the user exits the intelligent voice service system or turns off the intelligent voice assistant, the dynamic intelligent image will disappear and issue an offline prompt voice. The offline prompt voice can be: "See you next time."

The above scheme determines the parameter information for constructing a dynamic intelligent image according to the vehicle's target working mode and/or target scenario, and can adjust the dynamic intelligent image in real time according to the actual situation of the vehicle. This solves the problem that voice intelligent interaction methods on autonomous vehicles are too simple and cannot meet the personalized needs of users.

The technical solution provided by this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters; and interacts with the user based on the dynamic intelligent image. The above solution can provide users with a personalized dynamic intelligent image of the intelligent voice assistant based on the target scenario and/or the working mode of the vehicle, which improves the intelligence of the vehicle and provides users with more convenience when using the vehicle. At the same time, users can understand the current working mode and target scenario of the vehicle in real time based on the dynamic intelligent image, which improves the user's vehicle experience.

Embodiment 2

Figure 2 is a flow chart of an intelligent voice interaction method provided in Embodiment 2 of the present application. This embodiment builds on the above embodiment and provides an implementation for generating a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters. As shown in Figure 2, the method includes:

S210. Determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.

S220. Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.

S230. Generate an initial intelligent image based on the target image parameters, target action parameters, and target voice parameters.

The initial intelligent image refers to an intelligent image that needs to be further adjusted based on the multi-type parameters corresponding to the working mode and/or the target scene.

S240. Adjust the initial intelligent image according to the target scene to obtain a dynamic intelligent image.

In this embodiment, adjustment parameters for dynamic intelligent image adjustment can be assigned to multiple target scenes according to the characteristics of the target scene, and the target scenes and adjustment parameters are correspondingly stored in the intelligent voice service system. After the target scene is determined, the adjustment parameters are determined according to the target scene, and then the initial intelligent image is adjusted based on the adjustment parameters, and the adjusted initial intelligent image is used as the dynamic intelligent image.

For example, a dynamic intelligent image can be obtained according to the following sub-steps:

S2401. Determine color information and/or emotional information according to the target scene.

Color information refers to the shape and color composition of the constructed dynamic intelligent image. Emotional information refers to the external emotions of dynamic intelligent images. Emotional information can include: happy, sad, active, shy, etc.

In this embodiment, different color parameters and emotional parameters can be set for the dynamic intelligent images corresponding to different target scenes according to the scene characteristics of the target scene. After the target scene of the current vehicle is obtained, the color parameters and emotional parameters corresponding to the target scene are determined; the color information of the dynamic intelligent image is determined according to the color parameters, and the emotional information of the dynamic intelligent image is determined according to the emotional parameters.

S2402. Adjust the initial intelligent image according to the color information and/or emotional information to obtain a dynamic intelligent image.

The color of the initial intelligent image is adjusted according to the color information. After the color has been adjusted, the expression of the image can be adjusted based on the emotional information to obtain a dynamic intelligent image that matches the scene characteristics of the target scene.
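Sub-steps S2401 and S2402 amount to looking up per-scene adjustment parameters and applying them to the initial image. The Python sketch below shows one way this could look; it is an assumption-laden illustration, not the described system. `SmartImage`, `SCENE_ADJUSTMENTS` and `adjust_for_scene` are hypothetical names, and the concrete color and emotion values per scene are invented for the example, since the text does not list them.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class SmartImage:
    """Hypothetical image state: external image plus color and emotion."""
    external_image: str
    color: str = "neutral"
    emotion: str = "neutral"


# Adjustment parameters stored per target scene. The values are
# illustrative assumptions; a scene may set color, emotion, or both.
SCENE_ADJUSTMENTS: Dict[str, Dict[str, str]] = {
    "ocean": {"color": "blue", "emotion": "happy"},
    "forest": {"color": "green"},
}


def adjust_for_scene(initial: SmartImage, scene: str) -> SmartImage:
    """S2401 + S2402: look up the scene's adjustments and apply them."""
    adjustments = SCENE_ADJUSTMENTS.get(scene, {})
    # dataclasses.replace returns a new image; the initial one is untouched.
    return replace(initial, **adjustments)
```

Returning a new object rather than mutating the initial intelligent image keeps both versions available, which matches the idea that the user can compare the dynamic image with the initial one.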

S250. Interact with users based on dynamic intelligent images.

The technical solution of this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates an initial intelligent image based on the target image parameters, target action parameters, and target voice parameters; adjusts the initial intelligent image according to the target scene to obtain a dynamic intelligent image; and interacts with the user based on the dynamic intelligent image. The above solution can provide the user with the initial intelligent image of the intelligent voice assistant according to the target working mode of the vehicle and adjust the initial intelligent image in real time according to the target scene to obtain a dynamic intelligent image. The user can then obtain the scene information of the vehicle's target scene in real time from the features that distinguish the dynamic intelligent image from the initial intelligent image, which meets the user's need to grasp the vehicle driving scene in real time.

Embodiment 3

Figure 3 is a flow chart of an intelligent voice interaction method provided in Embodiment 3 of the present application. This embodiment builds on the above embodiments and provides a method for determining the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target scene. As shown in Figure 3, the method includes:

S310. Determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.

S320. Determine the target scene from the optional scenes according to the user's selection operation.

In this embodiment, the optional scenes refer to scenes that are pre-stored in the intelligent voice service system. Optional scenes can be virtual scenes or real scenes. The virtual scene can be a metaverse scene; the real scenes include the ocean scene corresponding to an aquarium, the sky scene corresponding to an aerospace museum, and the forest scene corresponding to a natural scenic spot. It can be understood that the real scenes can be set according to actual circumstances and are not limited to the scenes listed above.

For example, the information selected by the user's selection operation may be the destination of the vehicle, or it may be a scene pre-stored in the intelligent voice service system. If the user's selection operation selects the destination where the vehicle is traveling, the target scene is determined based on the destination. If the information selected by the user's selection operation is a scene pre-stored in the intelligent voice service system, the scene selected by the user is used as the target scene.
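The two branches of S320 can be sketched in a few lines of Python. The destination-to-scene pairs come from the examples in this embodiment; everything else (the names `DESTINATION_TO_SCENE`, `OPTIONAL_SCENES`, `determine_target_scene`, the lowercase string labels, and raising `ValueError` for an unknown selection) is an assumption made for illustration.

```python
from typing import Dict, FrozenSet

# Real scenes keyed by destination type, following the examples in the text.
DESTINATION_TO_SCENE: Dict[str, str] = {
    "aquarium": "ocean",
    "aerospace museum": "sky",
    "natural scenic spot": "forest",
}
# Scenes assumed to be pre-stored in the intelligent voice service system.
OPTIONAL_SCENES: FrozenSet[str] = frozenset({"metaverse", "ocean", "sky", "forest"})


def determine_target_scene(selection: str) -> str:
    """Map a user selection (destination or stored scene) to a target scene."""
    key = selection.strip().lower()
    if key in DESTINATION_TO_SCENE:
        # The user selected a destination: derive the scene from it.
        return DESTINATION_TO_SCENE[key]
    if key in OPTIONAL_SCENES:
        # The user selected a pre-stored scene directly.
        return key
    raise ValueError(f"selection {selection!r} matches no optional scene")
```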

S330. If the target scene is a metaverse scene, the target image parameters are the metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene.

In this embodiment, the metaverse image parameters refer to parameters that can construct the metaverse image of the intelligent voice assistant. The Metaverse image can be the image of the intelligent voice assistant built by the user according to actual needs, or it can be the image of the intelligent voice assistant selected by the user through the optional images pre-stored in the intelligent voice service system.

If the target scene is a Metaverse scene, obtain the Metaverse image parameters corresponding to the Metaverse scene, the action parameters associated with the Metaverse scene, and the voice parameters associated with the Metaverse scene.

S340. Generate a dynamic intelligent image according to the target image parameters, target action parameters and target voice parameters.

Based on the metaverse image parameters, the action parameters associated with the metaverse scene, and the voice parameters associated with the metaverse scene, a dynamic intelligent image corresponding to the metaverse scene is constructed.

S350. When it is determined that the user is in the metaverse scene, determine that the working mode of the dynamic intelligent image is the follow mode.

The working modes of the dynamic intelligent image can include a fixed mode and a follow mode. In fixed mode, the dynamic intelligent image can only move within the display on which it is shown; in follow mode, the dynamic intelligent image can follow the user at all times while the user is in the metaverse scene.

When it is determined that the user has entered the metaverse scene, the working mode of the dynamic intelligent image is determined to be the follow mode.

S360. Control the dynamic intelligent image to interact with the user through voice and/or movement in follow mode.

When a user enters the metaverse scene through a virtual reality device, the intelligent voice assistant can accompany the user in follow mode as a dynamic intelligent image and join the user in experiencing games in the metaverse scene. For example, through the dynamic intelligent image, the intelligent voice assistant can act as a sparring partner for the user in a virtual gym in the metaverse scene; it can also compete with the user in the metaverse scene. When the user chooses to turn off the virtual device, the intelligent voice assistant can send a farewell voice message through the dynamic intelligent image. The farewell voice message can be "Now take off the VR (virtual reality) glasses and look outside." After the user turns off the virtual device, the external image of the intelligent voice assistant switches from the metaverse image to the default image.
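The switching behaviour of S350 and S360 can be summarized in a short Python sketch. It is illustrative only: `ImageMode`, `select_image_mode` and `select_external_image` are hypothetical names, and falling back to fixed mode when the user is outside the metaverse scene is an assumption, since the text only states that follow mode applies inside it.

```python
from enum import Enum


class ImageMode(Enum):
    """Working modes of the dynamic intelligent image named in the text."""
    FIXED = "fixed"    # image moves only within its display
    FOLLOW = "follow"  # image follows the user in the metaverse scene


def select_image_mode(user_in_metaverse: bool) -> ImageMode:
    """S350: follow mode while the user is in the metaverse scene.

    Using fixed mode otherwise is an assumption of this sketch.
    """
    return ImageMode.FOLLOW if user_in_metaverse else ImageMode.FIXED


def select_external_image(user_in_metaverse: bool, metaverse_image: str) -> str:
    """Use the metaverse image inside the scene, the default image after
    the user turns off the virtual device."""
    return metaverse_image if user_in_metaverse else "default"
```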

The technical solution of this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target scene from the optional scenes according to the user's selection operation; if the target scene is a metaverse scene, determines that the target image parameters are the metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene; when it is determined that the user is in the metaverse scene, determines that the working mode of the dynamic intelligent image is the follow mode; and controls the dynamic intelligent image to interact with the user through voice and/or movement in the follow mode. Through the above solution, non-driving users in the vehicle can enter the virtual scene through virtual reality technology and interact with the intelligent voice assistant in the virtual scene through voice and movement, which better meets the demand for personalized in-vehicle intelligent voice assistants and improves the comfort and fun of riding in the vehicle.

For example, based on this embodiment, if the target scene is an ocean scene corresponding to an aquarium, the target external image corresponding to the target scene is a diving image; if the target scene is a sky scene corresponding to an aerospace museum, the target external image corresponding to the target scene is the image of an astronaut; if the target scene is a forest scene corresponding to a natural scenic spot, the target external image corresponding to the target scene is the default image, piano keys appear on the display screen, and the user can play the piano keys on the display screen. The voice message sent by the intelligent voice assistant at this time can be "You can play the piano on the smart surface and feel the charm of the sound of the elements." The display screen may be a display screen of a user terminal or a display screen of a vehicle terminal.
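The scene examples above amount to a lookup from target scene to target external image. A minimal sketch is given below; the string keys, the table name `SCENE_TO_EXTERNAL_IMAGE` and the function `target_external_image` are hypothetical, and returning the default image for a scene that is not listed is an assumption of this sketch rather than something stated in the embodiment.

```python
# Scene-to-image pairs taken from the examples in this embodiment.
# The string identifiers themselves are illustrative.
SCENE_TO_EXTERNAL_IMAGE = {
    "ocean": "diving",     # ocean scene corresponding to an aquarium
    "sky": "astronaut",    # sky scene corresponding to an aerospace museum
    "forest": "default",   # forest scene corresponding to a natural scenic spot
}


def target_external_image(target_scene: str) -> str:
    """Return the external image for a target scene.

    Falling back to "default" for an unlisted scene is an assumption.
    """
    return SCENE_TO_EXTERNAL_IMAGE.get(target_scene, "default")
```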

Embodiment 4

Figure 4 is a schematic structural diagram of an intelligent voice interaction device provided in Embodiment 4 of the present application. This embodiment is applicable to situations where intelligent voice interaction is performed with the user through an intelligent voice assistant. As shown in Figure 4, the intelligent voice interaction device includes: a target working mode determination module 410, a parameter determination module 420, a dynamic intelligent image generation module 430 and an interaction module 440.

The target working mode determination module 410 is configured to determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.

The parameter determination module 420 is configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.

The dynamic intelligent image generation module 430 is configured to generate a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters.

The interaction module 440 is configured to interact with the user based on the dynamic intelligent image.

The technical solution provided by this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters; and interacts with the user based on the dynamic intelligent image. The above solution can provide users with a dynamic intelligent image of a personalized intelligent voice assistant based on the target scene and/or the working mode of the vehicle, improves the intelligent performance of the vehicle, and makes the vehicle more convenient for users to use. At the same time, users can understand the current working mode and/or target scene of the vehicle in real time based on the dynamic intelligent image, which improves the user's vehicle experience.
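The data flow between modules 410 to 440 can be sketched as four pluggable callables chained in order. This is only an illustrative arrangement: the class name, the field names, the dictionary-based data shapes and the `run` method are assumptions of the sketch and are not specified by this embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class IntelligentVoiceInteractionDevice:
    """Chains the four modules of Figure 4 (all signatures are hypothetical)."""

    # Module 410: sensing data -> target working mode
    determine_working_mode: Callable[[Dict[str, Any]], str]
    # Module 420: (working mode, optional target scene) -> parameters
    determine_parameters: Callable[[str, Optional[str]], Dict[str, Any]]
    # Module 430: parameters -> dynamic intelligent image
    generate_image: Callable[[Dict[str, Any]], Dict[str, Any]]
    # Module 440: dynamic intelligent image -> interaction result
    interact: Callable[[Dict[str, Any]], str]

    def run(self, sensing_data: Dict[str, Any],
            target_scene: Optional[str] = None) -> str:
        """Run the modules in the order 410, 420, 430, 440."""
        mode = self.determine_working_mode(sensing_data)
        params = self.determine_parameters(mode, target_scene)
        image = self.generate_image(params)
        return self.interact(image)
```

Keeping each module as an injected callable mirrors the modular structure of the device and lets each stage be replaced or tested independently.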

The target working mode determination module 410 includes: a face detection unit, configured to perform face detection on the user on the seat when it is determined, according to the sensing data of the seat sensor in the vehicle, that the seat is in use, and to obtain a face detection result; and a working mode determination unit, configured to determine the target working mode of the vehicle based on the face detection result and the instruction information issued by the user.

Exemplarily, the working mode determination unit is configured to: verify the user's identity information based on the face detection result; obtain the instruction information issued by the user when the identity information is verified; and determine the target working mode from the set working modes based on the instruction information.
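The gating logic of the face detection unit and the working mode determination unit can be sketched as below. The function name, the boolean inputs standing in for the seat sensor and identity verification results, and the mode identifiers are hypothetical; only the welcome and health detection modes named in this embodiment are listed, and returning `None` when no mode can be determined is an assumption.

```python
from typing import Optional

# Assumed identifiers for the set working modes named in this embodiment.
SET_WORKING_MODES = ("welcome", "health_detection")


def determine_target_working_mode(
    seat_in_use: bool,
    identity_verified: bool,
    instruction: Optional[str],
) -> Optional[str]:
    """Return the target working mode, or None if it cannot be determined."""
    # Face detection is only performed when the seat sensor reports use.
    if not seat_in_use:
        return None
    # Instruction information is only obtained after identity verification.
    if not identity_verified:
        return None
    # The target mode must be one of the set working modes.
    if instruction in SET_WORKING_MODES:
        return instruction
    return None
```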

Exemplarily, the parameter determination module 420 is configured to: when it is determined that the target working mode is the welcome mode, determine that the target image parameters are the default image parameters, the target action parameters include the welcome appearance action parameters and the display action parameters, and the target voice parameters include the welcome voice parameters and the vehicle startup voice parameters; when it is determined that the target working mode is the health detection mode, determine that the target image parameters are the doctor image parameters, the target action parameters include the detection process prompt action parameters, and the target voice parameters include the detection process prompt voice parameters.
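A minimal sketch of this mode-to-parameter mapping follows. The `AssistantParameters` container, the string identifiers and the choice to raise an error for any other mode are assumptions of the sketch; the parameter groupings themselves follow the two modes described above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AssistantParameters:
    """Target parameters of the intelligent voice assistant (illustrative)."""

    image: str
    actions: Tuple[str, ...]
    voices: Tuple[str, ...]


def determine_parameters(target_working_mode: str) -> AssistantParameters:
    """Map a target working mode to image, action and voice parameters."""
    if target_working_mode == "welcome":
        return AssistantParameters(
            image="default",
            actions=("welcome_appearance", "display"),
            voices=("welcome", "vehicle_startup"),
        )
    if target_working_mode == "health_detection":
        return AssistantParameters(
            image="doctor",
            actions=("detection_process_prompt",),
            voices=("detection_process_prompt",),
        )
    # Other modes are outside the scope of this sketch.
    raise ValueError(f"unsupported working mode: {target_working_mode}")
```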

Exemplarily, the dynamic intelligent image generation module 430 includes: an initial intelligent image generation unit, configured to generate an initial intelligent image based on the target image parameters, target action parameters, and target voice parameters; and a dynamic intelligent image determination unit, configured to adjust the initial intelligent image according to the target scene to obtain a dynamic intelligent image.

Exemplarily, the dynamic intelligent image determination unit is configured to: determine color information and/or emotional information according to the target scene; adjust the initial intelligent image according to the color information and/or emotional information to obtain a dynamic intelligent image.
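The "and/or" adjustment performed by the dynamic intelligent image determination unit can be sketched as applying whichever of the color information and emotional information is available. The `IntelligentImage` type, its fields and the function name are hypothetical, and how color or emotional information is derived from a target scene is left out because this embodiment does not specify it here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class IntelligentImage:
    """Simplified stand-in for an intelligent image (fields are illustrative)."""

    image: str
    color: Optional[str] = None
    emotion: Optional[str] = None


def adjust_initial_image(
    initial: IntelligentImage,
    color_info: Optional[str] = None,
    emotion_info: Optional[str] = None,
) -> IntelligentImage:
    """Return a dynamic image adjusted by color and/or emotional information."""
    updates = {}
    if color_info is not None:
        updates["color"] = color_info
    if emotion_info is not None:
        updates["emotion"] = emotion_info
    # replace() returns a new image and leaves the initial image unchanged.
    return replace(initial, **updates)
```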

Exemplarily, the parameter determination module 420 is configured to: determine the target scene from the optional scenes according to the user's selection operation; when it is determined that the target scene is a metaverse scene, determine that the target image parameters are the metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene.

Correspondingly, the interaction module 440 is configured to: when it is determined that the user is in the metaverse scene, determine that the working mode of the dynamic intelligent image is the follow mode; and control the dynamic intelligent image to perform voice and/or action interaction with the user in the follow mode.

The intelligent voice interaction device provided in this embodiment can be applied to the intelligent voice interaction method provided in any of the above embodiments, and has corresponding functions and effects.

Embodiment 5

FIG. 5 shows a schematic structural diagram of an electronic device 10 that can be used to implement embodiments of the present application. Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (eg, helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementation of the present application as described and/or claimed herein.

As shown in Figure 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a read-only memory (Read-Only Memory, ROM) 12, a random access memory (Random Access Memory, RAM) 13 and so on, wherein the memory stores a computer program that can be executed by the at least one processor. The processor 11 can perform a variety of appropriate actions and processing according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via the bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Multiple components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard, a mouse, etc.; an output unit 17, such as various types of displays, speakers, etc.; a storage unit 18, such as a magnetic disk, an optical disk, etc.; and a communication unit 19, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.

The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a variety of dedicated artificial intelligence (Artificial Intelligence, AI) computing chips, a variety of processors running machine learning model algorithms, a digital signal processor (Digital Signal Processor, DSP), and any appropriate processor, controller, microcontroller, etc. The processor 11 executes the methods and processes described above, such as the intelligent voice interaction method.

In some embodiments, the intelligent voice interaction method may be implemented as a computer program, which is tangibly included in a computer-readable storage medium, such as the storage unit 18 . In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the intelligent voice interaction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the intelligent voice interaction method in any other suitable manner (eg, by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Parts (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor can be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable intelligent voice interaction device, so that when the computer program is executed by the processor, the functions/operations specified in the flowchart and/or block diagram are implemented. A computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this application, a computer-readable storage medium may be a tangible medium that may contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. Computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the electronic device. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), blockchain network, and the Internet.

Computing systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host. It is a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak business scalability found in traditional physical hosts and virtual private server (VPS) services.

It should be understood that various forms of the process shown above may be used, with steps reordered, added or deleted. For example, each step described in this application can be executed in parallel, sequentially, or in a different order. As long as the desired results of the technical solution of this application can be achieved, there is no limitation here.

The above-mentioned specific embodiments do not constitute a limitation on the scope of protection of the present application.

Claims

An intelligent voice interaction method, including:

Determine the target operating mode of the vehicle based on the sensing data detected by the sensors in the vehicle;

Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to at least one of the target working mode and the target scenario;

Generate a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters;

Based on the dynamic intelligent image, interact with the user.
The method of claim 1, wherein determining the target operating mode of the vehicle based on sensing data detected by sensors in the vehicle includes:

When it is determined that the seat is in use according to the sensing data of the seat sensor in the vehicle, face detection is performed on the user on the seat to obtain the face detection result;

The target operating mode of the vehicle is determined based on the face detection result and the instruction information issued by the user.
The method according to claim 2, wherein determining the target operating mode of the vehicle based on the face detection result and the instruction information issued by the user includes:

Verify the identity information of the user based on the facial detection results;

If the identity information is successfully verified, obtain the instruction information issued by the user;

According to the instruction information, the target operating mode is determined from the set operating modes.
The method according to claim 1, wherein determining the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode includes:

When it is determined that the target working mode is the welcoming mode, the target image parameters are determined to be default image parameters, the target action parameters include welcoming appearance action parameters and display action parameters, and the target voice parameters include welcome voice parameters and vehicle start voice parameters;

When it is determined that the target working mode is a health detection mode, the target image parameters are determined to be doctor image parameters, the target action parameters include detection process prompt action parameters, and the target voice parameters include detection process prompt voice parameters.
The method according to claim 1, wherein generating a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters includes:

Generate an initial intelligent image according to the target image parameters, the target action parameters, and the target voice parameters;

According to the target scene, the initial intelligent image is adjusted to obtain a dynamic intelligent image.
The method according to claim 5, wherein adjusting the initial intelligent image according to the target scene to obtain a dynamic intelligent image includes:

Determine at least one of color information and emotional information according to the target scene;

According to at least one of the color information and the emotion information, the initial intelligent image is adjusted to obtain a dynamic intelligent image.
The method according to claim 1, wherein determining the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target scenario includes:

According to the user's selection operation, the target scene is determined from the optional scenes;

When it is determined that the target scene is a metaverse scene, it is determined that the target image parameters are metaverse image parameters, the target action parameters are action parameters associated with the metaverse scene, and the target voice parameters are voice parameters associated with the metaverse scene;

Correspondingly, based on the dynamic intelligent image, interacting with the user includes:

When it is determined that the user is in a metaverse scene, it is determined that the working mode of the dynamic intelligent image is the follow mode;

Control the dynamic intelligent image to perform voice and/or action interaction with the user in the follow mode.
An intelligent voice interaction device, including:

A target working mode determination module configured to determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle;

A parameter determination module configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on at least one of the target working mode and the target scenario;

A dynamic intelligent image generation module configured to generate a dynamic intelligent image based on the target image parameters, the target action parameters and the target voice parameters;

The interaction module is configured to interact with the user based on the dynamic intelligent image.
An electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

The memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor, so that the at least one processor can execute the intelligent voice interaction method described in any one of claims 1-7.
A computer-readable storage medium storing computer instructions, wherein the computer instructions are used to implement the intelligent voice interaction method described in any one of claims 1-7 when executed by a processor.