WO2024022027A1 - Intelligent voice interaction method and apparatus, device, and storage medium - Google Patents

Intelligent voice interaction method and apparatus, device, and storage medium

Info

Publication number
WO2024022027A1
WO2024022027A1 PCT/CN2023/104740 CN2023104740W WO2024022027A1 WO 2024022027 A1 WO2024022027 A1 WO 2024022027A1 CN 2023104740 W CN2023104740 W CN 2023104740W WO 2024022027 A1 WO2024022027 A1 WO 2024022027A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
parameters
image
voice
intelligent
Prior art date
Application number
PCT/CN2023/104740
Other languages
French (fr)
Chinese (zh)
Inventor
赵默涵
孙雪迪
郑红丽
郑琦
芦聪
祝威
Original Assignee
中国第一汽车股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国第一汽车股份有限公司
Publication of WO2024022027A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • This application relates to the field of intelligent vehicles, for example, to an intelligent voice interaction method, device, equipment and storage medium.
  • voice intelligence is gradually applied to all aspects of life.
  • the voice intelligent interaction methods used in autonomous vehicles on the market are too simple and cannot meet the personalized needs of users. Therefore, how to meet users' personalized needs for voice intelligent interaction and improve users' vehicle experience is a problem that needs to be solved.
  • This application provides an intelligent voice interaction method, device, equipment and storage medium, which can improve the interaction between users and vehicles, meet users' personalized needs for voice intelligent interaction methods, and improve users' vehicle use experience.
  • an intelligent voice interaction method including:
  • an intelligent voice interaction device which device includes:
  • a target working mode determination module configured to determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle;
  • a parameter determination module configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario;
  • a dynamic intelligent image generation module configured to generate a dynamic intelligent image based on the target image parameters, the target action parameters and the target voice parameters;
  • an interaction module configured to interact with the user based on the dynamic intelligent image.
  • an electronic device including:
  • the memory stores a computer program that can be executed by the at least one processor, and the computer program is executed by the at least one processor, so that the at least one processor can execute the intelligent voice interaction method described in any embodiment of the present application.
  • a computer-readable storage medium stores computer instructions, and the computer instructions, when executed by a processor, are used to implement the intelligent voice interaction method of any embodiment of the present application.
  • Figure 1 is a flow chart of an intelligent voice interaction method provided by an embodiment
  • Figure 2 is a flow chart of an intelligent voice interaction method provided by another embodiment
  • Figure 3 is a flow chart of an intelligent voice interaction method provided by another embodiment
  • Figure 4 is a schematic structural diagram of an intelligent voice interaction device according to an embodiment
  • FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment.
  • FIG 1 is a flow chart of an intelligent voice interaction method provided in Embodiment 1 of the present application.
  • This embodiment can be applied to situations where intelligent voice interaction is performed with a user through an intelligent voice assistant.
  • the method can be performed by an intelligent voice interaction device, which can be implemented in the form of hardware and/or software.
  • the intelligent voice interaction device can be configured in an electronic device, such as an intelligent voice service system of the electronic device.
  • the intelligent voice service system includes: sensors, image acquisition equipment and an intelligent voice assistant. The intelligent voice assistant can help users solve problems through intelligent interaction in the form of intelligent conversation and instant question answering.
  • the method includes:
  • the sensor in the vehicle is the input device of the vehicle's computer system. It converts various working conditions information during vehicle operation, such as vehicle speed, temperature of various media, engine operating conditions, etc., into electrical signals and transmits them to the computer.
  • the target working mode refers to the current working mode of the vehicle determined by the vehicle based on the sensing data.
  • the working modes of the vehicle include welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.
  • the welcome mode refers to a mode that welcomes passengers to ride or drive the vehicle when the sensor detects that a passenger has entered the vehicle cabin.
  • Chat mode is a mode that enables voice communication with users on the vehicle through the intelligent voice assistant installed on the vehicle.
  • Autonomous driving mode is the mode in which the vehicle turns on its autonomous driving function.
  • the target scene mode refers to the mode corresponding to the scene at the vehicle's destination, or to the scene where the user is located as determined by the user's selection.
  • the scene where the user is located can be a virtual scene or a real scene.
  • the health monitoring mode refers to the mode used to perform health testing on users so that users can understand their own health status. Different vehicle working modes correspond to different dynamic intelligent images.
  • the sensing data transmitted by the vehicle sensors is obtained in real time and analyzed. According to the analysis results, the usage of the vehicle cockpit and the instruction information issued by the user are determined. The target working mode of the vehicle is then determined according to the usage of the vehicle cockpit and/or the instruction information issued by the user.
  • the instruction information issued by the user may be one or more of voice information, text information, click information, and gesture information.
  • the target working mode of the vehicle can be determined based on seat usage, face detection results and instruction information. Specifically, the target working mode of the vehicle can be determined based on the sensing data detected by the sensors in the vehicle through the following sub-steps:
  • the seat sensor can obtain the signal changes of the seat in the vehicle and determine whether there is a user in the vehicle cockpit.
  • the signal changes of the seat are determined based on the sensing data transmitted by the seat sensor, and whether a user has entered the vehicle cockpit is determined based on the signal changes of the seat. If it is determined that a user has entered the vehicle cockpit, it is determined that the vehicle cockpit is in use, and the user's image information is further collected through the image collection device in the vehicle so that face detection can be performed on the user based on that image information.
  • S1102. Determine the target working mode of the vehicle based on the face detection results and the instruction information issued by the user.
  • the face detection results include the user's identity information and age information. Different optional working modes can be provided for users of different age groups, and the target working mode can be determined from the optional working modes.
  • for example, for minor users the available working modes are the welcome mode and the health detection mode; the autonomous driving mode cannot be provided to minor users.
  • the method of determining the target working mode of the vehicle can be: verify the user's identity information based on the face detection results; obtain the instruction information issued by the user if the identity information is verified; determine the target working mode from the set working modes based on the instruction information.
  • the set working mode refers to the preset working mode of the vehicle that can be provided to the user.
  • the set working modes can include: welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.
  • the identity information of the user with the vehicle use authority can be stored in the intelligent voice service system in advance, and the identity information includes the facial feature information of the user with the vehicle use authority.
  • the face detection results are compared with the facial feature information of the users with vehicle use authority. Based on the comparison results, it is verified whether the current user in the vehicle cockpit is a user with vehicle use authority. If so, the user currently in the vehicle cockpit has passed identity verification.
  • the instruction information sent by the user is obtained, the target working mode expected by the user is determined based on the instruction information sent by the user, and the target working mode of the vehicle is determined among the optional working modes.
  • an early warning message will be issued.
  • the user can be provided with the target working mode corresponding to the identity information according to the user's identity information. Or it can be determined based on the user's identity information whether the user has the permission to obtain the target working mode. This avoids information leakage and ensures vehicle safety.
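  • The sub-steps of S110 described above can be illustrated with a minimal sketch. It is not the applicant's implementation: the mode identifiers, the `FaceDetectionResult` type, the age threshold of 18 and the use of `PermissionError` to stand in for the early warning message are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Mode names follow the working modes listed in the text; the identifiers are hypothetical.
ALL_MODES = {"welcome", "chat", "autonomous_driving", "target_scene", "health_monitoring"}
# Per the text, minors are offered welcome and health modes but not autonomous driving.
MINOR_MODES = {"welcome", "health_monitoring"}
ADULT_AGE = 18  # assumed threshold; the source does not state one


@dataclass
class FaceDetectionResult:
    user_id: str  # identity derived from facial features
    age: int      # estimated age


def determine_target_mode(
    seat_occupied: bool,
    face: Optional[FaceDetectionResult],
    authorized_ids: Set[str],
    requested_mode: str,
) -> Optional[str]:
    """Return the target working mode, or None if no mode can be provided."""
    if not seat_occupied or face is None:
        return None  # cockpit not in use, nothing to decide
    if face.user_id not in authorized_ids:
        # Assumption: failed identity verification triggers the early warning.
        raise PermissionError("user has no vehicle use authority")
    optional_modes = ALL_MODES if face.age >= ADULT_AGE else MINOR_MODES
    # The user's instruction only takes effect if it names an optional mode.
    return requested_mode if requested_mode in optional_modes else None
```

  • In this sketch an adult user with vehicle use authority who requests `"autonomous_driving"` receives that mode, while the same request from a minor yields `None`.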
  • S120 Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.
  • the target scene refers to the scene at the vehicle's destination or a metaverse scene.
  • the metaverse scene is a virtual scene constructed through virtual reality technology.
  • the target image parameters refer to the parameter information used to construct the external image of the intelligent voice assistant, such as color information, size information, expression information, etc.;
  • the target action parameters refer to the parameter information used to construct the action to be performed by the intelligent voice assistant;
  • the target voice parameters refer to the parameter information used to generate voice information to be emitted by the intelligent voice assistant.
  • the external image of the intelligent voice assistant constructed through the target image parameters includes: doctor image, diving image, pilot image, astronaut image and default image.
  • the actions of the intelligent voice assistant constructed through the target action parameters include: display actions for displaying information on the display or displaying its own image, listening actions when the user is chatting, viewing actions when the user is operating the vehicle, questioning actions when the user's voice message is not recognized, greeting actions to welcome the user, scene adaptive actions corresponding to the target scene, and standby actions in standby mode.
  • scene adaptive actions include playing musical instruments, etc.
  • standby actions include: reading actions, dancing actions, weightless floating actions, etc.
  • if the target working mode is the welcome mode, the target image parameters are the default image parameters, the target action parameters include the welcome appearance action parameters and the display action parameters, and the target voice parameters include the welcome voice parameters and the vehicle start voice parameters.
  • the target image parameters are determined to be the default image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the default image.
  • the target action parameters are the welcome appearance parameters and display parameters.
  • the welcome appearance parameters can construct the appearance action and appearance time of the intelligent voice assistant.
  • the appearance action can be a waving gesture, and the appearance time can be three seconds.
  • the display parameters can construct the display actions made by the intelligent voice assistant when introducing itself. The display actions can be turning in circles and/or spreading the arms.
  • the welcome voice parameter in the target voice parameter can generate the welcome voice that the intelligent voice assistant emits when welcoming the user, and can also generate the prompt voice of the intelligent voice assistant prompting the user to start the vehicle based on the vehicle start voice parameter.
  • the welcome voice can be: "The flower path has never been swept for guests, and the door is now open for you. Hello passengers, I am your smart voice assistant, my name is Xiaoqi, and I am very happy to welcome you on the next trip. We are here to serve you."
  • the prompt voice can be: "Now you click on the startup option, and we can set off.”
  • if the target working mode is the health detection mode, the target image parameters are the doctor image parameters, the target action parameters include the detection process prompt action parameters, and the target voice parameters include the detection process prompt voice parameters.
  • the target image parameters are determined to be the doctor image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the doctor image.
  • the target action parameter is the detection process prompt action parameter, and the detection process prompt action parameter can construct the detection prompt action of the intelligent voice assistant during the detection process.
  • the detection prompt action may be the action of picking up and hanging up the stethoscope.
  • the detection process prompt voice parameter in the target voice parameter can generate a health detection prompt voice issued by the intelligent voice assistant to the user when performing health detection on the user.
  • the health check prompt voice can be: "Please look at the camera position in front, and I will conduct daily health checks for you.”
  • the health test prompt voice can be: "The physical examination results will be sent to you at the end.”
  • the health check prompt voice can be: "Health check is in progress, please look at the camera.”
  • the obtained parameters corresponding to the intelligent voice assistant can be made to conform to the current working mode of the vehicle and/or the target scene, thereby constructing a dynamic intelligent image that meets user needs and is easy for users to understand.
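  • The correspondence between working mode and assistant parameters described for S120 can be sketched as a lookup table. The `AssistantParams` type and the string identifiers below are hypothetical; only the welcome and health detection entries are populated, because those are the two cases the text spells out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AssistantParams:
    image: str                # target image parameters (external image)
    actions: Tuple[str, ...]  # target action parameters
    voices: Tuple[str, ...]   # target voice parameters


# Entries mirror the two modes detailed in the text; names are illustrative only.
MODE_PARAMS = {
    "welcome": AssistantParams(
        image="default",
        actions=("welcome_appearance", "display"),
        voices=("welcome", "vehicle_start"),
    ),
    "health_detection": AssistantParams(
        image="doctor",
        actions=("detection_process_prompt",),
        voices=("detection_process_prompt",),
    ),
}


def determine_params(target_mode: str) -> AssistantParams:
    """Look up image, action and voice parameters for a target working mode."""
    try:
        return MODE_PARAMS[target_mode]
    except KeyError:
        raise ValueError(f"no parameters configured for mode {target_mode!r}") from None
```

  • A table keeps the mode-to-parameter correspondence in one place, so supporting a further mode under this sketch only requires a new entry.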
  • Dynamic intelligent image refers to an intelligent model image of the intelligent voice assistant that can make actions and speak.
  • information interaction with the user can be carried out through the intelligent voice assistant based on the dynamic intelligent image.
  • the voice of the easter egg can be: "I am sending you a postcard from the sky, please accept it quickly.”
  • the dynamic intelligent image will disappear and issue an offline prompt voice.
  • the offline prompt voice can be: "See you next time.”
  • the above scheme determines the parameter information for constructing a dynamic intelligent image according to the vehicle's target working mode and/or target scene, and can adjust the dynamic intelligent image in real time according to the actual situation of the vehicle, solving the problem that voice intelligent interaction methods on autonomous vehicles are too simple and cannot meet the personalized needs of users.
  • the technical solution provided by this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters; and interacts with the user based on the dynamic intelligent image.
  • the above solution can provide users with a dynamic intelligent image of a personalized intelligent voice assistant based on the target scene and/or the working mode of the vehicle, which improves the intelligent performance of the vehicle and provides users with more convenience when using the vehicle. At the same time, users can understand the current working mode and target scene of the vehicle in real time based on the dynamic intelligent image, which improves the user's vehicle experience.
  • Figure 2 is a flow chart of an intelligent voice interaction method provided in Embodiment 2 of the present application. This embodiment is explained on the basis of the above embodiment and provides an implementation for generating a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters. For example, as shown in Figure 2, the method includes:
  • S220 Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.
  • the initial intelligent image refers to an intelligent image that needs to be further adjusted based on the multi-type parameters corresponding to the working mode and/or the target scene.
  • adjustment parameters for dynamic intelligent image adjustment can be assigned to multiple target scenes according to the characteristics of the target scene, and the target scenes and adjustment parameters are correspondingly stored in the intelligent voice service system. After the target scene is determined, the adjustment parameters are determined according to the target scene, and then the initial intelligent image is adjusted based on the adjustment parameters, and the adjusted initial intelligent image is used as the dynamic intelligent image.
  • a dynamic intelligent image can be obtained according to the following sub-steps:
  • Color information refers to the shape and color composition of the constructed dynamic intelligent image.
  • Emotional information refers to the external emotions of dynamic intelligent images. Emotional information can include: happy, sad, active, shy, etc.
  • different color parameters and emotional parameters can be set for the dynamic intelligent images corresponding to different target scenes according to the scene characteristics of each target scene. After the target scene of the current vehicle is obtained, the color parameters and emotional parameters corresponding to the target scene are determined; the color information of the dynamic intelligent image is determined according to the color parameters, and the emotional information of the dynamic intelligent image is determined according to the emotional parameters.
  • the initial smart image is adjusted.
  • the expression of the dynamic smart image can be adjusted based on the emotional information to obtain a dynamic smart image that matches the scene characteristics of the target scene.
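  • The adjustment of the initial intelligent image by target scene can be sketched as follows. The `SmartImage` type, the scene names and the concrete color and emotion values are assumptions chosen for illustration; the source only states that each target scene has corresponding color and emotional parameters stored in advance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SmartImage:
    appearance: str               # external image, e.g. "default" or "doctor"
    color: str = "neutral"        # color information
    expression: str = "neutral"   # expression driven by emotional information


# Hypothetical per-scene adjustment parameters stored ahead of time.
SCENE_ADJUSTMENTS = {
    "ocean": {"color": "blue", "emotion": "happy"},
    "forest": {"color": "green", "emotion": "active"},
}


def adjust_image(initial: SmartImage, target_scene: str) -> SmartImage:
    """Return the dynamic intelligent image obtained by adjusting the initial one."""
    adjustment = SCENE_ADJUSTMENTS.get(target_scene)
    if adjustment is None:
        return initial  # no stored parameters: keep the initial image unchanged
    return replace(
        initial,
        color=adjustment["color"],
        expression=adjustment["emotion"],
    )
```

  • Because `SmartImage` is frozen, the initial image is left untouched and the adjusted copy is what would be shown as the dynamic intelligent image.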
  • S250. Interact with the user based on the dynamic intelligent image.
  • the technical solution of this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates an initial intelligent image based on the target image parameters, target action parameters and target voice parameters; adjusts the initial intelligent image according to the target scene to obtain a dynamic intelligent image; and interacts with the user based on the dynamic intelligent image.
  • the above solution can provide the user with the initial intelligent image of the intelligent voice assistant according to the target working mode of the vehicle, adjust the initial intelligent image in real time according to the target scene, and obtain a dynamic intelligent image, so that the user can obtain a dynamic intelligent image in real time based on the distinguishing characteristics of the dynamic intelligent image and the initial intelligent image.
  • Figure 3 is a flow chart of an intelligent voice interaction method provided in Embodiment 3 of the present application. This embodiment is explained on the basis of the above embodiment and provides a method for determining the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target scene. As shown in Figure 3, the method includes:
  • S310 Determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.
  • S320 Determine the target scene from the optional scenes according to the user's selection operation.
  • the optional scenarios refer to optional scenarios that are pre-stored in the intelligent voice service system.
  • Optional scenes can be virtual scenes or real scenes.
  • the virtual scene can be a metaverse scene; the real scene includes the ocean scene corresponding to the aquarium, the sky scene corresponding to the aerospace museum, and the forest scene corresponding to the natural scenic spot. It can be understood that real scenes can be set according to actual circumstances and are not limited to the scenes listed above.
  • the information selected by the user's selection operation may be the destination of the vehicle, or it may be a scene pre-stored in the intelligent voice service system. If the user's selection operation selects the destination where the vehicle is traveling, the target scene is determined based on the destination. If the information selected by the user's selection operation is a scene pre-stored in the intelligent voice service system, the scene selected by the user is used as the target scene.
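  • Step S320 can be sketched as a two-way lookup: a selection is either a pre-stored scene, which is used directly, or a destination, which is translated into a scene. The tables below reuse the scene examples given in the text; the function name and the normalization of the input string are assumptions.

```python
from typing import Optional

# Scenes pre-stored in the intelligent voice service system (examples from the text).
STORED_SCENES = {"metaverse", "ocean", "sky", "forest"}

# Destination-to-scene correspondence, using the destinations named in the text.
DESTINATION_SCENES = {
    "aquarium": "ocean",
    "aerospace museum": "sky",
    "natural scenic spot": "forest",
}


def determine_target_scene(selection: str) -> Optional[str]:
    """Resolve the user's selection to a target scene, or None if unknown."""
    key = selection.strip().lower()
    if key in STORED_SCENES:
        return key  # the user picked a pre-stored scene directly
    # Otherwise treat the selection as a destination and map it to a scene.
    return DESTINATION_SCENES.get(key)
```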
  • if the target scene is a metaverse scene, the target image parameters are the metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene.
  • the metaverse image parameters refer to parameters that can construct the metaverse image of the intelligent voice assistant.
  • the metaverse image can be an image of the intelligent voice assistant built by the user according to actual needs, or it can be an image of the intelligent voice assistant selected by the user from the optional images pre-stored in the intelligent voice service system.
  • if the target scene is a metaverse scene, a dynamic intelligent image corresponding to the metaverse scene is constructed.
  • the working modes of dynamic intelligent images can include fixed mode and following mode.
  • the dynamic smart image in fixed mode can only move within the display displaying the dynamic smart image; the dynamic smart image in follow mode can follow the user at all times when he is in the metaverse scene.
  • the working mode of the dynamic intelligent image is determined to be the follow mode.
  • S360 Control the dynamic intelligent image to interact with the user through voice and/or movement in follow mode.
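  • The difference between the fixed mode and the follow mode can be sketched in terms of where the dynamic intelligent image is placed. The coordinate model, the function names and the clamping rule are assumptions; the source only states that a fixed-mode image moves within the display and a follow-mode image follows the user in the metaverse scene.

```python
from enum import Enum
from typing import Tuple

Point = Tuple[float, float]


class ImageMode(Enum):
    FIXED = "fixed"    # image stays inside the display showing it
    FOLLOW = "follow"  # image follows the user in the metaverse scene


def select_image_mode(user_in_metaverse: bool) -> ImageMode:
    """Follow mode is chosen when the user is in the metaverse scene."""
    return ImageMode.FOLLOW if user_in_metaverse else ImageMode.FIXED


def next_position(mode: ImageMode, requested: Point, user_position: Point,
                  display_size: Point) -> Point:
    """Place the image for the next frame according to its working mode."""
    if mode is ImageMode.FOLLOW:
        return user_position  # track the user wherever they are
    width, height = display_size
    x, y = requested
    # Fixed mode: clamp the requested position to the display bounds.
    return (min(max(x, 0), width), min(max(y, 0), height))
```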
  • the intelligent voice assistant can accompany the user in follow mode with a dynamic intelligent image, and accompany the user to experience games in the metaverse scene.
  • intelligent voice assistants can operate in virtual gyms in metaverse scenes through dynamic intelligent images.
  • the intelligent voice assistant can also compete with users in the metaverse scene through dynamic intelligent images.
  • the intelligent voice assistant can send a farewell voice message through a dynamic intelligent image.
  • the farewell voice message can be "Now take off the VR (Virtual Reality) glasses and look outside." After the user closes the virtual device, the external image of the intelligent voice assistant switches from the metaverse image to the default image.
  • the technical solution of this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target scene from the optional scenes according to the user's selection operation; if the target scene is a metaverse scene, determines that the target image parameters are the metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene; determines the working mode of the dynamic intelligent image as the follow mode; and controls the dynamic intelligent image to interact with the user through voice and/or movement in the follow mode.
  • non-driving users in the vehicle can enter the virtual scene through virtual reality technology and interact with the intelligent voice assistant in the virtual scene through voice and movement, which better meets the demand for personalized vehicle intelligent voice assistants and improves the comfort and fun of users riding in vehicles.
  • if the target scene is an ocean scene corresponding to an aquarium, the target external image corresponding to the target scene is a diving image; if the target scene is a sky scene corresponding to an aerospace museum, the target external image corresponding to the target scene is an astronaut image; if the target scene is a forest scene corresponding to a natural scenic spot, the target external image corresponding to the target scene is the default image, and a piano key will appear on the display, and the user can use it on the display.
  • the display screen may be a display screen of a user terminal or a display screen of a vehicle terminal.
  • FIG 4 is a schematic structural diagram of an intelligent voice interaction device provided in Embodiment 4 of the present application. This embodiment is applicable to situations where intelligent voice interaction is performed with the user through an intelligent voice assistant.
  • the intelligent voice interaction device includes: a target working mode determination module 410, a parameter determination module 420, a dynamic intelligent image generation module 430 and an interaction module 440.
  • the target working mode determination module 410 is configured to determine the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle.
  • the parameter determination module 420 is configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scenario.
  • the dynamic intelligent image generation module 430 is configured to generate a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters.
  • the interaction module 440 is configured to interact with the user based on the dynamic intelligent image.
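  • The cooperation of modules 410 to 440 can be sketched as a pipeline of four pluggable callables. The class name, the callable signatures and the stub behaviors in the usage part (including the `"standby"` mode) are hypothetical and serve only to show the order in which the modules hand data to each other.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceInteractionDevice:
    determine_mode: Callable[[Dict], str]                   # module 410
    determine_params: Callable[[str, Optional[str]], Dict]  # module 420
    generate_image: Callable[[Dict], Dict]                  # module 430
    interact: Callable[[Dict], str]                         # module 440

    def run(self, sensing_data: Dict, target_scene: Optional[str] = None) -> str:
        """Chain the four modules in the order given in the description."""
        mode = self.determine_mode(sensing_data)
        params = self.determine_params(mode, target_scene)
        image = self.generate_image(params)
        return self.interact(image)


# Usage with stub modules; every behavior below is a placeholder.
device = VoiceInteractionDevice(
    determine_mode=lambda data: "welcome" if data.get("seat_occupied") else "standby",
    determine_params=lambda mode, scene: {"image": "default", "voice": mode},
    generate_image=lambda params: dict(params, dynamic=True),
    interact=lambda image: f"{image['image']} image plays {image['voice']} voice",
)
result = device.run({"seat_occupied": True})
```

  • Keeping each module behind a callable mirrors the "hardware and/or software" wording of the text: any stage could be swapped for another implementation without changing `run`.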
  • the technical solution provided by this embodiment determines the target working mode of the vehicle based on the sensing data detected by the sensors in the vehicle; determines the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant based on the target working mode and/or target scene; generates a dynamic intelligent image based on the target image parameters, target action parameters and target voice parameters; and interacts with the user based on the dynamic intelligent image.
  • the above solution can provide users with a dynamic intelligent image of a personalized intelligent voice assistant based on the target scene and/or the working mode of the vehicle, improves the intelligent performance of the vehicle, and provides users with more convenience when using the vehicle. At the same time, users can understand the current working mode and/or target scene of the vehicle in real time based on the dynamic intelligent image, which improves the user's vehicle experience.
  • the target working mode determination module 410 includes: a face detection unit, configured to detect the face of the user on the seat when it is determined that the seat is in use according to the sensing data of the seat sensor in the vehicle, and obtain the face detection result;
  • the working mode determination unit is configured to determine the target working mode of the vehicle based on the face detection results and the instruction information issued by the user.
  • the working mode determination unit is configured to: verify the user's identity information based on the face detection results; obtain the instruction information issued by the user when the identity information is verified; and determine the target working mode from the set working modes based on the instruction information.
  • the parameter determination module 420 is configured to: when it is determined that the target working mode is the welcome mode, determine that the target image parameters are the default image parameters, the target action parameters include the welcome appearance action parameters and the display action parameters, and the target voice parameters include the welcome voice parameters and the vehicle start voice parameters; when it is determined that the target working mode is the health detection mode, determine that the target image parameters are the doctor image parameters, the target action parameters include the detection process prompt action parameters, and the target voice parameters include the detection process prompt voice parameters.
  • the dynamic intelligent image generation module 430 includes: an initial intelligent image generation unit, configured to generate an initial intelligent image based on target image parameters, target action parameters, and target voice parameters;
  • the dynamic intelligent image determination unit is set to adjust the initial intelligent image according to the target scene to obtain a dynamic intelligent image.
  • the dynamic intelligent image determination unit is configured to: determine color information and/or emotional information according to the target scene; adjust the initial intelligent image according to the color information and/or emotional information to obtain a dynamic intelligent image.
  • the parameter determination module 420 is configured to: determine the target scene from the optional scenes according to the user's selection operation; when it is determined that the target scene is a metaverse scene, determine that the target image parameters are metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene.
  • the interaction module 440 is configured to: when it is determined that the user is in the metaverse scene, determine the working mode of the dynamic intelligent image to be the follow mode; control the dynamic intelligent image to perform voice and/or action interaction with the user in the follow mode.
  • the intelligent voice interaction device provided in this embodiment can be applied to the intelligent voice interaction method provided in any of the above embodiments, and has corresponding functions and effects.
  • FIG. 5 shows a schematic structural diagram of an electronic device 10 that can be used to implement embodiments of the present application.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementation of the present application as described and/or claimed herein.
  • the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12, a random access memory (RAM) 13 and so on, wherein the memory stores a computer program that can be executed by the at least one processor.
  • the processor 11 can perform a variety of appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13.
  • various programs and data required for the operation of the electronic device 10 can also be stored.
  • the processor 11, the ROM 12 and the RAM 13 are connected to each other via the bus 14.
  • An input/output (I/O) interface 15 is also connected to the bus 14.
  • Multiple components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard or a mouse; an output unit 17, such as various types of displays and speakers; a storage unit 18, such as a magnetic disk or an optical disk; and a communication unit 19, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a variety of dedicated artificial intelligence (AI) computing chips, a variety of processors running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the processor 11 executes the methods and processes described above, such as the intelligent voice interaction method.
  • the intelligent voice interaction method may be implemented as a computer program, which is tangibly included in a computer-readable storage medium, such as the storage unit 18.
  • part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19.
  • the processor 11 may be configured to perform the intelligent voice interaction method in any other suitable manner (e.g., by means of firmware).
  • Field Programmable Gate Array (FPGA)
  • Application Specific Integrated Circuit (ASIC)
  • Application Specific Standard Parts (ASSP)
  • System on Chip (SOC)
  • Complex Programmable Logic Device (CPLD)
  • The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable intelligent voice interaction device, so that when the computer program is executed by the processor, the functions/operations specified in the flowchart and/or block diagram are implemented.
  • a computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a computer-readable storage medium may be a tangible medium that may contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device.
  • Computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing.
  • Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the systems and techniques described herein may be implemented on an electronic device having a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball).
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), blockchain network, and the Internet.
  • Computing systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, also known as a cloud computing server or cloud host. It is a host product in the cloud computing service system that addresses the high management difficulty and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses an intelligent voice interaction method and apparatus, a device, and a storage medium. The method comprises: determining a target working mode of a vehicle according to sensing data detected by a sensor in the vehicle; determining a target image parameter, a target action parameter, and a target voice parameter of an intelligent voice assistant according to the target working mode and/or a target scenario; generating a dynamic intelligent image according to the target image parameter, the target action parameter, and the target voice parameter; and interacting with a user on the basis of the dynamic intelligent image.

Description

Intelligent voice interaction method, device, equipment and storage medium
This application claims priority to Chinese patent application No. 202210883395.9, filed with the China Patent Office on July 26, 2022, the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of intelligent vehicles, for example, to an intelligent voice interaction method, device, equipment and storage medium.
Background
With the development of artificial intelligence, voice intelligence is gradually being applied to all aspects of life. The intelligent voice interaction methods used in autonomous vehicles on the market are too uniform and cannot meet users' personalized needs. Therefore, how to meet users' personalized needs for intelligent voice interaction and improve their vehicle experience is a problem that needs to be solved.
Summary
This application provides an intelligent voice interaction method, device, equipment and storage medium, which can improve the interactivity between users and vehicles, meet users' personalized needs for intelligent voice interaction, and improve users' vehicle experience.
According to one aspect of this application, an intelligent voice interaction method is provided, including:
Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scene;
Generate a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters;
Interact with the user based on the dynamic intelligent image.
According to another aspect of the present application, an intelligent voice interaction device is provided, which includes:
A target working mode determination module, configured to determine the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle;
A parameter determination module, configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scene;
A dynamic intelligent image generation module, configured to generate a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters;
An interaction module, configured to interact with the user based on the dynamic intelligent image.
According to another aspect of the present application, an electronic device is provided, the electronic device including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the intelligent voice interaction method described in any embodiment of the present application.
According to another aspect of the present application, a computer-readable storage medium is provided. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the intelligent voice interaction method described in any embodiment of the present application.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Figure 1 is a flow chart of an intelligent voice interaction method provided by an embodiment;
Figure 2 is a flow chart of an intelligent voice interaction method provided by another embodiment;
Figure 3 is a flow chart of an intelligent voice interaction method provided by another embodiment;
Figure 4 is a schematic structural diagram of an intelligent voice interaction device provided by an embodiment;
Figure 5 is a schematic structural diagram of an electronic device provided by an embodiment.
Detailed description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The terms "current", "target", etc. in the description and claims of this application and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. Furthermore, the terms "including" and "etc." and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
Embodiment 1
Figure 1 is a flow chart of an intelligent voice interaction method provided in Embodiment 1 of the present application. This embodiment is applicable to situations in which an intelligent voice assistant conducts intelligent voice interaction with a user. The method can be performed by an intelligent voice interaction device, which can be implemented in hardware and/or software and can be configured in an electronic device, for example in the intelligent voice service system of the electronic device. The intelligent voice service system includes sensors, an image acquisition device and an intelligent voice assistant. The intelligent voice assistant can help users solve problems through intelligent dialogue and instant question answering.
As shown in Figure 1, the method includes:
S110. Determine the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle.
The sensors in the vehicle are input devices of the vehicle's computer system. They convert information about various operating conditions of the vehicle, such as vehicle speed, the temperature of various media, and engine operating conditions, into electrical signals and transmit them to the computer. The target working mode is the current working mode of the vehicle, determined by the vehicle according to the sensing data. The working modes of the vehicle include welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.
The welcome mode is a mode that welcomes passengers to ride in or drive the vehicle when a sensor detects that a passenger has entered the vehicle cockpit. The chat mode is a mode in which the intelligent voice assistant installed in the vehicle communicates by voice with the users in the vehicle. The autonomous driving mode is the mode in which the vehicle's autonomous driving function is turned on. The target scene mode is the mode corresponding to the vehicle's destination scene or to the scene where the user is located, as determined by the user's selection; the scene where the user is located can be a virtual scene or a real scene. The health monitoring mode is a mode used to perform health detection on the user so that the user understands his or her own health status. Different vehicle working modes correspond to different dynamic intelligent images.
In this embodiment, the sensor data transmitted by the vehicle sensors is obtained in real time and analyzed. The usage of the vehicle cockpit and the instruction information issued by the user are determined from the analysis results, and the target working mode of the vehicle is determined according to the usage of the vehicle cockpit and/or the instruction information issued by the user. The instruction information issued by the user may be one or more of voice information, text information, click information and gesture information.
For example, the target working mode of the vehicle can be determined according to seat usage, the face detection result and the instruction information. Specifically, determining the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle can be implemented through the following sub-steps:
S1101. If it is determined from the sensing data of the seat sensor in the vehicle that the seat is in use, perform face detection on the user in the seat to obtain a face detection result.
The seat sensor obtains the signal changes of the seats in the vehicle in order to determine whether a user is present in the vehicle cockpit.
In this embodiment, the signal changes of the seat are determined from the sensing data transmitted by the seat sensor, and whether a user has entered the vehicle cockpit is determined from those signal changes. If it is determined that a user has entered the vehicle cockpit, the cockpit is determined to be in use, and the user's image information is then collected by the image acquisition device in the vehicle so that face detection can be performed on the user based on that image information.
S1102. Determine the target working mode of the vehicle according to the face detection result and the instruction information issued by the user.
The face detection result includes the user's identity information and age information. Different optional working modes can be provided for users of different age groups, and the target working mode is determined from the optional working modes.
For example, if the user is determined to be a minor according to the age information in the face detection result, the optional working modes are the welcome mode and the health detection mode, and the autonomous driving mode cannot be offered to the minor user.
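The age-based restriction described above can be sketched as follows. This is a minimal illustration only: the `WorkingMode` enum, the `ADULT_AGE` threshold of 18 and both function names are assumptions introduced here, since the text does not state how age groups are delimited or how the restriction is implemented.

```python
from enum import Enum
from typing import FrozenSet, Optional


class WorkingMode(Enum):
    """Working modes named in the description (identifiers are hypothetical)."""
    WELCOME = "welcome"
    CHAT = "chat"
    AUTONOMOUS_DRIVING = "autonomous_driving"
    TARGET_SCENE = "target_scene"
    HEALTH = "health"  # the health monitoring / health detection mode


ADULT_AGE = 18  # assumption: the text only says "minor", no threshold is given


def optional_modes(age: int) -> FrozenSet[WorkingMode]:
    """Return the working modes that may be offered to a user of this age."""
    if age < ADULT_AGE:
        # Minors are offered only the welcome and health detection modes.
        return frozenset({WorkingMode.WELCOME, WorkingMode.HEALTH})
    return frozenset(WorkingMode)


def select_target_mode(age: int, requested: WorkingMode) -> Optional[WorkingMode]:
    """Pick the requested mode only if it is among the optional modes."""
    return requested if requested in optional_modes(age) else None
```

With these assumptions, a request for the autonomous driving mode from a 12-year-old user yields `None`, while the same request from an adult is accepted.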
The user's identity information is obtained from the face detection result, the optional working modes corresponding to that identity information are determined, and the target working mode of the vehicle is then determined from the optional working modes according to the instruction information issued by the user.
The target working mode of the vehicle can be determined as follows: verify the user's identity information according to the face detection result; if the identity information is verified, obtain the instruction information issued by the user; and determine the target working mode from the set working modes according to the instruction information.
The set working modes are preset vehicle working modes that can be offered to the user. They can include: welcome mode, chat mode, autonomous driving mode, target scene mode and health monitoring mode.
The identity information of users who are authorized to use the vehicle can be stored in the intelligent voice service system in advance; this identity information includes the facial feature information of those users. After the face detection result of the user currently in the vehicle cockpit is obtained, it is compared with the stored facial feature information of the authorized users, and the comparison result is used to verify whether the user currently in the cockpit is authorized to use the vehicle. If so, the verification of that user passes.
If the user's identity information is verified, the instruction information issued by the user is obtained, the target working mode that the user expects is determined according to that instruction information, and the target working mode of the vehicle is determined from the optional working modes.
Optionally, if the user currently in the vehicle cockpit is not authorized to use the vehicle, warning information is issued.
It can be understood that, by determining the target working mode of the vehicle through the above steps, the user can be offered the target working mode corresponding to the user's identity information, or the identity information can be used to determine whether the user has permission to obtain the target working mode. This avoids information leakage and ensures the safety of the vehicle.
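Steps S1101 and S1102 can be summarized in a short sketch. All names below (`FaceResult`, `determine_target_mode`, the mode strings and the warning text) are hypothetical, and face matching is reduced to a pre-resolved user identifier; the text does not specify how facial features are compared or what happens when no valid instruction is given, so the sketch returns no mode in that case.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Preset ("set") working modes listed in the description; strings are hypothetical.
SET_MODES = ("welcome", "chat", "autonomous_driving", "target_scene", "health_monitoring")


@dataclass
class FaceResult:
    """Outcome of face detection; user_id is None when no stored user matched."""
    user_id: Optional[str]


def determine_target_mode(
    seat_occupied: bool,
    face: Optional[FaceResult],
    authorized_ids: Set[str],
    instruction: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (target_mode, warning) following S1101 and S1102."""
    # S1101: face detection is only performed when the seat sensor reports use.
    if not seat_occupied:
        return None, None
    # S1102: verify identity against the pre-stored authorized users.
    if face is None or face.user_id not in authorized_ids:
        return None, "warning: user in cockpit is not authorized"
    # Only an instruction naming one of the set modes selects a target mode.
    if instruction in SET_MODES:
        return instruction, None
    return None, None  # assumption: no fallback mode is specified in the text
```

The function returns a pair so that the optional warning for an unauthorized user can be handled separately from mode selection.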
S120. Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or target scene.
The target scene refers to the scene of the vehicle's destination or a metaverse scene. A metaverse scene is a virtual scene constructed using virtual reality technology. The target image parameters are the parameter information used to construct the external image of the intelligent voice assistant, such as color information, size information and expression information; the target action parameters are the parameter information used to construct the actions to be performed by the intelligent voice assistant; the target voice parameters are the parameter information used to generate the voice information to be uttered by the intelligent voice assistant.
The correspondence between the target working mode and the target image parameters, target action parameters and target voice parameters is stored in the intelligent voice service system; likewise, the correspondence between the target scene and the target image parameters, target action parameters and target voice parameters is stored in the intelligent voice service system. The external images of the intelligent voice assistant that can be constructed from the target image parameters include: a doctor image, a diving image, a pilot image, an astronaut image and a default image.
The actions of the intelligent voice assistant that can be constructed from the target action parameters include: a display action for presenting information on the display screen or showing its own image, a listening action while the user is talking, a watching action while the user operates the vehicle, a questioning action when the user's voice information cannot be recognized, a welcome action to greet the user, scene-adaptive actions corresponding to the target scene, and standby actions in standby mode. Further, scene-adaptive actions include playing a musical instrument and the like; standby actions include reading, dancing, weightless floating and the like.
示例性的,若目标工作模式为迎宾模式,则目标形象参数为默认形象参数,目标动作参数包括迎宾出场动作参数和展示动作参数,目标语音参数包括迎宾语音参数和提示车辆启动语音参数。For example, if the target working mode is the welcome mode, the target image parameters are the default image parameters, the target action parameters include the welcome appearance action parameters and the display action parameters, and the target voice parameters include the welcome voice parameters and the vehicle start voice parameters. .
若确定目标工作模式为迎宾模式,则确定目标形象参数为默认形象参数,根据目标形象参数构建出的智能语音助手的外在形象为默认形象。目标动作参数为迎宾出场参数和展示参数,迎宾出场参数可以构建出智能语音助手的出场动作和出场所用时间。例如,出场动作可以是挥手示好的动作,出场所用时间可以是三秒。展示参数可以构建出智能语音助手在自我介绍时做出的展示动作,展示动作可以是转圈动作和/或两臂张开的动作。目标语音参数中的迎宾语音参数可以生成智能语音助手在对用户表示欢迎时所发出的迎宾语音,还可以根据提示车辆启动语音参数生成智能语音助手提示用户启动车辆的提示语音。If it is determined that the target working mode is the welcome mode, then the target image parameters are determined to be the default image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the default image. The target action parameters are the welcome appearance parameters and display parameters. The welcome appearance parameters can construct the appearance action and appearance time of the intelligent voice assistant. For example, the exiting action can be a waving gesture, and the exiting time can be three seconds. The display parameters can construct the display actions made by the intelligent voice assistant when introducing itself. The display actions can be turning in circles and/or spreading the arms. The welcome voice parameter in the target voice parameter can generate the welcome voice that the intelligent voice assistant emits when welcoming the user, and can also generate the prompt voice of the intelligent voice assistant prompting the user to start the vehicle based on the vehicle start voice parameter.
例如,迎宾语音可以是:“花径不曾缘客扫,蓬门今始为君开。乘客您好,我是您的智能语音助手,我的名字是小七,很高兴在接下来的行程中为您服务。”提示语音可以是:“现在您点击启动选项,我们就可以启程啦”。For example, the welcome voice can be: "The flower path has never been swept by customers, and the door is now open for you. Hello passengers, I am your smart voice assistant, my name is Xiaoqi, I am very happy to welcome you on the next trip. We are here to serve you." The prompt voice can be: "Now you click on the startup option, and we can set off."
若目标工作模式为健康检测模式,则目标形象参数为医生形象参数,目标动作参数包括检测过程提示动作参数,目标语音参数包括检测过程提示语音参数。If the target working mode is a health detection mode, the target image parameters are doctor image parameters, the target action parameters include detection process prompt action parameters, and the target voice parameters include detection process prompt voice parameters.
若确定目标工作模式为健康检测模式,则确定目标形象参数为医生形象参数,根据目标形象参数构建出的智能语音助手的外在形象为医生形象。目标动作参数为检测过程提示动作参数,检测过程提示动作参数可以构建出智能语音助手在检测过程中的检测提示动作。例如,检测提示动作可以是摘挂听诊器的动作。目标语音参数中的检测过程提示语音参数可以生成智能语音助手在对用户进行健康检测时向用户发出的健康检测提示语音。If it is determined that the target working mode is the health detection mode, then the target image parameters are determined to be the doctor image parameters, and the external image of the intelligent voice assistant constructed based on the target image parameters is the doctor image. The target action parameter is the detection process prompt action parameter, and the detection process prompt action parameter can construct the detection prompt action of the intelligent voice assistant during the detection process. For example, the detection prompt action may be the action of picking up and hanging up the stethoscope. The detection process prompt voice parameter in the target voice parameter can generate a health detection prompt voice issued by the intelligent voice assistant to the user when performing health detection on the user.
例如,健康检测提示语音可以是:“请注视前方摄像头位置,我来为你进行日常健康检查”。健康检测结束时,健康检测提示语音可以是:“体检结果在结束时发给您”。可选的,若健康检测过程中用户未注视摄像头,则健康检测提示语音可以是:“健康检测中,请注视摄像头”。For example, the health check prompt voice can be: "Please look at the camera position in front, and I will conduct daily health checks for you." At the end of the health test, the health test prompt voice can be: "The physical examination results will be sent to you at the end." Optionally, if the user does not look at the camera during the health check, the health check prompt voice can be: "Health check is in progress, please look at the camera."
It can be understood that, by determining the parameters of the intelligent voice assistant according to the target working mode and/or the target scene, the parameters obtained match the vehicle's current working mode and/or target scene, so that the resulting dynamic intelligent image meets user needs and is easy for the user to understand.
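The mode-to-parameter mapping described above can be sketched as a simple lookup. This is a minimal illustration, not the applicant's implementation: the names `WorkingMode`, `AssistantParams`, `MODE_PARAMS` and `determine_params`, and the string labels standing in for real parameter sets, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class WorkingMode(Enum):
    WELCOME = "welcome"
    HEALTH_DETECTION = "health_detection"


@dataclass(frozen=True)
class AssistantParams:
    image: str                  # target image parameters
    actions: Tuple[str, ...]    # target action parameters
    voices: Tuple[str, ...]     # target voice parameters


# Hypothetical table: each label is a placeholder for a real parameter set.
MODE_PARAMS = {
    WorkingMode.WELCOME: AssistantParams(
        image="default",
        actions=("welcome_entrance", "display"),
        voices=("welcome", "vehicle_start_prompt"),
    ),
    WorkingMode.HEALTH_DETECTION: AssistantParams(
        image="doctor",
        actions=("detection_prompt",),
        voices=("detection_prompt",),
    ),
}


def determine_params(mode: WorkingMode) -> AssistantParams:
    """Return the image, action and voice parameters for a working mode."""
    return MODE_PARAMS[mode]
```

A table keeps the pairing between mode and parameters in one place, so adding a further working mode would only require a new entry.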
S130. Generate a dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters.
The dynamic intelligent image is an intelligent model image of the intelligent voice assistant that can perform actions and produce speech.
The target external image of the intelligent voice assistant is determined according to the target image parameters; the target action to be displayed is determined according to the target action parameters; and the target voice information to be uttered is determined according to the target voice parameters. The dynamic intelligent image is then generated from the target external image, the target action and the target voice information.
S140. Interact with the user based on the dynamic intelligent image.
After the dynamic intelligent image corresponding to the target working mode and/or the target scene is obtained, the intelligent voice assistant can exchange information with the user based on the dynamic intelligent image.
Optionally, an Easter egg may be set up in the interaction: if the user triggers the condition for sending an Easter egg during the interaction, the intelligent voice assistant utters an Easter egg voice through the dynamic intelligent image and sends the user an electronic greeting card. The Easter egg voice may be: "Here is a sky postcard for you, please accept it."
For example, when the user exits the intelligent voice service system or turns off the intelligent voice assistant, the dynamic intelligent image performs a disappearing action and issues an offline prompt voice, which may be: "See you next time."
The above scheme determines the parameter information for constructing the dynamic intelligent image according to the vehicle's target working mode and/or target scene, so the dynamic intelligent image can be adjusted dynamically in real time according to the actual situation of the vehicle. This solves the problem that voice interaction on autonomous vehicles is too limited in form to meet users' personalized needs.
In the technical solution provided by this embodiment, the target working mode of the vehicle is determined according to the sensing data detected by the sensors in the vehicle; the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant are determined according to the target working mode and/or the target scene; the dynamic intelligent image is generated according to the target image parameters, target action parameters and target voice parameters; and interaction with the user is carried out based on the dynamic intelligent image. This solution can provide the user with a personalized dynamic intelligent image of the intelligent voice assistant according to the target scene and/or the working mode of the vehicle, which improves the intelligence of the vehicle and offers the user more convenience while using it. At the same time, the user can learn the vehicle's current working mode and target scene in real time from the dynamic intelligent image, which improves the user's experience of the vehicle.
Embodiment 2
Figure 2 is a flowchart of an intelligent voice interaction method provided in Embodiment 2 of the present application. This embodiment builds on the above embodiment and gives an implementation of generating the dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters. As shown in Figure 2, the method includes:
S210. Determine the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle.
S220. Determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or the target scene.
S230. Generate an initial intelligent image according to the target image parameters, the target action parameters and the target voice parameters.
The initial intelligent image is an intelligent image that is generated from the multiple types of parameters corresponding to the working mode and/or the target scene and that still requires further adjustment.
S240. Adjust the initial intelligent image according to the target scene to obtain the dynamic intelligent image.
In this embodiment, adjustment parameters for adjusting the dynamic intelligent image can be assigned to each of multiple target scenes according to the characteristics of the scene, and the target scenes and their adjustment parameters are stored in correspondence in the intelligent voice service system. After the target scene is determined, the adjustment parameters are looked up according to the target scene, the initial intelligent image is adjusted based on them, and the adjusted image is used as the dynamic intelligent image.
For example, the dynamic intelligent image can be obtained through the following sub-steps:
S2401. Determine color information and/or emotion information according to the target scene.
Color information refers to the color composition of the appearance of the dynamic intelligent image. Emotion information refers to the emotion outwardly expressed by the dynamic intelligent image, and may include happy, sad, lively, shy, and so on.
In this embodiment, different color parameters and emotion parameters can be set for the dynamic intelligent images corresponding to different target scenes, according to the characteristics of each scene. After the target scene of the current vehicle is obtained, the color parameters and emotion parameters corresponding to that scene are determined; the color information of the dynamic intelligent image is determined from the color parameters, and its emotion information from the emotion parameters.
S2402. Adjust the initial intelligent image according to the color information and/or the emotion information to obtain the dynamic intelligent image.
The initial intelligent image is adjusted according to the color information. After its color has been adjusted, its expression may additionally be adjusted according to the emotion information, yielding a dynamic intelligent image that matches the characteristics of the target scene.
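Sub-steps S2401 and S2402 can be sketched as a lookup followed by a selective update. This is only an illustration under stated assumptions: the types `IntelligentImage` and `SceneAdjustment`, the function `adjust_image`, and the specific colors and emotions assigned to each scene are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class IntelligentImage:
    appearance: str
    color: str = "neutral"
    emotion: str = "neutral"


@dataclass(frozen=True)
class SceneAdjustment:
    color: Optional[str] = None     # None means "leave the color unchanged"
    emotion: Optional[str] = None   # None means "leave the emotion unchanged"


# Hypothetical scene-to-adjustment table stored by the voice service system.
SCENE_ADJUSTMENTS = {
    "ocean": SceneAdjustment(color="blue", emotion="lively"),
    "forest": SceneAdjustment(color="green"),
}


def adjust_image(initial: IntelligentImage, scene: str) -> IntelligentImage:
    """S2401: look up color/emotion for the scene; S2402: apply them."""
    adjustment = SCENE_ADJUSTMENTS.get(scene)
    if adjustment is None:
        return initial  # unknown scene: keep the initial image as it is
    changes = {}
    if adjustment.color is not None:
        changes["color"] = adjustment.color
    if adjustment.emotion is not None:
        changes["emotion"] = adjustment.emotion
    return replace(initial, **changes)
```

Treating color and emotion as independent optional fields mirrors the "and/or" wording of the sub-steps: a scene may supply either, both, or neither.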
S250. Interact with the user based on the dynamic intelligent image.
In the technical solution of this embodiment, the target working mode of the vehicle is determined according to the sensing data detected by the sensors in the vehicle; the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant are determined according to the target working mode and/or the target scene; an initial intelligent image is generated according to the target image parameters, target action parameters and target voice parameters; the initial intelligent image is adjusted according to the target scene to obtain the dynamic intelligent image; and interaction with the user is carried out based on the dynamic intelligent image. This solution provides the user with an initial intelligent image of the intelligent voice assistant according to the vehicle's target working mode and adjusts it in real time according to the target scene to obtain the dynamic intelligent image. From the features that distinguish the dynamic intelligent image from the initial intelligent image, the user can obtain scene information about the vehicle's target scene in real time, which meets the user's need to keep track of the vehicle's driving scene.
Embodiment 3
Figure 3 is a flowchart of an intelligent voice interaction method provided in Embodiment 3 of the present application. This embodiment builds on the above embodiments and gives an implementation of determining the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target scene. As shown in Figure 3, the method includes:
S310. Determine the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle.
S320. Determine the target scene from the optional scenes according to the user's selection operation.
In this embodiment, the optional scenes are selectable scenes pre-stored in the intelligent voice service system. An optional scene may be a virtual scene or a real scene. The virtual scene may be a metaverse scene; real scenes include an ocean scene corresponding to an aquarium, a sky scene corresponding to an aerospace museum, and a forest scene corresponding to a natural scenic spot. It can be understood that the real scenes can be set according to actual conditions and are not limited to those listed above.
For example, the information chosen by the user's selection operation may be the vehicle's destination or a scene pre-stored in the intelligent voice service system. If the user selects the vehicle's destination, the target scene is determined according to the destination. If the user selects a scene pre-stored in the intelligent voice service system, the selected scene is used as the target scene.
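The two selection paths in S320 can be sketched as follows. The scene names and destination pairings are taken from the examples above; the identifiers `STORED_SCENES`, `DESTINATION_SCENES` and `determine_target_scene`, and the choice to return `None` for an unrecognized selection, are assumptions made for illustration.

```python
from typing import Optional

# Scenes pre-stored in the intelligent voice service system (examples from the text).
STORED_SCENES = {"metaverse", "ocean", "sky", "forest"}

# Destination-to-scene pairings given as examples in the text.
DESTINATION_SCENES = {
    "aquarium": "ocean",
    "aerospace museum": "sky",
    "natural scenic spot": "forest",
}


def determine_target_scene(selection: str) -> Optional[str]:
    """Resolve a user selection (a stored scene or a destination) to a target scene."""
    key = selection.strip().lower()
    if key in STORED_SCENES:
        return key                       # the user picked a stored scene directly
    return DESTINATION_SCENES.get(key)   # otherwise treat the selection as a destination
```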
S330. If the target scene is a metaverse scene, the target image parameters are metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene.
In this embodiment, the metaverse image parameters are parameters from which the metaverse image of the intelligent voice assistant can be constructed. The metaverse image may be an image of the intelligent voice assistant that the user builds according to actual needs, or one that the user selects from optional images pre-stored in the intelligent voice service system.
If the target scene is a metaverse scene, the metaverse image parameters corresponding to the metaverse scene, the action parameters associated with the metaverse scene, and the voice parameters associated with the metaverse scene are obtained.
S340. Generate the dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters.
The dynamic intelligent image corresponding to the metaverse scene is constructed from the metaverse image parameters, the action parameters associated with the metaverse scene, and the voice parameters associated with the metaverse scene.
S350. When it is determined that the user is in the metaverse scene, determine that the working mode of the dynamic intelligent image is the follow mode.
The working modes of the dynamic intelligent image may include a fixed mode and a follow mode. In fixed mode, the dynamic intelligent image can move only within the display on which it is shown; in follow mode, the dynamic intelligent image can follow the user at all times while the user is in the metaverse scene.
Once it is determined that the user has entered the metaverse scene, the working mode of the dynamic intelligent image is determined to be the follow mode.
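The choice between the two working modes of the dynamic intelligent image can be expressed as a small function. This sketch assumes that fixed mode applies whenever the user is not in the metaverse scene, which the text implies but does not state; `ImageMode` and `select_image_mode` are hypothetical names.

```python
from enum import Enum


class ImageMode(Enum):
    FIXED = "fixed"     # the image moves only within its display
    FOLLOW = "follow"   # the image follows the user in the metaverse scene


def select_image_mode(user_in_metaverse: bool) -> ImageMode:
    """Pick follow mode inside the metaverse scene; assume fixed mode otherwise."""
    return ImageMode.FOLLOW if user_in_metaverse else ImageMode.FIXED
```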
S360. Control the dynamic intelligent image to interact with the user by voice and/or action in follow mode.
When the user enters the metaverse scene through a virtual reality device, the intelligent voice assistant, as the dynamic intelligent image in follow mode, can stay at the user's side and accompany the user in games in the metaverse scene. For example, through the dynamic intelligent image, the assistant can act as the user's training partner in a virtual gym in the metaverse scene, or compete with the user in the metaverse scene. When the user chooses to turn off the virtual device, the assistant can issue a farewell voice message through the dynamic intelligent image, such as "Now take off your VR (Virtual Reality) glasses and have a look outside." After the user turns off the virtual device, the external image of the intelligent voice assistant switches from the metaverse image to the default image.
In the technical solution of this embodiment, the target working mode of the vehicle is determined according to the sensing data detected by the sensors in the vehicle; the target scene is determined from the optional scenes according to the user's selection operation; if the target scene is a metaverse scene, the target image parameters are metaverse image parameters, the target action parameters are the action parameters associated with the metaverse scene, and the target voice parameters are the voice parameters associated with the metaverse scene; when it is determined that the user is in the metaverse scene, the working mode of the dynamic intelligent image is determined to be the follow mode; and the dynamic intelligent image is controlled to interact with the user by voice and/or action in follow mode. With this solution, a non-driving occupant of the vehicle can enter a virtual scene through virtual reality technology and interact with the intelligent voice assistant there by voice and action. This better meets users' personalized needs for the vehicle's intelligent voice assistant and makes riding in the vehicle more comfortable and enjoyable.
For example, on the basis of this embodiment, if the target scene is the ocean scene corresponding to an aquarium, the target external image corresponding to the target scene is a diver image; if the target scene is the sky scene corresponding to an aerospace museum, the target external image is an astronaut image; and if the target scene is the forest scene corresponding to a natural scenic spot, the target external image is the default image. In the forest scene, piano keys appear on the display screen and the user can play them on the screen, at which point the voice message issued by the intelligent voice assistant may be: "You can play the piano on the smart surface and feel the charm of the sound of the elements." The display screen may be that of a user terminal or that of the vehicle.
Embodiment 4
Figure 4 is a schematic structural diagram of an intelligent voice interaction apparatus provided in Embodiment 4 of the present application. This embodiment is applicable to intelligent voice interaction with a user through an intelligent voice assistant. As shown in Figure 4, the intelligent voice interaction apparatus includes a target working mode determination module 410, a parameter determination module 420, a dynamic intelligent image generation module 430 and an interaction module 440.
The target working mode determination module 410 is configured to determine the target working mode of the vehicle according to the sensing data detected by the sensors in the vehicle.
The parameter determination module 420 is configured to determine the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant according to the target working mode and/or the target scene.
The dynamic intelligent image generation module 430 is configured to generate the dynamic intelligent image according to the target image parameters, the target action parameters and the target voice parameters.
The interaction module 440 is configured to interact with the user based on the dynamic intelligent image.
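The way the four modules hand results to one another can be sketched as a small pipeline. This is an assumed structure for illustration only: the class `VoiceInteractionApparatus`, its field names, and the callable signatures are hypothetical, and each module is represented by an injected function rather than a real implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class VoiceInteractionApparatus:
    determine_mode: Callable[[Dict[str, Any]], str]                    # module 410
    determine_params: Callable[[str, Optional[str]], Dict[str, Any]]   # module 420
    generate_image: Callable[[Dict[str, Any]], Any]                    # module 430
    interact: Callable[[Any], str]                                     # module 440

    def run(self, sensor_data: Dict[str, Any],
            target_scene: Optional[str] = None) -> str:
        """Chain the modules: sensing data -> mode -> parameters -> image -> interaction."""
        mode = self.determine_mode(sensor_data)
        params = self.determine_params(mode, target_scene)
        image = self.generate_image(params)
        return self.interact(image)
```

Injecting each module as a callable keeps the sketch independent of how any single module is realized.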
In the technical solution provided by this embodiment, the target working mode of the vehicle is determined according to the sensing data detected by the sensors in the vehicle; the target image parameters, target action parameters and target voice parameters of the intelligent voice assistant are determined according to the target working mode and/or the target scene; the dynamic intelligent image is generated according to the target image parameters, target action parameters and target voice parameters; and interaction with the user is carried out based on the dynamic intelligent image. This solution can provide the user with a personalized dynamic intelligent image of the intelligent voice assistant according to the target scene and/or the working mode of the vehicle, which improves the intelligence of the vehicle and offers the user more convenience while using it. At the same time, the user can learn the vehicle's current working mode and/or target scene in real time from the dynamic intelligent image, which improves the user's experience of the vehicle.
The target working mode determination module 410 includes: a face detection unit, configured to perform face detection on the user in a seat and obtain a face detection result when it is determined, from the sensing data of a seat sensor in the vehicle, that the seat is in use; and a working mode determination unit, configured to determine the target working mode of the vehicle according to the face detection result and the instruction information issued by the user.
For example, the working mode determination unit is configured to: verify the user's identity information according to the face detection result; obtain the instruction information issued by the user if the identity information passes verification; and determine the target working mode from the set working modes according to the instruction information.
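The sequence performed by the face detection unit and the working mode determination unit can be sketched as a chain of guards. This is a hedged illustration: `determine_target_mode`, the `SET_MODES` tuple, the representation of a face detection result as an identifier string, and the use of `None` when no mode can be determined are all assumptions, not details from the application.

```python
from typing import Callable, Optional, Set

# Hypothetical identifiers for the set working modes.
SET_MODES = ("welcome", "health_detection")


def determine_target_mode(
    seat_occupied: bool,
    detect_face: Callable[[], Optional[str]],
    authorized_ids: Set[str],
    get_instruction: Callable[[], str],
) -> Optional[str]:
    """Seat in use -> face detection -> identity check -> instruction -> mode."""
    if not seat_occupied:
        return None  # no face detection unless the seat sensor reports use
    face_id = detect_face()
    if face_id is None or face_id not in authorized_ids:
        return None  # identity verification failed
    instruction = get_instruction()  # only requested after verification passes
    return instruction if instruction in SET_MODES else None
```

The instruction is read only after verification succeeds, which matches the order given for the working mode determination unit.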
For example, the parameter determination module 420 is configured to: when the target working mode is determined to be the welcome mode, determine that the target image parameters are default image parameters, that the target action parameters include welcome entrance action parameters and display action parameters, and that the target voice parameters include welcome voice parameters and vehicle-start prompt voice parameters; and when the target working mode is determined to be the health detection mode, determine that the target image parameters are doctor image parameters, that the target action parameters include detection-process prompt action parameters, and that the target voice parameters include detection-process prompt voice parameters.
For example, the dynamic intelligent image generation module 430 includes: an initial intelligent image generation unit, configured to generate an initial intelligent image according to the target image parameters, the target action parameters and the target voice parameters; and a dynamic intelligent image determination unit, configured to adjust the initial intelligent image according to the target scene to obtain the dynamic intelligent image.
For example, the dynamic intelligent image determination unit is configured to: determine color information and/or emotion information according to the target scene; and adjust the initial intelligent image according to the color information and/or the emotion information to obtain the dynamic intelligent image.
For example, the parameter determination module 420 is configured to: determine the target scene from the optional scenes according to the user's selection operation; and, when the target scene is determined to be a metaverse scene, determine that the target image parameters are metaverse image parameters, that the target action parameters are the action parameters associated with the metaverse scene, and that the target voice parameters are the voice parameters associated with the metaverse scene.
Correspondingly, the interaction module 440 is configured to: determine that the working mode of the dynamic intelligent image is the follow mode when it is determined that the user is in the metaverse scene; and control the dynamic intelligent image to interact with the user by voice and/or action in follow mode.
The intelligent voice interaction apparatus provided in this embodiment is applicable to the intelligent voice interaction method provided in any of the above embodiments, and has the corresponding functions and effects.
Embodiment 5
Figure 5 shows a schematic structural diagram of an electronic device 10 that can be used to implement embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches) and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Figure 5, the electronic device 10 includes at least one processor 11 and memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor. The processor 11 can perform various appropriate actions and processing according to a computer program stored in the ROM 12 or a computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to one another via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Multiple components of the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard or a mouse; an output unit 17, such as various types of displays and speakers; a storage unit 18, such as a magnetic disk or an optical disc; and a communication unit 19, such as a network card, a modem or a wireless communication transceiver. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller or microcontroller. The processor 11 performs the methods and processing described above, such as the intelligent voice interaction method.
In some embodiments, the intelligent voice interaction method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the intelligent voice interaction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the intelligent voice interaction method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), application-specific standard parts (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
Computer programs for implementing the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable intelligent voice interaction apparatus, such that the computer programs, when executed by the processor, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of this application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by, or in connection with, an instruction execution system, apparatus, or device. Computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the electronic device. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
A computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and virtual private servers (VPS).
It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this application can be achieved; no limitation is imposed here.
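The point about step ordering can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the image, action, and voice parameter lookups do not depend on each other; the function names `image_params`, `action_params`, and `voice_params` and the returned strings are hypothetical and do not come from the application.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical, independent parameter lookups (names are illustrative only).
def image_params(mode: str) -> str:
    return f"{mode}-image"


def action_params(mode: str) -> str:
    return f"{mode}-action"


def voice_params(mode: str) -> str:
    return f"{mode}-voice"


STEPS = (image_params, action_params, voice_params)


def run_sequential(mode: str) -> list:
    """Run the three lookups one after another."""
    return [step(mode) for step in STEPS]


def run_parallel(mode: str) -> list:
    """Run the same lookups concurrently; results are collected in submission order."""
    with ThreadPoolExecutor(max_workers=len(STEPS)) as pool:
        futures = [pool.submit(step, mode) for step in STEPS]
        return [future.result() for future in futures]


# Either execution order yields the same parameters.
assert run_sequential("welcome") == run_parallel("welcome")
```

Because the lookups share no state in this sketch, the execution order does not affect the result, which is the condition the paragraph above places on reordering.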
The specific embodiments described above do not limit the scope of protection of this application.

Claims (10)

  1. An intelligent voice interaction method, comprising:
    determining a target working mode of a vehicle according to sensing data detected by a sensor in the vehicle;
    determining target image parameters, target action parameters, and target voice parameters of an intelligent voice assistant according to at least one of the target working mode and a target scene;
    generating a dynamic intelligent image according to the target image parameters, the target action parameters, and the target voice parameters; and
    interacting with a user based on the dynamic intelligent image.
  2. The method according to claim 1, wherein determining the target working mode of the vehicle according to the sensing data detected by the sensor in the vehicle comprises:
    in a case where it is determined, according to sensing data of a seat sensor in the vehicle, that a seat is in use, performing face detection on the user on the seat to obtain a face detection result; and
    determining the target working mode of the vehicle according to the face detection result and instruction information issued by the user.
  3. The method according to claim 2, wherein determining the target working mode of the vehicle according to the face detection result and the instruction information issued by the user comprises:
    verifying identity information of the user according to the face detection result;
    in a case where the identity information is verified successfully, obtaining the instruction information issued by the user; and
    determining the target working mode from set working modes according to the instruction information.
  4. The method according to claim 1, wherein determining the target image parameters, the target action parameters, and the target voice parameters of the intelligent voice assistant according to the target working mode comprises:
    in a case where the target working mode is determined to be a welcome mode, determining that the target image parameters are default image parameters, that the target action parameters include welcome entrance action parameters and display action parameters, and that the target voice parameters include welcome voice parameters and vehicle start prompt voice parameters; and
    in a case where the target working mode is determined to be a health detection mode, determining that the target image parameters are doctor image parameters, that the target action parameters include detection process prompt action parameters, and that the target voice parameters include detection process prompt voice parameters.
  5. The method according to claim 1, wherein generating the dynamic intelligent image according to the target image parameters, the target action parameters, and the target voice parameters comprises:
    generating an initial intelligent image according to the target image parameters, the target action parameters, and the target voice parameters; and
    adjusting the initial intelligent image according to the target scene to obtain the dynamic intelligent image.
  6. The method according to claim 5, wherein adjusting the initial intelligent image according to the target scene to obtain the dynamic intelligent image comprises:
    determining at least one of color information and emotion information according to the target scene; and
    adjusting the initial intelligent image according to the at least one of the color information and the emotion information to obtain the dynamic intelligent image.
  7. The method according to claim 1, wherein determining the target image parameters, the target action parameters, and the target voice parameters of the intelligent voice assistant according to the target scene comprises:
    determining the target scene from selectable scenes according to a selection operation of the user; and
    in a case where the target scene is determined to be a metaverse scene, determining that the target image parameters are metaverse image parameters, that the target action parameters are action parameters associated with the metaverse scene, and that the target voice parameters are voice parameters associated with the metaverse scene;
    and correspondingly, interacting with the user based on the dynamic intelligent image comprises:
    in a case where it is determined that the user is in the metaverse scene, determining that the working mode of the dynamic intelligent image is a follow mode; and
    controlling the dynamic intelligent image, in the follow mode, to perform voice and/or action interaction with the user.
  8. An intelligent voice interaction apparatus, comprising:
    a target working mode determination module, configured to determine a target working mode of a vehicle according to sensing data detected by a sensor in the vehicle;
    a parameter determination module, configured to determine target image parameters, target action parameters, and target voice parameters of an intelligent voice assistant according to at least one of the target working mode and a target scene;
    a dynamic intelligent image generation module, configured to generate a dynamic intelligent image according to the target image parameters, the target action parameters, and the target voice parameters; and
    an interaction module, configured to interact with a user based on the dynamic intelligent image.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the intelligent voice interaction method according to any one of claims 1-7.
  10. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the intelligent voice interaction method according to any one of claims 1-7.
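
As a reading aid, the flow recited in claims 1 to 6 can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the mode keys, the `SCENE_STYLE` table, the scene names `night` and `holiday`, and every function and field name below are hypothetical, and the boolean `identity_verified` stands in for the face-based identity check of claim 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssistantParams:
    image: str
    actions: tuple
    voices: tuple


# Hypothetical lookup table; the two modes mirror those named in claim 4.
MODE_PARAMS = {
    "welcome": AssistantParams(
        image="default",
        actions=("welcome_entrance", "display"),
        voices=("welcome", "vehicle_start_prompt"),
    ),
    "health_check": AssistantParams(
        image="doctor",
        actions=("check_prompt",),
        voices=("check_prompt",),
    ),
}

# Hypothetical scene adjustments (claim 6): color and/or emotion information.
SCENE_STYLE = {
    "night": {"color": "dark_blue", "emotion": "calm"},
    "holiday": {"color": "red"},
}


def determine_mode(seat_occupied: bool, identity_verified: bool,
                   instruction: str) -> Optional[str]:
    """Claims 2-3: seat in use, identity verified, then instruction selects a mode."""
    if not (seat_occupied and identity_verified):
        return None
    return instruction if instruction in MODE_PARAMS else None


def generate_dynamic_image(params: AssistantParams,
                           scene: Optional[str] = None) -> dict:
    """Claims 5-6: build an initial image, then adjust it for the target scene."""
    image = {
        "image": params.image,
        "actions": list(params.actions),
        "voices": list(params.voices),
    }
    image.update(SCENE_STYLE.get(scene, {}))  # no change for unknown or absent scenes
    return image


mode = determine_mode(seat_occupied=True, identity_verified=True,
                      instruction="health_check")
avatar = generate_dynamic_image(MODE_PARAMS[mode], scene="night")
print(avatar)
```

A table-driven lookup is used here only because claim 4 maps each working mode to a fixed set of parameters; the claims do not prescribe any particular data structure.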
PCT/CN2023/104740 2022-07-26 2023-06-30 Intelligent voice interaction method and apparatus, device, and storage medium WO2024022027A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210883395.9A CN115273865A (en) 2022-07-26 2022-07-26 Intelligent voice interaction method, device, equipment and storage medium
CN202210883395.9 2022-07-26

Publications (1)

Publication Number Publication Date
WO2024022027A1 true WO2024022027A1 (en) 2024-02-01

Family

ID=83768855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104740 WO2024022027A1 (en) 2022-07-26 2023-06-30 Intelligent voice interaction method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115273865A (en)
WO (1) WO2024022027A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273865A (en) * 2022-07-26 2022-11-01 中国第一汽车股份有限公司 Intelligent voice interaction method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200211553A1 (en) * 2018-12-28 2020-07-02 Harman International Industries, Incorporated Two-way in-vehicle virtual personal assistant
CN112959998A (en) * 2021-03-19 2021-06-15 恒大新能源汽车投资控股集团有限公司 Vehicle-mounted human-computer interaction method and device, vehicle and electronic equipment
CN114327041A (en) * 2021-11-26 2022-04-12 北京百度网讯科技有限公司 Multi-mode interaction method and system for intelligent cabin and intelligent cabin with multi-mode interaction method and system
CN115273865A (en) * 2022-07-26 2022-11-01 中国第一汽车股份有限公司 Intelligent voice interaction method, device, equipment and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110871684A (en) * 2018-09-04 2020-03-10 比亚迪股份有限公司 In-vehicle projection method, device, equipment and storage medium
CN110641476A (en) * 2019-08-16 2020-01-03 广汽蔚来新能源汽车科技有限公司 Interaction method and device based on vehicle-mounted robot, controller and storage medium
CN110929078A (en) * 2019-11-08 2020-03-27 中国第一汽车股份有限公司 Automobile voice image reloading method, device, equipment and storage medium
CN111137296A (en) * 2019-12-24 2020-05-12 吉利汽车研究院(宁波)有限公司 Method and device for adjusting vehicle equipment parameters based on vehicle face recognition
CN111429907B (en) * 2020-03-25 2023-10-20 北京百度网讯科技有限公司 Voice service mode switching method, device, equipment and storage medium
CN111487890A (en) * 2020-05-13 2020-08-04 佛山市科智美家具有限公司 Intelligent seat interaction method and system
CN113910990A (en) * 2020-07-08 2022-01-11 佛吉亚歌乐电子(厦门)有限公司 Method and device for adjusting adjustable device of vehicle
CN112017667B (en) * 2020-09-04 2024-03-15 华人运通(上海)云计算科技有限公司 Voice interaction method, vehicle and computer storage medium
CN112090053A (en) * 2020-09-14 2020-12-18 成都拟合未来科技有限公司 3D interactive fitness training method, device, equipment and medium
CN113112243A (en) * 2021-04-28 2021-07-13 南京交通职业技术学院 Automobile identity recognition device and data processing and communication method
CN113409785A (en) * 2021-06-30 2021-09-17 中国第一汽车股份有限公司 Vehicle-based voice interaction method and device, vehicle and storage medium
CN113641442A (en) * 2021-08-31 2021-11-12 京东方科技集团股份有限公司 Interaction method, electronic device and storage medium
CN114124528B (en) * 2021-09-17 2024-01-23 珠海极海半导体有限公司 Wireless MCU and vehicle configuration system
CN114103873B (en) * 2021-11-18 2023-03-28 华人运通(江苏)技术有限公司 Control method and device for vehicle center console and vehicle
CN114697755A (en) * 2022-03-31 2022-07-01 北京百度网讯科技有限公司 Virtual scene information interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115273865A (en) 2022-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845240

Country of ref document: EP

Kind code of ref document: A1