WO2023019475A1 - Method and apparatus for displaying a virtual personal assistant, device, medium and product - Google Patents

Method and apparatus for displaying a virtual personal assistant, device, medium and product

Info

Publication number: WO2023019475A1
Authority: WIPO (PCT)
Prior art keywords: vpa, vehicle, information, display, interaction instruction
Application number: PCT/CN2021/113299
Other languages: English (en), Chinese (zh)
Inventor: 陈真 (Chen Zhen)
Original assignee: 阿波罗智联(北京)科技有限公司 (Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.)
Priority claimed from: PCT/CN2021/113299

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present disclosure relates to the field of data processing technology, especially to the field of vehicle network technology, and in particular to a display method, apparatus, device, medium and product of a virtual personal assistant.
  • VPA Virtual Personal Assistant
  • the present disclosure provides a display method, device, equipment, medium and product of a virtual personal assistant.
  • a method for displaying a virtual personal assistant including:
  • the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;
  • the VPA display information including the VPA image
  • the display information of the VPA is output through the display screen in the vehicle.
  • the output of the VPA display information through the display screen in the vehicle includes:
  • the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the manner of determining the target interaction instruction includes:
  • the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
  • the acquisition of the display information of the virtual personal assistant VPA that matches the target attribute information includes:
  • the VPA display information with the highest display frequency is selected.
  • the acquisition of the display information of the virtual personal assistant VPA that matches the target attribute information includes:
  • the VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • before identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, the method further includes:
  • the VPA display information also includes a first guide; after the VPA display information is output through the display screen in the vehicle, it also includes:
  • the multimedia information is the information that the vehicle machine of the vehicle displays to passengers through the display screen after the first guide language is output through the display screen in the vehicle;
  • the second guide is output through a display screen in the vehicle.
  • identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction includes:
  • a virtual personal assistant display system including: a server and a vehicle-mounted terminal of a vehicle;
  • the vehicle-mounted terminal is used to obtain a target interaction instruction, determine the sound characteristics represented by the target interaction instruction, and send the sound characteristics to the server, and the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;
  • the server is configured to receive the sound feature represented by the target interaction instruction, identify the identity attribute information of the passenger based on the sound feature represented by the target interaction instruction, and use the identity attribute information of the passenger as the target attribute Information, obtaining the virtual personal assistant VPA display information matched with the target attribute information, the VPA display information including the VPA image, and sending the VPA display information to the vehicle terminal;
  • the vehicle-mounted terminal is further configured to receive the VPA display information, and output the VPA display information through a display screen in the vehicle.
  • the vehicle-mounted terminal is specifically used for:
  • the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the vehicle-mounted terminal is also used for:
  • the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
  • the vehicle-mounted terminal is also used for:
  • the VPA display information with the highest display frequency is selected.
  • the server is specifically used for:
  • the VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • the server is also used for:
  • the VPA display information also includes a first guide
  • the server is further configured to obtain display status information for multimedia information after outputting the VPA display information through the display screen in the vehicle, where the multimedia information is the information that the vehicle machine of the vehicle displays to passengers through the display screen after the first guide is output through the display screen in the vehicle;
  • the server is further configured to determine a second guide that matches the user behavior data of the passenger and send the second guide to the vehicle-mounted terminal if the display state information indicates that the multimedia information display is completed;
  • the vehicle-mounted terminal is further configured to receive the second guide, and output the second guide through a display screen in the vehicle.
  • the server is specifically used for:
  • a virtual personal assistant display device including:
  • An acquisition module configured to acquire the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;
  • An identification module configured to identify the passenger's identity attribute information based on the sound features represented by the target interaction instruction, and use the passenger's identity attribute information as the target attribute information;
  • the obtaining module is also used to obtain the virtual personal assistant VPA display information matching the target attribute information, and the VPA display information includes a VPA image;
  • the output module is used to output the display information of the VPA through the display screen in the vehicle.
  • the output module is specifically used for:
  • the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the device further includes a determination module, the determination module is configured to:
  • the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
  • the acquiring module is specifically used for:
  • the VPA display information with the highest display frequency is selected.
  • the acquiring module is specifically used for:
  • the VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • the device may also include a search module and an execution module;
  • the search module is used to search, before the identity attribute information of the passenger is identified based on the sound features represented by the target interaction instruction, for the user behavior data corresponding to those sound features from the correspondence between each sound feature and user behavior data;
  • the execution module is used to determine the VPA display information matching the found user behavior data if found, and execute the step of outputting the VPA display information through the display screen in the vehicle;
  • the execution module is further configured to execute the step of identifying the identity attribute information of the passenger based on the voice features represented by the target interaction instruction if not found.
  • the VPA presentation information also includes a first guide; the device may also include: a determination module;
  • the acquisition module is also used to acquire display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, where the multimedia information is the information that the vehicle machine of the vehicle displays to passengers through the display screen after the first guide is output;
  • a determining module configured to determine a second guide that matches the user behavior data of the passenger if the display state information indicates that the display of the multimedia information is completed;
  • the output module is further configured to output the second guide language through the display screen in the vehicle.
  • the identification module is specifically used for:
  • an electronic device including:
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method described in the first aspect above.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method described in the first aspect above.
  • a computer program product including a computer program, which implements the method described in the first aspect when executed by a processor.
  • the embodiment of the present disclosure can obtain the sound features represented by the target interaction instruction, identify the passenger's identity attribute information based on those sound features, use the passenger's identity attribute information as the target attribute information, and output, through the display screen in the vehicle, the VPA display information matched with the target attribute information, where the VPA display information includes a VPA image. It can be seen that the embodiment of the present disclosure determines the VPA image displayed for the passenger based on the identity attribute information of the passenger, so different VPA images can be displayed to different passengers.
  • FIG. 1 is a flowchart of a display method of a virtual personal assistant provided by an embodiment of the present disclosure
  • Fig. 2 is a flow chart of another display method of a virtual personal assistant provided by an embodiment of the present disclosure
  • Fig. 3 is a flow chart of a method for outputting a second guide provided by an embodiment of the present disclosure
  • Fig. 4 is a signaling diagram of a presentation process of a virtual personal assistant provided by an embodiment of the present disclosure
  • Fig. 5 is an exemplary schematic diagram of a presentation process of a virtual personal assistant provided by an embodiment of the present disclosure
  • Fig. 6 is a schematic structural diagram of a display system of a virtual personal assistant provided by an embodiment of the present disclosure
  • Fig. 7 is a schematic structural diagram of a virtual personal assistant display device provided by an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of an electronic device used to implement the presentation method of a virtual personal assistant according to an embodiment of the present disclosure.
  • VPA Virtual Personal Assistant
  • in the related art, the VPA image displayed on the vehicle screen is fixed, so the display mode is monotonous and inflexible.
  • the embodiment of the present disclosure provides a method for displaying a virtual personal assistant, which is applied to an electronic device.
  • the electronic device may be a vehicle-mounted terminal in the vehicle, or a server corresponding to the vehicle-mounted terminal, which is reasonable.
  • the vehicle-mounted terminal may be a vehicle-mounted brain, a vehicle super-brain, a driving brain, an in-vehicle infotainment system (In-Vehicle Infotainment, IVI), a display head unit (Display Head Unit, DHU), or any other type of vehicle-mounted terminal, which is not limited in this embodiment of the present disclosure.
  • IVI In-Vehicle Infotainment
  • DHU Display Head Unit
  • the method includes the following steps:
  • the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle.
  • the sending out of the target interaction instruction means that the passenger and the vehicle-mounted terminal start to conduct voice interaction.
  • a predetermined wake-up word spoken by a passenger may be a target interaction instruction.
  • the embodiment of the present disclosure does not limit the specific content of the predetermined wake-up word.
  • the vehicle-mounted terminal can directly collect the target interaction instruction through the sound collection device in the vehicle when executing S101, and extract the sound features represented by the target interaction instruction.
  • the server may receive the voice feature represented by the target interaction instruction sent by the vehicle-mounted terminal when executing S101.
  • after the vehicle-mounted terminal collects the target interaction command, it can extract the sound features represented by the command and send them to the server, so that the server obtains the sound features represented by the target interaction command.
  • the voice features represented by the target interaction instruction may also be called voiceprint features, and the voice features include formants.
  • the present disclosure does not limit the manner of extracting voice features, and any implementation manner capable of extracting voice features from voice instructions can be applied to the present disclosure.
  • the collection, storage, use, processing, transmission, provision, and disclosure of passengers' voice interaction instructions involved are all in compliance with relevant laws and regulations, and do not violate public order and good customs.
  • the passenger's identity attribute information is used to represent the passenger's natural attributes.
  • the identity attribute information may include gender and/or age.
  • the virtual personal assistant image displayed to the passenger is determined by gender and/or age.
  • the electronic device may input the sound features represented by the target interaction instruction into a pre-trained neural network model, and then obtain identity attribute information output by the neural network model.
  • the neural network model is trained through a training set, and the training set includes a plurality of sample sound features and a training label corresponding to each sample sound feature.
  • the training label includes the identity attribute information of the person to whom the sample voice feature belongs.
  • the embodiment of the present disclosure utilizes a neural network model to identify identity attribute information of passengers through sound features, so that the judgment of identity attribute information of passengers is more accurate and robust.
  • the identity attribute information of passengers can also be identified through a specified voice recognition algorithm.
  • the collection, storage, use, processing, transmission, provision, and disclosure of passenger target attribute information and training sets are all in compliance with relevant laws and regulations, and do not violate public order and good customs.
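As a minimal illustration of the recognition step described above, the following Python sketch stands in for the trained neural network with a nearest-neighbour lookup over toy formant-style features. The feature values, labels, and the `identify_attributes` helper are all illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class IdentityAttributes:
    age_group: str  # e.g. "child", "adult", "elderly"
    gender: str     # e.g. "male", "female"


# Toy training set: (formant-style feature vector, identity label) pairs.
TRAINING_SET = [
    ((300.0, 2800.0), IdentityAttributes("child", "male")),
    ((120.0, 2200.0), IdentityAttributes("adult", "male")),
    ((210.0, 2600.0), IdentityAttributes("adult", "female")),
    ((100.0, 2000.0), IdentityAttributes("elderly", "male")),
]


def identify_attributes(voice_feature):
    """Nearest-neighbour stand-in for the pre-trained neural network:
    return the identity attributes of the closest sample feature."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(TRAINING_SET, key=lambda pair: sq_dist(pair[0], voice_feature))
    return label
```

A real system would replace the lookup with model inference, but the interface (voiceprint features in, identity attribute information out) is the same.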
  • VPA display information matching the target attribute information.
  • the VPA display information includes a VPA image.
  • the VPA image is used to represent the appearance of the VPA.
  • the VPA image may include an image composed of the VPA's facial features, figure, hairstyle, and clothing.
  • the VPA display information may also include: the first guide language and/or a text-to-speech (Text To Speech, TTS) voice.
  • TTS Text To Speech
  • the electronic device may display the first guide language while displaying the image of the VPA.
  • the first guide is used to guide passengers to interact with the vehicle terminal.
  • the first guiding language includes: "Try to start the unmanned driving mode" and "Let's watch a cartoon together".
  • the TTS voice is the voice in which the first guide language is played.
  • the TTS voice may be any of an elderly voice, an adult male voice, an adult female voice, a child voice, and the like.
  • for example: when the target attribute information includes age 10 and male, the corresponding VPA display information includes a cartoon image, a child's voice and a child-oriented guide language; when it includes age 20 and male, a male image, an adult male voice and a male-oriented guide language; when it includes age 20 and female, a female image, an adult female voice and a female-oriented guide language; when it includes age 60 and male, an elderly image, an elderly voice and an elderly-oriented guide language.
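The attribute-to-display correspondence in the example above can be sketched as a plain lookup table. The age brackets, dictionary keys, and helper names below are illustrative assumptions; the disclosure only fixes the example pairings themselves.

```python
# Hypothetical correspondence (age group, gender) -> VPA display information.
VPA_DISPLAY_TABLE = {
    ("child", "male"):    {"image": "cartoon", "tts_voice": "child",        "guide": "child"},
    ("adult", "male"):    {"image": "male",    "tts_voice": "adult_male",   "guide": "male"},
    ("adult", "female"):  {"image": "female",  "tts_voice": "adult_female", "guide": "female"},
    ("elderly", "male"):  {"image": "elderly", "tts_voice": "elderly",      "guide": "elderly"},
}


def age_to_group(age):
    """Assumed age bracketing; the disclosure does not fix the cutoffs."""
    if age < 14:
        return "child"
    if age < 60:
        return "adult"
    return "elderly"


def vpa_display_info(age, gender):
    """Look up the VPA display information matching the target attributes."""
    return VPA_DISPLAY_TABLE[(age_to_group(age), gender)]
```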
  • when executing S104, the vehicle-mounted terminal can directly output the VPA display information through the display screen in the vehicle, that is, display the VPA image that matches the passenger.
  • when executing S104, the server can send the VPA display information to the vehicle-mounted terminal, so that after receiving it, the vehicle-mounted terminal outputs the VPA display information through the display screen in the vehicle, that is, displays the VPA image that matches the passenger.
  • if the vehicle has only one display screen, that is, the display screen at the driver's seat, then in one implementation, no matter which seat area the passenger who issued the target interaction instruction is in, the VPA display information is output through the display screen at the driver's seat.
  • the embodiment of the present disclosure can obtain the sound features represented by the target interaction instruction, identify the passenger's identity attribute information based on those sound features, use the passenger's identity attribute information as the target attribute information, and output, through the display screen in the vehicle, the VPA display information matched with the target attribute information, where the VPA display information includes a VPA image. It can be seen that the embodiment of the present disclosure determines the VPA image displayed for the passenger based on the identity attribute information of the passenger, so different VPA images can be displayed to different passengers.
  • since the embodiments of the present disclosure can display different VPA images to different passengers, they can display VPA images more flexibly while meeting the personalized needs of users.
  • the method for determining the target interaction instruction in S101 includes: if multiple voice interaction instructions are collected in the vehicle at the same time, selecting the instruction with the highest priority as the target interaction instruction based on the priority of each voice interaction instruction, where the priority of each voice interaction instruction is determined based on the seat area where the passenger who issued it is located.
  • when multiple voice interaction commands are collected in the vehicle at the same time, the electronic device can select the command with the highest priority as the target interaction command based on the priority of each voice interaction command, and then obtain the sound features of the target interaction command.
  • alternatively, the vehicle-mounted terminal selects the command with the highest priority as the target interaction command based on the priority of each voice interaction command, extracts the voice feature of the target interaction command, and sends it to the server, so that the voice feature acquired by the server is that of the voice interaction command with the highest priority.
  • multiple sound collection devices such as in-vehicle microphones, may be installed in the vehicle where the vehicle-mounted terminal is located. Different sound collection devices can collect the voices of passengers in different seating areas.
  • the priorities of the seat areas are in order from high to low: the main driving area, the co-pilot area, and the rear row area.
  • for example, if the vehicle-mounted terminal simultaneously collects interactive command 1 issued by the passenger in the main driving area and interactive command 2 issued by the passenger in the co-pilot area, then since the priority of interactive command 1 is higher than that of interactive command 2, interactive command 1 is taken as the target interaction instruction.
  • the source area of the voice interaction instruction, that is, the seat area where the passenger who issued the voice interaction instruction is located, can be identified as follows:
  • multiple sound collection devices are installed at different positions in the vehicle, and each seat area in the vehicle is treated as an independent sound zone; by judging the direction from which the sound signal reaches the sound collection devices at the multiple positions, it can be determined from which seat the sound signal was issued, and thus the source area of the voice interaction instruction, that is, the seat area where the passenger who issued it is located.
  • the embodiment of the present disclosure can select the command with the highest priority according to the priority of the voice interaction commands, so as to prioritize the interaction needs of the driver and improve the safety of vehicle driving.
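The priority selection described above can be sketched in a few lines, assuming the main driving > co-pilot > rear ordering given in the text (the seat-area names and the tuple shape are hypothetical):

```python
# Assumed seat-area priorities: main driving area highest, rear lowest.
SEAT_PRIORITY = {"driver": 3, "co_pilot": 2, "rear": 1}


def select_target_instruction(instructions):
    """instructions: list of (seat_area, voice_command) pairs collected
    at the same moment; return the highest-priority pair as the target
    interaction instruction."""
    return max(instructions, key=lambda item: SEAT_PRIORITY[item[0]])
```

With commands arriving simultaneously from the co-pilot and driver areas, the driver's command is selected, matching the safety rationale in the text.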
  • the vehicle where the vehicle-mounted terminal is located may include multiple display screens.
  • the way of outputting the VPA display information through the display screens in the vehicle in S104 above can be implemented as: outputting the VPA display information through the display screens in the designated seat area in the vehicle .
  • the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the electronic device can determine the designated seat area before outputting the VPA display information, and then output the VPA display information on the display screen of the designated seat area in the vehicle.
  • the electronic device can obtain, while obtaining the sound features represented by the target interaction instruction, the area identification of the source area of the target interaction instruction from the vehicle-mounted terminal, that is, the area identification of the designated seat area; after the electronic device obtains the VPA display information, it can send the VPA display information together with the previously obtained area identification to the vehicle-mounted terminal, so that the vehicle-mounted terminal outputs the VPA display information on the display screen of the designated seat area in the vehicle;
  • the electronic device can also only feed back the VPA display information to the vehicle-mounted terminal, and the vehicle-mounted terminal outputs the VPA display information on the display screen in the designated seat area in the vehicle by itself, which is also reasonable.
  • the embodiments of the present disclosure can be applied to multi-screen interaction scenarios, so that for passengers in different seating areas, different VPA display information can be determined according to the identity attribute information of the passengers, so as to improve the use interest and interaction fun of passengers.
  • the embodiments of the present disclosure can be applied not only to single-screen interaction scenarios, but also to multi-screen interaction scenarios, so that the application range of the embodiments of the present disclosure is wider.
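The multi-screen routing step could be sketched as follows, modeling each screen as a simple buffer keyed by seat area. Falling back to the driver's screen when the designated area has no screen is an assumption consistent with the single-screen case described earlier; the names are hypothetical.

```python
def output_vpa_display(screens, designated_area, vpa_info):
    """Route the VPA display information to the screen of the seat area
    where the target interaction instruction was issued; fall back to
    the driver's screen when that area has no screen of its own."""
    screen = screens.get(designated_area, screens["driver"])
    screen.append(vpa_info)
```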
  • before the above S102, the method may further include: S105, searching, from the correspondence between each sound feature and user behavior data, for the user behavior data corresponding to the sound features represented by the target interaction instruction; if found, determining the VPA display information matching the found user behavior data and executing S104; if not found, executing S102.
  • the user behavior data is used to represent the user's historical operation behavior.
  • user behavior data includes: VPA display information set by the user history, videos watched by the user, news watched by the user and/or music played by the user, etc.
  • the electronic device may use the VPA display information historically set by the user as the VPA display information matching the user behavior data.
  • the electronic device determines the preference type of the passenger according to the user behavior data, and then determines the VPA display information corresponding to the preference type.
  • the passenger's user behavior data includes: cartoons and nursery rhymes
  • the passenger's favorite type is animation
  • the VPA image and the first guide language corresponding to the animation are determined.
  • each VPA image corresponds to a preference type
  • each first guide language corresponds to a preference type.
  • the collection, storage, use, processing, transmission, provision, and disclosure of the passenger's user behavior data involved are all in compliance with relevant laws and regulations, and do not violate public order and good customs.
  • the embodiment of the present disclosure can determine the VPA display information according to user behavior data. Since user behavior data reflects the user's interests better than identity attribute information does, when user behavior data has been collected in advance, the VPA display information determined from it can better match the user's interests.
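The behaviour-first lookup with the attribute-based fallback (S105, then S102/S103 only when no record exists) might look like this sketch, where the voiceprint-keyed database and the matcher callables are stand-ins for the components described above:

```python
def resolve_vpa_display(voice_feature, behavior_db,
                        match_by_behavior, match_by_attributes, identify):
    """S105: try the voiceprint -> user-behaviour correspondence first;
    only when no record is found fall back to identifying the passenger's
    identity attributes (S102) and matching on those (S103)."""
    behavior = behavior_db.get(voice_feature)
    if behavior is not None:
        return match_by_behavior(behavior)
    return match_by_attributes(identify(voice_feature))
```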
  • the above step S103 of acquiring the VPA display information matching the target attribute information may include the following three implementations:
  • Method 1 Obtain the VPA display information matching the target attribute information from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • the server may pre-collect user behavior data of passengers with different identity attribute information, and then, for each passenger, determine based on that passenger's user behavior data the VPA display information matching the user behavior data as the VPA display information corresponding to the identity attribute information.
  • the pre-collected identity attribute information of passenger 1 includes age 10, and user behavior data includes cartoon A and cartoon B.
  • Passenger 2's identity attribute information includes age 5, and user behavior data includes cartoon C and cartoon B.
  • Passenger 3’s identity attribute information includes age 5, and user behavior data includes nursery rhyme A and cartoon A.
  • since cartoon A, cartoon B, cartoon C and nursery rhyme A are all of the animation type, and passenger 1, passenger 2 and passenger 3 all belong to the children's age group, the children's age group is associated with the animation-type VPA image.
  • if the age of passenger 4 is later identified as 7 years old, then since 7 belongs to the children's age group, the animation-type VPA image is determined.
  • the embodiments of the present disclosure can collect user behavior data of various passengers in advance, so as to obtain the preferences of passengers of different ages and genders, and thus can determine the VPA display information that the passenger may like based on the identity attribute information of the passenger.
  • Method 2 Find the historical VPA display information corresponding to the sound feature represented by the target interactive command, and then select the VPA display information with the highest display frequency among the historical VPA display information.
  • the historical VPA display information is the VPA display information that has been displayed to the passenger who issued the target interaction instruction.
  • the vehicle-mounted terminal can record the voice characteristics of each passenger who has used the voice interaction function, as well as the corresponding historical VPA display information. Therefore, when the sound feature represented by the target interaction instruction is obtained, the historical VPA display information corresponding to the sound feature is searched, and then the VPA display information with the highest display frequency is selected from the historical VPA display information.
  • the VPA display information with the highest display frequency is most likely to be liked by passengers, so in the embodiment of the present disclosure, selecting the VPA display information with the highest display frequency can better meet user preferences.
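Method 2 can be sketched as a frequency count over the recorded history. The shape of the `history` record (a per-voice-feature list of previously shown VPA entries) is an assumption for illustration.

```python
from collections import Counter

# hypothetical record kept by the vehicle-mounted terminal:
# voice feature -> VPA display information already shown to that passenger
history = {
    "voice-feature-17": ["vpa_animation", "vpa_animation", "vpa_young_female"],
}

def most_frequent_vpa(voice_feature):
    shown = history.get(voice_feature)
    if not shown:
        return None  # no history recorded for this passenger
    # pick the entry with the highest display frequency
    return Counter(shown).most_common(1)[0][0]

print(most_frequent_vpa("voice-feature-17"))  # -> "vpa_animation"
```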
  • Method 3: When the vehicle-mounted terminal is in communication with the server, the vehicle-mounted terminal sends the voice feature to the server, and the server determines the VPA display information using Method 1.
  • When the vehicle-mounted terminal is disconnected from the server, that is, when the vehicle-mounted terminal is offline, the vehicle-mounted terminal determines the VPA display information using Method 2.
  • the vehicle-mounted terminal may send the sound feature to the server to obtain the VPA display information, and execute the second method when the VPA display information is not obtained from the server.
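The server-first fallback in Method 3 amounts to a try/except around the network call. Both `fetch_from_server` and `most_frequent_vpa` below are stand-ins for the real server request and the local Method-2 lookup; only the control flow is the point.

```python
def fetch_from_server(voice_feature, timeout_s=2.0):
    # stand-in for a real network request; here we simulate an offline terminal
    raise TimeoutError("no reply within timeout")

def most_frequent_vpa(voice_feature):
    # stand-in for the local Method-2 history lookup
    return "vpa_from_history"

def resolve_vpa(voice_feature):
    try:
        return fetch_from_server(voice_feature)
    except (TimeoutError, ConnectionError):
        # offline, or no VPA display information received within the timeout:
        # fall back to Method 2 on the terminal
        return most_frequent_vpa(voice_feature)

print(resolve_vpa("voice-feature-17"))  # -> "vpa_from_history"
```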
  • the electronic device may further display a second guide for passengers, including the following steps:
  • the multimedia information is the information displayed by the vehicle-mounted terminal to the passengers through the display screen after the first guide language is output through the display screen in the vehicle.
  • the multimedia information includes: text, picture, audio and/or video, etc.
  • the multimedia information refers to the information displayed by the vehicle-mounted terminal during the interaction between passengers and the vehicle-mounted terminal.
  • Display status information includes: not displayed, completed display, partially displayed, etc.
  • the display status information includes the episode number of the series of cartoons currently being played. When the episode is the last episode, it means that display of the series of cartoons is completed; when it is not the last episode, it means that the series of cartoons is partially displayed.
  • the display status includes the serial number of the song currently being played.
  • when the serial number is the last one, it means that display of the album's songs is completed; when it is not the last one, the album's songs are partially displayed.
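The display-status determination described above can be sketched as a comparison of the playback position against the total count; the function and status strings are illustrative.

```python
def display_status(current_index, total):
    """Derive the display status of a series/album from the playback position."""
    if current_index <= 0:
        return "not displayed"
    if current_index >= total:
        return "completed display"   # last episode / last song reached
    return "partially displayed"     # somewhere mid-series or mid-album

print(display_status(12, 12))  # last episode -> "completed display"
print(display_status(3, 12))   # mid-series   -> "partially displayed"
```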
  • since the vehicle-mounted terminal has displayed multimedia information for the passenger before S301, the passenger has corresponding user behavior data, and the second guide language that matches the passenger's user behavior data can be determined.
  • for example, if the type of user preference is entertainment, the second guide language of the entertainment type is determined.
  • the vehicle-mounted terminal can directly output the second guide language through a display screen in the vehicle.
  • the server can send the second guide language to the vehicle-mounted terminal, so that the vehicle-mounted terminal can output the second guide language through the display screen in the vehicle.
  • the embodiment of the present disclosure can obtain the passenger's user behavior data after the passenger has interacted with the vehicle-mounted terminal for a period of time, so as to recommend a second guide language that the passenger is more likely to like based on the user behavior data, guiding the user to further interact with the vehicle-mounted terminal and thereby improving the passenger's interest in the interaction.
  • the vehicle-mounted terminal obtains a target interaction instruction issued by a passenger in a seat area, and obtains a sound feature represented by the target interaction instruction.
  • the vehicle-mounted terminal sends the sound feature to the server.
  • the server receives the voice features, and based on the voice features represented by the target interaction instruction, identifies the identity attribute information of the passenger, and takes the passenger's identity attribute information as the target attribute information.
  • the server obtains the VPA display information matching the target attribute information.
  • the VPA display information includes VPA image, first guide language and TTS pronunciation.
  • the server sends the VPA display information to the vehicle terminal.
  • the vehicle-mounted terminal receives the VPA display information, and outputs the VPA display information through the display screen in the seat area in the vehicle.
  • the vehicle-mounted terminal can collect the target interaction instruction issued by the passenger during the voice interaction process, and then obtain the sound characteristics of the target interaction instruction.
  • the age and gender of the passenger are recognized by the server using voice features, and the VPA display information matching the age and gender of the passenger is determined by the VPA generation switching system.
  • the VPA display information includes VPA image, first guide language and TTS pronunciation; each VPA information corresponds to a gender (male or female) and an age group (such as old, middle-aged, young or infant). According to the age and gender of the passengers, it is possible to determine the VPA display information that users of this age and gender may like.
  • when the passenger's age group is elderly, the VPA is displayed as an elderly image on the vehicle display screen, and the first guide language for the elderly is played using elderly TTS pronunciation.
  • when the passenger's age group is middle-aged and the gender is female, the VPA is displayed as a middle-aged female image on the vehicle display screen, and the first guide language for middle-aged females is played using middle-aged female TTS pronunciation.
  • when the passenger's age group is middle-aged and the gender is male, the VPA is displayed as a middle-aged male image on the vehicle display screen, and the first guide language for middle-aged males is played using middle-aged male TTS pronunciation.
  • when the passenger's age group is young and the gender is female, the VPA is displayed as a young female image on the vehicle display screen, and the first guide language for young females is played using young female TTS pronunciation.
  • when the passenger's age group is young and the gender is male, the VPA is displayed as a young male image on the vehicle display screen, and the first guide language for young males is played using young male TTS pronunciation.
  • when the passenger's age group is infant, the VPA is displayed as an infant image on the vehicle display screen, and the first guide language for infants is played using infant TTS pronunciation.
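The (age group, gender) to VPA-display-information mapping described above can be sketched as a lookup table. Every concrete value in `VPA_TABLE` is illustrative; each entry bundles the three elements the disclosure names: a VPA image, a first guide language, and a TTS pronunciation.

```python
# entries keyed with gender None apply to the whole age group regardless of gender
VPA_TABLE = {
    ("elderly", None):         ("elderly image", "elderly guide", "elderly TTS"),
    ("middle-aged", "female"): ("middle-aged female image", "middle-aged female guide", "middle-aged female TTS"),
    ("middle-aged", "male"):   ("middle-aged male image", "middle-aged male guide", "middle-aged male TTS"),
    ("young", "female"):       ("young female image", "young female guide", "young female TTS"),
    ("young", "male"):         ("young male image", "young male guide", "young male TTS"),
    ("infant", None):          ("infant image", "infant guide", "infant TTS"),
}

def vpa_display_info(age_group, gender):
    # exact (age group, gender) match first, then a gender-independent entry
    return VPA_TABLE.get((age_group, gender)) or VPA_TABLE.get((age_group, None))

image, guide, tts = vpa_display_info("middle-aged", "female")
print(image)  # -> "middle-aged female image"
```

Keeping all three elements in one entry guarantees that the image, guide language, and TTS voice shown to a passenger always belong to the same persona.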
  • the embodiment of the present disclosure also provides a display system of a virtual personal assistant, as shown in FIG. 6 , including: a server 601 and a vehicle-mounted terminal 602 of the vehicle;
  • the vehicle-mounted terminal 602 is used to obtain the target interaction instruction, and determine the sound characteristics represented by the target interaction instruction, and send the sound characteristics to the server, and the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;
  • the server 601 is configured to receive the sound feature represented by the target interaction instruction, identify the passenger's identity attribute information based on that sound feature, use the passenger's identity attribute information as the target attribute information, obtain the virtual personal assistant (VPA) display information matching the target attribute information, where the VPA display information includes the VPA image, and send the VPA display information to the vehicle-mounted terminal;
  • the vehicle-mounted terminal 602 is also used to receive VPA display information, and output the VPA display information through the display screen in the vehicle.
  • the vehicle-mounted terminal 602 is specifically used to output the VPA display information through the display screen in a designated seat area in the vehicle, where the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the vehicle-mounted terminal 602 is also used to: when a plurality of voice interaction instructions are acquired, select the instruction with the highest priority as the target interaction instruction, where the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
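Selecting the target instruction among simultaneous utterances reduces to a maximum over seat-area priorities. The seat-priority table below is an assumption; the disclosure only states that priority is determined by seat area.

```python
# hypothetical seat-area priorities (higher wins)
SEAT_PRIORITY = {"driver": 3, "front passenger": 2, "rear": 1}

def pick_target_instruction(instructions):
    """instructions: list of (seat_area, utterance) tuples heard at the same time."""
    return max(instructions, key=lambda inst: SEAT_PRIORITY.get(inst[0], 0))

cmds = [("rear", "play cartoon A"), ("driver", "navigate home")]
print(pick_target_instruction(cmds))  # -> ("driver", "navigate home")
```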
  • the vehicle-mounted terminal 602 is also used for: if it fails to obtain the VPA display information from the server, searching for the historical VPA display information corresponding to the sound feature represented by the target interaction instruction, where the historical VPA display information is the VPA display information that has been displayed to the passenger who issued the target interaction instruction;
  • if the vehicle-mounted terminal 602 does not receive the VPA display information sent by the server within the timeout period after sending the sound feature to the server 601, the acquisition of the VPA display information from the server is deemed to have failed.
  • the server 601 is specifically used for:
  • the VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • the server 601 is also used for:
  • the VPA display information also includes the first guide language
  • the server 601 is also used to obtain the display status information for the multimedia information after the VPA display information is output through the display screen in the vehicle, where the multimedia information is the information displayed to passengers by the vehicle-mounted terminal through the display screen after the first guide language is output through the display screen in the vehicle;
  • the server 601 is also used to determine the second guide language that matches the passenger's user behavior data and send the second guide language to the vehicle-mounted terminal if the display status information indicates that the display of the multimedia information is completed;
  • the vehicle-mounted terminal 602 is also used to receive the second guide language and output the second guide language through the display screen in the vehicle.
  • the server 601 is specifically used for:
  • the embodiment of the present disclosure provides a virtual personal assistant display device, as shown in FIG. 7 , including: an acquisition module 701, an identification module 702 and an output module 703.
  • the acquisition module 701 is configured to acquire the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;
  • the identification module 702 is used to identify the passenger's identity attribute information based on the sound characteristics represented by the target interaction instruction, and use the passenger's identity attribute information as the target attribute information;
  • the obtaining module 701 is also used to obtain the virtual personal assistant VPA display information matching the target attribute information, and the VPA display information includes the VPA image;
  • the output module 703 is configured to output the VPA display information through the display screen in the vehicle.
  • the output module 703 is specifically used for:
  • the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
  • the device further includes a determination module, which is used to: when a plurality of voice interaction instructions are acquired, select the instruction with the highest priority as the target interaction instruction, where the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
  • the acquisition module 701 is specifically used for:
  • the historical VPA display information is the VPA display information that was displayed to the passenger who issued the target interactive command;
  • the acquisition module 701 is specifically used for:
  • the VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
  • the device may also include a search module and an execution module;
  • the search module is used to search for the user behavior data corresponding to the voice feature represented by the target interaction instruction from the correspondence between each voice feature and user behavior data, before the passenger's identity attribute information is identified based on the voice feature represented by the target interaction instruction.
  • the execution module is used to determine the VPA display information matching the found user behavior data if it is found, and execute the step of outputting the VPA display information through the display screen in the vehicle;
  • the executing module is further configured to execute the step of identifying the passenger's identity attribute information based on the voice features represented by the target interaction instruction if it is not found.
  • the VPA display information also includes a first guide language; the device may also include: a determination module;
  • the obtaining module is also used to obtain the display status information for the multimedia information after the VPA display information is output through the display screen in the vehicle, where the multimedia information is the information displayed to passengers by the vehicle-mounted terminal through the display screen after the first guide language is output through the display screen in the vehicle;
  • a determining module configured to determine the second guide language that matches the passenger's user behavior data if the display status information indicates that the display of the multimedia information is completed;
  • the output module is also used to output the second guide language through the display screen in the vehicle.
  • the identification module is specifically used for:
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure.
  • Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • an electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data necessary for the operation of the electronic device 800.
  • the computing unit 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • a plurality of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, an optical disk, etc.; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, and the like.
  • the communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 801 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 801 executes the various methods and processes described above, such as the presentation method of the virtual personal assistant. For example, in some embodiments, the presentation method of the virtual personal assistant can be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 808.
  • part or all of the computer program can be loaded and/or installed on the electronic device 800 via the ROM 802 and/or the communication unit 809.
  • the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the virtual personal assistant presentation method described above can be performed.
  • the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute the virtual personal assistant presentation method.
  • Various implementations of the systems and techniques described above can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • the programmable processor can be a special-purpose or general-purpose programmable processor, which can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that, when executed by the processor or controller, the program codes cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • it should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above.
  • each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to the technical field of data processing and, in particular, to the technical field of the Internet of Vehicles. It concerns a method and apparatus for displaying a virtual personal assistant (VPA), as well as a device, a medium and a product. A specific implementation solution comprises the steps of: obtaining a sound feature represented by a target interaction instruction, the target interaction instruction being a voice interaction instruction issued by a passenger in a vehicle; identifying identity attribute information of the passenger on the basis of the sound feature represented by the target interaction instruction, and using the identity attribute information of the passenger as target attribute information; obtaining display information of a VPA matching the target attribute information, the VPA display information containing a VPA image; and outputting the VPA display information by means of a display screen in the vehicle.
PCT/CN2021/113299 2021-08-18 2021-08-18 Procédé et appareil d'affichage d'un assistant personnel virtuel, dispositif, support et produit WO2023019475A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113299 WO2023019475A1 (fr) 2021-08-18 2021-08-18 Procédé et appareil d'affichage d'un assistant personnel virtuel, dispositif, support et produit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113299 WO2023019475A1 (fr) 2021-08-18 2021-08-18 Procédé et appareil d'affichage d'un assistant personnel virtuel, dispositif, support et produit

Publications (1)

Publication Number Publication Date
WO2023019475A1 true WO2023019475A1 (fr) 2023-02-23

Family

ID=85239326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113299 WO2023019475A1 (fr) 2021-08-18 2021-08-18 Procédé et appareil d'affichage d'un assistant personnel virtuel, dispositif, support et produit

Country Status (1)

Country Link
WO (1) WO2023019475A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110171372A (zh) * 2019-05-27 2019-08-27 广州小鹏汽车科技有限公司 Interface display method and apparatus for a vehicle-mounted terminal, and vehicle
CN110427472A (zh) * 2019-08-02 2019-11-08 深圳追一科技有限公司 Intelligent customer service matching method and apparatus, terminal device, and storage medium
CN111381673A (zh) * 2018-12-28 2020-07-07 哈曼国际工业有限公司 Two-way in-vehicle virtual personal assistant
US20200339142A1 (en) * 2019-02-28 2020-10-29 Google Llc Modalities for authorizing access when operating an automated assistant enabled vehicle
CN112959998A (zh) * 2021-03-19 2021-06-15 恒大新能源汽车投资控股集团有限公司 Vehicle-mounted human-computer interaction method and apparatus, vehicle, and electronic device


Similar Documents

Publication Publication Date Title
CN107507612B (zh) 一种声纹识别方法及装置
US11004444B2 (en) Systems and methods for enhancing user experience by communicating transient errors
US10733987B1 (en) System and methods for providing unplayed content
US20210280190A1 (en) Human-machine interaction
US20230325663A1 (en) Systems and methods for domain adaptation in neural networks
US20210201886A1 (en) Method and device for dialogue with virtual object, client end, and storage medium
US11494612B2 (en) Systems and methods for domain adaptation in neural networks using domain classifier
US11640519B2 (en) Systems and methods for domain adaptation in neural networks using cross-domain batch normalization
JP2017527926A (ja) 社交的会話入力に対するコンピュータレスポンスの生成
US10672379B1 (en) Systems and methods for selecting a recipient device for communications
EP3647914B1 (fr) Appareil électronique et son procédé de commande
US10699706B1 (en) Systems and methods for device communications
CN109119069B (zh) 特定人群识别方法、电子装置及计算机可读存储介质
US20230022004A1 (en) Dynamic vocabulary customization in automated voice systems
WO2023019475A1 (fr) Procédé et appareil d'affichage d'un assistant personnel virtuel, dispositif, support et produit
WO2020087534A1 (fr) Génération de réponse dans une conversation
EP3846164B1 (fr) Procédé et appareil de traitement de la voix, dispositif électronique, support d'enregistrement et produit programme informatique
EP4123477A1 (fr) Recommandation d'informations multimédia
KR102669100B1 (ko) 전자 장치 및 그 제어 방법
CN115062691A (zh) 属性识别方法和装置
CN117809681A (zh) 一种服务器、显示设备及数字人交互方法
CN115858601A (zh) 通过自动助理进行协作搜索会话
CN117809679A (zh) 一种服务器、显示设备及数字人交互方法
CN117809682A (zh) 一种服务器、显示设备及数字人交互方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21953722

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE