WO2023019475A1

WO2023019475A1 - Virtual personal assistant displaying method and apparatus, device, medium, and product

Info

Publication number: WO2023019475A1
Application number: PCT/CN2021/113299
Authority: WO
Inventors: 陈真
Original assignee: 阿波罗智联(北京)科技有限公司
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2023-02-23

Abstract

The present disclosure relates to the technical field of data processing, and in particular to the technical field of Internet of Vehicles, and provides a virtual personal assistant (VPA) displaying method and apparatus, a device, a medium, and a product. A specific implementation solution is: obtaining a sound feature represented by a target interaction instruction, wherein the target interaction instruction is a voice interaction instruction issued by a passenger in a vehicle; then identifying identity attribute information of the passenger on the basis of the sound feature represented by the target interaction instruction, and using the identity attribute information of the passenger as target attribute information; then obtaining VPA displaying information matching the target attribute information, wherein the VPA displaying information comprises a VPA image; and then outputting the VPA displaying information by means of a display screen in the vehicle.

Description

Display method, device, equipment, medium and product of virtual personal assistant

Technical field

The present disclosure relates to the field of data processing technology, in particular to the field of Internet of Vehicles technology, and more specifically to a display method, apparatus, device, medium and product of a virtual personal assistant.

Background art

When the in-vehicle head unit interacts with passengers, a virtual personal assistant (Virtual Personal Assistant, VPA) is displayed on the head unit screen, and the VPA guides passengers to interact, for example by guiding them to play videos or view news.

Summary of the invention

The present disclosure provides a display method, device, equipment, medium and product of a virtual personal assistant.

According to a first aspect of the present disclosure, a method for displaying a virtual personal assistant is provided, including:

Acquiring the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

Identifying the passenger's identity attribute information based on the sound features represented by the target interaction instruction, and using the passenger's identity attribute information as the target attribute information;

Obtaining the virtual personal assistant VPA display information matching the target attribute information, the VPA display information including the VPA image;

Outputting the VPA display information through the display screen in the vehicle.

Optionally, the output of the VPA display information through the display screen in the vehicle includes:

Outputting the VPA display information through a display screen in a designated seating area in the vehicle;

Wherein, the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.

Optionally, the manner of determining the target interaction instruction includes:

If multiple voice interaction instructions are collected in the vehicle at the same time, then based on the priority of each voice interaction instruction, select the instruction with the highest priority as the target interaction instruction;

Wherein, the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.

Optionally, the acquisition of the display information of the virtual personal assistant VPA that matches the target attribute information includes:

Find the historical VPA display information corresponding to the sound feature represented by the target interaction instruction, wherein the historical VPA display information is the VPA display information shown to the passenger who issued the target interaction instruction;

Among the historical VPA display information, the VPA display information with the highest display frequency is selected.

The VPA display information matching the target attribute information is obtained from the preset corresponding relationship between each identity attribute information and the VPA display information.
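
By way of illustration only, the two ways of obtaining the VPA display information described above could be sketched as follows. This is a minimal sketch, not the disclosed implementation: the function names `most_frequent_vpa` and `vpa_from_preset`, the use of plain strings as VPA display information identifiers, and the shape of the preset table are all assumptions introduced for the example.

```python
from collections import Counter
from typing import Optional


def most_frequent_vpa(history: list[str]) -> Optional[str]:
    """Return the VPA display info shown most often to this passenger.

    `history` is a hypothetical list of identifiers, one entry per time a
    VPA was shown to the passenger who issued the target instruction.
    """
    if not history:
        return None
    # most_common(1) yields [(item, count)]; ties keep first-seen order.
    return Counter(history).most_common(1)[0][0]


def vpa_from_preset(
    target_attributes: tuple[str, str],
    preset: dict[tuple[str, str], str],
) -> Optional[str]:
    """Look up VPA display info in a preset attribute-to-VPA correspondence."""
    return preset.get(target_attributes)
```

For example, `most_frequent_vpa(["cartoon", "adult_male", "cartoon"])` selects `"cartoon"`, because it was displayed twice and the other entry only once.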

Optionally, before identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, the method further includes:

From the corresponding relationship between each sound feature and user behavior data, find the user behavior data corresponding to the sound feature represented by the target interaction instruction;

If it is found, then determine the VPA display information that matches the found user behavior data, and execute the step of outputting the VPA display information through the display screen in the vehicle;

If not found, perform the step of identifying the identity attribute information of the passenger based on the voice features represented by the target interaction instruction.

Optionally, the VPA display information also includes a first guide; after the VPA display information is output through the display screen in the vehicle, it also includes:

Acquiring display status information for multimedia information, wherein the multimedia information is the information displayed to passengers by the head unit of the vehicle through the display screen after the first guide is output through the display screen in the vehicle;

If the display status information indicates that the display of the multimedia information is completed, then determine the second guide that matches the user behavior data of the passenger;

The second guide is output through a display screen in the vehicle.
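
As a non-limiting illustration, the second-guide steps above could be sketched as follows. The category names, the guide texts in `SECOND_GUIDES`, and the default guide are placeholders invented for the example and do not come from the disclosure; only the control flow (wait until the multimedia display is completed, then pick a guide matching the passenger's user behavior data) follows the text.

```python
from typing import Optional

# Hypothetical mapping from a behavior category to a follow-up guide.
SECOND_GUIDES = {
    "video": "Shall I play the next episode?",
    "news": "Would you like to hear more headlines?",
}
# Hypothetical guide used when no behavior category matches.
DEFAULT_GUIDE = "What would you like to do next?"


def second_guide(display_finished: bool, behavior: list[str]) -> Optional[str]:
    """Return the second guide to output, or None while display is ongoing."""
    if not display_finished:
        return None
    # Use the first behavior category that has a matching guide.
    for category in behavior:
        if category in SECOND_GUIDES:
            return SECOND_GUIDES[category]
    return DEFAULT_GUIDE
```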

Optionally, identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction includes:

Inputting the voice features represented by the target interaction instruction into a pre-trained neural network model, and obtaining identity attribute information output by the neural network model.

According to a second aspect of the present disclosure, a virtual personal assistant display system is provided, including: a server and a vehicle-mounted terminal of a vehicle;

The vehicle-mounted terminal is used to obtain a target interaction instruction, determine the sound characteristics represented by the target interaction instruction, and send the sound characteristics to the server, and the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

The server is configured to receive the sound features represented by the target interaction instruction, identify the identity attribute information of the passenger based on those sound features, use the identity attribute information of the passenger as the target attribute information, obtain the virtual personal assistant VPA display information matching the target attribute information, the VPA display information including a VPA image, and send the VPA display information to the vehicle-mounted terminal;

The vehicle-mounted terminal is further configured to receive the VPA display information, and output the VPA display information through a display screen in the vehicle.

Optionally, the vehicle-mounted terminal is specifically used for:

Optionally, the vehicle-mounted terminal is also used for:

If it fails to obtain the VPA display information from the server, then search for the historical VPA display information corresponding to the sound feature represented by the target interaction instruction, wherein the historical VPA display information is the VPA display information previously shown to the passenger who issued the target interaction instruction;

Optionally, the server is specifically used for:

Optionally, the server is also used for:

Before identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, search the correspondence between each sound feature and user behavior data for the user behavior data corresponding to the sound features represented by the target interaction instruction;

Optionally, the VPA display information also includes a first guide;

The server is further configured to obtain display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, wherein the multimedia information is the information displayed to passengers by the head unit of the vehicle through the display screen after the first guide is output through the display screen in the vehicle;

The server is further configured to determine a second guide that matches the user behavior data of the passenger and send the second guide to the vehicle-mounted terminal if the display state information indicates that the multimedia information display is completed;

The vehicle-mounted terminal is further configured to receive the second guide, and output the second guide through a display screen in the vehicle.

Optionally, the server is specifically used for:

According to a third aspect of the present disclosure, a virtual personal assistant display device is provided, including:

An acquisition module, configured to acquire the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

An identification module, configured to identify the passenger's identity attribute information based on the sound features represented by the target interaction instruction, and use the passenger's identity attribute information as the target attribute information;

The obtaining module is also used to obtain the virtual personal assistant VPA display information matching the target attribute information, and the VPA display information includes a VPA image;

The output module is used to output the display information of the VPA through the display screen in the vehicle.

Optionally, the output module is specifically used for:

Optionally, the device further includes a determination module, the determination module is configured to:

Optionally, the acquiring module is specifically used for:

Find historical VPA display information corresponding to the sound feature represented by the target interaction instruction, wherein the historical VPA display information is the VPA display information shown to the passenger who issued the target interaction instruction;

Optionally, the acquiring module is specifically used for:

Optionally, the device may also include a search module and an execution module;

The search module is used to, before the identity attribute information of the passenger is identified based on the sound features represented by the target interaction instruction, search the correspondence between each sound feature and user behavior data for the user behavior data corresponding to the sound features represented by the target interaction instruction;

The execution module is used to determine the VPA display information matching the found user behavior data if found, and execute the step of outputting the VPA display information through the display screen in the vehicle;

The execution module is further configured to execute the step of identifying the identity attribute information of the passenger based on the voice features represented by the target interaction instruction if not found.

Optionally, the VPA presentation information also includes a first guide; the device may also include: a determination module;

The acquisition module is also used to acquire display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, wherein the multimedia information is the information displayed to passengers by the head unit of the vehicle through the display screen after the first guide is output through the display screen in the vehicle;

A determining module, configured to determine a second guide that matches the user behavior data of the passenger if the display state information indicates that the display of the multimedia information is completed;

The output module is further configured to output the second guide language through the display screen in the vehicle.

Optionally, the identification module is specifically used for:

According to a fourth aspect of the present disclosure, an electronic device is provided, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method described in the first aspect above.

According to a fifth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method described in the first aspect above.

According to a sixth aspect of the present disclosure, there is provided a computer program product, including a computer program, which implements the method described in the first aspect when executed by a processor.

The embodiments of the present disclosure can obtain the sound features represented by the target interaction instruction, identify the passenger's identity attribute information based on the sound features, use the passenger's identity attribute information as the target attribute information, and output, through the display screen in the vehicle, the VPA display information matching the target attribute information, where the VPA display information includes a VPA image. It can be seen that the embodiments of the present disclosure determine the VPA image displayed for the passenger based on the identity attribute information of the passenger, so different VPA images can be displayed to different passengers.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be readily understood through the following description.

Description of drawings

The accompanying drawings are used to better understand the present solution, and do not constitute a limitation to the present disclosure. In the drawings:

FIG. 1 is a flowchart of a display method of a virtual personal assistant provided by an embodiment of the present disclosure;

Fig. 2 is a flow chart of another display method of a virtual personal assistant provided by an embodiment of the present disclosure;

Fig. 3 is a flow chart of a method for outputting a second guide provided by an embodiment of the present disclosure;

Fig. 4 is a signaling diagram of a presentation process of a virtual personal assistant provided by an embodiment of the present disclosure;

Fig. 5 is an exemplary schematic diagram of a presentation process of a virtual personal assistant provided by an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of a display system of a virtual personal assistant provided by an embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of a virtual personal assistant display device provided by an embodiment of the present disclosure;

FIG. 8 is a block diagram of an electronic device used to implement the presentation method of a virtual personal assistant according to an embodiment of the present disclosure.

Detailed description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Today's vehicles have voice interaction capabilities. When the vehicle-mounted terminal in the vehicle interacts with passengers, it displays a virtual personal assistant (Virtual Personal Assistant, VPA) on the screen of the vehicle, and guides passengers to interact through the VPA, for example by guiding them to play videos or view news.

However, the VPA image displayed on the vehicle screen is fixed, so there is only a single way of displaying it and flexibility is poor.

In order to improve the flexibility of displaying the VPA image, the embodiments of the present disclosure provide a method for displaying a virtual personal assistant, which is applied to an electronic device. In a specific application, the electronic device may be a vehicle-mounted terminal in the vehicle, or a server corresponding to the vehicle-mounted terminal; either is possible.

In addition, as an example, the vehicle-mounted terminal may be a vehicle-mounted brain, a vehicle super-brain, or a driving brain; it may also be an in-vehicle infotainment system (In-Vehicle Infotainment, IVI), a vehicle-mounted infotainment host (Display Head Unit, DHU), or any other type of vehicle-mounted terminal, which is not limited in the embodiments of the present disclosure.

As shown in Figure 1, the method includes the following steps:

S101. Acquire sound features represented by a target interaction instruction.

Wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle.

Issuing the target interaction instruction marks the start of voice interaction between the passenger and the vehicle-mounted terminal. For example, a predetermined wake-up word spoken by a passenger may be a target interaction instruction. The embodiments of the present disclosure do not limit the specific content of the predetermined wake-up word.

It can be understood that, when the electronic device is a vehicle-mounted terminal, the vehicle-mounted terminal can directly collect the target interaction instruction through the sound collection device in the vehicle when executing S101, and extract the sound features represented by the target interaction instruction.

When the electronic device is a server, however, the server may, when executing S101, receive the sound features represented by the target interaction instruction from the vehicle-mounted terminal. In this case, after the vehicle-mounted terminal collects the target interaction instruction, it can extract the sound features represented by the target interaction instruction and then send them to the server, so that the server obtains the sound features represented by the target interaction instruction.

In addition, the voice features represented by the target interaction instruction may also be called voiceprint features, and the voice features include formants. The present disclosure does not limit the manner of extracting voice features, and any implementation manner capable of extracting voice features from voice instructions can be applied to the present disclosure.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of passengers' voice interaction instructions involved are all in compliance with relevant laws and regulations, and do not violate public order and good customs.

S102. Based on the sound features represented by the target interaction instruction, identify the passenger's identity attribute information, and use the passenger's identity attribute information as the target attribute information.

Optionally, the passenger's identity attribute information is used to represent the passenger's natural attributes. People of different genders and ages usually have different preferences regarding the form of things, while people of the same gender or age usually have similar preferences; therefore, for example, the identity attribute information may include gender and/or age. The VPA image displayed to the passenger is then determined by gender and/or age.

In one implementation manner, the electronic device may input the sound features represented by the target interaction instruction into a pre-trained neural network model, and then obtain identity attribute information output by the neural network model.

Wherein, the neural network model is trained through a training set, and the training set includes a plurality of sample sound features and a training label corresponding to each sample sound feature. Wherein, the training label includes the identity attribute information of the person to whom the sample voice feature belongs.

The embodiment of the present disclosure utilizes a neural network model to identify identity attribute information of passengers through sound features, so that the judgment of identity attribute information of passengers is more accurate and robust.

It should be emphasized that the above implementation of identifying the identity attribute information of passengers based on a neural network model is only an example and should not constitute a limitation to the present disclosure. For example, the identity attribute information of passengers can also be identified through a specified voice recognition algorithm.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of passenger target attribute information and training sets are all in compliance with relevant laws and regulations, and do not violate public order and good customs.

S103. Obtain VPA display information matching the target attribute information. Wherein, the VPA display information includes a VPA image.

The VPA image is used to represent the appearance of the VPA. For example, the VPA image may include an image composed of the VPA's facial features, figure, hairstyle, and clothing.

Optionally, the VPA display information may also include a first guide and/or a text-to-speech (Text To Speech, TTS) pronunciation.

The electronic device may display the first guide while displaying the VPA image. The first guide is used to guide passengers to interact with the vehicle-mounted terminal. For example, the first guide may be "Try starting the unmanned driving mode" or "Let's watch a cartoon together".

The TTS pronunciation is the voice in which the first guide is played. For example, the TTS pronunciation may be any of an elderly voice, an adult male voice, an adult female voice, a child's voice, and the like.

For example, when the target attribute information includes 10 years old and male, the corresponding VPA display information includes a cartoon image, a child's voice and a guide for children; when the target attribute information includes 20 years old and male, the corresponding VPA display information includes a male image, an adult male voice and a guide for men; when the target attribute information includes 20 years old and female, the corresponding VPA display information includes a female image, an adult female voice and a guide for women; and when the target attribute information includes 60 years old and male, the corresponding VPA display information includes an image of an elderly person, an elderly voice and a guide for elderly people.
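
Purely as an illustrative sketch, the matching in the example above could be expressed as a small lookup function. The type `VpaDisplayInfo`, the function `match_vpa`, and in particular the age thresholds 18 and 60 are assumptions made for this example; the disclosure gives only the four sample cases and does not specify age bands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpaDisplayInfo:
    image: str        # VPA image shown on the screen
    tts_voice: str    # TTS pronunciation used to play the first guide
    guide_style: str  # which family of first guides to use


def match_vpa(age: int, gender: str) -> VpaDisplayInfo:
    """Map identity attributes (age, gender) to VPA display information."""
    # Thresholds below are hypothetical, chosen to reproduce the examples.
    if age < 18:
        return VpaDisplayInfo("cartoon", "child", "child")
    if age >= 60:
        return VpaDisplayInfo("elderly", "elderly", "elderly")
    if gender == "female":
        return VpaDisplayInfo("female", "adult_female", "female")
    return VpaDisplayInfo("male", "adult_male", "male")
```

With these assumed thresholds, `match_vpa(10, "male")` yields the cartoon image with a child's voice, and `match_vpa(60, "male")` yields the elderly image with an elderly voice, in line with the four cases listed above.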

S104. Output the VPA display information through a display screen in the vehicle.

When the electronic device is a vehicle-mounted terminal, the vehicle-mounted terminal can, when executing S104, directly output the VPA display information through the display screen in the vehicle, that is, display the VPA image that matches the passenger.

When the electronic device is a server, the server can, when executing S104, send the VPA display information to the vehicle-mounted terminal, so that after receiving it the vehicle-mounted terminal outputs the VPA display information through the display screen in the vehicle, that is, displays the VPA image that matches the passenger.

It should be noted that if the vehicle has only one display screen, namely the display screen at the driver's seat, then, in one implementation, the VPA display information is output through the display screen at the driver's seat regardless of which seat the passenger who issued the target interaction instruction occupies.

At the same time, since the embodiments of the present disclosure can display different VPA images to different passengers, the embodiments of the present disclosure can more flexibly display the VPA images and at the same time meet the personalized needs of users.

Optionally, in another embodiment of the present disclosure, regardless of whether the above-mentioned electronic device is a vehicle-mounted terminal or the above-mentioned electronic device is a server, the method for determining the target interaction instruction in S101 includes: if multiple voice interaction instructions are collected in the vehicle at the same time, Based on the priority of each voice interaction instruction, the instruction with the highest priority is selected as the target interaction instruction. Wherein, the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.

That is to say, if the electronic device is a vehicle-mounted terminal, then when multiple voice interaction instructions are collected in the vehicle at the same time, the electronic device can select the instruction with the highest priority as the target interaction instruction based on the priority of each voice interaction instruction, and then obtain the sound features of the target interaction instruction. If the electronic device is a server, then when multiple voice interaction instructions are collected in the vehicle at the same time, the vehicle-mounted terminal selects the instruction with the highest priority as the target interaction instruction based on the priority of each voice interaction instruction, extracts the sound features of that instruction, and sends them to the server, so that the sound features acquired by the server are those of the voice interaction instruction with the highest priority.

Optionally, multiple sound collection devices, such as in-vehicle microphones, may be installed in the vehicle where the vehicle-mounted terminal is located. Different sound collection devices can collect the voices of passengers in different seating areas.

For example, assume that the priorities of the seat areas, from high to low, are: the driver's seat area, the front passenger seat area, and the rear seat area. The vehicle-mounted terminal simultaneously collects interaction instruction 1 issued by the passenger in the driver's seat area and interaction instruction 2 issued by the passenger in the front passenger seat area. Since the priority of interaction instruction 1 is higher than that of interaction instruction 2, interaction instruction 1 is taken as the target interaction instruction.

It can be understood that if only one interaction instruction is collected at the same time, this interaction instruction is taken as the target interaction instruction.
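
The priority-based selection described above could, for instance, be sketched as follows. The seat-area order follows the example in the text; the names `SEAT_PRIORITY`, `VoiceInstruction` and `select_target`, and the seat-area labels, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Lower number means higher priority; order taken from the example above.
SEAT_PRIORITY = {"driver": 0, "front_passenger": 1, "rear": 2}


@dataclass
class VoiceInstruction:
    instruction_id: str
    seat_area: str  # seat area of the passenger who issued the instruction


def select_target(instructions: list[VoiceInstruction]) -> VoiceInstruction:
    """Pick the instruction whose seat area has the highest priority."""
    if not instructions:
        raise ValueError("no voice interaction instruction collected")
    # With a single instruction, min() simply returns that instruction.
    return min(instructions, key=lambda i: SEAT_PRIORITY[i.seat_area])
```

Calling `select_target` with one instruction from the driver's seat area and one from the front passenger seat area returns the driver's instruction, and calling it with a single instruction returns that instruction unchanged.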

There may be many ways to identify the source area of a voice interaction instruction, that is, to identify the seat area where the passenger who issued the voice interaction instruction is located.

In addition to distinguishing the source area of the voice interaction instruction by the position of the sound collection device, in another implementation, multiple sound collection devices are installed at different positions in the vehicle, and each seat in the vehicle is set as an independent sound zone. By judging the source direction of the sound signals received by the sound collection devices at the multiple positions, it is determined which seat the sound signal came from, which gives the source area of the voice interaction instruction, that is, the seat area where the passenger who issued the voice interaction instruction is located.
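
As one simplified possibility, the source seat could be estimated by comparing signal energy across per-seat sound collection devices. It should be stressed that this loudest-microphone heuristic is a stand-in chosen for brevity and is an assumption of this sketch: the disclosure speaks of judging the source direction of the sound signal and does not prescribe any particular algorithm. The names `rms` and `locate_source_seat` are likewise hypothetical.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def locate_source_seat(frames_by_seat: dict[str, list[float]]) -> str:
    """Return the seat area whose microphone captured the strongest signal.

    `frames_by_seat` maps a seat-area label to the frame recorded by the
    sound collection device assigned to that seat (one device per seat is
    assumed here). Raises ValueError if the mapping is empty.
    """
    return max(frames_by_seat, key=lambda seat: rms(frames_by_seat[seat]))
```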

When multiple passengers in the vehicle have voice interaction needs at the same time, the embodiment of the present disclosure can select the command with the highest priority according to the priority of the voice interaction commands, so as to prioritize the interaction needs of the driver and improve the safety of vehicle driving.

Optionally, in another embodiment of the present disclosure, the vehicle where the vehicle-mounted terminal is located may include multiple display screens.

In the case that the vehicle includes multiple display screens, outputting the VPA display information through the display screen in the vehicle in S104 above can be implemented as: outputting the VPA display information through the display screen in the designated seat area in the vehicle. Wherein, the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.

For the specific manner of identifying the seat area where the passenger who issued the target interaction instruction is located, reference may be made to the corresponding content in the foregoing embodiments, and details are not repeated here. It can be understood that, if the electronic device is a vehicle-mounted terminal, the electronic device can determine the designated seat area before outputting the VPA display information, and then output the VPA display information on the display screen of the designated seat area in the vehicle. If the electronic device is a server, the electronic device can, while obtaining the sound features represented by the target interaction instruction, also obtain from the vehicle-mounted terminal the area identifier of the source area of the target interaction instruction, that is, the area identifier of the designated seat area. After obtaining the VPA display information, the electronic device can then send the VPA display information together with the previously obtained area identifier to the vehicle-mounted terminal, so that the vehicle-mounted terminal can output the VPA display information on the display screen of the designated seat area in the vehicle. Alternatively, if the electronic device is a server, it may feed back only the VPA display information to the vehicle-mounted terminal, and the vehicle-mounted terminal itself determines the designated seat area and outputs the VPA display information on the display screen in that area.

The embodiments of the present disclosure can be applied to multi-screen interaction scenarios, so that for passengers in different seating areas, different VPA display information can be determined according to the identity attribute information of the passengers, which increases passengers' interest in using the system and the enjoyment of interaction.
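
The terminal-side routing described above could be sketched, under stated assumptions, as follows. The class `InVehicleTerminal`, its fields, and the rule of falling back to the driver's screen when the designated area has no screen are illustrative choices for this sketch (the fallback mirrors the single-screen case mentioned earlier); a driver's screen is assumed to exist in every vehicle.

```python
from typing import Optional


class InVehicleTerminal:
    """Toy model of a terminal that routes VPA display info to a screen."""

    def __init__(self, screen_areas: list[str]):
        # Record of what each screen has shown, keyed by seat-area label.
        self.shown: dict[str, list[str]] = {area: [] for area in screen_areas}
        # Seat area determined locally when the instruction was collected.
        self.pending_area: Optional[str] = None

    def on_server_reply(self, vpa_info: str, area_id: Optional[str] = None) -> str:
        """Show vpa_info on the designated screen and return the area used.

        The server may echo back the area identifier; if it does not, the
        terminal uses the area it determined itself.
        """
        area = area_id or self.pending_area
        if area not in self.shown:
            area = "driver"  # assumed fallback: the driver's screen
        self.shown[area].append(vpa_info)
        return area
```

Both server variants in the text are covered: when the server returns the area identifier it is used directly, and when the server returns only the VPA display information the terminal relies on `pending_area`.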

At the same time, the embodiments of the present disclosure can be applied not only to single-screen interaction scenarios, but also to multi-screen interaction scenarios, so that the application range of the embodiments of the present disclosure is wider.

Optionally, in another embodiment of the present disclosure, as shown in FIG. 2, the method further includes, before the above S102: S105, searching the correspondence between sound features and user behavior data for the user behavior data corresponding to the sound feature represented by the target interaction instruction. If found, determine the VPA display information matching the found user behavior data, and execute S104; if not found, execute S102.

In the embodiments of the present disclosure, the user behavior data is used to represent the user's historical operation behavior. For example, user behavior data includes: VPA display information previously set by the user, videos watched by the user, news viewed by the user and/or music played by the user, etc.

In one embodiment, if the user behavior data includes VPA display information previously set by the user, the electronic device may use that VPA display information as the VPA display information matching the user behavior data.

If the user behavior data does not include VPA display information previously set by the user, the electronic device determines the preference type of the passenger according to the user behavior data, and then determines the VPA display information corresponding to the preference type.

For example, if the passenger's user behavior data includes cartoons and nursery rhymes, it is determined that the passenger's preference type is animation, and then the VPA image and the first guide language corresponding to animation are determined. Each VPA image corresponds to a preference type, and each first guide language corresponds to a preference type.
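The lookup in S105 and the two cases above can be sketched as follows. This is only an illustration under stated assumptions: the sound feature is reduced to a hypothetical string key `voice_id` (a real system would match features by similarity), the catalogue and guide texts are invented, and choosing the preference type by majority vote is an assumption, since the disclosure does not say how the preference type is derived.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical catalogue: content item -> preference type.
CONTENT_TYPE = {
    "cartoon A": "animation",
    "nursery rhyme A": "animation",
    "news digest": "news",
}

# Hypothetical mapping: preference type -> (VPA image, first guide language).
VPA_BY_PREFERENCE = {
    "animation": ("cartoon_character", "Want to watch a cartoon?"),
    "news": ("news_anchor", "Shall I read today's headlines?"),
}


def vpa_from_behavior(behavior: Dict) -> Optional[Tuple[str, str]]:
    """Return (image, guide) for one passenger's behavior data, or None."""
    # A VPA the user set explicitly in the past takes precedence.
    if behavior.get("vpa_set_by_user"):
        return behavior["vpa_set_by_user"]
    types = [CONTENT_TYPE[i] for i in behavior.get("history", []) if i in CONTENT_TYPE]
    if not types:
        return None
    # Assumption: the most frequent content type is the preference type.
    preference = Counter(types).most_common(1)[0][0]
    return VPA_BY_PREFERENCE.get(preference)


def s105_lookup(voice_id: str, behavior_by_voice: Dict[str, Dict]) -> Optional[Tuple[str, str]]:
    """S105: a None result means 'not found', so the caller proceeds to S102."""
    behavior = behavior_by_voice.get(voice_id)
    return vpa_from_behavior(behavior) if behavior else None
```

A caller would output the returned pair directly (S104) and fall back to identity recognition (S102) only when `s105_lookup` returns `None`.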

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the passenger's user behavior data involved are all in compliance with relevant laws and regulations, and do not violate public order and good customs.

The embodiments of the present disclosure can determine the VPA display information according to the user behavior data. Because user behavior data reflects the user's interests better than identity attribute information does, when user behavior data has been collected in advance, VPA display information determined from that data can better match the user's interests.

Optionally, in another embodiment of the present disclosure, the above step of S103 acquiring VPA presentation information matching the target attribute information may include the following three implementations:

Method 1: Obtain the VPA display information matching the target attribute information from the preset corresponding relationship between each identity attribute information and the VPA display information.

In one embodiment, the server may pre-collect user behavior data of passengers with different identity attribute information, and then, for each passenger, determine the VPA display information that matches the passenger's user behavior data and use it as the VPA display information corresponding to that passenger's identity attribute information.

Wherein, the manner of determining the VPA display information matching the user behavior data can refer to the above description, which will not be repeated here.

For example, the pre-collected identity attribute information of passenger 1 includes age 10, and user behavior data includes cartoon A and cartoon B. Passenger 2's identity attribute information includes age 5, and user behavior data includes cartoon C and cartoon B. Passenger 3’s identity attribute information includes age 5, and user behavior data includes nursery rhyme A and cartoon A. Assuming that cartoon A, cartoon B, cartoon C and nursery rhyme A are all animation types, since passenger 1, passenger 2 and passenger 3 all belong to the children's age group, the children's age group is corresponding to the VPA image of the animation type. When the age of the passenger 4 is obtained later as 7 years old, since 7 years old belongs to the age group of children, the VPA image of the animation type is determined.

The embodiments of the present disclosure can collect user behavior data of various passengers in advance, so as to obtain the preferences of passengers of different ages and genders, and thus can determine the VPA display information that the passenger may like based on the identity attribute information of the passenger.
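The worked example for Method 1 can be reproduced with a short sketch. The age boundary for "children" and the majority-vote aggregation are assumptions made for illustration; the disclosure only states that ages 5, 7 and 10 belong to the children's age group.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

ANIMATION = "animation"
# Content items from the example above, all assumed to be animation.
CONTENT_TYPE = {
    "cartoon A": ANIMATION,
    "cartoon B": ANIMATION,
    "cartoon C": ANIMATION,
    "nursery rhyme A": ANIMATION,
}


def age_group(age: int) -> str:
    # Assumed boundary: 12 and under counts as "children".
    return "children" if age <= 12 else "other"


def build_mapping(passengers: List[Tuple[int, List[str]]]) -> Dict[str, str]:
    """Map each age group to the content type its members consumed most."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for age, history in passengers:
        votes[age_group(age)].update(CONTENT_TYPE[item] for item in history)
    return {group: counts.most_common(1)[0][0] for group, counts in votes.items()}


# Passengers 1 to 3 from the example: (age, user behavior data).
collected = [
    (10, ["cartoon A", "cartoon B"]),
    (5, ["cartoon C", "cartoon B"]),
    (5, ["nursery rhyme A", "cartoon A"]),
]
mapping = build_mapping(collected)

# Passenger 4 is 7 years old and has no behavior data of their own.
vpa_type_for_passenger_4 = mapping[age_group(7)]
```

The mapping is built once from pre-collected data, so a new passenger needs only identity attribute information to receive a VPA type.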

Method 2: Find the historical VPA display information corresponding to the sound feature represented by the target interactive command, and then select the VPA display information with the highest display frequency among the historical VPA display information.

Wherein, the historical VPA display information is the VPA display information that has been displayed to the passenger who issued the target interaction instruction.

In one embodiment, the vehicle-mounted terminal can record the voice characteristics of each passenger who has used the voice interaction function, as well as the corresponding historical VPA display information. Therefore, when the sound feature represented by the target interaction instruction is obtained, the historical VPA display information corresponding to the sound feature is searched, and then the VPA display information with the highest display frequency is selected from the historical VPA display information.

Among the historical VPA display information, the VPA display information with the highest display frequency is the most likely to be liked by the passenger, so selecting it in the embodiments of the present disclosure better matches user preferences.
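Method 2 amounts to a frequency count over the recorded history. A minimal sketch follows, assuming the sound feature has already been resolved to a hypothetical key `voice_id` and that each history entry is an identifier of the VPA display information that was shown.

```python
from collections import Counter
from typing import Dict, List, Optional


def most_frequent_vpa(voice_id: str, history: Dict[str, List[str]]) -> Optional[str]:
    """Return the VPA shown most often to this passenger, or None if unknown."""
    shown = history.get(voice_id)
    if not shown:
        return None
    # On a tie, Counter.most_common keeps the entry that was seen first.
    return Counter(shown).most_common(1)[0][0]
```

How ties should be broken is not specified in the disclosure; keeping the first-seen entry is simply the behavior of `Counter.most_common`.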

Method 3: When the vehicle-mounted terminal has a communication connection with the server, the vehicle-mounted terminal sends the sound feature to the server, and the server determines the VPA display information through Method 1.

When the vehicle-mounted terminal is disconnected from the server, that is, when the vehicle-mounted terminal is offline, the vehicle-mounted terminal determines the VPA display information through Method 2.

Specifically, the vehicle-mounted terminal may send the sound feature to the server to obtain the VPA display information, and execute Method 2 when the VPA display information is not obtained from the server.
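The online-first, offline-fallback behavior of Method 3 can be sketched as below. The callables `ask_server` and `local_lookup` are hypothetical stand-ins for the server request (Method 1) and the terminal's local history lookup (Method 2); treating `ConnectionError` and `TimeoutError` as the failure signals is an assumption about the transport.

```python
from typing import Any, Callable, Optional


def get_vpa(
    sound_feature: Any,
    ask_server: Callable[[Any], Optional[str]],
    local_lookup: Callable[[Any], Optional[str]],
) -> Optional[str]:
    """Try the server first; fall back to the local history on failure."""
    try:
        info = ask_server(sound_feature)
    except (ConnectionError, TimeoutError):
        # Offline or timed out: treat as "not obtained from the server".
        info = None
    if info is not None:
        return info
    return local_lookup(sound_feature)
```

An empty reply from the server is handled the same way as a failed connection, which matches the statement that Method 2 is executed whenever the VPA display information is not obtained from the server.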

In another embodiment of the present disclosure, as shown in FIG. 3, after the above S104, the electronic device may further display a second guide language to passengers, including the following steps:

S301. Acquire display state information for multimedia information.

Wherein, the multimedia information is the information displayed by the vehicle machine to the passengers through the display screen after outputting the first guide language through the display screen in the vehicle.

Optionally, the multimedia information includes: text, picture, audio and/or video, etc.

The multimedia information refers to the information displayed by the vehicle-mounted terminal during the interaction between passengers and the vehicle-mounted terminal.

Display status information includes: not displayed, completed display, partially displayed, etc. For example, when the multimedia information is a series of cartoons, the display status information includes the episode number of the series currently being played. When that episode is the last episode, the series has been completely displayed; when it is not the last episode, the series has been partially displayed.

For another example, when the multimedia information is the songs of a certain album, the display status information includes the serial number of the song currently being played. When the serial number is the last one, the display of the album's songs is completed; when it is not the last one, the songs of the album have been partially displayed.

S302. If the display state information indicates that the display of the multimedia information is completed, determine the second guide that matches the passenger's user behavior data.

Since before S301, the vehicle-machine system has displayed multimedia information for the passenger, the passenger has corresponding user behavior data, and the second guide that matches the passenger's user behavior data can be determined.

For the method of determining the second guide that matches the user behavior data, refer to the above method of determining the VPA display information that matches the user behavior data, which will not be repeated here.

For example, according to the user behavior data, it is determined that the type of user preference is entertainment, and then the second guide language of entertainment is determined.

S303. Output the second guide language through the display screen in the vehicle.

In one embodiment, when the electronic device is a vehicle-mounted terminal, the vehicle-mounted terminal can directly output the second guide language through a display screen in the vehicle.

When the electronic device is a server, the server can send the second guide language to the vehicle-mounted terminal, so that the vehicle-mounted terminal can output the second guide language through the display screen in the vehicle.

The embodiments of the present disclosure can obtain the passenger's user behavior data after the passenger has interacted with the vehicle-mounted terminal for a period of time, and recommend, based on that data, a second guide language that the passenger is more likely to like. This guides the passenger to interact further with the vehicle-mounted terminal, thereby improving the passenger's interest in the interaction.
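Steps S301 and S302 can be sketched together. The guide texts are invented, the 1-based index convention is an assumption, and picking the preference type by majority vote is again an assumption rather than the disclosed method. Following the description, reaching the last item of a series or album is treated as completed display.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical guide texts, one per preference type.
SECOND_GUIDE = {
    "entertainment": "Would you like another entertainment show?",
    "animation": "Shall I find another cartoon series?",
}


def display_status(current: int, total: int) -> str:
    """S301: classify progress; `current` is 1-based, 0 means nothing shown yet."""
    if total <= 0 or not 0 <= current <= total:
        raise ValueError("current must lie in 0..total and total must be positive")
    if current == 0:
        return "not displayed"
    return "completed" if current == total else "partially displayed"


def second_guide(current: int, total: int, consumed_types: List[str]) -> Optional[str]:
    """S302: pick a second guide language only once the display is completed."""
    if display_status(current, total) != "completed" or not consumed_types:
        return None
    # Assumption: the most frequent content type is the preference type.
    preference = Counter(consumed_types).most_common(1)[0][0]
    return SECOND_GUIDE.get(preference)
```

Returning `None` while the display is still in progress means the caller simply outputs nothing in S303 until the multimedia information has finished.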

Referring to Fig. 4, the overall flow of the virtual personal assistant display method provided by the embodiment of the present disclosure is described below in combination with specific application scenarios:

S401. The vehicle-mounted terminal obtains a target interaction instruction issued by a passenger in a seat area, and obtains a sound feature represented by the target interaction instruction.

S402. The vehicle-mounted terminal sends the sound feature to the server.

S403. The server receives the sound feature, identifies the identity attribute information of the passenger based on the sound feature represented by the target interaction instruction, and takes the passenger's identity attribute information as the target attribute information.

S404. The server obtains the VPA display information matching the target attribute information. The VPA display information includes the VPA image, the first guide language and the TTS pronunciation.

S405. The server sends the VPA display information to the vehicle-mounted terminal.

S406. The vehicle-mounted terminal receives the VPA display information, and outputs the VPA display information through the display screen in the seat area in the vehicle.
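The message flow S401 to S406 can be sketched with two small classes. Everything here is illustrative: `extract_feature` and `identify` are injected placeholders for the feature extraction and the identity recognition, which the sketch does not attempt to implement, and a direct method call stands in for the network exchange between terminal and server.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class VpaDisplayInfo:
    image: str
    first_guide: str
    tts_voice: str


class Server:
    def __init__(
        self,
        identify: Callable[[Any], Tuple[str, str]],
        catalogue: Dict[Tuple[str, str], VpaDisplayInfo],
    ) -> None:
        self.identify = identify    # placeholder for identity recognition
        self.catalogue = catalogue  # target attributes -> VPA display info

    def handle(self, sound_feature: Any) -> VpaDisplayInfo:
        target_attributes = self.identify(sound_feature)  # S403
        return self.catalogue[target_attributes]          # S404, returned in S405


class Terminal:
    def __init__(self, server: Server, extract_feature: Callable[[bytes], Any]) -> None:
        self.server = server
        self.extract_feature = extract_feature  # placeholder for feature extraction
        self.screens: Dict[str, VpaDisplayInfo] = {}

    def on_instruction(self, audio: bytes, seat_area: str) -> VpaDisplayInfo:
        feature = self.extract_feature(audio)  # S401
        info = self.server.handle(feature)     # S402 to S405
        self.screens[seat_area] = info         # S406
        return info
```

Only the sound feature crosses from terminal to server in this sketch, which mirrors S402; the raw audio stays on the terminal.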

Referring to FIG. 5 , in the display method of the virtual personal assistant provided by the embodiment of the present disclosure, the vehicle-mounted terminal can collect the target interaction instruction issued by the passenger during the voice interaction process, and then obtain the sound characteristics of the target interaction instruction. The age and gender of the passenger are recognized by the server using voice features, and the VPA display information matching the age and gender of the passenger is determined by the VPA generation switching system. Among them, the VPA display information includes VPA image, first guide language and TTS pronunciation; each VPA information corresponds to a gender (male or female) and an age group (such as old, middle-aged, young or infant). According to the age and gender of the passengers, it is possible to determine the VPA display information that users of this age and gender may like.

Then when the age of the passenger belongs to the elderly, the VPA is displayed as an image of the elderly in the vehicle display screen, and the first guide language of the elderly is played using the elderly TTS pronunciation.

When the age of the passenger is middle-aged and the gender is female, the VPA is displayed as a middle-aged female image on the vehicle display screen, and the first guide language of the middle-aged female is played using the TTS pronunciation of the middle-aged female.

When the age of the passenger is middle-aged and the gender is male, the VPA is displayed as a middle-aged male image on the vehicle display screen, and the middle-aged male TTS pronunciation is used to play the first guide language of the middle-aged male.

When the age of the passenger is young and the gender is female, the VPA is displayed as a young female image on the vehicle display screen, and at the same time, the young female TTS pronunciation is used to play the first guide language of the young female.

When the age of the passenger belongs to youth and the gender is male, the VPA is displayed as a young male image on the vehicle display screen, and at the same time, the young male TTS pronunciation is used to play the first guiding language of the young male.

When the age of the passenger belongs to infants, the VPA is displayed as an image of infants on the vehicle display screen, and at the same time, the infant's first guide language is played using infant TTS pronunciation.
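The six cases enumerated above can be condensed into one selection function. The string labels are hypothetical; the sketch mirrors the enumeration, in which the elderly and infant profiles do not depend on gender while the middle-aged and young profiles do.

```python
from typing import Optional

GENDERED_GROUPS = {"middle-aged", "young"}
UNGENDERED_GROUPS = {"elderly", "infant"}


def vpa_profile(age_group: str, gender: Optional[str] = None) -> str:
    """Return the profile key shared by the VPA image, guide language and TTS voice."""
    if age_group in UNGENDERED_GROUPS:
        # Gender is ignored for these groups in the enumerated cases.
        return age_group
    if age_group in GENDERED_GROUPS:
        if gender not in ("female", "male"):
            raise ValueError(f"gender required for age group {age_group!r}")
        return f"{age_group} {gender}"
    raise ValueError(f"unknown age group: {age_group!r}")
```

Using a single key for image, first guide language and TTS pronunciation keeps the three parts of the VPA display information consistent with each other.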

Based on the same idea, corresponding to the above-mentioned method embodiment, the embodiment of the present disclosure also provides a display system of a virtual personal assistant, as shown in FIG. 6 , including: a server 601 and a vehicle-mounted terminal 602 of the vehicle;

The vehicle-mounted terminal 602 is used to obtain the target interaction instruction, and determine the sound characteristics represented by the target interaction instruction, and send the sound characteristics to the server, and the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

The server 601 is configured to receive the sound feature represented by the target interaction instruction, identify the passenger's identity attribute information based on the sound feature represented by the target interaction instruction, use the passenger's identity attribute information as the target attribute information, obtain the virtual personal assistant (VPA) display information matching the target attribute information, where the VPA display information includes the VPA image, and send the VPA display information to the vehicle-mounted terminal;

The vehicle-mounted terminal 602 is also used to receive VPA display information, and output the VPA display information through the display screen in the vehicle.

Optionally, the vehicle terminal 602 is specifically used for:

Output the VPA display information through the display screen in the designated seat area in the vehicle;

Optionally, the vehicle terminal 602 is also used for:

If multiple voice interaction instructions are collected in the vehicle at the same time, based on the priority of each voice interaction instruction, the instruction with the highest priority is selected as the target interaction instruction;

Optionally, the vehicle terminal 602 is also used for:

If it fails to obtain the VPA display information from the server, then search for the historical VPA display information corresponding to the sound feature represented by the target interactive command, wherein the historical VPA display information is the VPA display information that has been displayed to the passenger who issued the target interactive command;

From the historical VPA display information, select the VPA display information with the highest display frequency.

In the embodiment of the present disclosure, if the vehicle-mounted terminal 602 does not receive the VPA presentation information sent by the server within the timeout period after the vehicle-mounted terminal 602 sends the voice feature to the server 601, it means that the acquisition of the VPA presentation information from the server fails.

Optionally, the server 601 is specifically used for:

Optionally, the server 601 is also used for:

Before identifying the passenger's identity attribute information based on the sound features represented by the target interaction instruction, from the corresponding relationship between each sound feature and user behavior data, search for the user behavior data corresponding to the sound feature represented by the target interaction instruction;

If it is found, determine the VPA display information matching the found user behavior data, and execute the step of outputting the VPA display information through the display screen in the vehicle;

Optionally, the VPA display information also includes the first guide language;

The server 601 is also used to obtain display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, where the multimedia information is the information displayed to passengers by the vehicle-mounted terminal through the display screen after the first guide language is output through the display screen in the vehicle;

The server 601 is also used to determine the second guide that matches the passenger's user behavior data and send the second guide to the vehicle-mounted terminal if the display status information indicates that the multimedia information display is completed;

The vehicle-mounted terminal 602 is also used to receive the second guide, and output the second guide through the display screen in the vehicle.

Optionally, the server 601 is specifically used for:

Input the sound features represented by the target interaction instructions into the pre-trained neural network model, and obtain the identity attribute information output by the neural network model.

Based on the same idea, corresponding to the above-mentioned method embodiment, the embodiment of the present disclosure provides a virtual personal assistant display device, as shown in FIG. 7 , including: an acquisition module 701, an identification module 702 and an output module 703.

The acquisition module 701 is configured to acquire the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

The identification module 702 is used to identify the passenger's identity attribute information based on the sound characteristics represented by the target interaction instruction, and use the passenger's identity attribute information as the target attribute information;

The acquisition module 701 is also used to obtain the virtual personal assistant (VPA) display information matching the target attribute information, where the VPA display information includes the VPA image;

The output module 703 is configured to output the VPA display information through the display screen in the vehicle.

Optionally, the output module is specifically used for:

Optionally, the device further includes a determination module, which is used for:

Optionally, the acquisition module is specifically used for:

Find the historical VPA display information corresponding to the sound feature represented by the target interactive command, wherein the historical VPA display information is the VPA display information that was displayed to the passenger who issued the target interactive command;

Optionally, the acquisition module is specifically used for:

The search module is used to search, before the passenger's identity attribute information is identified based on the sound feature represented by the target interaction instruction, the correspondence between sound features and user behavior data for the user behavior data corresponding to the sound feature represented by the target interaction instruction;

The execution module is used to determine the VPA display information matching the found user behavior data if it is found, and execute the step of outputting the VPA display information through the display screen in the vehicle;

The executing module is further configured to execute the step of identifying the passenger's identity attribute information based on the voice features represented by the target interaction instruction if it is not found.

Optionally, the VPA display information also includes a first guide language; the device may also include a determination module;

The acquisition module is also used to obtain display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, where the multimedia information is the information displayed to passengers by the vehicle-mounted terminal through the display screen after the first guide language is output through the display screen in the vehicle;

A determining module, configured to determine the second guide that matches the passenger's user behavior data if the display state information indicates that the display of the multimedia information is completed;

The output module is also used to output the second guide language through the display screen in the vehicle.

Optionally, the identification module is specifically used for:

In the technical solution of this disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved are all in compliance with relevant laws and regulations, and do not violate public order and good customs.

According to the embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.

FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 8, the electronic device 800 includes a computing unit 801, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data necessary for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Multiple components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, etc.; an output unit 807, such as various types of displays, speakers, etc.; a storage unit 808, such as a magnetic disk, an optical disk etc.; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 801 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 801 executes the various methods and processes described above, such as the display method of the virtual personal assistant. For example, in some embodiments, the display method of the virtual personal assistant can be implemented as a computer software program that is tangibly contained in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed on the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the display method of the virtual personal assistant described above can be performed. Alternatively, in other embodiments, the computing unit 801 may be configured in any other appropriate way (for example, by means of firmware) to execute the display method of the virtual personal assistant.

Various implementations of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor, and can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

To provide for interaction with the user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN) and the Internet.

A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.

The specific implementation manners described above do not limit the protection scope of the present disclosure. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims

A display method of a virtual personal assistant, comprising:

Acquiring the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

Identifying the passenger's identity attribute information based on the sound features represented by the target interaction instruction, and using the passenger's identity attribute information as the target attribute information;

Obtaining the virtual personal assistant VPA display information matching the target attribute information, the VPA display information including the VPA image;

Outputting the VPA display information through a display screen in the vehicle.
The method according to claim 1, wherein said outputting said VPA display information through a display screen in the vehicle comprises:

Outputting the VPA display information through a display screen in a designated seating area in the vehicle;

Wherein, the designated seat area is the seat area where the passenger who issued the target interaction instruction is located.
The method according to claim 1, wherein the method of determining the target interaction instruction comprises:

If multiple voice interaction instructions are collected in the vehicle at the same time, then based on the priority of each voice interaction instruction, select the instruction with the highest priority as the target interaction instruction;

Wherein, the priority of each voice interaction instruction is determined based on the seat area where the passenger who issues the voice interaction instruction is located.
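As a non-limiting illustration, the priority-based selection recited above might be sketched as follows. The seat-area names, the priority ordering in `SEAT_PRIORITY`, and the names `VoiceInstruction` and `select_target_instruction` are all assumptions made for this sketch; the claim only states that priority is determined by the seat area of the issuing passenger.

```python
from dataclasses import dataclass

# Hypothetical priority table (assumption): a smaller number means a higher priority.
SEAT_PRIORITY = {"driver": 0, "front_passenger": 1, "rear_left": 2, "rear_right": 2}


@dataclass
class VoiceInstruction:
    seat_area: str  # seat area of the passenger who issued the instruction
    text: str       # recognized content of the instruction


def select_target_instruction(instructions):
    """Return the instruction whose seat area has the highest priority, or None."""
    if not instructions:
        return None
    # Unknown seat areas rank last; min() keeps the first of equally ranked items.
    return min(
        instructions,
        key=lambda ins: SEAT_PRIORITY.get(ins.seat_area, len(SEAT_PRIORITY)),
    )
```

When only one instruction is collected, the same function simply returns that instruction, so no separate code path is needed for the single-speaker case.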
The method according to any one of claims 1-3, wherein said obtaining display information of a virtual personal assistant (VPA) that matches said target attribute information includes:

Searching for historical VPA display information corresponding to the sound features represented by the target interaction instruction, wherein the historical VPA display information is VPA display information previously shown to the passenger who issued the target interaction instruction;

Selecting, from the historical VPA display information, the VPA display information with the highest display frequency.
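As a non-limiting illustration, selecting the most frequently displayed item from a passenger's history might be sketched as follows. Representing the history as a list of display identifiers, and the name `most_frequent_display`, are assumptions of this sketch rather than features of the disclosure.

```python
from collections import Counter


def most_frequent_display(history):
    """Return the VPA display identifier shown most often, or None if there is no history."""
    if not history:
        return None
    # most_common(1) yields [(item, count)]; equal counts keep first-encountered order.
    return Counter(history).most_common(1)[0][0]
```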
The method according to claim 1, wherein said acquiring the virtual personal assistant (VPA) display information matching said target attribute information comprises:

Obtaining the VPA display information matching the target attribute information from a preset correspondence between identity attribute information and VPA display information.
The method according to claim 1, wherein before identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, the method further comprises:

Searching a correspondence between sound features and user behavior data for the user behavior data corresponding to the sound features represented by the target interaction instruction;

If it is found, then determining the VPA display information that matches the found user behavior data, and executing the step of outputting the VPA display information through the display screen in the vehicle;

If not found, performing the step of identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction.
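As a non-limiting illustration, the lookup-then-fallback flow recited above might be sketched as follows. The function name `resolve_vpa_display`, the use of a hashable key to stand in for a sound feature, and the four injected helpers are all assumptions of this sketch; the disclosure does not prescribe how the correspondence is stored or how matching is performed.

```python
def resolve_vpa_display(sound_feature_key, behavior_by_feature, display_for_behavior,
                        identify_attributes, display_for_attributes):
    """Prefer user behavior data; fall back to identity attributes when none is stored."""
    # Step 1: look up behavior data recorded for this sound feature (hypothetical dict).
    behavior = behavior_by_feature.get(sound_feature_key)
    if behavior is not None:
        # Found: match the VPA display information to the behavior data.
        return display_for_behavior(behavior)
    # Not found: identify identity attributes from the sound feature, then match on them.
    attributes = identify_attributes(sound_feature_key)
    return display_for_attributes(attributes)
```

Passing the matching and identification steps in as callables keeps the sketch independent of any particular model or storage choice.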
The method according to claim 1, wherein the VPA display information further includes a first guide, and after the VPA display information is output through the display screen in the vehicle, the method further comprises:

Acquiring display status information for multimedia information, wherein the multimedia information is information displayed to the passenger by the car machine of the vehicle through the display screen after the first guide is output through the display screen in the vehicle;

If the display status information indicates that the display of the multimedia information is completed, then determining a second guide that matches the user behavior data of the passenger;

Outputting the second guide through a display screen in the vehicle.
The method according to any one of claims 1 and 5-7, wherein identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction includes:

Inputting the sound features represented by the target interaction instruction into a pre-trained neural network model, and obtaining identity attribute information output by the neural network model.
A display system of a virtual personal assistant, comprising: a server and a vehicle-mounted terminal of a vehicle;

The vehicle-mounted terminal is used to obtain a target interaction instruction, determine the sound features represented by the target interaction instruction, and send the sound features to the server, wherein the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

The server is configured to receive the sound features represented by the target interaction instruction, identify the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, use the identity attribute information of the passenger as the target attribute information, obtain virtual personal assistant (VPA) display information matching the target attribute information, the VPA display information including a VPA image, and send the VPA display information to the vehicle-mounted terminal;

The vehicle-mounted terminal is further configured to receive the VPA display information, and output the VPA display information through a display screen in the vehicle.
The system according to claim 9, wherein the vehicle-mounted terminal is specifically used for:

Outputting the VPA display information through a display screen in a designated seating area in the vehicle;

Wherein, the designated seating area is the seating area where the passenger who issued the target interaction instruction is located.
The system according to claim 9, wherein the vehicle-mounted terminal is also used for:

If multiple voice interaction instructions are collected in the vehicle at the same time, then based on the priority of each voice interaction instruction, select the instruction with the highest priority as the target interaction instruction;

Wherein, the priority of each voice interaction instruction is determined based on the seat area where the passenger who issued the voice interaction instruction is located.
The system according to any one of claims 9-11, wherein the vehicle-mounted terminal is also used for:

If the VPA display information fails to be obtained from the server, then searching for historical VPA display information corresponding to the sound features represented by the target interaction instruction, wherein the historical VPA display information is VPA display information previously shown to the passenger who issued the target interaction instruction;

Selecting, from the historical VPA display information, the VPA display information with the highest display frequency.
The system according to claim 9, wherein the server is specifically used for:

Obtaining the VPA display information matching the target attribute information from a preset correspondence between identity attribute information and VPA display information.
The system according to claim 9, wherein the server is further configured to:

Before identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction, search a correspondence between sound features and user behavior data for the user behavior data corresponding to the sound features represented by the target interaction instruction;

If it is found, then determine the VPA display information that matches the found user behavior data, and execute the step of outputting the VPA display information through the display screen in the vehicle;

If not found, perform the step of identifying the identity attribute information of the passenger based on the sound features represented by the target interaction instruction.
The system according to claim 9, wherein the VPA display information further includes a first guide;

The server is further configured to obtain display status information for multimedia information after the VPA display information is output through the display screen in the vehicle, wherein the multimedia information is information displayed to the passenger by the car machine of the vehicle through the display screen after the first guide is output through the display screen in the vehicle;

The server is further configured to determine a second guide that matches the user behavior data of the passenger and send the second guide to the vehicle-mounted terminal if the display state information indicates that the multimedia information display is completed;

The vehicle-mounted terminal is further configured to receive the second guide, and output the second guide through a display screen in the vehicle.
The system according to any one of claims 9, 13-15, wherein the server is specifically used for:

Inputting the sound features represented by the target interaction instruction into a pre-trained neural network model, and obtaining identity attribute information output by the neural network model.
A display device for a virtual personal assistant, comprising:

An acquisition module, configured to acquire the sound features represented by the target interaction instruction; wherein, the target interaction instruction is a voice interaction instruction issued by a passenger in the vehicle;

An identification module, configured to identify the passenger's identity attribute information based on the sound features represented by the target interaction instruction, and use the passenger's identity attribute information as the target attribute information;

The acquisition module is further configured to obtain virtual personal assistant (VPA) display information matching the target attribute information, the VPA display information including a VPA image;

An output module, configured to output the VPA display information through a display screen in the vehicle.
An electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-8.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1-8.
A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-8.