CN116343821A - Method and device for carrying out dialogue based on user information for vehicle - Google Patents


Info

Publication number
CN116343821A
CN116343821A (Application CN202310304184.XA)
Authority
CN
China
Prior art keywords
information
user
preset
vehicle
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310304184.XA
Other languages
Chinese (zh)
Inventor
李龙飞
陈彩可
刘杰
林孟超
张炜玮
李晓琴
李�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Faw Beijing Software Technology Co ltd
FAW Group Corp
Original Assignee
Faw Beijing Software Technology Co ltd
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Faw Beijing Software Technology Co ltd, FAW Group Corp filed Critical Faw Beijing Software Technology Co ltd
Priority to CN202310304184.XA priority Critical patent/CN116343821A/en
Publication of CN116343821A publication Critical patent/CN116343821A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/27: characterised by the analysis technique
    • G10L25/48: specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method and a device for conducting a dialogue based on user information for a vehicle. The method comprises the following steps: acquiring user instruction information; acquiring user identity information; and generating response script information according to the user identity information and the user instruction information. Because the method generates the response script according to both the user's identity and the user's instruction, it avoids the prior-art situation in which the corresponding action is performed whenever a user request is received and the same reply is produced regardless of who the user is; compared with the prior art, the response script can therefore be adjusted more intelligently.

Description

Method and device for carrying out dialogue based on user information for vehicle
Technical Field
The application relates to the technical field of automobile voice control, in particular to a method for carrying out dialogue based on user information for a vehicle and a device for carrying out dialogue based on the user information for the vehicle.
Background
In prior-art speech systems, most execute user commands based only on the semantically parsed content. For example, when the user issues the voice command "open the window" for control, the in-vehicle system recognizes the voice command, performs the window-opening operation, and at the same time generates a reply script, for example: "OK, opening the window for you."
In the prior art, the vehicle cannot intelligently determine whether the user who issued an instruction should be allowed to perform the related operation, or whether danger may arise after the related operation is performed. For example, when the user is a child who instructs the vehicle to open the window, performing the window-opening operation at that moment may create a safety hazard.
It is therefore desirable to have a solution that solves or at least alleviates the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The present invention is directed to a method for conducting a dialogue based on user information for a vehicle, so as to solve at least one of the above problems.
In one aspect of the present invention, there is provided a method for performing a dialogue based on user information for a vehicle, the method comprising:
acquiring user instruction information;
acquiring user identity information;
and generating response script information according to the user identity information and the user instruction information.
Optionally, the acquiring the user instruction information includes:
acquiring voice information of a user;
and recognizing the voice information of the user with a speech-semantic recognition engine so as to acquire the user instruction information.
Optionally, the acquiring the user identity information includes:
in the process of acquiring voice information of a user, acquiring video information of personnel in a vehicle through a vehicle-mounted camera device;
acquiring, according to the video information, face information of the in-vehicle person whose lips are moving;
acquiring a user database, wherein the user database comprises at least one piece of preset user identity information and preset user face information corresponding to each piece of preset user identity information;
judging whether the similarity between the acquired user face information and the preset user face information exceeds a preset threshold; and if so,
acquiring the user identity information corresponding to the preset user face information whose similarity exceeds the preset threshold.
Optionally, the generating the response script information according to the user identity information and the user instruction information includes:
obtaining a randomly constructed script database, wherein the randomly constructed script database comprises reply tendencies and at least one script database; one reply tendency corresponds to one script database; each script database comprises a plurality of prefix scripts and a plurality of suffix scripts; and one reply tendency corresponds to at least one piece of preset user instruction information;
acquiring basic style information according to the feature information;
obtaining the reply tendency corresponding to the preset user instruction information according to the basic style information and the preset user instruction information corresponding to the user instruction information;
and obtaining a prefix script and a suffix script from the script database corresponding to the reply tendency according to the basic style information and the reply tendency, wherein the prefix script and the suffix script constitute the response script information.
Optionally, the acquiring the feature information of the in-vehicle person who issued the user instruction information according to the acquired video information of the in-vehicle persons includes:
extracting multi-frame images of the in-vehicle person who issued the user instruction information from the video information;
and extracting the feature information from the multi-frame images.
Optionally, the acquiring the basic style information according to the feature information includes:
and inputting the image features into a basic style classifier so as to acquire the basic style information.
Optionally, the method for performing dialogue based on the user information for the vehicle further comprises:
acquiring a positive script set and a negative script set, wherein the positive script set comprises at least one piece of response script information, the negative script set comprises at least one piece of response script information, and any response script in the positive script set is different from any response script in the negative script set;
judging whether the acquired response script information belongs to the positive script set or the negative script set; if it belongs to the positive script set, generating a control instruction according to the user instruction information, sending the control instruction to the corresponding control mechanism in the vehicle, and broadcasting the response script by voice.
Optionally, judging whether the acquired response script information belongs to the positive script set or the negative script set; and if it belongs to the negative script set, only broadcasting the generated response script by voice.
The application also provides a device for carrying out dialogue based on the user information for the vehicle, which comprises:
the user instruction information acquisition module is used for acquiring user instruction information;
the system comprises a user identity information acquisition module, a user identity information processing module and a user identity information processing module, wherein the user identity information acquisition module is used for acquiring user identity information;
and the script information generation module is used for generating the response script information according to the user identity information and the user instruction information.
Advantageous effects
The method for conducting a dialogue based on user information for a vehicle generates the response script according to both the user's identity and the user's instruction. This avoids the prior-art situation in which the corresponding action is performed whenever a user request is received and the same reply is produced regardless of who the user is; compared with the prior art, the response script can therefore be adjusted more intelligently.
Drawings
Fig. 1 is a flowchart of a method for performing a dialogue based on user information for a vehicle according to an embodiment of the present application.
Fig. 2 is a schematic view of an electronic device for implementing the method of performing a dialogue based on user information for the vehicle shown in fig. 1.
Detailed Description
In order to make the purposes, technical solutions and advantages of the implementation of the present application more clear, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the accompanying drawings in the embodiments of the present application. In the drawings, the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The described embodiments are some, but not all, of the embodiments of the present application. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application. Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for performing a dialogue based on user information for a vehicle according to an embodiment of the present application.
The method for performing a dialogue based on user information for a vehicle as shown in fig. 1 includes:
acquiring user instruction information;
acquiring user identity information;
and generating response script information according to the user identity information and the user instruction information.
The method for conducting a dialogue based on user information for a vehicle generates the response script according to both the user's identity and the user's instruction. This avoids the prior-art situation in which the corresponding action is performed whenever a user request is received and the same reply is produced regardless of who the user is; compared with the prior art, the response script can therefore be adjusted more intelligently.
In this embodiment, acquiring user instruction information includes:
acquiring voice information of a user;
and recognizing the voice information of the user with a speech-semantic recognition engine so as to acquire the user instruction information.
Specifically, the user instruction information may be acquired by the following method:
collecting the voice to be recognized and inputting it into a recognition decoder to obtain voice features, wherein the voice features comprise a recognition text, a voiceprint confidence, audio features and acoustic model features;
performing semantic analysis according to the voiceprint confidence and the recognition text so as to obtain service-domain information;
acquiring a recognition-result confidence according to the service-domain information, the audio features and the recognition text;
and acquiring instruction-word confidences according to the recognition result, the recognition-result confidence, the acoustic model features and the service-domain information, wherein the instruction with the largest instruction-word confidence is taken as the user instruction information.
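The confidence-based selection described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: the weighting scheme, the field names, and the candidate structure are all assumptions.

```python
# Hypothetical sketch: pick the user instruction with the largest
# instruction-word confidence. Weights and fields are illustrative.

def instruction_confidence(candidate):
    """Combine the recognition-result confidence, the acoustic-model score,
    and a small bonus when the parsed service domain matches expectations."""
    score = 0.6 * candidate["result_confidence"] + 0.4 * candidate["acoustic_score"]
    if candidate["domain"] == candidate["expected_domain"]:
        score += 0.1  # service-domain agreement raises the confidence
    return score

def select_user_instruction(candidates):
    """Return the candidate instruction with the largest confidence."""
    return max(candidates, key=instruction_confidence)

candidates = [
    {"text": "open the window", "result_confidence": 0.92,
     "acoustic_score": 0.88, "domain": "vehicle_control",
     "expected_domain": "vehicle_control"},
    {"text": "open the widow", "result_confidence": 0.35,
     "acoustic_score": 0.40, "domain": "unknown",
     "expected_domain": "vehicle_control"},
]
best = select_user_instruction(candidates)
print(best["text"])  # → open the window
```

The argmax over candidate confidences mirrors the step "the instruction with the largest instruction-word confidence is taken as the user instruction information".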
In this embodiment, the obtaining the user identity information includes:
in the process of acquiring voice information of a user, acquiring video information of personnel in a vehicle through a vehicle-mounted camera device;
acquiring, according to the video information, face information of the in-vehicle person whose lips are moving;
acquiring a user database, wherein the user database comprises at least one piece of preset user identity information and preset user face information corresponding to each piece of preset user identity information;
judging whether the similarity between the acquired user face information and the preset user face information exceeds a preset threshold; and if so,
acquiring the user identity information corresponding to the preset user face information whose similarity exceeds the preset threshold.
In this embodiment, when recognizing the user identity information, it is first necessary to determine which person in the vehicle is speaking, so the speaker is identified by detecting lip motion. For example, if user A sits in the driver's seat and user B sits in the front passenger seat, and lip-motion detection shows that user B is speaking, it is determined that the user instruction information comes from user B.
In this embodiment, after the user instruction information is determined to come from user B, the face information of user B is obtained, specifically via the vehicle-mounted camera device.
The similarity between the face information of user B and each piece of preset user face information is then compared; if the similarity exceeds a preset threshold, the user identity information corresponding to that preset face information is acquired.
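The threshold comparison above can be sketched as a similarity search over stored face features. The patent does not specify the similarity measure; cosine similarity over feature vectors, the threshold value, and the database contents below are all assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face-feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_identity(face_vector, user_database, threshold=0.8):
    """Return the preset identity whose stored face vector is most similar
    to the captured face, provided the similarity exceeds the threshold."""
    best_identity, best_sim = None, threshold
    for identity, preset_vector in user_database.items():
        sim = cosine_similarity(face_vector, preset_vector)
        if sim > best_sim:
            best_identity, best_sim = identity, sim
    return best_identity  # None when no preset face is similar enough

user_database = {
    "owner": [0.9, 0.1, 0.3],   # hypothetical stored face features
    "user_B": [0.2, 0.8, 0.5],
}
print(match_identity([0.88, 0.12, 0.31], user_database))  # → owner
```

Returning `None` below the threshold corresponds to the later branch where the speaker is not found in the user database and a script is constructed randomly instead.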
In this embodiment, the generating the response script information according to the user identity information and the user instruction information includes:
acquiring a preset script database, wherein the preset script database comprises at least one preset script group and at least one piece of preset user identity information; one piece of preset user identity information corresponds to one preset script group; and each preset script group comprises at least one preset instruction and a preset script corresponding to each preset instruction;
acquiring the preset script group corresponding to the preset user identity information that is identical to the user identity information;
and acquiring the preset script corresponding to the preset instruction that is identical to the user instruction information.
In this embodiment, the preset script group corresponding to the preset user identity information identical to the user identity information is acquired first. For example, suppose the person issuing the instruction is the vehicle owner, who has a preset script group, and certain specific instructions may only be executed by the owner; for instance, the instruction "navigate to the company" is only allowed for the owner. In that case, the owner's preset script group contains the script: "OK, starting navigation to the company."
If the person issuing the instruction is not the vehicle owner, then for the instruction "navigate to the company", even if that person has preset user identity information, his preset script group does not contain "OK, starting navigation to the company" but some other script, for example: "Sorry, you do not have permission to navigate to the company."
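The identity-then-instruction lookup can be sketched as a two-level table. This is a hedged illustration: the database layout, identities, and script texts are invented examples, not the patent's data.

```python
# Hypothetical preset script database: one script group per preset identity,
# mapping each allowed preset instruction to its preset script.
PRESET_SCRIPTS = {
    "owner": {
        "navigate to the company": "OK, starting navigation to the company.",
        "open the window": "OK, opening the window for you.",
    },
    "user_B": {
        "open the window": "OK, opening the window for you.",
        # no "navigate to the company" entry: user_B lacks that permission
    },
}

DENIED = "Sorry, you do not have permission for this operation."

def preset_script(identity, instruction):
    """Look up the script group for this identity, then the script for this
    instruction; fall back to a refusal script when no entry exists."""
    group = PRESET_SCRIPTS.get(identity, {})
    return group.get(instruction, DENIED)

print(preset_script("owner", "navigate to the company"))
print(preset_script("user_B", "navigate to the company"))  # refusal script
```

The fallback entry models the behaviour where a recognized but unauthorized user receives a permission-denied script instead of the owner's script.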
In this embodiment, the method for performing a dialogue based on user information for a vehicle further includes:
judging whether the similarity between the acquired user face information and the preset user face information in the user database exceeds a preset threshold; and if not,
acquiring feature information of the in-vehicle person who issued the user instruction information according to the acquired video information of the in-vehicle persons;
obtaining a randomly constructed script database, wherein the randomly constructed script database comprises reply tendencies and at least one script database; one reply tendency corresponds to one script database; each script database comprises a plurality of prefix scripts and a plurality of suffix scripts; and one reply tendency corresponds to at least one piece of preset user instruction information;
acquiring basic style information according to the feature information;
obtaining the reply tendency corresponding to the preset user instruction information according to the basic style information and the preset user instruction information corresponding to the user instruction information;
and obtaining a prefix script and a suffix script from the script database corresponding to the reply tendency according to the basic style information and the reply tendency, wherein the prefix script and the suffix script constitute the response script information.
In this embodiment, acquiring the feature information of the in-vehicle person who issued the user instruction information according to the acquired video information of the in-vehicle persons includes:
extracting multi-frame images of the in-vehicle person who issued the user instruction information from the video information;
and extracting the feature information from the multi-frame images.
In this embodiment, the obtaining the basic style information according to the feature information includes:
and inputting the image features into a basic style classifier so as to acquire the basic style information.
In some cases, the person speaking may not be in the user database; in that case the response script can be constructed randomly, making the method of the present application more intelligent.
For example, if the person speaking is not in the user database, image information of that person can be acquired by the camera device; the image information may include face information, clothing information, hair-color information and other information.
By recognizing the above information, basic style information can be obtained. The basic style information may include, for example, the following styles:
Male punk style (the gender can be identified from the face image, and the punk style from dyed hair, denim clothing, and the like).
Male child style (the gender and approximate age are identified from the face image, so the male child style is determined).
Female child style (the gender and approximate age are identified from the face image, so the female child style is determined).
Through the above recognition, the basic style information is obtained.
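The patent only states that image features are fed into a "basic style classifier"; as a hedged stand-in, the mapping can be sketched with simple rules over recognized attributes. The feature names, the age cutoff, and the rule order are all assumptions, standing in for a trained classifier.

```python
def basic_style(features):
    """Map recognized image features to a basic style label.
    The rules below are illustrative stand-ins for a trained classifier."""
    if features.get("approx_age", 99) < 12:
        # children are split by recognized gender
        return "male child style" if features["gender"] == "male" else "female child style"
    if features["gender"] == "male" and (
        features.get("dyed_hair") or features.get("denim_clothing")
    ):
        return "male punk style"
    return "default style"  # hypothetical catch-all, not named in the text

print(basic_style({"gender": "male", "approx_age": 7}))   # → male child style
print(basic_style({"gender": "male", "approx_age": 25, "dyed_hair": True}))
```

In a real system this function would be replaced by a classifier trained on the multi-frame image features described above.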
For example, assume that image recognition determines user C's style to be the male child style, and user C's instruction is to open the window.
The reply tendency corresponding to the preset user instruction information is then obtained according to the basic style information and the preset user instruction information corresponding to the user instruction information. Specifically, since the user is a child, the reply tendency is negative, that is, the reply conveys that the instruction is not allowed to be executed.
The script database corresponding to the reply tendency is obtained through the reply tendency. In that script database, each prefix script clearly expresses a negative meaning and each suffix script clearly expresses a negative meaning, so that whichever prefix and suffix are selected and combined, the final response script information clearly expresses the refusal.
For example, the prefix scripts may include the following:
Children cannot do that;
Little friend, you cannot open the window by yourself;
For example, the suffix scripts may include the following:
please ask your parents to open it for you.
please let your parents help you open the window.
In this way, the composed response script information may be, for example:
Little friend, you cannot open the window by yourself; please let your parents help you open the window.
In this way, on the one hand, the reply tells the user the specific reason for the response; on the other hand, the varied combinations give the user a fresh experience and prevent the user from tiring of an unchanging human-machine interaction language.
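The random prefix-plus-suffix composition above can be sketched as follows. The script texts and the database layout are illustrative assumptions; only the mechanism (pick one prefix and one suffix from the database for the reply tendency and join them) follows the text.

```python
import random

# Hypothetical script database keyed by reply tendency; every prefix and
# suffix under the negative tendency clearly expresses refusal, so any
# combination yields a clearly negative response script.
SCRIPT_DB = {
    "negative": {
        "prefixes": [
            "Children cannot do that;",
            "Little friend, you cannot open the window by yourself;",
        ],
        "suffixes": [
            "please ask your parents to open it for you.",
            "please let your parents help you open the window.",
        ],
    },
}

def compose_script(tendency, rng=random):
    """Randomly pick one prefix and one suffix for the given reply
    tendency and join them into the final response script."""
    db = SCRIPT_DB[tendency]
    return f'{rng.choice(db["prefixes"])} {rng.choice(db["suffixes"])}'

print(compose_script("negative"))  # any prefix/suffix combination refuses
```

Passing `rng` explicitly makes the randomness injectable, which is useful when testing that every combination still carries the intended tendency.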
In this embodiment, the method for conducting a dialogue based on user information for a vehicle further includes:
acquiring a positive script set and a negative script set, wherein the positive script set comprises at least one piece of response script information, the negative script set comprises at least one piece of response script information, and any response script in the positive script set is different from any response script in the negative script set;
judging whether the acquired response script information belongs to the positive script set or the negative script set; if it belongs to the positive script set, generating a control instruction according to the user instruction information, sending the control instruction to the corresponding control mechanism in the vehicle, and broadcasting the response script by voice.
In this embodiment, it is judged whether the obtained response script information belongs to the positive script set or the negative script set; if it belongs to the negative script set, the generated script is only broadcast by voice.
In this embodiment, the person who issues the user instruction information not only receives a scripted reply; whether the corresponding operation is executed is also decided according to the specific script. The script information is therefore divided into a positive script set and a negative script set. When the script information belongs to the positive script set, the script is output and the instruction for the corresponding action is also executed. For example, the vehicle owner issues the instruction: navigate to the company. If parsing finds that the script corresponding to the instruction belongs to the positive script set, the script may be: "OK, starting navigation to the company", and the navigation operation is performed.
If the style is the male child style, the response script is recognized as belonging to the negative script set, and only the voice broadcast is performed: "Little friend, you cannot open the window by yourself; please let your parents help you open the window." The corresponding operation instruction is not executed.
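The execute-or-broadcast-only decision above can be sketched as a dispatch on set membership. The set contents are invented examples; the mechanism (execute the control instruction only when the script is in the positive set) follows the text.

```python
# Hypothetical positive/negative script sets; contents are illustrative.
POSITIVE_SET = {"OK, starting navigation to the company."}
NEGATIVE_SET = {"Little friend, you cannot open the window by yourself; "
                "please let your parents help you open the window."}

def handle(script, instruction):
    """Broadcast the script; execute the instruction only when the script
    belongs to the positive set. Returns (broadcast, executed_instruction)."""
    if script in POSITIVE_SET:
        return script, instruction   # broadcast and execute the action
    if script in NEGATIVE_SET:
        return script, None          # broadcast only, no execution
    raise ValueError("script not classified into either set")

broadcast, action = handle("OK, starting navigation to the company.",
                           "navigate to the company")
print(action)  # the positive script triggers the navigation instruction
```

Returning `None` for the executed instruction models the child-refusal case, where the script is spoken but no control instruction is sent to the vehicle.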
The application also provides a device for carrying out dialogue based on the user information for the vehicle, which comprises a user instruction information acquisition module, a user identity information acquisition module and a speech information generation module, wherein,
the user instruction information acquisition module is used for acquiring user instruction information;
the user identity information acquisition module is used for acquiring user identity information;
and the script information generation module is used for generating the response script information according to the user identity information and the user instruction information.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and will not be repeated here.
The application also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and capable of running on the processor, the processor implementing the method for the vehicle to conduct a dialogue based on user information as above when executing the computer program.
The application also provides a computer readable storage medium storing a computer program which when executed by a processor is capable of implementing the method for performing a dialogue based on user information for a vehicle as above.
Fig. 2 is an exemplary block diagram of an electronic device capable of implementing a method for a vehicle to conduct a conversation based on user information according to one embodiment of the present application.
As shown in fig. 2, the electronic device includes an input device 501, an input interface 502, a central processor 503, a memory 504, an output interface 505, and an output device 506. The input interface 502, the central processor 503, the memory 504, and the output interface 505 are connected to each other through a bus 507, and the input device 501 and the output device 506 are connected to the bus 507 through the input interface 502 and the output interface 505, respectively, and further connected to other components of the electronic device. Specifically, the input device 501 receives input information from the outside and transmits the input information to the central processor 503 through the input interface 502; the central processor 503 processes the input information based on computer-executable instructions stored in the memory 504 to generate output information, temporarily or permanently stores the output information in the memory 504, and then transmits the output information to the output device 506 through the output interface 505; the output device 506 outputs the output information to the outside of the electronic device for use by the user.
That is, the electronic device shown in fig. 2 may also be implemented to include: a memory storing computer-executable instructions; and one or more processors that, when executing the computer-executable instructions, implement the method for dialogue based on user information for a vehicle described in connection with fig. 1.
In one embodiment, the electronic device shown in FIG. 2 may be implemented to include: a memory 504 configured to store executable program code; the one or more processors 503 are configured to execute the executable program code stored in the memory 504 to perform the method for performing a dialogue based on user information for a vehicle in the above-described embodiment.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and the media may be implemented in any method or technology for storage of information. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps. A plurality of units, modules or means recited in the apparatus claims may also be implemented by a single unit or means through software or hardware.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The processor referred to in this embodiment may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be used to store computer programs and/or modules; the processor implements the various functions of the apparatus/terminal device by running the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to use of the device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
In this embodiment, if the integrated modules/units of the apparatus/terminal device are implemented in the form of software functional units and sold or used as a separate product, they may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction. While the preferred embodiments have been described, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention, and it is intended that the scope of the invention be limited only by the appended claims.
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for a vehicle to conduct a dialogue based on user information, the method comprising:
acquiring user instruction information;
acquiring user identity information;
and generating dialogue script information according to the user identity information and the user instruction information.
2. The method for performing a dialogue based on user information for a vehicle as claimed in claim 1, wherein said acquiring user instruction information includes:
acquiring voice information of a user;
and recognizing the voice information of the user with a speech-semantic recognition engine to acquire the user instruction information.
3. The method for performing a dialogue based on user information for a vehicle as claimed in claim 2, wherein said acquiring user identification information includes:
in the process of acquiring voice information of a user, acquiring video information of personnel in a vehicle through a vehicle-mounted camera device;
acquiring face information of the in-vehicle person whose lips are moving according to the video information;
acquiring a user database, wherein the user database comprises at least one piece of preset user identity information and preset user face information corresponding to each piece of preset user identity information;
judging whether the similarity between the acquired user face information and the preset user face information exceeds a preset threshold value; and if so, acquiring the user identity information corresponding to the preset user face information whose similarity exceeds the preset threshold value.
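The matching step of claim 3 can be sketched as a similarity search over face embeddings. Everything concrete below — the cosine metric, the 0.8 threshold, the database layout — is an illustrative assumption, since the claim fixes none of these details:

```python
from typing import Optional
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # illustrative stand-in for the "preset threshold"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_identity(face_embedding: np.ndarray,
                   user_database: dict) -> Optional[str]:
    """Return the preset user identity whose stored face embedding is most
    similar to the captured one, but only if the similarity exceeds the
    preset threshold; otherwise return None (the unrecognized-user case)."""
    best_identity, best_score = None, SIMILARITY_THRESHOLD
    for identity, preset_embedding in user_database.items():
        score = cosine_similarity(face_embedding, preset_embedding)
        if score > best_score:
            best_identity, best_score = identity, score
    return best_identity
```

A real system would obtain the embeddings from a face-recognition model; here plain vectors stand in for them.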
4. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 3, wherein said generating dialogue script information according to the user identity information and the user instruction information comprises:
acquiring a preset dialogue script database, wherein the preset dialogue script database comprises at least one preset dialogue script group and at least one piece of preset user identity information; each piece of preset user identity information corresponds to one preset dialogue script group, and each preset dialogue script group comprises at least one preset instruction and a preset dialogue script corresponding to each preset instruction;
acquiring the preset dialogue script group corresponding to the preset user identity information that is identical to the user identity information;
and acquiring the preset dialogue script corresponding to the preset instruction that is identical to the user instruction information.
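The two lookups of claim 4 (identity → preset script group, then instruction → preset script) map naturally onto nested dictionaries. The identities, instructions, and scripts below are invented placeholders, not part of the claimed method:

```python
from typing import Optional

# Hypothetical preset script database: identity -> {instruction -> script}
PRESET_SCRIPT_DB = {
    "driver_li": {
        "open_window": "Right away, Mr. Li. Rolling the window down.",
        "play_music": "Playing your usual driving playlist, Mr. Li.",
    },
    "passenger_wang": {
        "open_window": "Sure thing! Window coming down.",
    },
}

def get_preset_script(identity: str, instruction: str) -> Optional[str]:
    """First lookup: the script group for this identity.
    Second lookup: the script matching the recognized instruction."""
    script_group = PRESET_SCRIPT_DB.get(identity)
    if script_group is None:
        return None
    return script_group.get(instruction)
```

Returning None covers both the unknown-identity and unknown-instruction cases, which claim 5's fallback path would then handle.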
5. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 4, wherein said method further comprises:
judging whether the similarity between the acquired user face information and the preset user face information in the user database exceeds the preset threshold value; and if not, acquiring feature information of the in-vehicle person who issued the user instruction information according to the acquired video information of the in-vehicle person;
acquiring a randomly constructed dialogue script database, wherein the randomly constructed dialogue script database comprises reply tendencies and at least one script library; one reply tendency corresponds to one script library; each script library comprises a plurality of prefix scripts and a plurality of suffix scripts, and one reply tendency corresponds to at least one piece of preset user instruction information;
acquiring basic style information according to the feature information;
acquiring the reply tendency corresponding to the preset user instruction information according to the basic style information and the preset user instruction information corresponding to the user instruction information;
and acquiring a prefix script and a suffix script from the script library corresponding to the reply tendency according to the basic style information and the reply tendency, wherein the prefix script and the suffix script form the dialogue script information.
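The fallback path of claim 5 can be sketched as: instruction → reply tendency, then a style-keyed pick of one prefix script and one suffix script from that tendency's library. All names and data below are hypothetical illustrations of the structure, not content from the patent:

```python
import random

# Hypothetical mapping from a preset instruction to its reply tendency
INSTRUCTION_TO_TENDENCY = {
    "open_window": "affirmative",
    "overspeed_request": "dissuasive",
}

# One script library per reply tendency: (prefixes, suffixes) keyed by style
SCRIPT_LIBRARIES = {
    "affirmative": {
        "lively": (["Sure thing!", "You got it!"], ["Doing that now.", "On it!"]),
        "formal": (["Certainly."], ["I will do that immediately."]),
    },
    "dissuasive": {
        "lively": (["Whoa there!"], ["Let's keep it safe, okay?"]),
        "formal": (["I'm afraid not."], ["That would be unsafe."]),
    },
}

def build_script(instruction: str, style: str, rng: random.Random) -> str:
    """Assemble dialogue script information as prefix + suffix,
    chosen from the library of the instruction's reply tendency."""
    tendency = INSTRUCTION_TO_TENDENCY[instruction]
    prefixes, suffixes = SCRIPT_LIBRARIES[tendency][style]
    return f"{rng.choice(prefixes)} {rng.choice(suffixes)}"
```

Passing in the random generator keeps the "random construction" reproducible for testing.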
6. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 5, wherein said acquiring feature information of the in-vehicle person who issued the user instruction information according to the acquired video information comprises:
extracting multiple frames of images of the in-vehicle person who issued the user instruction information from the video information;
and extracting the feature information from the multi-frame images.
7. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 6, wherein said acquiring basic style information according to the feature information comprises:
inputting the image feature information into a basic style classifier to acquire the basic style information.
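Claim 7 leaves the classifier unspecified; a minimal stand-in is a nearest-centroid rule over the averaged per-frame features from claim 6. The style labels, centroids, and feature dimensionality below are all invented for illustration:

```python
import numpy as np

# Hypothetical style centroids, learned offline from labeled in-vehicle images
STYLE_CENTROIDS = {
    "lively": np.array([0.9, 0.2]),
    "formal": np.array([0.1, 0.8]),
}

def classify_style(frame_features: list) -> str:
    """Average the per-frame feature vectors (the multi-frame extraction
    of claim 6), then return the nearest style centroid (claim 7)."""
    mean_feature = np.mean(frame_features, axis=0)
    return min(STYLE_CENTROIDS,
               key=lambda s: np.linalg.norm(mean_feature - STYLE_CENTROIDS[s]))
```

In practice the classifier would likely be a trained model over richer visual features; the centroid rule only shows where it plugs into the pipeline.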
8. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 7, wherein said method further comprises:
acquiring a positive script set and a negative script set, wherein the positive script set comprises at least one piece of dialogue script information, the negative script set comprises at least one piece of dialogue script information, and any dialogue script information in the positive script set is different from any dialogue script information in the negative script set;
judging whether the acquired dialogue script information belongs to the positive script set or the negative script set; and if it belongs to the positive script set, generating a control instruction according to the user instruction information, sending the control instruction to the corresponding control mechanism in the vehicle, and broadcasting the dialogue script information by voice broadcasting.
9. The method for conducting a dialogue based on user information for a vehicle as claimed in claim 8, wherein it is judged whether the acquired dialogue script information belongs to the positive script set or the negative script set, and if it belongs to the negative script set, the generated dialogue script information is broadcast by voice broadcasting.
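The branching of claims 8 and 9 can be sketched with the vehicle control mechanism and the voice broadcaster passed in as callables; both are placeholders for whatever in-vehicle interfaces an implementation actually uses:

```python
def dispatch(script: str, instruction: str,
             positive_set: set, negative_set: set,
             send_control, broadcast) -> None:
    """Claim 8: a positive script triggers both the control command and
    the voice broadcast. Claim 9: a negative script is broadcast only."""
    if script in positive_set:
        send_control(instruction)   # e.g. actually open the window
        broadcast(script)
    elif script in negative_set:
        broadcast(script)           # decline politely, no control command
```

Keeping the two effects behind injected callables makes the positive/negative branching testable without real vehicle hardware.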
10. A device for a vehicle to conduct a dialogue based on user information, the device comprising:
the user instruction information acquisition module, used for acquiring the user instruction information;
the user identity information acquisition module, used for acquiring the user identity information;
and the dialogue script information generation module, used for generating the dialogue script information according to the user identity information and the user instruction information.
CN202310304184.XA 2023-03-27 2023-03-27 Method and device for carrying out dialogue based on user information for vehicle Pending CN116343821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310304184.XA CN116343821A (en) 2023-03-27 2023-03-27 Method and device for carrying out dialogue based on user information for vehicle

Publications (1)

Publication Number Publication Date
CN116343821A true CN116343821A (en) 2023-06-27

Family

ID=86880186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310304184.XA Pending CN116343821A (en) 2023-03-27 2023-03-27 Method and device for carrying out dialogue based on user information for vehicle

Country Status (1)

Country Link
CN (1) CN116343821A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination