CN116597406A - User intention vehicle type recognition method and device based on multiple modes and storage medium - Google Patents


Info

Publication number
CN116597406A
Authority
CN
China
Prior art keywords
intention
mode
user
features
vehicle model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310592053.6A
Other languages
Chinese (zh)
Inventor
米佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weilai Software Technology Shanghai Co ltd
Original Assignee
Weilai Software Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weilai Software Technology Shanghai Co ltd
Priority to CN202310592053.6A
Publication of CN116597406A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of multimodal recognition, and in particular to a multimodal user intention vehicle model recognition method, device and storage medium. It aims to solve the problem that a user's intended vehicle model currently cannot be derived from the text, audio, video and other information that the user leaves in an automobile mobile application. To this end, the multimodal user intention vehicle model recognition method of the present invention includes: acquiring multimodal data of a user whose intended vehicle model is to be identified; preprocessing the multimodal data to obtain vector representations of each modality; and inputting the vector-represented multimodal data into a pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model. The method can integrate multimodal data and use the CNN-BiLSTM neural network to capture, from among many vehicle models, those for which the user shows higher intention.

Description

User intention vehicle type recognition method and device based on multiple modes and storage medium
Technical Field
The invention relates to the technical field of multimodal recognition, and in particular provides a multimodal user intention vehicle model recognition method and device, a storage medium, and a control device.
Background
With the advance of the mobile era, automobile mobile applications installed on smart terminals have large numbers of active users, and these users leave behind a large amount of text, audio, video and other information in the applications. However, the prior art contains no work on deriving a user's intended vehicle model from the text, audio, video and other information obtained from an automobile mobile application.
Accordingly, there is a need in the art for a new multimodal user intention vehicle model recognition scheme to address the above problem.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that a user's intended vehicle model cannot be derived from the text, audio, video and other information that the user leaves in an automobile mobile application, the invention provides a multimodal user intention vehicle model recognition method and device, a storage medium, and a control device.
In a first aspect, the present invention provides a multimodal user intention vehicle model recognition method, the method comprising:
acquiring multimodal data of a user whose intended vehicle model is to be identified;
preprocessing the multimodal data to obtain vector-represented multimodal data;
and inputting the vector-represented multimodal data into a pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model.
In one technical solution of the above multimodal user intention vehicle model recognition method, acquiring the multimodal data of the user whose intended vehicle model is to be identified comprises:
acquiring basic attribute data of the user, the basic attribute data including the user's age, sex and family structure;
acquiring test-drive data of the user, the test-drive data including specific test-drive times, test-driven vehicle models and the number of test drives;
and acquiring the user's text behavior data, audio behavior data and video behavior data within the automobile mobile application.
In one technical solution of the multimodal user intention vehicle model recognition method, the method further comprises:
acquiring a training sample set, wherein the training sample set comprises labels of users' intended vehicle models and the multimodal data of the users corresponding to the labels;
and training the CNN-BiLSTM neural network with the training sample set to obtain the trained CNN-BiLSTM neural network.
In one technical solution of the above multimodal user intention vehicle model recognition method, inputting the vector-represented multimodal data into the pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model, comprises:
extracting, from the vector-represented multimodal data, the features of the intended vehicle model in the audio modality, in the text modality and in the video modality;
fusing the features of the intended vehicle model in the audio, text and video modalities extracted by the feature extraction network to obtain the overall features of the user's intended vehicle model;
and classifying the overall features of the user's intended vehicle model output by the feature fusion network to obtain the intention probability of the user's intended vehicle model.
In one technical solution of the above multimodal user intention vehicle model recognition method, extracting the features of the intended vehicle model in the audio, text and video modalities from the vector-represented multimodal data comprises:
inputting the vector-represented basic attribute data and test-drive data into a trained CNN (convolutional neural network), which outputs the user's intended vehicle models;
and inputting the intended vehicle models output by the CNN, together with the text behavior data, audio behavior data and video behavior data, into a trained BiLSTM (bidirectional long short-term memory) network, which outputs the features of the intended vehicle model in the audio modality, in the text modality and in the video modality.
In one technical solution of the above multimodal user intention vehicle model recognition method, fusing the features of the intended vehicle model in the audio, text and video modalities extracted by the feature extraction network to obtain the overall features of the user's intended vehicle model comprises:
computing the pairwise similarities among the user's features of the intended vehicle model in the audio, text and video modalities, obtaining the similarity of the features between the audio and text modalities, between the text and video modalities, and between the audio and video modalities;
the method comprises the steps of taking similarity of features of an intention vehicle model in an audio mode and a text mode as weights of features of the intention vehicle model in the audio mode, taking similarity of features of the intention vehicle model in the text mode and the video mode as weights of features of the intention vehicle model in the video mode, taking similarity of features of the intention vehicle model in the audio mode and the video mode as weights of features of the intention vehicle model in the audio mode, and carrying out weighted average on the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode to obtain integral features of the intention vehicle model of a user.
In one technical solution of the above multimodal user intention vehicle model recognition method, computing the pairwise similarities among the features of the intended vehicle model in the audio, text and video modalities to obtain the similarity of the features between the audio and text modalities, between the text and video modalities, and between the audio and video modalities comprises:
where Similarity is the similarity of the features of the intended vehicle model in two different modalities; a is the feature of the intended vehicle model in the audio, video or text modality; b is likewise the feature of the intended vehicle model in the audio, video or text modality; and a and b belong to two different modalities.
In a second aspect, the present invention provides a multimodal user intention vehicle model recognition apparatus, the apparatus comprising:
an acquisition module for acquiring multimodal data of a user whose intended vehicle model is to be identified;
a preprocessing module for preprocessing the multimodal data to obtain vector-represented multimodal data;
and a recognition module for inputting the vector-represented multimodal data into a pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model.
In a third aspect, the present invention provides a control device comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the method of any of the above technical solutions of the multimodal user intention vehicle model recognition method.
In a fourth aspect, the present invention provides a computer-readable storage medium in which a plurality of program codes are stored, the program codes being adapted to be loaded and executed by a processor to perform the method of any of the above technical solutions of the multimodal user intention vehicle model recognition method.
One or more of the above technical solutions of the present invention has at least one or more of the following beneficial effects:
In the technical solutions implementing the invention, a multimodal user intention vehicle model recognition method is provided. The method acquires the multimodal data of a user whose intended vehicle model is to be identified, preprocesses the multimodal data to obtain vector-represented multimodal data, and inputs the vector-represented multimodal data into a pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model. The recognition method can integrate multimodal data and use the CNN-BiLSTM neural network to capture, from among many vehicle models, those for which the user shows higher intention, so that the user's intended vehicle models are discovered promptly and efficiently. This makes it possible to follow up on user intention accurately and efficiently, and can also drive the vehicle manufacturer's product optimization for those models.
Drawings
The present disclosure will become more readily understood with reference to the accompanying drawings. As will be readily appreciated by those skilled in the art: the drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. Moreover, like numerals in the figures are used to designate like parts, wherein:
FIG. 1 is a flow chart of the main steps of a multi-modality based user intention vehicle type recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the main steps of step S101 according to one embodiment of the invention;
FIG. 3 is a schematic diagram of the overall model structure of a CNN-BiLSTM neural network according to one embodiment of the invention;
FIG. 4 is a main step flow diagram of step S103 according to one embodiment of the present invention;
FIG. 5 is a main step flow diagram of step S1031 according to an embodiment of the invention;
FIG. 6 is a main step flow diagram of step S1032 according to one embodiment of the present invention;
FIG. 7 is a schematic workflow diagram of a specific application of a CNN-BiLSTM neural network according to one embodiment of the invention;
fig. 8 is a main structural diagram of a multi-modal based user intention vehicle type recognition apparatus according to an embodiment of the present invention;
fig. 9 is a main structural diagram of a control device according to an embodiment of the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module," "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, or software components, such as program code, or a combination of software and hardware. The processor may be a central processor, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, hardware, or a combination of both. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, and the like. The term "a and/or B" means all possible combinations of a and B, such as a alone, B alone or a and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone or A and B. The singular forms "a", "an" and "the" include plural referents.
Referring to fig. 1, fig. 1 is a flowchart illustrating the main steps of a multimodal user intention vehicle model recognition method according to an embodiment of the present invention. As shown in fig. 1, the multimodal user intention vehicle model recognition method in the embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: acquiring multimodal data of the user whose intended vehicle model is to be identified.
In this embodiment, a vehicle manufacturer produces a mobile application for the vehicle models it sells, and the application details the vehicle models' content, corresponding quotations, pictures, evaluations, shopping guides and the like, including text, audio and video. When a user enters the automobile mobile application and browses its content, some data is retained in the application. In this embodiment, the multimodal data of the user whose intended vehicle model is to be identified can be obtained from the automobile mobile application.
A modality is a way in which things are expressed or perceived: spoken language and the like are natural, primary modalities, while emotion and the like are abstract modalities. Each source or form of information can be regarded as a modality; for example, among information media such as speech, video and text, speech is one modality, video is another and text is a third. Multimodality is the expression or perception of things through multiple modalities.
In one implementation of the embodiment of the present invention, as shown in fig. 2, the step S101 may further include steps S1011 to S1013:
step S1011: acquiring basic attribute data of the user whose intended vehicle model is to be identified, the basic attribute data including the user's age, sex and family structure;
step S1012: acquiring test-drive data of the user, the test-drive data including specific test-drive times, test-driven vehicle models and the number of test drives;
step S1013: and acquiring the user's text behavior data, audio behavior data and video behavior data within the automobile mobile application.
In a specific example, suppose a company has built the Nio APP, which details the vehicle models, corresponding quotations, pictures, evaluations, shopping guides and the like of the new-energy automobiles it sells. A user's operations in the Nio APP, such as searching for a certain vehicle model or clicking on a certain vehicle model's configuration, leave usage data records in the app. These include user attribute data under the user's ID, such as age, sex and family structure; test-drive data, such as the vehicle models test-driven in different time periods and the number of test drives; and behavior data of each modality in the Nio APP, for example text behavior data from notes the user writes about certain vehicle models or comments made under notes about certain vehicle models published by others, audio behavior data from recordings in which the user discusses certain vehicle models with other users, or video behavior data from videos of certain vehicle models that the user uploads to the Nio APP.
In this example, the usage data that users leave in the Nio APP is recorded as multimodal data. Fusing the multimodal data effectively integrates the data of the multiple modalities: the features of the user's intended vehicle model reflected in the data of each modality are captured so as to draw on the strengths of the different modalities, and these features are then fused to obtain the overall features of the user's intended vehicle model.
Step S102: preprocessing the multimodal data to obtain vector-represented multimodal data.
In this embodiment, the data of each modality is preprocessed differently: text data undergoes word segmentation, stop-word removal and the like; audio data undergoes speech recognition, audio feature extraction and the like; and video data undergoes shot segmentation, key-frame extraction and the like. After this preprocessing, the multimodal data is represented as vectors.
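As a concrete illustration of the text branch of this preprocessing, the following minimal sketch tokenizes a comment, removes stop words and produces a bag-of-words vector; the stop-word list, vocabulary and representation are illustrative assumptions, not the patent's actual implementation:

```python
# Sketch of text-modality preprocessing: word segmentation (here plain
# whitespace splitting), stop-word removal, and conversion to a
# bag-of-words vector over a fixed vocabulary. The toy stop-word list
# and vocabulary are assumptions for illustration.

STOP_WORDS = {"the", "a", "of", "is", "and"}

def preprocess_text(comment: str, vocabulary: list[str]) -> list[float]:
    """Return a bag-of-words vector for one user comment."""
    tokens = [t for t in comment.lower().split() if t not in STOP_WORDS]
    return [float(tokens.count(word)) for word in vocabulary]

vocab = ["et5", "range", "price"]
vec = preprocess_text("The ET5 range and the ET5 price", vocab)
print(vec)  # -> [2.0, 1.0, 1.0]
```

Audio and video data would reach the same vector form only after the speech recognition and key-frame extraction steps mentioned above, so that every modality ends up represented as a vector.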
Step S103: inputting the vector-represented multimodal data into the pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user's intended vehicle model.
In one implementation of the embodiment of the present invention, the method further includes the following steps before step S102:
acquiring a training sample set, wherein the training sample set comprises labels of users' intended vehicle models and the multimodal data of the users corresponding to the labels;
and training the CNN-BiLSTM neural network with the training sample set to obtain the trained CNN-BiLSTM neural network.
In one embodiment, the CNN-BiLSTM neural network is trained with a training sample set consisting of the multimodal data previously left in the Nio APP by users together with the intended vehicle models those users were subsequently confirmed to pursue. The overall model structure of the CNN-BiLSTM neural network is shown in fig. 3: the text behavior data, such as comment text in the automobile mobile application; the audio behavior data, such as vector features converted from transcripts of recordings in the application; the video behavior data, such as vector features converted from transcripts of the audio tracks of videos in the application; and the features extracted by the CNN from the user's basic attribute data and test-drive data are all input into the BiLSTM neural network. The intention probability of the user's intended vehicle model, i.e. the ranking of the final scores of the vehicle models, is then obtained through a feature fusion network (not shown in fig. 3) and a decision network (not shown in fig. 3).
That is, the CNN-BiLSTM neural network includes a feature extraction network, a feature fusion network and a decision network. The feature extraction network extracts, from the multimodal data, the features of the intended vehicle model in the audio modality, in the text modality and in the video modality. The feature fusion network fuses these features to obtain the overall features of the user's intended vehicle model. The decision network classifies the overall features output by the feature fusion network to obtain the intention probability of the user's intended vehicle model.
In the feature extraction network, the intention embodied in the basic attribute data and the test-drive data of the multimodal data does not change over time; therefore, the basic attribute data and the test-drive data are input to the CNN neural network, which outputs the user's intended vehicle models. By contrast, the intention embodied in the text behavior data, audio behavior data and video behavior data in the automobile mobile application can change over time on long-term, short-term or cyclic scales, so the BiLSTM neural network is used to extract the features of the intended vehicle model in the audio, text and video modalities.
During training, the parameters of the CNN-BiLSTM neural network are continuously optimized: while the loss function of the network is greater than a preset threshold, training continues; training stops once the loss function falls below the preset threshold or the number of training iterations reaches its maximum, at which point the trained CNN-BiLSTM neural network is obtained.
In one embodiment, the loss function employs a cross entropy loss function.
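The training-stop rule described above can be sketched as follows; the cross-entropy loss matches the choice stated in this embodiment, while the decaying dummy update merely stands in for the real CNN-BiLSTM optimization step:

```python
import math

# Training control sketch: continue while the loss exceeds a preset
# threshold; stop once it drops below the threshold or the iteration
# cap is reached. The parameter update is a dummy stand-in.

def cross_entropy(predicted: list[float], target: list[float]) -> float:
    return -sum(t * math.log(p) for p, t in zip(predicted, target) if t > 0)

def train(threshold: float = 0.05, max_iters: int = 100) -> int:
    target = [1.0, 0.0]   # one-hot label for the true intended model
    p_correct = 0.5       # network's probability for the true class
    for iteration in range(1, max_iters + 1):
        loss = cross_entropy([p_correct, 1.0 - p_correct], target)
        if loss < threshold:
            return iteration              # early stop: loss small enough
        p_correct += 0.9 * (1.0 - p_correct)  # dummy parameter update
    return max_iters                      # stopped at the iteration cap

print(train())  # -> 3
```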
In one implementation of the embodiment of the present invention, as shown in fig. 4, the step S103 may further include steps S1031 to S1033:
step S1031: and extracting the characteristics of the intention vehicle model in the audio mode, the characteristics of the intention vehicle model in the text mode and the characteristics of the intention vehicle model in the video mode according to the multi-mode data expressed by the vectors.
In one implementation of the embodiment of the present invention, as shown in fig. 5, the step S1031 may further include a step S10311 to a step S10312:
step S10311: and inputting the basic attribute data and the test driving data which are represented by the vectors into a trained CNN neural network, wherein the CNN neural network outputs the intended vehicle type of the user.
In this embodiment, the CNN neural network includes an input layer, a convolution layer, a pooling layer, a full connection layer and an output layer, after training of the training sample set, the CNN neural network is also trained, and after basic attribute data and test driving data represented by vectors are input into the CNN neural network, the trained CNN neural network outputs the intended vehicle types of the user, for example, the intended vehicle types of the output user include ET5, ET7, EC6, EC7, ES6, ES7 and ES8 vehicle types being sold by the company of the fruit company, and the finally determined intended vehicle types of the user need to rank the intended probabilities of ET5, ET7, EC6, EC7, ES6, ES7 and ES8 vehicle types, which are the preceding vehicle type or the preceding vehicle types with higher intended probabilities, that is, the intended vehicle types of the user.
Step S10312: inputting the intended vehicle models output by the CNN (convolutional neural network), together with the text behavior data, audio behavior data and video behavior data, into the trained BiLSTM (bidirectional long short-term memory) network, which outputs the features of the intended vehicle model in the audio modality, in the text modality and in the video modality.
Continuing the above embodiment, the ET5, ET7, EC6, EC7, ES6, ES7 and ES8 models output by the CNN neural network, together with the vector-represented text behavior data, audio behavior data and video behavior data, are input into the trained BiLSTM neural network, which outputs the features of the intended vehicle model in the audio, text and video modalities. The BiLSTM neural network includes an input layer, a forward LSTM layer, a backward LSTM layer, a fully connected layer and an output layer; through the forward and backward LSTM layers, the features of the intended vehicle model in the text, audio and video behavior data can be captured more accurately.
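The value of pairing a forward layer with a backward layer can be illustrated with a toy sketch: the sequence is read in both directions and the two readings are combined per position, so every position sees both past and future context. The running-sum `read` function below is merely a stand-in for a real LSTM cell:

```python
# Toy bidirectional pass: one left-to-right reading, one right-to-left
# reading, paired per position. A real BiLSTM replaces the running sum
# with learned recurrent cells and concatenates the hidden states.

def read(sequence: list[float]) -> list[float]:
    state, out = 0.0, []
    for x in sequence:
        state += x            # running sum as a stand-in hidden state
        out.append(state)
    return out

def bidirectional(sequence: list[float]) -> list[tuple[float, float]]:
    forward = read(sequence)
    backward = list(reversed(read(list(reversed(sequence)))))
    return list(zip(forward, backward))

print(bidirectional([1.0, 2.0, 3.0]))  # -> [(1.0, 6.0), (3.0, 5.0), (6.0, 3.0)]
```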
Step S1032: fusing the features of the intended vehicle model in the audio modality, in the text modality and in the video modality extracted by the feature extraction network to obtain the overall features of the user's intended vehicle model.
In one implementation of the embodiment of the present invention, as shown in fig. 6, the step S1032 may further include steps S10321 to S10322:
step S10321: calculating the similarity of the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode of the user, and obtaining the similarity of the features of the intention vehicle model in the audio and text modes, the similarity of the features of the intention vehicle model in the text and video modes and the similarity of the features of the intention vehicle model in the audio and video modes.
In one implementation of the embodiment of the present invention, the calculation formula of step S10321 is:
where Similarity is the similarity of the features of the intended vehicle model in two different modalities; a is the feature of the intended vehicle model in the audio, video or text modality; b is likewise the feature of the intended vehicle model in the audio, video or text modality; and a and b belong to two different modalities.
In this embodiment, when a is the feature of the intended vehicle model in the audio modality, b is the feature in the video modality or the text modality; when a is the feature in the video modality, b is the feature in the audio modality or the text modality; and when a is the feature in the text modality, b is the feature in the audio modality or the video modality, so that the Similarity of each pair is computed.
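The similarity formula itself is not reproduced in this text, so the sketch below assumes cosine similarity between two modality feature vectors a and b, a common choice for this kind of pairwise comparison; the formula in the original filing may differ:

```python
import math

# Assumed cosine similarity between two modality feature vectors:
# Similarity(a, b) = (a . b) / (|a| * |b|). This is an illustrative
# assumption, not the patent's confirmed formula.

def similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

audio_feat = [1.0, 0.0]
text_feat = [1.0, 0.0]
video_feat = [0.0, 1.0]
print(similarity(audio_feat, text_feat))   # identical directions -> 1.0
print(similarity(audio_feat, video_feat))  # orthogonal directions -> 0.0
```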
Step S10322: the method comprises the steps of taking similarity of features of an intention vehicle model in an audio mode and a text mode as weights of features of the intention vehicle model in the audio mode, taking similarity of features of the intention vehicle model in the text mode and the video mode as weights of features of the intention vehicle model in the video mode, taking similarity of features of the intention vehicle model in the audio mode and the video mode as weights of features of the intention vehicle model in the audio mode, and carrying out weighted average on the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode to obtain integral features of the intention vehicle model of a user.
In this embodiment, the features of the intention vehicle model in the audio mode, the video mode and the text mode are three intention degrees. With the similarities obtained in step S10321 used as weights, a weighted average of the three intention degrees yields a comprehensive intention degree, which is the overall feature of the user intention vehicle model. The comprehensive intention degree is then normalized, i.e. mapped into a fixed range, for example [0,1].
A fusion intensity is set for the audio, video, text, user basic attribute and test driving data, and the value of the comprehensive intention degree is compared with it. If the comprehensive intention degree is greater than the fusion intensity, the feature fusion is considered effective, and the comprehensive intention degree is passed to the subsequent decision network to obtain the intention probability of the user intention vehicle type.
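Steps S10321–S10322 and the fusion-intensity check can be sketched as follows. The published weight mapping is ambiguous (the audio features are assigned two similarities and the text features none), so this sketch assumes one similarity weight per modality; the function name, the [0,1] clipping and the default fusion strength are likewise assumptions:

```python
def fuse_intention(intents, weights, fusion_intensity=0.5):
    """Similarity-weighted average of per-modality intention degrees.

    intents: {'audio': float, 'text': float, 'video': float} intention degrees.
    weights: one cross-modal similarity per modality, used as its weight
             (an assumed one-to-one mapping; the published mapping is ambiguous).
    Returns the comprehensive intention degree clipped to [0, 1] and a flag
    saying whether it exceeds the fusion intensity.
    """
    total_w = sum(weights.values())
    fused = sum(weights[m] * intents[m] for m in intents) / total_w
    # The text only says the value is "mapped to a fixed range, for
    # example [0,1]"; simple clipping is assumed here.
    fused = max(0.0, min(1.0, fused))
    # Only a fused value above the fusion intensity proceeds to the
    # decision network.
    return fused, fused > fusion_intensity

fused, proceed = fuse_intention(
    {'audio': 0.8, 'text': 0.6, 'video': 0.4},
    {'audio': 0.9, 'text': 0.7, 'video': 0.5},
    fusion_intensity=0.5)
```

Here the comprehensive intention degree is (0.9·0.8 + 0.7·0.6 + 0.5·0.4) / 2.1 ≈ 0.64, which exceeds the fusion intensity of 0.5, so the value would be handed on to the decision network.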
Step S1033: and classifying the overall characteristics of the user intention vehicle type output by the characteristic fusion network to obtain the intention probability of the user intention vehicle type.
Continuing the above example, the comprehensive intention degree is input to the decision network, and the decision network outputs the intention probability of the user intention vehicle type. As shown in fig. 7, the multi-modal data are input to the CNN-BiLSTM neural network. After feature extraction, the intention degrees of the ET5, ET7, EC6, EC7, ES6, ES7 and ES8 models are obtained in the audio mode, in the text mode and in the video mode respectively. After the feature fusion network and the decision network, the CNN-BiLSTM neural network outputs the final intention probabilities of the ET5, ET7, EC6, EC7, ES6, ES7 and ES8 models with the different modes fused. For example, the intention probability of the ET5 model is 0.2, that of the ET7 model is 0.4, that of the EC6 model is 0.5, that of the EC7 model is 0.1, that of the ES6 model is 0.3, that of the ES7 model is 0.8, and that of the ES8 model is 0.6.
In a specific example, the intention probabilities of the user intention vehicle types output by the decision network can be ranked from high to low, and the ranking order is as follows: the ES7 model intention probability > ES8 model intention probability > EC6 model intention probability > ET7 model intention probability > ES6 model intention probability > ET5 model intention probability > EC7 model intention probability.
When the two vehicle models with the highest intention probabilities are determined to be the user intention vehicle models, ES7 and ES8 are the user intention vehicle models in this embodiment.
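Using the example intention probabilities above, the ranking and top-2 selection can be sketched as:

```python
# Intention probabilities from the worked example in the text.
intent_probs = {
    'ET5': 0.2, 'ET7': 0.4, 'EC6': 0.5, 'EC7': 0.1,
    'ES6': 0.3, 'ES7': 0.8, 'ES8': 0.6,
}

# Rank vehicle models from highest to lowest intention probability.
ranked = sorted(intent_probs, key=intent_probs.get, reverse=True)

# Take the two models with the highest intention probabilities as the
# user intention vehicle models.
top_2 = ranked[:2]  # → ['ES7', 'ES8']
```

This reproduces the ordering given in the text: ES7 > ES8 > EC6 > ET7 > ES6 > ET5 > EC7.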
Based on the steps S101-S103, a multi-mode based user intention vehicle type recognition method is provided. The method acquires multi-modal data of a user of an intention vehicle type to be identified; preprocesses the multi-modal data to obtain multi-modal data expressed as vectors; and inputs the vector-expressed multi-modal data into a pre-trained CNN-BiLSTM neural network, which outputs the intention probability of the user intention vehicle type. The recognition method can integrate multi-modal data and use the CNN-BiLSTM neural network to capture, from among many vehicle models, the models with the highest user intention degree, so that the user intention vehicle types can be found timely and efficiently. This is important for accurately and efficiently following up on user intention, and can also drive a vehicle enterprise's product optimization for those vehicle models.
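The overall S101-S103 flow can be sketched as a composition of the three steps. All names here (`preprocess`, `recognize_intended_models`, the stand-in network) are illustrative, not the patent's implementation, and the vectorisation is a trivial placeholder:

```python
def preprocess(raw):
    """S102 stand-in: vectorise each modality of the raw data.

    A real system would embed text, audio and video separately; this
    placeholder just maps each field to a one-element vector.
    """
    return {k: [float(len(str(v)))] for k, v in raw.items()}

def recognize_intended_models(raw_multimodal, network, top_k=2):
    """End-to-end sketch of steps S101-S103.

    raw_multimodal: dict of raw audio/text/video behaviour data plus
                    basic-attribute and test-drive records (S101).
    network:        callable standing in for the trained CNN-BiLSTM,
                    mapping vectorised inputs to {model: probability}.
    """
    vectors = preprocess(raw_multimodal)            # S102: vectorise
    probs = network(vectors)                        # S103: inference
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:top_k]                           # user intention models

# Dummy network returning fixed probabilities, for illustration only.
dummy_network = lambda vectors: {'ES7': 0.8, 'ES8': 0.6, 'EC6': 0.5}
result = recognize_intended_models(
    {'text': 'asked about ES7 range', 'audio': b'...', 'video': b'...'},
    dummy_network)
```

With the dummy network, `result` is the two highest-probability models, ES7 and ES8, mirroring the worked example above.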
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
The invention further provides a device for identifying the user intention vehicle type based on the multiple modes.
Referring to fig. 8, fig. 8 is a main block diagram of a multi-modal based user intention vehicle type recognition apparatus according to one embodiment of the present invention. As shown in fig. 8, the device for identifying a vehicle type based on multi-mode user intention in the embodiment of the invention mainly comprises an acquisition module 11, a preprocessing module 12 and an identification module 13. The acquisition module 11 may be configured in some embodiments to acquire multimodal data of a user of an intended vehicle type to be identified. The preprocessing module 12 may be configured to preprocess the multimodal data to obtain multimodal data represented by vectors. The recognition module 13 may be configured to input the multi-modal data represented by vectors to a pre-trained CNN-BiLSTM neural network that outputs the probability of intent of the user's intended vehicle model.
In one embodiment, for the description of the specific implementation functions, reference may be made to steps S101 to S103.
The foregoing multi-mode based user intention vehicle type recognition device is used for executing the multi-mode based user intention vehicle type recognition method embodiment shown in fig. 1, and the technical principles of the two, the technical problems solved and the technical effects produced are similar. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the multi-mode based user intention vehicle type recognition device may refer to the description of the method embodiment, and will not be repeated herein.
It will be appreciated by those skilled in the art that the present invention may implement all or part of the flows of the methods of the above embodiments by instructing relevant hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, electrical carrier signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content included in the computer readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in different jurisdictions; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable storage medium does not include electrical carrier signals and telecommunications signals.
Further, the invention also provides a control device. In one control device embodiment according to the present invention, as shown in fig. 9, the control device includes a processor and a storage device. The storage device may be configured to store a program for executing the multi-mode based user intention vehicle type recognition method of the above method embodiment, and the processor may be configured to execute the program in the storage device, including but not limited to the program for executing the multi-mode based user intention vehicle type recognition method of the above method embodiment. For convenience of explanation, only those portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method portions of the embodiments of the present invention. The control device may be a control device formed of various electronic devices.
Further, the invention also provides a computer readable storage medium. In one embodiment of the computer readable storage medium according to the present invention, the computer readable storage medium may be configured to store a program for performing the multi-mode based user intention vehicle type recognition method of the above method embodiment, and the program may be loaded and executed by a processor to implement the method. For convenience of explanation, only those portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed, please refer to the method portions of the embodiments of the present invention. The computer readable storage medium may be a storage device formed of various electronic devices; optionally, the computer readable storage medium in the embodiments of the present invention is a non-transitory computer readable storage medium.
Further, it should be understood that, since the respective modules are merely set to illustrate the functional units of the apparatus of the present invention, the physical devices corresponding to the modules may be the processor itself, or a part of software in the processor, a part of hardware, or a part of a combination of software and hardware. Accordingly, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solution to deviate from the principle of the present invention, and therefore, the technical solution after splitting or combining falls within the protection scope of the present invention.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (10)

1. The method for identifying the user intention vehicle type based on the multiple modes is characterized by comprising the following steps of:
acquiring multi-mode data of a user of an intended vehicle type to be identified;
preprocessing the multi-modal data to obtain multi-modal data expressed by vectors;
and inputting the multi-mode data represented by the vectors into a pre-trained CNN-BiLSTM neural network, wherein the CNN-BiLSTM neural network outputs the intention probability of the user intention vehicle type.
2. The method for identifying a user intention vehicle type based on multiple modes according to claim 1, wherein the acquiring the multiple mode data of the user of the intention vehicle type to be identified comprises:
acquiring basic attribute data of a user of an intended vehicle type to be identified, wherein the basic attribute data comprises the age, sex and family structure of the user;
obtaining test driving data of a user of an intended vehicle type to be identified, wherein the test driving data comprise specific test driving time, test driving vehicle types and test driving times;
and acquiring text behavior data, audio behavior data and video behavior data of the user of the intention vehicle type to be identified under the mobile application of the automobile.
3. The multi-modality based user intention vehicle type recognition method of claim 1, further comprising:
acquiring a training sample set, wherein the training sample set comprises a label with a user intention vehicle type and multi-mode data of a user corresponding to the label;
and training the CNN-BiLSTM neural network by using the training sample set, thereby obtaining the trained CNN-BiLSTM neural network.
4. The method for recognizing a user intention vehicle model based on multiple modes according to claim 2, wherein the inputting the multiple mode data represented by vectors into a pre-trained CNN-BiLSTM neural network, the CNN-BiLSTM neural network outputting the intention probability of the user intention vehicle model, comprises:
extracting the characteristics of the intention vehicle model in the audio mode, the characteristics of the intention vehicle model in the text mode and the characteristics of the intention vehicle model in the video mode according to the multi-mode data expressed by the vectors;
feature fusion is carried out on the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode extracted by the feature extraction network, so that the overall features of the intention vehicle model of the user are obtained;
and classifying the overall characteristics of the user intention vehicle type output by the characteristic fusion network to obtain the intention probability of the user intention vehicle type.
5. The method for recognizing a user intention vehicle type based on a multi-modality according to claim 4, wherein the extracting features of the intention vehicle type in an audio modality, features of the intention vehicle type in a text modality, and features of the intention vehicle type in a video modality from the multi-modality data expressed in vectors comprises:
basic attribute data and test driving data which are expressed by vectors are input into a trained CNN neural network, and the CNN neural network outputs the intention vehicle type of a user;
the method comprises the steps of inputting an intended vehicle type of a user, text behavior data, audio behavior data and video behavior data which are output by a CNN (computer numerical network) to a trained BiLSTM (computer numerical network), wherein the intended vehicle type, the text behavior data, the audio behavior data and the video behavior data are output by the CNN, and the BiLSTM outputs the characteristics of the intended vehicle type under an audio mode, the characteristics of the intended vehicle type under a text mode and the characteristics of the intended vehicle type under a video mode.
6. The method for identifying a user intention vehicle model based on multiple modes according to claim 5, wherein feature fusion is performed on features of the intention vehicle model in an audio mode, features of the intention vehicle model in a text mode and features of the intention vehicle model in a video mode extracted by the feature extraction network, so as to obtain overall features of the user intention vehicle model, and the method comprises the following steps:
calculating the similarity of the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode of the user, and obtaining the similarity of the features of the intention vehicle model in the audio and text modes, the similarity of the features of the intention vehicle model in the text and video modes and the similarity of the features of the intention vehicle model in the audio and video modes;
the method comprises the steps of taking similarity of features of an intention vehicle model in an audio mode and a text mode as weights of features of the intention vehicle model in the audio mode, taking similarity of features of the intention vehicle model in the text mode and the video mode as weights of features of the intention vehicle model in the video mode, taking similarity of features of the intention vehicle model in the audio mode and the video mode as weights of features of the intention vehicle model in the audio mode, and carrying out weighted average on the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode to obtain integral features of the intention vehicle model of a user.
7. The method for identifying a user intention vehicle model based on multiple modes according to claim 6, wherein calculating the similarity of the features of the intention vehicle model in two different modes according to the features of the intention vehicle model in the audio mode, the features of the intention vehicle model in the text mode and the features of the intention vehicle model in the video mode of the user to obtain the similarity of the features of the intention vehicle model in the audio and text modes, the similarity of the features of the intention vehicle model in the text and video modes and the similarity of the features of the intention vehicle model in the audio and video modes comprises:
the Similarity is the Similarity of the features of the intended vehicle model in two different modes, and a is the feature of the intended vehicle model in an audio mode, the feature of the intended vehicle model in a video mode or the feature of the intended vehicle model in a text mode; b is the characteristic of the intention vehicle model in an audio mode, the characteristic of the intention vehicle model in a video mode or the characteristic of the intention vehicle model in a text mode, and a and b are two different modes.
8. A multi-modality based user intention vehicle type recognition apparatus, comprising:
the acquisition module is used for acquiring multi-mode data of a user of an intended vehicle type to be identified;
the preprocessing module is used for preprocessing the multi-mode data to obtain multi-mode data represented by vectors;
the recognition module is used for inputting the multi-mode data represented by the vector into a pre-trained CNN-BiLSTM neural network, and the CNN-BiLSTM neural network outputs the intention probability of the user intention vehicle type.
9. A control device comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by the processor to perform the multimodal user intention vehicle type recognition method of any of claims 1 to 7.
10. A computer readable storage medium, in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the multi-modality based user intention vehicle type recognition method of any one of claims 1 to 7.
CN202310592053.6A 2023-05-24 2023-05-24 User intention vehicle type recognition method and device based on multiple modes and storage medium Pending CN116597406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310592053.6A CN116597406A (en) 2023-05-24 2023-05-24 User intention vehicle type recognition method and device based on multiple modes and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310592053.6A CN116597406A (en) 2023-05-24 2023-05-24 User intention vehicle type recognition method and device based on multiple modes and storage medium

Publications (1)

Publication Number Publication Date
CN116597406A true CN116597406A (en) 2023-08-15

Family

ID=87600460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310592053.6A Pending CN116597406A (en) 2023-05-24 2023-05-24 User intention vehicle type recognition method and device based on multiple modes and storage medium

Country Status (1)

Country Link
CN (1) CN116597406A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076779A (en) * 2023-08-31 2023-11-17 中科融禾(北京)技术有限公司 Popularization method, device, equipment and storage medium based on big data analysis
CN117114745A (en) * 2023-08-18 2023-11-24 广东数鼎科技有限公司 Method and device for predicting intent vehicle model


Similar Documents

Publication Publication Date Title
CN111294646B (en) Video processing method, device, equipment and storage medium
CN109117777A (en) The method and apparatus for generating information
CN112465008B (en) Voice and visual relevance enhancement method based on self-supervision course learning
CN110020009B (en) Online question and answer method, device and system
CN116597406A (en) User intention vehicle type recognition method and device based on multiple modes and storage medium
CN111090763A (en) Automatic picture labeling method and device
CN111931795A (en) Multi-modal emotion recognition method and system based on subspace sparse feature fusion
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN114461804B (en) Text classification method, classifier and system based on key information and dynamic routing
Glavan et al. InstaIndoor and multi-modal deep learning for indoor scene recognition
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
WO2024093578A1 (en) Voice recognition method and apparatus, and electronic device, storage medium and computer program product
Song et al. Text Siamese network for video textual keyframe detection
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN114004314A (en) Sample classification method and device, electronic equipment and storage medium
CN111782762A (en) Method and device for determining similar questions in question answering application and electronic equipment
CN113673322A (en) Character expression posture lie detection method and system based on deep learning
CN113569094A (en) Video recommendation method and device, electronic equipment and storage medium
CN117235605B (en) Sensitive information classification method and device based on multi-mode attention fusion
CN116050428B (en) Intention recognition method, device, equipment and storage medium
CN117540007B (en) Multi-mode emotion analysis method, system and equipment based on similar mode completion
CN116612466B (en) Content identification method, device, equipment and medium based on artificial intelligence
CN117851640B (en) Video data processing method, device, equipment and medium based on composite characteristics
CN115658935B (en) Personalized comment generation method and device
CN113255665B (en) Target text extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination