CN115782835A

CN115782835A - Automatic parking remote driving control method for passenger boarding vehicle

Info

Publication number: CN115782835A
Application number: CN202310084318.1A
Authority: CN
Inventors: 马琼琼; 单萍; 沈亮; 马列
Original assignee: Jiangsu Tianyi Aviation Industry Co Ltd
Current assignee: Jiangsu Tianyi Aviation Industry Co Ltd
Priority date: 2023-02-09
Filing date: 2023-02-09
Publication date: 2023-03-14
Anticipated expiration: 2043-02-09
Also published as: CN115782835B

Abstract

The invention provides a remote driving control method for automatic parking of a passenger boarding vehicle, which belongs to the field of intelligent driving, adopts a brand-new analysis technology framework combining local image analysis with global image and voice analysis to realize the remote driving control method for the boarding vehicle, carries out cooperative analysis in real time according to passenger voice signals in the boarding vehicle, and greatly improves the safety and the accuracy of the remote driving control method for the boarding vehicle; the attention cooperative analysis model is further pertinently improved, so that the analysis result is remarkably improved, and the control effect of boarding of passengers is further improved.

Description

Automatic parking remote driving control method for passenger boarding vehicle

Technical Field

The invention belongs to the field of intelligent driving, and particularly relates to an automatic parking remote driving control method for a passenger boarding vehicle.

Background

At present, in the current situation of automatic control of boarding vehicles, due to poor optimization of a control strategy, the existing automatic driving method does not consider sudden change factors of temporary plans of boarding vehicles during driving under the condition that passengers have emergency events in the process. The existing remote automatic parking driving control method has a deep learning strategy of adopting a convolutional neural network structure, the deep learning method mostly adopts a convolutional neural network to realize road image signals, and obstacle avoidance, path planning and the like are carried out, but the identification precision is limited, and the extraction capability of a neural network algorithm on large-view global correlation information is poor. In addition, the conventional method is used for a deep learning method of automatic driving, and a single convolutional neural network is mostly adopted to analyze image signals in road conditions, so that sudden conditions in the actual process of the vehicle and the requirements of passengers in the riding process are not considered. Therefore, the technical problem which needs to be solved at present is to carry out automatic parking remote driving control on passengers boarding a locomotive by carrying out collaborative analysis in real time according to the demands of the passengers in the locomotive.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides an automatic parking remote driving control method for a passenger boarding vehicle, which realizes automatic driving of the passenger boarding vehicle and sends the passenger to a designated boarding place.

The invention is realized by the following technical scheme:

step S100: acquiring signals, namely acquiring local image signals and global image signals of the surrounding environment through camera equipment arranged on the boarding roof, and acquiring real-time voice signals of passengers through voice acquisition equipment in a passenger boarding vehicle;

step S110: the images obtained by the camera comprise normal images and wide-angle high-resolution images with different magnification factors and are used for capturing local image signals and global image signals in road conditionsZ _a Wherein the local image signal is represented asZ _a1 The global image signal is represented asZ _a2 Voice signal acquired by voice signal acquisition moduleZ _b 。

Step S200: based on the image signal and the voice signal obtained in S100, an image signal processing module and a voice signal coding module are constructed to preprocess signals in different modes;

the signal preprocessing of the invention comprises the following steps: for image signal, adopting value normalization method, and for input signal vectorZ _a The preprocessed image signal isX _z = (Z _a -Z _min )/(Z _max -Z _min ) Wherein Z is _min Represents the sameZ _a Minimum value in signal, Z _max Is composed ofZ _a Maximum signal value in the signal respectively obtaining the preprocessed local image signalXAnd a global image signalX ₁ 。

For speech signalsZ _b The collected voice signal is vector-coded by adopting a voice vector coding algorithm to obtain a voice signalX ₂ ；

Step S300: training a local information analysis model based on the preprocessed local image signals, and training an image and voice collaborative analysis model based on the preprocessed global image signals and voice signals;

the local information analysis model realizes road identification, pedestrian identification, vehicle identification, signal lamp identification and dynamic obstacle identification in the driving process, and the specific steps are as follows:

step S311: the input image signal is a preprocessed partial image signal

WhereinC×HWhich represents the size of the image,Wrepresenting the number of images;

step S312: to pairXFeature extraction is carried out through a deep convolution feature module to obtain a feature map

Local key information is selected by adopting multi-region pooling operation, and the specific calculation steps are as follows:

step S313: for characteristic diagramX _C Randomly dividing N image blocks with different sizes, and calculating by maximum poolingWThe image block with the largest pixel value is contained in the same position of the image, and the maximum value solving function of the image block is

Wherein

Is a function of the maximum value of the signal,

all image block vectors representing the kth position in the W images;

finally, the image blocks with the maximum pixel value in each corresponding position are spliced into an image feature map

Wherein

In the formula

Representing the characteristics of the spliced image, k is the same as (1,N),

is represented at a positionkOn the upper partWThe largest image block of a picture is,

splicing functions for image block features;

step S314: image block vector using average pooling operation

Processing, averaging pooled operating functions

Wherein

Represents the average pooling function and is the average pooling function,

is represented at a positionkOn the upper partWThe image block characteristics of the mean value in an image,

all image block vectors representing the kth position in the W images; and the image block features after the average pooling processing are spliced into an image feature map

Wherein

，k∈（1，N），

Splicing functions for image block features;

step S315: for the spliced image feature map

And

the vector is processed into a one-dimensional vector through a convolution layer, a maximum pooling layer and an average pooling layer and is input into a full-link layer, and finally, a nonlinear function is adopted for processingBy usingsoftmaxThe function, its expression is as follows:

in the formula a _i 、a _j Weight, x, representing input vector _i And x _j As input variables, E ₁ Is the output class number.

In addition, the generation of training samples in the model training in the embodiment is well known to those skilled in the art.

After the local image signals are processed, in order to fuse voice interaction information and carry out cooperative analysis by combining a global image signal and a voice signal of the position of the boarding vehicle in real time, information extraction is carried out on a global wide-angle image shot by a camera system through a convolutional neural network, and then the voice signal and the image signal obtained after vector coding are processed through an automatic supervision coding module at the same time. The method comprises the following specific steps:

step S320: in the collaborative analysis model, assumptionsZ _a Global image signal inX ₁ The vector processed by the feature extraction module isX _a1 ,Speech signalX ₂ The coded vector isX _a2 And adopting a vector sequential splicing mode to carry out image signal processingX _a1 And coded speech signalX _a2 Are combined into a vectorX _a The module comprises four self-learning matrixes which are respectively used for representing position information, depth information, content information and relevance informationA，Q、K、VVector representation with dimensions of then vectorX _a Multiplication by a weight matrix (linear transformation), the corresponding matrix of which is defined as W _a 、W _q 、W _k 、W _v And obtaining a corresponding self-learning input vector, wherein the self-supervision learning strategy based on the four vectors is as follows:

position information vector in formula A = X _a ×W _a Depth information vector Q = X _a ×W _q In, innerVolume information vector K = X _a ×W _k And a relevancy information vector V = X _a ×W _v ，d _k Is a vectorKDimension of (2), self-attention mechanism function

The output value represents the degree of association between the information with high degree of association between the image and the voice signal and the detection object and between the information when the model decodes the vector information.

Step S321: the invention can adopt sine coding mode and other coding modes aiming at the coding modes of the image block and the voice signal. The coding mode is changed, so that the relative relevance between different signals can be ensured to be learned by the constructed model.

In the self-attention module. Slave type

As can be seen in (1), attention mechanism function output values and vectorsVAnd AQK ^T Is proportional, i.e. the signal processed by the module is determined by the associated information learned from the signal itself.

Step S322: vector transformation matrix W for position information, depth information, content information and information associated with voice _a ，W _q 、W _k And W _v And the model is continuously optimized in the training process of the model, so that the model is ensured to realize the global image signal learning. Based on the attention mechanism module, the self-attention mechanism module is improved in the following mode:

in the formula

Is a splicing function with a total dimension after splicing ofd _model Then, the corresponding parameter matrices are:

and

wherein R is the output variable domain of the self-attention mechanism module,

respectively represent the number of columns of the matrix,

for the number of rows in the matrix,Focus() is a function of the self-attention mechanism,H _i is shown as

The attention mechanism sub-module outputs.

Step S323: also, the self-attention module transforms the matrix through self-learning according to the principle of the self-attention mechanism

Respectively calculating the attention degree of a single self-attention mechanism, wherein the output value of the attention degree corresponds to the emphasis degree of the attention message; all the output values of the self-attention module are spliced through a total coefficient matrix

The attention degree of the image signal and the voice signal is output, so that the interaction and the collaborative learning of the image signal and the voice signal are realized.

Step S324: in a collaborative analysis model, the feature dimensionality is reduced through an improved attention mechanism, the network layer can automatically screen out a key feature set through continuous training, and then the combination of features is realized through a nonlinear function, so that the relevance is further enhanced.

Step S325: in a network, different activation functions can be used, and the invention adopts a nonlinear function:

in the formulaW _z This can be achieved by fully connecting layers, representing two vectors multiplied by corresponding elements,b ₀ in order to be a term of the offset,xrepresenting input variables, notLinear piecewise exponential functionNLEThe expression of (c) is as follows:

the piecewise activation function can improve the model speed and inhibit irrelevant information on one hand, and on the other hand, the function retains the attribute of nonlinear transformation, the piecewise activation function is different from linearity, and the information obtained after screening is further strengthened by adopting exponential segmentation. The piecewise function is applied to the input variablexWhen the slope is negative, a smaller slope is adopted, and the phenomenon that network neurons do not work due to the fact that gradients in the network disappear is avoided.

And outputting a global detection and analysis result through the collaborative analysis model, wherein the output result comprises road information, distance information of vehicles before and after the distance, the number information of vehicles in other lanes, parking point pedestrian condition information and emergency parking information.

Step S400: selecting a corresponding executed state instruction as a remote driving control instruction based on two outputs of the S300 local information analysis model and the collaborative analysis model;

the module mainly completes simple task analysis in road conditions based on local attention, including traffic light identification and pedestrian and obstacle detection. And outputting corresponding waiting and parking state instructions.

The cooperative analysis model learns the global correlation information, controls and infers based on the detection and analysis results, and outputs driving state instructions mainly comprising road switching, acceleration, deceleration, left and right turning and parking point intelligent identification parking points. If the cooperative analysis model detects that the number of the road vehicles is less, switching to a road model with better road condition execution according to the model lane change instruction; if the distance between the vehicle and the front vehicle is detected to be close, the signal corresponds to a deceleration signal, and if the distance between the vehicle and the front vehicle is detected to be far, the signal is an acceleration signal. The information obtained by sorting the different lane information corresponds to lane change information. If the information of the parking point passengers is detected, a parking instruction is sent out; and if the detected passenger information comprises the information of seeking help, decelerating and parking, giving a parking instruction by combining the road condition.

Compared with the prior art, the invention has the beneficial effects that: the invention provides a brand-new boarding vehicle remote driving control method based on local image analysis and global voice analysis, which can carry out cooperative analysis in real time according to passenger voice signals in a vehicle, thereby greatly improving the safety and the accuracy of the boarding vehicle remote driving control method; and secondly, the attention cooperative analysis model is improved in a targeted manner, so that the analysis result is obviously improved, and the control effect of boarding of passengers is further improved.

Drawings

FIG. 1 is a flow chart of the passenger boarding vehicle automatic parking remote driving control method.

Detailed Description

The present invention is described in further detail below with reference to FIG. 1:

step S100: acquiring signals, namely acquiring local image signals and global image signals of the surrounding environment through camera equipment arranged on the boarding vehicle roof, and acquiring real-time voice signals of passengers through voice acquisition equipment in a passenger boarding vehicle;

step S110: the images obtained by the camera comprise normal images and wide-angle high-resolution images with different magnification factors and are used for capturing local image signals and global image signals in road conditionsZ _a Wherein the local image signal is represented asZ _a1 The global image signal is represented asZ _a2 Voice signal obtained by voice signal acquisition moduleZ _b 。

the signal preprocessing of the invention comprises the following steps: for image signal, adopting value normalization method, and for input signal vectorZ _a The pre-processed image signal isX _z = (Z _a -Z _min )/(Z _max -Z _min ) Wherein Z is _min Represents the sameZ _a Minimum value in signal, Z _max Is composed ofZ _a Maximum signal value in the signal respectively obtaining the preprocessed local image signalXAnd a global image signalX ₁ 。

step S311: the input image signal is a preprocessed partial image signal

In whichC×HWhich represents the size of the image,Wrepresenting the number of images;

step S312: for is toXFeature extraction is carried out through a deep convolution feature module to obtain a feature map

In which

Is a function of the maximum value of the signal,

all image block vectors representing the kth position in the W images;

In which

In the formula

Representing the characteristics of the spliced image, k is the same as (1,N),

is represented at a positionkOn the upper partWThe largest image block in a picture is,

splicing functions for image block features;

step S314: vector image block using average pooling

Processing, averaging pooled operating functions

In which

Represents the average pooling function and is the average pooling function,

all image block vectors representing the kth position in the W images; and the image block features after the average pooling are spliced intoAn image feature map

Wherein

，k∈（1，N），

Splicing functions for image block features;

step S315: for the spliced image feature map

And

processing the vector into a one-dimensional vector by a convolution layer, a maximum pooling layer and an average pooling layer, inputting the vector into a full-connection layer, and finally processing the vector by a nonlinear functionsoftmaxThe function, its expression is as follows:

step S320: in the collaborative analysis model, assumptionsZ _a Global image signal inX ₁ The vector processed by the feature extraction module isX _a1 ,Speech signalX ₂ The coded vector isX _a2 Image signal is spliced by vector sequenceX _a1 And coded speech signalX _a2 Are combined into a vectorX _a The module comprises four self-learning matrixes which are respectively used for representing position information, depth information, content information and relevance informationA，Q、K、VVector representation with dimensions of the vectorX _a Multiplication by a weight matrix (linear transformation), the corresponding matrix of which is defined as W _a 、W _q 、W _k 、W _v And obtaining a corresponding self-learning input vector, wherein the self-supervision learning strategy based on the four vectors is as follows:

position information vector in formula A = X _a ×W _a Depth information vector Q = X _a ×W _q Content information vector K = X _a ×W _k And degree of association information vector V = X _a ×W _v ，d _k Is a vectorKDimension of (2), self-attention mechanism function

In the self-attention module. Slave type

Step S322: position information, depth information, content information, and phraseSound correlation degree information vector transformation matrix W _a ，W _q 、W _k And W _v And the model is continuously optimized in the training process of the model, so that the model is ensured to realize the global image signal learning. Based on the attention mechanism module, the self-attention mechanism module is improved in the following mode:

in the formula

Is a splicing function with a total dimension after splicing ofd _model Then, the corresponding parameter matrices are respectively:

and

wherein R is the output variable domain of the self-attention mechanism module,

respectively represent the number of columns of the matrix,

The attention mechanism sub-module outputs.

The attention degrees of the image signal and the voice signal are output, so that the interaction and the cooperative learning of the image signal and the voice signal are realized.

in the formulaW _z This can be achieved by fully connecting layers, representing two vectors multiplied by corresponding elements,b ₀ in order to be a term of the offset,xrepresenting input variables, non-linear piecewise exponential functionsNLEThe expression of (a) is as follows:

the piecewise activation function can improve the model speed and inhibit irrelevant information on one hand, and on the other hand, the function retains the attribute of nonlinear transformation, the piecewise activation function is different from linearity, and the information obtained after screening is further strengthened by adopting exponential segmentation. The piecewise function is applied to the input variablexWhen the slope is negative, a smaller slope is adopted, and the phenomenon that network neurons do not work due to disappearance of gradients in the network is avoided.

the local attention mechanism-based module mainly completes simple task analysis in road conditions, including traffic light identification and pedestrian and obstacle detection. And outputting corresponding waiting and parking state instructions.

The cooperative analysis model learns the global correlation information, controls and infers based on the detection and analysis results, and outputs driving state instructions mainly comprising road switching, acceleration, deceleration, left and right turning and parking point intelligent identification parking points. If the cooperative analysis model detects that the number of the road vehicles is less, switching to a road model with better road condition execution according to the model lane change instruction; if the distance between the vehicle and the front vehicle is detected to be close, the signal corresponds to a deceleration signal, and if the distance between the vehicle and the front vehicle is detected to be far, the signal is an acceleration signal. The information obtained by sorting the different lane information corresponds to lane change information. If the parking point passenger information is detected, a parking instruction is sent out; and if the detected passenger information comprises help seeking, deceleration and parking information, giving a parking instruction by combining the road condition.

In the description of the present invention, unless otherwise expressly specified or limited, the terms "connected" and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; may be directly connected or indirectly connected through an intermediate. Those skilled in the art understand the specific meanings of the above terms in the present invention according to specific situations.

In the description of the present invention, unless otherwise specified, the terms "upper", "lower", "left", "right", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience of description and simplification of description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.

Finally, it should be noted that the above-mentioned technical solution is only one embodiment of the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be easily made based on the application method and principle of the present invention disclosed, and the method is not limited to the above-mentioned specific embodiment of the present invention, so that the above-mentioned embodiment is only preferred, and not restrictive.

Claims

1. A passenger boarding vehicle automatic parking remote driving control method is characterized by comprising the following steps:

step S400: and selecting a corresponding executed state instruction as a remote driving control instruction through a path selection and planning module based on the two outputs of the S300 local information analysis model and the collaborative analysis model.

2. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that step S200: aiming at the image signals, a value normalization method is adopted to respectively obtain the preprocessed local image signalsXAnd a global image signalX ₁ 。

3. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that step S200: for speech signalsZ _b The collected voice signal is vector-coded by adopting a voice vector coding algorithm to obtain a voice signalX ₂ 。

4. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that: the local information analysis model realizes road recognition, pedestrian recognition, vehicle recognition, signal lamp recognition and dynamic obstacle recognition in the driving process.

5. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that the local information analysis model comprises the following specific steps: step S311: the input image signal is a preprocessed partial image signal

WhereinC×HWhich is representative of the size of the image,Wrepresenting the number of images;

Wherein

Is a function of the maximum value of the signal,

all image block vectors representing the kth position in the W images;

Wherein

In the formula

Representing the characteristics of the spliced image, k is the same as (1,N),

splicing functions for image block features;

step S314: vector image block using average pooling

Processing, averaging pooled operating functions

Wherein

Represents the average pooling function of the samples of the sample,

is represented at a positionkUpper part ofWThe image block characteristics of the mean value in an image,

Wherein

，k∈（1，N），

Splicing functions for image block features;

step S315: for the spliced image feature map

And

the vector is processed into a one-dimensional vector by the convolution layer, the maximum pooling layer and the average pooling layer and is input into the full-connection layer.

6. The passenger boarding vehicle automatic parking remote driving control method according to claim 5, wherein step S315 is adopted in a full connection layersoftmaxThe function, its expression is as follows:

in the formula a _i 、a _j Weight, x, representing input vector _i And x _j As an input variable, E ₁ Is the output class number.

7. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that in an image and voice cooperative analysis model: the global image signal and the voice signal of the position of the boarding vehicle are combined for real-time analysis, information extraction is carried out on a global wide-angle image shot by a camera system through a convolutional neural network, and then the voice signal and the image signal obtained after vector-based coding are processed through an automatic supervision coding module at the same time.

8. The passenger boarding vehicle automatic parking remote driving control method according to claim 6, characterized in that: the image and voice cooperative analysis model comprises four self-learning matrixes which are respectively used for representing position information, depth information, content information and relevance information and are respectively used forA，Q、K、VA vector representation; then vector X is added _a Multiplied by a weight matrix, whichThe correspondence matrix is defined as W _a 、W _q 、W _k 、W _v And obtaining a corresponding self-learning input vector, wherein the self-supervision learning strategy based on the four vectors is as follows:

9. The passenger boarding vehicle automatic parking remote driving control method according to claim 7, characterized in that the self-attention mechanism module is improved in the following manner:

in the formula

and

wherein R is the output variable domain of the self-attention mechanism module,

respectively represent the number of columns of the matrix,

The attention mechanism sub-module outputs.

10. The passenger boarding vehicle automatic parking remote driving control method according to claim 1, characterized in that: and outputting a global detection and analysis result through the collaborative analysis model, wherein the output result comprises road information, distance information of vehicles before and after the distance, the number information of vehicles in other lanes, parking point pedestrian condition information and emergency parking information.