CN112464922B - Human-vehicle re-identification and model training method, device, equipment and storage medium thereof - Google Patents

Human-vehicle re-identification and model training method, device, equipment and storage medium thereof

Info

Publication number
CN112464922B
CN112464922B (application CN202110139718.9A)
Authority
CN
China
Prior art keywords
vehicle
feature vector
target
pedestrian
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110139718.9A
Other languages
Chinese (zh)
Other versions
CN112464922A (en)
Inventor
闾凡兵
吴蕊
姚胜
曹达
秦拯
曾海文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Hisense Intelligent System Research Institute Co ltd
Original Assignee
Changsha Hisense Intelligent System Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Hisense Intelligent System Research Institute Co ltd
Priority to CN202110139718.9A
Publication of CN112464922A
Application granted
Publication of CN112464922B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a human-vehicle re-identification method, a model training method, and corresponding devices, equipment and storage media. The human-vehicle re-identification model training method comprises the following steps: obtaining a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image; and training a pre-established initial human-vehicle re-identification model with the training samples until the loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, so as to obtain a target human-vehicle re-identification model. The initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron. When the initial human-vehicle re-identification model is trained, the features of at least two sample images can reinforce each other, the feature information of the first object in different sample images is fully utilized, and the loss of feature information caused by changes in shooting state is mitigated, thereby improving the human-vehicle re-identification effect.

Description

Human-vehicle re-identification and model training method, device, equipment and storage medium thereof
Technical Field
The application belongs to the field of information technology, and in particular relates to a human-vehicle re-identification method, a model training method, and corresponding devices, equipment and storage media.
Background
With the development of artificial intelligence, machine vision recognition is increasingly applied in daily life; for example, pedestrians may be recognized in video images acquired by imaging equipment. To obtain the movement track of a pedestrian, pedestrian re-identification technology is generally used, in which the same pedestrian is identified across images acquired by different imaging devices.
Since different imaging devices may have different shooting angles, the state (such as length, width or angle) of the same pedestrian may differ between images, and when the pedestrian is riding, the state of the pedestrian-vehicle combination (hereinafter, the human-vehicle) is even more variable. When performing human-vehicle re-identification, the prior art is affected by this variability of the human-vehicle state and often yields poor identification results.
Disclosure of Invention
The embodiments of the application provide a human-vehicle re-identification method, a model training method, and corresponding devices, equipment and storage media, which address the problem of poor identification results in prior-art human-vehicle re-identification.
In a first aspect, an embodiment of the present application provides a training method for a human-vehicle re-identification model, including:
obtaining a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image, the first sample image and the second sample image are obtained by shooting a first object in different shooting states, and the first object comprises a pedestrian and a vehicle that are matched with each other;
training a pre-established initial human-vehicle re-identification model with the training samples until the loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, so as to obtain a target human-vehicle re-identification model;
wherein the initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron; the input end of the target detection model is used for receiving the first sample image and the second sample image, the output end of the target detection model is connected to the input end of the first initial multilayer perceptron and the input end of the second initial multilayer perceptron respectively, the output end of the first initial multilayer perceptron is used for outputting a pedestrian feature vector associated with the pedestrian, and the output end of the second initial multilayer perceptron is used for outputting a vehicle feature vector associated with the vehicle; and the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
In a second aspect, an embodiment of the present application provides a human-vehicle re-identification method, including:
acquiring a target image, wherein the target image is obtained by shooting a second object, and the second object comprises a pedestrian and a vehicle that are matched with each other;
inputting the target image into a target human-vehicle re-identification model to obtain a third pedestrian feature vector and a third vehicle feature vector, wherein the third pedestrian feature vector and the third vehicle feature vector are respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle re-identification model comprises a target detection model, a first target multilayer perceptron and a second target multilayer perceptron; the input end of the target detection model is used for receiving the target image, the output end of the target detection model is connected to the input end of the first target multilayer perceptron and the input end of the second target multilayer perceptron respectively, the output end of the first target multilayer perceptron is used for outputting the third pedestrian feature vector, and the output end of the second target multilayer perceptron is used for outputting the third vehicle feature vector;
and identifying the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain an identification result.
In a third aspect, an embodiment of the present application provides a training device for a human-vehicle re-identification model, including:
a first acquisition module, configured to acquire a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image, the first sample image and the second sample image are obtained by shooting a first object in different shooting states, and the first object comprises a pedestrian and a vehicle that are matched with each other;
a training module, configured to train a pre-established initial human-vehicle re-identification model with the training samples until the loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, so as to obtain a target human-vehicle re-identification model;
wherein the initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron; the input end of the target detection model is used for receiving the first sample image and the second sample image, the output end of the target detection model is connected to the input end of the first initial multilayer perceptron and the input end of the second initial multilayer perceptron respectively, the output end of the first initial multilayer perceptron is used for outputting a pedestrian feature vector associated with the pedestrian, and the output end of the second initial multilayer perceptron is used for outputting a vehicle feature vector associated with the vehicle; and the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
In a fourth aspect, an embodiment of the present application provides a human-vehicle re-identification apparatus, including:
a second acquisition module, configured to acquire a target image, wherein the target image is obtained by shooting a second object, and the second object comprises a pedestrian and a vehicle that are matched with each other;
a third acquisition module, configured to input the target image into a target human-vehicle re-identification model to obtain a third pedestrian feature vector and a third vehicle feature vector, the third pedestrian feature vector and the third vehicle feature vector being respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle re-identification model comprises a target detection model, a first target multilayer perceptron and a second target multilayer perceptron; the input end of the target detection model is used for receiving the target image, the output end of the target detection model is connected to the input end of the first target multilayer perceptron and the input end of the second target multilayer perceptron respectively, the output end of the first target multilayer perceptron is used for outputting the third pedestrian feature vector, and the output end of the second target multilayer perceptron is used for outputting the third vehicle feature vector;
and an identification module, configured to identify the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain an identification result.
In a fifth aspect, an embodiment of the present application provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the human-vehicle re-identification model training method of the first aspect, or the human-vehicle re-identification method of the second aspect.
In a sixth aspect, the present application provides a computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the human-vehicle re-identification model training method of the first aspect, or the human-vehicle re-identification method of the second aspect.
The human-vehicle re-identification model training method of the present application trains a pre-established initial human-vehicle re-identification model with a plurality of training samples, each training sample comprising a first sample image and a second sample image obtained by shooting a first object in different shooting states. The initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron: a training sample can be input into the target detection model, one output path of the target detection model can serve as the input of the first initial multilayer perceptron, whose output yields a pedestrian feature vector, and the other output path can serve as the input of the second initial multilayer perceptron, whose output yields a vehicle feature vector. When the loss value of the loss function, obtained based on the pedestrian feature vector and the vehicle feature vector, meets a preset condition, the target human-vehicle re-identification model is obtained. When the initial human-vehicle re-identification model is trained, the features of at least two sample images can reinforce each other, the feature information of the first object in different sample images is fully utilized, and the loss of feature information caused by changes in shooting state is mitigated, thereby improving the human-vehicle re-identification effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a human-vehicle re-identification model training method provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an initial human-vehicle re-identification model in an embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining the loss value of the loss function in an embodiment of the present application;
FIG. 4 is a schematic flow chart of a human-vehicle re-identification method provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of extracting image features of a target image in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a human-vehicle re-identification model training device provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a human-vehicle re-identification apparatus provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to a further embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In order to solve the prior art problems, embodiments of the present application provide a human-vehicle re-identification method, a model training method, and corresponding devices, equipment and storage media. The human-vehicle re-identification model training method provided by the embodiments is introduced first.
Fig. 1 shows a flowchart of a human-vehicle re-identification model training method according to an embodiment of the present application. As shown in Fig. 1, the method includes:
Step 101, obtaining a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image, the first sample image and the second sample image are obtained by shooting a first object in different shooting states, and the first object comprises a pedestrian and a vehicle that are matched with each other;
Step 102, training a pre-established initial human-vehicle re-identification model with the training samples until the loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, so as to obtain a target human-vehicle re-identification model;
wherein the initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron; the input end of the target detection model is used for receiving the first sample image and the second sample image, the output end of the target detection model is connected to the input end of the first initial multilayer perceptron and the input end of the second initial multilayer perceptron respectively, the output end of the first initial multilayer perceptron is used for outputting a pedestrian feature vector associated with the pedestrian, and the output end of the second initial multilayer perceptron is used for outputting a vehicle feature vector associated with the vehicle; and the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
In this embodiment, each training sample includes a first sample image and a second sample image; that is, each training sample may include at least two sample images. For simplicity, the following description focuses on how the sample images in a single training sample are acquired.
A sample image may be obtained by a camera or another type of imaging device; it may specifically be a photograph, or an image frame from a video, which is not limited herein. At least two sample images in the same training sample can be obtained by shooting the first object in different shooting states; different shooting states here may mean different shooting angles, shooting distances or shooting parameters, so as to cover the practical situations in which imaging devices are installed at different positions or angles, or have different shooting performance. It is easy to understand that, when training the initial human-vehicle re-identification model, the multiple sample images in one training sample may be captured by the same imaging device or by different imaging devices, as long as the shooting states corresponding to different sample images differ to some degree.
In a specific application scene, the sample images in a training sample can be obtained by shooting a riding pedestrian (hereinafter, the human-vehicle). In one example, the first sample image may be captured when the human-vehicle is directly in front of the camera, and the second sample image may be captured when the human-vehicle is about to leave the camera's shooting range; in another example, the first sample image may be captured by a camera to the side of the human-vehicle, and the second sample image by another camera in front of it.
It is easy to understand that the multiple sample images in a training sample are obtained by shooting the same object, namely the first object. In this embodiment, the first object includes both a pedestrian and a vehicle; the vehicle may be a bicycle, an electric bicycle, a wheelchair or the like, which is not limited herein. That the pedestrian and the vehicle are matched with each other can be intuitively understood as there being interaction between them, such as the pedestrian riding or pushing the vehicle.
The training samples can be input into the pre-established initial human-vehicle re-identification model so as to train it.
Referring to Fig. 2, in this embodiment the initial human-vehicle re-identification model mainly includes a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron. The target detection model can be used to detect the pedestrian and the vehicle in a sample image; from another perspective, it separates the sample image into a pedestrian part and a vehicle part. The target detection model may be an existing deep learning model, such as a Faster R-CNN, SSD or YOLO model; specifically, it may be a pre-trained deep learning model that can be used directly for pedestrian and vehicle detection. The architecture and training of these target detection models can be implemented based on the prior art and are not described again here.
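As an illustrative sketch only (not the patent's implementation), the separation step can be pictured as cropping the pedestrian region and the vehicle region out of a frame using the class labels and bounding boxes that a pretrained detector (e.g. Faster R-CNN, SSD or YOLO) would produce; the detector call itself is omitted and its output is mocked with hypothetical boxes:

```python
import numpy as np

def split_pedestrian_vehicle(image, detections):
    """Return (pedestrian_crop, vehicle_crop) from a list of detections.

    Each detection is a dict: {"label": "pedestrian" | "vehicle",
                               "box": (x1, y1, x2, y2)} in pixel coordinates.
    """
    crops = {}
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        crops[det["label"]] = image[y1:y2, x1:x2]   # crop the detected region
    return crops.get("pedestrian"), crops.get("vehicle")

frame = np.zeros((240, 320, 3), dtype=np.uint8)          # dummy camera frame
dets = [{"label": "pedestrian", "box": (100, 20, 160, 220)},
        {"label": "vehicle",    "box": (80, 120, 200, 230)}]
ped, veh = split_pedestrian_vehicle(frame, dets)
print(ped.shape, veh.shape)   # (200, 60, 3) (110, 120, 3)
```

In practice the two parts would be the detector's region features rather than raw crops, but the interface is the same: one image in, a pedestrian part and a vehicle part out.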
Both the first initial multilayer perceptron and the second initial multilayer perceptron are multilayer perceptrons (MLPs). An MLP is generally a feedforward artificial neural network model that maps multiple input data sets onto a single output data set; specifically, in the present application, an MLP converts the output of the target detection model into a corresponding feature vector. In this embodiment, the first and second initial multilayer perceptrons are used to output the pedestrian feature vector and the vehicle feature vector respectively. In conjunction with Fig. 2, for simplicity, the multilayer perceptron that outputs pedestrian feature vectors is hereinafter referred to as MLP1, and the one that outputs vehicle feature vectors as MLP2.
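A minimal sketch of such an MLP forward pass, assuming a plain feedforward network with ReLU hidden layers and a linear output layer (the dimensions 512/256/128 are illustrative, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights, biases):
    """Forward pass of a feedforward MLP: ReLU hidden layers, linear output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)        # hidden layer with ReLU
    return h @ weights[-1] + biases[-1]        # final embedding (feature vector)

# Hypothetical MLP1: maps a 512-d pedestrian feature to a 128-d feature vector.
dims = [512, 256, 128]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

pedestrian_feature = rng.standard_normal(512)
vec = mlp_forward(pedestrian_feature, weights, biases)
print(vec.shape)   # (128,)
```

MLP2 would have the same shape of computation, just its own weights, so pedestrian features and vehicle features are mapped by separate networks.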
The first initial multilayer perceptron may be regarded as an insufficiently trained MLP1, denoted the initial MLP1; similarly, the second initial multilayer perceptron is denoted the initial MLP2. In this embodiment, training the initial human-vehicle re-identification model can be regarded as adjusting the network parameters of the initial MLP1 and the initial MLP2, with the adjustment guided by the loss value of the loss function in the initial human-vehicle re-identification model.
In this embodiment, each training sample used for training the initial human-vehicle re-identification model contains a first sample image and a second sample image, denoted image u and image v respectively. During training, image u and image v are each input into the target detection model, whose output is the pedestrian part and the vehicle part of each image; the initial MLP1 receives the pedestrian part output by the target detection model and outputs the pedestrian feature vector corresponding to each image, while the initial MLP2 receives the vehicle part and outputs the vehicle feature vector corresponding to each image.
In addition, in this embodiment the loss value of the loss function may be calculated based on the pedestrian feature vectors and the vehicle feature vectors. For example, for image u and image v, the initial MLP1 outputs two pedestrian feature vectors in total; since the pedestrian in image u and the pedestrian in image v of the same training sample are the same person, the difference between the two pedestrian feature vectors should in theory be small. Therefore, the loss function can be designed so that a small difference between the two pedestrian feature vectors yields a correspondingly small loss value; the same holds for the two vehicle feature vectors output by the initial MLP2. Of course, these are only examples of determining the loss value of the loss function based on the pedestrian and vehicle feature vectors; in practical applications, the manner of determining the loss value can be chosen according to actual needs.
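One possible (hypothetical, not the patent's actual loss function) instance of such a loss simply sums the squared distances between the two pedestrian feature vectors and between the two vehicle feature vectors of the same training sample, so that matching pairs drive the loss toward zero:

```python
import numpy as np

def pair_loss(u_p, v_p, u_b, v_b):
    """Loss decreases as the pedestrian vectors (u_p, v_p) and vehicle
    vectors (u_b, v_b) of the same training sample move closer together."""
    d_ped = np.sum((u_p - v_p) ** 2)   # pedestrian-vector distance
    d_veh = np.sum((u_b - v_b) ** 2)   # vehicle-vector distance
    return d_ped + d_veh

u_p = np.array([1.0, 0.0]); v_p = np.array([1.0, 0.1])
u_b = np.array([0.0, 1.0]); v_b = np.array([0.2, 1.0])
# identical vectors -> zero loss; small differences -> small loss
print(pair_loss(u_p, u_p, u_b, u_b), round(float(pair_loss(u_p, v_p, u_b, v_b)), 2))
```

A real re-identification loss would typically also push vectors of *different* identities apart (e.g. a contrastive or triplet term); this sketch shows only the "same-sample vectors should agree" part described in the text.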
When training of the initial human-vehicle re-identification model is finished, the target human-vehicle re-identification model is obtained. Generally, the criterion for completion is that the loss value of the loss function satisfies a preset condition; for example, training may be considered complete when the loss value falls below a loss threshold. Alternatively, if the number of training samples is limited, training may be considered complete when the loss value falls within a preset acceptable range.
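The stopping logic just described can be sketched as a generic loop (a hypothetical skeleton; `step_fn` stands in for one optimization pass over the training samples, and the threshold and epoch budget are illustrative):

```python
def train(step_fn, loss_threshold=1e-3, max_epochs=100):
    """Run step_fn() until its returned loss drops below loss_threshold
    (the preset condition) or the epoch budget is exhausted (the fallback)."""
    for epoch in range(max_epochs):
        loss = step_fn()
        if loss < loss_threshold:
            return epoch, loss          # preset condition met
    return max_epochs, loss             # fell back to the epoch budget

# Toy stand-in for an optimization step: the loss halves every call.
state = {"loss": 1.0}
def toy_step():
    state["loss"] *= 0.5
    return state["loss"]

epochs_used, final_loss = train(toy_step)
print(epochs_used, final_loss < 1e-3)   # 9 True
```

In a real training run, `step_fn` would forward the sample images through the target detection model and both MLPs, compute the loss from the resulting feature vectors, and update the MLP parameters by backpropagation.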
The human-vehicle re-identification model training method of the present application trains a pre-established initial human-vehicle re-identification model with a plurality of training samples, each training sample comprising a first sample image and a second sample image obtained by shooting a first object in different shooting states. The initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron and a second initial multilayer perceptron: a training sample can be input into the target detection model, one output path of the target detection model can serve as the input of the first initial multilayer perceptron, whose output yields a pedestrian feature vector, and the other output path can serve as the input of the second initial multilayer perceptron, whose output yields a vehicle feature vector. When the loss value of the loss function, obtained based on the pedestrian feature vector and the vehicle feature vector, meets a preset condition, the target human-vehicle re-identification model is obtained. When the initial human-vehicle re-identification model is trained, the features of at least two sample images can reinforce each other, the feature information of the first object in different sample images is fully utilized, and the loss of feature information caused by changes in shooting state is mitigated, thereby improving the human-vehicle re-identification effect.
Optionally, in step 102, training the pre-established initial human-vehicle re-identification model with the training samples until the loss value of the loss function in the initial human-vehicle re-identification model meets the preset condition, to obtain the target human-vehicle re-identification model, includes:
inputting the first sample image into the target detection model to obtain a first pedestrian feature and a first vehicle feature, and inputting the second sample image into the target detection model to obtain a second pedestrian feature and a second vehicle feature;
inputting the first pedestrian feature and the second pedestrian feature into the first initial multilayer perceptron to obtain a first pedestrian feature vector and a second pedestrian feature vector respectively, and inputting the first vehicle feature and the second vehicle feature into the second initial multilayer perceptron to obtain a first vehicle feature vector and a second vehicle feature vector respectively;
determining the loss value of the loss function according to the first pedestrian feature vector, the second pedestrian feature vector, the first vehicle feature vector and the second vehicle feature vector;
adjusting the network parameters of the first initial multilayer perceptron and of the second initial multilayer perceptron according to the loss value of the loss function until the loss value meets the preset condition, thereby obtaining a first target multilayer perceptron and a second target multilayer perceptron; the target human-vehicle re-identification model comprises the first target multilayer perceptron and the second target multilayer perceptron.
This embodiment can be regarded as a further specification of the training process of the initial human-vehicle re-identification model. As noted above, the training process may be regarded as adjusting the network parameters of the initial MLP1 and the initial MLP2, with the loss value of the loss function in the initial human-vehicle re-identification model as the basis for the adjustment.
In this embodiment, after the first sample image, i.e., the above-mentioned image u, is input into the target detection model, the first pedestrian feature and the first vehicle feature can be obtained, recorded as u_f1 and u_f2 respectively; accordingly, after the second sample image, i.e., the above-mentioned image v, is input into the target detection model, the second pedestrian feature and the second vehicle feature can be obtained, recorded as v_f1 and v_f2 respectively.
u_f1 and v_f1 can be respectively input into the initial MLP1, which outputs the first pedestrian feature vector and the second pedestrian feature vector, recorded as u_p and v_p; similarly, u_f2 and v_f2 can be respectively input into the initial MLP2 to obtain the first vehicle feature vector u_b and the second vehicle feature vector v_b. From another perspective, the above process of inputting each type of feature into the corresponding multi-layer perceptron to obtain a feature vector can be regarded as mapping the different types of features into feature spaces of the corresponding types through the multi-layer perceptrons. Different multi-layer perceptrons are used for the two feature types (pedestrian features and vehicle features) so that they are mapped distinctly, while features of the same type from different sample images in the same training sample are mapped consistently by sharing the same multi-layer perceptron.
In an actual application scenario, the process of inputting each type of feature into the corresponding multi-layer perceptron to obtain the feature vector can be expressed by the following formulas:
u_p = ReLU(W_1 · u_f1 + b_1),  v_p = ReLU(W_1 · v_f1 + b_1)
u_b = ReLU(W_2 · u_f2 + b_2),  v_b = ReLU(W_2 · v_f2 + b_2)
where ReLU is the activation function, W_1 and W_2 are the weight parameters in the initial MLP1 and the initial MLP2 respectively, and b_1 and b_2 are the biases in the initial MLP1 and the initial MLP2 respectively; both the weight parameters and the biases may be part of the network parameters.
Of course, this is merely an illustration of one application scenario; the activation function or architecture used by the above multi-layer perceptrons may be set according to actual needs.
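As a concrete illustration of the formulas above, the mapping can be sketched as follows; the dimensions and random weights are toy assumptions, not values from the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(4)  # MLP1 parameters (pedestrian branch)
W2, b2 = rng.normal(size=(4, 8)), np.zeros(4)  # MLP2 parameters (vehicle branch)

u_f1 = rng.normal(size=8)  # pedestrian feature of image u (from the detector)
u_f2 = rng.normal(size=8)  # vehicle feature of image u

u_p = relu(W1 @ u_f1 + b1)  # first pedestrian feature vector
u_b = relu(W2 @ u_f2 + b2)  # first vehicle feature vector
```

The features of image v (v_f1, v_f2) would be passed through the same two perceptrons in exactly the same way to obtain v_p and v_b.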
The following illustrates possible practical meanings of the loss values of the loss function. For example, in a training sample, the first object in each sample image may have the same label or identifier, i.e., the pedestrians in different sample images may actually be the same person and the vehicles in different sample images may actually be the same vehicle; therefore, when the difference between u_p and v_p is small, or the difference between u_b and v_b is small, the corresponding loss can be considered small. As another example, in the same sample image, the pedestrian and the vehicle correspond to two different types of features; therefore, when the difference between u_p and u_b is large, or the difference between v_p and v_b is large, the corresponding loss can also be considered small. In some application scenarios, the above loss can be considered a pixel-level loss.
As can be seen from the above examples, the loss value of the loss function may be comprehensively calculated based on u_p, u_b, v_p and v_b. In this embodiment, the network parameters of the initial MLP1 and the initial MLP2 may be adjusted based on the calculated loss value until the loss value of the loss function meets the preset condition, so as to obtain the target human-vehicle weight recognition model. The first target multi-layer perceptron and the second target multi-layer perceptron included in the target human-vehicle weight recognition model can be recorded as target MLP1 and target MLP2 respectively; it is easily understood that target MLP1 can essentially be considered the initial MLP1 after network parameter adjustment, and target MLP2 the initial MLP2 after network parameter adjustment.
In this embodiment, the first pedestrian feature and the second pedestrian feature are input into the first initial multi-layer perceptron to obtain the feature vectors associated with the pedestrian, and the first vehicle feature and the second vehicle feature are input into the second initial multi-layer perceptron to obtain the feature vectors associated with the vehicle; that is, using different multi-layer perceptrons to extract feature vectors for the different feature types can improve the extraction precision of each type of feature vector. Moreover, since the loss value of the loss function comprehensively considers the first pedestrian feature vector, the second pedestrian feature vector, the first vehicle feature vector and the second vehicle feature vector, mutual enhancement of features between different sample images can be realized, which is beneficial to improving the recognition effect of the obtained target human-vehicle weight recognition model.
In order to further improve the effect of mutual enhancement of features between different sample images in the same training sample, optionally, the loss value of the loss function includes a first loss value, a second loss value, a third loss value, and a fourth loss value;
the first loss value is determined based on the first pedestrian feature vector and the first vehicle feature vector;
the second loss value is determined based on the first pedestrian feature vector and the second pedestrian feature vector;
a third loss value is determined based on the second pedestrian feature vector and the second vehicle feature vector;
the fourth loss value is determined based on the first vehicle feature vector and the second vehicle feature vector.
The pedestrian and the vehicle in the first object can be regarded as two sub-objects. In this embodiment, both the loss caused by the difference between the same sub-object in different sample images and the loss caused by the difference between different sub-objects in the same sample image are considered.
Specifically, the first loss value and the third loss value may be considered to be loss values due to differences between different sub-objects in the same sample image; and the second loss value and the fourth loss value can be regarded as loss values caused by differences between the same sub-objects in different sample images.
Generally, in different sample images of the same training sample, the pedestrian sub-object indicates the same pedestrian and the vehicle sub-object indicates the same vehicle. In the training process, the pedestrian feature vectors (and likewise the vehicle feature vectors) obtained from different sample images should therefore be as consistent as possible, while within the same sample image there should be enough difference between the pedestrian feature vector and the vehicle feature vector to represent the difference between the two types of sub-objects.
Based on the above considerations, in one example, in conjunction with FIG. 3, the first loss value Loss_u, the second loss value Loss_pp, the third loss value Loss_v and the fourth loss value Loss_bb are respectively calculated according to the following formulas:
Loss_u = -||u_p - u_b||
Loss_pp = ||u_p - v_p||
Loss_v = -||v_p - v_b||
Loss_bb = ||u_b - v_b||
where ||·|| denotes a norm calculation; as shown in the above embodiment, u_p is the first pedestrian feature vector, u_b is the first vehicle feature vector, v_p is the second pedestrian feature vector, and v_b is the second vehicle feature vector.
For the above norm operation, the specific norm type is not limited here; any norm that can represent the difference between two feature vectors may be used. The loss value of the loss function may be calculated by combining the above four loss values, for example by adding them directly, or by adding them after multiplying each by a specific coefficient, which is not limited here.
Based on the above formulas, in this example, the loss value of the loss function is positively correlated with the difference between the same sub-object in different sample images, and negatively correlated with the difference between different sub-objects in the same sample image; the sub-objects here refer to the pedestrian and the vehicle included in the first object.
Of course, in some possible embodiments, the loss value of the loss function may also take into account the degree of difference between different sub-objects in different sample images.
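The four loss values can be sketched numerically. Since the patent's exact formula is rendered as an image in this text, the sketch below uses one plausible reading consistent with the stated correlations; the Euclidean norm, the toy feature values, and the direct unweighted sum are all assumptions.

```python
import numpy as np

# Feature vectors for one training sample (toy values).
u_p, u_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # image u: pedestrian, vehicle
v_p, v_b = np.array([0.9, 0.1]), np.array([0.1, 0.9])  # image v: pedestrian, vehicle

loss_pp = np.linalg.norm(u_p - v_p)   # same pedestrian, different images: small gap wanted
loss_bb = np.linalg.norm(u_b - v_b)   # same vehicle, different images: small gap wanted
loss_u = -np.linalg.norm(u_p - u_b)   # different sub-objects in image u: large gap wanted
loss_v = -np.linalg.norm(v_p - v_b)   # different sub-objects in image v: large gap wanted

total_loss = loss_pp + loss_bb + loss_u + loss_v  # direct unweighted sum (one option)
```

With these toy vectors the cross-image terms are small and the cross-type terms are large, so the total loss is low, matching the intuition described above.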
As shown in fig. 4, an embodiment of the present application further provides a method for identifying a human-vehicle weight, including:
step 401, acquiring a target image, wherein the target image is obtained by shooting a second object, and the second object comprises a pedestrian and a vehicle which are matched with each other;
step 402, inputting the target image into a target human-vehicle weight recognition model to obtain a third pedestrian feature vector and a third vehicle feature vector, wherein the third pedestrian feature vector and the third vehicle feature vector are respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle weight recognition model comprises a target detection model, a first target multi-layer perceptron and a second target multi-layer perceptron; the input end of the target detection model is used for receiving the target image, the output ends of the target detection model are respectively connected to the input end of the first target multi-layer perceptron and the input end of the second target multi-layer perceptron, the output end of the first target multi-layer perceptron is used for outputting the third pedestrian feature vector, and the output end of the second target multi-layer perceptron is used for outputting the third vehicle feature vector;
step 403, identifying the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain a recognition result.
In this embodiment, the target image can be considered an image to be recognized by the target human-vehicle weight recognition model. Specifically, the target image is obtained by shooting a second object, and the second object likewise comprises a pedestrian and a vehicle that are matched with each other; "matched with each other" means there is some interaction between the pedestrian and the vehicle, for example, the pedestrian rides the vehicle or pushes the vehicle. The designations "second object" here and "first object" in the above embodiments mainly serve to distinguish the training images from the image to be recognized; in practical applications, the vehicle and person specifically indicated by the first object and the second object may be the same or different.
The target human-vehicle weight recognition model can comprise a target detection model, a first target multi-layer perceptron and a second target multi-layer perceptron; the input end of the target detection model can be used for receiving the target image and detecting the pedestrian and the vehicle in the target image. In combination with the above embodiments, the target detection model may be an existing deep learning model, such as a Faster R-CNN, SSD, or YOLO model; specifically, it may be a pre-trained deep learning model that can be directly used for detecting pedestrians and vehicles. The framework establishment and training of such target detection models can be implemented based on the prior art and are not described again here.
The first target multi-layer perceptron and the second target multi-layer perceptron are both multi-layer perceptrons; for convenience of description, they may be referred to as target MLP1 and target MLP2 respectively. Target MLP1 may be configured to receive one output of the target detection model and output a third pedestrian feature vector associated with the pedestrian included in the second object; similarly, target MLP2 may be configured to receive the other output of the target detection model and output a third vehicle feature vector associated with the vehicle included in the second object.
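The structure just described can be sketched end to end. The stub detector below merely stands in for the trained detection model (its real counterpart would be, e.g., a Faster R-CNN); the dimensions, weights, and class shape are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def stub_detector(image):
    """Stand-in for the trained target detection model: returns one pedestrian
    feature and one vehicle feature for the image (here, fixed slices)."""
    return image[:8], image[8:16]

class TargetReIDModel:
    """Target model: the detector output feeds two separate target perceptrons."""
    def __init__(self, rng):
        self.W1, self.b1 = rng.normal(size=(4, 8)), np.zeros(4)  # target MLP1
        self.W2, self.b2 = rng.normal(size=(4, 8)), np.zeros(4)  # target MLP2

    def __call__(self, image):
        ped_feat, veh_feat = stub_detector(image)
        x_p = relu(self.W1 @ ped_feat + self.b1)  # third pedestrian feature vector
        x_b = relu(self.W2 @ veh_feat + self.b2)  # third vehicle feature vector
        return x_p, x_b

rng = np.random.default_rng(2)
model = TargetReIDModel(rng)
x_p, x_b = model(rng.normal(size=16))
```

The two returned vectors correspond to the two output ends described above and are what step 403 consumes.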
After the third pedestrian feature vector and the third vehicle feature vector are obtained, the second object may be identified according to them, for example based on similarity matching, or based on a classification model, and the like, which is not limited here.
According to the human-vehicle weight recognition method provided by the embodiments of the present application, a target image obtained by shooting a second object is input into a target human-vehicle weight recognition model, where the second object comprises a pedestrian and a vehicle that are matched with each other. The target human-vehicle weight recognition model comprises a target detection model, a first target multi-layer perceptron and a second target multi-layer perceptron: the target detection model receives the target image, one of its outputs is input into the first target multi-layer perceptron to obtain a third pedestrian feature vector, and the other output is input into the second target multi-layer perceptron to obtain a third vehicle feature vector; the second object is then identified according to the third pedestrian feature vector and the third vehicle feature vector to obtain a recognition result. In this way, extraction of pedestrian features and vehicle features is realized based on the target detection model, the two types of feature vectors are obtained using their respective multi-layer perceptrons, and the second object is identified according to these two feature vectors. Because the overall appearance of the person and vehicle differs greatly across shooting states, this embodiment avoids recognizing the person and vehicle as a whole, which helps reduce the requirements on training samples when obtaining the target human-vehicle weight recognition model and makes its recognition results more accurate.
Optionally, in step 403, recognizing the second object according to the third pedestrian feature vector and the third vehicle feature vector, and obtaining a recognition result, including:
splicing the third pedestrian feature vector and the third vehicle feature vector to obtain the target image feature of the target image;
comparing the similarity of the target image characteristic with each preset image characteristic in a preset image characteristic library to obtain a comparison result;
and identifying the second object according to the comparison result to obtain an identification result.
The present embodiment is described below with reference to a specific application example, in which the target detection model may be Faster R-CNN.
With reference to FIG. 5, an image base (gallery) may be prepared in advance. The target human-vehicle weight recognition model receives any image in the query set (or the gallery) as the target image x_0. First, target detection (feature extraction) is performed through Faster R-CNN; then the pedestrian target and the vehicle target are respectively mapped to two feature spaces using the trained target MLP1 and target MLP2, obtaining the pedestrian feature vector x_p and the vehicle feature vector x_b; the two output vectors are then spliced into (x_p, x_b) to obtain the final image feature x of the target image x_0. During recognition, the image feature x is compared for similarity with the features of all images in the gallery, and a retrieval result (corresponding to the comparison result) is finally returned according to the similarity, realizing the identification of the second object.
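The splice-and-compare retrieval step can be sketched as follows. The gallery contents are toy data, and cosine similarity is an assumed choice of comparison metric (the text does not fix a specific similarity measure).

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Query feature x: concatenation (x_p, x_b) of pedestrian and vehicle vectors.
x_p = np.array([1.0, 0.0, 0.0])
x_b = np.array([0.0, 1.0, 0.0])
x = np.concatenate([x_p, x_b])

# Gallery of precomputed spliced features, keyed by image id (toy values).
gallery = {
    "img_a": np.array([0.9, 0.1, 0.0, 0.1, 0.9, 0.0]),
    "img_b": np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0]),
}

scores = {name: cosine_sim(x, feat) for name, feat in gallery.items()}
best = max(scores, key=scores.get)  # retrieval result: most similar gallery image
```

Here `img_a` has nearly the same pedestrian and vehicle directions as the query and is returned as the match, while `img_b` scores near zero.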
To facilitate understanding of the relationship between the human-vehicle weight recognition method provided in this embodiment and the human-vehicle weight recognition model training method in the foregoing embodiment, it may be simply considered that the target human-vehicle weight recognition model in this embodiment is obtained by training the initial human-vehicle weight recognition model based on the human-vehicle weight recognition model training method.
Combining the process of training to obtain the target human-vehicle weight recognition model with the process of recognizing the target image using it, the embodiments of the present application provide a human-vehicle weight recognition method based on a deep learning model, which makes full use of the information of person-vehicle pictures under different cameras to compensate for information loss caused by angle changes and the like, enhances the picture information at the levels of appearance, angle and so on, and improves the detection and retrieval recognition performance.
As shown in FIG. 6, an embodiment of the present application further provides a human-vehicle weight recognition model training apparatus, including:
the first obtaining module 601 is configured to obtain a plurality of training samples, where each training sample includes a first sample image and a second sample image, and the first sample image and the second sample image are obtained by shooting a first object in different shooting states, where the first object includes a pedestrian and a vehicle that are matched with each other;
the training module 602 is configured to train a pre-established initial human-vehicle weight recognition model by using a training sample until a loss value of a loss function in the initial human-vehicle weight recognition model meets a preset condition, so as to obtain a target human-vehicle weight recognition model;
the initial human-vehicle weight recognition model comprises a target detection model, a first initial multi-layer perceptron and a second initial multi-layer perceptron; the input end of the target detection model is used for receiving the first sample image and the second sample image, the output ends of the target detection model are respectively connected to the input end of the first initial multi-layer perceptron and the input end of the second initial multi-layer perceptron, the output end of the first initial multi-layer perceptron is used for outputting a pedestrian feature vector associated with the pedestrian, and the output end of the second initial multi-layer perceptron is used for outputting a vehicle feature vector associated with the vehicle; the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
Optionally, the training module 602 may include:
the first acquisition unit is used for inputting the first sample image into the target detection model to obtain a first pedestrian characteristic and a first vehicle characteristic, and inputting the second sample image into the target detection model to obtain a second pedestrian characteristic and a second vehicle characteristic;
the second acquisition unit is used for respectively inputting the first pedestrian characteristic and the second pedestrian characteristic into the first initial multilayer perceptron to obtain a first pedestrian characteristic vector and a second pedestrian characteristic vector, and respectively inputting the first vehicle characteristic and the second vehicle characteristic into the second initial multilayer perceptron to obtain a first vehicle characteristic vector and a second vehicle characteristic vector;
the determining unit is used for determining a loss value of the loss function according to the first pedestrian characteristic vector, the second pedestrian characteristic vector, the first vehicle characteristic vector and the second vehicle characteristic vector;
the adjusting unit is used for adjusting the network parameters of the first initial multi-layer perceptron and the network parameters of the second initial multi-layer perceptron according to the loss value of the loss function until the loss value of the loss function meets the preset condition, so as to obtain a first target multi-layer perceptron and a second target multi-layer perceptron; the target human-vehicle weight recognition model comprises the first target multi-layer perceptron and the second target multi-layer perceptron.
Optionally, the loss values of the loss function include a first loss value, a second loss value, a third loss value, and a fourth loss value;
the first loss value is determined based on the first pedestrian feature vector and the first vehicle feature vector;
the second loss value is determined based on the first pedestrian feature vector and the second pedestrian feature vector;
a third loss value is determined based on the second pedestrian feature vector and the second vehicle feature vector;
the fourth loss value is determined based on the first vehicle feature vector and the second vehicle feature vector.
Optionally, the first loss value Loss_u, the second loss value Loss_pp, the third loss value Loss_v and the fourth loss value Loss_bb are respectively calculated according to the following formulas:
Loss_u = -||u_p - u_b||
Loss_pp = ||u_p - v_p||
Loss_v = -||v_p - v_b||
Loss_bb = ||u_b - v_b||
where ||·|| denotes a norm calculation, u_p is the first pedestrian feature vector, u_b is the first vehicle feature vector, v_p is the second pedestrian feature vector, and v_b is the second vehicle feature vector.
It should be noted that the human-vehicle weight recognition model training device is a device corresponding to the human-vehicle weight recognition model training method, and all implementation manners in the method embodiment are applicable to the embodiment of the device, and the same technical effect can be achieved.
As shown in fig. 7, an embodiment of the present application further provides a human-vehicle weight recognition apparatus, including:
the second obtaining module 701 is configured to obtain a target image, where the target image is obtained by shooting a second object, and the second object includes a pedestrian and a vehicle that are matched with each other;
a third obtaining module 702, configured to input the target image into the target human-vehicle weight recognition model to obtain a third pedestrian feature vector and a third vehicle feature vector, where the third pedestrian feature vector and the third vehicle feature vector are respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle weight recognition model comprises a target detection model, a first target multi-layer perceptron and a second target multi-layer perceptron; the input end of the target detection model is used for receiving the target image, the output ends of the target detection model are respectively connected to the input end of the first target multi-layer perceptron and the input end of the second target multi-layer perceptron, the output end of the first target multi-layer perceptron is used for outputting the third pedestrian feature vector, and the output end of the second target multi-layer perceptron is used for outputting the third vehicle feature vector;
the identifying module 703 is configured to identify the second object according to the third pedestrian feature vector and the third vehicle feature vector, so as to obtain an identification result.
Optionally, the identifying module 703 includes:
the third obtaining unit is used for splicing the third pedestrian feature vector and the third vehicle feature vector to obtain the target image feature of the target image;
the fourth obtaining unit is used for comparing the similarity of the target image characteristic with each preset image characteristic in the preset image characteristic library to obtain a comparison result;
and the fifth acquisition unit is used for identifying the second object according to the comparison result to obtain an identification result.
It should be noted that the human-vehicle weight recognition device is a device corresponding to the human-vehicle weight recognition method, and all implementation manners in the method embodiment are applicable to the embodiment of the device, and the same technical effects can be achieved.
Fig. 8 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 801 and a memory 802 that stores computer program instructions.
Specifically, the processor 801 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more Integrated circuits of the embodiments of the present Application.
Memory 802 may include mass storage for data or instructions. By way of example, and not limitation, memory 802 may include a Hard Disk Drive (HDD), a floppy Disk Drive, flash memory, an optical Disk, a magneto-optical Disk, a tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 802 may include removable or non-removable (or fixed) media, where appropriate. The memory 802 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 802 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.
The processor 801 reads and executes the computer program instructions stored in the memory 802 to implement any one of the human-vehicle weight recognition model training methods in the above embodiments, or to implement any one of the human-vehicle weight recognition methods in the above embodiments.
In one example, the electronic device may also include a communication interface 803 and a bus 804. As shown in fig. 8, the processor 801, the memory 802, and the communication interface 803 are connected by a bus 804 to complete communication therebetween.
The communication interface 803 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 804 comprises hardware, software, or both that couple the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 804 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated.
In addition, by combining the human-vehicle weight recognition model training method and the human-vehicle weight recognition method in the above embodiments, the embodiments of the present application may provide a computer storage medium to implement. The computer storage medium having computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any one of the human-vehicle weight recognition model training methods in the above embodiments, or implement any one of the human-vehicle weight recognition methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A training method for a human-vehicle re-identification model, characterized by comprising the following steps:
acquiring a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image, the first sample image and the second sample image are obtained by photographing a first object under different shooting conditions, and the first object comprises a pedestrian and a vehicle that are matched with each other;
training a pre-established initial human-vehicle re-identification model with the training samples until a loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, to obtain a target human-vehicle re-identification model;
wherein the initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron, and a second initial multilayer perceptron; an input end of the target detection model is configured to receive the first sample image and the second sample image, an output end of the target detection model is connected to an input end of the first initial multilayer perceptron and to an input end of the second initial multilayer perceptron, an output end of the first initial multilayer perceptron is configured to output a pedestrian feature vector associated with the pedestrian, and an output end of the second initial multilayer perceptron is configured to output a vehicle feature vector associated with the vehicle; and the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
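For a concrete sense of the recited structure, the pipeline of claim 1 (a shared target detection model whose outputs feed two separate perceptron heads) can be sketched in plain Python. Everything below is illustrative: the detector is stubbed out, and the layer sizes, the single linear layer, and the ReLU activation are assumptions the claim does not fix.

```python
import random

class MLP:
    """Minimal perceptron head (widths are hypothetical; the claim
    does not specify the perceptron architecture)."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def __call__(self, x):
        # Linear projection with ReLU, standing in for the multilayer perceptron.
        return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)))
                for row in self.w]

class InitialReIdModel:
    """Sketch of claim 1: one detector feeding two perceptron heads."""
    def __init__(self, feat_dim=8, emb_dim=4):
        self.mlp_pedestrian = MLP(feat_dim, emb_dim, seed=1)  # first initial MLP
        self.mlp_vehicle = MLP(feat_dim, emb_dim, seed=2)     # second initial MLP

    def detect(self, image):
        # Stub for the target detection model: it would locate the matched
        # pedestrian and vehicle and return one feature per object.
        return image["pedestrian"], image["vehicle"]

    def forward(self, image):
        ped_feat, veh_feat = self.detect(image)
        return self.mlp_pedestrian(ped_feat), self.mlp_vehicle(veh_feat)
```

In training, the same two heads would embed both sample images of a pair, and the loss is computed on the resulting pedestrian and vehicle feature vectors.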
2. The method according to claim 1, wherein the training of the pre-established initial human-vehicle re-identification model with the training samples until a loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, to obtain a target human-vehicle re-identification model, comprises:
inputting the first sample image into the target detection model to obtain a first pedestrian feature and a first vehicle feature, and inputting the second sample image into the target detection model to obtain a second pedestrian feature and a second vehicle feature;
inputting the first pedestrian feature and the second pedestrian feature into the first initial multilayer perceptron to obtain a first pedestrian feature vector and a second pedestrian feature vector, respectively, and inputting the first vehicle feature and the second vehicle feature into the second initial multilayer perceptron to obtain a first vehicle feature vector and a second vehicle feature vector, respectively;
determining a loss value of the loss function according to the first pedestrian feature vector, the second pedestrian feature vector, the first vehicle feature vector, and the second vehicle feature vector;
adjusting network parameters of the first initial multilayer perceptron and of the second initial multilayer perceptron according to the loss value of the loss function until the loss value of the loss function meets the preset condition, to obtain a first target multilayer perceptron and a second target multilayer perceptron; wherein the target human-vehicle re-identification model comprises the first target multilayer perceptron and the second target multilayer perceptron.
3. The method of claim 2, wherein the loss value of the loss function comprises a first loss value, a second loss value, a third loss value, and a fourth loss value;
the first loss value is determined based on the first pedestrian feature vector and the first vehicle feature vector;
the second loss value is determined based on the first pedestrian feature vector and the second pedestrian feature vector;
the third loss value is determined based on the second pedestrian feature vector and the second vehicle feature vector;
the fourth loss value is determined based on the first vehicle feature vector and the second vehicle feature vector.
4. The method of claim 3, wherein the first loss value Loss_u, the second loss value Loss_pp, the third loss value Loss_v, and the fourth loss value Loss_bb are respectively calculated according to the following formulas:

Loss_u = ||u_p - u_b||, Loss_pp = ||u_p - v_p||, Loss_v = ||v_p - v_b||, Loss_bb = ||u_b - v_b||

wherein || · || denotes a norm calculation, u_p is the first pedestrian feature vector, u_b is the first vehicle feature vector, v_p is the second pedestrian feature vector, and v_b is the second vehicle feature vector.
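Given the definitions above, the four loss terms can be computed directly from the two pedestrian vectors and the two vehicle vectors. A minimal sketch, assuming the Euclidean norm of the difference between the paired vectors (the patent's original formula image is not reproduced in this text, so the exact norm is an assumption):

```python
import math

def l2_diff(a, b):
    """Euclidean norm of the difference between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reid_losses(u_p, u_b, v_p, v_b):
    """The four loss values of claim 4 (difference-norm form assumed)."""
    return {
        "Loss_u":  l2_diff(u_p, u_b),   # first pedestrian vs. first vehicle
        "Loss_pp": l2_diff(u_p, v_p),   # first vs. second pedestrian
        "Loss_v":  l2_diff(v_p, v_b),   # second pedestrian vs. second vehicle
        "Loss_bb": l2_diff(u_b, v_b),   # first vs. second vehicle
    }
```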
5. A human-vehicle re-identification method, characterized by comprising the following steps:
acquiring a target image, wherein the target image is obtained by photographing a second object, and the second object comprises a pedestrian and a vehicle that are matched with each other;
inputting the target image into a target human-vehicle re-identification model to obtain a third pedestrian feature vector and a third vehicle feature vector, wherein the third pedestrian feature vector and the third vehicle feature vector are respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle re-identification model comprises a target detection model, a first target multilayer perceptron, and a second target multilayer perceptron; an input end of the target detection model is configured to receive the target image, an output end of the target detection model is connected to an input end of the first target multilayer perceptron and to an input end of the second target multilayer perceptron, an output end of the first target multilayer perceptron is configured to output the third pedestrian feature vector, and an output end of the second target multilayer perceptron is configured to output the third vehicle feature vector;
and identifying the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain an identification result.
6. The method of claim 5, wherein the identifying the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain an identification result comprises:
concatenating the third pedestrian feature vector and the third vehicle feature vector to obtain a target image feature of the target image;
comparing the similarity of the target image feature with each preset image feature in a preset image feature library to obtain a comparison result;
and identifying the second object according to the comparison result to obtain the identification result.
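The retrieval flow of claim 6 (concatenate the two vectors, compare against a feature library, pick the best match) can be sketched as follows. Cosine similarity and the dictionary-shaped library are assumptions for illustration, since the claim only calls for a similarity comparison:

```python
import math

def concat_features(ped_vec, veh_vec):
    """Step 1: splice the pedestrian and vehicle vectors into one target image feature."""
    return list(ped_vec) + list(veh_vec)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def identify(target_feat, library):
    """Steps 2-3: compare against every preset image feature in the library
    and return the identity with the highest similarity."""
    return max(library.items(),
               key=lambda item: cosine_similarity(target_feat, item[1]))[0]
```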
7. A human-vehicle re-identification model training apparatus, characterized by comprising:
a first acquisition module, configured to acquire a plurality of training samples, wherein each training sample comprises a first sample image and a second sample image, the first sample image and the second sample image are obtained by photographing a first object under different shooting conditions, and the first object comprises a pedestrian and a vehicle that are matched with each other;
a training module, configured to train a pre-established initial human-vehicle re-identification model with the training samples until a loss value of a loss function in the initial human-vehicle re-identification model meets a preset condition, to obtain a target human-vehicle re-identification model;
wherein the initial human-vehicle re-identification model comprises a target detection model, a first initial multilayer perceptron, and a second initial multilayer perceptron; an input end of the target detection model is configured to receive the first sample image and the second sample image, an output end of the target detection model is connected to an input end of the first initial multilayer perceptron and to an input end of the second initial multilayer perceptron, an output end of the first initial multilayer perceptron is configured to output a pedestrian feature vector associated with the pedestrian, and an output end of the second initial multilayer perceptron is configured to output a vehicle feature vector associated with the vehicle; and the loss value of the loss function is obtained based on the pedestrian feature vector and the vehicle feature vector.
8. A human-vehicle re-identification apparatus, characterized by comprising:
a second acquisition module, configured to acquire a target image, wherein the target image is obtained by photographing a second object, and the second object comprises a pedestrian and a vehicle that are matched with each other;
a third acquisition module, configured to input the target image into a target human-vehicle re-identification model to obtain a third pedestrian feature vector and a third vehicle feature vector, wherein the third pedestrian feature vector and the third vehicle feature vector are respectively associated with the pedestrian and the vehicle included in the second object; the target human-vehicle re-identification model comprises a target detection model, a first target multilayer perceptron, and a second target multilayer perceptron; an input end of the target detection model is configured to receive the target image, an output end of the target detection model is connected to an input end of the first target multilayer perceptron and to an input end of the second target multilayer perceptron, an output end of the first target multilayer perceptron is configured to output the third pedestrian feature vector, and an output end of the second target multilayer perceptron is configured to output the third vehicle feature vector;
and an identification module, configured to identify the second object according to the third pedestrian feature vector and the third vehicle feature vector to obtain an identification result.
9. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
wherein the processor, when executing the computer program instructions, implements the human-vehicle re-identification model training method of any one of claims 1-4, or implements the human-vehicle re-identification method of claim 5 or 6.
10. A computer storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the human-vehicle re-identification model training method of any one of claims 1-4 or the human-vehicle re-identification method of claim 5 or 6.
CN202110139718.9A 2021-02-02 2021-02-02 Human-vehicle weight recognition and model training method, device, equipment and storage medium thereof Active CN112464922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110139718.9A CN112464922B (en) 2021-02-02 2021-02-02 Human-vehicle weight recognition and model training method, device, equipment and storage medium thereof


Publications (2)

Publication Number Publication Date
CN112464922A CN112464922A (en) 2021-03-09
CN112464922B true CN112464922B (en) 2021-05-28

Family

ID=74802436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110139718.9A Active CN112464922B (en) 2021-02-02 2021-02-02 Human-vehicle weight recognition and model training method, device, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN112464922B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427814A (en) * 2019-06-24 2019-11-08 深圳云天励飞技术有限公司 A kind of bicyclist recognition methods, device and equipment again
CN110781774A (en) * 2019-10-10 2020-02-11 江苏理工学院 Level crossing intelligent guiding system and method based on deep learning
CN111667001A (en) * 2020-06-05 2020-09-15 平安科技(深圳)有限公司 Target re-identification method and device, computer equipment and storage medium
CN111783570A (en) * 2020-06-16 2020-10-16 厦门市美亚柏科信息股份有限公司 Method, device and system for re-identifying target and computer storage medium
CN112149740A (en) * 2020-09-25 2020-12-29 上海商汤智能科技有限公司 Target re-identification method and device, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10395385B2 (en) * 2017-06-27 2019-08-27 Qualcomm Incorporated Using object re-identification in video surveillance


Also Published As

Publication number Publication date
CN112464922A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
Jung et al. ResNet-based vehicle classification and localization in traffic surveillance systems
CN108062562B (en) Object re-recognition method and device
CN109753928B (en) Method and device for identifying illegal buildings
CN107851195B (en) Target detection using neural networks
EP3807837A1 (en) Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi-view vehicle representations
CN111597933B (en) Face recognition method and device
CN109034086B (en) Vehicle weight identification method, device and system
CN113936302B (en) Training method and device for pedestrian re-recognition model, computing equipment and storage medium
JP2022519868A (en) Automatic recognition and classification of hostile attacks
CN111274942A (en) Traffic cone identification method and device based on cascade network
CN112733885A (en) Point cloud identification model determining method and point cloud identification method and device
CN111639616A (en) Heavy identity recognition method based on deep learning
CN111241932A (en) Automobile exhibition room passenger flow detection and analysis system, method and storage medium
CN111783654B (en) Vehicle weight identification method and device and electronic equipment
CN111046971A (en) Image recognition method, device, equipment and computer readable storage medium
CN110826415A (en) Method and device for re-identifying vehicles in scene image
US20070223785A1 (en) Image processor and method
KR101821242B1 (en) Method for counting vehicles based on image recognition and apparatus using the same
CN112464922B (en) Human-vehicle weight recognition and model training method, device, equipment and storage medium thereof
KR102320005B1 (en) Image analysis based abnormal object detection system and method
CN111178181B (en) Traffic scene segmentation method and related device
CN116310713B (en) Infrared image recognition method and device, electronic equipment and storage medium
CN110458234B (en) Vehicle searching method with map based on deep learning
CN112132231A (en) Object identification method and device, storage medium and electronic equipment
CN116453086A (en) Method and device for identifying traffic sign and electronic equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant