CN113554013B - Cross-scene recognition model training method, cross-scene road recognition method and device - Google Patents

Cross-scene recognition model training method, cross-scene road recognition method and device

Info

Publication number
CN113554013B
CN113554013B (application CN202111106779.1A)
Authority
CN
China
Prior art keywords
level
cross
local
domain
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111106779.1A
Other languages
Chinese (zh)
Other versions
CN113554013A (en)
Inventor
周智恒
张鹏宇
郭勇帆
沈世福
王怡凡
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202111106779.1A
Publication of CN113554013A
Application granted
Publication of CN113554013B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a cross-scene recognition model training method, a cross-scene road recognition method and a cross-scene road recognition device. The invention uses a source domain image X_s, its true source-domain label map y_s and an unlabeled cross-scene target domain image X_t as training data and, using forward propagation and chain-rule backward gradient updates, computes and outputs recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values at the pixel level, the local level and the image level. Iterative training of the cross-scene recognition model is then performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally yielding trained region-level and sample-level cross-scene recognition models. The cross-scene road recognition method built on these models outputs an accurate and safe recognition result based on a decision strategy and can provide effective data support for an intelligent driving system in a new scene.

Description

Cross-scene recognition model training method, cross-scene road recognition method and device
Technical Field
The invention relates to the technical fields of artificial intelligence, information processing and automatic driving, and in particular to a cross-scene recognition model training method, a cross-scene road recognition method and corresponding devices.
Background
In recent years, artificial intelligence techniques based on image recognition have played an increasingly important role in the field of automated driving. Road recognition methods based on deep neural networks perform supervised training on large amounts of labeled urban street image data, provide high-performance road recognition models for automated driving, and have greatly advanced the development of automated driving technology.
In a typical road recognition algorithm based on a deep neural network, a large number of labeled pictures of urban roads or expressways are used as samples, a convolutional neural network extracts features from the samples, the probability that each pixel in the image belongs to a road area is obtained, and this output probability serves as the criterion for deciding the road region. For example, invention CN107808140B, "Monocular vision road recognition algorithm based on image fusion", discloses a deep neural network model obtained by supervised training on labeled samples to perform road recognition. Such methods require a large amount of labeled image data, which is costly and inefficient; moreover, the training data set and the validation data set of the trained model must satisfy an independent and identically distributed (i.i.d.) assumption, and when the actual input does not satisfy this assumption, road recognition performance in a new scene is often poor. The segmentation model training approach proposed in patent CN106558058, "Segmentation model training method, road segmentation method, vehicle control method and apparatus", applies an unsupervised "free area segmentation method" to the training samples and uses the resulting segmentation images as label information for the training images, i.e. training based on pseudo labels. The "free area segmentation method" used there cannot guarantee the accuracy of the output segmentation images as pseudo labels, and the segmentation model may therefore fail to reach the expected training result.
In addition, an intelligent driving system should be able to recognize roads effectively in a variety of scenes, such as complex city-block roads, expressways, and mountain roads in less developed regions; however, it is impossible to collect labeled samples for every condition, and a recognition system trained on one specific scene (such as city-block roads) cannot be directly applied to a new scene (such as mountain roads).
Disclosure of Invention
In order to overcome the above technical problems, the invention provides a cross-scene recognition model training method, a cross-scene road recognition method and a cross-scene road recognition device.
The purpose of the invention is realized by at least one of the following technical solutions.
A cross-scene recognition model training method uses a source domain image X_s, its true source-domain label map y_s and an unlabeled cross-scene target domain image X_t as training data and, using forward propagation and chain-rule backward gradient updates, computes and outputs recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values at the pixel level, the local level and the image level; iterative training of the cross-scene recognition model is then performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally obtaining trained region-level and sample-level cross-scene recognition models.
Further, the cross-scene recognition model includes a multi-scale feature extractor G, a high-resolution aggregate feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g, where the subscript p is short for pixel, l is short for local, and g is short for global;
the cross-scene recognition model is used for extracting high-resolution aggregation features from the multi-scale features of the input image, recognizing and predicting the high-resolution aggregation features at a pixel level, a local level and an image level, and calculating a recognition loss value, a domain classification prediction and a domain adaptation loss value.
Further, the multi-scale feature extractor G includes a deep convolutional neural network and extracts, from the source-domain samples X_s and the target-domain samples X_t, n source-domain and target-domain multi-scale features f_n^s and f_n^t, where s denotes the source domain, t the target domain and n the number of multi-scale features; the source-domain and target-domain multi-scale features f_n^s and f_n^t have the same numbers of channels and the same sizes;
high resolution aggregate feature extractorMThe method comprises a multi-scale feature aggregation layer and a cavity convolution neural network, wherein the multi-scale features of n source fields and n target fields are respectively combinedf n s Andf n t separately aggregated into high resolution source domain featuresO s And target area aggregation featuresO t
The multi-scale feature aggregation layer comprises a dynamic parameter variable and a convolutional neural network, where the dynamic parameter variable stores the convolution parameters used to convolve features of different scales, and the convolutional neural network takes the multi-scale features as input, performs feature computation, and reads and writes the parameters of the dynamic parameter variable;
pixel level recognizerF p Local level recognizerF l And image level recognizerF g Each comprises a full convolution neural network and an activation function; aggregating source and target domainsO s AndO t equal input pixel level recognizerF p Local level recognizerF l And image level recognizerF g Pixel level recognizerF p Output source domain and target domain pixel level identification prediction probability mapp s pixel Andp t pixel local level recognizer
Figure DEST_PATH_IMAGE001
Output source field and target field local level identification prediction probability chartp s local Andp t local image level recognizerF g Output source domain and target domain image level identification prediction probability mapp s global Andp t global
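As an illustration only, the components listed above could be wired together as in the following PyTorch-style sketch; the hidden channel width and the ReLU inside the domain classifiers are assumptions made for the example and are not asserted by the present description.

```python
import torch
import torch.nn as nn

class CrossSceneModel(nn.Module):
    """Illustrative wiring of the extractor G, the aggregator M and the three
    recognizer / domain-classifier pairs (F_p/D_p, F_l/D_l, F_g/D_g)."""

    def __init__(self, backbone: nn.Module, aggregator: nn.Module, channels: int = 1920):
        super().__init__()
        self.G = backbone      # multi-scale feature extractor (e.g. a ResNet-family network)
        self.M = aggregator    # high-resolution aggregate feature extractor
        # recognizers: a 1x1 convolution followed by a Sigmoid, one per level
        self.F_p = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.F_l = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.F_g = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

        def domain_classifier():
            # two 1x1 convolutions and a Sigmoid; the 64-channel hidden layer is assumed
            return nn.Sequential(nn.Conv2d(1, 64, 1), nn.ReLU(),
                                 nn.Conv2d(64, 1, 1), nn.Sigmoid())

        self.D_p, self.D_l, self.D_g = (domain_classifier(), domain_classifier(),
                                        domain_classifier())

    def forward(self, x):
        feats = self.G(x)            # multi-scale features f_1 ... f_n
        O = self.M(feats)            # high-resolution aggregated feature O
        return self.F_p(O), self.F_l(O), self.F_g(O)   # p^pixel, p^local, p^global
```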
Further, performing recognition prediction on the high-resolution aggregated features at the pixel level, the local level and the image level and computing the recognition loss values, domain classification predictions and domain-adaptation loss values specifically comprises:
s1 identifying prediction probability map at source domain pixel levelp s pixel And source domain real label graphy s As input, loss functions are identified by pixel levelL pixel Outputting a pixel-level identification penalty value, the pixel-level identification function defined as:
Figure DEST_PATH_IMAGE002
wherein, w and h are the width and height of the probability map respectively; identifying prediction probability maps at the source domain and target domain pixel levelsp s pixel Andp t pixel as input, to a pixel-level domain classifierD p Output domain classification prediction probabilityh s pixel Andh t pixel adapting loss function to pixel level domainL p adapt A pixel-level domain fit loss value is calculated,h s pixel andh t pixel respectively classifying and predicting probabilities for pixel level fields of a source field and a target field; the pixel-level domain fit loss function is defined as:
Figure DEST_PATH_IMAGE003
wherein the content of the first and second substances,L Dp is a cross entropy loss function, 1 is a source domain label, and 0 is a target domain label;
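The exact loss formulas appear only as images in the original filing. The following sketch is therefore an assumed realization: the pixel-level recognition loss is written as a binary cross-entropy averaged over the w×h probability map, and the pixel-level domain-adaptation loss applies the cross-entropy L_Dp with domain label 1 to the source prediction and domain label 0 to the target prediction.

```python
import torch
import torch.nn.functional as F

def pixel_identification_loss(p_s_pixel, y_s):
    """Assumed form of L_pixel: binary cross-entropy averaged over the w*h map."""
    return F.binary_cross_entropy(p_s_pixel, y_s)

def pixel_domain_adapt_loss(h_s_pixel, h_t_pixel):
    """Assumed form of L^p_adapt: cross-entropy L_Dp with domain label 1 for the
    source-domain prediction and domain label 0 for the target-domain prediction."""
    loss_src = F.binary_cross_entropy(h_s_pixel, torch.ones_like(h_s_pixel))
    loss_tgt = F.binary_cross_entropy(h_t_pixel, torch.zeros_like(h_t_pixel))
    return loss_src + loss_tgt
```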
s2, identifying and predicting probability chart by local level in source fieldp s local And source domain real label graphy s As input, loss functions are identified by local levelL local Outputting a local level identification loss value, wherein a local level identification loss function is defined as:
Figure DEST_PATH_IMAGE004
wherein the content of the first and second substances,x={x i :i=1,2,…,N}andy={y i :i=1,2,…,N}respectively in a local block manner on the probability prediction graphp s local And source domain real label graphy s The local predicted value obtained above is used as the local predicted value,Nthe number of the partial blocks is represented,μ x μ y are respectively local predicted valuesx, yThe mean value and the standard deviation of the measured values,σ xy is the covariance of the local predicted values,C 1 andC 2 is radix Ginseng;
identifying prediction probability map by local level of source field and target fieldp s local Andp t local as input, to a local level domain classifierD l Output domain classification prediction probabilityh s local Andh t local adapting loss function to local level domainL l adapt Calculating the local level domain adaptation loss value,h s local andh t local respectively predicting probability for local domain classification of a source domain and a target domain; the local level domain adaptation loss function is defined as:
Figure DEST_PATH_IMAGE005
wherein the content of the first and second substances,L Dl is a cross entropy loss function, 1 is a source domain label, and 0 is a target domain label;
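The block means, standard deviations, covariance and constants C_1, C_2 follow the structure of an SSIM-style similarity, so the sketch below assumes that reading; the block size and constants are illustrative values, not ones fixed by the description.

```python
import torch.nn.functional as F

def local_identification_loss(p_s_local, y_s, block=8, C1=1e-4, C2=9e-4):
    """Assumed form of L_local: one minus an SSIM-style similarity computed over
    N local blocks of the prediction map (x) and the true label map (y)."""
    mu_x = F.avg_pool2d(p_s_local, block)                      # block means
    mu_y = F.avg_pool2d(y_s, block)
    var_x = F.avg_pool2d(p_s_local ** 2, block) - mu_x ** 2    # block variances
    var_y = F.avg_pool2d(y_s ** 2, block) - mu_y ** 2
    cov_xy = F.avg_pool2d(p_s_local * y_s, block) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return 1.0 - ssim.mean()
```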
s3, identifying the prediction probability map at the source domain image levelp s global And source domain real label graphy s As input, loss functions are identified by image levelL global Outputting an image-level identification penalty value, the image-level identification penalty function being defined as follows:
Figure DEST_PATH_IMAGE006
wherein, w and h are the width and height of the probability map respectively;
identifying prediction probability maps at the source and target domain image levelp s global Andp t global as input, an input image level domain classifierD g Output domain classification prediction probabilityh s global Andh t global adapting a loss function to an image-level domainL g adapt Calculating an image-level domain fit loss value,h s global and h t global Respectively classifying and predicting probabilities for image-level fields of a source field and a target field; the image-level domain fit loss function is defined as:
Figure DEST_PATH_IMAGE007
wherein the content of the first and second substances,L Dg is the cross entropy loss function, with 1 being the source domain label and 0 being the target domain label.
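The description does not state how the domain-adaptation gradients are propagated back into the recognizers; one common realization, shown purely as an assumption, is to place a gradient reversal layer in front of each domain classifier so that minimizing the domain classification loss pushes the predictions toward domain invariance.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (scaled) gradient
    in the backward pass."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# usage sketch: h_s_global = D_g(grad_reverse(p_s_global)); the same wrapper can be
# applied in front of D_p and D_l at the pixel and local levels
```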
Further, performing iterative training of the cross-scene recognition model at the region level, jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, means taking the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global as input and obtaining the hard-region training loss value through the hard-region training loss function L_region. The overall training loss of the region-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^R of the region-level cross-scene recognition model]

where L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of the cross-scene domain adaptation, L_total^R is the overall training loss of the region-level cross-scene recognition model (R is short for region), and L_region is the hard-region training loss function. The hard-region training loss function L_region is defined as:

[Formula: hard-region training loss L_region, a function of the averaged prediction p_s, the hard-region probability condition ρ and the hyperparameter γ]

where p_s = 1/3·p_s^pixel + 1/3·p_s^local + 1/3·p_s^global, and ρ, the hard-region probability condition, and γ are hyperparameters. Region-level training lets the training process focus on pixel regions that are hard to recognize and prone to errors, and outputs a trained region-level cross-scene recognition model.
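The hard-region loss L_region itself is only given as an image; the sketch below shows one plausible reading under explicit assumptions: the three source predictions are averaged, pixels whose confidence falls below the band set by ρ are treated as hard, and they are emphasized with a focal-style exponent γ (the default values ρ = 0.5 and γ = 0.25 follow embodiment 1).

```python
import torch
import torch.nn.functional as F

def region_level_loss(p_pixel, p_local, p_global, y_s, rho=0.5, gamma=0.25):
    """Illustrative hard-region loss (assumed form): average the three source
    predictions, treat low-confidence pixels as hard, and emphasize them."""
    p_s = (p_pixel + p_local + p_global) / 3.0
    # confidence assigned to the true class of each pixel
    confidence = torch.where(y_s > 0.5, p_s, 1.0 - p_s)
    hard_mask = (confidence < rho).float()            # hard / error-prone pixels
    weight = hard_mask * (1.0 - confidence) ** gamma  # focal-style emphasis
    bce = F.binary_cross_entropy(p_s, y_s, reduction="none")
    return (weight * bce).sum() / (hard_mask.sum() + 1e-6)
```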
Performing iterative training of the cross-scene recognition model at the sample level, jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, uses the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global to compute a prediction confidence weight Z for each sample input to the sample-level cross-scene recognition model. The overall training loss of the sample-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^I of the sample-level cross-scene recognition model]

where L_total^I is the overall training loss of the sample-level cross-scene recognition model (I is short for instance). The calculation function of Z is defined as:

[Formula: prediction confidence weight Z, computed from the image-level prediction probability map p^global with hyperparameter c]

p^global takes the value p_s^global or p_t^global according to the domain the sample belongs to: if the training sample comes from the source domain, p^global = p_s^global, and if it comes from the target domain, p^global = p_t^global; c is a hyperparameter. Sample-level training lets the training process focus on image samples that are hard to recognize and prone to errors, and outputs a trained sample-level cross-scene recognition model.
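The calculation function of Z is likewise only given as an image; one plausible reading, stated here as an assumption, is that samples whose image-level prediction p^global stays close to the decision boundary (uncertain, hard samples) receive a larger weight, with c controlling the sharpness.

```python
import torch

def sample_confidence_weight(p_global, c=1.0):
    """Illustrative per-sample weight Z (assumed form): the further the image-level
    prediction is from the 0.5 decision boundary, the smaller the weight."""
    certainty = (p_global - 0.5).abs().mean(dim=(1, 2, 3))  # one value per sample
    return torch.exp(-c * certainty)                        # hard samples -> larger Z

# usage sketch: total_sample_loss = (Z * per_sample_loss).mean()
```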
A cross-scene road recognition method comprises the following steps:

A1. receiving road image data X to be recognized in the target domain;

A2. inputting the target-domain road image data X to be recognized into the trained region-level cross-scene recognition model and the trained sample-level cross-scene recognition model obtained by the above cross-scene recognition model training method, and outputting a region-level prediction map P_mask^R and a sample-level prediction map P_mask^I respectively;

A3. obtaining, from the region-level prediction map P_mask^R, the number N_R of region-level connected regions and the area K_R of each region-level connected region, and obtaining, from the sample-level prediction map P_mask^I, the number N_I of sample-level connected regions and the area K_I of each sample-level connected region;

A4. outputting the road recognition result for the target-domain road image data X to be recognized according to a decision strategy, based on the region-level and sample-level connected-region counts N_R and N_I and the corresponding connected-region areas K_R and K_I.
Further, in step A3, a connected-region analysis algorithm is used to obtain the number N_R of region-level connected regions, the area K_R of each region-level connected region, the number N_I of sample-level connected regions, and the area K_I of each sample-level connected region.
Further, in step A4, the decision strategy is specifically as follows:

Strategy J1: if the number of region-level connected regions satisfies N_R = 1 and the area K_R of the region-level connected region is larger than both the area K_I of the corresponding sample-level connected region and K_min, the minimum safe passing area for the road image data X to be recognized, then the region-level prediction map P_mask^R is output as the road recognition result for the target-domain road image data X to be recognized; otherwise strategy J2 is triggered;

Strategy J2: if the number of sample-level connected regions satisfies N_I = 1 and the area K_I of the sample-level connected region satisfies K_I > K_R, the area of the region-level connected region, then the sample-level prediction map P_mask^I is output as the road recognition result for the target-domain road image data X to be recognized; otherwise, neither strategy J1 nor strategy J2 is satisfied, and a warning signal E is output as the road recognition result for the target-domain road image data X to be recognized.
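Read as above, the decision strategy can be sketched as the following function; the reading of the J1 condition (K_R must exceed both K_I and K_min) is an interpretation of the text, not a quotation of it.

```python
def decide(N_R, K_R, N_I, K_I, K_min, P_R_mask, P_I_mask, warning_signal="E"):
    """Sketch of the J1/J2 decision strategy; variable names follow the text."""
    # J1: exactly one region-level connected region, larger than both the
    # sample-level connected region and the minimum safe passing area
    if N_R == 1 and K_R > K_I and K_R > K_min:
        return P_R_mask
    # J2: exactly one sample-level connected region, larger than the region-level one
    if N_I == 1 and K_I > K_R:
        return P_I_mask
    # neither J1 nor J2 is satisfied: return the warning signal E
    return warning_signal
```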
Further, K_min = K_r · β, where β is the conversion ratio between the resolution of the road image data X to be recognized and physical length, and K_r is the minimum safe passing area on the real road (r is short for real). K_r = L_r · H_r, where H_r is the road width required for the vehicle to pass, such as the distance between the left and right wheels, and L_r is the forward safe stopping distance, L_r = v · t, where v is the current speed per second and t is the total safe braking time.
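Numerically, K_min follows directly from these definitions; the sketch below uses illustrative values and assumes that β already converts real-world area to image area.

```python
def minimum_safe_area(v_mps, t_brake_s, road_width_m, beta):
    """K_min = K_r * beta with K_r = L_r * H_r and L_r = v * t (illustrative only)."""
    L_r = v_mps * t_brake_s      # forward safe stopping distance in metres
    K_r = L_r * road_width_m     # minimum safe passing area on the real road
    return K_r * beta            # converted to image area by the ratio beta

# example with assumed numbers: v = 15 m/s, t = 2 s, H_r = 1.8 m, beta = 400
# gives K_min = 15 * 2 * 1.8 * 400 = 21600 (image-area units)
```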
A cross-scene recognition model training device is characterized by comprising:
a training data input processing unit, which inputs the source domain image X_s, the pixel-level true label map y_s and the unlabeled target domain image X_t from a different scene as input data to the multi-scale feature extraction unit;
the multi-scale feature extraction unit is used for extracting features of multiple scales from input training data and inputting the features into the high-resolution aggregation feature extraction unit;
the high-resolution aggregation feature extraction unit, which aggregates the input features of multiple scales into a high-resolution aggregated feature; it comprises a multi-scale feature aggregation module and a dilated convolution calculation module, where the multi-scale feature aggregation module transforms and aggregates the features of multiple scales, and the dilated convolution calculation module improves the resolution of the aggregated feature;
the multi-level cross-scene recognition and domain classification prediction and loss calculation unit, which takes the high-resolution aggregated feature as input, outputs recognition probability maps of the corresponding levels at the pixel level, the local level and the image level through forward propagation, and then, taking the pixel-level, local-level and image-level recognition probability maps, the true label map of the source-domain samples and the domain labels as input, computes the pixel-level, local-level and image-level recognition losses and domain-adaptation losses used to update the cross-scene recognition model parameters;
the region-level cross-scene recognition model joint training unit, which performs, at the region level and jointly with the pixel-level, local-level and image-level training units, parallel iterative training of the recognition model for recognition and cross-scene domain adaptation, and outputs the region-level cross-scene recognition model;

the sample-level cross-scene recognition model joint training unit, which performs, at the sample level and jointly with the pixel-level, local-level and image-level training units, parallel iterative training of the recognition model for recognition and cross-scene domain adaptation, and outputs the sample-level cross-scene recognition model;

and the model storage unit, which stores the region-level cross-scene recognition model output by the region-level joint training unit and the sample-level cross-scene recognition model output by the sample-level joint training unit.
Further, the multi-scale feature aggregation module comprises a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
A cross-scene road recognition device comprising:
the cross-scene image receiving unit, which receives the target-domain road image data X to be recognized;

the cross-scene road image recognition unit, which inputs the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and outputs a region-level road recognition result and a sample-level road recognition result respectively;

the connected-region calculation unit, which computes the connected regions of the region-level and sample-level road recognition results and outputs the number of region-level connected regions, the corresponding region-level connected-region areas, the number of sample-level connected regions and the corresponding sample-level connected-region areas;

the result decision unit, which checks whether the region-level and sample-level connected-region counts and areas satisfy the decision strategy and outputs the recognition result according to the decision strategy;

and the recognition result storage unit, which stores the recognition result output by the result decision unit.
Compared with the prior art, the invention has the following advantages:

The invention provides a cross-scene recognition model training method. On the one hand, with labeled source-domain data and unlabeled target-domain data as input, a high-resolution aggregated feature can be obtained through the multi-scale feature extractor and the high-resolution aggregate feature extractor, which provides rich feature information for recognition prediction and domain adaptation; the prediction losses of cross-scene recognition and domain adaptation at each level are then computed and output at the pixel level, the local level and the image level, which effectively improves the prediction results of the cross-scene recognition model during iterative training, in particular the background recognition confidence and the edge recognition confidence. Preferably, training the cross-scene recognition model at the region level, jointly with the prediction losses of cross-scene recognition and domain adaptation at each level, lets the training process focus on pixel regions that are hard to recognize and prone to errors, further improving cross-scene recognition performance. Preferably, training the cross-scene recognition model at the sample level, jointly with the prediction losses of cross-scene recognition and domain adaptation at each level, lets the training process focus on image samples that are hard to recognize and prone to errors, further improving cross-scene recognition performance. On the other hand, the cross-scene road recognition method built on the region-level and sample-level cross-scene recognition models computes the connected regions of the region-level and sample-level prediction results and, based on the number of connected regions and their areas, outputs an accurate and safe recognition result according to the decision strategy, thus providing effective data support for an intelligent driving system in a new scene.
Drawings
Fig. 1 is a schematic diagram of a cross-scene recognition model according to an embodiment of the present invention.
Fig. 2 is a flowchart of training a cross-scene recognition model according to an embodiment of the present invention.
Fig. 3 is a flowchart of a cross-scene road identification method disclosed in the embodiment of the present invention.
Fig. 4 is a schematic diagram of a cross-scene recognition model training device disclosed in the embodiment of the present invention.
Fig. 5 is a schematic diagram of a multi-scale feature aggregation module disclosed in an embodiment of the present invention.
Fig. 6 is a schematic view of a cross-scene road recognition device disclosed in the embodiment of the present invention.
Detailed Description
In the following description, technical solutions are set forth in conjunction with specific figures in order to provide a thorough understanding of the present invention. This application may be embodied in many forms other than those described herein, and all such modifications that would occur to a person skilled in the art are deemed to fall within the scope of the invention.
The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the specification, these information should not be limited by these terms, which are used only for distinguishing between similar items and not necessarily for describing a sequential or chronological order of the features described in one or more embodiments of the specification. Furthermore, the terms "having," "including," and similar referents, are intended to cover a non-exclusive scope, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to the particular details set forth, but may include other inherent information not expressly listed for such steps or modules.
First, terms used in one or more embodiments of the present invention are explained.

Source domain: a labeled road image data set whose data distribution satisfies the independent and identically distributed assumption, such as road images from city streets;

Target domain: an unlabeled data set whose data distribution differs from that of the source domain, such as road images from rural mountain areas;

Cross-scene road recognition: when road recognition trained on the source domain cannot be applied directly to road images in the target domain, the road recognition model must be iteratively trained across scenes on the source-domain and target-domain data; the resulting cross-scene road recognition model can then be applied to road recognition in the target domain, providing an important basis for applying an intelligent driving system in a new scene.
Example 1:
In embodiment 1, the cross-scene recognition model training method provided by the present invention is adopted. To help the skilled person understand the method, its main components are first explained in detail.

In embodiment 1, the training samples include source domain images X_s collected from city street views, the corresponding pixel-level true label maps y_s, and unlabeled target domain images X_t of a different scene captured in remote mountainous areas; this collection scene only illustrates embodiment 1 and does not strictly limit the invention.

In embodiment 1, the source domain image X_s, the true source-domain label map y_s and the unlabeled cross-scene target domain image X_t are used as training data; forward propagation and chain-rule backward gradient updates are adopted, and recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values are computed and output at the pixel level, the local level and the image level; iterative training of the cross-scene recognition model is performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally obtaining trained region-level and sample-level cross-scene recognition models.
In embodiment 1, the cross-scene recognition model is shown in Fig. 1 and includes a multi-scale feature extractor G, a high-resolution aggregate feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g, where p is short for pixel, l for local and g for global;

The cross-scene recognition model extracts high-resolution aggregated features from the multi-scale features of the input image, performs recognition prediction on the high-resolution aggregated features at the pixel level, the local level and the image level, and computes the recognition loss values, domain classification predictions and domain-adaptation loss values.
The multi-scale feature extractor G includes a deep convolutional neural network; various mainstream deep convolutional networks can be adopted, which the invention does not strictly limit, and Res2Net50 is adopted in embodiment 1. It extracts, from the source-domain samples X_s and the target-domain samples X_t, n source-domain and target-domain multi-scale features f_n^s and f_n^t, where s denotes the source domain, t the target domain and n the number of multi-scale features; the source-domain and target-domain multi-scale features f_n^s and f_n^t have the same numbers of channels and the same sizes.

In embodiment 1, G outputs multi-scale features on Block2, Block3, Block4 and Block5 of the Res2Net50 network, namely (f_1^s, f_1^t), (f_2^s, f_2^t), (f_3^s, f_3^t), (f_4^s, f_4^t); the numbers of feature channels are 256, 512, 1024 and 2048 respectively, and the scale ratios between the features are 1, 1/2, 1/4 and 1/8.
In embodiment 1, the high-resolution aggregate feature extractor M comprises two multi-scale feature aggregation layers and a dilated convolutional neural network, which aggregate the n source-domain and n target-domain multi-scale features f_n^s and f_n^t into a high-resolution source-domain aggregated feature O_s and a target-domain aggregated feature O_t respectively.
In embodiment 1, the multi-scale feature aggregation layer includes a feature scale transformation function, a dynamic parameter variable and several convolutional neural networks, where the dynamic parameter variable stores the convolution parameters used to convolve features of different scales, and the convolutional neural networks take the multi-scale features as input, perform feature computation, and read and write the parameters of the dynamic parameter variable.

In embodiment 1, the feature scale transformation function performs scaling by bilinear interpolation; the number of channels of the dynamic parameter variable is determined by the actual source-domain and target-domain multi-scale features f_n^s or f_n^t, and the number of convolutional neural networks is determined by the number of source-domain and target-domain multi-scale features f_n^s or f_n^t.
In embodiment 1, there are 4 convolutional neural networks. The dynamic parameter variable can be represented with shape (in, out, 1, 1), where in is the sum of the channel numbers of the multi-scale features input to the multi-scale feature aggregation layer and out is the sum of the channel numbers of the multi-scale features output after the convolution computation of the aggregation layer. The dynamic parameter variable of the first multi-scale feature aggregation layer L1 in embodiment 1 has shape (3840, 1920, 1, 1), and that of the second multi-scale feature aggregation layer L2 has shape (1920, 1920, 1, 1), i.e. the numbers of input and output feature channels are unchanged. The source-domain multi-scale features f_n^s in embodiment 1 comprise features of 4 scales, i.e. n = (1, 2, 3, 4), and f_1^s, f_2^s, f_3^s and f_4^s denote the first-, second-, third- and fourth-scale features of the source domain respectively. The aggregation operation on the source-domain multi-scale features f_n^s proceeds as follows:

B1. f_1^s, with feature shape (256, H, W), is input and convolved with the 1×1 convolution kernel Conv(1×1) whose parameters are the slice with index bits (1/15·in, 1/15·out) of dynamic parameter variable 1, and the convolved first-scale dimension-reduced feature f_1^s_1 is output with shape (128, H, W). According to the feature sizes (height, width) of f_2^s, f_3^s and f_4^s, the feature scale of f_1^s is transformed by factors 1/2, 1/4 and 1/8 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (2/15·in, 2/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 1/2-scale feature f_1^s_2, 1/4-scale feature f_1^s_3 and 1/8-scale feature f_1^s_4 of the first-scale dimension-reduced feature; f_1^s_2 has shape (256, 1/2H, 1/2W), f_1^s_3 has shape (512, 1/4H, 1/4W), and f_1^s_4 has shape (1024, 1/8H, 1/8W);
B2. f_2^s, with feature shape (512, 1/2H, 1/2W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (2/15·in, 2/15·out) of dynamic parameter variable 1, and the convolved second-scale dimension-reduced feature f_2^s_2 is output with shape (256, 1/2H, 1/2W). According to the feature sizes (height, width) of f_1^s, f_3^s and f_4^s, the feature scale of f_2^s is transformed by factors 2, 1/2 and 1/4 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 2× scaling feature f_2^s_1, 1/2 scaling feature f_2^s_3 and 1/4 scaling feature f_2^s_4 of the second-scale dimension-reduced feature; f_2^s_1 has shape (128, H, W), f_2^s_3 has shape (512, 1/4H, 1/4W), and f_2^s_4 has shape (1024, 1/8H, 1/8W);
B3. f_3^s, with feature shape (1024, 1/4H, 1/4W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (4/15·in, 4/15·out) of dynamic parameter variable 1, and the convolved third-scale dimension-reduced feature f_3^s_3 is output with shape (512, 1/4H, 1/4W). According to the feature sizes (height, width) of f_1^s, f_2^s and f_4^s, the feature scale of f_3^s is transformed by factors 4, 2 and 1/2 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 4× scaling feature f_3^s_1, 2× scaling feature f_3^s_2 and 1/2 scaling feature f_3^s_4 of the third-scale dimension-reduced feature; f_3^s_1 has shape (128, H, W), f_3^s_2 has shape (256, 1/2H, 1/2W), and f_3^s_4 has shape (1024, 1/8H, 1/8W);
B4. f_4^s, with feature shape (2048, 1/8H, 1/8W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (8/15·in, 8/15·out) of dynamic parameter variable 1, and the convolved fourth-scale dimension-reduced feature f_4^s_4 is output with shape (1024, 1/8H, 1/8W). According to the feature sizes (height, width) of f_1^s, f_2^s and f_3^s, the feature scale of f_4^s is transformed by factors 8, 4 and 2 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out) and (4/15·in, 4/15·out) as Conv(1×1) parameters, outputting the convolved 8× scaling feature f_4^s_1, 4× scaling feature f_4^s_2 and 2× scaling feature f_4^s_3 of the fourth-scale dimension-reduced feature; f_4^s_1 has shape (128, H, W), f_4^s_2 has shape (256, 1/2H, 1/2W), and f_4^s_3 has shape (512, 1/4H, 1/4W);
B5. Aggregation and dilated convolution are applied to the features of the same scale: (f_1^s_1, f_2^s_1, f_3^s_1, f_4^s_1) are aggregated and dilated-convolved to output the feature f_1^s'; (f_1^s_2, f_2^s_2, f_3^s_2, f_4^s_2) output f_2^s'; (f_1^s_3, f_2^s_3, f_3^s_3, f_4^s_3) output f_3^s'; and (f_1^s_4, f_2^s_4, f_3^s_4, f_4^s_4) output f_4^s'. In this embodiment, the aggregation operation is summation;
B6. f_2^s', f_3^s' and f_4^s' are transformed by factors 2, 4 and 8 respectively, according to the scale of f_1^s', with the feature scale transformation function; the 4 features are then input to the slices of dynamic parameter variable 2 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, convolution and aggregation are computed, and the high-resolution aggregated feature O_s is output.

The pixel-level recognizer F_p, the local-level recognizer F_l and the image-level recognizer F_g each comprise a fully convolutional neural network and an activation function; the source-domain and target-domain aggregated features O_s and O_t are both input into the pixel-level recognizer F_p, the local-level recognizer F_l and the image-level recognizer F_g. F_p outputs the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel, F_l outputs the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local, and F_g outputs the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global.
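A condensed sketch of the aggregation scheme walked through in B1 to B6 is given below with simplifying assumptions: one shared dynamic parameter tensor per layer, bilinear rescaling, summation as the aggregation operation, and a slicing scheme chosen so that the output channel counts match those stated above; the subsequent dilated convolution and the second aggregation layer L2 are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAggregation(nn.Module):
    """Sketch of one multi-scale feature aggregation layer: every input scale is
    rescaled to every target scale, convolved with a 1x1 kernel sliced out of a
    shared dynamic parameter variable, and same-scale results are summed."""

    def __init__(self, in_channels=(256, 512, 1024, 2048),
                 out_channels=(128, 256, 512, 1024)):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        # dynamic parameter variable; stored as (sum(out), sum(in), 1, 1) to match
        # the weight layout expected by torch conv2d
        self.weight = nn.Parameter(
            torch.randn(sum(out_channels), sum(in_channels), 1, 1) * 0.01)

    def forward(self, feats):
        in_off, out_off = [0], [0]
        for c_in, c_out in zip(self.in_channels, self.out_channels):
            in_off.append(in_off[-1] + c_in)
            out_off.append(out_off[-1] + c_out)

        outputs = []
        for j, f_target in enumerate(feats):        # target scale j
            h, w = f_target.shape[-2:]
            acc = 0
            for i, f in enumerate(feats):           # source scale i
                x = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                # 1x1 kernel slice mapping scale-i channels to scale-j channels
                k = self.weight[out_off[j]:out_off[j + 1], in_off[i]:in_off[i + 1]]
                acc = acc + F.conv2d(x, k)          # aggregation here is summation
            outputs.append(acc)
        return outputs
```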
In embodiment 1, the pixel-level recognizer F_p consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the pixel-level recognition prediction probability map;

the local-level recognizer F_l consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the local-level recognition prediction probability map;

the image-level recognizer F_g consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the image-level recognition prediction probability map;

the pixel-level domain classifier D_p comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the pixel-level domain classification prediction;

the local-level domain classifier D_l comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the local-level domain classification prediction;

the image-level domain classifier D_g comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the image-level domain classification prediction.
In embodiment 1, the cross-scene recognition model training is shown in Fig. 2 and proceeds as follows:

Step 200: the source-domain samples X_s and target-domain samples X_t are input to the multi-scale feature extractor G, and the output multi-scale features are (f_1^s, f_1^t), (f_2^s, f_2^t), (f_3^s, f_3^t), (f_4^s, f_4^t);

Step 201: the features output in step 200 are input to the high-resolution aggregate feature extractor M, which outputs the high-resolution aggregated features O_s and O_t of the source domain and the target domain;

Step 202: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the pixel-level recognizer F_p outputs the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel;

Step 203: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the local-level recognizer F_l outputs the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local;

Step 204: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the image-level recognizer F_g outputs the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global;
Step 205: the prediction losses of cross-scene recognition and domain adaptation at each level are computed and output at the pixel level, the local level and the image level, specifically as follows:

Step 205-1: the source-domain pixel-level recognition prediction probability map p_s^pixel output in step 202 is taken as input, and the pixel-level recognition loss value is output through the pixel-level recognition loss function, defined as follows:

[Formula: pixel-level recognition loss L_pixel, computed over the w×h probability map]

where y_s is the pixel-level true label map of the source-domain sample, and w and h are the width and height of the probability map;

Step 205-2: the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel output in step 202 are input to the pixel-level domain classifier D_p, whose function is to perform pixel-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^pixel and h_t^pixel; the pixel-level domain-adaptation loss value is computed with the pixel-level domain-adaptation loss function L_adapt^p, where h_s^pixel and h_t^pixel are the source-domain and target-domain domain classification prediction probabilities. The pixel-level domain-adaptation loss function is defined as follows:

[Formula: pixel-level domain-adaptation loss L_adapt^p, built from the cross-entropy L_Dp with domain label 1 for the source and 0 for the target]

where L_Dp is the pixel-level cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 205-3: the source-domain local-level recognition prediction probability map p_s^local output in step 203 is taken as input, and the local-level recognition loss value is output through the local-level recognition loss function, defined as follows:

[Formula: local-level recognition loss L_local, built from the block statistics defined below]

where x = {x_i : i = 1, 2, …, N} and y = {y_i : i = 1, 2, …, N} are the local predicted values obtained, in local blocks, from the probability prediction map p_s^local and the true source-domain label map y_s respectively, N is the number of local blocks, μ_x and μ_y are the means and standard deviations of the local predicted values x and y, σ_xy is the covariance of the local predicted values, and C_1 and C_2 are hyperparameters;

Step 205-4: the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local output in step 203 are input to the local-level domain classifier D_l, whose function is to perform local-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^local and h_t^local; the local-level domain-adaptation loss value is computed with the local-level domain-adaptation loss function L_adapt^l, where h_s^local and h_t^local are the source-domain and target-domain local-level domain classification prediction probabilities. The local-level domain-adaptation loss function is defined as follows:

[Formula: local-level domain-adaptation loss L_adapt^l, built from the cross-entropy L_Dl with domain label 1 for the source and 0 for the target]

where L_Dl is the local-level cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 205-5: the source-domain image-level recognition prediction probability map p_s^global output in step 204 is taken as input, and the image-level recognition loss value is output through the image-level recognition loss function, defined as follows:

[Formula: image-level recognition loss L_global, computed over the w×h probability map]

where p_s^global is the image-level recognition prediction probability map, y_s is the true source-domain label map, and w and h are the width and height of the probability map;

Step 205-6: the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global output in step 204 are input to the image-level domain classifier D_g, whose function is to perform image-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^global and h_t^global; the image-level domain-adaptation loss value is computed with the image-level domain-adaptation loss function L_adapt^g, where h_s^global and h_t^global are the source-domain and target-domain image-level domain classification prediction probabilities. The image-level domain-adaptation loss function is defined as follows:

[Formula: image-level domain-adaptation loss L_adapt^g, built from the cross-entropy L_Dg with domain label 1 for the source and 0 for the target]

where L_Dg is a cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 206: the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global output in steps 202, 203 and 204 are taken as input, the loss value is obtained through the hard-region training loss function L_region, and the cross-scene recognition model is trained at the region level jointly with the prediction losses of cross-scene recognition and domain adaptation at each level from step 205. The overall training loss of the region-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^R of the region-level cross-scene recognition model]

where L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of the cross-scene domain adaptation, L_total^R is the overall training loss of the region-level cross-scene recognition model (R is short for region), and L_region is the hard-region training loss, defined as follows:

[Formula: hard-region training loss L_region, a function of the averaged prediction p_s, the hard-region probability condition ρ and the hyperparameter γ]

where p_s = 1/3·p_s^pixel + 1/3·p_s^local + 1/3·p_s^global, ρ is the hard-region probability condition and γ is a hyperparameter; in this embodiment ρ = 0.5 and γ = 0.25. Region-level training lets the training process focus on pixel regions that are hard to recognize and prone to errors, and the trained region-level cross-scene recognition model is output;
Step 207: the image-level recognition prediction probability maps p_s^global and p_t^global output in step 204 are used to compute a prediction confidence weight Z for each sample input to the sample-level cross-scene recognition model, and the sample-level cross-scene recognition model is trained at the sample level jointly with the prediction losses of cross-scene recognition and domain adaptation at each level from step 205. The overall training loss of the sample-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^S of the sample-level cross-scene recognition model]

where L_total^S is the overall training loss of the sample-level cross-scene recognition model (S is short for sample). The calculation function of Z is defined as:

[Formula: prediction confidence weight Z, computed from the image-level prediction probability map p^global with hyperparameter c]

Z weights the loss of each sample during training; p^global takes the value p_s^global or p_t^global according to the domain the sample belongs to: if the training sample comes from the source domain, p^global = p_s^global, and if it comes from the target domain, p^global = p_t^global; c is a hyperparameter. Sample-level training lets the training process focus on image samples that are hard to recognize and prone to errors, and the trained sample-level cross-scene recognition model is output.
In embodiment 1, during the iterative training of the cross-scene recognition model, the model parameters are updated by forward propagation and chain-rule backward gradient propagation; the gradient updates of the recognition model parameters use stochastic gradient descent, and the gradient updates of the domain classifier parameters D_p, D_l and D_g use adaptive moment estimation (Adam);
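One iteration of the region-level training described above could look like the following sketch; the learning rates, the bundling of the (assumed) loss functions, and the joint update of recognizer and domain classifiers in a single step are simplifications, since the description does not fix the adversarial update schedule.

```python
import itertools
import torch

def train_region_level(G, M, F_p, F_l, F_g, D_p, D_l, D_g, loader, losses, epochs=1):
    """Illustrative training loop; `losses` bundles the assumed identification,
    domain-adaptation and hard-region loss functions sketched earlier."""
    opt_rec = torch.optim.SGD(                       # stochastic gradient descent
        itertools.chain(G.parameters(), M.parameters(), F_p.parameters(),
                        F_l.parameters(), F_g.parameters()),
        lr=2.5e-4, momentum=0.9)
    opt_dom = torch.optim.Adam(                      # adaptive moment estimation
        itertools.chain(D_p.parameters(), D_l.parameters(), D_g.parameters()),
        lr=1e-4)
    for _ in range(epochs):
        for X_s, y_s, X_t in loader:
            O_s, O_t = M(G(X_s)), M(G(X_t))
            p_s = [F_p(O_s), F_l(O_s), F_g(O_s)]     # source predictions, 3 levels
            p_t = [F_p(O_t), F_l(O_t), F_g(O_t)]     # target predictions, 3 levels
            L_seg = sum(f(p, y_s) for f, p in zip(losses["ident"], p_s))
            L_adapt = sum(f(D(ps), D(pt)) for f, D, ps, pt in
                          zip(losses["adapt"], (D_p, D_l, D_g), p_s, p_t))
            total = L_seg + L_adapt + losses["region"](*p_s, y_s)
            opt_rec.zero_grad(); opt_dom.zero_grad()
            total.backward()
            opt_rec.step(); opt_dom.step()
```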
Preferably, the training process of the cross-scene recognition model is realized at the region level and at the sample level, each jointly with the pixel level, the local level and the image level. The recognition model output by the region-level training of step 206 pays more attention to pixel regions that are hard and error-prone in the cross-scene recognition task, and the recognition model output by the sample-level training of step 207 pays more attention to image samples that are hard and error-prone in the cross-scene recognition task;

Preferably, in embodiment 1 of the present invention, the region-level and sample-level cross-scene recognition models trained at these levels can be applied to a recognition task in a new domain; for example, in a vehicle-mounted application with the target domain described in embodiment 1 as the new scene, road recognition in a remote mountain area can be performed reliably with the region-level and sample-level cross-scene recognition models, as described in embodiment 2.
Example 2:
Embodiment 2 of the present invention provides a cross-scene road recognition method, which adopts the region-level and sample-level cross-scene recognition models obtained by the training method provided in embodiment 1.
as shown in fig. 3, a cross-scene road identification method includes the following steps:
Step 301, receiving road image data X to be recognized in the target domain;
Step 302, inputting the road image data X to be recognized in the target domain into the region-level cross-scene recognition model, and outputting a region-level prediction map P_R^mask;
Step 303, inputting the road image data X to be recognized in the target domain into the sample-level cross-scene recognition model, and outputting a sample-level prediction map P_S^mask;
Step 304, taking the region-level prediction map P_R^mask output in step 302 as input, performing single-connected-region calculation, and outputting the number of region-level connected regions N_R and the area K_R corresponding to each region-level connected region;
Step 305, taking the sample-level prediction map P_S^mask output in step 303 as input, performing single-connected-region calculation, and outputting the number of sample-level connected regions N_I and the area K_I corresponding to each sample-level connected region;
Step 306, taking the numbers of region-level and sample-level connected regions N_R and N_I output in steps 304 and 305 and the corresponding region-level and sample-level connected-region areas K_R and K_I, and outputting the road recognition result of the target-domain road image data X to be recognized according to the decision strategy.
The number of region-level connected regions N_R, the area K_R corresponding to each region-level connected region, the number of sample-level connected regions N_I, and the area K_I corresponding to each sample-level connected region are obtained by a connected-region analysis algorithm.
In this embodiment 2, the connected-region analysis algorithm is the Two-Pass method; an illustrative sketch is given below.
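As an illustrative sketch (not the patent's implementation), a Two-Pass connected-region analysis over a binary road mask can be written as follows; it returns the number of connected regions and the pixel area of each, which is what steps 304 and 305 consume:

```python
import numpy as np

def two_pass_connected_regions(mask):
    """Two-pass connected-region labelling (4-connectivity) on a binary mask.

    Returns (num_regions, areas), where areas maps each region label to its
    pixel count. A minimal sketch; libraries such as OpenCV or scipy.ndimage
    provide equivalent, faster routines.
    """
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = {}  # union-find over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: assign provisional labels and record label equivalences.
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            up = labels[i - 1, j] if i > 0 else 0
            left = labels[i, j - 1] if j > 0 else 0
            neighbours = [l for l in (up, left) if l > 0]
            if not neighbours:
                labels[i, j] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                labels[i, j] = min(neighbours)
                if len(neighbours) == 2:
                    union(neighbours[0], neighbours[1])

    # Second pass: resolve equivalences and count pixel areas per region.
    areas = {}
    for i in range(h):
        for j in range(w):
            if labels[i, j] > 0:
                root = find(labels[i, j])
                labels[i, j] = root
                areas[root] = areas.get(root, 0) + 1
    return len(areas), areas
```

For example, two_pass_connected_regions(P_R_mask > 0.5) would yield N_R and the areas K_R, assuming the probability map is binarized at 0.5, a threshold the text does not specify.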
The decision strategy in this embodiment 2 is used to decide which result to output, and is specifically as follows:
J1 strategy: if the number of region-level connected regions N_R = 1 and the region-level connected-region area satisfies K_R > K_I, the area corresponding to the sample-level connected region (K_min being the minimum safe passing region area of the road image data X to be recognized, defined below), the region-level prediction map P_R^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise the J2 strategy is triggered.
J2 strategy: if the number of sample-level connected regions N_I = 1 and the sample-level connected-region area satisfies K_I > K_R, the region-level connected-region area, the sample-level prediction map P_S^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise, neither the J1 strategy nor the J2 strategy is considered satisfied, and the warning signal E is output as the road recognition result of the target-domain road image data X to be recognized. A sketch of this decision follows.
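A compact, non-authoritative sketch of this two-branch decision; the function and parameter names are invented for illustration, and the role of the minimum safe passing area K_min, defined further below, is not reproduced here:

```python
def road_decision(N_R, K_R, N_I, K_I, region_mask, sample_mask, warning="E"):
    """J1/J2 decision strategy sketch.

    N_R, K_R: number of region-level connected regions and the area of the
              (single) region-level connected region.
    N_I, K_I: the same quantities for the sample-level prediction map.
    Returns the region-level mask, the sample-level mask, or the warning
    signal E when neither strategy holds.
    """
    if N_R == 1 and K_R > K_I:      # J1: one region-level region with the larger area
        return region_mask
    if N_I == 1 and K_I > K_R:      # J2: one sample-level region with the larger area
        return sample_mask
    return warning                   # neither J1 nor J2 satisfied
```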
In the foregoing steps 302 and 303, according to the region-level and sample-level cross-scene recognition models, the prediction probability maps of the image X to be recognized at the pixel level, the local level and the image level are obtained first: p_R^pixel, p_R^local and p_R^global, and p_S^pixel, p_S^local and p_S^global, where R denotes the output prediction of the region-level cross-scene recognition model and S denotes the output prediction of the sample-level cross-scene recognition model. In this embodiment 2, the prediction maps in steps 302 and 303 are computed as: P_R^mask = 1/3 p_R^pixel + 1/3 p_R^local + 1/3 p_R^global; P_S^mask = 1/3 p_S^pixel + 1/3 p_S^local + 1/3 p_S^global.
In the foregoing steps 304 and 305, the method used to compute the single connected regions of the prediction maps P_R^mask and P_S^mask can be any connected-region method and is not strictly limited here.
In the aforementioned J1 strategy, K_min = K_r * β, where β is the conversion ratio from the resolution of the road image data X to be recognized to physical length, and K_r is the minimum safe passing area in the real road (the subscript r is shorthand for real), K_r = L_r * H_r, where H_r is the road width required for ground vehicle passage, for example the distance between the left and right wheels, and L_r is the forward safe braking distance, L_r = v * t, where v is the current speed per second and t is the total safe braking time. A small worked example follows.
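As a small worked example of these relations (a hypothetical helper with illustrative numbers, not values from the patent):

```python
def minimum_safe_passing_area(v, t, H_r, beta):
    """K_min = K_r * beta with K_r = L_r * H_r and L_r = v * t.

    v    : current speed (metres per second)
    t    : total safe-braking time (seconds)
    H_r  : road width needed by the vehicle (metres), e.g. wheel track
    beta : conversion ratio from image resolution to physical length
    """
    L_r = v * t          # forward safe-braking distance (metres)
    K_r = L_r * H_r      # minimum safe passing area in the real road
    return K_r * beta    # the same area expressed in the image

# Illustrative numbers: 10 m/s, 3 s braking, 1.8 m track width,
# conversion ratio 50.
print(minimum_safe_passing_area(v=10.0, t=3.0, H_r=1.8, beta=50.0))  # 2700.0
```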
As shown in Table 1, in this embodiment, labeled source-domain images acquired from an urban street view and unlabeled target-domain images acquired from a remote mountain area are used as training data; with the region-level and sample-level cross-scene recognition models obtained by the training method, the cross-scene road recognition method significantly improves the accuracy of cross-scene road recognition.
TABLE 1
Method | Mountain road recognition result
Source-domain-only training model | 42.06%
This embodiment | 72.56%
Example 3:
This embodiment 3 discloses a cross-scene recognition model training device, as shown in fig. 4, comprising:
a training data input processing unit 41, which inputs the source-domain image X_s, the pixel-level real label map y_s, and the unlabeled target-domain images X_t of different scenes to the multi-scale feature extraction unit 42;
a multi-scale feature extraction unit 42 that extracts features of a plurality of scales from input training data and inputs the extracted features to a high-resolution aggregate feature extraction unit 43;
a high-resolution aggregation feature extraction unit 43, configured to aggregate the input features of a plurality of scales into a high-resolution aggregation feature; in this embodiment 3, the high-resolution aggregation feature extraction unit includes a first multi-scale feature aggregation module 431, a dilated-convolution calculation module 432 and a second multi-scale feature aggregation module 433, where the multi-scale feature aggregation modules are configured to transform and aggregate features of multiple scales, and the dilated-convolution calculation module is configured to improve the resolution of the aggregated feature;
a multi-stage cross-scene recognition and domain classification prediction and loss calculation unit 44, which takes the high resolution aggregation feature as input, respectively outputs recognition probability maps of corresponding levels at pixel level, local level and image level by forward conduction, and further respectively calculates pixel level, local level and image level recognition loss and domain adaptation loss for updating cross-scene recognition model parameters by taking the recognition probability maps of pixel level, local level and image level, the source domain sample real label map and the domain label as input;
a region-level cross-scene recognition model joint training unit 45, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the region level jointly with the pixel-level, local-level and image-level training units, and outputs the region-level cross-scene recognition model;
a sample-level cross-scene recognition model joint training unit 46, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the sample level jointly with the pixel-level, local-level and image-level training units, and outputs the sample-level cross-scene recognition model;
the model storage unit 47 stores the region-level cross-scene recognition model output from the region-level cross-scene recognition model joint training unit 45 and the sample-level cross-scene recognition model output from the sample-level cross-scene recognition model joint training unit 46.
In this embodiment 3, the first multi-scale feature aggregation module 431 and the second multi-scale feature aggregation module 433 both include a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule, and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
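A minimal PyTorch sketch of such an aggregation module, assuming, purely for illustration, that each scale keeps its own stored convolution parameters, is rescaled to a common target resolution, and the results are summed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureAggregation(nn.Module):
    """Illustrative multi-scale feature aggregation: per-scale convolution
    parameters (standing in for the dynamic parameter storage), rescaling to
    a target resolution (the feature scale transformation), and summation
    (the feature aggregation). Channel sizes and the summation rule are
    assumptions for this sketch."""

    def __init__(self, in_channels, out_channels, num_scales):
        super().__init__()
        # One convolution per scale; its weights play the role of the
        # dynamically stored neuron parameters for that scale.
        self.per_scale_convs = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            for _ in range(num_scales)])

    def forward(self, features, target_size):
        # features: list of tensors (N, C, H_i, W_i), one per scale.
        aggregated = 0
        for conv, feat in zip(self.per_scale_convs, features):
            feat = F.interpolate(feat, size=target_size,
                                 mode="bilinear", align_corners=False)
            aggregated = aggregated + conv(feat)
        return aggregated
```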
In this embodiment 3, the multi-stage cross-scene recognition and domain classification prediction and loss calculation unit 44 specifically includes:
a pixel-level training module, which comprises a pixel-level recognition and prediction submodule, a pixel-level recognition loss function calculation submodule, a pixel-level field prediction submodule and a pixel-level field adaptive loss function calculation submodule, the pixel-level identification prediction submodule takes the high-resolution aggregation characteristic of each training sample as input and outputs a pixel-level identification prediction probability map, the pixel-level identification loss function calculation submodule takes the pixel-level identification prediction probability map and a source field sample real label map as input and calculates a pixel-level identification loss value, the pixel-level field prediction submodule takes the pixel-level identification prediction probability map as input and outputs a pixel-level field prediction probability map, and the pixel-level field adaptive loss function calculation submodule takes the pixel-level field prediction probability map and a field label as input and calculates the pixel-level field adaptive loss value;
a local level training module, which comprises a local level identification prediction sub-module, a local level identification loss function calculation sub-module, a local level field prediction sub-module, and a local level field adaptive loss function calculation sub-module, the local level identification prediction submodule takes the high-resolution aggregation characteristic of each training sample as input and outputs a local level identification prediction probability chart, the local level identification prediction probability map and the source field sample real label map are valued in a grid form and are input to the local level identification loss function calculation submodule to calculate a local level identification loss value, the local level field prediction submodule takes the local level identification prediction probability map as input and outputs the local level field prediction probability map, and the local level field adaptive loss function calculation submodule takes the local level field prediction probability map and the field label as input to calculate a local level field adaptive loss value;
an image-level training module, which comprises an image-level recognition prediction submodule, an image-level recognition loss function calculation submodule, an image-level domain prediction submodule and an image-level domain-adaptation loss function calculation submodule; the image-level recognition prediction submodule takes the high-resolution aggregation feature of each training sample as input and outputs an image-level recognition prediction probability map; the image-level recognition prediction probability map and the source-domain sample real label map are used as input to the image-level recognition loss function calculation submodule to calculate an image-level recognition loss value; the image-level domain prediction submodule takes the image-level recognition prediction probability map as input and outputs an image-level domain prediction probability map; the image-level domain-adaptation loss function calculation submodule takes the image-level domain prediction probability map and the domain label as input to calculate an image-level domain-adaptation loss value; a sketch of the pixel-level counterpart of these modules is given below.
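For instance, the pixel-level training module could be sketched as below; the layer configuration, the sigmoid output and the use of binary cross-entropy for the recognition loss are assumptions, while the domain loss follows the described cross-entropy with label 1 for the source domain and 0 for the target domain:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelLevelModule(nn.Module):
    """Illustrative pixel-level training module: a fully convolutional
    recognizer head producing a pixel-level prediction probability map, and
    a small fully convolutional domain classifier that takes the prediction
    map as input. Layer sizes are placeholder assumptions."""

    def __init__(self, feat_channels):
        super().__init__()
        self.recognizer = nn.Sequential(
            nn.Conv2d(feat_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.domain_classifier = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))

    def forward(self, aggregated_feature):
        p_pixel = self.recognizer(aggregated_feature)   # prediction probability map
        h_pixel = self.domain_classifier(p_pixel)       # domain classification logits
        return p_pixel, h_pixel

def pixel_level_losses(p_s, h_s, h_t, y_s):
    """Pixel-level recognition loss (binary cross-entropy against the
    source-domain label map y_s, an assumed form) and the pixel-level
    domain-adaptation loss (cross-entropy with label 1 = source, 0 = target)."""
    loss_rec = F.binary_cross_entropy(p_s, y_s)
    loss_dom = (F.binary_cross_entropy_with_logits(h_s, torch.ones_like(h_s)) +
                F.binary_cross_entropy_with_logits(h_t, torch.zeros_like(h_t)))
    return loss_rec, loss_dom
```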
In this embodiment 3, the region-level cross-scene recognition model joint training unit 45 specifically includes:
a hard-region loss function calculation module, configured to take the source-domain pixel-level, local-level and image-level recognition prediction probability maps as input and output a hard-region training loss value;
a parameter updating module, configured to take the hard-region training loss value, the pixel-level, local-level and image-level recognition losses, and the domain-adaptation losses as input to calculate the gradient values required for parameter updating during the joint training iteration of the region-level cross-scene recognition model;
In this embodiment 3, the sample-level cross-scene recognition model joint training unit 46 specifically includes:
a hard-sample loss function calculation module, configured to take the source-domain pixel-level, local-level and image-level recognition prediction probability maps as input and output a hard-sample training loss value;
and a parameter updating module, configured to take the hard-sample training loss value, the pixel-level, local-level and image-level recognition losses, and the domain-adaptation losses as input to calculate the gradient values required for parameter updating during the joint training iteration of the sample-level cross-scene recognition model.
Example 4:
This embodiment 4 discloses a cross-scene road recognition device, as shown in fig. 6, comprising:
a cross-scene image receiving unit 61 for images to be recognized, configured to receive the target-domain road image data X to be recognized;
a cross-scene road image recognition unit 62, configured to input the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and output a region-level road recognition result and a sample-level road recognition result respectively;
a connected-region calculation unit 63, configured to perform connected-region calculation on the region-level and sample-level road recognition results, and output the number of region-level connected regions, the area of the corresponding region-level connected regions, the number of sample-level connected regions, and the area of the corresponding sample-level connected regions;
a result determination unit 64 for determining whether the number of connected regions of the region level and the sample level and the area of the connected regions of the region level and the sample level satisfy a determination policy, and outputting a recognition result according to the determination policy;
a recognition result storage unit 65 for storing the recognition result output from the result determination unit 64.
It should be noted that, for simplicity, the cross-scene recognition model training method, the cross-scene road recognition method and the embodiments are all described as a series of steps or combinations of operations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because some steps or operations may be performed in other orders or simultaneously according to the present application.
The preferred embodiments of the present application disclosed above are intended only to aid in understanding the invention and its core concepts. For those skilled in the art, there may be variations in specific application scenarios and implementation operations based on the concepts of the present invention, and this description should not be taken as limiting the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. A cross-scene recognition model training method, characterized in that a source-domain image X_s, a source-domain real label map y_s and a cross-scene unlabeled target-domain image X_t are used as training data; forward conduction and chained backward gradient conduction updating methods are adopted to calculate and output, at the pixel level, the local level and the image level respectively, a prediction of cross-scene recognition together with a recognition loss value, and a prediction of domain adaptation together with a domain-adaptation loss value; iterative training of the cross-scene recognition model is performed by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the region level and at the sample level, finally obtaining trained region-level and sample-level cross-scene recognition models; the cross-scene recognition model comprises a multi-scale feature extractor G, a high-resolution aggregation feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g;
The cross-scene recognition model is used for extracting high-resolution aggregation features from the multi-scale features of the input image, performing recognition prediction on the high-resolution aggregation features at a pixel level, a local level and an image level, and calculating a recognition loss value, a domain classification prediction and a domain adaptation loss value;
the multi-scale feature extractor G comprises a deep convolutional neural network for aligning the source domain samples XsAnd target area sample XtExtracting n source fields and target fields of multi-scale features fn sAnd fn tWherein s represents a source domain, t represents a target domain, n represents the number of multi-scale features, and the multi-scale features f of the source domain and the target domainn sAnd fn tThe amount and the size of the channel are the same;
the high-resolution aggregation feature extractor M comprises a multi-scale feature aggregation layer and a cavity convolution neural network, and respectively integrates n multi-scale features f in the source field and the target fieldn sAnd fn tRespectively polymerized into high-resolution source domain characteristics OsAnd target Domain polymerization characteristics Ot
The multi-scale feature aggregation layer comprises dynamic parameter variables and a convolutional neural network, wherein the dynamic parameter variables are used for storing convolutional neural parameters for carrying out convolution operation on different scale features, and the convolutional neural network takes the multi-scale features as input to carry out feature calculation and carry out parameter reading and writing on the dynamic parameter variables;
pixel level identifier FpLocal level identifier FlAnd an image level identifier FgEach comprises a full convolution neural network and an activation function; aggregating Source and target domains into a feature OsAnd OtEqual input pixel level identifier FpLocal level identifier FlAnd an image level identifier FgPixel level identifier FpOutput source domain and target domain pixel level identification prediction probability map ps pixelAnd pt pixelLocal level identifier FlOutput source field and target field local level identification prediction probability map ps localAnd pt localImage level recognizer FgOutput source domain and target domain image level identification prediction probability map ps globalAnd pt global
2. The cross-scene recognition model training method according to claim 1, wherein performing recognition prediction on the high-resolution aggregation feature at the pixel level, the local level and the image level, calculating recognition loss values, domain classification predictions and domain-adaptation loss values specifically comprises:
S1, taking the source-domain pixel-level recognition prediction probability map p_s^pixel and the source-domain real label map y_s as input, and outputting a pixel-level recognition loss value through the pixel-level recognition loss function L_pixel, the pixel-level recognition loss function being defined as:
[equation: the pixel-level recognition loss function L_pixel]
wherein w and h are the width and height of the probability map respectively;
taking the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel as input to the pixel-level domain classifier D_p, outputting the domain classification prediction probabilities h_s^pixel and h_t^pixel, and calculating a pixel-level domain-adaptation loss value through the pixel-level domain-adaptation loss function L_p^adapt, where h_s^pixel and h_t^pixel are the pixel-level domain classification prediction probabilities of the source domain and the target domain respectively; the pixel-level domain-adaptation loss function is defined as:
[equation: the pixel-level domain-adaptation loss function L_p^adapt, built from the cross-entropy loss L_Dp]
wherein L_Dp is a cross-entropy loss function, 1 is the source-domain label, and 0 is the target-domain label;
S2, taking the source-domain local-level recognition prediction probability map p_s^local and the source-domain real label map y_s as input, and outputting a local-level recognition loss value through the local-level recognition loss function L_local, the local-level recognition loss function being defined as:
[equation: the local-level recognition loss function L_local]
wherein x = {x_i | i = 1,2,…,N} and y = {y_i | i = 1,2,…,N} are the local predicted values obtained in local-block form from the probability prediction map p_s^local and from the source-domain real label map y_s respectively, N denotes the number of local blocks, μ_x and σ_x are respectively the mean and standard deviation of the local predicted values x, μ_y and σ_y are respectively the mean and standard deviation of the local predicted values y, σ_xy is the covariance of the local predicted values, and C_1 and C_2 are hyperparameters;
taking the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local as input to the local-level domain classifier D_l, outputting the domain classification prediction probabilities h_s^local and h_t^local, and calculating a local-level domain-adaptation loss value through the local-level domain-adaptation loss function L_l^adapt, where h_s^local and h_t^local are the local-level domain classification prediction probabilities of the source domain and the target domain respectively; the local-level domain-adaptation loss function is defined as:
[equation: the local-level domain-adaptation loss function L_l^adapt, built from the cross-entropy loss L_Dl]
wherein L_Dl is a cross-entropy loss function, 1 is the source-domain label, and 0 is the target-domain label;
S3, taking the source-domain image-level recognition prediction probability map p_s^global and the source-domain real label map y_s as input, and outputting an image-level recognition loss value through the image-level recognition loss function L_global, the image-level recognition loss function being defined as follows:
[equation: the image-level recognition loss function L_global]
wherein w and h are the width and height of the probability map respectively;
taking the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global as input to the image-level domain classifier D_g, outputting the domain classification prediction probabilities h_s^global and h_t^global, and calculating an image-level domain-adaptation loss value through the image-level domain-adaptation loss function L_g^adapt, where h_s^global and h_t^global are the image-level domain classification prediction probabilities of the source domain and the target domain respectively; the image-level domain-adaptation loss function is defined as:
[equation: the image-level domain-adaptation loss function L_g^adapt, built from the cross-entropy loss L_Dg]
wherein L_Dg is a cross-entropy loss function, the label 1 is the source-domain label, and the label 0 is the target-domain label.
3. The cross-scene recognition model training method according to claim 2, wherein performing iterative training of the cross-scene recognition model by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the region level means that the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global are taken as input, and a hard-region training loss value is obtained through the hard-region training loss function L_region; the overall training loss of the region-level cross-scene recognition model is defined as follows:
L_seg = L_pixel + L_local + L_global
[equation: the cross-scene domain-adaptation training loss L_adapt]
[equation: the overall region-level training loss L_R^total]
wherein L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of cross-scene domain adaptation, L_R^total is the overall training loss of the region-level cross-scene recognition model, and L_region is the hard-region training loss function; the hard-region training loss function L_region is defined as:
[equation: the hard-region training loss function L_region, computed from the combined source-domain prediction p_s with hyperparameters ρ and γ]
wherein p_s = 1/3 p_s^pixel + 1/3 p_s^local + 1/3 p_s^global, ρ is the hyperparameter of the hard-region probability condition, and γ is a hyperparameter; region-level training outputs the trained region-level cross-scene recognition model;
performing iterative training of the cross-scene recognition model by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the sample level means that a prediction confidence weight Z is calculated for each sample input to the sample-level cross-scene recognition model from the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global; the overall training loss of the sample-level cross-scene recognition model is defined as follows:
L_seg = L_pixel + L_local + L_global
[equation: the cross-scene domain-adaptation training loss L_adapt]
[equation: the overall sample-level training loss L_S^total, weighting the recognition and domain-adaptation losses of each sample by the prediction confidence weight Z]
wherein L_S^total is the overall training loss of the sample-level cross-scene recognition model; the computation function of Z is defined as:
[equation: the computation function of the prediction confidence weight Z, depending on p_global and the hyperparameter c]
p_global takes the value p_s^global or p_t^global according to the domain of the sample: if the training sample comes from the source domain, p_global = p_s^global, and if the training sample comes from the target domain, p_global = p_t^global; c is a hyperparameter; sample-level training outputs the trained sample-level cross-scene recognition model.
4. The cross-scene road recognition method based on the cross-scene recognition model training method of claim 3, characterized by comprising the following steps:
A1, receiving road image data X to be recognized in the target domain;
A2, inputting the road image data X to be recognized in the target domain into the trained region-level cross-scene recognition model and the trained sample-level cross-scene recognition model obtained by the cross-scene recognition model training method respectively, and outputting a region-level prediction map P_R^mask and a sample-level prediction map P_S^mask respectively;
A3, obtaining the number of region-level connected regions N_R and the area K_R corresponding to each region-level connected region from the region-level prediction map P_R^mask, and obtaining the number of sample-level connected regions N_I and the area K_I corresponding to each sample-level connected region from the sample-level prediction map P_S^mask;
A4, outputting the road recognition result of the target-domain road image data X to be recognized according to the decision strategy, based on the numbers of region-level and sample-level connected regions N_R and N_I and the corresponding region-level and sample-level connected-region areas K_R and K_I.
5. The cross-scene road recognition method according to claim 4, wherein in step A3, a connected-region analysis algorithm is used to obtain the number of region-level connected regions N_R, the area K_R corresponding to each region-level connected region, the number of sample-level connected regions N_I, and the area K_I corresponding to each sample-level connected region.
6. The cross-scene road recognition method according to claim 4, wherein in step A4 the decision strategy is specifically as follows:
J1 strategy: if the number of region-level connected regions N_R = 1 and the region-level connected-region area satisfies K_R > K_I, the area corresponding to the sample-level connected region (K_min being the minimum safe passing region area of the road image data X to be recognized), the region-level prediction map P_R^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise the J2 strategy is triggered;
J2 strategy: if the number of sample-level connected regions N_I = 1 and the sample-level connected-region area K_I satisfies K_I > K_R, the region-level connected-region area, the sample-level prediction map P_S^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise, neither the J1 strategy nor the J2 strategy is considered satisfied, and the warning signal E is taken as the road recognition result of the target-domain road image data X to be recognized;
wherein K_min = K_r * β, β is the conversion ratio from the resolution of the road image data X to be recognized to physical length, K_r is the minimum safe passing area in the real road, K_r = L_r * H_r, H_r is the road width required for ground vehicle passage, L_r is the forward safe braking distance, L_r = v * t, v is the current speed per second, and t is the total safe braking time.
7. The cross-scene recognition model training device based on the cross-scene recognition model training method according to any one of claims 1 to 3, comprising:
a training data input processing unit, configured to input the source-domain image X_s, the pixel-level real label map y_s, and the unlabeled target-domain images X_t of different scenes to the multi-scale feature extraction unit;
the multi-scale feature extraction unit is used for extracting features of multiple scales from input training data and inputting the features into the high-resolution aggregation feature extraction unit;
the high-resolution aggregation feature extraction unit, which aggregates the input features of multiple scales into a high-resolution aggregation feature; the high-resolution aggregation feature extraction unit comprises a multi-scale feature aggregation module and a dilated-convolution calculation module, wherein the multi-scale feature aggregation module is used for transforming and aggregating features of multiple scales, and the dilated-convolution calculation module is used for improving the resolution of the aggregated feature;
a multi-stage cross-scene recognition and domain classification prediction and loss calculation unit, which takes the high-resolution aggregation feature as input, outputs recognition probability maps of the corresponding levels at the pixel level, the local level and the image level respectively through forward conduction, and further calculates the pixel-level, local-level and image-level recognition losses and domain-adaptation losses for updating the cross-scene recognition model parameters, taking the pixel-level, local-level and image-level recognition probability maps, the source-domain sample real label map and the domain label as input;
a region-level cross-scene recognition model joint training unit, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the region level jointly with the pixel-level, local-level and image-level training units, and outputs the region-level cross-scene recognition model;
a sample-level cross-scene recognition model joint training unit, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the sample level jointly with the pixel-level, local-level and image-level training units, and outputs the sample-level cross-scene recognition model;
the model storage unit is used for storing the region horizontal cross-scene recognition model output by the region horizontal cross-scene recognition model joint training unit and storing the sample horizontal cross-scene recognition model output by the sample horizontal cross-scene recognition model joint training unit;
the multi-scale feature aggregation module comprises a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
8. The cross-scene road recognition device based on the cross-scene road recognition method of any one of claims 4 to 6, characterized by comprising:
a cross-scene image receiving unit for images to be recognized, configured to receive the road image data X to be recognized in the target domain;
a cross-scene road image recognition unit, configured to input the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and output a region-level road recognition result and a sample-level road recognition result respectively;
a connected-region calculation unit, configured to perform connected-region calculation on the region-level and sample-level road recognition results and output the number of region-level connected regions, the area of the corresponding region-level connected regions, the number of sample-level connected regions, and the area of the corresponding sample-level connected regions;
a result determination unit, configured to determine whether the numbers of region-level and sample-level connected regions and the areas of the region-level and sample-level connected regions satisfy the decision strategy, and to output a recognition result according to the decision strategy;
and a recognition result storage unit, configured to store the recognition result output by the result determination unit.
CN202111106779.1A 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device Active CN113554013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111106779.1A CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111106779.1A CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Publications (2)

Publication Number Publication Date
CN113554013A CN113554013A (en) 2021-10-26
CN113554013B (en) 2022-03-29

Family

ID=78106478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111106779.1A Active CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Country Status (1)

Country Link
CN (1) CN113554013B (en)


Also Published As

Publication number Publication date
CN113554013A (en) 2021-10-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant