CN113554013B - Cross-scene recognition model training method, cross-scene road recognition method and device - Google Patents

Cross-scene recognition model training method, cross-scene road recognition method and device

Info

Publication number
CN113554013B
CN113554013B (application CN202111106779.1A)
Authority
CN
China
Prior art keywords
level
cross
local
domain
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111106779.1A
Other languages
Chinese (zh)
Other versions
CN113554013A (en)
Inventor
周智恒
张鹏宇
郭勇帆
沈世福
王怡凡
李波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202111106779.1A
Publication of CN113554013A
Application granted
Publication of CN113554013B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a cross-scene recognition model training method, a cross-scene road recognition method and a cross-scene road recognition device. The invention uses a source domain image X_s, its true source-domain label map y_s and an unlabeled cross-scene target domain image X_t as training data and, using forward propagation and chain-rule backward gradient updates, computes and outputs recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values at the pixel level, the local level and the image level. Iterative training of the cross-scene recognition model is then performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally yielding trained region-level and sample-level cross-scene recognition models. The cross-scene road recognition method built on these models outputs an accurate and safe recognition result based on a decision strategy and can provide effective data support for an intelligent driving system in a new scene.

Description

Cross-scene recognition model training method, cross-scene road recognition method and device
Technical Field
The invention relates to the technical fields of artificial intelligence, information processing and automatic driving, and in particular to a cross-scene recognition model training method, a cross-scene road recognition method and corresponding devices.
Background
In recent years, artificial intelligence techniques based on image recognition have played an increasingly important role in the field of automated driving. Road recognition methods based on deep neural networks perform supervised training on large amounts of labeled urban street image data, provide high-performance road recognition models for automated driving, and have greatly advanced the development of automated driving technology.
In a typical road recognition algorithm based on a deep neural network, a large number of labeled pictures of urban roads or expressways are used as samples, a convolutional neural network extracts features from the samples, the probability that each pixel in the image belongs to a road area is obtained, and this output probability serves as the criterion for deciding the road region. For example, invention CN107808140B, "Monocular vision road recognition algorithm based on image fusion", discloses a deep neural network model obtained by supervised training on labeled samples to perform road recognition. Such methods require a large amount of labeled image data, which is costly and inefficient; moreover, the training data set and the validation data set of the trained model must satisfy an independent and identically distributed (i.i.d.) assumption, and when the actual input does not satisfy this assumption, road recognition performance in a new scene is often poor. The segmentation model training approach proposed in patent CN106558058, "Segmentation model training method, road segmentation method, vehicle control method and apparatus", applies an unsupervised "free area segmentation method" to the training samples and uses the resulting segmentation images as label information for the training images, i.e. training based on pseudo labels. The "free area segmentation method" used there cannot guarantee the accuracy of the output segmentation images as pseudo labels, and the segmentation model may therefore fail to reach the expected training result.
In addition, an intelligent driving system should be able to recognize roads effectively in a variety of scenes, such as complex city-block roads, expressways, and mountain roads in less developed regions; however, it is impossible to collect labeled samples for every condition, and a recognition system trained on one specific scene (such as city-block roads) cannot be directly applied to a new scene (such as mountain roads).
Disclosure of Invention
In order to overcome the above technical problems, the invention provides a cross-scene recognition model training method, a cross-scene road recognition method and a cross-scene road recognition device.
The purpose of the invention is realized by at least one of the following technical solutions.
A cross-scene recognition model training method uses a source domain image X_s, its true source-domain label map y_s and an unlabeled cross-scene target domain image X_t as training data and, using forward propagation and chain-rule backward gradient updates, computes and outputs recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values at the pixel level, the local level and the image level; iterative training of the cross-scene recognition model is then performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally obtaining trained region-level and sample-level cross-scene recognition models.
Further, the cross-scene recognition model includes a multi-scale feature extractor G, a high-resolution aggregate feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g, where the subscript p is short for pixel, l is short for local, and g is short for global;
the cross-scene recognition model is used for extracting high-resolution aggregation features from the multi-scale features of the input image, recognizing and predicting the high-resolution aggregation features at a pixel level, a local level and an image level, and calculating a recognition loss value, a domain classification prediction and a domain adaptation loss value.
Further, the multi-scale feature extractor G includes a deep convolutional neural network and extracts, from the source-domain samples X_s and the target-domain samples X_t, n source-domain and target-domain multi-scale features f_n^s and f_n^t, where s denotes the source domain, t the target domain and n the number of multi-scale features; the source-domain and target-domain multi-scale features f_n^s and f_n^t have the same numbers of channels and the same sizes;
high resolution aggregate feature extractorMThe method comprises a multi-scale feature aggregation layer and a cavity convolution neural network, wherein the multi-scale features of n source fields and n target fields are respectively combinedf n s Andf n t separately aggregated into high resolution source domain featuresO s And target area aggregation featuresO t
The multi-scale feature aggregation layer comprises a dynamic parameter variable and a convolutional neural network, where the dynamic parameter variable stores the convolution parameters used to convolve features of different scales, and the convolutional neural network takes the multi-scale features as input, performs feature computation, and reads and writes the parameters of the dynamic parameter variable;
pixel level recognizerF p Local level recognizerF l And image level recognizerF g Each comprises a full convolution neural network and an activation function; aggregating source and target domainsO s AndO t equal input pixel level recognizerF p Local level recognizerF l And image level recognizerF g Pixel level recognizerF p Output source domain and target domain pixel level identification prediction probability mapp s pixel Andp t pixel local level recognizer
Figure DEST_PATH_IMAGE001
Output source field and target field local level identification prediction probability chartp s local Andp t local image level recognizerF g Output source domain and target domain image level identification prediction probability mapp s global Andp t global
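As an illustration only, the components listed above could be wired together as in the following PyTorch-style sketch; the hidden channel width and the ReLU inside the domain classifiers are assumptions made for the example and are not asserted by the present description.

```python
import torch
import torch.nn as nn

class CrossSceneModel(nn.Module):
    """Illustrative wiring of the extractor G, the aggregator M and the three
    recognizer / domain-classifier pairs (F_p/D_p, F_l/D_l, F_g/D_g)."""

    def __init__(self, backbone: nn.Module, aggregator: nn.Module, channels: int = 1920):
        super().__init__()
        self.G = backbone      # multi-scale feature extractor (e.g. a ResNet-family network)
        self.M = aggregator    # high-resolution aggregate feature extractor
        # recognizers: a 1x1 convolution followed by a Sigmoid, one per level
        self.F_p = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.F_l = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.F_g = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

        def domain_classifier():
            # two 1x1 convolutions and a Sigmoid; the 64-channel hidden layer is assumed
            return nn.Sequential(nn.Conv2d(1, 64, 1), nn.ReLU(),
                                 nn.Conv2d(64, 1, 1), nn.Sigmoid())

        self.D_p, self.D_l, self.D_g = (domain_classifier(), domain_classifier(),
                                        domain_classifier())

    def forward(self, x):
        feats = self.G(x)            # multi-scale features f_1 ... f_n
        O = self.M(feats)            # high-resolution aggregated feature O
        return self.F_p(O), self.F_l(O), self.F_g(O)   # p^pixel, p^local, p^global
```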
Further, performing recognition prediction on the high-resolution aggregated features at the pixel level, the local level and the image level and computing the recognition loss values, domain classification predictions and domain-adaptation loss values specifically comprises:
s1 identifying prediction probability map at source domain pixel levelp s pixel And source domain real label graphy s As input, loss functions are identified by pixel levelL pixel Outputting a pixel-level identification penalty value, the pixel-level identification function defined as:
Figure DEST_PATH_IMAGE002
wherein, w and h are the width and height of the probability map respectively; identifying prediction probability maps at the source domain and target domain pixel levelsp s pixel Andp t pixel as input, to a pixel-level domain classifierD p Output domain classification prediction probabilityh s pixel Andh t pixel adapting loss function to pixel level domainL p adapt A pixel-level domain fit loss value is calculated,h s pixel andh t pixel respectively classifying and predicting probabilities for pixel level fields of a source field and a target field; the pixel-level domain fit loss function is defined as:
Figure DEST_PATH_IMAGE003
wherein the content of the first and second substances,L Dp is a cross entropy loss function, 1 is a source domain label, and 0 is a target domain label;
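The exact loss formulas appear only as images in the original filing. The following sketch is therefore an assumed realization: the pixel-level recognition loss is written as a binary cross-entropy averaged over the w×h probability map, and the pixel-level domain-adaptation loss applies the cross-entropy L_Dp with domain label 1 to the source prediction and domain label 0 to the target prediction.

```python
import torch
import torch.nn.functional as F

def pixel_identification_loss(p_s_pixel, y_s):
    """Assumed form of L_pixel: binary cross-entropy averaged over the w*h map."""
    return F.binary_cross_entropy(p_s_pixel, y_s)

def pixel_domain_adapt_loss(h_s_pixel, h_t_pixel):
    """Assumed form of L^p_adapt: cross-entropy L_Dp with domain label 1 for the
    source-domain prediction and domain label 0 for the target-domain prediction."""
    loss_src = F.binary_cross_entropy(h_s_pixel, torch.ones_like(h_s_pixel))
    loss_tgt = F.binary_cross_entropy(h_t_pixel, torch.zeros_like(h_t_pixel))
    return loss_src + loss_tgt
```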
s2, identifying and predicting probability chart by local level in source fieldp s local And source domain real label graphy s As input, loss functions are identified by local levelL local Outputting a local level identification loss value, wherein a local level identification loss function is defined as:
Figure DEST_PATH_IMAGE004
wherein the content of the first and second substances,x={x i :i=1,2,…,N}andy={y i :i=1,2,…,N}respectively in a local block manner on the probability prediction graphp s local And source domain real label graphy s The local predicted value obtained above is used as the local predicted value,Nthe number of the partial blocks is represented,μ x μ y are respectively local predicted valuesx, yThe mean value and the standard deviation of the measured values,σ xy is the covariance of the local predicted values,C 1 andC 2 is radix Ginseng;
identifying prediction probability map by local level of source field and target fieldp s local Andp t local as input, to a local level domain classifierD l Output domain classification prediction probabilityh s local Andh t local adapting loss function to local level domainL l adapt Calculating the local level domain adaptation loss value,h s local andh t local respectively predicting probability for local domain classification of a source domain and a target domain; the local level domain adaptation loss function is defined as:
Figure DEST_PATH_IMAGE005
wherein the content of the first and second substances,L Dl is a cross entropy loss function, 1 is a source domain label, and 0 is a target domain label;
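The block means, standard deviations, covariance and constants C_1, C_2 follow the structure of an SSIM-style similarity, so the sketch below assumes that reading; the block size and constants are illustrative values, not ones fixed by the description.

```python
import torch.nn.functional as F

def local_identification_loss(p_s_local, y_s, block=8, C1=1e-4, C2=9e-4):
    """Assumed form of L_local: one minus an SSIM-style similarity computed over
    N local blocks of the prediction map (x) and the true label map (y)."""
    mu_x = F.avg_pool2d(p_s_local, block)                      # block means
    mu_y = F.avg_pool2d(y_s, block)
    var_x = F.avg_pool2d(p_s_local ** 2, block) - mu_x ** 2    # block variances
    var_y = F.avg_pool2d(y_s ** 2, block) - mu_y ** 2
    cov_xy = F.avg_pool2d(p_s_local * y_s, block) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return 1.0 - ssim.mean()
```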
s3, identifying the prediction probability map at the source domain image levelp s global And source domain real label graphy s As input, loss functions are identified by image levelL global Outputting an image-level identification penalty value, the image-level identification penalty function being defined as follows:
Figure DEST_PATH_IMAGE006
wherein, w and h are the width and height of the probability map respectively;
identifying prediction probability maps at the source and target domain image levelp s global Andp t global as input, an input image level domain classifierD g Output domain classification prediction probabilityh s global Andh t global adapting a loss function to an image-level domainL g adapt Calculating an image-level domain fit loss value,h s global and h t global Respectively classifying and predicting probabilities for image-level fields of a source field and a target field; the image-level domain fit loss function is defined as:
Figure DEST_PATH_IMAGE007
wherein the content of the first and second substances,L Dg is the cross entropy loss function, with 1 being the source domain label and 0 being the target domain label.
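The description does not state how the domain-adaptation gradients are propagated back into the recognizers; one common realization, shown purely as an assumption, is to place a gradient reversal layer in front of each domain classifier so that minimizing the domain classification loss pushes the predictions toward domain invariance.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (scaled) gradient
    in the backward pass."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# usage sketch: h_s_global = D_g(grad_reverse(p_s_global)); the same wrapper can be
# applied in front of D_p and D_l at the pixel and local levels
```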
Further, performing iterative training of the cross-scene recognition model at the region level, jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, means taking the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global as input and obtaining the hard-region training loss value through the hard-region training loss function L_region. The overall training loss of the region-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^R of the region-level cross-scene recognition model]

where L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of the cross-scene domain adaptation, L_total^R is the overall training loss of the region-level cross-scene recognition model (R is short for region), and L_region is the hard-region training loss function. The hard-region training loss function L_region is defined as:

[Formula: hard-region training loss L_region, a function of the averaged prediction p_s, the hard-region probability condition ρ and the hyperparameter γ]

where p_s = 1/3·p_s^pixel + 1/3·p_s^local + 1/3·p_s^global, and ρ, the hard-region probability condition, and γ are hyperparameters. Region-level training lets the training process focus on pixel regions that are hard to recognize and prone to errors, and outputs a trained region-level cross-scene recognition model.
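The hard-region loss L_region itself is only given as an image; the sketch below shows one plausible reading under explicit assumptions: the three source predictions are averaged, pixels whose confidence falls below the band set by ρ are treated as hard, and they are emphasized with a focal-style exponent γ (the default values ρ = 0.5 and γ = 0.25 follow embodiment 1).

```python
import torch
import torch.nn.functional as F

def region_level_loss(p_pixel, p_local, p_global, y_s, rho=0.5, gamma=0.25):
    """Illustrative hard-region loss (assumed form): average the three source
    predictions, treat low-confidence pixels as hard, and emphasize them."""
    p_s = (p_pixel + p_local + p_global) / 3.0
    # confidence assigned to the true class of each pixel
    confidence = torch.where(y_s > 0.5, p_s, 1.0 - p_s)
    hard_mask = (confidence < rho).float()            # hard / error-prone pixels
    weight = hard_mask * (1.0 - confidence) ** gamma  # focal-style emphasis
    bce = F.binary_cross_entropy(p_s, y_s, reduction="none")
    return (weight * bce).sum() / (hard_mask.sum() + 1e-6)
```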
Performing iterative training of the cross-scene recognition model at the sample level, jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, uses the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global to compute a prediction confidence weight Z for each sample input to the sample-level cross-scene recognition model. The overall training loss of the sample-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^I of the sample-level cross-scene recognition model]

where L_total^I is the overall training loss of the sample-level cross-scene recognition model (I is short for instance). The calculation function of Z is defined as:

[Formula: prediction confidence weight Z, computed from the image-level prediction probability map p^global with hyperparameter c]

p^global takes the value p_s^global or p_t^global according to the domain the sample belongs to: if the training sample comes from the source domain, p^global = p_s^global, and if it comes from the target domain, p^global = p_t^global; c is a hyperparameter. Sample-level training lets the training process focus on image samples that are hard to recognize and prone to errors, and outputs a trained sample-level cross-scene recognition model.
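The calculation function of Z is likewise only given as an image; one plausible reading, stated here as an assumption, is that samples whose image-level prediction p^global stays close to the decision boundary (uncertain, hard samples) receive a larger weight, with c controlling the sharpness.

```python
import torch

def sample_confidence_weight(p_global, c=1.0):
    """Illustrative per-sample weight Z (assumed form): the further the image-level
    prediction is from the 0.5 decision boundary, the smaller the weight."""
    certainty = (p_global - 0.5).abs().mean(dim=(1, 2, 3))  # one value per sample
    return torch.exp(-c * certainty)                        # hard samples -> larger Z

# usage sketch: total_sample_loss = (Z * per_sample_loss).mean()
```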
A cross-scene road recognition method comprises the following steps:

A1. receiving road image data X to be recognized in the target domain;

A2. inputting the target-domain road image data X to be recognized into the trained region-level cross-scene recognition model and the trained sample-level cross-scene recognition model obtained by the above cross-scene recognition model training method, and outputting a region-level prediction map P_mask^R and a sample-level prediction map P_mask^I respectively;

A3. obtaining, from the region-level prediction map P_mask^R, the number N_R of region-level connected regions and the area K_R of each region-level connected region, and obtaining, from the sample-level prediction map P_mask^I, the number N_I of sample-level connected regions and the area K_I of each sample-level connected region;

A4. outputting the road recognition result for the target-domain road image data X to be recognized according to a decision strategy, based on the region-level and sample-level connected-region counts N_R and N_I and the corresponding connected-region areas K_R and K_I.
Further, in step A3, a connected-region analysis algorithm is used to obtain the number N_R of region-level connected regions, the area K_R of each region-level connected region, the number N_I of sample-level connected regions, and the area K_I of each sample-level connected region.
Further, in step A4, the decision strategy is specifically as follows:

Strategy J1: if the number of region-level connected regions satisfies N_R = 1 and the area K_R of the region-level connected region is larger than both the area K_I of the corresponding sample-level connected region and K_min, the minimum safe passing area for the road image data X to be recognized, then the region-level prediction map P_mask^R is output as the road recognition result for the target-domain road image data X to be recognized; otherwise strategy J2 is triggered;

Strategy J2: if the number of sample-level connected regions satisfies N_I = 1 and the area K_I of the sample-level connected region satisfies K_I > K_R, the area of the region-level connected region, then the sample-level prediction map P_mask^I is output as the road recognition result for the target-domain road image data X to be recognized; otherwise, neither strategy J1 nor strategy J2 is satisfied, and a warning signal E is output as the road recognition result for the target-domain road image data X to be recognized.
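Read as above, the decision strategy can be sketched as the following function; the reading of the J1 condition (K_R must exceed both K_I and K_min) is an interpretation of the text, not a quotation of it.

```python
def decide(N_R, K_R, N_I, K_I, K_min, P_R_mask, P_I_mask, warning_signal="E"):
    """Sketch of the J1/J2 decision strategy; variable names follow the text."""
    # J1: exactly one region-level connected region, larger than both the
    # sample-level connected region and the minimum safe passing area
    if N_R == 1 and K_R > K_I and K_R > K_min:
        return P_R_mask
    # J2: exactly one sample-level connected region, larger than the region-level one
    if N_I == 1 and K_I > K_R:
        return P_I_mask
    # neither J1 nor J2 is satisfied: return the warning signal E
    return warning_signal
```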
Further, K_min = K_r · β, where β is the conversion ratio between the resolution of the road image data X to be recognized and physical length, and K_r is the minimum safe passing area on the real road (r is short for real). K_r = L_r · H_r, where H_r is the road width required for the vehicle to pass, such as the distance between the left and right wheels, and L_r is the forward safe stopping distance, L_r = v · t, where v is the current speed per second and t is the total safe braking time.
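Numerically, K_min follows directly from these definitions; the sketch below uses illustrative values and assumes that β already converts real-world area to image area.

```python
def minimum_safe_area(v_mps, t_brake_s, road_width_m, beta):
    """K_min = K_r * beta with K_r = L_r * H_r and L_r = v * t (illustrative only)."""
    L_r = v_mps * t_brake_s      # forward safe stopping distance in metres
    K_r = L_r * road_width_m     # minimum safe passing area on the real road
    return K_r * beta            # converted to image area by the ratio beta

# example with assumed numbers: v = 15 m/s, t = 2 s, H_r = 1.8 m, beta = 400
# gives K_min = 15 * 2 * 1.8 * 400 = 21600 (image-area units)
```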
A cross-scene recognition model training device is characterized by comprising:
a training data input processing unit, which inputs the source domain image X_s, the pixel-level true label map y_s and the unlabeled target domain image X_t from a different scene as input data to the multi-scale feature extraction unit;
the multi-scale feature extraction unit is used for extracting features of multiple scales from input training data and inputting the features into the high-resolution aggregation feature extraction unit;
the high-resolution aggregation feature extraction unit, which aggregates the input features of multiple scales into a high-resolution aggregated feature; it comprises a multi-scale feature aggregation module and a dilated convolution calculation module, where the multi-scale feature aggregation module transforms and aggregates the features of multiple scales, and the dilated convolution calculation module improves the resolution of the aggregated feature;
the multi-level cross-scene recognition and domain classification prediction and loss calculation unit, which takes the high-resolution aggregated feature as input, outputs recognition probability maps of the corresponding levels at the pixel level, the local level and the image level through forward propagation, and then, taking the pixel-level, local-level and image-level recognition probability maps, the true label map of the source-domain samples and the domain labels as input, computes the pixel-level, local-level and image-level recognition losses and domain-adaptation losses used to update the cross-scene recognition model parameters;
the region-level cross-scene recognition model joint training unit, which performs, at the region level and jointly with the pixel-level, local-level and image-level training units, parallel iterative training of the recognition model for recognition and cross-scene domain adaptation, and outputs the region-level cross-scene recognition model;

the sample-level cross-scene recognition model joint training unit, which performs, at the sample level and jointly with the pixel-level, local-level and image-level training units, parallel iterative training of the recognition model for recognition and cross-scene domain adaptation, and outputs the sample-level cross-scene recognition model;

and the model storage unit, which stores the region-level cross-scene recognition model output by the region-level joint training unit and the sample-level cross-scene recognition model output by the sample-level joint training unit.
Further, the multi-scale feature aggregation module comprises a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
A cross-scene road recognition device comprising:
the cross-scene image receiving unit, which receives the target-domain road image data X to be recognized;

the cross-scene road image recognition unit, which inputs the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and outputs a region-level road recognition result and a sample-level road recognition result respectively;

the connected-region calculation unit, which computes the connected regions of the region-level and sample-level road recognition results and outputs the number of region-level connected regions, the corresponding region-level connected-region areas, the number of sample-level connected regions and the corresponding sample-level connected-region areas;

the result decision unit, which checks whether the region-level and sample-level connected-region counts and areas satisfy the decision strategy and outputs the recognition result according to the decision strategy;

and the recognition result storage unit, which stores the recognition result output by the result decision unit.
Compared with the prior art, the invention has the following advantages:

The invention provides a cross-scene recognition model training method. On the one hand, with labeled source-domain data and unlabeled target-domain data as input, a high-resolution aggregated feature can be obtained through the multi-scale feature extractor and the high-resolution aggregate feature extractor, which provides rich feature information for recognition prediction and domain adaptation; the prediction losses of cross-scene recognition and domain adaptation at each level are then computed and output at the pixel level, the local level and the image level, which effectively improves the prediction results of the cross-scene recognition model during iterative training, in particular the background recognition confidence and the edge recognition confidence. Preferably, training the cross-scene recognition model at the region level, jointly with the prediction losses of cross-scene recognition and domain adaptation at each level, lets the training process focus on pixel regions that are hard to recognize and prone to errors, further improving cross-scene recognition performance. Preferably, training the cross-scene recognition model at the sample level, jointly with the prediction losses of cross-scene recognition and domain adaptation at each level, lets the training process focus on image samples that are hard to recognize and prone to errors, further improving cross-scene recognition performance. On the other hand, the cross-scene road recognition method built on the region-level and sample-level cross-scene recognition models computes the connected regions of the region-level and sample-level prediction results and, based on the number of connected regions and their areas, outputs an accurate and safe recognition result according to the decision strategy, thus providing effective data support for an intelligent driving system in a new scene.
Drawings
Fig. 1 is a schematic diagram of a cross-scene recognition model according to an embodiment of the present invention.
Fig. 2 is a flowchart of training a cross-scene recognition model according to an embodiment of the present invention.
Fig. 3 is a flowchart of a cross-scene road identification method disclosed in the embodiment of the present invention.
Fig. 4 is a schematic diagram of a cross-scene recognition model training device disclosed in the embodiment of the present invention.
Fig. 5 is a schematic diagram of a multi-scale feature aggregation module disclosed in an embodiment of the present invention.
Fig. 6 is a schematic view of a cross-scene road recognition device disclosed in the embodiment of the present invention.
Detailed Description
In the following description, technical solutions are set forth in conjunction with specific figures in order to provide a thorough understanding of the present invention. This application may be embodied in many forms other than those described herein, and all such modifications that would occur to a person skilled in the art are deemed to fall within the scope of the invention.
The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used herein to describe various information in one or more embodiments of the specification, these information should not be limited by these terms, which are used only for distinguishing between similar items and not necessarily for describing a sequential or chronological order of the features described in one or more embodiments of the specification. Furthermore, the terms "having," "including," and similar referents, are intended to cover a non-exclusive scope, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to the particular details set forth, but may include other inherent information not expressly listed for such steps or modules.
First, terms used in one or more embodiments of the present invention are explained.

Source domain: a labeled road image data set whose data distribution satisfies the independent and identically distributed assumption, such as road images from city streets;

Target domain: an unlabeled data set whose data distribution differs from that of the source domain, such as road images from rural mountain areas;

Cross-scene road recognition: when road recognition trained on the source domain cannot be applied directly to road images in the target domain, the road recognition model must be iteratively trained across scenes on the source-domain and target-domain data; the resulting cross-scene road recognition model can then be applied to road recognition in the target domain, providing an important basis for applying an intelligent driving system in a new scene.
Example 1:
In embodiment 1, the cross-scene recognition model training method provided by the present invention is adopted. To help the skilled person understand the method, its main components are first explained in detail.

In embodiment 1, the training samples include source domain images X_s collected from city street views, the corresponding pixel-level true label maps y_s, and unlabeled target domain images X_t of a different scene captured in remote mountainous areas; this collection scene only illustrates embodiment 1 and does not strictly limit the invention.

In embodiment 1, the source domain image X_s, the true source-domain label map y_s and the unlabeled cross-scene target domain image X_t are used as training data; forward propagation and chain-rule backward gradient updates are adopted, and recognition predictions, recognition loss values, domain-adaptation predictions and domain-adaptation loss values are computed and output at the pixel level, the local level and the image level; iterative training of the cross-scene recognition model is performed at the region level and at the sample level, each jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation, finally obtaining trained region-level and sample-level cross-scene recognition models.
In embodiment 1, the cross-scene recognition model is shown in Fig. 1 and includes a multi-scale feature extractor G, a high-resolution aggregate feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g, where p is short for pixel, l for local and g for global;

The cross-scene recognition model extracts high-resolution aggregated features from the multi-scale features of the input image, performs recognition prediction on the high-resolution aggregated features at the pixel level, the local level and the image level, and computes the recognition loss values, domain classification predictions and domain-adaptation loss values.
The multi-scale feature extractor G includes a deep convolutional neural network; various mainstream deep convolutional networks can be adopted, which the invention does not strictly limit, and Res2Net50 is adopted in embodiment 1. It extracts, from the source-domain samples X_s and the target-domain samples X_t, n source-domain and target-domain multi-scale features f_n^s and f_n^t, where s denotes the source domain, t the target domain and n the number of multi-scale features; the source-domain and target-domain multi-scale features f_n^s and f_n^t have the same numbers of channels and the same sizes.

In embodiment 1, G outputs multi-scale features on Block2, Block3, Block4 and Block5 of the Res2Net50 network, namely (f_1^s, f_1^t), (f_2^s, f_2^t), (f_3^s, f_3^t), (f_4^s, f_4^t); the numbers of feature channels are 256, 512, 1024 and 2048 respectively, and the scale ratios between the features are 1, 1/2, 1/4 and 1/8.
In embodiment 1, the high-resolution aggregate feature extractor M comprises two multi-scale feature aggregation layers and a dilated convolutional neural network, which aggregate the n source-domain and n target-domain multi-scale features f_n^s and f_n^t into a high-resolution source-domain aggregated feature O_s and a target-domain aggregated feature O_t respectively.
In embodiment 1, the multi-scale feature aggregation layer includes a feature scale transformation function, a dynamic parameter variable and several convolutional neural networks, where the dynamic parameter variable stores the convolution parameters used to convolve features of different scales, and the convolutional neural networks take the multi-scale features as input, perform feature computation, and read and write the parameters of the dynamic parameter variable.

In embodiment 1, the feature scale transformation function performs scaling by bilinear interpolation; the number of channels of the dynamic parameter variable is determined by the actual source-domain and target-domain multi-scale features f_n^s or f_n^t, and the number of convolutional neural networks is determined by the number of source-domain and target-domain multi-scale features f_n^s or f_n^t.
In embodiment 1, there are 4 convolutional neural networks. The dynamic parameter variable can be represented with shape (in, out, 1, 1), where in is the sum of the channel numbers of the multi-scale features input to the multi-scale feature aggregation layer and out is the sum of the channel numbers of the multi-scale features output after the convolution computation of the aggregation layer. The dynamic parameter variable of the first multi-scale feature aggregation layer L1 in embodiment 1 has shape (3840, 1920, 1, 1), and that of the second multi-scale feature aggregation layer L2 has shape (1920, 1920, 1, 1), i.e. the numbers of input and output feature channels are unchanged. The source-domain multi-scale features f_n^s in embodiment 1 comprise features of 4 scales, i.e. n = (1, 2, 3, 4), and f_1^s, f_2^s, f_3^s and f_4^s denote the first-, second-, third- and fourth-scale features of the source domain respectively. The aggregation operation on the source-domain multi-scale features f_n^s proceeds as follows:

B1. f_1^s, with feature shape (256, H, W), is input and convolved with the 1×1 convolution kernel Conv(1×1) whose parameters are the slice with index bits (1/15·in, 1/15·out) of dynamic parameter variable 1, and the convolved first-scale dimension-reduced feature f_1^s_1 is output with shape (128, H, W). According to the feature sizes (height, width) of f_2^s, f_3^s and f_4^s, the feature scale of f_1^s is transformed by factors 1/2, 1/4 and 1/8 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (2/15·in, 2/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 1/2-scale feature f_1^s_2, 1/4-scale feature f_1^s_3 and 1/8-scale feature f_1^s_4 of the first-scale dimension-reduced feature; f_1^s_2 has shape (256, 1/2H, 1/2W), f_1^s_3 has shape (512, 1/4H, 1/4W), and f_1^s_4 has shape (1024, 1/8H, 1/8W);
B2. f_2^s, with feature shape (512, 1/2H, 1/2W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (2/15·in, 2/15·out) of dynamic parameter variable 1, and the convolved second-scale dimension-reduced feature f_2^s_2 is output with shape (256, 1/2H, 1/2W). According to the feature sizes (height, width) of f_1^s, f_3^s and f_4^s, the feature scale of f_2^s is transformed by factors 2, 1/2 and 1/4 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 2× scaling feature f_2^s_1, 1/2 scaling feature f_2^s_3 and 1/4 scaling feature f_2^s_4 of the second-scale dimension-reduced feature; f_2^s_1 has shape (128, H, W), f_2^s_3 has shape (512, 1/4H, 1/4W), and f_2^s_4 has shape (1024, 1/8H, 1/8W);
B3. f_3^s, with feature shape (1024, 1/4H, 1/4W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (4/15·in, 4/15·out) of dynamic parameter variable 1, and the convolved third-scale dimension-reduced feature f_3^s_3 is output with shape (512, 1/4H, 1/4W). According to the feature sizes (height, width) of f_1^s, f_2^s and f_4^s, the feature scale of f_3^s is transformed by factors 4, 2 and 1/2 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, outputting the convolved 4× scaling feature f_3^s_1, 2× scaling feature f_3^s_2 and 1/2 scaling feature f_3^s_4 of the third-scale dimension-reduced feature; f_3^s_1 has shape (128, H, W), f_3^s_2 has shape (256, 1/2H, 1/2W), and f_3^s_4 has shape (1024, 1/8H, 1/8W);
B4. f_4^s, with feature shape (2048, 1/8H, 1/8W), is input and convolved with the Conv(1×1) kernel whose parameters are the slice with index bits (8/15·in, 8/15·out) of dynamic parameter variable 1, and the convolved fourth-scale dimension-reduced feature f_4^s_4 is output with shape (1024, 1/8H, 1/8W). According to the feature sizes (height, width) of f_1^s, f_2^s and f_3^s, the feature scale of f_4^s is transformed by factors 8, 4 and 2 with the feature scale transformation function, and the results are input to the slices of dynamic parameter variable 1 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out) and (4/15·in, 4/15·out) as Conv(1×1) parameters, outputting the convolved 8× scaling feature f_4^s_1, 4× scaling feature f_4^s_2 and 2× scaling feature f_4^s_3 of the fourth-scale dimension-reduced feature; f_4^s_1 has shape (128, H, W), f_4^s_2 has shape (256, 1/2H, 1/2W), and f_4^s_3 has shape (512, 1/4H, 1/4W);
B5. Aggregation and dilated convolution are applied to the features of the same scale: (f_1^s_1, f_2^s_1, f_3^s_1, f_4^s_1) are aggregated and dilated-convolved to output the feature f_1^s'; (f_1^s_2, f_2^s_2, f_3^s_2, f_4^s_2) output f_2^s'; (f_1^s_3, f_2^s_3, f_3^s_3, f_4^s_3) output f_3^s'; and (f_1^s_4, f_2^s_4, f_3^s_4, f_4^s_4) output f_4^s'. In this embodiment, the aggregation operation is summation;
B6. f_2^s', f_3^s' and f_4^s' are transformed by factors 2, 4 and 8 respectively, according to the scale of f_1^s', with the feature scale transformation function; the 4 features are then input to the slices of dynamic parameter variable 2 with index bits (1/15·in, 1/15·out), (2/15·in, 2/15·out), (4/15·in, 4/15·out) and (8/15·in, 8/15·out) as Conv(1×1) parameters, convolution and aggregation are computed, and the high-resolution aggregated feature O_s is output.

The pixel-level recognizer F_p, the local-level recognizer F_l and the image-level recognizer F_g each comprise a fully convolutional neural network and an activation function; the source-domain and target-domain aggregated features O_s and O_t are both input into the pixel-level recognizer F_p, the local-level recognizer F_l and the image-level recognizer F_g. F_p outputs the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel, F_l outputs the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local, and F_g outputs the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global.
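A condensed sketch of the aggregation scheme walked through in B1 to B6 is given below with simplifying assumptions: one shared dynamic parameter tensor per layer, bilinear rescaling, summation as the aggregation operation, and a slicing scheme chosen so that the output channel counts match those stated above; the subsequent dilated convolution and the second aggregation layer L2 are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAggregation(nn.Module):
    """Sketch of one multi-scale feature aggregation layer: every input scale is
    rescaled to every target scale, convolved with a 1x1 kernel sliced out of a
    shared dynamic parameter variable, and same-scale results are summed."""

    def __init__(self, in_channels=(256, 512, 1024, 2048),
                 out_channels=(128, 256, 512, 1024)):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        # dynamic parameter variable; stored as (sum(out), sum(in), 1, 1) to match
        # the weight layout expected by torch conv2d
        self.weight = nn.Parameter(
            torch.randn(sum(out_channels), sum(in_channels), 1, 1) * 0.01)

    def forward(self, feats):
        in_off, out_off = [0], [0]
        for c_in, c_out in zip(self.in_channels, self.out_channels):
            in_off.append(in_off[-1] + c_in)
            out_off.append(out_off[-1] + c_out)

        outputs = []
        for j, f_target in enumerate(feats):        # target scale j
            h, w = f_target.shape[-2:]
            acc = 0
            for i, f in enumerate(feats):           # source scale i
                x = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
                # 1x1 kernel slice mapping scale-i channels to scale-j channels
                k = self.weight[out_off[j]:out_off[j + 1], in_off[i]:in_off[i + 1]]
                acc = acc + F.conv2d(x, k)          # aggregation here is summation
            outputs.append(acc)
        return outputs
```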
In embodiment 1, the pixel-level recognizer F_p consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the pixel-level recognition prediction probability map;

the local-level recognizer F_l consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the local-level recognition prediction probability map;

the image-level recognizer F_g consists of a fully convolutional network Conv(1×1) and a Sigmoid activation function and outputs the image-level recognition prediction probability map;

the pixel-level domain classifier D_p comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the pixel-level domain classification prediction;

the local-level domain classifier D_l comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the local-level domain classification prediction;

the image-level domain classifier D_g comprises two fully convolutional networks Conv(1×1) and a Sigmoid activation function and outputs the image-level domain classification prediction.
In embodiment 1, the cross-scene recognition model training is shown in Fig. 2 and proceeds as follows:

Step 200: the source-domain samples X_s and target-domain samples X_t are input to the multi-scale feature extractor G, and the output multi-scale features are (f_1^s, f_1^t), (f_2^s, f_2^t), (f_3^s, f_3^t), (f_4^s, f_4^t);

Step 201: the features output in step 200 are input to the high-resolution aggregate feature extractor M, which outputs the high-resolution aggregated features O_s and O_t of the source domain and the target domain;

Step 202: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the pixel-level recognizer F_p outputs the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel;

Step 203: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the local-level recognizer F_l outputs the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local;

Step 204: the source-domain and target-domain aggregated features O_s and O_t output in step 201 are taken as input, and the image-level recognizer F_g outputs the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global;
Step 205: the prediction losses of cross-scene recognition and domain adaptation at each level are computed and output at the pixel level, the local level and the image level, specifically as follows:

Step 205-1: the source-domain pixel-level recognition prediction probability map p_s^pixel output in step 202 is taken as input, and the pixel-level recognition loss value is output through the pixel-level recognition loss function, defined as follows:

[Formula: pixel-level recognition loss L_pixel, computed over the w×h probability map]

where y_s is the pixel-level true label map of the source-domain sample, and w and h are the width and height of the probability map;

Step 205-2: the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel output in step 202 are input to the pixel-level domain classifier D_p, whose function is to perform pixel-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^pixel and h_t^pixel; the pixel-level domain-adaptation loss value is computed with the pixel-level domain-adaptation loss function L_adapt^p, where h_s^pixel and h_t^pixel are the source-domain and target-domain domain classification prediction probabilities. The pixel-level domain-adaptation loss function is defined as follows:

[Formula: pixel-level domain-adaptation loss L_adapt^p, built from the cross-entropy L_Dp with domain label 1 for the source and 0 for the target]

where L_Dp is the pixel-level cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 205-3: the source-domain local-level recognition prediction probability map p_s^local output in step 203 is taken as input, and the local-level recognition loss value is output through the local-level recognition loss function, defined as follows:

[Formula: local-level recognition loss L_local, built from the block statistics defined below]

where x = {x_i : i = 1, 2, …, N} and y = {y_i : i = 1, 2, …, N} are the local predicted values obtained, in local blocks, from the probability prediction map p_s^local and the true source-domain label map y_s respectively, N is the number of local blocks, μ_x and μ_y are the means and standard deviations of the local predicted values x and y, σ_xy is the covariance of the local predicted values, and C_1 and C_2 are hyperparameters;

Step 205-4: the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local output in step 203 are input to the local-level domain classifier D_l, whose function is to perform local-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^local and h_t^local; the local-level domain-adaptation loss value is computed with the local-level domain-adaptation loss function L_adapt^l, where h_s^local and h_t^local are the source-domain and target-domain local-level domain classification prediction probabilities. The local-level domain-adaptation loss function is defined as follows:

[Formula: local-level domain-adaptation loss L_adapt^l, built from the cross-entropy L_Dl with domain label 1 for the source and 0 for the target]

where L_Dl is the local-level cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 205-5: the source-domain image-level recognition prediction probability map p_s^global output in step 204 is taken as input, and the image-level recognition loss value is output through the image-level recognition loss function, defined as follows:

[Formula: image-level recognition loss L_global, computed over the w×h probability map]

where p_s^global is the image-level recognition prediction probability map, y_s is the true source-domain label map, and w and h are the width and height of the probability map;

Step 205-6: the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global output in step 204 are input to the image-level domain classifier D_g, whose function is to perform image-level domain classification prediction for the source domain and the target domain and to output the domain classification prediction probabilities h_s^global and h_t^global; the image-level domain-adaptation loss value is computed with the image-level domain-adaptation loss function L_adapt^g, where h_s^global and h_t^global are the source-domain and target-domain image-level domain classification prediction probabilities. The image-level domain-adaptation loss function is defined as follows:

[Formula: image-level domain-adaptation loss L_adapt^g, built from the cross-entropy L_Dg with domain label 1 for the source and 0 for the target]

where L_Dg is a cross-entropy loss function, 1 is the source domain label and 0 is the target domain label;
Step 206: the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global output in steps 202, 203 and 204 are taken as input, the loss value is obtained through the hard-region training loss function L_region, and the cross-scene recognition model is trained at the region level jointly with the prediction losses of cross-scene recognition and domain adaptation at each level from step 205. The overall training loss of the region-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^R of the region-level cross-scene recognition model]

where L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of the cross-scene domain adaptation, L_total^R is the overall training loss of the region-level cross-scene recognition model (R is short for region), and L_region is the hard-region training loss, defined as follows:

[Formula: hard-region training loss L_region, a function of the averaged prediction p_s, the hard-region probability condition ρ and the hyperparameter γ]

where p_s = 1/3·p_s^pixel + 1/3·p_s^local + 1/3·p_s^global, ρ is the hard-region probability condition and γ is a hyperparameter; in this embodiment ρ = 0.5 and γ = 0.25. Region-level training lets the training process focus on pixel regions that are hard to recognize and prone to errors, and the trained region-level cross-scene recognition model is output;
Step 207: the image-level recognition prediction probability maps p_s^global and p_t^global output in step 204 are used to compute a prediction confidence weight Z for each sample input to the sample-level cross-scene recognition model, and the sample-level cross-scene recognition model is trained at the sample level jointly with the prediction losses of cross-scene recognition and domain adaptation at each level from step 205. The overall training loss of the sample-level cross-scene recognition model is defined as follows:

[Formula: recognition training loss L_seg of the cross-scene recognition model]

[Formula: cross-scene domain-adaptation training loss L_adapt]

[Formula: overall training loss L_total^S of the sample-level cross-scene recognition model]

where L_total^S is the overall training loss of the sample-level cross-scene recognition model (S is short for sample). The calculation function of Z is defined as:

[Formula: prediction confidence weight Z, computed from the image-level prediction probability map p^global with hyperparameter c]

Z weights the loss of each sample during training; p^global takes the value p_s^global or p_t^global according to the domain the sample belongs to: if the training sample comes from the source domain, p^global = p_s^global, and if it comes from the target domain, p^global = p_t^global; c is a hyperparameter. Sample-level training lets the training process focus on image samples that are hard to recognize and prone to errors, and the trained sample-level cross-scene recognition model is output.
In embodiment 1, during the iterative training of the cross-scene recognition model, the model parameters are updated by forward propagation and chain-rule backward gradient propagation; the gradient updates of the recognition model parameters use stochastic gradient descent, and the gradient updates of the domain classifier parameters D_p, D_l and D_g use adaptive moment estimation (Adam);
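One iteration of the region-level training described above could look like the following sketch; the learning rates, the bundling of the (assumed) loss functions, and the joint update of recognizer and domain classifiers in a single step are simplifications, since the description does not fix the adversarial update schedule.

```python
import itertools
import torch

def train_region_level(G, M, F_p, F_l, F_g, D_p, D_l, D_g, loader, losses, epochs=1):
    """Illustrative training loop; `losses` bundles the assumed identification,
    domain-adaptation and hard-region loss functions sketched earlier."""
    opt_rec = torch.optim.SGD(                       # stochastic gradient descent
        itertools.chain(G.parameters(), M.parameters(), F_p.parameters(),
                        F_l.parameters(), F_g.parameters()),
        lr=2.5e-4, momentum=0.9)
    opt_dom = torch.optim.Adam(                      # adaptive moment estimation
        itertools.chain(D_p.parameters(), D_l.parameters(), D_g.parameters()),
        lr=1e-4)
    for _ in range(epochs):
        for X_s, y_s, X_t in loader:
            O_s, O_t = M(G(X_s)), M(G(X_t))
            p_s = [F_p(O_s), F_l(O_s), F_g(O_s)]     # source predictions, 3 levels
            p_t = [F_p(O_t), F_l(O_t), F_g(O_t)]     # target predictions, 3 levels
            L_seg = sum(f(p, y_s) for f, p in zip(losses["ident"], p_s))
            L_adapt = sum(f(D(ps), D(pt)) for f, D, ps, pt in
                          zip(losses["adapt"], (D_p, D_l, D_g), p_s, p_t))
            total = L_seg + L_adapt + losses["region"](*p_s, y_s)
            opt_rec.zero_grad(); opt_dom.zero_grad()
            total.backward()
            opt_rec.step(); opt_dom.step()
```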
Preferably, the training process of the cross-scene recognition model is realized at the region level and at the sample level, each jointly with the pixel level, the local level and the image level. The recognition model output by the region-level training of step 206 pays more attention to pixel regions that are hard and error-prone in the cross-scene recognition task, and the recognition model output by the sample-level training of step 207 pays more attention to image samples that are hard and error-prone in the cross-scene recognition task;

Preferably, in embodiment 1 of the present invention, the region-level and sample-level cross-scene recognition models trained at these levels can be applied to a recognition task in a new domain; for example, in a vehicle-mounted application with the target domain described in embodiment 1 as the new scene, road recognition in a remote mountain area can be performed reliably with the region-level and sample-level cross-scene recognition models, as described in embodiment 2.
Example 2:
Embodiment 2 of the present invention provides a cross-scene road recognition method, which adopts the region-level and sample-level cross-scene recognition models obtained by the training method provided in embodiment 1.
as shown in fig. 3, a cross-scene road identification method includes the following steps:
Step 301, receiving road image data X to be recognized in the target domain;
Step 302, inputting the road image data X to be recognized in the target domain into the region-level cross-scene recognition model, and outputting a region-level prediction map P_R^mask;
Step 303, inputting the road image data X to be recognized in the target domain into the sample-level cross-scene recognition model, and outputting a sample-level prediction map P_S^mask;
Step 304, taking the region-level prediction map P_R^mask output in step 302 as input, performing single-connected-region calculation, and outputting the number of region-level connected regions N_R and the area K_R corresponding to each region-level connected region;
Step 305, taking the sample-level prediction map P_S^mask output in step 303 as input, performing single-connected-region calculation, and outputting the number of sample-level connected regions N_I and the area K_I corresponding to each sample-level connected region;
Step 306, taking the numbers of region-level and sample-level connected regions N_R and N_I output in steps 304 and 305 and the corresponding region-level and sample-level connected-region areas K_R and K_I, and outputting the road recognition result of the target-domain road image data X to be recognized according to the decision strategy.
The number of region-level connected regions N_R, the area K_R corresponding to each region-level connected region, the number of sample-level connected regions N_I, and the area K_I corresponding to each sample-level connected region are obtained by a connected-region analysis algorithm.
In this embodiment 2, the connected-region analysis algorithm is the Two-Pass method; an illustrative sketch is given below.
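As an illustrative sketch (not the patent's implementation), a Two-Pass connected-region analysis over a binary road mask can be written as follows; it returns the number of connected regions and the pixel area of each, which is what steps 304 and 305 consume:

```python
import numpy as np

def two_pass_connected_regions(mask):
    """Two-pass connected-region labelling (4-connectivity) on a binary mask.

    Returns (num_regions, areas), where areas maps each region label to its
    pixel count. A minimal sketch; libraries such as OpenCV or scipy.ndimage
    provide equivalent, faster routines.
    """
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = {}  # union-find over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: assign provisional labels and record label equivalences.
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue
            up = labels[i - 1, j] if i > 0 else 0
            left = labels[i, j - 1] if j > 0 else 0
            neighbours = [l for l in (up, left) if l > 0]
            if not neighbours:
                labels[i, j] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                labels[i, j] = min(neighbours)
                if len(neighbours) == 2:
                    union(neighbours[0], neighbours[1])

    # Second pass: resolve equivalences and count pixel areas per region.
    areas = {}
    for i in range(h):
        for j in range(w):
            if labels[i, j] > 0:
                root = find(labels[i, j])
                labels[i, j] = root
                areas[root] = areas.get(root, 0) + 1
    return len(areas), areas
```

For example, two_pass_connected_regions(P_R_mask > 0.5) would yield N_R and the areas K_R, assuming the probability map is binarized at 0.5, a threshold the text does not specify.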
The decision strategy in this embodiment 2 is used to decide which result to output, and is specifically as follows:
J1 strategy: if the number of region-level connected regions N_R = 1 and the region-level connected-region area satisfies K_R > K_I, the area corresponding to the sample-level connected region (K_min being the minimum safe passing region area of the road image data X to be recognized, defined below), the region-level prediction map P_R^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise the J2 strategy is triggered.
J2 strategy: if the number of sample-level connected regions N_I = 1 and the sample-level connected-region area satisfies K_I > K_R, the region-level connected-region area, the sample-level prediction map P_S^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise, neither the J1 strategy nor the J2 strategy is considered satisfied, and the warning signal E is output as the road recognition result of the target-domain road image data X to be recognized. A sketch of this decision follows.
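A compact, non-authoritative sketch of this two-branch decision; the function and parameter names are invented for illustration, and the role of the minimum safe passing area K_min, defined further below, is not reproduced here:

```python
def road_decision(N_R, K_R, N_I, K_I, region_mask, sample_mask, warning="E"):
    """J1/J2 decision strategy sketch.

    N_R, K_R: number of region-level connected regions and the area of the
              (single) region-level connected region.
    N_I, K_I: the same quantities for the sample-level prediction map.
    Returns the region-level mask, the sample-level mask, or the warning
    signal E when neither strategy holds.
    """
    if N_R == 1 and K_R > K_I:      # J1: one region-level region with the larger area
        return region_mask
    if N_I == 1 and K_I > K_R:      # J2: one sample-level region with the larger area
        return sample_mask
    return warning                   # neither J1 nor J2 satisfied
```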
In the foregoing steps 302 and 303, according to the region-level and sample-level cross-scene recognition models, the prediction probability maps of the image X to be recognized at the pixel level, the local level and the image level are obtained first: p_R^pixel, p_R^local and p_R^global, and p_S^pixel, p_S^local and p_S^global, where R denotes the output prediction of the region-level cross-scene recognition model and S denotes the output prediction of the sample-level cross-scene recognition model. In this embodiment 2, the prediction maps in steps 302 and 303 are computed as: P_R^mask = 1/3 p_R^pixel + 1/3 p_R^local + 1/3 p_R^global; P_S^mask = 1/3 p_S^pixel + 1/3 p_S^local + 1/3 p_S^global.
In the foregoing steps 304 and 305, the method used to compute the single connected regions of the prediction maps P_R^mask and P_S^mask can be any connected-region method and is not strictly limited here.
In the aforementioned J1 strategy, K_min = K_r * β, where β is the conversion ratio from the resolution of the road image data X to be recognized to physical length, and K_r is the minimum safe passing area in the real road (the subscript r is shorthand for real), K_r = L_r * H_r, where H_r is the road width required for ground vehicle passage, for example the distance between the left and right wheels, and L_r is the forward safe braking distance, L_r = v * t, where v is the current speed per second and t is the total safe braking time. A small worked example follows.
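As a small worked example of these relations (a hypothetical helper with illustrative numbers, not values from the patent):

```python
def minimum_safe_passing_area(v, t, H_r, beta):
    """K_min = K_r * beta with K_r = L_r * H_r and L_r = v * t.

    v    : current speed (metres per second)
    t    : total safe-braking time (seconds)
    H_r  : road width needed by the vehicle (metres), e.g. wheel track
    beta : conversion ratio from image resolution to physical length
    """
    L_r = v * t          # forward safe-braking distance (metres)
    K_r = L_r * H_r      # minimum safe passing area in the real road
    return K_r * beta    # the same area expressed in the image

# Illustrative numbers: 10 m/s, 3 s braking, 1.8 m track width,
# conversion ratio 50.
print(minimum_safe_passing_area(v=10.0, t=3.0, H_r=1.8, beta=50.0))  # 2700.0
```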
As shown in Table 1, in this embodiment, labeled source-domain images acquired from an urban street view and unlabeled target-domain images acquired from a remote mountain area are used as training data; with the region-level and sample-level cross-scene recognition models obtained by the training method, the cross-scene road recognition method significantly improves the accuracy of cross-scene road recognition.
TABLE 1
Method | Mountain road recognition result
Source-domain-only training model | 42.06%
This embodiment | 72.56%
Example 3:
This embodiment 3 discloses a cross-scene recognition model training device, as shown in fig. 4, comprising:
a training data input processing unit 41, which inputs the source-domain image X_s, the pixel-level real label map y_s, and the unlabeled target-domain images X_t of different scenes to the multi-scale feature extraction unit 42;
a multi-scale feature extraction unit 42 that extracts features of a plurality of scales from input training data and inputs the extracted features to a high-resolution aggregate feature extraction unit 43;
a high-resolution aggregation feature extraction unit 43, configured to aggregate the input features of a plurality of scales into a high-resolution aggregation feature; in this embodiment 3, the high-resolution aggregation feature extraction unit includes a first multi-scale feature aggregation module 431, a dilated-convolution calculation module 432 and a second multi-scale feature aggregation module 433, where the multi-scale feature aggregation modules are configured to transform and aggregate features of multiple scales, and the dilated-convolution calculation module is configured to improve the resolution of the aggregated feature;
a multi-stage cross-scene recognition and domain classification prediction and loss calculation unit 44, which takes the high resolution aggregation feature as input, respectively outputs recognition probability maps of corresponding levels at pixel level, local level and image level by forward conduction, and further respectively calculates pixel level, local level and image level recognition loss and domain adaptation loss for updating cross-scene recognition model parameters by taking the recognition probability maps of pixel level, local level and image level, the source domain sample real label map and the domain label as input;
a region-level cross-scene recognition model joint training unit 45, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the region level jointly with the pixel-level, local-level and image-level training units, and outputs the region-level cross-scene recognition model;
a sample-level cross-scene recognition model joint training unit 46, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the sample level jointly with the pixel-level, local-level and image-level training units, and outputs the sample-level cross-scene recognition model;
the model storage unit 47 stores the region-level cross-scene recognition model output from the region-level cross-scene recognition model joint training unit 45 and the sample-level cross-scene recognition model output from the sample-level cross-scene recognition model joint training unit 46.
In this embodiment 3, the first multi-scale feature aggregation module 431 and the second multi-scale feature aggregation module 433 both include a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule, and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
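A minimal PyTorch sketch of such an aggregation module, assuming, purely for illustration, that each scale keeps its own stored convolution parameters, is rescaled to a common target resolution, and the results are summed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureAggregation(nn.Module):
    """Illustrative multi-scale feature aggregation: per-scale convolution
    parameters (standing in for the dynamic parameter storage), rescaling to
    a target resolution (the feature scale transformation), and summation
    (the feature aggregation). Channel sizes and the summation rule are
    assumptions for this sketch."""

    def __init__(self, in_channels, out_channels, num_scales):
        super().__init__()
        # One convolution per scale; its weights play the role of the
        # dynamically stored neuron parameters for that scale.
        self.per_scale_convs = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            for _ in range(num_scales)])

    def forward(self, features, target_size):
        # features: list of tensors (N, C, H_i, W_i), one per scale.
        aggregated = 0
        for conv, feat in zip(self.per_scale_convs, features):
            feat = F.interpolate(feat, size=target_size,
                                 mode="bilinear", align_corners=False)
            aggregated = aggregated + conv(feat)
        return aggregated
```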
In this embodiment 3, the multi-stage cross-scene recognition and domain classification prediction and loss calculation unit 44 specifically includes:
a pixel-level training module, which comprises a pixel-level recognition and prediction submodule, a pixel-level recognition loss function calculation submodule, a pixel-level field prediction submodule and a pixel-level field adaptive loss function calculation submodule, the pixel-level identification prediction submodule takes the high-resolution aggregation characteristic of each training sample as input and outputs a pixel-level identification prediction probability map, the pixel-level identification loss function calculation submodule takes the pixel-level identification prediction probability map and a source field sample real label map as input and calculates a pixel-level identification loss value, the pixel-level field prediction submodule takes the pixel-level identification prediction probability map as input and outputs a pixel-level field prediction probability map, and the pixel-level field adaptive loss function calculation submodule takes the pixel-level field prediction probability map and a field label as input and calculates the pixel-level field adaptive loss value;
a local level training module, which comprises a local level identification prediction sub-module, a local level identification loss function calculation sub-module, a local level field prediction sub-module, and a local level field adaptive loss function calculation sub-module, the local level identification prediction submodule takes the high-resolution aggregation characteristic of each training sample as input and outputs a local level identification prediction probability chart, the local level identification prediction probability map and the source field sample real label map are valued in a grid form and are input to the local level identification loss function calculation submodule to calculate a local level identification loss value, the local level field prediction submodule takes the local level identification prediction probability map as input and outputs the local level field prediction probability map, and the local level field adaptive loss function calculation submodule takes the local level field prediction probability map and the field label as input to calculate a local level field adaptive loss value;
an image-level training module, which comprises an image-level recognition prediction submodule, an image-level recognition loss function calculation submodule, an image-level domain prediction submodule and an image-level domain-adaptation loss function calculation submodule; the image-level recognition prediction submodule takes the high-resolution aggregation feature of each training sample as input and outputs an image-level recognition prediction probability map; the image-level recognition prediction probability map and the source-domain sample real label map are used as input to the image-level recognition loss function calculation submodule to calculate an image-level recognition loss value; the image-level domain prediction submodule takes the image-level recognition prediction probability map as input and outputs an image-level domain prediction probability map; the image-level domain-adaptation loss function calculation submodule takes the image-level domain prediction probability map and the domain label as input to calculate an image-level domain-adaptation loss value; a sketch of the pixel-level counterpart of these modules is given below.
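For instance, the pixel-level training module could be sketched as below; the layer configuration, the sigmoid output and the use of binary cross-entropy for the recognition loss are assumptions, while the domain loss follows the described cross-entropy with label 1 for the source domain and 0 for the target domain:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelLevelModule(nn.Module):
    """Illustrative pixel-level training module: a fully convolutional
    recognizer head producing a pixel-level prediction probability map, and
    a small fully convolutional domain classifier that takes the prediction
    map as input. Layer sizes are placeholder assumptions."""

    def __init__(self, feat_channels):
        super().__init__()
        self.recognizer = nn.Sequential(
            nn.Conv2d(feat_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.domain_classifier = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))

    def forward(self, aggregated_feature):
        p_pixel = self.recognizer(aggregated_feature)   # prediction probability map
        h_pixel = self.domain_classifier(p_pixel)       # domain classification logits
        return p_pixel, h_pixel

def pixel_level_losses(p_s, h_s, h_t, y_s):
    """Pixel-level recognition loss (binary cross-entropy against the
    source-domain label map y_s, an assumed form) and the pixel-level
    domain-adaptation loss (cross-entropy with label 1 = source, 0 = target)."""
    loss_rec = F.binary_cross_entropy(p_s, y_s)
    loss_dom = (F.binary_cross_entropy_with_logits(h_s, torch.ones_like(h_s)) +
                F.binary_cross_entropy_with_logits(h_t, torch.zeros_like(h_t)))
    return loss_rec, loss_dom
```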
In this embodiment 3, the region-level cross-scene recognition model joint training unit 45 specifically includes:
a hard-region loss function calculation module, configured to take the source-domain pixel-level, local-level and image-level recognition prediction probability maps as input and output a hard-region training loss value;
a parameter updating module, configured to take the hard-region training loss value, the pixel-level, local-level and image-level recognition losses, and the domain-adaptation losses as input to calculate the gradient values required for parameter updating during the joint training iteration of the region-level cross-scene recognition model;
In this embodiment 3, the sample-level cross-scene recognition model joint training unit 46 specifically includes:
a hard-sample loss function calculation module, configured to take the source-domain pixel-level, local-level and image-level recognition prediction probability maps as input and output a hard-sample training loss value;
and a parameter updating module, configured to take the hard-sample training loss value, the pixel-level, local-level and image-level recognition losses, and the domain-adaptation losses as input to calculate the gradient values required for parameter updating during the joint training iteration of the sample-level cross-scene recognition model.
Example 4:
This embodiment 4 discloses a cross-scene road recognition device, as shown in fig. 6, comprising:
a cross-scene image receiving unit 61 for images to be recognized, configured to receive the target-domain road image data X to be recognized;
a cross-scene road image recognition unit 62, configured to input the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and output a region-level road recognition result and a sample-level road recognition result respectively;
a connected-region calculation unit 63, configured to perform connected-region calculation on the region-level and sample-level road recognition results, and output the number of region-level connected regions, the area of the corresponding region-level connected regions, the number of sample-level connected regions, and the area of the corresponding sample-level connected regions;
a result determination unit 64 for determining whether the number of connected regions of the region level and the sample level and the area of the connected regions of the region level and the sample level satisfy a determination policy, and outputting a recognition result according to the determination policy;
a recognition result storage unit 65 for storing the recognition result output from the result determination unit 64.
It should be noted that, for simplicity, the cross-scene recognition model training method, the cross-scene road recognition method and the embodiments are all described as a series of steps or combinations of operations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because some steps or operations may be performed in other orders or simultaneously according to the present application.
The preferred embodiments of the present application disclosed above are intended only to aid in understanding the invention and its core concepts. For those skilled in the art, there may be variations in specific application scenarios and implementation operations based on the concepts of the present invention, and this description should not be taken as limiting the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. A cross-scene recognition model training method, characterized in that a source-domain image X_s, a source-domain real label map y_s and a cross-scene unlabeled target-domain image X_t are used as training data; forward conduction and chained backward gradient conduction updating methods are adopted to calculate and output, at the pixel level, the local level and the image level respectively, a prediction of cross-scene recognition together with a recognition loss value, and a prediction of domain adaptation together with a domain-adaptation loss value; iterative training of the cross-scene recognition model is performed by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the region level and at the sample level, finally obtaining trained region-level and sample-level cross-scene recognition models; the cross-scene recognition model comprises a multi-scale feature extractor G, a high-resolution aggregation feature extractor M, a pixel-level recognizer F_p, a pixel-level domain classifier D_p, a local-level recognizer F_l, a local-level domain classifier D_l, an image-level recognizer F_g and an image-level domain classifier D_g;
The cross-scene recognition model is used for extracting high-resolution aggregation features from the multi-scale features of the input image, performing recognition prediction on the high-resolution aggregation features at a pixel level, a local level and an image level, and calculating a recognition loss value, a domain classification prediction and a domain adaptation loss value;
the multi-scale feature extractor G comprises a deep convolutional neural network for aligning the source domain samples XsAnd target area sample XtExtracting n source fields and target fields of multi-scale features fn sAnd fn tWherein s represents a source domain, t represents a target domain, n represents the number of multi-scale features, and the multi-scale features f of the source domain and the target domainn sAnd fn tThe amount and the size of the channel are the same;
the high-resolution aggregation feature extractor M comprises a multi-scale feature aggregation layer and a cavity convolution neural network, and respectively integrates n multi-scale features f in the source field and the target fieldn sAnd fn tRespectively polymerized into high-resolution source domain characteristics OsAnd target Domain polymerization characteristics Ot
The multi-scale feature aggregation layer comprises dynamic parameter variables and a convolutional neural network, wherein the dynamic parameter variables are used for storing convolutional neural parameters for carrying out convolution operation on different scale features, and the convolutional neural network takes the multi-scale features as input to carry out feature calculation and carry out parameter reading and writing on the dynamic parameter variables;
pixel level identifier FpLocal level identifier FlAnd an image level identifier FgEach comprises a full convolution neural network and an activation function; aggregating Source and target domains into a feature OsAnd OtEqual input pixel level identifier FpLocal level identifier FlAnd an image level identifier FgPixel level identifier FpOutput source domain and target domain pixel level identification prediction probability map ps pixelAnd pt pixelLocal level identifier FlOutput source field and target field local level identification prediction probability map ps localAnd pt localImage level recognizer FgOutput source domain and target domain image level identification prediction probability map ps globalAnd pt global
2. The cross-scene recognition model training method according to claim 1, wherein performing recognition prediction on the high-resolution aggregation feature at the pixel level, the local level and the image level, calculating recognition loss values, domain classification predictions and domain-adaptation loss values specifically comprises:
S1, taking the source-domain pixel-level recognition prediction probability map p_s^pixel and the source-domain real label map y_s as input, and outputting a pixel-level recognition loss value through the pixel-level recognition loss function L_pixel, the pixel-level recognition loss function being defined as:
[equation: the pixel-level recognition loss function L_pixel]
wherein w and h are the width and height of the probability map respectively;
taking the source-domain and target-domain pixel-level recognition prediction probability maps p_s^pixel and p_t^pixel as input to the pixel-level domain classifier D_p, outputting the domain classification prediction probabilities h_s^pixel and h_t^pixel, and calculating a pixel-level domain-adaptation loss value through the pixel-level domain-adaptation loss function L_p^adapt, where h_s^pixel and h_t^pixel are the pixel-level domain classification prediction probabilities of the source domain and the target domain respectively; the pixel-level domain-adaptation loss function is defined as:
[equation: the pixel-level domain-adaptation loss function L_p^adapt, built from the cross-entropy loss L_Dp]
wherein L_Dp is a cross-entropy loss function, 1 is the source-domain label, and 0 is the target-domain label;
S2, taking the source-domain local-level recognition prediction probability map p_s^local and the source-domain real label map y_s as input, and outputting a local-level recognition loss value through the local-level recognition loss function L_local, the local-level recognition loss function being defined as:
[equation: the local-level recognition loss function L_local]
wherein x = {x_i | i = 1,2,…,N} and y = {y_i | i = 1,2,…,N} are the local predicted values obtained in local-block form from the probability prediction map p_s^local and from the source-domain real label map y_s respectively, N denotes the number of local blocks, μ_x and σ_x are respectively the mean and standard deviation of the local predicted values x, μ_y and σ_y are respectively the mean and standard deviation of the local predicted values y, σ_xy is the covariance of the local predicted values, and C_1 and C_2 are hyperparameters;
taking the source-domain and target-domain local-level recognition prediction probability maps p_s^local and p_t^local as input to the local-level domain classifier D_l, outputting the domain classification prediction probabilities h_s^local and h_t^local, and calculating a local-level domain-adaptation loss value through the local-level domain-adaptation loss function L_l^adapt, where h_s^local and h_t^local are the local-level domain classification prediction probabilities of the source domain and the target domain respectively; the local-level domain-adaptation loss function is defined as:
[equation: the local-level domain-adaptation loss function L_l^adapt, built from the cross-entropy loss L_Dl]
wherein L_Dl is a cross-entropy loss function, 1 is the source-domain label, and 0 is the target-domain label;
S3, taking the source-domain image-level recognition prediction probability map p_s^global and the source-domain real label map y_s as input, and outputting an image-level recognition loss value through the image-level recognition loss function L_global, the image-level recognition loss function being defined as follows:
[equation: the image-level recognition loss function L_global]
wherein w and h are the width and height of the probability map respectively;
taking the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global as input to the image-level domain classifier D_g, outputting the domain classification prediction probabilities h_s^global and h_t^global, and calculating an image-level domain-adaptation loss value through the image-level domain-adaptation loss function L_g^adapt, where h_s^global and h_t^global are the image-level domain classification prediction probabilities of the source domain and the target domain respectively; the image-level domain-adaptation loss function is defined as:
[equation: the image-level domain-adaptation loss function L_g^adapt, built from the cross-entropy loss L_Dg]
wherein L_Dg is a cross-entropy loss function, the label 1 is the source-domain label, and the label 0 is the target-domain label.
3. The cross-scene recognition model training method according to claim 2, wherein performing iterative training of the cross-scene recognition model by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the region level means that the source-domain pixel-level, local-level and image-level recognition prediction probability maps p_s^pixel, p_s^local and p_s^global are taken as input, and a hard-region training loss value is obtained through the hard-region training loss function L_region; the overall training loss of the region-level cross-scene recognition model is defined as follows:
L_seg = L_pixel + L_local + L_global
[equation: the cross-scene domain-adaptation training loss L_adapt]
[equation: the overall region-level training loss L_R^total]
wherein L_seg is the training loss of the recognition function of the cross-scene recognition model, L_adapt is the training loss of cross-scene domain adaptation, L_R^total is the overall training loss of the region-level cross-scene recognition model, and L_region is the hard-region training loss function; the hard-region training loss function L_region is defined as:
[equation: the hard-region training loss function L_region, computed from the combined source-domain prediction p_s with hyperparameters ρ and γ]
wherein p_s = 1/3 p_s^pixel + 1/3 p_s^local + 1/3 p_s^global, ρ is the hyperparameter of the hard-region probability condition, and γ is a hyperparameter; region-level training outputs the trained region-level cross-scene recognition model;
performing iterative training of the cross-scene recognition model by jointly using the pixel-level, local-level and image-level predictions and loss values of cross-scene recognition and domain adaptation at the sample level means that a prediction confidence weight Z is calculated for each sample input to the sample-level cross-scene recognition model from the source-domain and target-domain image-level recognition prediction probability maps p_s^global and p_t^global; the overall training loss of the sample-level cross-scene recognition model is defined as follows:
L_seg = L_pixel + L_local + L_global
[equation: the cross-scene domain-adaptation training loss L_adapt]
[equation: the overall sample-level training loss L_S^total, weighting the recognition and domain-adaptation losses of each sample by the prediction confidence weight Z]
wherein L_S^total is the overall training loss of the sample-level cross-scene recognition model; the computation function of Z is defined as:
[equation: the computation function of the prediction confidence weight Z, depending on p_global and the hyperparameter c]
p_global takes the value p_s^global or p_t^global according to the domain of the sample: if the training sample comes from the source domain, p_global = p_s^global, and if the training sample comes from the target domain, p_global = p_t^global; c is a hyperparameter; sample-level training outputs the trained sample-level cross-scene recognition model.
4. The cross-scene road recognition method based on the cross-scene recognition model training method of claim 3, characterized by comprising the following steps:
A1, receiving road image data X to be recognized in the target domain;
A2, inputting the road image data X to be recognized in the target domain into the trained region-level cross-scene recognition model and the trained sample-level cross-scene recognition model obtained by the cross-scene recognition model training method respectively, and outputting a region-level prediction map P_R^mask and a sample-level prediction map P_S^mask respectively;
A3, obtaining the number of region-level connected regions N_R and the area K_R corresponding to each region-level connected region from the region-level prediction map P_R^mask, and obtaining the number of sample-level connected regions N_I and the area K_I corresponding to each sample-level connected region from the sample-level prediction map P_S^mask;
A4, outputting the road recognition result of the target-domain road image data X to be recognized according to the decision strategy, based on the numbers of region-level and sample-level connected regions N_R and N_I and the corresponding region-level and sample-level connected-region areas K_R and K_I.
5. The cross-scene road recognition method according to claim 4, wherein in step A3, a connected-region analysis algorithm is used to obtain the number of region-level connected regions N_R, the area K_R corresponding to each region-level connected region, the number of sample-level connected regions N_I, and the area K_I corresponding to each sample-level connected region.
6. The cross-scene road recognition method according to claim 4, wherein in step A4 the decision strategy is specifically as follows:
J1 strategy: if the number of region-level connected regions N_R = 1 and the region-level connected-region area satisfies K_R > K_I, the area corresponding to the sample-level connected region (K_min being the minimum safe passing region area of the road image data X to be recognized), the region-level prediction map P_R^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise the J2 strategy is triggered;
J2 strategy: if the number of sample-level connected regions N_I = 1 and the sample-level connected-region area K_I satisfies K_I > K_R, the region-level connected-region area, the sample-level prediction map P_S^mask is output as the road recognition result of the target-domain road image data X to be recognized; otherwise, neither the J1 strategy nor the J2 strategy is considered satisfied, and the warning signal E is taken as the road recognition result of the target-domain road image data X to be recognized;
wherein K_min = K_r * β, β is the conversion ratio from the resolution of the road image data X to be recognized to physical length, K_r is the minimum safe passing area in the real road, K_r = L_r * H_r, H_r is the road width required for ground vehicle passage, L_r is the forward safe braking distance, L_r = v * t, v is the current speed per second, and t is the total safe braking time.
7. The cross-scene recognition model training device based on the cross-scene recognition model training method according to any one of claims 1 to 3, comprising:
a training data input processing unit, configured to input the source-domain image X_s, the pixel-level real label map y_s, and the unlabeled target-domain images X_t of different scenes to the multi-scale feature extraction unit;
the multi-scale feature extraction unit is used for extracting features of multiple scales from input training data and inputting the features into the high-resolution aggregation feature extraction unit;
the high-resolution aggregation feature extraction unit, which aggregates the input features of multiple scales into a high-resolution aggregation feature; the high-resolution aggregation feature extraction unit comprises a multi-scale feature aggregation module and a dilated-convolution calculation module, wherein the multi-scale feature aggregation module is used for transforming and aggregating features of multiple scales, and the dilated-convolution calculation module is used for improving the resolution of the aggregated feature;
a multi-stage cross-scene recognition and domain classification prediction and loss calculation unit, which takes the high-resolution aggregation feature as input, outputs recognition probability maps of the corresponding levels at the pixel level, the local level and the image level respectively through forward conduction, and further calculates the pixel-level, local-level and image-level recognition losses and domain-adaptation losses for updating the cross-scene recognition model parameters, taking the pixel-level, local-level and image-level recognition probability maps, the source-domain sample real label map and the domain label as input;
a region-level cross-scene recognition model joint training unit, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the region level jointly with the pixel-level, local-level and image-level training units, and outputs the region-level cross-scene recognition model;
a sample-level cross-scene recognition model joint training unit, which realizes parallel iterative training of recognition and cross-scene domain adaptation by the recognition model at the sample level jointly with the pixel-level, local-level and image-level training units, and outputs the sample-level cross-scene recognition model;
the model storage unit is used for storing the region horizontal cross-scene recognition model output by the region horizontal cross-scene recognition model joint training unit and storing the sample horizontal cross-scene recognition model output by the sample horizontal cross-scene recognition model joint training unit;
the multi-scale feature aggregation module comprises a feature scale transformation processing submodule, a dynamic parameter storage submodule, a convolution calculation submodule and a feature aggregation processing submodule;
the characteristic scale transformation processing submodule is used for carrying out scale scaling transformation on the characteristics of a plurality of scales; the dynamic parameter storage submodule is used for storing neuron parameters for performing convolution operation on different scale features; the convolution calculation submodule performs characteristic calculation by taking different scale characteristics as input and reads and writes neuron parameters in the dynamic parameter storage submodule; the feature aggregation processing submodule is used for completing aggregation operation of a plurality of features.
8. The cross-scene road recognition device based on the cross-scene road recognition method of any one of claims 4 to 6, characterized by comprising:
a cross-scene image receiving unit for images to be recognized, configured to receive the road image data X to be recognized in the target domain;
a cross-scene road image recognition unit, configured to input the target-domain road image data X to be recognized into the region-level and sample-level cross-scene recognition models and output a region-level road recognition result and a sample-level road recognition result respectively;
a connected-region calculation unit, configured to perform connected-region calculation on the region-level and sample-level road recognition results and output the number of region-level connected regions, the area of the corresponding region-level connected regions, the number of sample-level connected regions, and the area of the corresponding sample-level connected regions;
a result determination unit, configured to determine whether the numbers of region-level and sample-level connected regions and the areas of the region-level and sample-level connected regions satisfy the decision strategy, and to output a recognition result according to the decision strategy;
and a recognition result storage unit, configured to store the recognition result output by the result determination unit.
CN202111106779.1A 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device Active CN113554013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111106779.1A CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111106779.1A CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Publications (2)

Publication Number Publication Date
CN113554013A CN113554013A (en) 2021-10-26
CN113554013B (en) 2022-03-29

Family

ID=78106478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111106779.1A Active CN113554013B (en) 2021-09-22 2021-09-22 Cross-scene recognition model training method, cross-scene road recognition method and device

Country Status (1)

Country Link
CN (1) CN113554013B (en)


Also Published As

Publication number Publication date
CN113554013A (en) 2021-10-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant