CN117876822A - Target detection migration training method applied to fish eye scene - Google Patents


Info

Publication number
CN117876822A
Authority
CN
China
Prior art keywords
instance
target
image
target domain
domain
Prior art date
Legal status
Granted
Application number
CN202410269158.2A
Other languages
Chinese (zh)
Other versions
CN117876822B (en)
Inventor
胡玲静
欧阳一村
陈海涛
李希
罗富章
Current Assignee
Maxvision Technology Corp
Original Assignee
Maxvision Technology Corp
Priority date
Filing date
Publication date
Application filed by Maxvision Technology Corp filed Critical Maxvision Technology Corp
Priority to CN202410269158.2A
Publication of CN117876822A
Application granted
Publication of CN117876822B
Active legal status
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a target detection migration training method applied to fisheye scenes, which comprises the following steps: performing target detection on each source domain sample image in a labeled source domain sample set to obtain the source domain instances in each source domain sample image; rotating the source domain instances of part of the source domain sample images within a set angle range to obtain a rotated instance set R and a non-rotated instance set NR; dividing a distortion region in each target domain sample image of a fisheye-image target domain sample set, performing target detection on each target domain sample image to obtain the target domain instances in it, and judging whether each target domain instance is located in the distortion region; and performing alignment training that aligns the features of the target domain instances located in distorted regions with the features of the source domain instances in the rotated instance set, and the target domain instances located in undistorted regions with the features of the source domain instances in the non-rotated instance set.

Description

Target detection migration training method applied to fish eye scene
Technical Field
The application relates to the technical field of digital images, in particular to a target detection migration training method applied to fish-eye scenes.
Background
With the rapid development of deep learning, current target detection algorithms have reached a high level in both accuracy and speed. In recent years the AI community has accumulated a considerable number of open-source datasets across industries to support research in academia and application in industry. Most of these valuable data were collected with ordinary cameras, and most general-purpose detection models developed on the open-source datasets are therefore trained on ordinary-camera data.
However, in scenarios such as security monitoring, vehicle-mounted cameras, panoramic imaging and unmanned-aerial-vehicle photography, a fisheye camera is the better choice for covering a wider range. The data distributions of the ordinary-camera and fisheye-camera domains differ greatly, so a target detection algorithm developed on the existing open-source datasets and suited to ordinary cameras performs unsatisfactorily when applied directly to fisheye scenes. For target detection in fisheye scenes, one effective method is to collect enough fisheye-camera data and retrain the model, but the cost of gathering and annotating such data is too high. Researchers therefore define the problem of transferring a target detection network that performs well under an ordinary camera to a fisheye camera as a cross-domain target detection problem, involving source domain images collected by ordinary cameras with data label information and target domain images collected by fisheye cameras without label information. The problem to be solved is how, given only a small amount of unlabeled fisheye-camera data, to migrate an existing target detection network model to the fisheye scene while avoiding data collection and annotation costs.
Disclosure of Invention
In view of the above prior art, the technical problem to be solved by the application is to provide a target detection migration training method applied to fisheye scenes, which migrates a target detection algorithm model suited to source domain images collected by an ordinary camera to images collected by a fisheye camera, so that an existing target detection algorithm is migrated to the fisheye scene without annotating the fisheye-camera images.
In order to solve the above technical problems, the present application provides a target detection migration training method applied to a fisheye scene, which includes:
performing target detection on each source domain sample image in the source domain sample set with the label to obtain a source domain instance in each source domain sample image;
rotating source domain examples of part of the source domain sample images within a set angle range to obtain a rotating example set R and a non-rotating example set NR;
dividing a distortion area of each target domain sample image of a target domain sample set of the fish-eye image, carrying out target detection on each target domain sample image to obtain a target domain instance in each target domain sample image, and judging whether the target domain instance is positioned in the distortion area; and
alignment training is performed to align the features of the target domain instance in the distorted region with the features of the source domain instance in the rotated instance set and to align the target domain instance in the undistorted region with the features of the source domain instance in the non-rotated instance set.
In one possible implementation, determining whether a target domain instance is located in a distorted region includes:
acquiring a detection frame of the target domain instance acquired during target detection;
calculating the overlap area between the detection frame and the distortion region; and
when the ratio of the overlap area to the area of the detection frame exceeds a set overlap threshold, judging that the target domain instance is located in the distortion region.
In one possible implementation, aligning a target domain instance located in the distortion region with features of source domain instances in the rotated instance set includes:
selecting all source domain instances belonging to the same category as the target domain instance from the rotation instance set;
calculating the similarity between the target domain instance and all the source domain instances;
and selecting the K source domain instances with the largest similarity values as K positive samples of the target domain instance.
In one possible implementation, the rotation instance set R records location information, category information, and rotation angle information of each source domain instance;
when all source domain instances belonging to the same category as the target domain instance are selected from the rotated instance set, each selected source domain instance must further satisfy: the absolute value of the difference between the absolute value of the rotation angle of the source domain instance and the absolute value of the distortion angle of the target domain instance in the distortion region does not exceed a set angle threshold.
In one possible implementation, in aligning a target domain instance located in the distortion region with the features of the source domain instance in the rotation instance set, the method further includes obtaining K negative samples of the target domain instance:
selecting, from the rotated instance set, all source domain instances belonging to categories different from that of the target domain instance;
calculating the similarity between the target domain instance and all the selected source domain instances of different categories;
and selecting the K source domain instances with the smallest similarity values as K negative samples of the target domain instance.
In one possible implementation, the similarity between two instances is computed as a function P(f_ij^S, f_ij^T) of their extracted features, wherein P represents the similarity value; T represents the target domain sample set, S represents the source domain sample set, i indexes the i-th sample image in a sample set, and j indexes the j-th instance in each sample image; f_ij^S represents the extracted features of the j-th instance in the i-th sample image belonging to the source domain sample set S; and f_ij^T represents the extracted features of the j-th instance in the i-th sample image belonging to the target domain sample set T.
In one possible implementation, the manner of dividing the distortion region in each target domain sample image of the fisheye-image target domain sample set is:
a first sector area is drawn in the fisheye image with the bottom left corner of the fisheye image as a center and half of the width of the fisheye image as a radius, and a second sector area is drawn in the fisheye image with the bottom right corner of the fisheye image as a center and half of the width of the fisheye image as a radius, wherein the first sector area and the second sector area are symmetrical to each other in the fisheye image.
In one possible implementation, the distortion region is further divided in the following way: the region formed by the first sector region and the second sector region is divided into a first distortion region, a second distortion region, a third distortion region and a fourth distortion region by a first straight line parallel to the Y axis, expressed as x = 1/2 W, and a second straight line parallel to the X axis, expressed as y = H - 1/6 W. The rotation angle range corresponding to the rotated instance set can be subdivided accordingly: when instance alignment is performed, the instances located in the first, second, third and fourth distortion regions are aligned with rotated instances whose rotation operations belong to the first, second, third and fourth angular rotation ranges, respectively. Here W is the width of the fisheye image, H is the height of the fisheye image, and the acquired fisheye image must satisfy H > 1/6 W.
In one possible implementation, for the source domain sample set, the ratio of images with rotation operation to images without rotation operation is 4:6; the set angle range is: [ , ], [ , ]; and the first, second, third and fourth angular rotation ranges are respectively: [ , ], ( , ], ( , ] and [ , ].
In one possible implementation, the loss functions constructed in the alignment training include:
a loss function for the source domain sample set, L_S = L_sup;
a loss function for the target domain sample set, L_T = a*L_unsup + b*L_e;
L_e = (1/N_T) * Σ_{i=1..N_T} (1/n_i) * Σ_{j=1..n_i} l_ij
wherein L_sup is the supervised loss, comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_unsup is the unsupervised loss, likewise comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_e is the instance alignment loss; N_T represents the total number of sample images in the target domain sample set; i indexes the i-th sample image and j the j-th instance in each sample image; n_i represents the number of target domain instances detected in the i-th image of the target domain sample set; l_ij represents the instance-level loss of the j-th instance in the i-th image of the target domain, computed from the features f_ij of the j-th target domain instance in the i-th image, the features f_ij^{k+} of the k-th positive sample instance aligned with it (K being the total number of positive samples), the features f_ij^{k-} of its k-th negative sample instance, the similarities P(f_ij, f_ij^{k+}) and P(f_ij, f_ij^{k-}), the Euclidean distances d(f_ij, f_ij^{k+}) and d(f_ij, f_ij^{k-}), and a hyper-parameter.
The target detection migration training method applied to the fisheye scene has the following beneficial effects. Target detection is performed on each image of the labeled source domain sample set to obtain source domain instances, and part of the source domain instances are rotated to simulate and adapt to the angular distortion of fisheye images. Meanwhile, target detection is performed on each image of the unlabeled, fisheye-image target domain sample set to obtain target domain instances, and each target domain instance is judged as located inside or outside a distortion region of the fisheye image. Training then aligns the target domain instances located in distortion regions with the rotated source domain instances, and the target domain instances of undistorted regions with the non-rotated source domain instances; that is, a globally partitioned instance alignment retrieval is performed over the fisheye image, so that the rotated instance set simulating the fisheye distortion is aligned with the instances located in distorted regions of the target domain images, and the non-rotated instance set with the instances located in undistorted regions. After multiple rounds of training, the target detection network model originally suited to the source domain sample set migrates to fit the fisheye-image target domain sample set. The method thus migrates the target detection algorithm model suited to ordinary-camera images to fisheye-camera images, so that the existing target detection algorithm is migrated to the fisheye scene without annotating the images collected by the fisheye camera, avoiding data collection and annotation costs.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a target detection migration training method applied to a fisheye scene according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a network configuration during target detection migration training according to an embodiment of the present application;
fig. 3 is a diagram of a result of dividing a distortion area of a fisheye image according to an embodiment of the present application;
FIG. 4 is a diagram of still another result of a distortion zone division of a fisheye image according to an embodiment of the present application;
FIG. 5 is a flowchart of the steps of one embodiment of aligning a target domain instance located in a distortion region with the features of source domain instances in the rotated instance set according to an embodiment of the present application;
FIG. 6 is a flowchart of the steps of another embodiment of aligning a target domain instance located in a distortion region with the features of source domain instances in the rotated instance set according to an embodiment of the present application.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects to be solved by the present application more clear, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It will be understood that when an element is referred to as being "mounted" or "disposed" on another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate or are based on the orientation or positional relationship shown in the drawings, merely to facilitate description of the present application and simplify description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be configured and operated in a particular orientation, and therefore should not be construed as limiting the present application.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The method for training target detection migration applied to fish eye scenes is specifically described with reference to the accompanying drawings.
Referring to fig. 1, the target detection migration training method applied to a fisheye scene provided in the embodiment of the present application is used for migrating a target detection algorithm model adapted to images collected by an ordinary camera to images collected by a fisheye camera, so as to migrate an existing target detection algorithm to the fisheye scene without labeling the fisheye-camera images. Specifically, the target detection migration training method applied to the fisheye scene comprises the following steps S100 to S400.
Step S100: and carrying out target detection on each source domain sample image in the source domain sample set with the label, and obtaining a source domain instance in each source domain sample image.
Referring also to fig. 2, in step S100, during target detection on the source domain sample set, i indexes the i-th sample image in the source domain sample set S and j the j-th instance in each sample image; each source domain sample image is an ordinary undistorted image that has been annotated and carries label information. A backbone network extracts the features of each source domain sample image in the source domain sample set S to obtain a feature map, which is pooled by a pooling layer and finally classified by a classification layer. The backbone network adopts a ResNet50 network, and the pooling layer and classification layer may adopt those of a Faster R-CNN network. Target detection is completed after feature extraction, pooling and classification, yielding the pooled features of the j-th source domain instance in the i-th source domain sample image, the coordinate information of the j-th source domain instance, and the category of the j-th source domain instance. It will be appreciated that the initial target detection model for the source domain sample set S may be obtained by training on the source domain sample set S with its label information.
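As a concrete illustration of this step, the following minimal sketch (not part of the original filing) extracts per-instance pooled features, boxes and categories with torchvision's Faster R-CNN; the ResNet50-FPN variant and all function names are assumptions, since the filing only states a ResNet50 backbone with Faster R-CNN pooling and classification layers:

```python
# Hedged sketch: per-instance feature/box/category extraction with a
# torchvision Faster R-CNN (ResNet50-FPN assumed).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def detect_instances(image):            # image: 3xHxW float tensor in [0, 1]
    det = model([image])[0]             # boxes, labels, scores
    feats = model.backbone(image.unsqueeze(0))      # FPN feature maps
    # RoI-pool one feature map per detected box (the detector's internal
    # resizing transform is skipped here for brevity).
    pooled = model.roi_heads.box_roi_pool(feats, [det["boxes"]],
                                          [image.shape[-2:]])
    f = model.roi_heads.box_head(pooled)  # flattened per-instance features
    return f, det["boxes"], det["labels"]
```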
It should be noted that, in the migration training method, the model initially performing target detection on the source domain sample set adopts an existing target detection network adapted to the undistorted ordinary images collected by an ordinary camera, obtained by training on the labeled source domain sample set S; this existing network may be, but is not limited to, a Faster R-CNN network. An instance mentioned in this application can be understood as a detected target object: for example, a person, a car or an animal detected in a source domain sample image is a source domain instance.
Step S200: and rotating the source domain examples in the partial source domain sample image within a set angle range to obtain a rotation example set R and a non-rotation example set NR.
In order to make the features of the labeled source domain sample set resemble those of fisheye images, the source domain instances in the source domain sample images are rotated to simulate and adapt to the fisheye scene.
Statistics on real fisheye data show that, in one fisheye image, the approximate ratio of the number of human bodies with angular variation to the number without angular variation is 4:6. Accordingly, when the data are augmented, the ratio of rotation-enhanced data to non-enhanced data is 4:6; that is, when the rotation operation is applied to the source domain sample set S, the ratio of images subjected to rotation to images not subjected to rotation is 4:6, the rotation being applied specifically to the instances obtained by target detection on the source domain sample images. By usual experience, in order to match the angular distortion of instance targets in fisheye images, the rotation angle is selected randomly within the set ranges [ , ] and [ , ].
It is worth noting that the data of both the rotated instance set R and the non-rotated instance set NR come from the source domain sample set S. The rotated instance set R stores the feature information, position information, rotation angle information and category information of each rotated instance; the category before and after rotation is the same. The features of a rotated instance are extracted by the detection network of step S100, and since the rotation angle is known, the post-rotation position information is obtained by combining the pre-rotation position information with the rotation angle. The non-rotated instance set stores the feature information, position information and category information of each instance that is not rotated.
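A minimal sketch of the 4:6 rotation augmentation described above; the two angle intervals below are placeholders, since the filing's numeric bounds are not reproduced in this text:

```python
# Hedged sketch of building the rotated set R and non-rotated set NR.
import random
import torchvision.transforms.functional as TF

ANGLE_RANGES = [(-90.0, -10.0), (10.0, 90.0)]   # placeholder bounds (assumed)

def maybe_rotate_instance(crop, rotate_prob=0.4):    # 4:6 rotated/non-rotated
    """crop: an instance region as a CxHxW tensor or PIL image."""
    if random.random() < rotate_prob:
        lo, hi = random.choice(ANGLE_RANGES)
        angle = random.uniform(lo, hi)
        return TF.rotate(crop, angle), angle         # record angle into R
    return crop, 0.0                                 # goes into NR
```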
Step S300: and dividing a distortion area of each target domain sample image of the target domain sample set T of the fish-eye image, carrying out target detection on each target domain sample image to obtain a target domain instance in each target domain sample image, and judging whether the target domain instance is positioned in the distortion area.
Fig. 3 shows a distorted fisheye image acquired by a fisheye camera. The human bodies in the four corner areas indicated by the four arrows in fig. 3 all lean toward the directions of the four corners, as the arrows show, while the tilt in the middle area is small. The upper-left and upper-right corners are far from the camera, so the figures there are greatly reduced in size and only slightly tilted, and their angular variation need not be considered. The lower-left and lower-right corner regions, however, are greatly tilted, and the closer an instance target is to the lens, the greater its distortion; therefore the instances in the lower-left and lower-right corner regions, which have larger angular distortion, deserve particular attention during migration training.
For step S300, a distortion region is delimited in the target domain sample image according to the distortion characteristics of the fisheye image, so as to determine whether a detected instance target is located in the distortion region. In an embodiment of the present application, the manner of dividing the distortion region in each target domain sample image of the target domain sample set is: a first sector region is drawn in the fisheye image with the bottom-left corner of the fisheye image as center and half of the width of the fisheye image as radius, and a second sector region is drawn with the bottom-right corner as center and half of the width as radius, the first and second sector regions being symmetrical to each other in the fisheye image; understandably, the areas other than the first and second sector regions are set as undistorted regions. In another embodiment of the present application, the definition of the distortion region may be adjusted to the actual application scene: it is not limited to the sectors of the previous embodiment and may take other shapes, and the radius is not limited to half of the image width. In general, the height and width of a fisheye image are not too small, and the height can be more than half the width.
It is worth noting that an image coordinate system is established with the upper-left corner of the fisheye image as the origin of coordinates. The distortion of the fisheye image grows from top to bottom, because in a real scene each instance target in the lower area of the image is closer to the lens than those in the upper area. The width of the fisheye image is measured along the X axis of the image coordinate system.
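Under these conventions (origin at the top-left corner, width along X), the two-sector distortion region reduces to a simple point test, as in this sketch (not from the filing):

```python
import math

def in_distortion_region(x, y, W, H):
    # Bottom-left sector: center (0, H); bottom-right sector: center (W, H);
    # both of radius W / 2 in image coordinates with the origin at top-left.
    r = W / 2.0
    return math.hypot(x, y - H) <= r or math.hypot(x - W, y - H) <= r
```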
In another embodiment of the present application, the division of the distortion region in fig. 3 is further refined. The region formed by the first sector region and the second sector region is divided into a first distortion region, a second distortion region, a third distortion region and a fourth distortion region by a first straight line parallel to the Y axis, expressed as x = 1/2 W, and a second straight line parallel to the X axis, expressed as y = H - 1/6 W, where W is the width of the fisheye image, H is the height of the fisheye image, and the acquired fisheye image must satisfy H > 1/6 W. After division by these two intersecting straight lines, the first and second distortion regions are symmetrical to each other, as are the third and fourth. In this embodiment, the rotation angle range corresponding to the rotated instance set R may be further subdivided into a first angular rotation range [ , ], a second angular rotation range ( , ], a third angular rotation range ( , ] and a fourth angular rotation range [ , ], and when the subsequent instance alignment is performed, the instances located in the first, second, third and fourth distortion regions are aligned with rotated instances whose rotation operations belong to the first, second, third and fourth angular rotation ranges, respectively.
Further subdividing the distortion regions and the angular rotation ranges further ensures the consistency between aligned instances and facilitates migrating the existing target detection method into the fisheye scene. Correspondingly, the rotated instance set R is divided into a first rotated instance set R1, a second rotated instance set R2, a third rotated instance set R3 and a fourth rotated instance set R4, whose instances have rotation angles belonging to the first, second, third and fourth angular rotation ranges, respectively. Fig. 4 shows the result of further subdividing the distortion regions of fig. 3: areas 1, 2, 3 and 4 in fig. 4 correspond to the aforementioned first, second, third and fourth distortion regions, and area 5 may be set as the undistorted region.
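Reusing the in_distortion_region helper sketched above, the finer four-zone labeling can be expressed as follows; which numeric label falls on which side of the split lines is an assumption, since the filing's figure is not reproduced here:

```python
def distortion_zone(x, y, W, H):
    # Split lines: x = W/2 (parallel to Y) and y = H - W/6 (parallel to X).
    # Zone 5 is the undistorted remainder; the 1..4 labeling is assumed.
    if not in_distortion_region(x, y, W, H):
        return 5
    left = x < W / 2.0
    upper = y < H - W / 6.0
    if upper:
        return 1 if left else 2
    return 3 if left else 4
```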
For step S300, in an embodiment of the application, determining whether a target domain instance is located in a distortion region includes the following steps:
acquiring a detection frame of the target domain instance acquired during target detection;
calculating the overlap area between the detection frame and the distortion region;
when the ratio of the overlap area to the area of the detection frame exceeds a set overlap threshold, judging that the target domain instance is located in the distortion region.
The overlapping threshold may be set according to practical situations, for example, but not limited to, 1:2.
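Since a box-sector intersection has no convenient closed form, the overlap ratio can be estimated numerically, as in this sketch (grid sampling is an assumption; the filing does not prescribe how the overlap is computed):

```python
def box_in_distortion(box, W, H, overlap_thresh=0.5, n=32):
    # Estimate the fraction of the detection box inside the distortion
    # region by testing an n x n grid of points inside the box.
    x1, y1, x2, y2 = box
    pts = ((x1 + (x2 - x1) * (i + 0.5) / n,
            y1 + (y2 - y1) * (j + 0.5) / n)
           for i in range(n) for j in range(n))
    inside = sum(in_distortion_region(x, y, W, H) for x, y in pts)
    return inside / (n * n) > overlap_thresh    # e.g. a 1:2 threshold -> 0.5
```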
It should be noted that, during the migration training process, the target domain sample images of the fisheye scene carry no data annotation, so the instances in them have no label information. As shown in fig. 2, at the beginning of training, the backbone network, RPN network and detection head network of the Faster R-CNN may be used to perform target detection on the target domain sample images to obtain all the instance targets in each image, yielding the features, coordinate information and category of the j-th target domain instance of the i-th target domain sample image. Since the target domain sample images have no true label information, the categories obtained by the initial detection may be called pseudo labels; after multiple subsequent rounds of training in which the parameters of the existing target detection network model are continuously adjusted and optimized, the model migrates to fit the fisheye scene and the pseudo labels finally become consistent with the true categories. Likewise, for the subsequent instance feature alignment of the two domains, the backbone, RPN and detection head networks of the Faster R-CNN may be used to perform target detection on the source domain sample images, yielding the pooled features, coordinate information and category of the source domain instances in each source domain sample image. To ensure the accuracy of the alignment training, the same network is used for target domain and source domain sample image feature extraction.
Step S400: alignment training is performed to align the features of the target domain instance in the distorted region with the features of the source domain instance in the rotated instance set and to align the target domain instance in the undistorted region with the features of the source domain instance in the non-rotated instance set.
For step S400, referring to fig. 5, in the first application embodiment, aligning a target domain instance located in the distortion area with the features of a source domain instance in the rotation instance set includes:
selecting all source domain instances belonging to the same category as the target domain instance from the rotation instance set;
calculating the similarity between the target domain instance and all the source domain instances of the same class;
and selecting the K source domain instances with the largest similarity values as K positive samples of the target domain instance.
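A sketch of this top-K positive retrieval; cosine similarity is an assumption here, since the filing's similarity formula appears only as an image (see the formula section further below):

```python
import torch
import torch.nn.functional as F

def topk_positives(f_t, candidates, K=5):
    # candidates: (feature, category, angle) triples from the rotated set R,
    # pre-filtered to the target instance's (pseudo-label) category.
    if not candidates:
        return []
    sims = torch.stack([F.cosine_similarity(f_t, f_s, dim=0)
                        for f_s, _, _ in candidates])
    idx = sims.topk(min(K, len(candidates))).indices
    return [candidates[i] for i in idx]
```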
For step S400, referring to fig. 6, in a second application embodiment, aligning a target domain instance located in a distortion region with features of a source domain instance in a rotation instance set includes:
calculating a distortion angle of the target domain instance in a distortion area in the fisheye image;
selecting the source domain instance which belongs to the same category as the target domain instance and meets the angle condition from the rotation instance set; wherein, the angle condition is: the absolute value of the difference between the absolute value of the rotation angle of the selected source domain example and the absolute value of the distortion angle of the target domain example in the distortion area does not exceed a set angle threshold;
calculating the similarity between the target domain instance and all the source domain instances;
and selecting the K source domain instances with the largest similarity values as K positive samples of the target domain instance.
Specifically, when the target instance is a human target, the distortion angle of a target domain instance located in a distortion region of the fisheye image is solved as follows: obtain the center point A1 of the detection frame of the target domain instance; detect the detection frame of the instance's head region and obtain its center point A2; then calculate the angle of the line through the center point A1 and the center point A2. The set angle threshold may be chosen per application, for example, but not limited to, 5 degrees, and the K value may be set to, but is not limited to, 5. Both the human body target detection and the head region detection can adopt the Faster R-CNN target detection network, which becomes increasingly adapted to the fisheye scene as migration training proceeds.
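A sketch of the distortion-angle computation from the two box centers; measuring the A2-to-A1 direction against the vertical image axis is an assumption, consistent with upright pedestrians in ordinary images:

```python
import math

def distortion_angle(body_box, head_box):
    # Boxes are (x1, y1, x2, y2). Returns the angle in degrees of the
    # head-center -> body-center line w.r.t. the vertical axis; 0 = upright.
    bcx, bcy = (body_box[0] + body_box[2]) / 2, (body_box[1] + body_box[3]) / 2
    hcx, hcy = (head_box[0] + head_box[2]) / 2, (head_box[1] + head_box[3]) / 2
    return math.degrees(math.atan2(bcx - hcx, bcy - hcy))
```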
In the first application embodiment, positive samples aligned with the features of a target domain instance located in a distorted region are retrieved in the rotated instance set, whose rotations simulate the angular distortion of fisheye images, so that subsequent migration training yields a detection network suited to unlabeled fisheye-scene images. Compared with the first application embodiment, the second additionally considers the specific distortion angle of the target domain instance located in the distortion region: when aligning the instances of the two domains, it retrieves source domain instances whose rotation angle is as close as possible to the distortion angle of the target domain instance, so that the features of the source and target domain instances match better and the target detection model finally migrates to the fisheye scene better. In both embodiments, retrieval of aligned positive samples is restricted to the same category, which benefits the accuracy of instance alignment.
Further, so that during the alignment training of the two domains' instance features the similar instance features are pulled closer while the dissimilar ones are pushed further apart, letting the target detection model migrate better into the fisheye scene, the global partitioned retrieval over the fisheye image must retrieve not only similar positive samples of each target domain instance but also dissimilar negative samples.
For step S400, in the third application embodiment, in the process of aligning a target domain instance located in the distortion area with the features of the source domain instance in the rotation instance set, the method further includes obtaining K negative samples of the target domain instance, which specifically includes the following steps:
selecting, from the rotated instance set, all source domain instances belonging to categories different from that of the target domain instance;
calculating the similarity between the target domain instance and all the selected source domain instances of different categories;
and selecting the K source domain instances with the smallest similarity values as K negative samples of the target domain instance.
For step S400, in the fourth application embodiment, in the process of aligning a target domain instance located in the distortion area with the features of the source domain instance in the rotation instance set, the method further includes obtaining K negative samples of the target domain instance, which specifically includes the following steps:
calculating a distortion angle of the target domain instance in a distortion area in the fisheye image;
selecting, from the rotated instance set, the source domain instances which belong to categories different from that of the target domain instance and meet the angle condition, wherein the angle condition is: the absolute value of the difference between the absolute value of the rotation angle of the selected source domain instance and the absolute value of the distortion angle of the target domain instance in the distortion region does not exceed a set angle threshold;
calculating the similarity between the target domain instance and all the selected source domain instances of different categories;
and selecting the K source domain instances with the smallest similarity values as K negative samples of the target domain instance.
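A sketch of the angle-filtered bottom-K negative retrieval of the fourth embodiment (cosine similarity assumed, as above):

```python
import torch
import torch.nn.functional as F

def bottomk_negatives(f_t, theta_t, candidates, K=5, angle_thresh=5.0):
    # candidates: (feature, category, angle) triples of different-category
    # rotated source instances; keep those whose |rotation| is within
    # angle_thresh degrees of |theta_t|, then take the K least similar.
    kept = [(f_s, c, a) for f_s, c, a in candidates
            if abs(abs(a) - abs(theta_t)) <= angle_thresh]
    if not kept:
        return []
    sims = torch.stack([F.cosine_similarity(f_t, f_s, dim=0)
                        for f_s, _, _ in kept])
    idx = sims.topk(min(K, len(kept)), largest=False).indices
    return [kept[i] for i in idx]
```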
Regarding the last two embodiments that retrieve negative samples: in the third application embodiment, negative samples whose features are far from those of a target domain instance located in the distortion region are retrieved in the rotated instance set that simulates the angular distortion of fisheye images, so that during subsequent migration training the aligned positive samples of the two domains are pulled as close as possible while the negative samples of the two domains are kept apart, improving the accuracy of the model obtained after migration training; because negative samples are being retrieved, the source domain instances and the target domain instance must belong to different categories. Compared with the third application embodiment, the fourth additionally considers the specific distortion angle of the target domain instance located in the distortion region and retrieves aligned negative source domain instances whose rotation angle is as close as possible to that distortion angle, so that the selection of negative sample instances is not affected by angular distortion.
For step S400, in the fifth application embodiment, the alignment for aligning the features of a target domain instance located in the undistorted region with the source domain instance in the non-rotated instance set includes the following steps:
selecting, from the non-rotated instance set, all source domain instances belonging to the same category as the target domain instance;
calculating the similarity between the target domain instance and all the selected source domain instances of the same category;
and selecting the K source domain instances with the largest similarity values as K positive samples of the target domain instance.
For step S400, in the sixth application embodiment, the alignment method for aligning the features of a target domain instance located in the undistorted area with the features of a source domain instance in the non-rotated instance set further includes obtaining K negative samples of the target domain instance, which specifically includes the following steps:
selecting, from the non-rotated instance set, all source domain instances belonging to categories different from that of the target domain instance;
calculating the similarity between the target domain instance and all the selected source domain instances of different categories;
and selecting the K source domain instances with the smallest similarity values as K negative samples of the target domain instance.
Further, for all the above embodiments of step S400, the similarity between two instances is computed as a function P(f_ij^S, f_ij^T) of their extracted features, wherein P represents the similarity value; T represents the target domain sample set, S represents the source domain sample set, i indexes the i-th sample image in a sample set, and j indexes the j-th instance in each sample image; f_ij^S represents the extracted features of the j-th instance in the i-th sample image belonging to the source domain sample set S; and f_ij^T represents the extracted features of the j-th instance in the i-th sample image belonging to the target domain sample set T.
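The similarity expression itself is rendered only as an image in this text; one common choice consistent with the symbols defined above, used here purely as an assumption, is the cosine similarity of the pooled instance features:

```latex
P\!\left(f^{S}_{ij},\, f^{T}_{ij}\right)
  = \frac{\left\langle f^{S}_{ij},\, f^{T}_{ij}\right\rangle}
         {\left\lVert f^{S}_{ij}\right\rVert\,\left\lVert f^{T}_{ij}\right\rVert}
```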
With further reference to fig. 2, for step S400, the loss functions constructed in the alignment training include:
a loss function for the source domain sample set, L_S = L_sup;
a loss function for the target domain sample set, L_T = a*L_unsup + b*L_e;
L_e = (1/N_T) * Σ_{i=1..N_T} (1/n_i) * Σ_{j=1..n_i} l_ij
wherein L_sup is the supervised loss, comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_unsup is the unsupervised loss, likewise comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_e is the instance alignment loss; N_T represents the total number of sample images in the target domain sample set; i indexes the i-th sample image and j the j-th instance in each sample image; n_i represents the number of target domain instances detected in the i-th image of the target domain sample set T; l_ij represents the instance-level loss of the j-th instance in the i-th image of the target domain, computed from the features f_ij of the j-th target domain instance in the i-th image, the features f_ij^{k+} of the k-th positive sample instance aligned with it (K being the total number of positive samples), the features f_ij^{k-} of its k-th negative sample instance, the similarities P(f_ij, f_ij^{k+}) and P(f_ij, f_ij^{k-}), the Euclidean distances d(f_ij, f_ij^{k+}) and d(f_ij, f_ij^{k-}), and a hyper-parameter; a and b are weighting coefficients. The classification softmax loss and the smooth L1 regression loss both use the loss functions of the Faster R-CNN network and are not described here again. After multiple rounds of training, L_S and L_T converge, yielding the detection network model finally used for the fisheye scene.
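A hedged sketch of the instance-level alignment term l_ij built from the quantities defined above; the filing's exact expression (including how the similarities P weight the terms) is rendered as an image, so a plain contrastive pull-push form with a margin hyper-parameter is assumed here:

```python
import torch
import torch.nn.functional as F

def instance_align_loss(f_t, positives, negatives, margin=1.0):
    # Pull f_t toward its K positive features, push it beyond `margin`
    # from its K negative features (assumed contrastive form).
    l_pos = torch.stack([(f_t - p).pow(2).sum() for p in positives]).mean()
    l_neg = torch.stack([F.relu(margin - (f_t - n).norm()).pow(2)
                         for n in negatives]).mean()
    return l_pos + l_neg

# L_e then averages this over the detected instances of every target image,
# and L_T = a * L_unsup + b * L_e with weighting coefficients a and b.
```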
Further, as shown in fig. 2, while performing the alignment training of step S400, and in order that the trained model fit both the source domain and the target domain, the migration training process further comprises: inputting the features of the source domain sample images extracted by the backbone network and the features of the target domain sample images extracted by the backbone network into a DANN domain-adversarial migration network. The domain migration loss L_y adopted in the DANN training process is the binary cross-entropy
L_y = -(1/(N_S + N_T)) * Σ_x [ y*log D(G(x)) + (1 - y)*log(1 - D(G(x))) ]
where y takes the value 0 or 1, with 1 representing the source domain and 0 representing the target domain; G(x) represents the features extracted by the backbone network; D(·) represents the domain discriminator of the DANN domain-adversarial migration network applied to the feature map; N_S represents the number of source domain images; and N_T represents the number of target domain images.
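A minimal sketch of the DANN branch: a gradient-reversal layer followed by a small domain discriminator trained with the binary cross-entropy domain loss above; the discriminator architecture is an assumption, as the filing does not specify it:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # DANN gradient-reversal layer: identity forward, -lam * grad backward.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

domain_head = nn.Sequential(nn.Flatten(), nn.LazyLinear(256),
                            nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

def domain_loss(backbone_feat, is_source, lam=1.0):
    # y = 1 for source-domain features, y = 0 for target-domain features.
    logits = domain_head(GradReverse.apply(backbone_feat, lam))
    y = torch.full_like(logits, 1.0 if is_source else 0.0)
    return bce(logits, y)
```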
In the target detection migration training method applied to the fisheye scene, target detection is performed on each image of the labeled source domain sample set to obtain source domain instances, and part of the source domain instances are rotated to simulate and adapt to the angular distortion of fisheye images; meanwhile, target detection is performed on each image of the unlabeled, fisheye-image target domain sample set to obtain target domain instances, and each target domain instance is judged as located inside or outside a distortion region of the fisheye image. Training then aligns the target domain instances located in distortion regions with the rotated source domain instances, and the target domain instances of undistorted regions with the non-rotated source domain instances; that is, a globally partitioned instance alignment retrieval is performed over the fisheye image, so that the rotated instance set simulating the fisheye distortion is aligned with the instances located in distorted regions of the target domain images, and the non-rotated instance set with the instances located in undistorted regions. After multiple rounds of training, the target detection network model originally suited to the source domain sample set migrates to fit the fisheye-image target domain sample set. The method thus migrates the target detection algorithm model suited to ordinary-camera images to fisheye-camera images, so that the existing target detection algorithm is migrated to the fisheye scene without annotating the images collected by the fisheye camera.
In summary, in the target detection migration training method applied to the fisheye scene, the instances of the distorted and undistorted regions of the fisheye image are retrieved by partition, which improves the matching quality of instance alignment; moreover, instance-level target alignment is finer than image-level alignment. This facilitates the subsequent migration training of a more effective detection model suited to fisheye scenes.
The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but is intended to cover any and all modifications, equivalents, and alternatives falling within the spirit and principles of the present application.

Claims (10)

1. The target detection migration training method applied to the fish-eye scene is characterized by comprising the following steps of:
performing target detection on each source domain sample image in the source domain sample set with the label to obtain a source domain instance in each source domain sample image;
rotating source domain examples of part of the source domain sample images within a set angle range to obtain a rotating example set R and a non-rotating example set NR;
dividing a distortion area of each target domain sample image of a target domain sample set of the fish-eye image, carrying out target detection on each target domain sample image to obtain a target domain instance in each target domain sample image, and judging whether the target domain instance is positioned in the distortion area; and
alignment training is performed to align the features of the target domain instance in the distorted region with the features of the source domain instance in the rotated instance set and to align the target domain instance in the undistorted region with the features of the source domain instance in the non-rotated instance set.
2. The method of claim 1, wherein determining whether a target domain instance is located in a distorted region comprises:
acquiring a detection frame of the target domain instance acquired during target detection;
calculating the overlap area between the detection frame and the distortion region; and
when the ratio of the overlap area to the area of the detection frame exceeds a set overlap threshold, judging that the target domain instance is located in the distortion region.
3. The method of claim 1, wherein aligning a target domain instance located in a distorted region with features of a source domain instance in a rotated instance set comprises:
selecting all source domain instances belonging to the same category as the target domain instance from the rotation instance set;
calculating the similarity between the target domain instance and all the source domain instances;
and selecting the K source domain instances with the largest similarity values as K positive samples of the target domain instance.
4. The method for training object detection migration applied to fisheye scene according to claim 3, wherein the rotation instance set R records position information, category information and rotation angle information of each source domain instance;
when all source domain instances belonging to the same category as the target domain instance are selected from the rotated instance set, each selected source domain instance must further satisfy: the absolute value of the difference between the absolute value of the rotation angle of the source domain instance and the absolute value of the distortion angle of the target domain instance in the distortion region does not exceed a set angle threshold.
5. The method for training object detection migration applied to fisheye scene according to any one of claims 2 to 4, wherein in aligning a target domain instance located in a distorted region with a feature of a source domain instance in a rotation instance set, the method further comprises obtaining K negative samples of the target domain instance:
selecting, from the rotated instance set, all source domain instances belonging to categories different from that of the target domain instance;
calculating the similarity between the target domain instance and all the selected source domain instances of different categories;
and selecting the K source domain instances with the smallest similarity values as K negative samples of the target domain instance.
6. The method for training object detection migration applied to a fisheye scene according to claim 5, wherein the similarity between two instances is computed as a function P(f_ij^S, f_ij^T) of their extracted features, wherein P represents a similarity value; T represents the target domain sample set, S represents the source domain sample set, i indexes the i-th sample image in a sample set, and j indexes the j-th instance in each sample image; f_ij^S represents the extracted features of the j-th instance in the i-th sample image belonging to the source domain sample set S; and f_ij^T represents the extracted features of the j-th instance in the i-th sample image belonging to the target domain sample set T.
7. The method for training object detection migration applied to a fisheye scene according to claim 1, wherein the manner of dividing the distortion region in each target domain sample image of the fisheye-image target domain sample set is:
a first sector area is drawn in the fisheye image with the bottom left corner of the fisheye image as a center and half of the width of the fisheye image as a radius, and a second sector area is drawn in the fisheye image with the bottom right corner of the fisheye image as a center and half of the width of the fisheye image as a radius, wherein the first sector area and the second sector area are symmetrical to each other in the fisheye image.
8. The method for training object detection migration in fish-eye scenes according to claim 7,
the mode of further dividing the distortion region is as follows: the region formed by the first sector region and the second sector region is divided into a first distortion region, a second distortion region, a third distortion region and a fourth distortion region by a first straight line parallel to the Y axis and a second straight line parallel to the X axis, the first straight line being expressed as: x=1/2W and the second straight line is expressed as: y=h-1/6W;
the rotation angle range corresponding to the rotated instance set can be further subdivided, so that when instance alignment is performed, the instances located in the first distortion region, the second distortion region, the third distortion region and the fourth distortion region are aligned with rotated instances belonging to the first angular rotation range, the second angular rotation range, the third angular rotation range and the fourth angular rotation range, respectively;
wherein W is the width of the fisheye image, H is the height of the fisheye image, and the acquired fisheye image satisfies H > 1/6 W.
9. The method for training object detection migration in fish-eye scenes according to claim 8,
for the source domain sample set, the ratio of images with rotation operation to images without rotation operation is 4:6;
the set angle range is as follows: [,/>],[/>,/>];
The first, second, third and fourth angular rotation ranges are respectively: [,/>]、(/>,/>]、(/>,/>]And [ ]>,/>]。
10. The method for training object detection migration applied to a fisheye scene according to claim 1, wherein the loss functions constructed in the alignment training comprise:
a loss function for the source domain sample set, L_S = L_sup;
a loss function for the target domain sample set, L_T = a*L_unsup + b*L_e;
L_e = (1/N_T) * Σ_{i=1..N_T} (1/n_i) * Σ_{j=1..n_i} l_ij
wherein L_sup is the supervised loss, comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_unsup is the unsupervised loss, likewise comprising a classification loss (softmax loss) and a regression loss (smooth L1 loss); L_e is the instance alignment loss; N_T represents the total number of sample images in the target domain sample set; i indexes the i-th sample image and j the j-th instance in each sample image; n_i represents the number of target domain instances detected in the i-th image of the target domain sample set; l_ij represents the instance-level loss of the j-th instance in the i-th image of the target domain, computed from the features f_ij of the j-th target domain instance in the i-th image, the features f_ij^{k+} of the k-th positive sample instance aligned with it (K being the total number of positive samples), the features f_ij^{k-} of its k-th negative sample instance, the similarities P(f_ij, f_ij^{k+}) and P(f_ij, f_ij^{k-}), the Euclidean distances d(f_ij, f_ij^{k+}) and d(f_ij, f_ij^{k-}), and a hyper-parameter; and a and b are weighting coefficients.
CN202410269158.2A 2024-03-11 2024-03-11 Target detection migration training method applied to fish eye scene Active CN117876822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410269158.2A CN117876822B (en) 2024-03-11 2024-03-11 Target detection migration training method applied to fish eye scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410269158.2A CN117876822B (en) 2024-03-11 2024-03-11 Target detection migration training method applied to fish eye scene

Publications (2)

Publication Number Publication Date
CN117876822A 2024-04-12
CN117876822B (en) 2024-05-28

Family

ID=90588702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410269158.2A Active CN117876822B (en) 2024-03-11 2024-03-11 Target detection migration training method applied to fish eye scene

Country Status (1)

Country Link
CN (1) CN117876822B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134331A1 (en) * 2018-10-31 2020-04-30 Texas Instruments Incorporated Object detection for distorted images
CN111723801A (en) * 2020-06-26 2020-09-29 南京甄视智能科技有限公司 Method and system for detecting and correcting target in fisheye camera picture
CN111754394A (en) * 2020-06-29 2020-10-09 苏州科达科技股份有限公司 Method and device for detecting object in fisheye image and storage medium
WO2022000862A1 (en) * 2020-06-29 2022-01-06 苏州科达科技股份有限公司 Method and apparatus for detecting object in fisheye image, and storage medium
US20230016304A1 (en) * 2021-07-06 2023-01-19 Canoo Technologies Inc. Fisheye collage transformation for road object detection or other object detection
CN113807420A (en) * 2021-09-06 2021-12-17 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN116778143A (en) * 2023-06-30 2023-09-19 中汽创智科技有限公司 Target detection method and device for fish-eye image
CN117079070A (en) * 2023-07-17 2023-11-17 清华大学 Test stage self-adaptive learning method, device and equipment based on feature alignment
CN117217368A (en) * 2023-09-04 2023-12-12 腾讯科技(深圳)有限公司 Training method, device, equipment, medium and program product of prediction model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
THADDAUS WIEDEMER ET AL.: "Few-Shot Supervised Prototype Alignment for Pedestrian Detection on Fisheye Images", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 23 August 2022, pages 4141-4152 *
ZHOU XIAOKANG ET AL.: "Research on Fisheye Image Distortion Correction Technology", Industrial Control Computer, vol. 30, no. 10, 31 October 2017, pages 95-96 *

Also Published As

Publication number Publication date
CN117876822B (en) 2024-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant