CN115546652A - Multi-time-state target detection model and construction method, device and application thereof - Google Patents
- Publication number
- CN115546652A CN115546652A CN202211504037.9A CN202211504037A CN115546652A CN 115546652 A CN115546652 A CN 115546652A CN 202211504037 A CN202211504037 A CN 202211504037A CN 115546652 A CN115546652 A CN 115546652A
- Authority
- CN
- China
- Prior art keywords
- temporal
- difference
- coding information
- picture
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/176—Urban or other man-made structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The present disclosure provides a multi-temporal target detection model together with a construction method, a device and an application thereof. The method comprises: acquiring a first temporal picture and a second temporal picture as training samples; constructing a multi-temporal target detection model; feeding the first temporal picture into a first encoder and the second temporal picture into a second encoder to obtain multi-level coding information; decoding the coding information with a panoramic decoder and a difference decoder to obtain a panoramic classification result map and a difference classification result map; and obtaining the target to be detected from the two result maps. By using semi-supervised learning and adding the panoramic decoder during training, the scheme better discriminates the differences between pictures of different temporal states and thereby detects illegal buildings.
Description
Technical Field
The application relates to the field of computer algorithms and machine learning, in particular to a multi-temporal target detection model and a construction method, a device and application thereof.
Background
In recent years, as the scale of urban construction has continued to expand, urban functions have improved and new residential communities have steadily increased. Yet while cities develop for the better, illegal buildings have become increasingly common, especially in old communities, villages, towns and factory areas, where structures are erected haphazardly for private benefit. Such unauthorized construction not only mars the appearance of the city but also endangers public safety and the safety of the builders themselves. A method for discovering illegal buildings in a timely manner is therefore urgently needed to better guarantee social and personal safety.
With the continued development of machine learning, illegal buildings can in principle be detected intelligently. In practice, however, sample pictures of illegal buildings are scarce, and a model must learn from multiple temporal pictures of the same location, so model-based detection of illegal buildings often performs poorly. The conventional multi-temporal change detection approach is binary change detection (BCD), which relies on traditional matching and similar algorithms: the change map distinguishes changed pixels from unchanged pixels with binary labels. BCD is thus essentially a two-class technique, and its shortcomings are that it cannot determine the extent or type of a change and lacks semantic information. The level of intelligence in urban illegal-building supervision remains low, and merely distinguishing changed from unchanged pixels falls far short of actual needs.
In the prior art, a deep-learning segmentation network is trained on an illegal-building data set in a fully supervised manner, but such training overfits easily, generalizes poorly, requires a large number of labeled training samples, and involves an enormous number of network parameters during classification. Although semi-supervised techniques can effectively alleviate the sample problem, learning performance degrades when too few labeled nodes are available. If a pure semantic segmentation network is used to classify illegal buildings, classification according to the training samples is achievable, but the network can neither distinguish nor learn sample differences, leaving a large gap in comprehensive change recognition and understanding.
Disclosure of Invention
To address the problem that intelligent detection with a model currently has low accuracy due to insufficient training samples, the present disclosure provides a multi-temporal target detection model and a construction method, a device and an application thereof. The scheme uses semi-supervised learning and a dual-encoder, dual-decoder structure to detect illegal buildings, thereby improving detection accuracy.
In a first aspect, the present application provides a method for constructing a multi-temporal target detection model, including:
acquiring at least one group of temporal pictures of at least one location to be detected, each group comprising a first temporal picture and a second temporal picture taken at different time points, and labeling the target to be detected in each group of temporal pictures to obtain training samples;
constructing a multi-temporal target detection model composed of a first encoder, a second encoder, a panoramic decoder and a difference decoder; feeding the first temporal picture of each group into the first encoder to obtain multi-level first coding information with depth increasing from low to high, and feeding the second temporal picture into the second encoder to obtain multi-level second coding information with depth increasing from low to high, the first coding information and the second coding information of each level having the same depth;
feeding the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result;
feeding the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first and second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference-decoding skip information, wherein the difference skip connection splices the difference decoding result with the difference information of the same depth; reducing the depth of the difference-decoding skip information again to obtain a new difference decoding result; repeating the difference skip connection operation over the remaining levels to obtain a final difference decoding result; and inputting the final difference decoding result into the prediction head to obtain a difference classification result map;
and inputting the final panoramic decoding result into a prediction head to obtain a panoramic classification result map, and obtaining the target to be detected according to the panoramic classification result map and the difference classification result map.
In a second aspect, the present disclosure provides a multi-temporal target detection model, which is constructed by using the method of the first aspect.
In a third aspect, the present disclosure provides a multi-temporal target detection method, including:
acquiring at least one group of temporal pictures of a location to be detected, each group comprising a first temporal picture and a second temporal picture taken at different time points;
and feeding the first temporal picture and the second temporal picture of each group into the multi-temporal target detection model to obtain the target to be detected.
In a fourth aspect, the present application provides an illegal building detection method, including:
acquiring a group of temporal pictures of a location to be detected, comprising a first temporal picture and a second temporal picture;
feeding the first temporal picture and the second temporal picture of each group into an illegal building detection model to obtain a panoramic classification result map and a difference classification result map, wherein the illegal building detection model is obtained by training the multi-temporal target detection model using temporal pictures labeled with buildings as training samples, the result corresponding to the panoramic decoder is the panoramic classification result map, the result corresponding to the difference decoder is the difference classification result map, and the acquisition time of the first temporal picture is earlier than that of the second temporal picture;
and combining the classification results of the panoramic classification result map and the difference classification result map to obtain a change result map, and identifying illegal buildings according to the change result map.
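The combination step can be sketched in Python. The AND-style fusion rule and the function name below are illustrative assumptions, since the patent does not specify the exact combination logic:

```python
import numpy as np

def change_result_map(panoramic_cls: np.ndarray, diff_cls: np.ndarray) -> np.ndarray:
    """Combine per-pixel outputs of the two decoders into a change result map.

    Hypothetical fusion rule: a pixel is flagged as a suspected illegal
    building where the difference decoder reports a change (1) AND the
    panoramic decoder classifies the pixel as building (1).
    """
    return ((panoramic_cls == 1) & (diff_cls == 1)).astype(np.uint8)

pan = np.array([[1, 1], [0, 1]])   # panoramic classification (1 = building)
dif = np.array([[1, 0], [1, 1]])   # difference classification (1 = changed)
change = change_result_map(pan, dif)  # 1 only where both decoders agree
```

Any pixel flagged in `change` would then be reviewed as a candidate illegal building.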
In a fifth aspect, the present application provides a multi-temporal object detection model building apparatus, including:
an acquisition module: acquiring at least one group of temporal pictures of at least one location to be detected, each group comprising a first temporal picture and a second temporal picture taken at different time points, and labeling the target to be detected in each group of temporal pictures to obtain training samples;
an encoding module: constructing a multi-temporal target detection model composed of a first encoder, a second encoder, a panoramic decoder and a difference decoder; feeding the first temporal picture of each group into the first encoder to obtain multi-level first coding information with depth increasing from low to high, and feeding the second temporal picture into the second encoder to obtain multi-level second coding information with depth increasing from low to high, the first coding information and the second coding information of each level having the same depth;
a first decoding module: feeding the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result;
a second decoding module: feeding the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first and second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference-decoding skip information, wherein the difference skip connection splices the difference decoding result with the difference information of the same depth; reducing the depth of the difference-decoding skip information again to obtain a new difference decoding result; repeating the difference skip connection operation to obtain a final difference decoding result; and inputting the final difference decoding result into the prediction head to obtain a difference classification result map;
a detection module: inputting the final panoramic decoding result into a prediction head to obtain a panoramic classification result map, and obtaining the target to be detected according to the panoramic classification result map and the difference classification result map.
In a sixth aspect, the present application provides an electronic device comprising a memory in which a computer program is stored and a processor configured to run the computer program to perform the method for constructing a multi-temporal target detection model, the multi-temporal target detection method, or the illegal building detection method.
In a seventh aspect, the present application provides a readable storage medium in which a computer program is stored, the computer program comprising instructions for controlling a process to perform the method for constructing a multi-temporal target detection model, the multi-temporal target detection method, or the illegal building detection method.
Compared with the prior art, the technical scheme has the following characteristics and beneficial effects:
the scheme uses semi-supervised learning, solves the problem of poor accuracy caused by insufficient training samples in the prior art, and ensures normal fusion between feature maps by using perturbation processing and jump connection on the basis of the semi-supervised learning; the scheme divides the illegal building detection problem into two sub-problems, namely a panoramic building classification problem and a difference detection classification problem, through multi-task learning, combines the results of two decoders to output a change result graph in a parameter sharing mode, and classifies the results; the loss of the difference decoder in the scheme is composed of two classification losses and similarity losses, and the classification of the images can be realized while inputting a difference classification result graph; the two encoders of the scheme both adopt variable row convolution, pixel offset information is added in the variable row convolution, and compared with the common convolution, the pixel offset can be learned, so that the network learning change is facilitated; the decoder of the scheme adopts a full convolution form to prevent the high-resolution image from generating artifacts.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for constructing a multi-temporal object detection model according to an embodiment of the present application;
FIG. 2 is a flow chart of encoding when decoding using a panorama decoder in a multi-temporal object detection model according to an embodiment of the present application;
FIG. 3 is a decoding flow diagram of a panorama decoder in a multi-temporal target detection model according to an embodiment of the present application;
FIG. 4 is a flow chart of encoding when decoding using a disparity decoder in a multi-temporal object detection model according to an embodiment of the present application;
FIG. 5 is a decoding flow diagram of a disparity decoder in a multi-temporal target detection model according to an embodiment of the present application;
FIG. 6 is a block diagram of an apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Embodiment 1
The scheme of the application provides a method for constructing a multi-temporal target detection model, and with reference to fig. 1, the method comprises the following steps:
acquiring at least one group of temporal pictures of at least one location to be detected, each group comprising a first temporal picture and a second temporal picture taken at different time points, and labeling the target to be detected in each group of temporal pictures to obtain training samples;
constructing a multi-temporal target detection model composed of a first encoder, a second encoder, a panoramic decoder and a difference decoder; feeding the first temporal picture of each group into the first encoder to obtain multi-level first coding information with depth increasing from low to high, and feeding the second temporal picture into the second encoder to obtain multi-level second coding information with depth increasing from low to high, the first coding information and the second coding information of each level having the same depth;
feeding the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result;
feeding the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first and second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference-decoding skip information, wherein the difference skip connection splices the difference decoding result with the difference information of the same depth; reducing the depth of the difference-decoding skip information again to obtain a new difference decoding result; repeating the difference skip connection operation over the remaining levels to obtain a final difference decoding result; and inputting the final difference decoding result into the prediction head to obtain a difference classification result map;
and inputting the final panoramic decoding result into a prediction head to obtain a panoramic classification result map, and obtaining the target to be detected according to the panoramic classification result map and the difference classification result map.
In some embodiments, the first temporal picture and the second temporal picture are two pictures with different shooting times, and both are cropped to the same size to obtain the training samples.
Specifically, OpenCV may be used to uniformly crop the urban-building orthophotos into 512 × 512 pixel tiles.
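A minimal NumPy sketch of this uniform-cropping preprocessing; the function name and the policy of discarding edge remainders smaller than one tile are assumptions (the embodiment uses OpenCV, but plain array slicing shows the same tiling):

```python
import numpy as np

def tile_image(img: np.ndarray, size: int = 512) -> list:
    """Split an orthophoto into non-overlapping size x size crops.

    Edge remainders smaller than `size` are discarded (illustrative
    choice; padding would be an equally valid policy).
    """
    h, w = img.shape[:2]
    crops = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            crops.append(img[y:y + size, x:x + size])
    return crops

crops = tile_image(np.zeros((1024, 1536, 3), dtype=np.uint8))  # 2 x 3 grid of tiles
```

Each crop then serves as one training picture of the pair-aligned temporal data.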
In some specific embodiments of this scheme, an unmanned aerial vehicle is used to capture the first temporal picture and the second temporal picture, with a fixed flight route and shooting location, so that the two pictures differ only in shooting time.
In some embodiments, the application scenario of the scheme is illegal building detection; in this case the target to be detected in the temporal pictures is a building, and the interval between the shooting times of the first temporal picture and the second temporal picture should not be too short. In this scheme the interval may be one, two or three months, although the scheme is not limited thereto.
In some embodiments, the first encoder and the second encoder are identical in structure, and the first coding information and the second coding information are obtained by respectively coding the first temporal picture and the second temporal picture by using the deformable convolutional layer.
Specifically, compared with a standard convolutional layer, the deformable convolutional layer learns an additional pixel offset for each sampling location, so that it better represents the geometric transformations of a picture and better captures the scale changes and complex geometric changes between the first temporal picture and the second temporal picture. The deformable convolutional layer is characterized by the formula:
y(p0) = Σ_{pn ∈ R} w(pn) · x(p0 + pn + Δpn)
wherein Δpn denotes the pixel offset, R is the size and dilation range of the receptive field, p0 denotes the position of the first coding information, pn enumerates the sampling locations within the receptive field range R, w denotes the weight, and x is the input feature map.
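The deformable sampling rule can be illustrated directly in NumPy. This is a single-output-pixel, single-channel sketch with bilinear interpolation for fractional offsets, not a full deformable convolution layer; all names are illustrative:

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample x at fractional location (py, px);
    out-of-range taps contribute zero (zero padding)."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val

def deform_conv_point(x, w, p0, offsets):
    """y(p0) = sum over pn in R of w(pn) * x(p0 + pn + delta_pn)."""
    k = w.shape[0]
    r = k // 2
    y = 0.0
    for i in range(k):
        for j in range(k):
            pn = (i - r, j - r)        # regular grid location pn in R
            dy, dx = offsets[i, j]     # learned offset delta_pn
            y += w[i, j] * bilinear(x, p0[0] + pn[0] + dy, p0[1] + pn[1] + dx)
    return y

x = np.arange(25, dtype=float).reshape(5, 5)
# with zero offsets this reduces to a plain 3x3 weighted sum around (2, 2)
val = deform_conv_point(x, np.ones((3, 3)), (2, 2), np.zeros((3, 3, 2)))
```

With nonzero `offsets` the taps drift off the regular grid, which is what lets the layer track geometric change between the two temporal pictures. In practice a library implementation such as `torchvision.ops.deform_conv2d` would be used.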
In some embodiments, the first encoder and the second encoder encode in a weight sharing manner.
Specifically, weight sharing means that the first encoder and the second encoder update their respective network weights synchronously during back-propagation; encoding with shared weights reduces model computation while keeping the associated feature information consistent.
In some embodiments, the first encoded information and the second encoded information of the same level are perturbed.
Specifically, the perturbation processing randomly flips each pair of coding information horizontally and rotates both members by the same number of degrees; illustratively, the pair of coding information of the first level is rotated ninety degrees clockwise, the pair of the second level is rotated a further ninety degrees relative to the first, and so on.
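A sketch of this level-wise perturbation; in training the flip would be applied at random, but it is unconditional here for clarity, and the function name is an assumption:

```python
import numpy as np

def perturb_pair(fm_a: np.ndarray, fm_b: np.ndarray, level: int):
    """Apply the same perturbation to both coding feature maps of a level:
    a horizontal flip, then `level` cumulative 90-degree clockwise
    rotations (level 1 -> 90 degrees, level 2 -> 180 degrees, ...)."""
    def t(fm):
        fm = fm[:, ::-1]               # horizontal flip
        return np.rot90(fm, k=-level)  # clockwise 90-degree rotations
    return t(fm_a), t(fm_b)

a, b = perturb_pair(np.array([[1, 2], [3, 4]]),
                    np.array([[1, 2], [3, 4]]), level=1)
```

Because both members of a pair receive the identical transform, their spatial correspondence is preserved for the later difference computation.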
Specifically, the purpose of the perturbation processing is to make the detection result of the trained model more accurate.
Specifically, the panoramic decoder learns classification by back-propagating a classification loss and an IoU loss to update the weights, which gives the first encoder and the second encoder better encoding effects; the first coding information and the second coding information are represented in the form of feature maps.
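The classification-plus-IoU loss combination can be sketched with common stand-in definitions (binary cross-entropy and soft IoU); the patent does not give the exact loss formulas, so these are assumptions:

```python
import numpy as np

def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy classification loss over a probability map."""
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def iou_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Soft IoU loss: 1 minus intersection-over-union of the soft masks."""
    inter = float(np.sum(pred * target))
    union = float(np.sum(pred) + np.sum(target) - inter)
    return 1.0 - (inter + eps) / (union + eps)

pred = np.array([[0.9, 0.1], [0.8, 0.2]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
total = bce_loss(pred, target) + iou_loss(pred, target)  # back-propagated jointly
```

Both terms would be summed and back-propagated to update the decoder and, through shared gradients, the two encoders.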
Specifically, the first encoder and the second encoder adopt a semantic segmentation approach, in which different classes in the first temporal picture and the second temporal picture are segmented with different colors; adding the panoramic decoder during training improves the segmentation effect of the two encoders, so that better results are obtained when the difference decoder is subsequently applied.
Illustratively, as shown in fig. 2, the first encoder and the second encoder encode the first temporal picture and the second temporal picture respectively, obtaining 4 pairs of coding information across 4 levels; the coding information is represented as feature maps, and the depths of the 4 levels are 64, 128, 256 and 512, respectively.
In some embodiments, in the step of feeding the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result, the first and second coding information of the last level undergo a splicing-convolution operation to obtain a panoramic decoding result with the same depth as the first coding information of the previous level; the panoramic decoding result undergoes a panoramic skip connection to obtain panoramic-decoding skip information, the panoramic skip connection splicing the panoramic decoding result with the first and second coding information of the same depth; the skip information, after its depth is reduced again, serves as the new panoramic decoding result; and the panoramic skip connection operation is repeated over the remaining levels to obtain the final panoramic decoding result.
For example, as shown in fig. 3, the pair of coding information with depth 512 at the last level is spliced, and a 1 × 1 convolution is applied to the splicing result to obtain a first convolution result with depth 256; the pair of coding information with depth 256 and the first convolution result are joined by a first skip connection to obtain first panoramic skip information; a 1 × 1 convolution is applied to the first panoramic skip information to obtain a second convolution result with depth 128; the pair of coding information with depth 128 and the second convolution result are joined by a second skip connection to obtain second panoramic skip information; a 1 × 1 convolution is applied to the result of the second skip connection to obtain a third convolution result with depth 64; and the pair of coding information with depth 64 and the third convolution result are joined by a third skip connection to obtain the final panoramic decoding result, which is input to the FCN-Head prediction head to obtain the panoramic classification result map.
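The depth bookkeeping of this decoding path can be sketched as follows. The trained 1 × 1 convolution is replaced by a fixed random channel projection and spatial upsampling between levels is omitted, so only the splice/reduce pattern is shown; all names are illustrative:

```python
import numpy as np

def conv1x1(x: np.ndarray, out_ch: int, seed: int = 0) -> np.ndarray:
    """Stand-in for a trained 1x1 convolution: a fixed random projection
    over the channel axis (shape bookkeeping only, no learned weights)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[0], out_ch)) / np.sqrt(x.shape[0])
    return np.einsum('chw,co->ohw', x, w)

def panorama_decode(pairs):
    """Walk the depth sequence of fig. 3: splice the deepest pair, then at
    each shallower level reduce depth with a 1x1 convolution and apply the
    panoramic skip connection (splice with both coding maps of that depth)."""
    a, b = pairs[0]                               # deepest level, depth 512
    x = np.concatenate([a, b], axis=0)            # splice the deepest pair
    for a, b in pairs[1:]:
        x = conv1x1(x, a.shape[0])                # reduce to the next depth
        x = np.concatenate([a, b, x], axis=0)     # panoramic skip connection
    return x

pairs = [(np.ones((d, 4, 4)), np.ones((d, 4, 4))) for d in (512, 256, 128, 64)]
out = panorama_decode(pairs)  # 64 + 64 + 64 channels at the shallowest level
```

The resulting tensor would then be fed to the FCN-Head prediction head for per-pixel classification.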
Specifically, the skip connections supplement the panorama decoding result with less abstract, more local coding information, so that boundaries can be segmented more accurately when the picture is segmented and an accurate classification prediction is generated.
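The splice-convolve-skip flow above can be sketched with plain numpy. This is a shape-level illustration only: the weights are random placeholders, spatial upsampling between levels is omitted (all levels share one toy size), and the helper names are illustrative rather than taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, c_out):
    # A 1x1 convolution is a per-pixel linear map over channels
    # (random placeholder weights for this sketch).
    w = rng.standard_normal((c_out, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

def pano_skip(dec, fm_a, fm_b):
    # Panorama skip connection: splice the decoding result with the
    # same-depth first and second coding information along channels.
    return np.concatenate([dec, fm_a, fm_b], axis=0)

H = W = 8  # toy spatial size; real levels would differ and need upsampling
enc1 = {d: rng.standard_normal((d, H, W)) for d in (64, 128, 256, 512)}  # first encoder
enc2 = {d: rng.standard_normal((d, H, W)) for d in (64, 128, 256, 512)}  # second encoder

x = np.concatenate([enc1[512], enc2[512]], axis=0)    # splice the deepest pair -> 1024 channels
x = conv1x1(x, 256)                                   # first convolution result, depth 256
x = conv1x1(pano_skip(x, enc1[256], enc2[256]), 128)  # first skip connection, reduce to 128
x = conv1x1(pano_skip(x, enc1[128], enc2[128]), 64)   # second skip connection, reduce to 64
final = pano_skip(x, enc1[64], enc2[64])              # third skip -> final panorama decoding result
print(final.shape)  # (192, 8, 8)
```

The channel depths (1024 spliced, then 256, 128, 64) follow the fig. 3 walk-through; the FCN-Head prediction itself is not sketched.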
In some embodiments, the difference information is obtained by subtracting the second coding information corresponding to the second temporal picture from the first coding information corresponding to the first temporal picture, where the shooting time point of the first temporal picture is earlier than that of the second temporal picture.
Illustratively, as shown in fig. 4, the first encoder and the second encoder encode the first temporal picture and the second temporal picture respectively to obtain 4 pairs of encoded information of 4 levels, the encoded information is represented in the form of a feature map, where a pair of encoded information of the first level is FM1 and FM1', a pair of encoded information of the second level is FM2 and FM2', a pair of encoded information of the third level is FM3 and FM3', and a pair of encoded information of the fourth level is FM4 and FM4', respectively, where the temporal states of FM1, FM2, FM3, and FM4 are earlier than the temporal states of FM1', FM2', FM3', and FM4'.
For example, as shown in fig. 5, FM5 is obtained by splicing FM4 and FM4'; FM5 is subjected to a 1x1 convolution and spliced with FM3-FM3' to obtain FM6; FM6 is subjected to a 1x1 convolution and spliced with FM2-FM2' to obtain FM7; FM7 is subjected to a 1x1 convolution and spliced with FM1-FM1' to obtain the final difference decoding result, which is input to the FCN-Head predictor to obtain a difference classification result map.
In particular, the panorama decoder and the difference decoder each adopt a fully convolutional form in order to prevent artifacts from appearing in the high-resolution image.
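Under the same toy setup, the difference-decoding path of fig. 5 can be sketched as follows; the subtraction realizes the difference information (first temporal minus second temporal), and the weights, sizes, and names are again placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(x, c_out):
    # Per-pixel channel mixing with random placeholder weights.
    w = rng.standard_normal((c_out, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

H = W = 8  # toy spatial size; upsampling between levels omitted
depths = (64, 128, 256, 512)
fm  = {d: rng.standard_normal((d, H, W)) for d in depths}  # FM1..FM4  (first temporal)
fmp = {d: rng.standard_normal((d, H, W)) for d in depths}  # FM1'..FM4' (second temporal)

x = np.concatenate([fm[512], fmp[512]], axis=0)  # FM5: splice the deepest pair
for d in (256, 128, 64):
    diff = fm[d] - fmp[d]                              # difference information at this level
    x = np.concatenate([conv1x1(x, d), diff], axis=0)  # 1x1 convolution, then splice with the difference
print(x.shape)  # (128, 8, 8)
```

The loop mirrors FM6 and FM7 and the final splice with FM1-FM1'; the resulting map would then go to the FCN-Head predictor.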
In some embodiments, the multi-temporal target detection model employs semi-supervised learning, i.e., only part of the training samples are labeled; the number of training iterations is 100, the batch size is 16, Adam is used as the optimizer, and the initial learning rate is 10^-3.
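As a sketch, those hyperparameters could be collected in a configuration like the one below; the labeled fraction is not stated in the text and is an assumption.

```python
# Hypothetical training configuration mirroring the stated values.
train_config = {
    "iterations": 100,        # number of training iterations
    "batch_size": 16,
    "optimizer": "Adam",
    "initial_lr": 1e-3,       # initial learning rate 10^-3
    "labeled_fraction": 0.5,  # ASSUMPTION: the text only says "part" of the samples are labeled
}
print(train_config["optimizer"], train_config["initial_lr"])
```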
In some embodiments, the loss function of the multi-temporal target detection model is composed of a similarity loss, a panorama encoder classification loss, a difference encoder loss, and a two-classification loss.
Further, the similarity loss is composed of a preliminary difference loss L_d and a cross entropy loss L_ce. The preliminary difference loss is obtained by applying a second-order norm to each pixel difference in a pair of coding information of the same level; the formula is characterized as follows:

L_d = (w(t) / (B·C)) · Σ_i ||z_i - z'_i||²
where z_i denotes the pixel values of the first temporal picture at the i-th level, z'_i denotes the pixel values of the second temporal picture at the i-th level, w(t) is a weight function whose value grows the longer the training has run, B is the batch input of each training, and C denotes the number of channels.
The cross entropy loss L_ce is obtained by computing the cross entropy between the panoramic result map y and the corresponding labeled training sample ŷ; the formula is characterized as follows:

L_ce = -(1/N) Σ_j [ŷ_j log y_j + (1 - ŷ_j) log(1 - y_j)]
The similarity loss is obtained by a weighted addition of the preliminary difference loss and the cross entropy loss; the formula is characterized as follows:

L_s = L_ce + λ·L_d
where λ is the weight, which can be regarded as a set value; acting as a penalty term, λ prevents overfitting of the model.
Specifically, the similarity loss ensures that the two results are similar, i.e., that the output vectors are close to each other when compared in terms of their output spatial distribution; at the same time, closer edge features can be better learned from the labeled training samples.
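A minimal numpy sketch of the similarity loss as described. The ramp-up shape of w(t) and the value of λ (`lam`) are assumptions: the text only says the weight grows with time and that λ acts as a penalty term.

```python
import numpy as np

def w(t, t_max=100.0):
    # ASSUMPTION: a sigmoid-style ramp-up, so the weight grows with training time t.
    return float(np.exp(-5.0 * (1.0 - min(t / t_max, 1.0)) ** 2))

def preliminary_difference_loss(z, z_prime, t):
    # Second-order (L2) norm of each pixel difference in a pair of
    # same-level encodings, normalised by batch B and channel count C.
    B, C = z.shape[0], z.shape[1]
    return w(t) * float(np.sum((z - z_prime) ** 2)) / (B * C)

def cross_entropy_loss(y_pred, y_label, eps=1e-7):
    # Pixel-wise binary cross entropy between the panoramic result map
    # and the labeled training sample.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_label * np.log(y_pred) + (1 - y_label) * np.log(1 - y_pred)))

def similarity_loss(z, z_prime, y_pred, y_label, t, lam=0.1):
    # Weighted addition of the two terms; lam (the penalty weight) is assumed.
    return cross_entropy_loss(y_pred, y_label) + lam * preliminary_difference_loss(z, z_prime, t)
```

Identical paired encodings drive the difference term to zero, so only the cross entropy term remains.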
Further, the panoramic result map Pc1 is compared with the labeled training sample to obtain the panorama encoder classification loss; the formula is characterized as follows:

L_class1 = -(1/N) Σ_j ŷ_j log(Pc1_j)
where L_class1 represents the panorama encoder classification loss.
Specifically, the panorama decoder classification loss adopts a pixel-by-pixel cross entropy loss, that is, the prediction result is compared with the labeled training data pixel by pixel.
Further, the difference classification result map Pc2 is compared with the labeled training samples to obtain the difference encoder loss; the formula is characterized as follows:

L_class2 = -(1/N) Σ_j ŷ_j log(Pc2_j)
where L_class2 represents the difference classification loss.
Further, the two-classification loss is calculated from the overlapping portions of the same layer in the first encoder and the second encoder; the formula is characterized as follows:

L_Dice = 1 - 2|FM ∩ FM'| / (|FM| + |FM'|)
where L_Dice represents the two-classification loss, FM represents the coding information corresponding to the first temporal picture, and FM' represents the coding information corresponding to the second temporal picture.
Specifically, the two-classification loss treats the comparison as a binary problem; the Dice coefficient loss measures the overlapping portion of two pictures.
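A sketch of a Dice-coefficient loss measuring the overlap of two binary maps; thresholding the feature maps at zero is an assumption made here for illustration.

```python
import numpy as np

def dice_loss(fm, fm_prime, eps=1e-7):
    # Binarise the two same-level maps (threshold assumed), then measure
    # their overlap with the Dice coefficient; loss = 1 - Dice.
    a = (fm > 0).astype(float)
    b = (fm_prime > 0).astype(float)
    intersection = float(np.sum(a * b))
    return 1.0 - (2.0 * intersection + eps) / (float(np.sum(a)) + float(np.sum(b)) + eps)
```

Identical maps give a loss of (almost) zero, while disjoint maps give a loss near one, which is what makes the Dice coefficient a usable overlap measure.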
Further, the similarity loss, the panorama encoder classification loss, the difference encoder loss and the two-classification loss are combined to obtain the total loss function of the violation building model; the formula is characterized as follows:
L = L_s + L_class1 + L_class2 + L_Dice
Example Two
A multi-temporal target detection model is constructed by the method of the first embodiment.
Example Three
A violation building detection model is obtained by training the multi-temporal target detection model of the second embodiment with temporal pictures marked with buildings as training samples.
Example Four
A multi-temporal target detection method comprises the following steps:
acquiring at least one group of temporal pictures of a to-be-detected place, wherein each group of temporal pictures comprises a first temporal picture and a second temporal picture which are taken from different time points;
and sending the first temporal image and the second temporal image of each group of temporal images into the multi-temporal target detection model described in the second embodiment to obtain the target to be detected.
Example Five
A method of violation building detection comprising:
acquiring a group of first temporal pictures and second temporal pictures of a to-be-detected place;
sending the first temporal picture and the second temporal picture of each group of temporal pictures into a violation building detection model to obtain a panoramic classification result map and a difference classification result map, wherein the violation building detection model is obtained by training the multi-temporal target detection model with temporal pictures marked with buildings as training samples, the target to be detected corresponding to the panorama decoder is the panoramic classification result map, the target to be detected corresponding to the difference decoder is the difference classification result map, and the acquisition time of the first temporal picture is earlier than that of the second temporal picture;
and combining the classification result of the panoramic classification result graph and the classification result of the difference classification result graph to obtain a change result graph, and judging the illegal building according to the change result graph.
In some embodiments, the change result map represents the change between two different temporal pictures of the same place; if there is a change, there may be a violation building at the place.
In some embodiments, when determining whether the place contains illegal buildings, the first temporal pictures are used to construct a comparison sample library; the area difference between the difference classification result map and the panoramic classification result map is obtained and compared with a first set threshold; if the result is larger than the first set threshold, the place is considered to contain an illegal building, and the position information of the place is output.
In some embodiments, when judging whether a house has been illegally demolished, if the result is smaller than the first set threshold, the difference classification result is compared with the first temporal picture of the same place in the comparison sample library, and the position information of the place is output.
Specifically, the first set threshold is set manually and is used to determine the change in building area of the same place across different temporal states.
Specifically, after the position information is output, law enforcement personnel can be dispatched to inspect the site.
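The threshold decision described above can be sketched as below; the function name, the pixel-count notion of "area", and the returned strings are all illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def judge_change(diff_map, pano_map, threshold):
    # Area difference between the difference classification result map and
    # the panoramic classification result map (binary maps, area = pixel count).
    area_diff = abs(float(diff_map.sum()) - float(pano_map.sum()))
    if area_diff > threshold:
        # Larger than the first set threshold: possible illegal construction;
        # the position information of the place would be output here.
        return "possible illegal construction"
    # Otherwise compare against the first temporal picture archived in the
    # comparison sample library (illegal-demolition check).
    return "compare with comparison sample library"

diff = np.zeros((4, 4)); diff[0, :] = 1.0  # toy difference classification map
pano = np.zeros((4, 4))                    # toy panoramic classification map
verdict = judge_change(diff, pano, threshold=2)
print(verdict)  # possible illegal construction
```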
Example Six
Based on the same concept, referring to fig. 6, the present application further provides a device for constructing a multi-temporal target detection model, including:
an acquisition module: configured to obtain at least one group of temporal pictures of at least one place to be detected, where each group of temporal pictures comprises a first temporal picture and a second temporal picture taken at different time points, and to mark the target to be detected in each group of temporal pictures to obtain training samples;
and an encoding module: constructing a multi-temporal target detection model, wherein the multi-temporal target detection model consists of a first encoder, a second encoder, a panoramic decoder and a difference decoder, the first temporal picture of each group of temporal pictures is sent to the first encoder to obtain multi-level first coding information with the depth from low to high, the second temporal picture is sent to the second encoder to be coded to obtain multi-level second coding information with the depth from low to high, and the depth of the first coding information and the depth of the second coding information of each level are the same;
a first decoding module: sending the first coding information and the second coding information of each level into the panorama decoder for decoding to obtain a final panorama decoding result;
a second decoding module: sending the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first coding information and the second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference decoding skip information; reducing the depth of the difference decoding skip information again to obtain a new difference decoding result; and traversing the difference skip connection operation to obtain the final difference decoding result, where the difference skip connection operation splices the difference decoding result with the difference information of the same depth, and the final difference decoding result is input to the predictor to obtain a difference classification result map;
a detection module: and respectively inputting the panoramic classification result image and the differential classification result image to a prediction head to obtain a target to be detected.
Example Seven
The present embodiment further provides an electronic apparatus, referring to fig. 7, comprising a memory 404 and a processor 402, wherein the memory 404 stores a computer program, and the processor 402 is configured to run the computer program to perform the steps in any one of the above-described embodiments of the method for constructing a multi-temporal object detection model.
Specifically, the processor 402 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The processor 402 reads and executes the computer program instructions stored in the memory 404 to implement the implementation process of the method for constructing the multi-temporal object detection model in any one of the above embodiments.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmitting device 406 may be used to receive or transmit data via a network. Specific examples of the network may include wired or wireless networks provided by the communication provider of the electronic device. In one example, the transmitting device includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmitting device 406 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.
The input and output devices 408 are used to input or output information. In this embodiment, the input information may be the first temporal picture, the second temporal picture, and the like, and the output information may be a place where a violation building exists, and the like.
Optionally, in this embodiment, the processor 402 may be configured to execute the following steps by a computer program:
s101, obtaining at least one group of temporal pictures of at least one to-be-detected place, wherein each group of temporal pictures comprises a first temporal picture and a second temporal picture which are taken from different time points, and marking a to-be-detected target in each group of temporal pictures to obtain a training sample;
s102, constructing a multi-temporal target detection model, wherein the multi-temporal target detection model is composed of a first encoder, a second encoder, a panoramic decoder and a difference decoder, the first temporal picture of each group of temporal pictures is sent to the first encoder to obtain multi-level first coding information with the depth from low to high, the second temporal picture is sent to the second encoder to be coded to obtain multi-level second coding information with the depth from low to high, and the depth of the first coding information and the depth of the second coding information of each level are the same;
s103, sending the first coding information and the second coding information of each level into the panorama decoder for decoding to obtain a final panorama decoding result;
s104, sending the first coding information and the second coding information of each level into the difference decoder, performing splicing-convolution operation on the first coding information and the second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level, performing difference jump connection on the difference decoding result to obtain difference decoding jump information, reducing the depth of the difference decoding jump information again to obtain a new difference decoding result, traversing the difference jump connection operation to obtain a final difference decoding result, splicing the difference decoding result and the difference information with the same depth by the difference jump connection operation, wherein the difference information is the difference value of the first coding information and the second coding information, and inputting the final difference decoding result into the predictor to obtain a difference classification result graph;
and S105, respectively inputting the panoramic classification result image and the difference classification result image to a prediction head to obtain a target to be detected.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also called program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in figure 7 may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.
Claims (13)
1. A method for constructing a multi-temporal target detection model is characterized by comprising the following steps:
the method comprises the steps of obtaining at least one group of temporal pictures of at least one to-be-detected place, wherein each group of temporal pictures comprises a first temporal picture and a second temporal picture which are taken from different time points, and marking a to-be-detected target in each group of temporal pictures to obtain a training sample;
constructing a multi-temporal target detection model, wherein the multi-temporal target detection model consists of a first encoder, a second encoder, a panoramic decoder and a difference decoder, the first temporal picture of each group of temporal pictures is sent to the first encoder to obtain multi-level first coding information with the depth from low to high, the second temporal picture is sent to the second encoder to be coded to obtain multi-level second coding information with the depth from low to high, and the depth of the first coding information and the depth of the second coding information of each level are the same;
sending the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result;
sending the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first coding information and the second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference decoding skip information; reducing the depth of the difference decoding skip information again to obtain a new difference decoding result; traversing the difference skip connection operation to obtain the final difference decoding result, wherein the difference skip connection operation splices the difference decoding result with the difference information of the same depth; and inputting the final difference decoding result into the predictor to obtain a difference classification result map;
and respectively inputting the panoramic classification result graph and the difference classification result graph to a prediction head to obtain a target to be detected.
2. The method of claim 1, wherein the first encoder and the second encoder have the same structure, and the first temporal picture and the second temporal picture are encoded by using a deformable convolutional layer to obtain first encoding information and second encoding information.
3. The method for constructing the multi-temporal object detection model according to claim 1, wherein the first encoder and the second encoder perform encoding in a weight sharing manner, and perform perturbation processing on the first encoding information and the second encoding information of the same level.
4. The method for constructing the multi-temporal target detection model according to claim 1, wherein in the step of sending the first coding information and the second coding information of each level into the panorama decoder for decoding to obtain the final panorama decoding result, the first coding information and the second coding information of the last level are subjected to a splicing-convolution operation to obtain a panorama decoding result with the same depth as the first coding information of the previous level; the panorama decoding result is subjected to a panorama skip connection to obtain panorama decoding skip information; the depth of the panorama decoding skip information is reduced again and the result is taken as the new panorama decoding result; and the final panorama decoding result is obtained by traversing the panorama skip connection operation, wherein the panorama skip connection operation splices the panorama decoding result with the first coding information and the second coding information of the same depth.
5. The method as claimed in claim 1, wherein the difference information is obtained by subtracting second coding information corresponding to a second temporal state from first coding information corresponding to a first temporal state picture, and a shooting time point of the first temporal state picture is earlier than a shooting time point of the second temporal state picture.
6. A multi-temporal object detection model constructed using the method of any one of claims 1 to 5.
7. A violation building detection model, characterized in that the multi-temporal target detection model of claim 6 is trained by using a temporal picture marked with a building as a training sample.
8. A multi-temporal target detection method is characterized by comprising the following steps:
acquiring at least one group of temporal pictures of a to-be-detected place, wherein each group of temporal pictures comprises a first temporal picture and a second temporal picture which are taken from different time points;
and sending the first temporal image and the second temporal image of each group of temporal images into the multi-temporal target detection model of claim 6 to obtain the target to be detected.
9. A method for detecting illegal buildings is characterized by comprising the following steps:
acquiring a group of first temporal pictures and second temporal pictures of a to-be-detected place;
sending the first temporal picture and the second temporal picture of each group of temporal pictures into the violation building detection model of claim 7 to obtain a panoramic classification result map and a difference classification result map, wherein the target to be detected corresponding to the panorama decoder is the panoramic classification result map, the target to be detected corresponding to the difference decoder is the difference classification result map, and the acquisition time of the first temporal picture is earlier than that of the second temporal picture;
and combining the classification result of the panoramic classification result graph and the classification result of the difference classification result graph to obtain a change result graph, and judging the illegal building according to the change result graph.
10. The illegal building detection method according to claim 9, characterized in that the change result graph represents the change situation of two different temporal pictures at the same place, if there is a change, the area difference between the difference classification result graph and the panoramic classification result graph is obtained, and if the area difference is greater than a first set threshold, it is determined that there is an illegal building at the place.
11. A device for constructing a multi-temporal target detection model is characterized by comprising:
an acquisition module: configured to obtain at least one group of temporal pictures of at least one place to be detected, where each group of temporal pictures comprises a first temporal picture and a second temporal picture taken at different time points, and to mark the target to be detected in each group of temporal pictures to obtain training samples;
the coding module: constructing a multi-temporal target detection model, wherein the multi-temporal target detection model consists of a first encoder, a second encoder, a panoramic decoder and a difference decoder, the first temporal picture of each group of temporal pictures is sent to the first encoder to obtain multi-level first coding information with the depth from low to high, the second temporal picture is sent to the second encoder to be coded to obtain multi-level second coding information with the depth from low to high, and the depth of the first coding information and the depth of the second coding information of each level are the same;
a first decoding module: sending the first coding information and the second coding information of each level into the panoramic decoder for decoding to obtain a final panoramic decoding result;
a second decoding module: sending the first coding information and the second coding information of each level into the difference decoder; performing a splicing-convolution operation on the first coding information and the second coding information of the last level to obtain a difference decoding result with the same depth as the first coding information of the previous level; performing a difference skip connection on the difference decoding result to obtain difference decoding skip information; reducing the depth of the difference decoding skip information again to obtain a new difference decoding result; and traversing the difference skip connection operation to obtain the final difference decoding result, where the difference skip connection operation splices the difference decoding result with the difference information of the same depth, and the final difference decoding result is input to the predictor to obtain a difference classification result map;
a detection module: and respectively inputting the panoramic classification result graph and the difference classification result graph to a prediction head to obtain a target to be detected.
12. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to execute the computer program to perform a method of constructing a multi-temporal object detection model according to any one of claims 1 to 5 or a method of multi-temporal object detection according to claim 8 or a method of violation building detection according to claim 9.
13. A readable storage medium having stored thereon a computer program comprising instructions for controlling a process to perform a method of constructing a multi-temporal object detection model according to any one of claims 1 to 5 or a multi-temporal object detection method according to claim 8 or a violation building detection method according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211504037.9A CN115546652B (en) | 2022-11-29 | 2022-11-29 | Multi-temporal target detection model, and construction method, device and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115546652A true CN115546652A (en) | 2022-12-30 |
CN115546652B CN115546652B (en) | 2023-04-07 |
Family
ID=84722688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211504037.9A Active CN115546652B (en) | 2022-11-29 | 2022-11-29 | Multi-temporal target detection model, and construction method, device and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546652B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1310099B1 (en) * | 2000-08-16 | 2005-11-02 | Dolby Laboratories Licensing Corporation | Modulating one or more parameters of an audio or video perceptual coding system in response to supplemental information |
- 2022-11-29: Application CN202211504037.9A filed in China; granted as CN115546652B (active)
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2001284910A1 (en) * | 2000-08-16 | 2002-05-23 | Dolby Laboratories Licensing Corporation | Modulating one or more parameters of an audio or video perceptual coding system in response to supplemental information |
KR20070074487A (en) * | 2006-01-07 | 2007-07-12 | 한국전자통신연구원 | Method and apparatus for video data encoding and decoding |
AU2013206265A1 (en) * | 2008-07-11 | 2013-07-04 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Providing a time warp activation signal and encoding an audio signal therewith |
CN108696756A (en) * | 2012-09-28 | 2018-10-23 | 夏普株式会社 | Picture coding device |
CN104620575A (en) * | 2012-09-28 | 2015-05-13 | 夏普株式会社 | Image decoding device |
CN104662912A (en) * | 2012-09-28 | 2015-05-27 | 夏普株式会社 | Image decoding device |
US20170126456A1 (en) * | 2015-11-03 | 2017-05-04 | Newracom, Inc. | Apparatus and method for scrambling control field information for wireless communications |
CN109155865A (en) * | 2016-05-24 | 2019-01-04 | 高通股份有限公司 | The first inlet signal in most interested region in the picture transmits |
CN108416059A (en) * | 2018-03-22 | 2018-08-17 | 北京市商汤科技开发有限公司 | Training method and device, equipment, medium, the program of image description model |
US10379995B1 (en) * | 2018-07-06 | 2019-08-13 | Capital One Services, Llc | Systems and methods to identify breaking application program interface changes |
CN111797799A (en) * | 2020-07-13 | 2020-10-20 | 郑州昂达信息科技有限公司 | Subway passenger waiting area planning method based on artificial intelligence |
CN112616014A (en) * | 2020-12-09 | 2021-04-06 | 福州大学 | GAN-based panoramic video adaptive streaming transmission method |
CN112991207A (en) * | 2021-03-11 | 2021-06-18 | 五邑大学 | Panoramic depth estimation method and device, terminal equipment and storage medium |
CN113947524A (en) * | 2021-10-22 | 2022-01-18 | 上海交通大学 | Panoramic picture saliency prediction method and device based on full-convolution graph neural network |
CN114120041A (en) * | 2021-11-29 | 2022-03-01 | 暨南大学 | Small sample classification method based on double-pair anti-variation self-encoder |
CN114298997A (en) * | 2021-12-23 | 2022-04-08 | 北京瑞莱智慧科技有限公司 | Method and device for detecting forged picture and storage medium |
CN114648714A (en) * | 2022-01-25 | 2022-06-21 | 湖南中南智能装备有限公司 | YOLO-based workshop normative behavior monitoring method |
CN115131680A (en) * | 2022-07-05 | 2022-09-30 | 西安电子科技大学 | Remote sensing image water body extraction method based on depth separable convolution and jump connection |
Non-Patent Citations (1)
Title |
---|
DANIELA FERNANDEZ ESPINOSA et al.: "Twitter Users' Privacy Concerns: What do Their Accounts' First Names Tell Us?", *Journal of Data and Information Science* * |
Also Published As
Publication number | Publication date |
---|---|
CN115546652B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114092820B (en) | Target detection method and moving target tracking method applying same | |
Alsabhan et al. | Automatic building extraction on satellite images using Unet and ResNet50 | |
CN113780296B (en) | Remote sensing image semantic segmentation method and system based on multi-scale information fusion | |
US12051233B2 (en) | Method for filtering image feature points and terminal | |
CN115546601B (en) | Multi-target recognition model and construction method, device and application thereof | |
CN113409361B (en) | Multi-target tracking method and device, computer and storage medium | |
CN113989305B (en) | Target semantic segmentation method and street target abnormity detection method applying same | |
CN112712138A (en) | Image processing method, device, equipment and storage medium | |
EP3921744B1 (en) | Systems and methods for image retrieval | |
CN117217368A (en) | Training method, device, equipment, medium and program product of prediction model | |
CN117576569B (en) | Multi-target detection model and method for urban capacity event management | |
CN115546274B (en) | Image depth judgment model and construction method, device and application thereof | |
CN115546652B (en) | Multi-temporal target detection model, and construction method, device and application thereof | |
CN116861262B (en) | Perception model training method and device, electronic equipment and storage medium | |
CN115620242B (en) | Multi-line human target re-identification method, device and application | |
CN115659268A (en) | Scene recognition method based on ADCP flow measurement data and application thereof | |
CN116091964A (en) | High-order video scene analysis method and system | |
CN112001211B (en) | Object detection method, device, equipment and computer readable storage medium | |
CN115880650B (en) | Cross-view vehicle re-identification model, construction method, device and application thereof | |
CN116824277B (en) | Visual target detection model for road disease detection, construction method and application | |
CN113689472A (en) | Moving target detection method, device and application | |
CN115546704B (en) | Vehicle projectile identification method, device and application | |
CN115546472B (en) | Method and device for recognizing weight of road vehicle and application | |
CN115546780B (en) | License plate recognition method, model and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||