CN115131281A - Method, device and equipment for training change detection model and detecting image change

Info

Publication number: CN115131281A
Application number: CN202210348820.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, change detection, feature, group, sample
Inventors: 刘文龙, 刘俊, 高斌斌
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210348820.4A
Publication of CN115131281A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06T 7/001 Industrial image inspection using an image reference approach
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The application discloses a method, a device and equipment for training a change detection model and detecting image change, which can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic, assisted driving, maps and the like. In the training stage, a sample image group of two content-registered images is constructed; single-phase change detection is performed on the first image in one data processing branch, and intra-group two-phase change detection is performed between the two images in the other branch, so that the two-phase detection between the two images enhances the single-phase detection branch. In the application stage, the area in which the target image has changed relative to the standard image can be obtained by running only the single data processing branch for single-phase change detection on the target image, which greatly reduces the amount of data processing and improves the efficiency of image change detection in actual detection scenes.

Description

Method, device and equipment for training change detection model and detecting image change
Technical Field
The application relates to the technical field of computers, in particular to the technical field of artificial intelligence, and provides a method, a device and equipment for training a change detection model and detecting image change.
Background
In industrial manufacturing, introducing assembly-line production greatly improves production efficiency. However, complicated processes inevitably produce product defects. Such defects occur probabilistically and must be detected on the finished product at a later stage, while traditional manual inspection is costly and difficult; for example, when the defect area is small it is easily missed, which affects the yield of the actual production line.
To replace manual visual inspection and improve the efficiency and accuracy of finished-product defect detection, defect localization based on computer vision technology has become a popular research direction. For example, defect localization based on image change detection identifies and segments the defect region in an actual product image, where the defect region is the image area of the actual product image that has changed relative to a defect-free product image.
However, the accuracy of the image change detection model determines the accuracy of the defect localization result; how to improve the accuracy of image change detection is therefore a problem worth considering.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for training a change detection model and detecting image change, which are used for improving the accuracy and efficiency of image change detection.
In one aspect, a method for training a change detection model is provided, where the method includes:
obtaining a plurality of sample image groups, each sample image group comprising a first image, a second image, and a sample label, the sample label indicating the actual image areas in which the first image and the second image have changed relative to a standard image;
performing iterative training on a change detection model to be trained by adopting the plurality of sample image groups to obtain a corresponding target change detection model, wherein each iteration comprises the following steps:
respectively carrying out intra-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image;
respectively carrying out single-phase image change detection on each input first image to obtain a second image area of each first image changed relative to the standard image;
and determining model loss based on the sample label, the first image area and the second image area corresponding to each sample image group, and adjusting parameters based on the model loss.
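As a rough illustration of one training iteration described above, the two detection branches and the joint parameter update could be organized as in the following PyTorch-style sketch; the patent does not prescribe any framework, and all function and method names here are hypothetical stand-ins.

```python
# Hedged sketch of one training iteration with the two branches described above.
# `model.intra_group_detect`, `model.single_phase_detect` and `compute_model_loss`
# are assumed stand-ins for the intra-group change detection branch, the
# single-phase change detection branch, and the joint-optimization loss.
def train_one_iteration(model, optimizer, sample_groups, compute_model_loss):
    first_regions, second_regions, labels = [], [], []
    for first_img, second_img, sample_label in sample_groups:
        # Intra-group change detection: region where the second image changed
        # relative to the first image.
        first_regions.append(model.intra_group_detect(first_img, second_img))
        # Single-phase change detection: region where the first image changed
        # relative to the standard image.
        second_regions.append(model.single_phase_detect(first_img))
        labels.append(sample_label)

    loss = compute_model_loss(labels, first_regions, second_regions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```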
In one aspect, an image change detection method is provided, the method including:
training based on any one of the methods to obtain a target change detection model;
and calling the target change detection model, and carrying out single-phase image change detection on the target image to be detected to obtain a target image area of the target image changed relative to the standard image.
In one aspect, a change detection model training apparatus is provided, the apparatus including:
a sample acquisition unit for obtaining a plurality of sample image groups, each sample image group containing a first image, a second image and a sample label, the sample label indicating the actual image areas in which the first image and the second image have changed relative to a standard image;
the change detection unit is used for carrying out iterative training on a change detection model to be trained by adopting the plurality of sample image groups to obtain a corresponding target change detection model; wherein the change detection unit includes:
the in-group detection subunit is used for respectively carrying out in-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image;
the single-phase detection subunit is used for respectively carrying out single-phase image change detection on each input first image to obtain a second image area of each first image changed relative to the standard image;
and the joint optimization subunit is used for determining model loss based on the sample label, the first image area and the second image area corresponding to each sample image group, and performing parameter adjustment based on the model loss.
Optionally, the change detection unit further includes a feature extraction subunit;
the feature extraction subunit is configured to extract first feature sets of the first images, respectively, where each first feature set includes: the corresponding first images respectively correspond to first image features of a plurality of preset image scales; and respectively extracting second feature sets of the second images, wherein each second feature set comprises: the respective second images respectively correspond to second image features of the plurality of image scales;
the intra-group detection subunit is specifically configured to perform intra-group image change detection on each sample image group based on each obtained first feature set and each obtained second feature set, and obtain the first image regions corresponding to each sample image group.
Optionally, the intra-group detection subunit is specifically configured to:
for each sample image group, the following operations are respectively executed:
respectively performing feature fusion on image features corresponding to the same image scale in a corresponding first feature set and a corresponding second feature set aiming at a sample image group to obtain fused image features corresponding to the multiple image scales;
and carrying out intra-group image change detection on the sample image group based on the obtained fusion image characteristics to obtain the first image region.
Optionally, the intra-group detection subunit is specifically configured to:
for the plurality of image scales, respectively performing the following operations:
aiming at an image scale, carrying out updating processing based on a self-attention mechanism on corresponding first image features in the first feature set to obtain third image features, and carrying out updating processing based on the self-attention mechanism on corresponding second image features in the second feature set to obtain fourth image features;
performing attention-based fusion processing on the fourth image feature based on the third image feature to obtain a fifth image feature, and performing attention-based fusion processing on the third image feature based on the fourth image feature to obtain a sixth image feature;
and performing feature fusion based on the fifth image feature and the sixth image feature to obtain a fusion image feature of the image scale.
Optionally, the intra-group detection subunit is specifically configured to:
processing the first image feature through a self-attention network of the change detection model to obtain first attention weights of elements in the first image feature, wherein each first attention weight is used for representing the association degree between the corresponding element and other elements;
and performing weighting processing on the first image features through the self-attention network based on the obtained first attention weights to obtain the third image features.
Optionally, the intra-group detection subunit is specifically configured to:
processing the third image feature and the fourth image feature through an attention network of the change detection model to obtain second attention weights of elements in the fourth image feature, wherein each second attention weight is used for representing the degree of association between a corresponding element of the third image feature and the fourth image feature;
and performing weighting processing on the third image features through the attention network based on the obtained second attention weights to obtain fifth image features.
Optionally, the intra-group detection subunit is specifically configured to:
obtaining the fused image feature based on a difference between corresponding elements in the fifth image feature and the sixth image feature.
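A compact sketch of the self-attention update, attention-based cross fusion, and element-wise difference described in the optional implementations above might look as follows; this is an assumed PyTorch-style arrangement (for example the use of nn.MultiheadAttention), not the patent's exact network design.

```python
import torch
import torch.nn as nn

class ChangeFusionBlock(nn.Module):
    """Fuses same-scale features of the first and second image into one fused feature."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat1: torch.Tensor, feat2: torch.Tensor) -> torch.Tensor:
        # feat1, feat2: (B, N, C) flattened spatial features of one image scale.
        third, _ = self.self_attn(feat1, feat1, feat1)    # self-attention on first-image feature
        fourth, _ = self.self_attn(feat2, feat2, feat2)   # self-attention on second-image feature
        # Cross attention: fuse one image's feature guided by the other's.
        fifth, _ = self.cross_attn(third, fourth, fourth)
        sixth, _ = self.cross_attn(fourth, third, third)
        # Fused feature from the difference between corresponding elements.
        return fifth - sixth
```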
Optionally, the intra-group detection subunit is specifically configured to:
from the largest image scale, sequentially carrying out down-sampling processing on the fusion image characteristics of all image scales, and carrying out characteristic combination processing on the obtained down-sampled image characteristics and the fusion image characteristics of the next-stage image scale until the smallest image scale is reached to obtain combined image characteristics;
performing up-sampling processing on the merged image features for multiple times until target image features with the same image scale as the original image are obtained;
determining the first image region based on the target image feature.
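The merge-then-upsample flow described above could be sketched roughly as follows; the interpolation-based down/up-sampling and the omitted final projection are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def change_perception_head(fused_feats, image_size):
    """fused_feats: fused features ordered from the largest to the smallest image scale."""
    merged = fused_feats[0]
    # From the largest scale, repeatedly downsample and combine with the next-level feature.
    for nxt in fused_feats[1:]:
        down = F.interpolate(merged, size=nxt.shape[-2:], mode="bilinear", align_corners=False)
        merged = torch.cat([down, nxt], dim=1)  # feature combination at the next scale
    # Upsample the merged feature back to the original image scale.
    target = F.interpolate(merged, size=image_size, mode="bilinear", align_corners=False)
    # A projection layer (omitted here) would map `target` to the first image region.
    return target
```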
Optionally, the single-phase detection subunit is specifically configured to:
for each first image, the following steps are respectively executed:
carrying out scale conversion processing on the first image features of all image scales aiming at one first image to obtain a plurality of seventh image features with the same image scale;
performing feature fusion on the obtained plurality of seventh image features to obtain eighth image features;
performing upsampling processing on the basis of the eighth image feature to obtain a ninth image feature which has the same image scale as the first image;
and determining the probability that each pixel point in the first image belongs to the change point based on the ninth image characteristic, and obtaining the second image area based on each obtained probability.
Optionally, the single-phase detection subunit is specifically configured to:
respectively carrying out at least one time of up-sampling processing on the first image features of all image scales to obtain a plurality of seventh image features with the same image scale;
starting from the minimum image scale, the upsampling times of all the image scales are sequentially decreased.
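A hedged sketch of the single-phase detection steps above (scale conversion to a common scale, feature fusion, upsampling to the input resolution, and per-pixel change probabilities); the concrete layers are illustrative assumptions rather than the patent's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegHead(nn.Module):
    def __init__(self, in_channels, mid_channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels), mid_channels, kernel_size=1)  # feature fusion
        self.classify = nn.Conv2d(mid_channels, 1, kernel_size=1)             # change / no change

    def forward(self, first_feats, image_size):
        # first_feats: first-image features, assumed ordered from the largest
        # to the smallest image scale; smaller scales are upsampled more.
        base = first_feats[0].shape[-2:]
        seventh = [F.interpolate(f, size=base, mode="bilinear", align_corners=False)
                   for f in first_feats]                                       # scale conversion
        eighth = self.fuse(torch.cat(seventh, dim=1))                          # eighth image feature
        ninth = F.interpolate(eighth, size=image_size, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.classify(ninth))                             # per-pixel change probability
```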
Optionally, the joint optimization subunit is specifically configured to:
determining an intra-group change detection loss corresponding to the change detection model based on each first image region and the corresponding sample label;
determining single-phase change detection loss corresponding to the change detection model based on each second image area and the corresponding sample label;
determining the model loss based on the intra-group change detection loss, the single-phase change detection loss, and the respective weights.
In one aspect, an image change detection apparatus is provided, the apparatus including:
the image input unit is used for acquiring a target image to be detected and inputting the target image into a target change detection model obtained based on any one of the methods;
and the image change detection unit is used for carrying out single-phase image change detection on the target image to be detected through the target change detection model to obtain a target image area of the target image changed relative to the standard image.
Optionally, the target image is a product image of a defect to be detected, and the standard image is a defect-free image of a product identical to the target image;
the image change detection unit is further configured to determine the target image area as a defective area.
In one aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program.
In one aspect, a computer storage medium is provided having computer program instructions stored thereon that, when executed by a processor, implement the steps of any of the above-described methods.
In one aspect, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of any of the methods described above.
In the embodiment of the application, in the training phase of the change detection model, a sample image group containing a content-registered first image and second image is constructed; on one data processing branch, single-phase change detection is performed on the first image, and on the other data processing branch, intra-group two-phase change detection is performed on the first image and the second image, so that the parameters of the change detection model can be adjusted based on the processing results of the two data processing branches together with the sample label. In this way, change detection between the two content-registered images is used to tune the whole model, so that the two-phase change detection between the two images enhances the single-phase change detection and the accuracy of the single-phase change detection branch is improved. Meanwhile, in the application stage, only the single data processing branch for single-phase change detection needs to be run on the target image to obtain the area in which the target image has changed relative to the standard image, which greatly reduces the amount of data processing and improves the efficiency of image change detection in actual detection scenes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, it is obvious that the drawings in the following description are only the embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of image change detection provided in an embodiment of the present application;
fig. 2 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a change detection model provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating a method for training a change detection model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a set of constructed sample images provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a variation fusion module according to an embodiment of the present disclosure;
fig. 7 is a schematic processing flow diagram of a self-interacting module according to an embodiment of the present disclosure;
fig. 8 is a schematic processing flow diagram of a cross-interaction submodule according to an embodiment of the present application;
FIG. 9 is a schematic processing flow diagram of a change sensing head according to an embodiment of the present application;
fig. 10 is a schematic diagram illustrating an architecture of a single-phase image change detection network according to an embodiment of the present application;
fig. 11 is a schematic flowchart of an image change detection method according to an embodiment of the present application;
fig. 12 is a schematic view illustrating a defect detection process of an accessory of a camera bracket according to an embodiment of the present application;
FIG. 13 is a comparison of a normal fitting and a defective fitting provided by an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a change detection model training apparatus according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of an image change detection apparatus according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of an electronic device to which an embodiment of the present application is applied;
fig. 17 is a schematic structural diagram of another electronic device to which the embodiment of the present application is applied.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
For the convenience of understanding the technical solutions provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained first:
Image change detection (change detection): the task of detecting and segmenting the image changes that exist between two images, such as semantic changes between two registered pictures taken at different times. Referring to fig. 1, a schematic diagram of image change detection is shown. As can be seen from fig. 1, after comparing images a and b, it can be perceived that image b has an added circle and rectangle compared with image a, namely the content in image c; image change detection thus simulates human visual perception to achieve the effect of distinguishing the change in the image.
Defect detection: detecting whether an industrial finished product has a defect and locating that defect. It is an application of image change detection in an actual scene and is essentially the detection of the change between a finished-product image and a defect-free product image.
Attention mechanism: a method of weighting intermediate network features with high-level information, so that the network focuses on the parts of an image that assist the judgment and ignores irrelevant information. The attention mechanism is in essence the human visual attention mechanism: when perceiving a scene, people generally do not look at it from beginning to end, but look at specific parts as needed, and once they find that something they want to observe often appears in a certain part, they learn to pay attention to that part when similar scenes reappear. The attention mechanism is thus a means of screening high-value information out of a large amount of information in which different pieces have different importance to the result; that importance can be reflected by assigning weights of different sizes. In other words, the attention mechanism can be understood as a rule for assigning weights when synthesizing multiple sources.
Self-attention mechanism: a variant of the attention mechanism with the same underlying principle; the difference is that the attention mechanism focuses on the correlation between different images, while the self-attention mechanism focuses more on the internal relations within one image.
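The patent describes these mechanisms only conceptually; as a point of reference, the widely used scaled dot-product formulation of attention can be written as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where Q, K and V are query, key and value features and $d_k$ is the key dimension. In self-attention all three come from the same feature map, while in cross-attention the query comes from one image's features and the keys and values from the other's.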
The technical scheme of the embodiment of the application relates to artificial intelligence and machine learning technology. Artificial Intelligence (AI) is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, involving both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic, and the like.
Computer Vision (CV) technology is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to identify and detect targets and perform further image processing, so that the processed result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and formula learning.
With the research and progress of artificial intelligence technology, artificial intelligence has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, internet of vehicles, smart traffic and the like.
The scheme provided by the embodiment of the application relates to artificial intelligence computer vision technology and machine learning technology, and the target change detection model provided by the embodiment of the application is applied to image change detection (relative to a standard image or between two images). The training and use of the change detection model can be divided into two parts: a training part and an application part. In the training part, the change detection model is trained through machine learning, and the model parameters are continuously adjusted through an optimization algorithm until the model converges. This includes inputting sample image groups into the change detection model to be trained, performing intra-group two-phase change detection on the first image and the second image of each image group to obtain a corresponding first image area, performing single-phase change detection on each first image to obtain a corresponding second image area, and adjusting the model parameters according to the first image area and the second image area of each sample image group and the sample label, and so on. In the application part, the single-phase change detection branch of the target change detection model obtained by training is used to perform single-phase change detection on an input target image to obtain the area in which the target image has changed relative to the standard image, and so on. In addition, it should be noted that, in the embodiment of the present application, the artificial neural network model may be trained online or offline, which is not specifically limited here; offline training is taken as an example.
The following briefly introduces the design concept of the embodiments of the present application:
the defect positioning method based on image change detection can replace an artificial vision observation method, and the efficiency of defect positioning is improved. For example, in the related art, a target detection or semantic segmentation method is usually adopted for positioning, and since most defects are irregular, most semantic segmentation methods are used. For example, a change detection method using a Hierarchical Paired Channel Fusion Network (HPCFNet) is used, in which a multilayer image semantic Feature is extracted, a multilayer Paired Channel Fusion (PCF) module is used to perform Channel-level Feature Fusion, and meanwhile, a multilayer-Part Feature Learning (MPFL) module is used to integrate features from the whole to the Part to adapt to the scale and position diversity of a scene change area, and finally a change segmentation map is output.
However, the design of the PCF module and the MPFL module is too complex, and a large number of parallel hole convolution and cross feature stacking operations are used, so that the method needs to consume a large amount of computing resources, is low in computing efficiency, and cannot meet the requirements in an actual detection scene.
In view of this, the present application provides a method, an apparatus, and a device for training a change detection model and detecting image change. When model training is performed with the change detection model training method, a sample image group containing a content-registered first image and second image is constructed; single-phase change detection is performed on the first image on one data processing branch, and intra-group two-phase change detection is performed on the first image and the second image on the other branch, so that the parameters of the change detection model can be adjusted based on the processing results of the two data processing branches together with the sample label. In this way, change detection between the two content-registered images is used to tune the whole model, so that the two-phase change detection between the two images enhances the single-phase change detection and the accuracy of the single-phase change detection branch is improved. Meanwhile, when the image change detection method is applied to change detection, the area in which the target image has changed relative to the standard image can be obtained by running only the single data processing branch for single-phase change detection on the target image, which greatly reduces the amount of data processing and improves the efficiency of image change detection in actual detection scenes.
In addition, in the embodiment of the application, when intra-group two-phase change detection is performed, self-interaction fusion based on the self-attention mechanism is performed on the first image and the second image respectively, and cross-interaction fusion based on the attention mechanism is performed between the first image and the second image. This increases sample diversity and the interaction between samples, so the subsequent change perception works better, the trained target change detection model achieves higher detection precision, and the model is more robust to existing inherent changes.
Some brief descriptions are given below to application scenarios to which the technical solution of the embodiment of the present application can be applied, and it should be noted that the application scenarios described below are only used for describing the embodiment of the present application and are not limited. In a specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
The scheme provided by the embodiment of the application can be applied to most image change detection scenes, such as product defect localization, street scene change detection (SSCD), remote sensing image change detection and the like, which are not illustrated one by one here.
As shown in fig. 2, an application scenario provided in the embodiment of the present application is schematically illustrated, and in the scenario, an image capturing device 201, an image change detecting device 202, and a terminal device 203 may be included.
The image capturing apparatus 201 is an electronic apparatus with an image capturing function, such as a camera or a video camera. The image change detection device 202 is an electronic device for detecting image changes based on the method of the embodiment of the present application, and may be any electronic device with sufficient computing capability, for example, a mobile phone, a tablet computer (PAD), a laptop computer, a desktop computer, an intelligent home appliance, an intelligent vehicle-mounted device, an aircraft, an intelligent wearable device, and the like. It may also be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms, but is not limited thereto. The terminal device 203 is used for displaying and reminding the image change detection result.
In practical applications, the image collecting device 201 may send the collected images to the image change detecting device 202, and then the image change detecting device 202 may construct a sample image group based on the images, and input the constructed sample image group into a change detecting model to be trained for training, and during training, perform single-phase change detection on a first image of each sample image group by using the change detecting model, and perform intra-group biphase change detection on the first image and a second image in each sample image group, so that the change detecting model can be parameter-adjusted based on a single-phase change detection result, a biphase change detection result, and a sample label to obtain a target change detecting model. In addition, the image change detection device 202 may also perform image change detection on an image subsequently acquired by the image acquisition device 201 by using the target change detection model, and send a detection result to the terminal device 203, and the terminal device 203 may display the image change detection result, or send a prompt message to the terminal device 203 when the image change detection result is abnormal.
In a possible implementation manner, the method of the embodiment of the present application may be applied to a product defect localization scenario. Specifically, the image capturing device 201 may be disposed on a product production line, for example after each process is finished, to capture the product image produced by that process. Further, the image change detection device 202 may train a target change detection model based on the captured product images using the above-mentioned process, use the target change detection model to perform change detection on a product image collected by the image capturing device 201 relative to a defect-free product image to determine whether the corresponding product has defects and locate the defective area, and transmit the corresponding detection result to the terminal device 203. This replaces manual visual inspection, saves labor, and improves detection efficiency and accuracy.
In a possible implementation manner, the method of the embodiment of the present application may also be applied to a street scene change detection scenario. Specifically, the image capturing device 201 may be disposed on each street to be detected to capture street view images. Further, the image change detection device 202 may train a target change detection model based on the captured street view images using the above-mentioned process, use the target change detection model to perform change detection on a street view image collected by the image capturing device 201 relative to a set reference street view image (such as a street view image of the same position N days earlier) to determine the change of the street, and transmit the corresponding detection result to the terminal device 203 to assist the relevant units in street management and rectification. This avoids manual patrol, saves manpower, and improves detection efficiency and accuracy.
In the embodiment of the present application, the above-mentioned devices may be directly or indirectly connected through one or more networks 204. The network 204 may be a wired network or a Wireless network, for example, the Wireless network may be a mobile cellular network, or may be a Wireless-Fidelity (WIFI) network, or may also be other possible networks, which is not limited in this embodiment of the present invention.
It should be noted that fig. 2 is only an example, and the number of each device is not limited in practice, and is not specifically limited in the embodiment of the present application. In addition, in an actual scenario, the model training phase and the model application phase may be executed by different devices, for example, the model training phase may be executed by a server with sufficient computing power, and after the model training is completed, the relevant content of the target change detection model obtained by training may be stored on the device that performs change detection, so that the efficiency of the model training phase may be improved, and the computing resources may not be wasted in the application phase.
The method provided by the exemplary embodiment of the present application is described below with reference to the accompanying drawings in conjunction with the application scenarios described above, it should be noted that the above application scenarios are only shown for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in any way in this respect.
Referring to fig. 3, a schematic structural diagram of a change detection model provided in the embodiment of the present application is shown. The model comprises a feature extraction network, a single-phase change detection network, an intra-group change detection network and a joint optimization network. Next, a change detection model training method provided in the embodiment of the present application will be described with reference to the model structure shown in fig. 3, and referring to fig. 4, a flow diagram of the change detection model training method provided in the embodiment of the present application is shown, and a specific implementation flow of the method is as follows:
step 401: obtaining a plurality of sets of sample images, each set of sample images comprising a first image, a second image, and a sample label for indicating: the first image and the second image are actual image areas that are changed from the standard image.
In the embodiment of the application, different image samples can be collected aiming at a specific task target so as to be more suitable for the scene of the task target. For example, for a product defect positioning scene, a product image can be acquired for a specific product, and the model is trained by using the product image, so that the trained model can have higher detection accuracy for the product.
The essence of the change detection task is to detect semantic changes existing between two content-registered pictures, where content registration refers to consistency of main content in two images, and specifically, when product defect location is performed, content registration may refer to that both the two image contents are identical or similar product contents, for example, both product images of the same product, or, when street view change detection is performed, content registration may refer to that both the two image contents are identical or similar street view images, for example, both street view images of the same intersection.
In one possible implementation, an Image registration (Image registration) technique may be employed to construct the set of sample images. The image registration is a process of matching different images obtained under different imaging conditions in the same region through matching specific point pairs, so that when two images with similar contents but different shooting conditions (such as different angles) are obtained, an image registration technology can be adopted for processing, and then the two images are used as a sample image group.
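Where image registration is needed to build such a group, the patent does not prescribe a particular algorithm; a common feature-based approach matches keypoints and estimates a homography, for example with OpenCV, as in the illustrative sketch below.

```python
import cv2
import numpy as np

def register_to_reference(moving_gray: np.ndarray, reference_gray: np.ndarray) -> np.ndarray:
    """Warp `moving_gray` onto `reference_gray` by matching ORB keypoints."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(moving_gray, None)
    kp2, des2 = orb.detectAndCompute(reference_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference_gray.shape[:2]
    return cv2.warpPerspective(moving_gray, H, (w, h))
```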
In a possible implementation manner, considering that image registration consumes more resources, and that an image sequence of the same product can be acquired on a product production line, for example, the embodiment of the present application constructs sample image groups from image sequences taken at the same point at different times, in order to reduce the workload and resource consumption of image registration.
Illustratively, denote the image sequence taken at the same point as $(x_0, x_1, \ldots, x_{n-1})$. A sample image group $(x^a, x^b)$ can then be constructed, where the first image, denoted $x^a$, may be any image in $(x_0, x_1, \ldots, x_{n-1})$, and the second image, denoted $x^b$, is obtained by a permutation of the sequence, i.e. it may be any image in the sequence other than $x^a$, or an image obtained by superimposing $x^a$ with any other image.
Referring to fig. 5, a schematic diagram for constructing a sample image group is provided in an embodiment of the present application. Taking the shift-combination construction as an example, $x^a$ is each defective product image in the captured image sequence; for example, the first product image has a notch and a crack defect at a corner, and the second product image has a notch on the left. $x^b$ is then the image following $x^a$ in the image sequence, so the $x^b$ corresponding to the first product image is essentially the second product image. Of course, other possible combination constructions, such as random selection, may be adopted in the actual operation process.
The training method provided by the embodiment of the application adopts supervised training, so each image needs to be labeled. Specifically, to reduce the workload of manual labeling, the embodiment of the present application labels each image by comparing it with the standard image and marking the differences between them. The standard image is an image that can serve as a reference in a change detection scene; for example, in a defect localization scene, the standard image can be a defect-free image of the product. An exclusive-or (XOR) operation is then performed between each image and the standard image, i.e. the pixel values at corresponding positions in the two images are XORed: when the two values are the same the XOR result is 0, and when they are different the result is 1. In this way the differences from the standard image can be marked rapidly, improving the efficiency of the training stage.
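A minimal sketch of this XOR-based labeling, assuming binarized single-channel images; the function and variable names are illustrative rather than taken from the patent.

```python
import numpy as np

def make_change_label(image: np.ndarray, standard: np.ndarray) -> np.ndarray:
    """Pixel-wise XOR between a sample image and the standard image.

    Pixels that differ from the standard image are marked 1 (changed),
    identical pixels are marked 0 (unchanged).
    """
    assert image.shape == standard.shape
    return np.logical_xor(image.astype(bool), standard.astype(bool)).astype(np.uint8)

# The intra-group label between a first and second image can be derived the same
# way, e.g. by XOR-ing their individual change maps:
# y_change = np.logical_xor(y_a, y_b).astype(np.uint8)
```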
In the embodiment of the present application, each sample image group may adopt the following two composition modes:
(1) In the first composition mode, the actual image areas in which $x^a$ and $x^b$ have each changed relative to the standard image are taken as the sample label. For example, when a sample image group consists of the first product image and the second product image in the $x^a$ column shown in fig. 5, the sample label can be formed by combining the change maps corresponding to the two product images, that is, the first change map and the second change map in the label column of fig. 5 are combined to form the sample label.
(2) The second composition mode adopts a labeling scheme corresponding to the structure of the change detection model provided in the embodiment of the present application. The change detection model includes two branches, whose outputs are, respectively, the first image area in which the second image has changed relative to the first image in the same group and the second image area in which the first image has changed relative to the standard image. A corresponding labeling scheme can therefore be adopted: the sample label of a sample image group can include an annotated actual change map of the first image relative to the standard image, indicating the actual image area in which the first image has changed relative to the standard image, and an actual change map between the first image and the second image, indicating the actual image area in which the second image has changed relative to the first image. The actual change map between the first image and the second image can be obtained by performing an exclusive-or operation on the first image and the second image, or by performing an exclusive-or operation on the change maps of the first image and the second image relative to the standard image.
Referring to fig. 5, which illustrates the second composition mode, each row may represent a sample image group, including the first image $x^a$, the second image $x^b$, the change map $y^a$ of the first image relative to the standard image, and the result $y^a \oplus y^b$ of the XOR operation between the change maps of the first image and the second image relative to the standard image, where $y^b$ denotes the change map of the second image $x^b$ relative to the standard image.
In the above-described images, the first image or the second image may be a standard image, for example, a normal defect-free product image.
In the embodiment of the application, after the obtained multiple sample image groups are constructed, the multiple sample image groups can be used for carrying out multiple iterative training on the change detection model to be trained so as to obtain the target change detection model. Since each iteration is similar, the embodiments of the present application are mainly described by taking an iteration as an example. With continued reference to fig. 4, one iteration includes the following steps:
step 402: and performing intra-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image.
As shown in fig. 3, the change detection model provided in the embodiment of the present application includes two branches. One branch is implemented as the single-phase change detection network branch, used to predict the change of the first image relative to the standard image; since the input of this branch is only the first image and the task amounts to change-map segmentation of a single image, it may also be referred to as the semantic segmentation network branch. The other branch is the intra-group change detection network branch, used to predict the change between the two images in the same sample image group, which belongs to two-phase change detection.
In each iteration process, the input of the current iteration may be selected from a plurality of constructed sample image groups, for example, a random selection manner may be adopted, or batch division may be performed in advance for a plurality of sample image groups, and one batch of sample image groups is input each time. Since the model processing procedure is similar for each sample image group, the following description focuses on the processing procedure of one sample image group.
Referring to fig. 3, when a sample image group $(x^a, x^b)$ is input, the intra-group change detection network performs intra-group image change detection on it and obtains the first image area in which the second image in the group has changed relative to the first image, denoted $\hat{y}^{change}$.
In one possible embodiment, the first image region may be represented in the form of a variation diagram, for example, see the variation diagram as shown in fig. 5.
Step 403: and respectively carrying out single-phase image change detection on each input first image to obtain a second image area of each first image changed relative to the standard image.
Referring to fig. 3, when a sample image group $(x^a, x^b)$ is input, the single-phase change detection network performs change detection on the first image in the group. This can also be understood as semantic segmentation of the first image, segmenting the area of the first image that may differ from the standard image, and obtains the second image area in which the first image has changed relative to the standard image, denoted $\hat{y}^{seg}$.
In a possible embodiment, the second image region can also be represented in the form of a variation diagram, for example, see the variation diagram shown in fig. 5.
Step 404: and determining model loss based on the sample label corresponding to each sample image group, the first image area and the second image area.
In the embodiment of the present application, the model loss may include two parts, one part is a prediction loss of a single-phase change detection network branch included in the change detection model, which is referred to as a single-phase change detection loss, and since the single-phase change detection network is mainly used to perform image change detection in the application stage in the embodiment of the present application, the single-phase change detection network branch may be considered as a main branch of the model; the other part is the predicted loss of the intra-group change detection network branch included by the change detection model, which is called intra-group change detection loss, and the intra-group change detection network branch is used as an auxiliary branch relative to the single-phase change detection network branch. Further, the model loss L can be expressed as:
$$L = L_{seg} + \lambda L_{change}$$
where $L_{seg}$ is the single-phase change detection loss, $L_{change}$ is the intra-group change detection loss, and $\lambda$ is a hyper-parameter balancing the two losses; its value may be set according to the importance of the two losses, according to the magnitudes of the outputs of the two branches, or according to an empirical value.
Wherein intra-group variation detection loss may be determined based on each first image region and the corresponding sample label.
When the sample label includes the actual change maps $y^a$ and $y^b$ of the first image $x^a$ and the second image $x^b$ in the group relative to the standard image, the actual change map between the first image and the second image can be obtained as $y^{change} = y^a \oplus y^b$ after the XOR operation, and is then compared with the first image region $\hat{y}^{change}$ predicted by the intra-group change detection network to determine the intra-group change detection loss.
When the sample label includes the actual change map $y^a$ of the first image $x^a$ in the group relative to the standard image and the actual change map $y^{change}$ between the first image and the second image, the first image region $\hat{y}^{change}$ predicted by the intra-group change detection network is compared with the actual change map $y^{change}$ to determine the intra-group change detection loss.
In one possible embodiment, the intra-group change detection loss $L_{change}$ can be computed with a pixel-wise cross-entropy loss function, expressed as follows:
$$L_{change} = -\frac{1}{HW}\sum_{i=1}^{H \times W}\Big[y_i^{change}\log \hat{y}_i^{change} + \big(1 - y_i^{change}\big)\log\big(1 - \hat{y}_i^{change}\big)\Big]$$
where $\hat{y}^{change}$ is the output of the intra-group change detection network branch, $y^{change}$ is the corresponding intra-group change label, and H and W denote the height and width of the first image (or the second image), respectively.
In this embodiment, the single-phase change detection loss may be determined based on each second image region and the corresponding sample label.
In one possible implementation, the single-phase change detection loss may be calculated using a cross-entropy loss function, a DICE loss function, and possibly other loss functions. Taking the combination of the cross entropy loss function and the DICE loss function as an example, the single-phase change detection loss can be expressed as follows:
L_seg = L_dice + L_lce

where L_dice is the DICE loss and L_lce is the cross entropy loss:

L_{dice} = 1 - \frac{2\sum_{i} y_i \hat{y}_i}{\sum_{i} y_i + \sum_{i} \hat{y}_i}

L_{lce} = -\frac{1}{HW}\sum_{i=1}^{HW}\left[ y_i\log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right]

where ŷ_i is the output of the single-phase change detection network branch at pixel i, and y_i is the corresponding label of the first image in the sample label.
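For concreteness, one possible (non-limiting) way of computing the combined loss is sketched below in Python/PyTorch; the function names, the reduction over pixels and the smoothing constant in the DICE term are illustrative assumptions rather than details fixed by this application:

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """DICE loss between a predicted probability map and a binary target map."""
    pred, target = pred.flatten(1), target.flatten(1)          # (N, H*W)
    inter = (pred * target).sum(dim=1)
    return (1 - (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)).mean()

def model_loss(seg_logits, seg_target, change_logits, change_target, lam: float = 1.0):
    """L = L_seg + lambda * L_change, with L_seg = L_dice + L_lce."""
    # seg_target / change_target are per-pixel class-index tensors (long), shape (N, H, W).
    seg_prob = seg_logits.softmax(dim=1)[:, 1]                 # probability of the "changed" class
    l_lce = F.cross_entropy(seg_logits, seg_target)            # pixel-wise cross entropy
    l_dice = dice_loss(seg_prob, seg_target.float())
    l_change = F.cross_entropy(change_logits, change_target)   # intra-group change detection loss
    return l_dice + l_lce + lam * l_change
```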
Step 405: it is determined whether the change detection model satisfies a convergence condition.
In the embodiment of the present application, the model loss is obtained based on the process described above, and it may be determined whether the change detection model satisfies the convergence condition based on the model loss.
Specifically, the convergence condition may include any one of the following conditions:
(1) the model loss is not greater than a preset loss value threshold.
(2) The number of iterations reaches a preset iteration threshold.
Step 406: if the result of step 305 is negative, then the change detection model is parametrically adjusted based on the model loss.
It should be noted that the above-mentioned change detection model refers to a change detection model participating in the current iteration, and when the change detection model does not satisfy the convergence condition, the parameter of the change detection model is adjusted according to the model loss, and the adjusted change detection model is used to enter the next iteration process, that is, the process jumps to step 402.
In one possible embodiment, an optimization algorithm such as a gradient descent (gradient) method may be used for parameter adjustment.
If the change detection model meets the convergence condition, the training process is ended, and the model parameters of the current change detection model are stored to obtain the target change detection model.
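A minimal training-loop sketch corresponding to steps 402 to 406 is given below, reusing the model_loss sketch above; the optimizer, the thresholds, and the assumptions that the data loader yields (first image, second image, label dictionary) tuples and that the model returns both branch outputs are illustrative only:

```python
import torch

def train(model, data_loader, max_iters: int = 10000, loss_threshold: float = 1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)       # gradient-descent style optimizer
    for it, (img_a, img_b, labels) in enumerate(data_loader):
        change_pred, seg_pred = model(img_a, img_b)                # intra-group and single-phase outputs
        loss = model_loss(seg_pred, labels["seg"], change_pred, labels["change"])
        # Convergence condition: loss not greater than the threshold, or iteration budget reached.
        if loss.item() <= loss_threshold or it + 1 >= max_iters:
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```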
In the embodiment of the present application, referring to fig. 3, the change detection model further includes a feature extraction network, which is used to perform feature extraction on the first image and the second image, and the extracted image features may be used in a subsequent single-phase change detection network and an intra-group change detection network.
In a possible implementation manner, the feature extraction network may adopt a twin network structure. Referring to fig. 3, the feature extraction network may include two identical Backbones, where one Backbone is used to perform feature extraction on the first image of each input sample image group to obtain the first feature set of each first image, and each first feature set includes: first image features of the corresponding first image at a plurality of preset image scales; the other Backbone is used to perform feature extraction on the second image of each input sample image group to obtain the second feature set of each second image, and each second feature set includes: second image features of the corresponding second image at the plurality of image scales. Then, the single-phase change detection network branch may be composed of a Backbone and a SegHead, and the intra-group change detection network branch may be composed of a Backbone, a ChangeFusion module and a CPHead.
The Backbone may adopt a Convolutional Neural Network (CNN), a Visual Geometry Group (VGG) network, an Inception network, or a residual (ResNet) network, which is not limited in the embodiment of the present application.
Taking a given sample image group (I_A, I_B) as an example, each image has a dimension of R^(3×H×W), where 3 denotes the number of image channels and H and W characterize the size of the image. Feature extraction by the CNN network with the twin structure yields the following initial image features:

first image: {F_A^1, F_A^2, F_A^3, F_A^4, F_A^5}

second image: {F_B^1, F_B^2, F_B^3, F_B^4, F_B^5}

where the superscript corresponds to the plurality of image scales. Taking 5 image scales as an example, initial image features at 5 image scales can be extracted, such as the 5 image scales 1, 1/4, 1/8, 1/16 and 1/32, where 1/32 indicates that the resulting initial image feature is 1/32 the size of the original image; the same applies to the features of the second image. The terms first image feature and second image feature are used only to distinguish the initial image features of the first image and the second image, and do not impose any limitation on feature levels.
In practical application, considering that the features extracted at certain image scales carry a small amount of information but require a large amount of calculation, or contribute little to the change detection task mentioned in the embodiment of the present application, the features of those image scales may be screened out, and the remaining features used in the subsequent processes. For example, since the low-level features F_A^1 and F_B^1 have weak semantic information and a large calculation amount, the remaining features of the 4 scales, {F_A^2, F_A^3, F_A^4, F_A^5} and {F_B^2, F_B^3, F_B^4, F_B^5}, are used for subsequent algorithmic processing.
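A sketch of such a twin (weight-sharing) feature extraction network is given below; the choice of torchvision's ResNet-18 and of which stages to retain is an assumption made only for illustration:

```python
import torch
import torchvision

class SiameseBackbone(torch.nn.Module):
    """One shared Backbone applied to both images of a sample image group."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Stages producing features at 1/4, 1/8, 1/16 and 1/32 of the input size;
        # the shallowest (high-resolution) level is dropped, as described above.
        self.stem = torch.nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.stages = torch.nn.ModuleList([resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4])

    def extract(self, x):
        feats = []
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats        # [F^2 (1/4), F^3 (1/8), F^4 (1/16), F^5 (1/32)]

    def forward(self, img_a, img_b):
        return self.extract(img_a), self.extract(img_b)
```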
Referring to fig. 3, the intra-group image change detection network provided in the embodiment of the present application includes a change fusion module (change fusion) and a change perception head (change perception head). Since the processing procedure for each sample image group is similar, one sample image group (I_A, I_B) is taken here as an example: based on the first feature set {F_A^i} and the second feature set {F_B^i} of the sample image group obtained by the above extraction, intra-group image change detection is performed on the sample image group to obtain the corresponding first image area.
(I) Change fusion module
In the embodiment of the application, the change fusion module is an important module of the change detection model, and is used for exploring global context information and providing interaction between the two spatio-temporal features. The input of the module is the feature pair (F_A^i, F_B^i) at a single image scale, taken from the first feature set {F_A^i} and the second feature set {F_B^i} extracted by the feature extraction network; the module performs feature fusion on the image features corresponding to the same image scale in the first feature set and the second feature set, so as to obtain the fused image features {D^i} corresponding to the plurality of image scales respectively.
Fig. 6 is a schematic structural diagram of the change fusion module provided in the embodiment of the present application. The module comprises three parts: a self-interaction (self) submodule, a cross-interaction (cross) submodule and a fusion submodule.
(1) Self-interaction submodule
The self-interaction submodule acts on a single image feature of the same image and is used for perceiving the association relationship between pixels of the image feature, that is, self-interaction is performed on the image feature of a certain image scale output by the feature extraction network to obtain the corresponding spatio-temporal feature.
For each first image feature in the first feature set, the self-interaction submodule performs update processing based on the self-attention mechanism to obtain the third image feature corresponding to that first image feature. For example, inputting F_A^2 from {F_A^i}, the self-interaction submodule can obtain the corresponding third image feature S_A^2; inputting F_A^3, it can obtain the corresponding third image feature S_A^3; and so on.
For each second image feature in the second feature set, the self-interaction submodule performs update processing based on the self-attention mechanism to obtain the fourth image feature corresponding to that second image feature. For example, inputting F_B^2 from {F_B^i}, the self-interaction submodule can obtain the corresponding fourth image feature S_B^2; inputting F_B^3, it can obtain the corresponding fourth image feature S_B^3; and so on.
Since the processing procedure for each first image feature or second image feature is similar, the first image feature F_A^i (with i taking a value of 2 to 5) is taken here as an example to introduce the processing flow of the self-interaction submodule. The self-interaction submodule may be implemented with a self-attention network: the first image feature F_A^i is processed by the self-attention network to obtain the first attention weights of the elements in the first image feature, where each first attention weight is used to represent the degree of association between the corresponding element and the other elements; the first image feature is then weighted by the self-attention network based on the obtained first attention weights to obtain the third image feature.
Referring to fig. 7, a schematic processing flow diagram of a self-interacting module provided in the embodiment of the present application is shown.
S71: initialization assignment.
In the embodiment of the application, the first image feature F_A^i is used to assign the weight vector matrices of the self-attention network. One possible way is to use the first image feature F_A^i to respectively initialize the query (query) matrix Q_A^i, the key (key) matrix K_A^i and the value (value) matrix V_A^i, namely in the following manner:

Q_A^i = K_A^i = V_A^i = F_A^i
in a possible implementation manner, in order to improve the expression capability of the features, the input first image features can be further subjected to
Figure BDA0003578381880000236
And carrying out position coding.
In particular, can be
Figure BDA0003578381880000237
Plus a learning parameter p (W) of fixed dimension p ) Rho is a position encoder, W p For corresponding parameters, thereby being based on position-coded
Figure BDA0003578381880000238
To initialize the above respective matrices, i.e.:
Figure BDA0003578381880000239
Figure BDA00035783818800002310
wherein, W p The training device can be used for training following a training process, and can be used for representing the scale of the current image or representing the position or the size of each element of the current image characteristic in the original image. Learning parameter corresponding to each image featureρ(W p ) Corresponding to the image dimensions thereof, in accordance with the size of the image features, e.g. for
Figure BDA00035783818800002311
When the corresponding image scale is 1/4 of the original image, the corresponding learning parameter ρ (W) is obtained p ) Is also 1/4 of the original image.
In one embodiment, position coding may be implemented by superimposing (e.g. adding or multiplying) the corresponding pixels of F_A^i and ρ(W_p), or by weighted summation of each element in F_A^i with the corresponding value of ρ(W_p); of course, other possible ways may also be adopted, which is not limited in this application.
S72: convolution processing.
In the embodiment of the application, the initialized query matrix Q_A^i, key matrix K_A^i and value matrix V_A^i are each processed by a 1x1 convolution to obtain the convolved query matrix \tilde{Q}_A^i, key matrix \tilde{K}_A^i and value matrix \tilde{V}_A^i:

\tilde{Q}_A^i = W_q^i * Q_A^i

\tilde{K}_A^i = W_k^i * K_A^i

\tilde{V}_A^i = W_v^i * V_A^i

where W_q^i, W_k^i and W_v^i are the 1x1 convolution parameters for the ith image scale.
S73: an attention map is calculated.
In the embodiment of the application, the attention map is obtained based on the similarity between the query matrix \tilde{Q}_A^i and the key matrix \tilde{K}_A^i. Since both the query matrix and the key matrix are assigned from the first image feature, the attention map here is essentially a representation of the intrinsic relevance of the first image feature itself.

In one possible implementation, the similarity may be measured in an affinity manner, so that the calculation of the attention map M_A^i may be expressed as follows:

M_A^i = \mathrm{softmax}\!\left(\frac{\tilde{Q}_A^i (\tilde{K}_A^i)^{T}}{\sqrt{d}}\right)

where d represents the dimension along which the query matrix \tilde{Q}_A^i and the key matrix \tilde{K}_A^i are multiplied, and the softmax function acts on the channel dimension. The number of elements in M_A^i is equal to that of the value matrix \tilde{V}_A^i, and M_A^i represents the first attention weight of each element of the value matrix \tilde{V}_A^i.
S74: weighting.
The value matrix \tilde{V}_A^i is weighted based on the attention map M_A^i to obtain the final third image feature S_A^i.

In one possible implementation, the weighted fusion may be performed in an aggregation (Aggregation) manner, expressed as follows:

S_A^i = M_A^i \cdot \tilde{V}_A^i

Similarly, the fourth image feature S_B^i corresponding to the second image can be obtained in the same manner.
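As a purely illustrative, non-limiting sketch (not the implementation claimed in this application), steps S71 to S74 may be approximated as follows, assuming a single learnable per-channel position term, spatial positions flattened into a sequence, and scaling by the square root of the channel dimension:

```python
import torch

class SelfInteraction(torch.nn.Module):
    """Self-attention over one single-scale feature map F_A^i (or F_B^i)."""
    def __init__(self, channels: int):
        super().__init__()
        self.pos = torch.nn.Parameter(torch.zeros(1, channels, 1, 1))   # simplified position term rho(W_p)
        self.q = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.k = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.v = torch.nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        x = feat + self.pos                             # S71: initialization with position coding
        q = self.q(x).flatten(2).transpose(1, 2)        # S72: 1x1 convolutions -> (B, HW, C)
        k = self.k(x).flatten(2)                        # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # S73: attention map softmax(QK^T / sqrt(d))
        out = attn @ v                                  # S74: aggregation over the value matrix
        return out.transpose(1, 2).reshape(b, c, h, w)
```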
(2) Cross-interaction submodule
The cross-interaction submodule is used to promote information interaction between the two spatio-temporal features S_A^i and S_B^i obtained by the self-interaction submodule, for example merging the fourth image feature of the second image into the third image feature corresponding to the first image, or merging the third image feature corresponding to the first image into the fourth image feature of the second image. Its input is a feature pair of the same image scale in the sample image group, namely (S_A^i, S_B^i), such as (S_A^2, S_B^2), (S_A^3, S_B^3), and so on.
for the first image, the cross-interaction sub-module can perform fusion processing based on an attention mechanism on the fourth image feature based on the third image feature to obtain a fifth image feature. For example, input
Figure BDA0003578381880000251
A fifth image feature at the image scale may be obtained by the cross-interaction sub-module
Figure BDA0003578381880000252
Input device
Figure BDA0003578381880000253
A fifth image feature at the image scale may be obtained by the cross-interaction sub-module
Figure BDA0003578381880000254
And so on.
And for the second image, performing attention-based fusion processing on the third image feature by the cross-interaction sub-module based on the fourth image feature to obtain a sixth image feature. For example, input
Figure BDA0003578381880000255
A sixth image feature at the image scale may be obtained by the cross-interaction sub-module
Figure BDA0003578381880000256
Input device
Figure BDA0003578381880000257
A sixth image feature at the image scale may be obtained by the cross-interaction sub-module
Figure BDA0003578381880000258
And so on.
Since the cross-interaction processing for the first image and the second image is similar, the third image feature S_A^i and the fourth image feature S_B^i are taken here as an example to introduce the processing flow of the cross-interaction submodule. The cross-interaction submodule may be implemented with an attention network: the third image feature and the fourth image feature are processed by the attention network to obtain the second attention weights of the elements in the fourth image feature, where each second attention weight is used to represent the degree of association between the third image feature and the corresponding element of the fourth image feature; based on the obtained second attention weights, weighting processing is then performed by the attention network to obtain the fifth image feature.
Referring to fig. 8, a schematic processing flow diagram of a cross-interaction submodule provided in the embodiment of the present application is shown.
S81: initialization assignment.
In the embodiment of the present application, the initialization is similar to that of the self-interaction submodule, but the inputs are different. Taking the third image feature S_A^i and the fourth image feature S_B^i as inputs, the initialization is specifically performed in the following manner:

Q_{AB}^i = S_A^i,\quad K_{AB}^i = V_{AB}^i = S_B^i

that is, the query matrix Q_AB^i is initialized with S_A^i, and the key matrix K_AB^i and the value matrix V_AB^i are initialized with S_B^i.

Similarly to the self-interaction submodule, in order to improve the expression capability of the features, position coding may be performed on the third image feature S_A^i and the fourth image feature S_B^i respectively before the matrices are initialized, which is not described in detail here.
S82: convolution processing.
In the embodiment of the application, the initialized query matrix Q_AB^i, key matrix K_AB^i and value matrix V_AB^i are each processed by a 1x1 convolution to obtain the convolved query matrix \tilde{Q}_{AB}^i, key matrix \tilde{K}_{AB}^i and value matrix \tilde{V}_{AB}^i, namely:

\tilde{Q}_{AB}^i = U_q^i * Q_{AB}^i

\tilde{K}_{AB}^i = U_k^i * K_{AB}^i

\tilde{V}_{AB}^i = U_v^i * V_{AB}^i

where U_q^i, U_k^i and U_v^i are the 1x1 convolution parameters of the cross-interaction submodule for the ith image scale.
S83: an attention map is calculated.
In the embodiment of the application, the attention map is obtained based on the similarity between the query matrix \tilde{Q}_{AB}^i and the key matrix \tilde{K}_{AB}^i. Since the query matrix and the key matrix are assigned from the third image feature and the fourth image feature respectively, the attention map here is essentially a representation of the correlation between the third image feature and the fourth image feature.

In one possible implementation, the similarity may be measured in an affinity manner, so that the calculation of the attention map M_{AB}^i may be expressed as follows:

M_{AB}^i = \mathrm{softmax}\!\left(\frac{\tilde{Q}_{AB}^i (\tilde{K}_{AB}^i)^{T}}{\sqrt{d}}\right)

where the number of elements in M_{AB}^i is equal to that of the value matrix \tilde{V}_{AB}^i, and M_{AB}^i represents the second attention weight of each element of the value matrix \tilde{V}_{AB}^i.
S84: weighting.
The value matrix \tilde{V}_{AB}^i is weighted based on the attention map M_{AB}^i to obtain the final fifth image feature C_A^i.

In one possible implementation, the weighted fusion may be performed in an aggregation (Aggregation) manner, expressed as follows:

C_A^i = M_{AB}^i \cdot \tilde{V}_{AB}^i

Similarly, the sixth image feature C_B^i corresponding to the second image can be obtained in the same manner, with the roles of the third image feature and the fourth image feature exchanged.
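The cross-interaction step differs from the self-interaction sketch above only in where the query, key and value are taken from; a minimal illustrative variant (again an assumption-level sketch, not the claimed implementation) is:

```python
import torch

class CrossInteraction(torch.nn.Module):
    """Cross-attention: query from one image's feature, key/value from the other image's feature."""
    def __init__(self, channels: int):
        super().__init__()
        self.q = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.k = torch.nn.Conv2d(channels, channels, kernel_size=1)
        self.v = torch.nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
        b, c, h, w = s_a.shape
        q = self.q(s_a).flatten(2).transpose(1, 2)     # query initialized from the third image feature
        k = self.k(s_b).flatten(2)                     # key initialized from the fourth image feature
        v = self.v(s_b).flatten(2).transpose(1, 2)     # value initialized from the fourth image feature
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out   # fifth image feature C_A^i; swap the inputs to obtain the sixth image feature C_B^i
```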
(3) Fusion submodule
Referring to fig. 6, after passing through the self-interaction submodule and the cross-interaction submodule, a pair of cross-interaction features C_A^i and C_B^i is obtained. Each of these features fuses the information of its own image as well as the information of the other image in the same image group, so the feature expression is more accurate. It is therefore important to adopt an effective fusion strategy, and the fusion strategy needs to meet three requirements: first, it should intuitively reflect the semantic change; second, it should be symmetric; third, its computational complexity should be as low as possible. After comprehensive consideration, the semantic change may be measured by the element-wise absolute difference of the two cross-interaction features, that is, the final fused image feature is obtained from the difference between the corresponding elements of the fifth image feature C_A^i and the sixth image feature C_B^i, which satisfies all of the above requirements.

In one possible embodiment, referring to fig. 6, the fused image feature D^i may be obtained using the following formula:

D^i = \left| C_A^i - C_B^i \right|
Furthermore, through the above process, the fused image features D^i corresponding to the respective image scales can be obtained for each sample image group. As in the above example, by applying the above process to the 4 scale features {F_A^2, F_A^3, F_A^4, F_A^5} and {F_B^2, F_B^3, F_B^4, F_B^5}, the multi-layer fused features {D^2, D^3, D^4, D^5} can be obtained.
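Combining the three submodules over the retained scales, the fusion step reduces to an element-wise absolute difference. A sketch building on the two modules above (and assuming, for simplicity, that all scales share one channel dimension) could be:

```python
import torch

def change_fusion(self_attn, cross_attn, feats_a, feats_b):
    """Produce fused features D^i = |C_A^i - C_B^i| for each retained image scale."""
    fused = []
    for f_a, f_b in zip(feats_a, feats_b):
        s_a, s_b = self_attn(f_a), self_attn(f_b)                # self-interaction
        c_a, c_b = cross_attn(s_a, s_b), cross_attn(s_b, s_a)    # cross-interaction in both directions
        fused.append(torch.abs(c_a - c_b))                       # fusion submodule
    return fused
```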
(II) Change perception head
The change perception head is used for performing intra-group image change detection on each sample image group based on the obtained fused image features to obtain the intra-group image change detection result. From the perspective of machine learning, the change detection task can be regarded as a two-class semantic segmentation problem, so the change perception head can be designed by using a semantic segmentation method; in consideration of effectiveness and efficiency, the embodiment of the application provides a simple change perception head with a feature pyramid structure.
Fig. 9 is a schematic view of the processing flow of the change perception head. The change perception head comprises a bottom-up processing procedure and a top-down processing procedure.
(1) Bottom-up process
In the processing process, from the largest image scale, namely the shallowest feature, the fused image features of all image scales are subjected to down-sampling processing in sequence, and the obtained down-sampled image features and the fused image features of the next-level image scale are subjected to feature merging processing until the smallest image scale, so that the merged image features are obtained.
Taking the multi-layer fused features above as an example, starting from the shallowest feature D^2, downsampling is performed using a convolution with stride 2 and a kernel size of 3x3 to obtain an image feature of the same size as D^3, which is then merged with D^3; after merging, the downsampling and merging operations are continued for D^4 and D^5 in the same way. Following this operation, the merged image feature corresponding to the smallest image scale is finally obtained.
(2) Top-down process
In the processing process, in order to obtain a high-resolution feature map, the combined image features are subjected to up-sampling processing for multiple times by adopting a bottom-up reverse operation until target image features with the same image scale as that of the original image are obtained.
Similarly, starting from the deepest merged feature, each upsampling process includes a 1x1 convolution, a rectified linear unit (ReLU) activation, a Batch Normalization (BN) process and a 2-fold (2x) upsampling operation, which upsamples and updates the feature of the current scale. This operation is repeated, and the target image feature at the 1/4 scale is finally obtained.
For the target image feature, a 1x1 convolution and a 4x upsampling operation are used to obtain the final pixel-level output, which has the same size as the original image and indicates which regions belong to the changed regions. For example, each element in the output result indicates the probability that the pixel at that position has changed between the first image and the second image, from which the first image region between the first image and the second image, i.e. the changed image region, can be determined.
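A feature-pyramid-style change perception head along the lines of the bottom-up and top-down passes above might be sketched as follows; merging by addition, a shared channel width for all fused features, and the exact placement of BatchNorm are assumptions made for illustration:

```python
import torch
from torch import nn

class ChangePerceptionHead(nn.Module):
    """Bottom-up merging of the fused features, then top-down upsampling to a pixel-level change map."""
    def __init__(self, channels: int, num_levels: int = 4):
        super().__init__()
        self.down = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(num_levels - 1)])
        self.up = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(), nn.BatchNorm2d(channels))
             for _ in range(num_levels - 1)])
        self.classifier = nn.Conv2d(channels, 2, 1)              # changed / unchanged

    def forward(self, fused):                                    # fused: largest scale first (e.g. D^2 .. D^5)
        x = fused[0]
        for conv, nxt in zip(self.down, fused[1:]):              # bottom-up: downsample then merge
            x = conv(x) + nxt
        for block in self.up:                                    # top-down: 1x1 conv + ReLU + BN + 2x upsample
            x = nn.functional.interpolate(block(x), scale_factor=2, mode="bilinear", align_corners=False)
        logits = self.classifier(x)                              # 1x1 convolution at 1/4 scale
        return nn.functional.interpolate(logits, scale_factor=4, mode="bilinear", align_corners=False)
```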
In the embodiment of the application, the single-phase image change detection network can be implemented by adopting a network structure based on semantic segmentation, such as U-Net, PSPNet or DeepLab v3. Fig. 10 is a schematic diagram of the single-phase image change detection network provided in the embodiment of the present application. Unlike a U-Net style network, which uses a symmetric decoder to mirror the bottom-up path and then outputs the final segmentation, an asymmetric decoder is used here to combine information from the contrast features at all levels, where the top-down path has only one block at each stage and uses a shared channel dimension, making it suitable for encoding sufficiently rich multi-scale semantic information to predict an accurate pixel-level segmentation map based on the characteristics of the asymmetric decoder.
In the embodiment of the application, for one first image, the first image features of each image scale are subjected to scale conversion processing, and a plurality of seventh image features with the same image scale are obtained.
The scale conversion processing mode can be performed by adopting an upsampling mode, and the upsampling times are different according to different image scales, so that the upsampling processing is performed at least once on the first image features of the image scales respectively, and a plurality of seventh image features with the same image scale are obtained.
Referring to fig. 10, the number of upsampling processes for each image scale is sequentially decreased from the smallest image scale. Taking the image scales of the above 5 layers as an example, the processing procedure of each image scale is respectively as follows:
(1) when the image scale is 1/32, 3 upsampling processes are required to obtain a seventh image feature of size 1/4. Each upsampling process may include processes such as 3 × 3 convolution processes, group normalization (groupNorm), ReLU, and 2 × bilinear upsampling.
(2) When the image scale is 1/16, 2 upsampling processes are required to obtain a seventh image feature of size 1/4.
(3) When the image scale is 1/8, 1 upsampling process is required to obtain a seventh image feature with a size of 1/4.
(4) When the image scale is 1/4, the upsampling process need not include 2 × bilinear upsampling, but only convolution operation, normalization operation, and the like.
In the embodiment of the present application, after obtaining a plurality of seventh image features with the same image scale for a first image, feature fusion is performed on the obtained plurality of seventh image features to obtain an eighth image feature, and upsampling is performed on the obtained eighth image feature to obtain a ninth image feature with the same image scale as that of the first image, where upsampling is similar to the above, and the difference is that the sampling multiple is 4 ×. And then, based on the obtained ninth image characteristics, determining the probability that each pixel point in one first image belongs to the change point, and obtaining a second image area based on each obtained probability.
Specifically, the feature fusion may be performed in various manners, for example, in a stacking manner, a splicing manner, or a pooling manner.
Referring to fig. 10, starting at the deepest image scale, an upsampling process is performed to generate an 1/16 scale feature map, wherein the upsampling process includes 3 × 3 convolution, groupnom, ReLU, and 2 × bilinear upsampling. This strategy is repeated for the 1/16, 1/8, and 1/4 image scales (but with a step-wise reduction in the sampling phase), with the result being a feature map of 1/4 scale, which is then used to generate a defect segmentation map at the original image resolution by 1 × 1 convolution, 4 × bilinear upsampling, and softmax processing.
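A sketch of the asymmetric decoder described above, which brings every scale to the 1/4 resolution, fuses the results and produces the segmentation map, might read as follows; the shared channel count, the number of GroupNorm groups and fusion by summation are assumptions:

```python
import torch
from torch import nn

def upsample_block(channels: int):
    return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                         nn.GroupNorm(8, channels), nn.ReLU(),
                         nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False))

class SinglePhaseHead(nn.Module):
    """Bring the 1/32, 1/16, 1/8 and 1/4 features to the 1/4 scale, fuse them, and predict a map."""
    def __init__(self, channels: int):
        super().__init__()
        # Number of upsampling steps decreases from the smallest image scale: 3, 2, 1, 0.
        self.paths = nn.ModuleList([
            nn.Sequential(*[upsample_block(channels) for _ in range(n)]) if n > 0
            else nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                               nn.GroupNorm(8, channels), nn.ReLU())
            for n in (3, 2, 1, 0)])
        self.classifier = nn.Conv2d(channels, 2, 1)

    def forward(self, feats):                                    # feats ordered [1/32, 1/16, 1/8, 1/4]
        seventh = [path(f) for path, f in zip(self.paths, feats)]
        eighth = torch.stack(seventh, dim=0).sum(dim=0)          # feature fusion (here: summation)
        logits = self.classifier(eighth)                         # ninth feature -> per-pixel scores
        return nn.functional.interpolate(logits, scale_factor=4, mode="bilinear",
                                         align_corners=False).softmax(dim=1)
```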
After the target change detection model is obtained through the above training, the image change detection method provided by the embodiment of the present application may be implemented by using the target change detection model, as shown in fig. 11, the image change detection method provided by the embodiment of the present application includes the following steps:
step 1101: obtaining a target image to be detected, and inputting the target image into a target change detection model obtained by training by the training method;
step 1102: and carrying out single-phase image change detection on the target image to be detected through a target change detection model to obtain a target image area of the target image changed relative to the standard image.
Compared with other methods, the method increases sample diversity and the interaction among samples, so that defect localization is more precise and the robustness to inherent machine variation is higher; the method takes 15 ms of inference time on a V100 graphics processing unit (GPU), has a small calculation amount and high precision, and is therefore convenient for practical deployment.
The method of the embodiment of the application can be used as one of the choices of the industrial AI quality inspection algorithm.
Specifically, when the method is applied to an industrial AI quality inspection scene, the target image is a product image of a defect to be detected, the standard image is a non-defective image of a product identical to the target image, and then the target image area can be determined as a defective area after a target image area in which the target image changes relative to the standard image is obtained by calling a target change detection model.
Here, taking defect detection of the camera holder accessory of the mobile phone as an example, a corresponding detection process can be implemented through the flow shown in fig. 12, which is shown in fig. 12 as a schematic diagram of a defect detection flow of the camera holder accessory of the mobile phone.
Step 1201: selecting a sample image from the accessory images shot from the same point position on the camera bracket accessory production line.
Step 1202: and constructing a sample image group from the selected sample images in a shifting combination mode. The label of each sample image is obtained by comparing with a standard image, and the intra-group variation label of each sample image group is obtained by performing exclusive OR operation on the labels of the two sample images.
Step 1203: based on the constructed sample image group, a target change detection model for detecting the defects of the camera bracket accessory is trained by adopting the change detection model training method provided by the embodiment of the application.
Step 1204: the accessory images of the camera bracket accessories are collected from the same point on the camera bracket accessory production line.
Step 1205: and inputting the accessory image into a target change detection model, wherein the target change detection model detects a change area between the accessory image and the standard image by utilizing a single-phase image change detection network of the target change detection model.
Step 1206: and outputting a detection result by the target change detection model.
Step 1207: if the detection result indicates that a change area exists (or the change degree is larger than a certain threshold value), the camera support accessory is classified as an unqualified product, and the detection result is pushed to related personnel to give an alarm when the unqualified product is produced.
Referring to fig. 13, a comparison of a normal fitting and a defective fitting of a camera mount fitting is shown. The change detection model provided by the embodiment of the application enhances the single-phase change detection network branches through the double-phase change detection network branches, enhances the defect positioning effect, has high robustness aiming at various complex conditions, and has low calculated amount and higher detection efficiency in the application stage model processing process.
Referring to fig. 14, based on the same inventive concept, an embodiment of the present application further provides a change detection model training apparatus, including:
a sample acquiring unit 1401 for acquiring a plurality of sample image groups, each sample image group including a first image, a second image, and a sample tag for indicating: an actual image area in which the first image and the second image are changed from the standard image;
a change detection unit 1402, configured to perform iterative training on a change detection model to be trained by using multiple sample image groups, to obtain a corresponding target change detection model; the change detection unit 1402 includes:
an intra-group detection subunit 14021, configured to perform intra-group image change detection on each input sample image group, respectively, to obtain a first image area in which a second image in each sample image group changes with respect to a first image;
a single-phase detection subunit 14022, configured to perform single-phase image change detection on each input first image, and obtain a second image area in which each first image changes with respect to the standard image;
a joint optimization subunit 14023, configured to determine a model loss based on the sample label, the first image area, and the second image area corresponding to each sample image group, and perform parameter adjustment based on the model loss.
Optionally, the change detecting unit 1402 further includes a feature extracting subunit 14024;
a feature extraction subunit, configured to extract first feature sets of the first images, respectively, where each first feature set includes: the corresponding first images respectively correspond to first image features of a plurality of preset image scales; and respectively extracting second feature sets of the second images, wherein each second feature set comprises: the corresponding second images respectively correspond to second image features of a plurality of image scales;
the intra-group detecting subunit 14021 is specifically configured to perform intra-group image change detection on each sample image group based on each obtained first feature set and each obtained second feature set, and obtain a first image area corresponding to each sample image group.
Optionally, the intra-group detection subunit 14021 is specifically configured to:
for each sample image group, the following operations are respectively executed:
respectively performing feature fusion on image features corresponding to the same image scale in a corresponding first feature set and a corresponding second feature set aiming at a sample image group to obtain fusion image features corresponding to a plurality of image scales;
and carrying out intra-group image change detection on one sample image group based on the obtained fusion image characteristics to obtain a first image area.
Optionally, the intra-group detection subunit 14021 is specifically configured to:
for a plurality of image scales, respectively performing the following operations:
aiming at an image scale, carrying out updating processing based on a self-attention mechanism on corresponding first image features in the first feature set to obtain third image features, and carrying out updating processing based on the self-attention mechanism on corresponding second image features in the second feature set to obtain fourth image features;
performing attention-based fusion processing on the fourth image feature based on the third image feature to obtain a fifth image feature, and performing attention-based fusion processing on the third image feature based on the fourth image feature to obtain a sixth image feature;
and performing feature fusion based on the fifth image feature and the sixth image feature to obtain a fusion image feature of an image scale.
Optionally, the intra-group detection subunit 14021 is specifically configured to:
processing the first image features through a self-attention network of the change detection model to obtain first attention weights of elements in the first image features, wherein each first attention weight is used for representing the degree of association between a corresponding element and other elements;
and obtaining third image characteristics by performing weighting processing on the first image characteristics through the self-attention network based on the obtained first attention weights.
Optionally, the intra-group detection subunit 14021 is specifically configured to:
processing the third image characteristic and the fourth image characteristic through an attention network of the change detection model to obtain second attention weights of elements in the fourth image characteristic, wherein each second attention weight is used for representing the degree of association between the corresponding element of the third image characteristic and the fourth image characteristic;
and performing weighting processing on the third image features through the attention network based on the obtained second attention weights to obtain fifth image features.
Optionally, the intra-group detecting subunit 14021 is specifically configured to:
and obtaining the fusion image feature based on the difference value between the corresponding elements in the fifth image feature and the sixth image feature.
Optionally, the intra-group detection subunit 14021 is specifically configured to:
from the largest image scale, sequentially carrying out down-sampling processing on the fusion image features of all image scales, and carrying out feature merging processing on the obtained down-sampled image features and the fusion image features of the next-level image scale until the smallest image scale is reached to obtain merged image features;
carrying out up-sampling processing on the combined image features for multiple times until target image features with the same image scale as the original image are obtained;
based on the target image feature, a first image region is determined.
Optionally, the single-phase detection subunit 14022 is specifically configured to:
for each first image, the following steps are respectively executed:
carrying out scale conversion processing on the first image features of all image scales aiming at one first image to obtain a plurality of seventh image features with the same image scale;
performing feature fusion on the obtained plurality of seventh image features to obtain eighth image features;
performing upsampling processing based on the eighth image feature to obtain a ninth image feature with the same image scale as that of one first image;
and determining the probability that each pixel point in the first image belongs to the change point based on the ninth image characteristic, and obtaining a second image area based on each obtained probability.
Optionally, the single-phase detection subunit 14022 is specifically configured to:
respectively carrying out at least one time of up-sampling processing on the first image features of all image scales to obtain a plurality of seventh image features with the same image scale;
starting from the minimum image scale, the upsampling processing times of all the image scales are decreased sequentially.
Optionally, the joint optimization subunit 14023 is specifically configured to:
determining the intra-group change detection loss corresponding to the change detection model based on each first image area and the corresponding sample label;
determining single-phase change detection loss corresponding to the change detection model based on each second image area and the corresponding sample label;
model losses are determined based on intra-group change detection losses, single-phase change detection losses, and respective weights.
Referring to fig. 15, based on the same inventive concept, an embodiment of the present application further provides an image change detection apparatus 150, including:
an image input unit 1501, configured to obtain a target image to be detected, and input the target image into a target change detection model obtained based on any one of the above-mentioned methods;
the image change detection unit 1502 is configured to perform single-phase image change detection on a target image to be detected through the target change detection model, so as to obtain a target image area where the target image changes relative to the standard image.
Optionally, the target image is a product image of the defect to be detected, and the standard image is a non-defective image of the same product as the target image;
the image change detection unit 1502 is also used to determine the target image area as a defective area.
By the aid of the device, during model training, by constructing the sample image group of the first image and the second image which are registered in content, on one data processing branch, single-phase change detection is carried out on the first image, on the other data processing branch, double-phase change detection in the group is carried out on the first image and the second image, and accordingly parameter adjustment can be carried out on the change detection model based on processing results of the two data processing branches and sample labels. Therefore, the two images subjected to content registration are subjected to change detection to be combined with the adjustment of the integral model, so that the biphase change detection between the two images is utilized to enhance the single-phase change detection, and the accuracy of the single-phase change detection branch is improved. Meanwhile, when the image change detection method is applied to change detection, the area of the target image changed relative to the standard image can be obtained only by the single data processing branch of single-phase change detection aiming at the target image, so that the data processing amount is greatly reduced, and the detection efficiency of image change detection in an actual detection scene is improved.
For convenience of description, the above parts are described separately as units (or modules) divided by function. Of course, in practicing the present application, the functionality of the various units (or modules) may be implemented in the same one or more pieces of software or hardware.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system.
The apparatus may be configured to execute the method shown in each embodiment of the present application, and therefore, for functions and the like that can be realized by each functional module of the apparatus, reference may be made to the description of the foregoing embodiment, which is not repeated herein.
Referring to fig. 16, based on the same technical concept, an embodiment of the present application further provides a computer device. In one embodiment, the computer device may be the server mentioned in the corresponding embodiment of fig. 1, and as shown in fig. 16, the computer device includes a memory 1601, a communication module 1603 and one or more processors 1602.
A memory 1601 for storing computer programs executed by the processor 1602. The memory 1601 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, a program required for running an instant messaging function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1601 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1601 may also be a non-volatile memory (non-volatile memory), such as a read-only memory (rom), a flash memory (flash memory), a hard disk (HDD) or a solid-state drive (SSD); or memory 1601 is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to such. The memory 1601 may be a combination of the above.
The processor 1602, may include one or more Central Processing Units (CPUs), a digital processing unit, and the like. A processor 1602, configured to implement the above-described change detection model training method or image change detection method when calling the computer program stored in the memory 1601.
The communication module 1603 is used for communicating with terminal equipment and other servers.
The embodiment of the present application does not limit the specific connection medium among the memory 1601, the communication module 1603 and the processor 1602. In fig. 16, the memory 1601 and the processor 1602 are connected by a bus 1604; the bus 1604 is depicted by a thick line in fig. 16, and the connection manner between the other components is merely illustrative and not limiting. The bus 1604 may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 16, but this does not mean that there is only one bus or only one type of bus.
The memory 1601 stores therein a computer storage medium having stored therein computer-executable instructions for implementing the change detection model training method or the image change detection method of the embodiments of the present application. The processor 1602 is configured to execute the change detection model training method or the image change detection method of the foregoing embodiments.
In another embodiment, the computer device may also be another computer device, such as the terminal device mentioned in the corresponding embodiment of fig. 1. In this embodiment, the structure of the computer device may be as shown in fig. 17, including: communication component 1710, memory 1720, display unit 1730, camera 1740, sensors 1750, audio circuitry 1760, bluetooth module 1770, processor 1780, and the like.
A communication component 1710 is configured to communicate with a server. In some embodiments, a Wireless Fidelity (WiFi) module may be included; the WiFi module is a short-range wireless transmission technology, through which the computer device may send and receive information.
Memory 1720 may be used to store software programs and data. The processor 1780 performs various functions of the terminal device and data processing by executing software programs or data stored in the memory 1720. The memory 1720 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The memory 1720 stores an operating system that enables the terminal device to operate. The memory 1720 may store an operating system and various application programs, and may also store codes for executing the change detection model training method or the image change detection method according to the embodiment of the present application.
The display unit 1730 may also be used to display Graphical User Interfaces (GUIs) of information input by or provided to a user and various menus of the terminal device. In particular, the display unit 1730 may include a display screen 1732 disposed on the front surface of the terminal device. The display screen 1732 may be configured in the form of a liquid crystal display, a light emitting diode, or the like. The display unit 1730 may be used to display various pages in the embodiment of the present application, such as an image capturing interface, a feature diagram display interface, a detection result display interface, and the like.
The display unit 1730 may also be used to receive input numeric or character information, generate signal inputs related to user settings and function control of the terminal device, and particularly, the display unit 1730 may include a touch screen 1731 disposed on a front surface of the terminal device, and may collect touch operations of a user thereon or nearby, such as clicking a button, dragging a scroll box, and the like.
The touch screen 1731 may cover the display screen 1732, or the touch screen 1731 and the display screen 1732 may be integrated to implement the input and output functions of the terminal device, and after integration, the touch screen 1731 may be referred to as a touch display screen for short. The display unit 1730 in this application may display the application programs and the corresponding operation steps.
The camera 1740 may be used to capture still images, and the user may post comments on the images captured by the camera 1740 through the application. The number of the cameras 1740 may be one or plural. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing elements convert the light signals into electrical signals which are then passed to a processor 1780 for conversion into digital image signals.
The terminal device may further comprise at least one sensor 1750, such as an acceleration sensor 1751, a distance sensor 1752, a fingerprint sensor 1753, a temperature sensor 1754. The terminal device may also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer, infrared sensor, light sensor, motion sensor, and the like.
The audio circuitry 1760, speaker 1761, microphone 1762 may provide an audio interface between the user and the terminal device. The audio circuit 1760 may transmit the electrical signal converted from the received audio data to the speaker 1761, and convert the electrical signal into an audio signal for output by the speaker 1761. The terminal device may be further provided with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 1762 converts the collected sound signals into electrical signals, which are received by the audio circuitry 1760 and converted into audio data, which are output to the communication assembly 1710 for transmission to, for example, another terminal device, or to the memory 1720 for further processing.
The bluetooth module 1770 is used for information interaction with other bluetooth devices having bluetooth modules via bluetooth protocol. For example, the terminal device may establish a bluetooth connection with a wearable computer device (e.g., a smart watch) also equipped with a bluetooth module via the bluetooth module 1770, thereby performing data interaction.
The processor 1780 is a control center of the terminal device, connects various parts of the entire terminal device using various interfaces and lines, and performs various functions of the terminal device and processes data by operating or executing software programs stored in the memory 1720 and calling data stored in the memory 1720. In some embodiments, processor 1780 may include one or more processing units; the processor 1780 may also integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a baseband processor, which primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 1780. In the present application, the processor 1780 may run an operating system, an application program, a user interface display, a touch response, and the change detection model training method or the image change detection method of the embodiments of the present application. Further, the processor 1780 is coupled to a display unit 1730.
In some possible embodiments, various aspects of the change detection model training method or the image change detection method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the change detection model training method or the image change detection method according to various exemplary embodiments of the present application described above in this specification when the program product is run on a computer device, for example, the computer device may perform the steps of the embodiments.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not so limited, and in the context of this application, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user equipment, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more of the units described above may be embodied in a single unit. Conversely, the features and functions of one unit described above may be further divided into, and embodied by, a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined and executed as a single step, and/or one step may be broken down into multiple steps for execution.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (16)

1. A method of training a change detection model, the method comprising:
obtaining a plurality of sample image groups, each sample image group comprising a first image, a second image, and a sample label for indicating an actual image area in which the first image and the second image have changed relative to a standard image;
performing iterative training on a change detection model to be trained by adopting the plurality of sample image groups to obtain a corresponding target change detection model, wherein each iteration comprises the following steps:
respectively carrying out intra-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image;
respectively carrying out single-phase image change detection on each input first image to obtain a second image area of each first image changed relative to the standard image;
and determining model loss based on the sample label, the first image area and the second image area corresponding to each sample image group, and adjusting parameters based on the model loss.
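By way of illustration only (this sketch is not part of the claims), one training iteration as recited in claim 1 could be written as follows in PyTorch; the attribute names intra_group_branch and single_phase_branch, the use of binary cross-entropy, and the tensor shape conventions are assumptions made for the example.

    import torch

    def train_one_iteration(model, optimizer, batch):
        # `batch` is assumed to hold, per sample image group:
        #   first_images  [B, C, H, W]  images compared against the standard image
        #   second_images [B, C, H, W]  images compared against the first images
        #   labels        [B, H, W]     sample labels (actual changed-area masks)
        first_images = batch["first_images"]
        second_images = batch["second_images"]
        labels = batch["labels"].float()

        # Intra-group change detection: where the second image changes
        # relative to the first image of the same sample image group.
        first_regions = model.intra_group_branch(first_images, second_images)  # [B, 1, H, W] logits

        # Single-phase change detection: where the first image changes
        # relative to the standard image.
        second_regions = model.single_phase_branch(first_images)               # [B, 1, H, W] logits

        # Model loss determined from both predicted regions and the sample labels.
        loss_fn = torch.nn.BCEWithLogitsLoss()
        loss = loss_fn(first_regions.squeeze(1), labels) + loss_fn(second_regions.squeeze(1), labels)

        # Parameter adjustment based on the model loss.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()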
2. The method according to claim 1, wherein before performing intra-group image change detection on each input sample image group to obtain a first image region where a second image in each sample image group changes with respect to a first image, the method further comprises:
respectively extracting first feature sets of the first images, wherein each first feature set comprises: the corresponding first images respectively correspond to first image features of a plurality of preset image scales;
respectively extracting second feature sets of the second images, wherein each second feature set comprises: the corresponding second images respectively correspond to the second image features of the plurality of image scales;
wherein the performing intra-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image comprises:
and respectively carrying out intra-group image change detection on each sample image group based on each obtained first feature set and each obtained second feature set, and obtaining the first image area corresponding to each sample image group.
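As an illustrative sketch only, the two feature sets of claim 2 could be produced by a small weight-shared convolutional backbone such as the one below; the three scales (1/4, 1/8, 1/16) and the channel widths are arbitrary choices for the example, not claim features.

    import torch.nn as nn

    class MultiScaleExtractor(nn.Module):
        # Yields one feature map per preset image scale, ordered largest to smallest.
        def __init__(self, in_channels=3):
            super().__init__()
            self.stage1 = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU())   # 1/4 scale
            self.stage2 = nn.Sequential(
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())   # 1/8 scale
            self.stage3 = nn.Sequential(
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())  # 1/16 scale

        def forward(self, x):
            f1 = self.stage1(x)
            f2 = self.stage2(f1)
            f3 = self.stage3(f2)
            return [f1, f2, f3]   # the feature set of one image

    # Usage: the first and second images of a sample image group are passed through
    # the same extractor to obtain the first feature set and the second feature set.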
3. The method according to claim 2, wherein performing intra-group image change detection on each sample image group based on each obtained first feature set and second feature set to obtain the first image region corresponding to each sample image group respectively comprises:
for each sample image group, the following operations are respectively executed:
respectively performing feature fusion on image features corresponding to the same image scale in a corresponding first feature set and a corresponding second feature set aiming at a sample image group to obtain fused image features corresponding to the multiple image scales;
and carrying out intra-group image change detection on the sample image group based on the obtained fusion image characteristics to obtain the first image region.
4. The method according to claim 3, wherein performing feature fusion on image features corresponding to a same image scale in the corresponding first feature set and second feature set to obtain fused image features corresponding to the plurality of image scales respectively comprises:
for the plurality of image scales, respectively performing the following operations:
aiming at an image scale, carrying out updating processing based on a self-attention mechanism on corresponding first image features in the first feature set to obtain third image features, and carrying out updating processing based on the self-attention mechanism on corresponding second image features in the second feature set to obtain fourth image features;
performing attention-based fusion processing on the fourth image feature based on the third image feature to obtain a fifth image feature, and performing attention-based fusion processing on the third image feature based on the fourth image feature to obtain a sixth image feature;
and performing feature fusion based on the fifth image feature and the sixth image feature to obtain a fusion image feature of the image scale.
5. The method of claim 4, wherein performing an attention-based update process on the corresponding first image features in the first feature set to obtain third image features comprises:
processing the first image feature through a self-attention network of the change detection model to obtain first attention weights of elements in the first image feature, wherein each first attention weight is used for representing the association degree between the corresponding element and other elements;
and performing weighting processing on the first image features through the self-attention network based on the obtained first attention weights to obtain the third image features.
6. The method according to claim 4, wherein performing attention-based fusion processing on the fourth image feature based on the third image feature to obtain a fifth image feature comprises:
processing the third image feature and the fourth image feature through an attention network of the change detection model to obtain second attention weights of elements in the fourth image feature, wherein each second attention weight is used for representing the degree of association between a corresponding element of the third image feature and the fourth image feature;
and performing weighting processing on the third image features through the attention network based on the obtained second attention weights to obtain fifth image features.
7. The method according to claim 4, wherein performing feature fusion based on the fifth image feature and the sixth image feature to obtain a fused image feature at the one image scale comprises:
obtaining the fused image feature based on a difference between corresponding elements in the fifth image feature and the sixth image feature.
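The attention operations of claims 4 to 7 can be sketched at one image scale as shown below. This is only one plausible reading for illustration: the claims do not fix the query/key roles, whether the two branches share weights, or the exact form of the difference, and torch.nn.MultiheadAttention is used here to stand in for both the self-attention network and the attention network.

    import torch.nn as nn

    class ScaleFusion(nn.Module):
        # Fuses the first/second image features of one image scale (claims 4-7).
        def __init__(self, channels, num_heads=4):   # channels assumed divisible by num_heads
            super().__init__()
            self.self_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

        @staticmethod
        def _flatten(x):
            # [B, C, H, W] -> [B, H*W, C]: every spatial position becomes one element.
            b, c, h, w = x.shape
            return x.flatten(2).transpose(1, 2), (b, c, h, w)

        def forward(self, first_feat, second_feat):
            f1, shape = self._flatten(first_feat)
            f2, _ = self._flatten(second_feat)

            # Self-attention update (claims 4 and 5): the attention weights express
            # how strongly each element is associated with the other elements.
            third, _ = self.self_attn(f1, f1, f1)     # third image feature
            fourth, _ = self.self_attn(f2, f2, f2)    # fourth image feature

            # Attention-based fusion (claim 6): each feature is fused under the
            # guidance of the other feature.
            fifth, _ = self.cross_attn(third, fourth, fourth)   # fifth image feature
            sixth, _ = self.cross_attn(fourth, third, third)    # sixth image feature

            # Difference-based fusion (claim 7): element-wise difference.
            fused = (fifth - sixth).abs()
            b, c, h, w = shape
            return fused.transpose(1, 2).reshape(b, c, h, w)    # fused image feature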
8. The method according to claim 3, wherein performing intra-group image change detection on the one sample image group based on each obtained fusion image feature to obtain the first image region comprises:
from the largest image scale, sequentially carrying out down-sampling processing on the fusion image features of all the image scales, and carrying out feature merging processing on the obtained down-sampled image features and the fusion image features of the next-level image scale until the smallest image scale is reached to obtain merged image features;
performing up-sampling processing on the merged image features for multiple times until target image features with the same image scale as the original image are obtained;
determining the first image region based on the target image feature.
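A minimal decoder sketch for claim 8 follows; concatenation as the merge operation, bilinear resampling, the single up-sampling call, and the placeholder one-channel projection are all assumptions made for illustration, not features recited by the claim.

    import torch
    import torch.nn.functional as F

    def decode_first_image_region(fused_feats, original_size):
        # `fused_feats`: fused image features ordered from the largest to the
        # smallest image scale; `original_size`: (H, W) of the input images.
        merged = fused_feats[0]
        for next_feat in fused_feats[1:]:
            # Down-sample the running feature to the next (smaller) image scale ...
            down = F.interpolate(merged, size=next_feat.shape[-2:], mode="bilinear",
                                 align_corners=False)
            # ... and merge it with the fused image feature of that scale.
            merged = torch.cat([down, next_feat], dim=1)

        # Up-sample the merged image feature back to the original image scale.
        target = F.interpolate(merged, size=original_size, mode="bilinear",
                               align_corners=False)

        # Placeholder projection to a one-channel change map; the first image
        # region is then the set of pixels whose change score exceeds a threshold.
        change_map = target.mean(dim=1, keepdim=True)
        return torch.sigmoid(change_map) > 0.5   # boolean mask [B, 1, H, W]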
9. The method according to any one of claims 2 to 8, wherein the step of performing single-phase image change detection on each input first image to obtain a second image area in which each first image changes with respect to the standard image comprises:
for each first image, the following steps are respectively executed:
for one first image, performing scale conversion processing on the first image features of the respective image scales to obtain a plurality of seventh image features having the same image scale;
performing feature fusion on the obtained plurality of seventh image features to obtain eighth image features;
performing upsampling processing based on the eighth image feature to obtain a ninth image feature with the same image scale as that of the first image;
and determining the probability that each pixel point in the first image belongs to the change point based on the ninth image characteristic, and obtaining the second image area based on each obtained probability.
10. The method according to claim 9, wherein for a first image, performing scale conversion processing on the first image features of each image scale to obtain a plurality of seventh image features with the same image scale, comprises:
performing up-sampling processing at least once on the first image features of the respective image scales, to obtain a plurality of seventh image features having the same image scale;
wherein, starting from the smallest image scale, the number of up-sampling operations for the respective image scales decreases in sequence.
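For claims 9 and 10, an illustrative single-phase branch is sketched below. It assumes each preset image scale is exactly half of the previous one, so the number of 2x up-sampling steps decreases from the smallest scale towards the largest; concatenation and the placeholder projection are likewise assumptions where the claims leave the operators open.

    import math
    import torch
    import torch.nn.functional as F

    def single_phase_detect(first_feature_set, original_size):
        # `first_feature_set`: first image features ordered from largest to smallest scale.
        target_hw = first_feature_set[0].shape[-2:]

        # Scale conversion (claims 9 and 10): repeatedly up-sample by 2x until every
        # feature reaches the largest scale; smaller scales need more steps.
        seventh_feats = []
        for feat in first_feature_set:
            steps = int(round(math.log2(target_hw[0] / feat.shape[-2])))
            for _ in range(steps):
                feat = F.interpolate(feat, scale_factor=2, mode="bilinear",
                                     align_corners=False)
            seventh_feats.append(feat)

        # Feature fusion of the scale-aligned features -> eighth image feature.
        eighth = torch.cat(seventh_feats, dim=1)

        # Up-sampling to the image scale of the first image -> ninth image feature.
        ninth = F.interpolate(eighth, size=original_size, mode="bilinear",
                              align_corners=False)

        # Per-pixel change probability and the resulting second image region.
        prob = torch.sigmoid(ninth.mean(dim=1, keepdim=True))  # placeholder projection
        return prob, prob > 0.5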
11. The method according to any one of claims 1 to 8, wherein determining the model loss based on the sample label, the first image region and the second image region corresponding to each sample image group comprises:
determining an intra-group change detection loss corresponding to the change detection model based on each first image region and the corresponding sample label;
determining single-phase change detection loss corresponding to the change detection model based on each second image area and the corresponding sample label;
determining the model loss based on the intra-group change detection loss, the single-phase change detection loss, and the respective weights.
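The weighted combination of claim 11 can be written, for illustration, as follows; binary cross-entropy and the equal default weights are assumptions of the example, and the labels are expected to be float masks with the same shape as the predicted regions.

    import torch

    def combined_model_loss(first_regions, second_regions, labels,
                            intra_weight=1.0, single_weight=1.0):
        loss_fn = torch.nn.BCEWithLogitsLoss()
        # Intra-group change detection loss: first image regions vs. sample labels.
        intra_group_loss = loss_fn(first_regions, labels)
        # Single-phase change detection loss: second image regions vs. sample labels.
        single_phase_loss = loss_fn(second_regions, labels)
        # Model loss: weighted sum of the two change detection losses.
        return intra_weight * intra_group_loss + single_weight * single_phase_loss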
12. An image change detection method, characterized in that the method comprises:
obtaining a target image to be detected, and inputting the target image into a target change detection model obtained by training based on the method of any one of claims 1-11;
and carrying out single-phase image change detection on the target image to be detected through the target change detection model to obtain a target image area of the target image changed relative to the standard image.
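At detection time (claim 12), only the single-phase path of the trained target change detection model is needed; the sketch below uses the hypothetical attribute name single_phase_branch and a 0.5 threshold purely for illustration.

    import torch

    @torch.no_grad()
    def detect_image_change(target_change_detection_model, target_image):
        target_change_detection_model.eval()
        # Single-phase image change detection on the target image to be detected.
        logits = target_change_detection_model.single_phase_branch(target_image)
        prob = torch.sigmoid(logits)
        # Target image area: pixels predicted as changed relative to the standard image.
        return prob > 0.5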
13. A change detection model training apparatus, characterized in that the apparatus comprises:
a sample acquisition unit for obtaining a plurality of sample image groups, each sample image group containing a first image, a second image, and a sample label for indicating an actual image area in which the first image and the second image have changed relative to a standard image;
the change detection unit is used for carrying out iterative training on a change detection model to be trained by adopting the plurality of sample image groups to obtain a corresponding target change detection model; wherein the change detecting unit includes:
the in-group detection subunit is used for respectively carrying out in-group image change detection on each input sample image group to obtain a first image area in which a second image in each sample image group changes relative to a first image;
the single-phase detection subunit is used for respectively carrying out single-phase image change detection on each input first image to obtain a second image area of each first image changed relative to the standard image;
and the joint optimization subunit is used for determining model loss based on the sample label, the first image area and the second image area corresponding to each sample image group, and performing parameter adjustment based on the model loss.
14. An image change detection apparatus, characterized in that the apparatus comprises:
an image input unit, configured to obtain a target image to be detected, and input the target image into a target change detection model trained based on the method according to any one of claims 1 to 11;
and the image change detection unit is used for carrying out single-phase image change detection on the target image to be detected through the target change detection model to obtain a target image area of the target image changed relative to the standard image.
15. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor,
wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 11 or claim 12.
16. A computer storage medium having computer program instructions stored thereon, wherein,
the computer program instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 11 or 12.
CN202210348820.4A 2022-04-01 2022-04-01 Method, device and equipment for training change detection model and detecting image change Pending CN115131281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210348820.4A CN115131281A (en) 2022-04-01 2022-04-01 Method, device and equipment for training change detection model and detecting image change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210348820.4A CN115131281A (en) 2022-04-01 2022-04-01 Method, device and equipment for training change detection model and detecting image change

Publications (1)

Publication Number Publication Date
CN115131281A true CN115131281A (en) 2022-09-30

Family

ID=83375978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210348820.4A Pending CN115131281A (en) 2022-04-01 2022-04-01 Method, device and equipment for training change detection model and detecting image change

Country Status (1)

Country Link
CN (1) CN115131281A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661016A (en) * 2022-12-08 2023-01-31 瑞纳智能设备股份有限公司 Fault monitoring method and system of heat exchange station and embedded image diagnosis control platform
CN116248412A (en) * 2023-04-27 2023-06-09 中国人民解放军总医院 Shared data resource abnormality detection method, system, equipment, memory and product
CN116248412B (en) * 2023-04-27 2023-08-22 中国人民解放军总医院 Shared data resource abnormality detection method, system, equipment, memory and product
CN117152621A (en) * 2023-10-30 2023-12-01 中国科学院空天信息创新研究院 Building change detection method, device, electronic equipment and storage medium
CN117152621B (en) * 2023-10-30 2024-02-23 中国科学院空天信息创新研究院 Building change detection method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Zhou et al. Salient object detection in stereoscopic 3D images using a deep convolutional residual autoencoder
CN115131281A (en) Method, device and equipment for training change detection model and detecting image change
CN112862828B (en) Semantic segmentation method, model training method and device
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114419641B (en) Training method and device of text separation model, electronic equipment and storage medium
CN116434033A (en) Cross-modal contrast learning method and system for RGB-D image dense prediction task
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN113066018A (en) Image enhancement method and related device
CN116235209A (en) Sparse optical flow estimation
Wang et al. Quality-aware dual-modal saliency detection via deep reinforcement learning
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN112884702A (en) Polyp identification system and method based on endoscope image
CN117216710A (en) Multi-mode automatic labeling method, training method of labeling model and related equipment
CN117173394A (en) Weak supervision salient object detection method and system for unmanned aerial vehicle video data
CN114627353B (en) Image description generation method, device, equipment, medium and product
CN114332489B (en) Image salient target detection method and system based on uncertainty perception
CN116258756A (en) Self-supervision monocular depth estimation method and system
Wang et al. Multi-scale dense and attention mechanism for image semantic segmentation based on improved DeepLabv3+
CN115984093A (en) Depth estimation method based on infrared image, electronic device and storage medium
CN116958852A (en) Video and text matching method and device, electronic equipment and storage medium
CN114863124A (en) Model training method, polyp detection method, corresponding apparatus, medium, and device
CN113920317A (en) Semantic segmentation method based on visible light image and low-resolution depth image
CN114387465A (en) Image recognition method and device, electronic equipment and computer readable medium
CN112801195A (en) Deep learning-based fog visibility prediction method, storage device and server
CN116091984B (en) Video object segmentation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination