WO2022156151A1 - Image view angle conversion/fault determination method, device, apparatus and medium - Google Patents

Image view angle conversion/fault determination method, device, apparatus and medium

Info

Publication number
WO2022156151A1
WO2022156151A1 (PCT/CN2021/103576, CN2021103576W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
perspective
network model
target object
model
Prior art date
Application number
PCT/CN2021/103576
Other languages
English (en)
French (fr)
Inventor
赵蕊蕊 (ZHAO, Ruirui)
Original Assignee
长鑫存储技术有限公司 (ChangXin Memory Technologies, Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 长鑫存储技术有限公司 (ChangXin Memory Technologies, Inc.)
Priority to US17/447,426 (granted as US11956407B2)
Publication of WO2022156151A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30108 Industrial image inspection

Definitions

  • the present application relates to an image perspective conversion/fault determination method, device, apparatus and medium.
  • Image synthesis methods mainly fall into two categories: methods based on image pixels and methods based on feature expression.
  • A method based on image pixels needs to gather statistics on the foreground and background information around an unknown pixel, and then compute the value of the unknown pixel from these statistics. This method is suitable when the foreground and background are similar in color, texture and style; otherwise, the image synthesis quality is poor.
  • The main idea of the method based on feature expression is to generate an image from a feature vector, the most representative algorithm being the Principal Component Analysis (PCA) algorithm.
  • PCA Principal Component Analysis
  • a first aspect of the present application provides an image perspective conversion method, including:
  • model training data includes a plurality of plane images of the training object from different perspectives and labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different;
  • the plane image of the target object and the label corresponding to the expected perspective are input into the perspective transformation network model, so that the perspective transformation network model generates the expected perspective plane image of the target object.
  • a second aspect of the present application provides a fault judgment method, including:
  • model training data includes a plurality of plane images of different perspectives of the training object and labels corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
  • a third aspect of the present application provides an image viewing angle conversion device, including:
  • a model training data acquisition module configured to acquire model training data, where the model training data includes a plurality of plane images of the training object from different perspectives and labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different;
  • a perspective conversion network model acquisition module used for training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;
  • the expected perspective plane image generation module is configured to input the plane image of the target object and the label corresponding to the expected perspective into the perspective transformation network model, so that the perspective transformation network model generates the expected perspective plane image of the target object.
  • a fourth aspect of the present application provides a fault judging device, including:
  • a model training data acquisition module configured to acquire model training data, where the model training data includes a plurality of plane images of the training object from different perspectives and labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different;
  • a perspective conversion network model acquisition module used for training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;
  • An expected perspective plane image generation module configured to input the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, so that the perspective transformation network model generates a plurality of different expected perspective plane images of the target object;
  • a stereoscopic image generation module configured to generate a stereoscopic image of the target object according to each of the expected viewing angle plane images; and
  • a fault judging module used for judging, according to the stereoscopic image, whether the target object has a fault.
  • a fifth aspect of the present application provides a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, where the processor, when executing the program, implements the steps of the method described in any one of the embodiments of the present application.
  • a sixth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method described in any one of the embodiments of the present application.
  • FIG. 1 is a schematic flowchart of an image perspective conversion method provided in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an image perspective conversion method provided in another embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for converting an image viewing angle according to another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a fault determination method provided in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a fault judgment method provided in another embodiment of the present application.
  • FIG. 6 is a structural block diagram of an image viewing angle conversion apparatus provided in an embodiment of the present application.
  • FIG. 7 is a structural block diagram of an image viewing angle conversion apparatus provided in another embodiment of the present application.
  • FIG. 8 is a structural block diagram of a fault judging apparatus provided in an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a fault judging apparatus provided in another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
  • the traditional single-view image synthesis method is affected by the observation angle, which easily causes the synthesized three-dimensional image of the observed object to lose a large amount of spatial information, reducing the quality and efficiency of the synthesized image and preventing the three-dimensional features of the observed object under a single viewing angle from being effectively identified.
  • the terms "installed", "connected" and "coupled" should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two components.
  • CNN Convolutional Neural Network
  • VAE Variational Auto-Encoder
  • GAN Generative Adversarial Network
  • the variational autoencoder includes an encoder and a decoder.
  • the encoder is used to map the input image to the latent space, and then the variables in the latent space are mapped to the real image through the decoder.
  • the generative adversarial network includes a generator and a discriminator.
  • the generator and the discriminator are trained in an adversarial manner.
  • the training goal of the generator is to output high-quality target images, while the training goal of the discriminator is to determine, with high probability, that the target image is an image synthesized by the generator.
  • In order to obtain image features of a target object from different viewing angles given a single viewing angle, so as to identify image features of the target object's invisible viewing angles, the present application provides an image perspective conversion/fault determination method, device, apparatus and medium.
  • a method for converting an image perspective, including the following steps:
  • Step 22 acquiring model training data, where the model training data includes a plurality of plane images of the training object from different perspectives and labels corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
  • Step 24 training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;
  • Step 26 Input the plane image of the target object and the label corresponding to the expected perspective into the perspective transformation network model, so that the perspective transformation network model generates the expected perspective plane image of the target object.
  • the acquired model training data including plane images of training objects from different perspectives and labels corresponding to the perspectives are input into the generative confrontation network. model, and train it so that it can deeply learn the multi-view conversion relationship under different perspectives of the training object to obtain a perspective conversion network model; then input the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model,
  • the viewpoint transformation network model is caused to generate an expected viewpoint plane image of the target object.
  • At least one expected perspective plane image of the target object can be obtained according to the single perspective plane image of the target object, and the target object can be reconstructed in three dimensions according to the obtained multiple different expected perspective plane images, and the three-dimensional image of the target object can be obtained to effectively identify the target. 3D characteristics of the object.
  • the acquiring model training data includes:
  • Step 222 obtaining a three-dimensional model of the training object, and projecting the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives;
  • Step 224 generating the same number of labels as the plane images, and labeling the plurality of plane images with different viewing angles with corresponding labels respectively.
  • the 3D CAD model of an object of a selected category is projected in a data-driven way to obtain plane images of the object from different perspectives.
  • the open-source toolkit ShapeNet-Viewer can be used to batch-generate multiple plane rendering images of a specific object category from different perspectives; then, according to the number of plane images of the different perspectives, the same number of labels as the plane images are generated, and the plurality of plane images of different perspectives are marked with the corresponding labels respectively, to obtain model training data.
  • a pre-designed generative adversarial network model can be trained with the model training data to obtain a perspective conversion network model, and at least one expected perspective plane image of the target object can be obtained from the single-perspective plane image of the target object, so as to obtain image features of the invisible perspectives of the target object.
  • the acquiring model training data includes:
  • Step 2221 Acquire a three-dimensional model of the training object, and project the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives, the plurality of plane images including at least two of a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image;
  • Step 224 generating the same number of labels as the plane images, and labeling the plurality of plane images with different viewing angles with corresponding labels respectively.
  • a plurality of plane images with different viewing angles can be set, including a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image.
  • the open-source toolkit ShapeNet-Viewer can be used to batch generate flat rendering images of the aforementioned 5 viewpoints for a specific object class. Then, 5 labels are generated, and the corresponding labels are respectively marked on the plane images of the 5 different perspectives to obtain model training data.
  • the labels can be set as encoding vectors; for example, the front view plane image can be labeled 00010, the left oblique side view plane image 10000, the right oblique side view plane image 00100, the side view plane image 01000, and the top view plane image 00001,
  • so as to obtain 5 plane images of different perspectives marked with the corresponding encoding vectors as model training data; a pre-designed generative adversarial network model can then be trained with the model training data to obtain a perspective conversion network model, effectively reducing the complexity of training the generative adversarial network model and lowering the risk of overfitting of the perspective conversion network model.
  • the label includes an encoding vector
  • the generative adversarial network model includes a generator G and a discriminator D
  • obtaining the perspective conversion network model includes the following steps:
  • Step 241 obtaining a preset input image x and a preset input coding vector c of the training object
  • Step 242 according to the preset input image x and the preset input coding vector c, obtain the pre-generated image G(x, c) output by the generator G;
  • Step 243 Determine an adversarial loss L_adv according to the pre-generated image G(x,c) and the probability distribution D(x) of the discriminator D, where the adversarial loss L_adv is defined as L_adv = E_x[log D(x)] + E_{x,c}[log(1 - D(G(x,c)))];
  • Step 244 Calculate the target value of the adversarial loss L_adv, such that the probability distribution D(x) of the discriminator D attains its maximum value while the pre-generated image G(x,c) attains its minimum value.
  • By training the generator G and the discriminator D in an adversarial manner, the generator G is trained to output high-quality target images to the discriminator D, and the discriminator D is trained to determine, with high probability, that the target image is an image synthesized by the generator; through the joint generative adversarial network, the adversarial loss constrains the model, and the target value of the adversarial loss L_adv is calculated such that the pre-generated image G(x,c) attains its minimum value while the probability distribution D(x) of the discriminator D attains its maximum value, to effectively improve the perceptual quality of the output image.
  • obtaining the pre-generated image G(x,c) output by the generator G according to the preset input image x and the preset input encoding vector c includes the following steps:
  • Step 2421 obtaining high-dimensional features of a preset dimension according to the preset input coding vector c;
  • Step 2422 generate a feature vector according to the high-dimensional feature of the preset dimension
  • Step 2423 Input the preset input image x and the feature vector into the generator G, so that the generator G generates the pre-generated image G(x, c).
  • the generator G can be set to include a first convolutional layer, a second convolutional layer, residual modules and a deconvolutional layer arranged in sequence; the first convolutional layer can be set to include one convolution kernel of size 7×7, the second convolutional layer to include two convolution kernels of size 3×3 with stride 2, the number of residual modules to be 9, and the deconvolutional layer to include two convolution kernels of size 4×4 with stride 2.
  • an RGB image with a resolution of 128×128 and a 5-dimensional encoding vector can be input to the generator G.
  • the encoding vector passes through two fully connected layers to obtain a high-dimensional feature of dimension 1024; a 32×32 low-dimensional feature is then generated from the high-dimensional feature, and this low-dimensional feature and the features of the input image after three convolutional layers are output via the CONCAT function, to obtain a generator that can generate high-quality target images.
  • obtaining the perspective conversion network model further includes the following steps:
  • Step 2424 Obtain the probability distribution D(c'|x) of the original encoding vector c' of the discriminator D;
  • Step 2425 Determine the domain classification loss of the real image, L_cls^r = E_{x,c'}[-log D(c'|x)], according to the probability distribution D(c'|x), and determine the domain classification loss of the pre-generated image, L_cls^f = E_{x,c}[-log D(c|G(x,c))], according to the pre-generated image G(x,c);
  • Step 2426 Calculate the minimum value of the domain classification loss of the real image L_cls^r and the minimum value of the domain classification loss of the pre-generated image L_cls^f, to obtain the perspective conversion network model.
  • the adversarial loss constraint model is used to optimize the perspective conversion network model to effectively improve the perceptual quality of the output image.
  • the target object is a wafer.
  • the multi-view conversion relationships among different perspectives of the wafer are deeply learned to obtain a wafer perspective conversion network model; the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of plane images of the wafer from different expected perspectives. The wafer can thus be three-dimensionally reconstructed from the plurality of plane images of different expected perspectives, and a three-dimensional image of the wafer can be obtained, so as to effectively identify the three-dimensional features of the wafer.
  • a fault judgment method including the following steps:
  • Step 32 acquiring model training data, where the model training data includes a plurality of plane images of the training object from different perspectives and labels corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
  • Step 34 training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model
  • Step 36 inputting the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of different expected perspective plane images of the target object;
  • Step 38 generating a stereoscopic image of the target object according to each of the expected viewing angle plane images
  • Step 310 Determine whether the target object has a fault according to the stereoscopic image.
  • the acquired model training data, including a plurality of plane images from different perspectives of the training object and the labels corresponding to the perspectives, are input into the generative adversarial network model,
  • and the model is trained so that it deeply learns the multi-view conversion relationships among the different perspectives of the training object, to obtain a perspective conversion network model; the plane image of the target object and the label corresponding to the expected perspective are then input into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives.
  • Three-dimensional reconstruction of the target object is carried out according to the obtained plane images of different expected perspectives, and a three-dimensional image of the target object is obtained, thereby obtaining the image features of the invisible part of the target object; whether the target object has a fault is then judged according to its three-dimensional image features, so as to effectively improve the efficiency and intelligence of fault judgment.
  • Step 311 Determine whether the wafer is defective according to the stereoscopic image.
  • the multi-view conversion relationships among different perspectives of the wafer are deeply learned to obtain a wafer perspective conversion network model; the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of different expected perspective plane images of the wafer; the wafer is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images,
  • and the three-dimensional image of the wafer is obtained, so as to obtain the image features of the invisible part of the wafer and judge whether the wafer is defective according to the three-dimensional image features of the wafer, thereby effectively improving the efficiency and intelligence of defective wafer identification.
  • Although the steps in the flowcharts of FIGS. 1-5 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and these steps may be executed in other orders. Moreover, although at least some of the steps in FIGS. 1-5 may include multiple sub-steps or stages, these sub-steps or stages are not necessarily completed at the same moment, but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
  • an image perspective conversion device 10 including a model training data acquisition module 12, a perspective conversion network model acquisition module 14 and an expected perspective plane image generation module 16; the model training data acquisition module 12 is used for acquiring model training data, the model training data including a plurality of plane images of the training object from different perspectives and labels corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 is used for training the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model; and the expected perspective plane image generation module 16 is used for inputting the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, so that the perspective conversion network model generates the expected perspective plane image of the target object.
  • the model training data is obtained through the model training data acquisition module 12, the model training data including a plurality of plane images of the training object from different perspectives and labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 is used for training the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model;
  • after the expected perspective plane image generation module 16 inputs the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, the perspective conversion network model generates the expected perspective plane image of the target object, three-dimensional reconstruction is performed on the target object according to the obtained multiple different expected perspective plane images, and the three-dimensional image of the target object is obtained, so as to effectively identify the three-dimensional features of the target object.
  • the model training data acquisition module 12 includes a plane image acquisition module 122 and a label image generation module 124; the plane image acquisition module 122 is used for acquiring the three-dimensional model of the training object and projecting the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives; the label image generation module 124 is used for generating the same number of encoding vectors as the plane images and marking the plurality of plane images of different perspectives with the corresponding labels respectively.
  • the label image generation module 124 generates, according to the number of the plane images of the different perspectives, the same number of labels as the plane images, and marks the plane images of the different perspectives with the corresponding labels respectively, to obtain the model training data.
  • a pre-designed generative adversarial network model can thus be trained with the model training data to obtain a perspective conversion network model, and at least one expected perspective plane image of the target object can be obtained from the single-perspective plane image of the target object, so as to obtain image features of the invisible perspectives of the target object.
  • a plurality of plane images with different viewing angles can be set, including a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image.
  • the open-source toolkit ShapeNet-Viewer can be used to batch generate flat rendering images of the aforementioned 5 viewpoints for a specific object class. Then, 5 labels are generated, and the corresponding labels are respectively marked on the plane images of the 5 different perspectives to obtain model training data.
  • the labels can be set as encoding vectors; for example, the front view plane image can be labeled 00010, the left oblique side view plane image 10000, the right oblique side view plane image 00100, the side view plane image 01000, and the top view plane image 00001,
  • so as to obtain 5 plane images of different perspectives marked with the corresponding encoding vectors as model training data; a pre-designed generative adversarial network model can then be trained with the model training data to obtain a perspective conversion network model, effectively reducing the complexity of training the generative adversarial network model and lowering the risk of overfitting of the perspective conversion network model.
  • the target object is a wafer.
  • 3D reconstruction of the wafer is performed according to the obtained plane images of different expected perspectives, and a 3D image of the wafer is obtained, so as to effectively identify the 3D features of the wafer.
  • a fault judging device 30 including a model training data acquisition module 12, a perspective conversion network model acquisition module 14, an expected perspective plane image generation module 16, a stereoscopic image generation module 308 and a fault judgment module 309;
  • the model training data acquisition module 12 is used for obtaining model training data, the model training data including a plurality of plane images of the training object from different perspectives and labels corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
  • the perspective conversion network model acquisition module 14 is used for training the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model;
  • the expected perspective plane image generation module 16 is used for inputting the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;
  • the stereoscopic image generation module 308 is used for generating a stereoscopic image of the target object according to each of the expected perspective plane images, and the fault judgment module 309 is used for judging, according to the stereoscopic image, whether the target object has a fault.
  • the model training data is obtained through the model training data acquisition module 12, the model training data including a plurality of plane images of the training object from different perspectives and labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 is used for training the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model;
  • after the expected perspective plane image generation module 16 inputs the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives; the stereoscopic image generation module 308 then generates a stereoscopic image of the target object according to each expected perspective plane image, so that the fault judgment module 309 can judge, according to the stereoscopic image, whether the target object has a fault, thereby effectively improving the efficiency and intelligence of fault judgment.
  • the target object is a wafer;
  • the fault judgment module 309 includes a defect judgment module 3091, and the defect judgment module 3091 is used for judging, according to the stereoscopic image, whether the wafer is defective.
  • a pre-designed generative adversarial network model is trained to deeply learn the multi-view conversion relationships among different perspectives of the wafer, so as to obtain the wafer perspective conversion network model;
  • the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of different expected perspective plane images of the wafer; three-dimensional reconstruction is performed on the wafer according to the obtained plurality of different expected perspective plane images,
  • and the three-dimensional image of the wafer is obtained, so as to obtain the image features of the invisible part of the wafer and determine whether the wafer is defective according to the three-dimensional image features of the wafer, thereby effectively improving the efficiency and intelligence of defective wafer identification.
  • a computer device including a memory and a processor, the memory storing a computer program that can run on the processor, where the processor, when executing the program, implements the steps of the method described in any one of the embodiments of this application.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the method described in any one of the embodiments of the present application are implemented.
  • in the computer device or computer-readable storage medium above, the acquired model training data, including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives,
  • are input into the generative adversarial network model, which is trained so that it deeply learns the multi-view conversion relationships among the different perspectives of the training object, to obtain the perspective conversion network model; the plane image of the target object and the label corresponding to the expected perspective are then input into the perspective conversion network model, so that the perspective conversion network model generates the expected perspective plane image of the target object, and at least one expected perspective plane image of the target object is obtained from the single-perspective plane image of the target object.
  • the target object is three-dimensionally reconstructed from a plurality of plane images of different expected perspectives, and the three-dimensional image of the target object is obtained, so as to effectively identify the three-dimensional features of the target object.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • SRAM static RAM
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDR SDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM synchlink DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

An image perspective conversion/fault determination method, device, apparatus and medium. The method includes: acquiring model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different; training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.

Description

Image view angle conversion/fault determination method, device, apparatus and medium

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Chinese patent application No. 2021100988425, filed on January 25, 2021 and entitled "Image view angle conversion/fault determination method, device, apparatus and medium", the disclosure of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present application relates to an image perspective conversion/fault determination method, device, apparatus and medium.
BACKGROUND

With the continuous development of image synthesis technology and the continuous improvement of image synthesis capabilities, the quality of machine-synthesized images keeps improving. Techniques for synthesizing multiple views from a single view of an object are widely applied in many technical fields, such as computer vision, computer graphics, robot vision and virtual reality.

Image synthesis methods mainly fall into two categories: methods based on image pixels and methods based on feature expression. A pixel-based method needs to gather statistics on the foreground and background information around an unknown pixel, and then compute the value of the unknown pixel from these statistics. Such a method is suitable when the foreground and background are similar in color, texture and style; otherwise, the image synthesis quality is poor. The main idea of the feature-expression-based method is to generate an image from a feature vector, the most representative algorithm being the Principal Component Analysis (PCA) algorithm.
SUMMARY

According to various embodiments, a first aspect of the present application provides an image perspective conversion method, including:

acquiring model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and

inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.

According to various embodiments, a second aspect of the present application provides a fault determination method, including:

acquiring model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;

inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;

generating a stereoscopic image of the target object according to each of the expected perspective plane images; and

determining, according to the stereoscopic image, whether the target object has a fault.

According to various embodiments, a third aspect of the present application provides an image perspective conversion device, including:

a model training data acquisition module configured to acquire model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

a perspective conversion network model acquisition module configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and

an expected perspective plane image generation module configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.

According to various embodiments, a fourth aspect of the present application provides a fault determination device, including:

a model training data acquisition module configured to acquire model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

a perspective conversion network model acquisition module configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;

an expected perspective plane image generation module configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;

a stereoscopic image generation module configured to generate a stereoscopic image of the target object according to each of the expected perspective plane images; and

a fault determination module configured to determine, according to the stereoscopic image, whether the target object has a fault.

According to various embodiments, a fifth aspect of the present application provides a computer device including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method described in any one of the embodiments of the present application.

According to various embodiments, a sixth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method described in any one of the embodiments of the present application.

The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the specification, the drawings and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, drawings of other embodiments can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an image perspective conversion method provided in an embodiment of the present application.

FIG. 2 is a schematic flowchart of an image perspective conversion method provided in another embodiment of the present application.

FIG. 3 is a schematic flowchart of an image perspective conversion method provided in yet another embodiment of the present application.

FIG. 4 is a schematic flowchart of a fault determination method provided in an embodiment of the present application.

FIG. 5 is a schematic flowchart of a fault determination method provided in another embodiment of the present application.

FIG. 6 is a structural block diagram of an image perspective conversion device provided in an embodiment of the present application.

FIG. 7 is a structural block diagram of an image perspective conversion device provided in another embodiment of the present application.

FIG. 8 is a structural block diagram of a fault determination device provided in an embodiment of the present application.

FIG. 9 is a structural block diagram of a fault determination device provided in another embodiment of the present application.

FIG. 10 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
DETAILED DESCRIPTION

The traditional single-view image synthesis method is affected by the observation angle, which easily causes the synthesized three-dimensional image of the observed object to lose a large amount of spatial information, reducing the quality and efficiency of the synthesized image; moreover, the three-dimensional features of the observed object under a single perspective cannot be effectively identified.

To facilitate understanding of the present application, the present application is described more fully below with reference to the relevant drawings. Preferred embodiments of the present application are shown in the drawings. However, the present application may be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and completely.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The term "and/or" used herein includes any and all combinations of one or more of the associated listed items.

Where "include", "have" and "contain" are used herein, another component may also be added unless an explicit limiting term such as "only" or "consisting of" is used. Unless mentioned to the contrary, a term in the singular may include the plural and is not to be understood as being one in number.

It should be understood that although the terms "first", "second", etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly a second element could be termed a first element, without departing from the scope of the present application.

In the description of the present application, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "connected" and "coupled" should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two components. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to specific circumstances.

Benefiting from the excellent expressive power of the Convolutional Neural Network (CNN), image synthesis technology based on neural networks has developed rapidly; representative image synthesis techniques include the Variational Auto-Encoder (VAE) and the Generative Adversarial Network (GAN). A variational auto-encoder includes an encoder and a decoder: the encoder maps an input image into a latent space, and the decoder maps variables in the latent space back to a real image. A generative adversarial network includes a generator and a discriminator, which are trained in an adversarial manner: the training goal of the generator is to output high-quality target images, while the training goal of the discriminator is to determine, with high probability, that a target image is an image synthesized by the generator. In order to obtain image features of a target object from different perspectives given a single perspective, so as to identify image features of the target object's invisible perspectives, the present application provides an image perspective conversion/fault determination method, device, apparatus and medium.
Referring to FIG. 1, in an embodiment of the present application, an image perspective conversion method is provided, including the following steps:

Step 22: acquiring model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

Step 24: training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and

Step 26: inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.

Specifically, still referring to FIG. 1, a generative adversarial network model is designed in advance, and the acquired model training data, including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives, are input into the generative adversarial network model, which is trained so that it deeply learns the multi-view conversion relationships among the different perspectives of the training object, to obtain a perspective conversion network model; the plane image of the target object and the label corresponding to the expected perspective are then input into the perspective conversion network model, so that the perspective conversion network model generates the expected perspective plane image of the target object. In this way, at least one expected perspective plane image of the target object can be obtained from a single-perspective plane image of the target object, the target object can be three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images, and a three-dimensional image of the target object can be obtained, so as to effectively identify the three-dimensional features of the target object.
Further, referring to FIG. 2, in an embodiment of the present application, acquiring the model training data includes:

Step 222: acquiring a three-dimensional model of the training object, and projecting the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives; and

Step 224: generating the same number of labels as the plane images, and marking the plurality of plane images of different perspectives with the corresponding labels respectively.

Specifically, still referring to FIG. 2, a three-dimensional model of the training object is acquired and projected to obtain a plurality of plane images of the training object from different perspectives. Since objects of various categories exist in real scenes, the 3D CAD model of an object of a selected category is projected in a data-driven manner to obtain plane images of the object from different perspectives. In order to align object perspectives across different views, the open-source toolkit ShapeNet-Viewer can be used to batch-generate plane rendering images of a specific object category from multiple perspectives; then, according to the number of plane images of the different perspectives, the same number of labels as the plane images are generated, and the plurality of plane images of different perspectives are marked with the corresponding labels respectively, to obtain the model training data. A pre-designed generative adversarial network model can thus be trained with the model training data to obtain a perspective conversion network model, so that at least one expected perspective plane image of the target object is obtained from a single-perspective plane image of the target object, and image features of the target object's invisible perspectives are obtained.
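For illustration only (not part of the original disclosure): the data-driven projection described above can be sketched as follows, where a simple orthographic projection of the model's vertices stands in for the renderer (ShapeNet-Viewer's actual interface is not reproduced here), and the chosen azimuth angles are assumptions.

```python
import numpy as np

def rotation_y(deg):
    """Rotation matrix about the vertical (y) axis."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), 0.0, np.sin(t)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(t), 0.0, np.cos(t)]])

def project_views(vertices, azimuths=(0.0, 45.0, -45.0, 90.0)):
    """Orthographically project a 3D model (an N x 3 vertex array) into
    plane images from several perspectives: front, left/right oblique and
    side views via rotation about y, plus a top view looking along y."""
    views = [(vertices @ rotation_y(az).T)[:, :2] for az in azimuths]
    views.append(vertices[:, [0, 2]])   # top view: drop the vertical axis
    return views
```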
As an example, referring to FIG. 3, in an embodiment of the present application, acquiring the model training data includes:

Step 2221: acquiring a three-dimensional model of the training object, and projecting the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives, the plurality of plane images of different perspectives including at least two of a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image; and

Step 224: generating the same number of labels as the plane images, and marking the plurality of plane images of different perspectives with the corresponding labels respectively.

Specifically, still referring to FIG. 3, the plurality of plane images of different perspectives may be set to include a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image. For example, the open-source toolkit ShapeNet-Viewer may be used to batch-generate plane rendering images of the aforementioned five perspectives for a specific object category. Five labels are then generated, and the plane images of the five different perspectives are marked with the corresponding labels respectively, to obtain the model training data. The labels may be set as encoding vectors; for example, the front view plane image may be labeled 00010, the left oblique side view plane image 10000, the right oblique side view plane image 00100, the side view plane image 01000, and the top view plane image 00001, so that five plane images of different perspectives marked with the corresponding encoding vectors are obtained as the model training data. A pre-designed generative adversarial network model can thus be trained with the model training data to obtain a perspective conversion network model, effectively reducing the complexity of training the generative adversarial network model and lowering the risk of overfitting of the perspective conversion network model.
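A minimal sketch of this labeling scheme follows; the dictionary simply restates the encoding vectors given in the text, while the pairing helper and its names are illustrative assumptions.

```python
# One-hot encoding vectors for the five perspectives, as given in the text.
VIEW_LABELS = {
    "front":         [0, 0, 0, 1, 0],   # 00010
    "left_oblique":  [1, 0, 0, 0, 0],   # 10000
    "right_oblique": [0, 0, 1, 0, 0],   # 00100
    "side":          [0, 1, 0, 0, 0],   # 01000
    "top":           [0, 0, 0, 0, 1],   # 00001
}

def make_training_pairs(images_by_view):
    """Pair each rendered plane image with the encoding vector of its
    perspective, yielding the (image, label) samples of the training data."""
    return [(image, VIEW_LABELS[view])
            for view, images in images_by_view.items()
            for image in images]
```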
As an example, in an embodiment of the present application, the labels include encoding vectors, and the generative adversarial network model includes a generator G and a discriminator D; obtaining the perspective conversion network model includes the following steps:

Step 241: acquiring a preset input image x and a preset input encoding vector c of the training object;

Step 242: obtaining, according to the preset input image x and the preset input encoding vector c, a pre-generated image G(x,c) output by the generator G;

Step 243: determining an adversarial loss L_adv according to the pre-generated image G(x,c) and the probability distribution D(x) of the discriminator D, where the adversarial loss L_adv is defined as follows:

L_adv = E_x[log D(x)] + E_{x,c}[log(1 - D(G(x,c)))]; and

Step 244: computing the target value of the adversarial loss L_adv, such that the probability distribution D(x) of the discriminator D attains its maximum value while the pre-generated image G(x,c) attains its minimum value.

By training the generator G and the discriminator D in an adversarial manner, the generator G is trained to output high-quality target images to the discriminator D, and the discriminator D is trained to determine, with high probability, that a target image is an image synthesized by the generator. Through the joint generative adversarial network, the adversarial loss constrains the model: the target value of the adversarial loss L_adv is computed such that the pre-generated image G(x,c) attains its minimum value while the probability distribution D(x) of the discriminator D attains its maximum value, so as to effectively improve the perceptual quality of the output image.
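As a non-authoritative sketch, the adversarial objective above can be written in PyTorch as follows, assuming the discriminator D outputs a probability in (0, 1):

```python
import torch

def adversarial_losses(G, D, x, c):
    """Sketch of L_adv = E_x[log D(x)] + E_{x,c}[log(1 - D(G(x,c)))]:
    the discriminator ascends L_adv, while the generator descends the
    term it controls through the pre-generated image G(x, c)."""
    fake = G(x, c)                       # pre-generated image G(x, c)
    # Discriminator step: maximize L_adv, i.e. minimize its negation.
    l_adv = torch.log(D(x)).mean() + torch.log(1 - D(fake.detach())).mean()
    d_loss = -l_adv
    # Generator step: minimize log(1 - D(G(x, c))).
    g_loss = torch.log(1 - D(fake)).mean()
    return d_loss, g_loss
```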
As an example, in an embodiment of the present application, obtaining the pre-generated image G(x,c) output by the generator G according to the preset input image x and the preset input encoding vector c includes the following steps:

Step 2421: obtaining a high-dimensional feature of a preset dimension according to the preset input encoding vector c;

Step 2422: generating a feature vector according to the high-dimensional feature of the preset dimension; and

Step 2423: inputting the preset input image x and the feature vector into the generator G, so that the generator G generates the pre-generated image G(x,c).

As an example, in an embodiment of the present application, the generator G may be set to include a first convolutional layer, a second convolutional layer, residual modules and a deconvolutional layer arranged in sequence, the first convolutional layer including one convolution kernel of size 7×7, the second convolutional layer including two convolution kernels of size 3×3 with stride 2, the number of residual modules being 9, and the deconvolutional layer including two convolution kernels of size 4×4 with stride 2. Taking this generator G as an example, the technical principle implemented by the present application is illustrated as follows. An RGB image with a resolution of 128×128 and a 5-dimensional encoding vector may be input into the generator G; the encoding vector passes through two fully connected layers to obtain a high-dimensional feature of dimension 1024, from which a 32×32 low-dimensional feature is generated; this low-dimensional feature and the features of the input image after three convolutional layers are then output via the CONCAT function, thereby obtaining a generator capable of generating high-quality target images.
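The following PyTorch sketch is one possible reading of that description; the kernel sizes, strides, module counts and the 1024-dimensional/32×32 label path follow the text, while the channel widths, paddings, activations and the hidden size of the first fully connected layer are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One residual module: two 3x3 convolutions with a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, ch=64, label_dim=5):
        super().__init__()
        self.encode = nn.Sequential(            # 128x128 -> 32x32 features
            nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 4 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.embed = nn.Sequential(             # 5-dim code -> 1024 dims
            nn.Linear(label_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1024))
        self.res = nn.Sequential(*[ResBlock(4 * ch + 1) for _ in range(9)])
        self.decode = nn.Sequential(            # 32x32 -> 128x128 RGB
            nn.ConvTranspose2d(4 * ch + 1, 2 * ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(2 * ch, 3, 4, stride=2, padding=1),
            nn.Tanh())

    def forward(self, x, c):
        f = self.encode(x)                      # (B, 4*ch, 32, 32)
        l = self.embed(c).view(-1, 1, 32, 32)   # 1024 dims -> one 32x32 map
        return self.decode(self.res(torch.cat([f, l], dim=1)))
```

For example, `Generator()(torch.randn(1, 3, 128, 128), torch.randn(1, 5))` returns a 1×3×128×128 tensor, matching the 128×128 RGB input resolution described above.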
As an example, in an embodiment of the present application, obtaining the perspective conversion network model further includes the following steps:

Step 2424: obtaining the probability distribution D(c'|x) of the original encoding vector c' of the discriminator D;

Step 2425: determining a domain classification loss of the real image, L_cls^r, according to the probability distribution D(c'|x) of the original encoding vector c', and determining a domain classification loss of the pre-generated image, L_cls^f, according to the pre-generated image G(x,c), where the domain classification loss of the real image L_cls^r and the domain classification loss of the pre-generated image L_cls^f are respectively defined as follows:

L_cls^r = E_{x,c'}[-log D(c'|x)];

L_cls^f = E_{x,c}[-log D(c|G(x,c))]; and

Step 2426: computing the minimum value of the domain classification loss of the real image L_cls^r and the minimum value of the domain classification loss of the pre-generated image L_cls^f, to obtain the perspective conversion network model.

Through the joint generative adversarial network, the perspective conversion network model is obtained by computing the minimum value of the domain classification loss of the real image L_cls^r and the minimum value of the domain classification loss of the pre-generated image L_cls^f; the adversarial loss constraint model is used to optimize the perspective conversion network model, so as to effectively improve the perceptual quality of the output image.
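A sketch of these classification terms follows, assuming the discriminator exposes an auxiliary classification head `D_cls` that returns per-perspective logits; how the terms are weighted into the adversarial objectives is not specified in the text, so the combination noted in the comments is only a common convention.

```python
import torch.nn.functional as F

def domain_classification_losses(G, D_cls, x, c_vec, c_idx, c_orig_idx):
    """L_cls^r = E_{x,c'}[-log D(c'|x)]    (real image, original view c')
       L_cls^f = E_{x,c}[-log D(c|G(x,c))] (pre-generated image, target c).
    c_vec is the encoding vector fed to G; c_idx and c_orig_idx are the
    integer indices of the target and original perspectives."""
    l_cls_r = F.cross_entropy(D_cls(x), c_orig_idx)       # minimized by D
    l_cls_f = F.cross_entropy(D_cls(G(x, c_vec)), c_idx)  # minimized by G
    # One common combination (an assumption, not stated in the text):
    #   D minimizes  d_loss + lam * l_cls_r
    #   G minimizes  g_loss + lam * l_cls_f
    return l_cls_r, l_cls_f
```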
As an example, in an embodiment of the present application, the target object is a wafer. By training the pre-designed generative adversarial network model, the multi-view conversion relationships among different perspectives of the wafer are deeply learned to obtain a wafer perspective conversion network model; the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of plane images of the wafer from different expected perspectives. The wafer can thus be three-dimensionally reconstructed from the plurality of plane images of different expected perspectives, and a three-dimensional image of the wafer can be obtained, so as to effectively identify the three-dimensional features of the wafer.
Further, referring to FIG. 4, in an embodiment of the present application, a fault determination method is provided, including the following steps:

Step 32: acquiring model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;

Step 34: training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;

Step 36: inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;

Step 38: generating a stereoscopic image of the target object according to each of the expected perspective plane images; and

Step 310: determining, according to the stereoscopic image, whether the target object has a fault.

Specifically, still referring to FIG. 4, a generative adversarial network model is designed in advance, and the acquired model training data, including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives, are input into the generative adversarial network model; the model is trained so that it deeply learns the multi-view conversion relationships among the different perspectives of the training object, to obtain a perspective conversion network model. The plane image of the target object and the label corresponding to the expected perspective are then input into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives. The target object is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images, and a three-dimensional image of the target object is obtained, whereby image features of the invisible part of the target object are obtained, and whether the target object has a fault is determined according to the three-dimensional image features of the target object, so as to effectively improve the efficiency and intelligence of fault determination.
Further, referring to FIG. 5, in an embodiment of the present application, the target object is a wafer; determining, according to the stereoscopic image, whether the target object has a fault includes the following step:

Step 311: determining, according to the stereoscopic image, whether the wafer has a defect.

Specifically, still referring to FIG. 5, by training the pre-designed generative adversarial network model, the multi-view conversion relationships among different perspectives of the wafer are deeply learned to obtain a wafer perspective conversion network model; the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of plane images of the wafer from different expected perspectives; the wafer is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images to obtain a three-dimensional image of the wafer, whereby image features of the invisible part of the wafer can be obtained, and whether the wafer has a defect is determined according to the three-dimensional image features of the wafer, so as to effectively improve the efficiency and intelligence of defective wafer identification.
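Assembled end to end, the wafer inspection flow described in this embodiment might look like the sketch below; every name is illustrative, and `reconstruct_3d` and `has_defect` stand in for the unspecified three-dimensional reconstruction and defect-judgment steps.

```python
import torch

VIEWS = ["front", "left_oblique", "right_oblique", "side", "top"]

def inspect_wafer(model, wafer_image, labels, reconstruct_3d, has_defect):
    """From a single-perspective wafer image, generate the expected
    perspective plane images, rebuild a stereoscopic (3D) image from
    them, and judge whether the wafer has a defect."""
    with torch.no_grad():
        views = {v: model(wafer_image, labels[v]) for v in VIEWS}
    volume = reconstruct_3d(views)   # 3D reconstruction from the views
    return has_defect(volume)        # defect judgment on the 3D image
```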
It should be understood that, although the steps in the flowcharts of FIGS. 1-5 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and these steps may be executed in other orders. Moreover, although at least some of the steps in FIGS. 1-5 may include multiple sub-steps or stages, these sub-steps or stages are not necessarily completed at the same moment, but may be executed at different moments; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
Further, referring to FIG. 6, in an embodiment of the present application, an image perspective conversion device 10 is provided, including a model training data acquisition module 12, a perspective conversion network model acquisition module 14 and an expected perspective plane image generation module 16. The model training data acquisition module 12 is configured to acquire model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 is configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and the expected perspective plane image generation module 16 is configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.

Specifically, still referring to FIG. 6, the model training data is acquired through the model training data acquisition module 12, the model training data including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 trains the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model; and after the expected perspective plane image generation module 16 inputs the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, the perspective conversion network model generates the expected perspective plane image of the target object, the target object is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images, and a three-dimensional image of the target object is obtained, so as to effectively identify the three-dimensional features of the target object.
Further, referring to FIG. 7, in an embodiment of the present application, the model training data acquisition module 12 includes a plane image acquisition module 122 and a label image generation module 124; the plane image acquisition module 122 is configured to acquire a three-dimensional model of the training object and project the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives; the label image generation module 124 is configured to generate the same number of encoding vectors as the plane images and mark the plurality of plane images of different perspectives with the corresponding labels respectively.

Specifically, still referring to FIG. 7, the three-dimensional model of the training object is acquired through the plane image acquisition module 122 and projected to obtain a plurality of plane images of the training object from different perspectives; the label image generation module 124 then generates, according to the number of the plane images of different perspectives, the same number of labels as the plane images, and marks the plurality of plane images of different perspectives with the corresponding labels respectively, to obtain the model training data. A pre-designed generative adversarial network model can thus be trained with the model training data to obtain a perspective conversion network model, so that at least one expected perspective plane image of the target object is obtained from a single-perspective plane image of the target object, and image features of the target object's invisible perspectives are obtained.

As an example, in an embodiment of the present application, the plurality of plane images of different perspectives may be set to include a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image. For example, the open-source toolkit ShapeNet-Viewer may be used to batch-generate plane rendering images of the aforementioned five perspectives for a specific object category. Five labels are then generated, and the plane images of the five different perspectives are marked with the corresponding labels respectively, to obtain the model training data. The labels may be set as encoding vectors; for example, the front view plane image may be labeled 00010, the left oblique side view plane image 10000, the right oblique side view plane image 00100, the side view plane image 01000, and the top view plane image 00001, so that five plane images of different perspectives marked with the corresponding encoding vectors are obtained as the model training data. A pre-designed generative adversarial network model can thus be trained with the model training data to obtain a perspective conversion network model, effectively reducing the complexity of training the generative adversarial network model and lowering the risk of overfitting of the perspective conversion network model.
As an example, in an embodiment of the present application, the target object is a wafer. A plurality of plane images of the wafer from different expected perspectives are obtained according to a single-perspective plane image of the wafer, the wafer is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images, and a three-dimensional image of the wafer is obtained, so as to effectively identify the three-dimensional features of the wafer.

For specific limitations on the image perspective conversion device, reference may be made to the limitations on the image perspective conversion method above, which are not repeated here.
Further, referring to FIG. 8, in an embodiment of the present application, a fault determination device 30 is provided, including a model training data acquisition module 12, a perspective conversion network model acquisition module 14, an expected perspective plane image generation module 16, a stereoscopic image generation module 308 and a fault determination module 309. The model training data acquisition module 12 is configured to acquire model training data, the model training data including a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 is configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; the expected perspective plane image generation module 16 is configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives; the stereoscopic image generation module 308 is configured to generate a stereoscopic image of the target object according to each of the expected perspective plane images; and the fault determination module 309 is configured to determine, according to the stereoscopic image, whether the target object has a fault.

Specifically, still referring to FIG. 8, the model training data is acquired through the model training data acquisition module 12, the model training data including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives, wherein the labels corresponding to different perspectives are different; the perspective conversion network model acquisition module 14 trains the pre-designed generative adversarial network model according to the model training data to obtain the perspective conversion network model; after the expected perspective plane image generation module 16 inputs the plane image of the target object and the label corresponding to the expected perspective into the perspective conversion network model, the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives; the stereoscopic image generation module 308 then generates a stereoscopic image of the target object according to each expected perspective plane image, so that the fault determination module 309 can determine, according to the stereoscopic image, whether the target object has a fault, thereby effectively improving the efficiency and intelligence of fault determination.

Further, referring to FIG. 9, in an embodiment of the present application, the target object is a wafer; the fault determination module 309 includes a defect determination module 3091, and the defect determination module 3091 is configured to determine, according to the stereoscopic image, whether the wafer has a defect. In this embodiment, by training the pre-designed generative adversarial network model, the multi-view conversion relationships among different perspectives of the wafer are deeply learned to obtain a wafer perspective conversion network model; the plane image of the wafer and the label corresponding to the expected perspective are then input into the wafer perspective conversion network model, so that the wafer perspective conversion network model generates a plurality of plane images of the wafer from different expected perspectives; the wafer is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images to obtain a three-dimensional image of the wafer, whereby image features of the invisible part of the wafer are obtained, and whether the wafer has a defect is determined according to the three-dimensional image features of the wafer, so as to effectively improve the efficiency and intelligence of defective wafer identification.

For specific limitations on the fault determination device, reference may be made to the limitations on the fault determination method above, which are not repeated here.
Further, referring to FIG. 10, in an embodiment of the present application, a computer device is provided, including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method described in any one of the embodiments of the present application.

Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.

Further, in an embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method described in any one of the embodiments of the present application.
Specifically, in the computer device or the computer-readable storage medium of the above embodiments, a generative adversarial network model is designed in advance, and the acquired model training data, including a plurality of plane images of the training object from different perspectives and the labels corresponding to the perspectives, are input into the generative adversarial network model, which is trained so that it deeply learns the multi-view conversion relationships among the different perspectives of the training object, to obtain a perspective conversion network model; the plane image of the target object and the label corresponding to the expected perspective are then input into the perspective conversion network model, so that the perspective conversion network model generates the expected perspective plane image of the target object. At least one expected perspective plane image of the target object is obtained from the single-perspective plane image of the target object, the target object is three-dimensionally reconstructed from the obtained plurality of different expected perspective plane images, and a three-dimensional image of the target object is obtained, so as to effectively identify the three-dimensional features of the target object.

Those of ordinary skill in the art can understand that all or part of the procedures in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, the computer program can include the procedures of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in these combinations, they should all be considered within the scope of this specification.

The above embodiments only express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that, for those of ordinary skill in the art, several variations and improvements can be made without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (18)

  1. An image perspective conversion method, comprising:
    acquiring model training data, the model training data comprising a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
    training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and
    inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.
  2. The method according to claim 1, wherein acquiring the model training data comprises:
    acquiring a three-dimensional model of the training object, and projecting the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives; and
    generating the same number of labels as the plane images, and marking the plurality of plane images of different perspectives with the corresponding labels respectively.
  3. The method according to claim 2, wherein the plurality of plane images of different perspectives comprise at least two of a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image.
  4. The method according to claim 2 or 3, wherein the labels comprise encoding vectors, and the generative adversarial network model comprises a generator G and a discriminator D;
    obtaining the perspective conversion network model comprises:
    acquiring a preset input image x and a preset input encoding vector c of the training object;
    obtaining, according to the preset input image x and the preset input encoding vector c, a pre-generated image G(x,c) output by the generator G;
    determining an adversarial loss L_adv according to the pre-generated image G(x,c) and the probability distribution D(x) of the discriminator D, wherein the adversarial loss L_adv is defined as follows:
    L_adv = E_x[log D(x)] + E_{x,c}[log(1 - D(G(x,c)))]; and
    computing a target value of the adversarial loss L_adv, such that the probability distribution D(x) of the discriminator D attains its maximum value while the pre-generated image G(x,c) attains its minimum value.
  5. The method according to claim 4, wherein obtaining, according to the preset input image x and the preset input encoding vector c, the pre-generated image G(x,c) output by the generator G comprises:
    obtaining a high-dimensional feature of a preset dimension according to the preset input encoding vector c;
    generating a feature vector according to the high-dimensional feature of the preset dimension; and
    inputting the preset input image x and the feature vector into the generator G, so that the generator G generates the pre-generated image G(x,c).
  6. The method according to claim 4, wherein obtaining the perspective conversion network model further comprises:
    obtaining the probability distribution D(c'|x) of the original encoding vector c' of the discriminator D;
    determining a domain classification loss of the real image, L_cls^r, according to the probability distribution D(c'|x) of the original encoding vector c', and determining a domain classification loss of the pre-generated image, L_cls^f, according to the pre-generated image G(x,c), wherein the domain classification loss of the real image L_cls^r and the domain classification loss of the pre-generated image L_cls^f are respectively defined as follows:
    L_cls^r = E_{x,c'}[-log D(c'|x)];
    L_cls^f = E_{x,c}[-log D(c|G(x,c))]; and
    computing the minimum value of the domain classification loss of the real image L_cls^r and the minimum value of the domain classification loss of the pre-generated image L_cls^f, to obtain the perspective conversion network model.
  7. The method according to claim 4, wherein the generator comprises, arranged in sequence:
    a first convolutional layer comprising one convolution kernel of size 7×7;
    a second convolutional layer comprising two convolution kernels of size 3×3 with stride 2;
    residual modules, 9 in number; and
    a deconvolutional layer comprising two convolution kernels of size 4×4 with stride 2.
  8. The method according to any one of claims 1-3, wherein the target object is a wafer.
  9. A fault determination method, comprising:
    acquiring model training data, the model training data comprising a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
    training a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;
    inputting a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;
    generating a stereoscopic image of the target object according to each of the expected perspective plane images; and
    determining, according to the stereoscopic image, whether the target object has a fault.
  10. The method according to claim 9, wherein the target object is a wafer;
    determining, according to the stereoscopic image, whether the target object has a fault comprises:
    determining, according to the stereoscopic image, whether the wafer has a defect.
  11. An image perspective conversion device, comprising:
    a model training data acquisition module configured to acquire model training data, the model training data comprising a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
    a perspective conversion network model acquisition module configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model; and
    an expected perspective plane image generation module configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates an expected perspective plane image of the target object.
  12. The device according to claim 11, wherein the model training data acquisition module comprises:
    a plane image acquisition module configured to acquire a three-dimensional model of the training object and project the three-dimensional model to obtain a plurality of plane images of the training object from different perspectives; and
    a label image generation module configured to generate the same number of encoding vectors as the plane images and mark the plurality of plane images of different perspectives with the corresponding labels respectively.
  13. The device according to claim 12, wherein the target object is a wafer.
  14. The device according to claim 12, wherein the plurality of plane images of different perspectives comprise at least two of a front view plane image, a left oblique side view plane image, a right oblique side view plane image, a side view plane image and a top view plane image.
  15. A fault determination device, comprising:
    a model training data acquisition module configured to acquire model training data, the model training data comprising a plurality of plane images of a training object from different perspectives and a label corresponding to each of the perspectives, wherein the labels corresponding to different perspectives are different;
    a perspective conversion network model acquisition module configured to train a pre-designed generative adversarial network model according to the model training data to obtain a perspective conversion network model;
    an expected perspective plane image generation module configured to input a plane image of a target object and a label corresponding to an expected perspective into the perspective conversion network model, so that the perspective conversion network model generates a plurality of plane images of the target object from different expected perspectives;
    a stereoscopic image generation module configured to generate a stereoscopic image of the target object according to each of the expected perspective plane images; and
    a fault determination module configured to determine, according to the stereoscopic image, whether the target object has a fault.
  16. The device according to claim 15, wherein the target object is a wafer;
    the fault determination module comprises a defect determination module configured to determine, according to the stereoscopic image, whether the wafer has a defect.
  17. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1-10.
  18. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.
PCT/CN2021/103576 2021-01-25 2021-06-30 Image view angle conversion/fault determination method, device, apparatus and medium WO2022156151A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/447,426 US11956407B2 (en) 2021-01-25 2021-09-12 Image view angle conversion/fault determination method and device, apparatus and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110098842.5A 2021-01-25 Image view angle conversion/fault determination method, device, apparatus and medium
CN202110098842.5 2021-01-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/447,426 Continuation US11956407B2 (en) 2021-01-25 2021-09-12 Image view angle conversion/fault determination method and device, apparatus and medium

Publications (1)

Publication Number Publication Date
WO2022156151A1 (zh)

Family

ID=82460264

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103576 Image view angle conversion/fault determination method, device, apparatus and medium WO2022156151A1 (zh)

Country Status (2)

Country Link
CN (1) CN114792298A (zh)
WO (1) WO2022156151A1 (zh)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629823A (zh) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 (Beijing Jingdong Shangke Information Technology Co., Ltd.) Method and device for generating multi-view images
WO2019237860A1 (zh) * 2018-06-15 2019-12-19 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.) Image annotation method and device
CN110189278A (zh) * 2019-06-06 2019-08-30 上海大学 (Shanghai University) Binocular scene image inpainting method based on a generative adversarial network
CN110363163A (zh) * 2019-07-18 2019-10-22 电子科技大学 (University of Electronic Science and Technology of China) Azimuth-controllable SAR target image generation method

Also Published As

Publication number Publication date
CN114792298A (zh) 2022-07-26

Similar Documents

Publication Publication Date Title
TWI709106B (zh) Indoor scene structure estimation system based on a deep learning network and estimation method thereof
Ji et al. Deep view morphing
Ley et al. Syb3r: A realistic synthetic benchmark for 3d reconstruction from images
US20200057778A1 (en) Depth image pose search with a bootstrapped-created database
Liu et al. Pseudo-lidar point cloud interpolation based on 3d motion representation and spatial supervision
CN112967373B (zh) Face image feature encoding method based on nonlinear 3DMM
WO2022198684A1 (en) Methods and systems for training quantized neural radiance field
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN113066171A (zh) Face image generation method based on a three-dimensional face morphable model
CN116418961B (zh) Light field display method and system based on three-dimensional scene stylization
US11908067B1 (en) Method and device for gigapixel-level light field intelligent reconstruction of large-scale scene
Dundar et al. Fine detailed texture learning for 3d meshes with generative models
Hani et al. Continuous object representation networks: Novel view synthesis without target view supervision
Chen et al. Transformer-based 3d face reconstruction with end-to-end shape-preserved domain transfer
Song et al. Weakly-supervised stitching network for real-world panoramic image generation
JP2024510230A (ja) Multi-view neural human prediction using an implicit differentiable renderer for facial expression, body pose shape and clothing performance capture
Ibrahim et al. MVPCC-Net: multi-view based point cloud completion network for MLS data
WO2022156151A1 (zh) Image view angle conversion/fault determination method, device, apparatus and medium
Yu et al. A framework for automatic and perceptually valid facial expression generation
CN114742954A (zh) Method for constructing large-scale diversified pairs of face images and model data
Teng et al. Blind face restoration via multi-prior collaboration and adaptive feature fusion
US11956407B2 (en) Image view angle conversion/fault determination method and device, apparatus and medium
Ibrahim et al. Multi-view based 3d point cloud completion algorithm for vehicles
Nguyen et al. Guiding Image Manipulations using Shape‐appearance Subspaces from Co‐alignment of Image Collections
CN116168137B (zh) Novel view synthesis method and device based on a neural radiance field, and memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920532

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920532

Country of ref document: EP

Kind code of ref document: A1