WO2023061195A1 - Training method for image acquisition model, image detection method, apparatus, and device - Google Patents

Training method for image acquisition model, image detection method, apparatus, and device

Info

Publication number: WO2023061195A1
Authority: WO (WIPO PCT)
Prior art keywords: image, feature, space, electronic device, sub
Application number: PCT/CN2022/121317
Other languages: English (en), French (fr)
Inventors: Qin Chenchen (秦陈陈), Wu Dasheng (吴大盛), Yao Jianhua (姚建华)
Original Assignee: Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Application filed by Tencent Technology (Shenzhen) Company Limited
Priority to EP22880135.3A (published as EP4318314A1)
Publication of WO2023061195A1
Priority to US18/199,872 (published as US20230298324A1)

Classifications

    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/04 Neural network architecture, e.g. interconnection topology
    • G06N 3/08 Neural network learning methods
    • G06V 10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/82 Image or video recognition or understanding using neural networks

Definitions

  • the embodiments of the present application relate to the technical field of image processing, and in particular to a training method for an image acquisition model, an image detection method, an apparatus, and a device.
  • the Hough transform is one of the techniques of image detection. Based on the principle of the Hough transform, the planes, straight lines, ellipses, and other geometric shapes of objects in an image can be detected for subsequent analysis and processing.
  • Embodiments of the present application provide a training method for an image acquisition model, an image detection method, a device, and equipment, which can improve the accuracy of a heat map of an image in a Hough space.
  • the technical solution includes the following content.
  • the embodiment of the present application provides a method for training an image acquisition model, the method comprising:
  • the electronic device acquires a label image pair of the sample image, the label image pair includes a first label image and a second label image, the first label image is a heat map of the sample image in image space, and the second label image is the heat map of the sample image in Hough space;
  • the electronic device acquires a predicted image pair of the sample image according to a first network model, the predicted image pair includes a first predicted image and a second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model;
  • the electronic device adjusts the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases;
  • the electronic device determines the second network model as an image acquisition model in response to satisfying a training termination condition.
  • an embodiment of the present application provides an image detection method, the method comprising:
  • the electronic device acquires the image to be detected
  • the electronic device acquires a target image of the image to be detected according to an image acquisition model, and the target image is a heat map of the image to be detected in Hough space;
  • the electronic device determines an image detection result of the image to be detected based on the target image
  • the training process of the image acquisition model includes:
  • acquiring a label image pair of a sample image, wherein the label image pair includes a first label image and a second label image, the first label image is a heat map of the sample image in image space, and the second label image is a heat map of the sample image in Hough space;
  • acquiring a predicted image pair of the sample image according to a first network model, wherein the predicted image pair includes a first predicted image and a second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model;
  • adjusting the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases; and
  • in response to satisfying a training termination condition, determining the second network model as the image acquisition model.
  • an embodiment of the present application provides a training device for an image acquisition model, the device comprising:
  • the first acquisition module is configured to acquire a label image pair of a sample image, the label image pair includes a first label image and a second label image, the first label image is a heat map of the sample image in image space, and the second label image is a heat map of the sample image in Hough space;
  • the second acquisition module is configured to acquire a predicted image pair of the sample image according to a first network model, the predicted image pair includes a first predicted image and a second predicted image, the first predicted image is a heat map of the sample image in the image space obtained by the first network model, and the second predicted image is a heat map of the sample image in the Hough space obtained by the first network model;
  • an adjustment module configured to adjust the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases;
  • a determining module configured to determine the second network model as an image acquisition model in response to satisfying a training termination condition.
  • an embodiment of the present application provides an image detection device, the device comprising:
  • the first acquisition module is used to acquire the image to be detected
  • the second acquisition module is used to acquire the target image of the image to be detected according to the image acquisition model, and the target image is a heat map of the image to be detected in Hough space;
  • a determining module configured to determine an image detection result of the image to be detected based on the target image
  • the training process of the image acquisition model includes:
  • acquiring a label image pair of a sample image, wherein the label image pair includes a first label image and a second label image, the first label image is a heat map of the sample image in image space, and the second label image is a heat map of the sample image in Hough space;
  • acquiring a predicted image pair of the sample image according to a first network model, wherein the predicted image pair includes a first predicted image and a second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model;
  • adjusting the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases; and
  • in response to satisfying a training termination condition, determining the second network model as the image acquisition model.
  • an embodiment of the present application provides an electronic device, the electronic device includes a processor and a memory, at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor, so that the electronic device implements any of the above-mentioned image acquisition model training methods or any of the above-mentioned image detection methods.
  • a computer-readable storage medium is also provided, wherein at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded and executed by a processor, so that the electronic device implements any of the above-mentioned methods for training an image acquisition model or any of the above-mentioned image detection methods.
  • a computer program or computer program product is also provided, which includes at least one computer instruction, and the at least one computer instruction is loaded and executed by a processor, so that the electronic device implements any of the above-mentioned methods for training an image acquisition model or any of the above-mentioned image detection methods.
  • in the technical solution provided by the embodiments of the present application, the image acquisition model is trained based on both the heat map of the sample image in Hough space and the heat map of the sample image in image space, so that the image acquisition model learns the key point features of the Hough space as well as the semantic features of the image space. The model can therefore quickly and accurately determine the heat map of an image in Hough space, and an accurate image detection result can be determined based on that heat map, thereby improving the accuracy of subsequent analysis and processing.
  • FIG. 1 is a schematic diagram of an implementation environment of a training method for an image acquisition model or an image detection method provided in an embodiment of the present application;
  • Fig. 2 is a flow chart of a training method for an image acquisition model provided by an embodiment of the present application
  • Fig. 3 is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of another image provided by the embodiment of the present application.
  • Fig. 5 is a schematic diagram of another image provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a fast deep Hough transform network provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of an inverse deep Hough transform provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a first network model provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a second fusion feature determination method provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another first network model provided by the embodiment of the present application.
  • FIG. 11 is a flow chart of an image detection method provided by an embodiment of the present application.
  • Fig. 12 is a schematic diagram of processing a brain scan image provided by an embodiment of the present application.
  • Fig. 13 is a schematic diagram of processing a fetal scan image provided by an embodiment of the present application.
  • Fig. 14 is a schematic diagram of processing a cell image provided in the embodiment of the present application.
  • Fig. 15 is a schematic diagram of processing a photographic image provided by an embodiment of the present application.
  • Fig. 16 is a schematic structural diagram of a training device for an image acquisition model provided by an embodiment of the present application.
  • Fig. 17 is a schematic structural diagram of an image detection device provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • Fig. 19 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Fig. 20 is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 21 is a schematic diagram of another image provided by the embodiment of the present application.
  • Fig. 22 is another schematic diagram of an image provided by the embodiment of the present application.
  • Fig. 23 is a schematic diagram of another inverse deep Hough transform provided by an embodiment of the present application.
  • Image space is the three-dimensional geometric space of the image.
  • Hough Space is the polar coordinate space (also called parameter space) of the image.
  • the Hough Transform is a feature extraction method: given the equation of a geometric shape such as a straight line, ellipse, or plane, votes are accumulated in the polar coordinate (parameter) space, and geometric patterns are located and detected by finding the local peak points in the accumulated space. According to the equation used, the Hough transform can be divided into the Line Hough Transform (LHT), the Plane Hough Transform (PHT), and the Elliptic Hough Transform (EHT).
  • the Deep Hough Transform (DHT) network is a Convolutional Neural Network (CNN) integrated with the Hough Transform method.
  • the characteristics of the sample image in the Hough space are determined based on the characteristics of the sample image in the image space, so as to obtain the heat map of the sample image in the Hough space.
  • the DHT network is divided into the three-dimensional deep Hough transform (3-dimension Deep Hough Transform, 3D DHT) network and the two-dimensional deep Hough transform (2-dimension Deep Hough Transform, 2D DHT) network: the 3D DHT network is used to process three-dimensional sample images, and the 2D DHT network is used to process two-dimensional sample images.
  • the Inverse Deep Hough Transform (IDHT) network is a CNN integrated with the Inverse Hough Transform method.
  • the IDHT network is used to determine the characteristics of the sample image in the image space based on the characteristics of the sample image in the Hough space, so as to obtain the heat map of the sample image in the image space.
  • the IDHT network is divided into the three-dimensional inverse deep Hough transform (3-dimension Inverse Deep Hough Transform, 3D IDHT) network and the two-dimensional inverse deep Hough transform (2-dimension Inverse Deep Hough Transform, 2D IDHT) network: the 3D IDHT network is used to process three-dimensional sample images, and the 2D IDHT network is used to process two-dimensional sample images.
  • the principle by which the Hough transform detects the plane of an object in an image is to use the duality of point and plane: each point in image space is transformed, through the plane expression form, into a surface in Hough space, so as to obtain a heat map of the Hough space.
  • by locating the peak points of that heat map, the target plane in the image space is obtained, that is, the plane of the object in the image is obtained.
  • in the related art, an accumulator is often used to convert each point in image space into a heat map of the Hough space based on the principle of the Hough transform, so as to determine the image detection result based on that heat map, that is, to determine the plane, straight line, or ellipse of the object in the image.
  • however, the accumulator is implemented with traditional mathematical transformations that require constant parameter iteration; it is very slow and has low precision, which affects the accuracy of subsequent analysis and processing.
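  • For reference, the classical accumulator can be sketched for the line case in a few lines of NumPy (names and bin ranges below are illustrative, not taken from the patent); the per-point voting loop makes plain why the traditional accumulator is slow:

    import numpy as np

    def hough_line_accumulator(binary_img, n_theta=180, n_rho=400):
        # Classical voting: every foreground pixel votes for each (theta, rho)
        # pair satisfying rho = x*cos(theta) + y*sin(theta).
        h, w = binary_img.shape
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        rho_max = np.hypot(h, w)
        acc = np.zeros((n_theta, n_rho))
        ys, xs = np.nonzero(binary_img)
        for x, y in zip(xs, ys):          # iterates over every point: slow
            rhos = x * np.cos(thetas) + y * np.sin(thetas)
            bins = ((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
            acc[np.arange(n_theta), bins] += 1
        return acc                        # local maxima correspond to detected lines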
  • Fig. 1 is a schematic diagram of an implementation environment of an image acquisition model training method or an image detection method provided in an embodiment of the present application.
  • the implementation environment includes an electronic device 11, and the training method for the image acquisition model or the image detection method in the embodiments of the present application can be executed by the electronic device 11.
  • the electronic device 11 may include at least one of a terminal device or a server.
  • the terminal device can be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer.
  • the server may be one server, or a server cluster composed of multiple servers, or any one of a cloud computing platform and a virtualization center, which is not limited in this embodiment of the present application.
  • the server can communicate with the terminal device through a wired network or a wireless network.
  • the server may have functions such as data processing, data storage, and data sending and receiving, which are not limited in this embodiment of the present application.
  • Artificial Intelligence comprises the theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the nature of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, and intelligent transportation.
  • Computer Vision (CV) technology is a science that studies how to make machines "see": it uses cameras and computers instead of human eyes to identify, track and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection.
  • Computer Vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D (three-dimensional) technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, and intelligent transportation, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
  • artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, intelligent customer service, the Internet of Vehicles, and intelligent transportation. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • the embodiment of the present application provides a training method for an image acquisition model. Taking the flowchart of the training method shown in FIG. 2 as an example, the method can be executed by the electronic device 11 in FIG. 1. As shown in FIG. 2, the method includes step 201 to step 204.
  • step 201 the electronic device acquires a label image pair of a sample image.
  • the label image pair includes a first label image and a second label image
  • the first label image is a heat map of the sample image in image space
  • the second label image is a heat map of the sample image in Hough space.
  • sample images include but are not limited to brain images, landscape images, road images, fetal images, cell images, etc., and the number of sample images is typically multiple.
  • the first label image is a heat map of the sample image in the image space, and the value of any pixel in the first label image represents the probability that the pixel is on the target line, the target ellipse, or the target plane.
  • the second label image is a heat map of the sample image in Hough space, and the value of any pixel in the second label image represents the probability that the pixel corresponds to a target line, target ellipse, or target plane.
  • FIG. 3 or FIG. 20 is a schematic diagram of an image provided by an embodiment of the present application.
  • the image indicated by reference numeral 301 is a sample image; the heat map of the sample image 301 in image space is shown at reference numeral 302, and the heat map of the sample image 301 in Hough space is shown at reference numeral 303, with the corresponding effect diagram shown at reference numeral 2002.
  • the image shown at reference numeral 304 is the superimposed image of the sample image 301 and the heat map 302, with the corresponding effect diagram shown at reference numeral 2001.
  • FIG. 4 or FIG. 21 is another schematic diagram of an image provided by the embodiment of the present application.
  • in FIG. 4, the heat map of the sample image 301 in the image space is a strip.
  • the image shown at reference numeral 401 is the superimposed image of the sample image and the heat map of the sample image in image space, and the corresponding effect diagram is shown at reference numeral 2101. Comparing the image shown at reference numeral 304 with the image shown at reference numeral 401, it can be clearly seen that in the image at reference numeral 304 the heat map of the sample image 301 in image space is a straight line, while in the image at reference numeral 401 it is a strip.
  • the images shown at reference numerals 402 and 403 are likewise obtained by superimposing the sample image and its heat map in image space; the corresponding effect diagrams are shown at reference numerals 2102 and 2103, respectively.
  • the three images indicated by reference numeral 401 , reference numeral 402 and reference numeral 403 are images of three different viewing angles.
  • FIG. 5 or FIG. 22 is another schematic diagram of an image provided by an embodiment of the present application.
  • the three images shown at reference numeral 501 are superimposed images, under different viewing angles, of the sample image and its heat map in image space, with the corresponding effect diagram shown at reference numeral 2201; the three images shown at reference numeral 502 are likewise superimposed images under different viewing angles, with the corresponding effect diagram shown at reference numeral 2202.
  • step 202 the electronic device acquires a predicted image pair of the sample image according to the first network model.
  • the predicted image pair includes a first predicted image and a second predicted image: the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model. Both the first predicted image and the second predicted image are heat maps predicted by the first network model.
  • the sample image is input to the first network model, and the first network model outputs the first predicted image and the second predicted image.
  • the first predicted image is a heat map of the sample image in the image space, and the value of any pixel in the first predicted image represents the probability that the pixel is on the target line or on the target ellipse or on the target plane.
  • the second predicted image is a heat map of the sample image in Hough space, and the value of any pixel in the second predicted image represents the probability that the pixel is a target line, a target ellipse, or a target plane.
  • Step 203 the electronic device adjusts the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced.
  • the smaller the difference between the predicted image pair obtained by the network model and the label image pair, the higher the accuracy of the network model. Therefore, the network model is adjusted with the goal of reducing the difference between the predicted image pair obtained by the network model and the label image pair, so as to make the adjusted network model more accurate.
  • the difference between the predicted image pair and the label image pair refers to the difference between the first label image and the first predicted image and the difference between the second label image and the second predicted image.
  • the reduction of the difference between the predicted image pair obtained according to the second network model and the label image pair means: compared with the difference between the predicted image pair obtained according to the first network model and the label image pair, the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced.
  • in some embodiments, the loss value of the first network model is calculated, and the model parameters of the first network model are adjusted based on that loss value to obtain the second network model, so as to reduce the difference between the predicted image pair obtained according to the second network model and the label image pair.
  • the loss value of the first network model represents the difference between the predicted image pair and the label image pair obtained by the first network model.
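  • As a minimal sketch of this adjustment step (assuming a PyTorch-style model that returns the predicted image pair; the mean square error used here mirrors the loss described later in formula (1)):

    import torch

    def adjust_first_network_model(model, optimizer, sample, label_img, label_hough):
        # One adjustment: compute the loss between the predicted image pair and
        # the label image pair, then update the parameters so the difference
        # decreases (the updated model plays the role of the second network model).
        pred_img, pred_hough = model(sample)
        loss = (torch.mean((pred_img - label_img) ** 2)
                + torch.mean((pred_hough - label_hough) ** 2))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()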
  • step 204 the electronic device determines the second network model as the image acquisition model in response to satisfying the training termination condition.
  • the embodiment of the present application does not limit the training termination condition.
  • in some embodiments, the training termination condition is satisfied when a target number of training iterations is reached; the target number of training iterations can be set flexibly according to experience or the actual scenario, for example, 200 iterations.
  • the electronic device determines the second network model as the image acquisition model in response to satisfying the training termination condition. If the training termination condition is not satisfied, the electronic device trains the second network model based on the sample image in the manner of step 202 and step 203 until the training termination condition is satisfied, thereby obtaining the image acquisition model.
  • in the related art, the image acquisition model is usually trained using only the heat map of the image in image space, so that the model determines the heat map of the image in image space from the semantic features of the image space alone; because fewer factors are considered, the accuracy of such an image acquisition model is poor.
  • in the embodiment of the present application, the heat map of the sample image in image space represents the semantic features of the image space, and the heat map of the sample image in Hough space represents the key point features of the Hough space. Therefore, training the image acquisition model based on both heat maps enables the model to learn the key point features of the Hough space and the semantic features of the image space, so that it can quickly and accurately determine the heat map of an image in the image space as well as in the Hough space.
  • the convergence speed of the model can be accelerated.
  • because an image acquisition model trained in this way learns features of different spaces, it can integrate the features of multiple spaces when generating a heat map, which further improves the accuracy of the image acquisition model.
  • an accurate image detection result can be determined based on the heat map of the image in the Hough space, thereby improving the accuracy of subsequent analysis and processing.
  • in some embodiments, the electronic device acquires the predicted image pair of the sample image as follows: the electronic device acquires a first image space feature of the sample image, the first image space feature representing the feature of the sample image in image space; the electronic device determines the first predicted image based on the first image space feature; the electronic device determines a first Hough space feature based on the first image space feature, the first Hough space feature representing the feature of the sample image in Hough space; and the electronic device determines the second predicted image based on the first Hough space feature.
  • that is, the first image space feature of the sample image is extracted first. Then, on the one hand, upsampling is performed based on the first image space feature according to the first network model to obtain the first predicted image; on the other hand, a Hough transform is performed based on the first image space feature according to the first network model to obtain the first Hough space feature. Afterwards, upsampling is performed based on the first Hough space feature according to the first network model to obtain the second predicted image.
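  • In pseudo-Python, this two-branch prediction can be sketched as follows (the four callables are hypothetical stand-ins for the sub-networks described in this paragraph):

    def acquire_predicted_image_pair(backbone, upsample_img, dht, upsample_hough, sample):
        f_img = backbone(sample)           # first image space feature
        pred_1 = upsample_img(f_img)       # first predicted image (image space heat map)
        f_hough = dht(f_img)               # first Hough space feature
        pred_2 = upsample_hough(f_hough)   # second predicted image (Hough space heat map)
        return pred_1, pred_2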
  • the first network model includes a DHT network
  • the DHT network is used to perform Hough transform on the first image space feature to obtain the first Hough space feature.
  • the DHT network is a 2D DHT network
  • the DHT network is a 3D DHT network.
  • the following takes the 3D DHT network as an example to introduce the operation principle of the 3D DHT network in detail.
  • the image acquisition model mainly detects a certain plane of the three-dimensional sample image (such as the midline plane of the brain, the standard plane of the fetus, etc.).
  • in the three-dimensional case, the feature of the sample image in the image space belongs to R^(H×W×D), and the feature of the sample image in the Hough space belongs to R^(Φ×Θ×Ρ). A plane in image space can be written as Ax + By + Cz + D = 0, or in normal (Hough) form as x·sinφ·cosθ + y·sinφ·sinθ + z·cosφ = ρ, where A, B, C and D are four parameters, x, y and z are three variables, φ, θ and ρ are three parameters, R denotes the real numbers, H is height, W is width, and D is depth.
  • the Hough transform converts each point in the image space into a surface in the Hough space; accumulating the surfaces of all points yields the Hough key point heat map (that is, the heat map of the image in the Hough space).
  • the maximum value points in the Hough key point heat map correspond to φ (i.e. phi), θ (i.e. theta) and ρ (i.e. rho).
  • the 3D DHT network implements the Hough transform according to the following implementation modes A1 and A2: the algorithm of the 3D DHT network includes the forward propagation algorithm of the 3D DHT network (see implementation A1) and the backpropagation algorithm of the 3D DHT network (see implementation A2).
  • Implementation A1 (forward propagation of the 3D DHT network) takes the following inputs:
    volume // features R^(H×W×D) of the sample image in image space
    phi    // value range [phi_min, phi_max], number of samples n_phi
    theta  // value range [theta_min, theta_max], number of samples n_theta
    rho    // value range [rho_min, rho_max], number of samples n_rho
  • the embodiment of the present application obtains the features of the sample image in Hough space through implementation mode A1.
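  • Under the normal-form plane parameterization above, implementation A1 can be sketched as a naive NumPy voting loop (bin ranges are assumed; this is a reference sketch, not the optimized graphics-card implementation tested below):

    import numpy as np

    def dht3d_forward(volume, n_phi=180, n_theta=180, n_rho=160,
                      phi_rng=(0.0, np.pi), theta_rng=(0.0, np.pi)):
        # Each voxel adds its feature value to every (phi, theta, rho) bin
        # whose plane x*sin(phi)*cos(theta) + y*sin(phi)*sin(theta)
        # + z*cos(phi) = rho passes through it.
        H, W, D = volume.shape
        rho_max = float(np.linalg.norm([H, W, D]))
        out = np.zeros((n_phi, n_theta, n_rho))
        xs, ys, zs = np.indices((H, W, D))
        for i, phi in enumerate(np.linspace(*phi_rng, n_phi)):
            for j, theta in enumerate(np.linspace(*theta_rng, n_theta)):
                rho = (xs * np.sin(phi) * np.cos(theta)
                       + ys * np.sin(phi) * np.sin(theta)
                       + zs * np.cos(phi))
                bins = ((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
                np.add.at(out[i, j], bins.ravel(), volume.ravel())
        return out   # features of the sample image in Hough space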
  • Implementation A2 (backpropagation of the 3D DHT network) takes the following inputs:
    phi    // value range [phi_min, phi_max], number of samples n_phi
    theta  // value range [theta_min, theta_max], number of samples n_theta
    rho    // value range [rho_min, rho_max], number of samples n_rho
  • the embodiment of the present application obtains the features of the sample image in image space through implementation mode A2.
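  • Under the same assumptions, implementation A2 is the transpose of the forward voting: each voxel gathers the values of every Hough bin it voted into. The same gather also describes the inverse deep Hough transform (IDHT) discussed later, which is why the two share one principle:

    import numpy as np

    def dht3d_backward(grad_out, shape, phi_rng=(0.0, np.pi), theta_rng=(0.0, np.pi)):
        # grad_out: gradients over the Hough bins, shape (n_phi, n_theta, n_rho).
        n_phi, n_theta, n_rho = grad_out.shape
        H, W, D = shape
        rho_max = float(np.linalg.norm([H, W, D]))
        grad_in = np.zeros(shape)
        xs, ys, zs = np.indices(shape)
        for i, phi in enumerate(np.linspace(*phi_rng, n_phi)):
            for j, theta in enumerate(np.linspace(*theta_rng, n_theta)):
                rho = (xs * np.sin(phi) * np.cos(theta)
                       + ys * np.sin(phi) * np.sin(theta)
                       + zs * np.cos(phi))
                bins = ((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
                grad_in += grad_out[i, j][bins]   # gather from the voted bins
        return grad_in   # gradients (or features) in image space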
  • the image acquisition model mainly detects a straight line or an ellipse of the two-dimensional sample image.
  • in the two-dimensional case, the feature of the sample image in the image space belongs to R^(H×W), and the feature of the sample image in the Hough space belongs to R^(Θ×Ρ). A straight line in image space can be written as Ax + By + C = 0, or in normal form as x·cosθ + y·sinθ = ρ, where A, B and C are three parameters, x and y are two variables, θ and ρ are two parameters, R denotes the real numbers, H is height, and W is width.
  • the algorithm of the 2D DHT network includes the forward propagation algorithm of the 2D DHT network and the back propagation algorithm of the 2D DHT network.
  • the principle of the forward propagation algorithm of the 2D DHT network is similar to that of the 3D DHT network (see implementation A1 for details), and the principle of the backpropagation algorithm of the 2D DHT network is similar to that of the 3D DHT network (see implementation A2 for details); they are not repeated here.
  • the electronic device determines the first Hough space feature based on the first image space feature, including: the electronic device rectifies the first image space feature to obtain the rectified first image space feature; the electronic device based on the rectified first image space feature An image space feature is used to determine the first Hough space feature.
  • rectification refers to correcting the first image space feature.
  • the first network model includes a fast deep Hough transform (Fast DHT) network
  • the Fast DHT network includes a rectification network and a DHT network.
  • the Fast DHT network is used to reduce the amount of computation and speed up the computation.
  • FIG. 6 is a schematic diagram of a fast deep Hough transform network provided by an embodiment of the present application.
  • the first image space feature is input to the Fast DHT network, and the first image space feature is rectified by the rectification network to obtain the rectified first image space feature, and the DHT network performs Hough transformation on the rectified first image space feature , get the first Hough space feature.
  • where x is the first image space feature, and f(x) is the rectified first image space feature.
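  • The patent does not spell out f(x) at this point, so the sketch below is purely an assumption: one way a rectification step can cut the voting cost is to sparsify the feature map so that only salient activations take part in the subsequent deep Hough transform:

    import numpy as np

    def rectify(x, keep_ratio=0.05):
        # Assumed rectification: keep only the strongest activations so that
        # the DHT accumulates votes from a sparse feature map.
        thresh = np.quantile(x, 1.0 - keep_ratio)
        return np.where(x >= thresh, x, 0.0)

    def fast_dht3d(volume, **dht_kwargs):
        # Fast DHT = rectification network + DHT network; a real implementation
        # would skip the zeroed entries during voting to realize the speed-up.
        return dht3d_forward(rectify(volume), **dht_kwargs)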
  • Table 1 compares the operation times of the Fast DHT network and the DHT network tested on a graphics card; the embodiment of the present application tests the operation times of two kinds of sample images on both networks.
  • for one sample image, the image size corresponding to the heat map in image space is 400×400 and the image size corresponding to the heat map in Hough space is 180×400; the Fast DHT network corresponding to this sample image is a Fast 2D DHT network, and the corresponding DHT network is a 2D DHT network.
  • for the other sample image, the image size corresponding to the heat map in image space is 192×192×160 and the image size corresponding to the heat map in Hough space is 180×180×160; the Fast DHT network corresponding to this sample image is a Fast 3D DHT network, and the corresponding DHT network is a 3D DHT network.
  • in some embodiments, after the electronic device determines the first Hough space feature based on the first image space feature, the method further includes: the electronic device determines a second image space feature based on the first Hough space feature and the first image space feature, the second image space feature representing the feature of the sample image in image space; the electronic device determines a third predicted image based on the second image space feature, the third predicted image being a heat map of the sample image in image space obtained by the first network model; the electronic device determines a second Hough space feature based on the second image space feature, the second Hough space feature representing the feature of the sample image in Hough space; and the electronic device determines a fourth predicted image based on the second Hough space feature, the fourth predicted image being a heat map of the sample image in Hough space obtained by the first network model.
  • accordingly, the electronic device adjusts the first network model based on the label image pair, the predicted image pair, the third predicted image and the fourth predicted image to obtain the second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases, and the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair also decreases.
  • since the first network model also obtains the third predicted image and the fourth predicted image, the smaller the difference between the image pair composed of those two images and the label image pair, the more accurate the network model. Therefore, in addition to reducing the difference between the predicted image pair and the label image pair, the network model is also adjusted with the goal of reducing the difference between the image pair composed of the third and fourth predicted images and the label image pair, so as to make the adjusted network model more accurate.
  • the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair refers to the difference between the first label image and the third predicted image and the difference between the second label image and the fourth predicted image.
  • that is, the second image space feature is first determined based on the first image space feature and the first Hough space feature. Then, on the one hand, upsampling is performed based on the second image space feature according to the first network model to obtain the third predicted image; on the other hand, a Hough transform is performed based on the second image space feature according to the first network model to obtain the second Hough space feature. Afterwards, upsampling is performed based on the second Hough space feature according to the first network model to obtain the fourth predicted image.
  • the first predicted image can be obtained based on the first image space feature
  • the second predicted image can be obtained based on the first Hough space feature.
  • the second image space feature is determined based on the first image space feature and the first Hough space feature, and further processing the features improves the accuracy of the image space feature, so that the third predicted image obtained based on the second image space feature is more accurate.
  • the accuracy of the second Hough spatial feature determined based on the second image spatial feature is improved, so that the fourth predicted image obtained based on the second Hough spatial feature is more accurate.
  • the first network model is adjusted by using the first prediction image, the second prediction image, the third prediction image and the fourth prediction image, so that the model learns more information and speeds up the convergence speed of the model.
  • the electronic device determines the second image space feature based on the first Hough space feature and the first image space feature, including: the electronic device determines the third image space feature based on the first Hough space feature, and the third image space feature characterizes Features of the sample image in image space; the electronic device fuses the first image space feature and the third image space feature to obtain the first fusion feature; the electronic device determines the second image space feature based on the first fusion feature.
  • the first network model includes an IDHT network, and the IDHT network performs inverse Hough transform on the first Hough space feature to obtain the third image space feature.
  • the first Hough space feature is obtained based on the first image space feature; that is to say, in the embodiment of the present application, the first image space feature is processed first to obtain the first Hough space feature, and the first Hough space feature is then processed to obtain the third image space feature. This round trip leads to feature loss.
  • therefore, the first fusion feature is obtained by fusing the third image space feature with the first image space feature, so as to reduce the impact of the feature loss. Afterwards, the second image space feature is determined based on the first fusion feature.
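  • A minimal sketch of this fusion (element-wise addition, matching the addition-based fusion described for FIG. 8 below; idht and refine are placeholder callables):

    def second_image_space_feature(f_img_1, f_hough_1, idht, refine):
        f_img_3 = idht(f_hough_1)       # third image space feature, back in image space
        fusion_1 = f_img_1 + f_img_3    # first fusion feature offsets the feature loss
        return refine(fusion_1)         # second image space feature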
  • FIG. 7 or FIG. 23 is a schematic diagram of an inverse depth Hough transform provided by an embodiment of the present application.
  • the inverse depth Hough transform is used to transform an image from a Hough space to an image space.
  • the three images shown at reference numeral 701 are heat maps of the sample image in Hough space under different viewing angles, with the corresponding effect diagram shown at reference numeral 2301; the three images shown at reference numeral 702 are the sample image under different viewing angles, with the corresponding effect diagram shown at reference numeral 2302. Through the IDHT network, the Hough space features corresponding to the three images shown at reference numeral 701 are converted into the image space features corresponding to the three images shown at reference numeral 702.
  • likewise, the three images shown at reference numeral 703 are heat maps of the sample image in Hough space under different viewing angles, and the three images shown at reference numeral 704 are the corresponding heat maps in image space, with the effect diagram corresponding to reference numeral 704 shown at reference numeral 2304; through the IDHT network, the Hough space features corresponding to the three images shown at reference numeral 703 are converted into the image space features corresponding to the three images shown at reference numeral 704.
  • the IDHT network includes the 2D IDHT network and the 3D IDHT network, and each includes a forward propagation algorithm and a backpropagation algorithm.
  • the forward propagation algorithm of the 2D IDHT network, the forward propagation algorithm of the 3D IDHT network and the backpropagation algorithm of the 3D DHT network share the same principle; see implementation A2 above for details, which are not repeated here.
  • in some embodiments, the electronic device adjusts the first network model based on the label image pair, the predicted image pair, the third predicted image, and the fourth predicted image to obtain the second network model as follows: the electronic device obtains a first loss value based on the label image pair and the predicted image pair; the electronic device obtains a second loss value based on the label image pair, the third predicted image and the fourth predicted image; the electronic device obtains the loss value of the first network model based on the first loss value and the second loss value; and the electronic device adjusts the first network model based on the loss value of the first network model to obtain the second network model.
  • the first loss value represents the difference between the predicted image pair and the label image pair
  • the second loss value represents the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair.
  • the label image pair includes a first label image and a second label image
  • the predicted image pair includes a first predicted image and a second predicted image.
  • the electronic device calculates a first loss value based on the first label image, the second label image, the first predicted image, and the second predicted image.
  • the second loss value is calculated based on the first label image, the second label image, the third predicted image and the fourth predicted image, and the loss value of the first network model is calculated based on the first loss value and the second loss value. By training with the label image pair, the predicted image pair, the third predicted image and the fourth predicted image, the image acquisition model is obtained, so that the model is supervised in the image space and the Hough space simultaneously, which improves the accuracy of the model.
  • in some embodiments, the loss value of the first network model is calculated according to the following formula (1):

    L_total = L_s1 + L_s2,
    L_s1 = (1/N) Σ_{i=1}^{N} (P_out1,i − P_gt,i)² + (1/M) Σ_{j=1}^{M} (H_out1,j − H_gt,j)²,
    L_s2 = (1/N) Σ_{i=1}^{N} (P_out2,i − P_gt,i)² + (1/M) Σ_{j=1}^{M} (H_out2,j − H_gt,j)²    (1)

  • where L_total is the loss value of the first network model; L_s1 is the first loss value; L_s2 is the second loss value; N is the number of pixels in the heat map of the sample image in image space; M is the number of pixels in the heat map of the sample image in Hough space; P_out1 is the first predicted image; P_out2 is the third predicted image; H_out1 is the second predicted image; H_out2 is the fourth predicted image; P_gt is the first label image; and H_gt is the second label image.
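  • A direct NumPy transcription of formula (1), assuming predicted and label heat maps of matching shapes:

    import numpy as np

    def first_network_model_loss(p_out1, h_out1, p_out2, h_out2, p_gt, h_gt):
        mse = lambda a, b: np.mean((a - b) ** 2)      # mean square error
        l_s1 = mse(p_out1, p_gt) + mse(h_out1, h_gt)  # first loss value
        l_s2 = mse(p_out2, p_gt) + mse(h_out2, h_gt)  # second loss value
        return l_s1 + l_s2                            # L_total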
  • in some embodiments, the electronic device obtains the heat map of the sample image in image space and the heat map of the sample image in Hough space, obtains the first label image by performing Gaussian processing on the heat map in image space, and obtains the second label image by performing Gaussian processing on the heat map in Hough space.
  • taking a brain image as an example, the heat map of the brain image in image space is a straight line; Gaussian processing is performed through Gaussian kernel convolution to transform the straight line into a strip (that is, a line bundle), and the pixel values are normalized to [0, 1] to obtain the first label image.
  • the parameters of the Gaussian kernel convolution are not limited, for example, the parameters of the Gaussian kernel convolution are (5, 5, 5).
  • the heat map of the brain image in Hough space includes a point (this image is also the gold standard image of the brain image), and the position of the point is (phi, theta, rho). Through Gaussian processing, the point is converted into a Gaussian sphere; the size of the Gaussian sphere is not limited, for example, the size of the Gaussian sphere is (5, 5, 5).
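  • A sketch of this label generation with SciPy, treating the (5, 5, 5) parameter as the Gaussian sigma (the exact kernel parameterization is an assumption) and (phi, theta, rho) as integer bin indices:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def make_label_pair(plane_mask, phi, theta, rho, hough_shape, sigma=(5, 5, 5)):
        # First label image: blur the gold-standard line/plane mask into a
        # strip and normalize the pixel values to [0, 1].
        img_label = gaussian_filter(plane_mask.astype(float), sigma)
        img_label /= img_label.max()
        # Second label image: turn the gold-standard Hough point into a
        # Gaussian sphere, likewise normalized to [0, 1].
        hough_label = np.zeros(hough_shape)
        hough_label[phi, theta, rho] = 1.0
        hough_label = gaussian_filter(hough_label, sigma)
        hough_label /= hough_label.max()
        return img_label, hough_label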
  • the loss function shown in formula (1) is the mean square error loss function; loss functions other than the mean square error loss function may also be selected for the first network model in practical applications.
  • after the electronic device obtains the loss value of the first network model, it adjusts the model parameters of the first network model based on that loss value to obtain the second network model, and, in response to satisfying the training termination condition, determines the second network model as the image acquisition model.
  • the embodiment of the present application provides a first network model, as shown in FIG. 8 , which is a schematic diagram of the first network model provided in the embodiment of the present application.
  • the sample image is a brain scan image
  • the first label image is a heat map of the brain scan image in image space
  • the second label image is a heat map of the brain scan image in Hough space.
  • the first network model may include a backbone (Stem Block) network, an hourglass (Hourglass) network, and a 3D DHT network.
  • the brain scan image is input to the first network model, and the initial features of the brain scan image are extracted by the backbone network.
  • the backbone network may include three convolutional layers.
  • the initial features are subjected to feature processing by the hourglass network to obtain the first image space feature, wherein the hourglass network is a convolutional neural network for key point detection.
  • on the one hand, after the first image space feature undergoes 1×1×1 convolution and upsampling, the heat map of the brain scan image in image space (i.e., the first predicted image) is obtained; on the other hand, a Hough transform is performed on the first image space feature through the 3D DHT network to obtain the first Hough space feature.
  • after the first Hough space feature undergoes 1×1×1 convolution and upsampling, the heat map of the brain scan image in Hough space (i.e., the second predicted image) is obtained.
  • the electronic device obtains a loss value of the first network model based on the first label image, the second label image, the first prediction image and the second prediction image, and trains an image acquisition model based on the loss value of the first network model. Since the image acquisition model includes a 3D DHT network, the parameters of the 3D DHT network will be adjusted when training the first network model.
  • the first network model also includes a residual block network, a 3D IDHT network, and the like.
  • the first Hough space feature first passes through two residual block networks to further extract the features of the Hough space, and then an inverse Hough transform is performed through the 3D IDHT network to obtain the third image space feature.
  • the electronic device performs addition processing (that is, fusion processing) on the third image space feature, the first image space feature and the initial features to obtain the first fusion feature, and the first fusion feature is further processed to obtain the second image space feature.
  • after the second image space feature undergoes 1×1×1 convolution and upsampling, the heat map of the brain scan image in image space (i.e., the third predicted image) is obtained; after the second image space feature undergoes the Hough transform of the 3D DHT network, the second Hough space feature is obtained, and after the second Hough space feature undergoes 1×1×1 convolution and upsampling, the heat map of the brain scan image in Hough space (i.e., the fourth predicted image) is obtained.
  • the electronic device obtains the loss value of the first network model according to formula (1) based on the first label image, the second label image, the first predicted image, the second predicted image, the third predicted image and the fourth predicted image, and trains the image acquisition model based on that loss value. Since the image acquisition model includes a 3D DHT network and a 3D IDHT network, the parameters of the 3D DHT network and the 3D IDHT network are adjusted simultaneously during the training of the first network model.
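  • The pipeline of FIG. 8 can be sketched structurally in PyTorch as follows; the hourglass, residual blocks, DHT and IDHT are stand-ins (nn.Identity marks where the real 3D DHT/IDHT of implementations A1/A2 would map between image space and Hough space), and upsampling is omitted:

    import torch.nn as nn

    class FirstNetworkModelSketch(nn.Module):
        def __init__(self, c=16):
            super().__init__()
            self.stem = nn.Sequential(                      # backbone: three conv layers
                nn.Conv3d(1, c, 3, padding=1), nn.ReLU(),
                nn.Conv3d(c, c, 3, padding=1), nn.ReLU(),
                nn.Conv3d(c, c, 3, padding=1))
            self.hourglass = nn.Conv3d(c, c, 3, padding=1)  # stand-in for the hourglass network
            self.dht = nn.Identity()                        # stand-in for the 3D DHT network
            self.res = nn.Sequential(                       # stand-in for two residual blocks
                nn.Conv3d(c, c, 3, padding=1), nn.ReLU(),
                nn.Conv3d(c, c, 3, padding=1))
            self.idht = nn.Identity()                       # stand-in for the 3D IDHT network
            self.head_img = nn.Conv3d(c, 1, 1)              # 1x1x1 convolution head (image space)
            self.head_hough = nn.Conv3d(c, 1, 1)            # 1x1x1 convolution head (Hough space)

        def forward(self, x):
            f0 = self.stem(x)                               # initial features
            f_img1 = self.hourglass(f0)                     # first image space feature
            p_out1 = self.head_img(f_img1)                  # first predicted image
            f_hough1 = self.dht(f_img1)                     # first Hough space feature
            h_out1 = self.head_hough(f_hough1)              # second predicted image
            f_img3 = self.idht(self.res(f_hough1))          # third image space feature
            f_img2 = f_img1 + f_img3 + f0                   # first fusion feature -> second image space feature
            p_out2 = self.head_img(f_img2)                  # third predicted image
            h_out2 = self.head_hough(self.dht(f_img2))      # fourth predicted image
            return p_out1, h_out1, p_out2, h_out2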
  • in some embodiments, the first image space feature includes at least two sub-image space features, and the electronic device determines the first predicted image based on the at least two sub-image space features.
  • the at least two sub-image space features are pyramid-level features, that is, the scales of the sub-image space features increase or decrease progressively, and the first network model can determine the first predicted image based on the at least two sub-image space features.
  • the electronic device determines the first Hough space feature based on the first image space feature as follows: for any one of the at least two sub-image space features, the electronic device determines the sub-Hough space feature corresponding to that sub-image space feature; the first Hough space feature includes the sub-Hough space features corresponding to the respective sub-image space features.
  • the electronic device determining the second predicted image based on the first Hough space feature includes: the electronic device determining the second predicted image based on sub-Hough space features corresponding to each sub-image space feature.
  • the electronic device performs a Hough transform on the sub-image spatial feature to obtain a sub-Hough spatial feature corresponding to the sub-image spatial feature.
  • the sub-Hough space features corresponding to each sub-image space feature can be obtained, that is, the first Hough space feature can be obtained.
  • the sub-Hough space features corresponding to each sub-image space feature included in the first Hough space feature are also pyramid-level features, and the first network model can determine the second prediction based on the sub-Hough space features corresponding to each sub-image space feature image.
  • the first image space feature includes at least two sub-image space features
  • the first Hough space feature includes sub-Hough space features corresponding to each sub-image space feature.
  • the electronic device determines the first Hough space feature based on the first image space feature, including: for any sub-image space feature except the first sub-image space feature among the at least two sub-image space features, the electronic device determines the second fusion feature corresponding to that sub-image space feature based on that sub-image space feature and the previous sub-image space feature; the electronic device then determines the sub-Hough space feature corresponding to that sub-image space feature based on the second fusion feature corresponding to that sub-image space feature and the previous sub-image space feature.
  • the electronic device determines the second fusion feature corresponding to the first sub-image spatial feature based on the first sub-image spatial feature.
  • for the second sub-image space feature, the electronic device determines the second fusion feature corresponding to the second sub-image space feature based on the second sub-image space feature and the first sub-image space feature.
  • the electronic device determines the second fusion feature corresponding to the third sub-image space feature based on the third sub-image space feature and the second sub-image space feature.
  • the electronic device determines the second fusion feature corresponding to any sub-image space feature based on that sub-image space feature and its previous sub-image space feature, including: the electronic device determines, based on the previous sub-image space feature, the second fusion feature corresponding to the previous sub-image space feature; determines the fourth image space feature based on the second fusion feature corresponding to the previous sub-image space feature, where the fourth image space feature represents the feature of the sample image in the image space; the electronic device determines the third Hough space feature based on the fourth image space feature, where the third Hough space feature represents the feature of the sample image in the Hough space; and, based on that sub-image space feature, the third Hough space feature, and the second fusion feature corresponding to the previous sub-image space feature, the electronic device determines the second fusion feature corresponding to that sub-image space feature.
  • the electronic device first determines the second fusion feature corresponding to the previous sub-image space feature of the sub-image space feature. The second fusion feature corresponding to the first sub-image space feature is determined based on the first sub-image space feature; optionally, the first sub-image space feature is convolved to obtain the second fusion feature corresponding to the first sub-image space feature.
  • the determination method of the second fusion feature corresponding to the previous sub-image space feature is the same as the method of determining the second fusion feature corresponding to any sub-image space feature in the embodiment of the present application; please refer to the following description for details.
  • after the electronic device determines the second fusion feature corresponding to the previous sub-image space feature of any sub-image space feature, it convolves that second fusion feature to obtain the fourth image space feature.
  • the electronic device performs Hough transform on the fourth image space feature to obtain the third Hough space feature; then, the sub-image space feature, the third Hough space feature, and the second fusion feature corresponding to the previous sub-image space feature are fused to obtain the second fusion feature corresponding to that sub-image space feature.
  • the electronic device first performs inverse Hough transform on the third Hough space feature to obtain the fifth image space feature, where the fifth image space feature represents the feature of the sample image in the image space; the fifth image space feature is then normalized to obtain the normalized fifth image space feature, and the normalized fifth image space feature is multiplied by the sub-image space feature to obtain a product result.
  • the second fusion feature corresponding to the previous sub-image space feature is convolved to obtain the fourth image space feature, and the product result is added to the fourth image space feature to obtain the second fusion feature corresponding to the sub-image space feature.
  • the normalization can be realized by using an activation function such as Sigmoid.
  • FIG. 9 is a flowchart of a second fusion feature determination method provided by an embodiment of the present application. Since the second fusion feature corresponding to the previous sub-image space feature is larger than the sub-image space feature, a 3×3 convolution with a stride of 2 is performed on that second fusion feature to align it with the size of the sub-image space feature, yielding the fourth image space feature. Based on the DHT network, Hough transform is performed on the fourth image space feature to obtain the third Hough space feature.
  • feature extraction is performed on the third Hough space feature to further extract the features of the Hough space, and then, based on the IDHT network, inverse Hough transform is performed on the feature-extracted third Hough space feature to obtain the fifth image space feature.
  • a 1×1 convolution is performed on the fifth image spatial feature, and normalization is performed on the convolved fifth image spatial feature to obtain a normalized fifth image spatial feature.
  • the electronic device also performs a 3×3 convolution on the sub-image spatial feature; the convolved sub-image spatial feature is multiplied by the normalized fifth image spatial feature to obtain the product result, and the product result is added to the fourth image spatial feature to obtain the second fusion feature corresponding to the sub-image spatial feature.
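  As a rough illustration of the FIG. 9 structure described above, the following sketch composes the same operations in PyTorch. The `dht` and `idht` arguments stand in for the deep Hough transform and inverse deep Hough transform networks, treated here as opaque, shape-compatible modules; the channel counts, the plain convolution stack standing in for the two residual block networks, and all names are assumptions.

```python
import torch
import torch.nn as nn

class HoughAttentionBlock(nn.Module):
    def __init__(self, channels, dht, idht):
        super().__init__()
        # 3x3 convolution with stride 2: aligns the larger second fusion
        # feature with the size of the sub-image space feature.
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.dht, self.idht = dht, idht
        # Stand-in for the two residual block networks in Hough space.
        self.extract = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.proj = nn.Conv2d(channels, channels, 1)               # 1x1 convolution
        self.align = nn.Conv2d(channels, channels, 3, padding=1)   # 3x3 on the low-level feature

    def forward(self, prev_fusion, sub_feature):
        x4 = self.down(prev_fusion)               # fourth image space feature
        h3 = self.dht(x4)                         # third Hough space feature
        x5 = self.idht(self.extract(h3))          # fifth image space feature
        attn = torch.sigmoid(self.proj(x5))       # normalization via Sigmoid
        product = self.align(sub_feature) * attn  # product result
        return product + x4                       # second fusion feature
```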
  • the electronic device determines the sub-Hough space feature corresponding to any sub-image space feature based on the second fusion feature corresponding to any sub-image space feature and the previous sub-image space feature of any sub-image space feature.
  • the process of obtaining the normalized fifth image space feature based on the fourth image space feature may be expressed by the following formula (2):

    y = σ(h⁻¹(f(h(x))))    (2)

  • where y is the normalized fifth image space feature, σ is the function symbol of normalization, h⁻¹ is the function symbol of the inverse Hough transform, f is the function symbol of feature extraction by the two residual block networks, h is the function symbol of the Hough transform, and x is the fourth image space feature.
  • the network structure shown in FIG. 9 can be used as a pluggable network structure in the first network model. That is to say, the first network model may or may not include the network structure shown in FIG. 9 , and the network structure shown in FIG. 9 is a network structure based on an attention mechanism.
  • the second fusion feature corresponding to the previous sub-image spatial feature is a high-level feature, and any sub-image spatial feature is a low-level feature.
  • the fifth image spatial feature is obtained, so that the network structure pays more attention to the image space features and improves their accuracy, thereby improving the accuracy of the image acquisition model.
  • the electronic device determines the sub-Hough space feature corresponding to any sub-image space feature based on the second fusion feature corresponding to that sub-image space feature and the previous sub-image space feature, including: the electronic device determines the sub-Hough space feature corresponding to the previous sub-image space feature based on the previous sub-image space feature; the electronic device determines the sub-Hough space feature corresponding to the second fusion feature of that sub-image space feature; and the electronic device determines the sub-Hough space feature corresponding to that sub-image space feature based on the sub-Hough space feature corresponding to the second fusion feature and the sub-Hough space feature corresponding to the previous sub-image space feature.
  • the electronic device first determines the sub-Hough space feature corresponding to the previous sub-image space feature of the sub-image space feature. For the first sub-image space feature, Hough transform is performed on the second fusion feature corresponding to the first sub-image space feature to obtain the sub-Hough space feature corresponding to the first sub-image space feature.
  • the determination method of the sub-Hough space feature corresponding to the previous sub-image space feature is the same as the method of determining the sub-Hough space feature corresponding to any sub-image space feature; please refer to the description below for details.
  • after the electronic device determines the sub-Hough space feature corresponding to the previous sub-image space feature of any sub-image space feature, Hough transform is performed on the second fusion feature corresponding to that sub-image space feature to obtain the sub-Hough space feature corresponding to the second fusion feature; the sub-Hough space feature corresponding to the second fusion feature is then fused with the sub-Hough space feature corresponding to the previous sub-image space feature to obtain the sub-Hough space feature corresponding to that sub-image space feature.
  • the sub-Hough space features corresponding to each sub-image space feature can be determined.
  • the first image space feature in the embodiment of the present application includes at least two sub-image space features
  • the first predicted image can be determined based on the at least two sub-image space features
  • the first Hough space feature includes the sub-Hough space features corresponding to each sub-image space feature
  • the second predicted image can be determined based on the sub-Hough space features corresponding to each sub-image space feature.
  • the first network model is adjusted to obtain a second network model.
  • the electronic device adjusts the first network model based on the label image pair and the predicted image pair to obtain the second network model, including: the electronic device obtains the loss value of the first network model based on the label image pair and the predicted image pair; A loss value of a network model adjusts the first network model to obtain a second network model.
  • the loss value of the first network model represents the difference between the predicted image pair and the labeled image pair.
  • the label image pair includes a first label image and a second label image
  • the predicted image pair includes a first predicted image and a second predicted image.
  • the loss value of the first network model is calculated according to formula (3) shown below:

    L_total = L_spatial + L_hough    (3)

  • where L_spatial is the loss value of the image space, computed over the N pixels of the sample image in the image space between the first predicted image l and the first label image; L_hough is the loss value of the Hough space, computed over the M pixels of the sample image in the Hough space between the second predicted image h and the second label image; and L_total is the loss value of the first network model.
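  Under that reconstruction, formula (3) could be computed as in the following sketch; a squared-error pixel loss is assumed purely for illustration, since only the meanings of N, M, l, and h are fixed by the text above.

```python
import numpy as np

def first_network_loss(l, l_label, h, h_label):
    # l: first predicted image (image space); h: second predicted image (Hough space)
    N = l.size                                   # pixels of the sample image in image space
    M = h.size                                   # pixels of the sample image in Hough space
    L_spatial = np.sum((l - l_label) ** 2) / N   # image-space loss value
    L_hough = np.sum((h - h_label) ** 2) / M     # Hough-space loss value
    return L_spatial + L_hough                   # L_total
```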
  • after obtaining the loss value of the first network model, the electronic device adjusts the model parameters of the first network model based on the loss value of the first network model to obtain the second network model, and determines the second network model as the image acquisition model.
  • the embodiment of the present application provides another first network model, as shown in FIG. 10 , which is a schematic diagram of another first network model provided in the embodiment of the present application.
  • the sample image is a photographic image
  • the first label image is the heat map of the photographic image in the image space (referred to as the photographic composition)
  • the second label image is the heat map of the photographic image in the Hough space (referred to as the Hough key dot heatmap).
  • the first network model includes a residual network (such as a ResNet-50 network), a Feature Pyramid Network (FPN), a Hough Pyramid Attention Network (HPAN), a DHT network, and an output network.
  • the photographic image is input to the residual network
  • the initial feature of the photographic image is extracted by the residual network
  • the initial feature includes at least two sub-initial features
  • the at least two sub-initial features are pyramid-level features.
  • the photographic image is input to the residual network, and the information flow of the residual network is the upstream (Up Stream) information flow.
  • the residual network first extracts the sub-initial feature C0, then obtains the sub-initial feature C1 based on the sub-initial feature C0, then obtains the sub-initial feature C2 based on the sub-initial feature C1, and so on until the sub-initial feature C5 is obtained.
  • the sub-initial features C2 to C5 are input to the FPN, and the first image space feature of the photographic image is determined by the FPN; the first image space feature includes at least two sub-image space features, and the information flow of the FPN is the downstream (Down Stream) information flow.
  • the four sub-image spatial features are taken as an example for illustration.
  • FPN determines sub-image spatial feature P5 based on sub-initial feature C5
  • FPN determines sub-image spatial feature P4 based on sub-initial feature C4 and sub-image spatial feature P5
  • FPN determines sub-image spatial feature P3 based on sub-initial feature C3 and sub-image spatial feature P4,
  • FPN determines sub-image spatial feature P2 based on the sub-initial feature C2 and the sub-image spatial feature P3.
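  The P5-to-P2 flow just described matches a standard FPN top-down pass; the following sketch (with 1×1 lateral and 3×3 output convolutions assumed, as in a conventional FPN) illustrates it:

```python
import torch.nn.functional as F

def fpn_top_down(c2, c3, c4, c5, lateral, out):
    # `lateral` and `out` are assumed lists of 1x1 and 3x3 convolution
    # modules for levels C2..C5 / P2..P5 respectively.
    p5 = lateral[3](c5)
    p4 = lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
    p3 = lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
    p2 = lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
    return [out[i](p) for i, p in enumerate((p2, p3, p4, p5))]
```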
  • the output network determines and outputs the photographic composition based on the sub-image spatial features P2 to P5; on the other hand, HPAN determines the second fusion feature corresponding to each sub-image spatial feature, and the information flow of HPAN is the upstream information flow.
  • the HPAN determines the second fusion feature PA2 corresponding to the sub-image space feature P2 based on the sub-image space feature P2, and the HPAN determines the second fusion feature PA3 corresponding to the sub-image space feature P3 based on the sub-image space feature P3 and the second fusion feature PA2.
  • the HPAN determines the second fusion feature PA4 corresponding to the sub-image space feature P4 based on the sub-image space feature P4 and the second fusion feature PA3, and the HPAN determines the second fusion feature PA5 corresponding to the sub-image space feature P5 based on the sub-image space feature P5 and the second fusion feature PA4.
  • HPAN determines the second fusion features in the manner shown in FIG. 9; determining the second fusion feature PA3 based on the sub-image spatial feature P3 and the second fusion feature PA2 is taken as an example for illustration.
  • HPAN first performs a 3×3 convolution on the second fusion feature PA2 to obtain the fourth image space feature and, based on the DHT network, performs Hough transform on the fourth image space feature to obtain the third Hough space feature. Then, based on the two residual block networks, feature extraction is performed on the third Hough space feature, and, based on the IDHT network, inverse Hough transform is performed on the feature-extracted third Hough space feature to obtain the fifth image space feature.
  • a 1×1 convolution is performed on the fifth image spatial feature, and normalization is performed on the convolved fifth image spatial feature to obtain a normalized fifth image spatial feature.
  • the normalized fifth image spatial feature is multiplied by the convolved sub-image spatial feature P3, and the product result and the fourth image spatial feature are added to obtain the second fusion feature PA3.
  • the DHT network determines the sub-Hough space feature corresponding to each sub-image space feature, and the information flow of the DHT network is the upstream information flow.
  • the DHT network performs Hough transform on the second fusion feature PA2 to obtain the sub-Hough space feature H2 corresponding to the sub-image space feature P2.
  • the DHT network performs Hough transform on the second fusion feature PA3 to obtain the sub-Hough space feature corresponding to the second fusion feature PA3, and determines the sub-Hough space feature H3 corresponding to the sub-image space feature P3 based on that sub-Hough space feature and the sub-Hough space feature H2.
  • the DHT network performs Hough transform on the second fusion feature PA4 to obtain the sub-Hough space feature corresponding to the second fusion feature PA4, and determines the sub-Hough space feature H4 corresponding to the sub-image space feature P4 based on that sub-Hough space feature and the sub-Hough space feature H3.
  • the DHT network performs Hough transform on the second fusion feature PA5 to obtain the sub-Hough space feature corresponding to the second fusion feature PA5, and determines the sub-Hough space feature H5 corresponding to the sub-image space feature P5 based on that sub-Hough space feature and the sub-Hough space feature H4.
  • the output network determines and outputs the Hough keypoint heatmap based on the sub-Hough spatial features H2 to H5.
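  The upstream H2-to-H5 flow described above can be summarized as the loop below, where `dht` and `fuse` are placeholder callables for the DHT network and the fusion step; their concrete forms are not fixed by the text and are assumptions here.

```python
def hough_pyramid(dht, fuse, fusion_features):
    # fusion_features: [PA2, PA3, PA4, PA5]
    hough_features, prev = [], None
    for pa in fusion_features:
        h = dht(pa)               # sub-Hough feature of this second fusion feature
        if prev is not None:
            h = fuse(h, prev)     # fuse with the previous level's sub-Hough feature
        hough_features.append(h)
        prev = h
    return hough_features         # [H2, H3, H4, H5]
```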
  • when the sample image is a three-dimensional image, the image acquisition model mainly detects a certain plane of the three-dimensional sample image; therefore, the heat map of the sample image in the image space can be used as the plane attention image of the sample image.
  • when the sample image is a two-dimensional image, the image acquisition model mainly detects the straight lines or ellipses of the two-dimensional sample image; therefore, the heat map of the sample image in the image space can be used as the line attention image of the sample image. Since the DHT network and the IDHT network can realize the transformation between the sample image in the image space and the sample image in the Hough space, the image acquisition model can easily extract geometric features such as lines and planes, improving the accuracy of image detection.
  • the embodiment of the present application also provides an image detection method. Taking the flowchart of an image detection method provided by the embodiment of the present application shown in FIG. 11 as an example, the method can be executed by the electronic device in FIG. 1. As shown in FIG. 11, the method includes steps 1101 to 1103.
  • Step 1101 the electronic device acquires an image to be detected.
  • the embodiment of the present application does not limit the images to be detected.
  • the images to be detected include but are not limited to brain images, landscape images, road images, fetal images, cell images, etc., and the number of images to be detected is at least one.
  • Step 1102 the electronic device acquires the target image of the image to be detected according to the image acquisition model.
  • the target image is a heat map of the image to be detected in Hough space
  • the image acquisition model is obtained according to the training method of the image acquisition model shown in the above method embodiment.
  • the training process of the image acquisition model includes: obtaining the label image pair of the sample image, where the label image pair includes the first label image and the second label image, the first label image is the heat map of the sample image in the image space, and the second label image is the heat map of the sample image in the Hough space; obtaining the predicted image pair of the sample image according to the first network model, where the predicted image pair includes the first predicted image and the second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model; adjusting the first network model based on the label image pair and the predicted image pair to obtain the second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced; and, in response to satisfying the training termination condition, determining the second network model as the image acquisition model.
  • the image to be detected is input into the image acquisition model, and the image acquisition model outputs the target image pair of the image to be detected.
  • the target image pair of the image to be detected includes a first target image and a second target image.
  • the first target image is a heat map of the image to be detected in the image space, and the value of any pixel in the first target image represents the probability that the pixel is on the target line or on the target ellipse or on the target plane.
  • the second target image is a heat map of the image to be detected in Hough space, and the value of any pixel in the second target image represents the probability that the pixel is a target line or a target ellipse or a target plane.
  • the second target image is the target image mentioned in step 1102 .
  • the value of any pixel in the first target image represents the probability that the pixel is on the target line (or on the target ellipse or on the target plane). If the target line is determined based on the first target image, multiple points need to be determined from the first target image and used to form the target line. Since each point may be determined with some error, the target line is prone to distortion, and the accumulation of errors over multiple points makes the accuracy of the target line lower.
  • the value of any pixel in the second target image represents the probability that the pixel is a target line (or target ellipse or target plane). By determining one pixel from the second target image, a standard target line can be determined with high accuracy. Therefore, the method provided in the embodiment of the present application uses the second target image as the target image during image detection, so that the accuracy of subsequent image detection results is higher.
  • the image to be detected is a brain image
  • the electronic device acquires the target image of the image to be detected according to the image acquisition model, including: the electronic device acquires the first image space feature of the brain image according to the image acquisition model, where the first image space feature of the brain image represents the feature of the brain image in the image space; determines the first Hough space feature of the brain image based on the first image space feature of the brain image, where the first Hough space feature of the brain image represents the feature of the brain image in the Hough space; and determines the target image of the brain image based on the first Hough space feature of the brain image.
  • the image acquisition model includes a backbone network, an hourglass network, and a 3D DHT network. The backbone network extracts the initial features of the brain image; the initial features of the brain image are processed by the hourglass network to obtain the first image space feature of the brain image; the first image space feature of the brain image is subjected to Hough transform by the 3D DHT network to obtain the first Hough space feature of the brain image; and the first Hough space feature is convolved with a 1×1×1 convolution to obtain the target image.
  • for details, please refer to the related introduction in FIG. 8, which will not be repeated here.
  • the electronic device determines the target image of the brain image based on the first Hough space feature of the brain image, including: the electronic device determines the second image space feature of the brain image based on the first Hough space feature of the brain image and the first image space feature of the brain image; determines the second Hough space feature of the brain image based on the second image space feature of the brain image; and determines the target image of the brain image based on the second Hough space feature of the brain image.
  • after the image acquisition model obtains the first Hough space feature of the brain image, it first determines the second image space feature of the brain image based on the first image space feature of the brain image and the first Hough space feature of the brain image. Then, Hough transform is performed based on the second image space feature of the brain image to obtain the second Hough space feature of the brain image. Then, upsampling is performed based on the second Hough space feature of the brain image to obtain the target image.
  • the image acquisition model includes a backbone network, an hourglass network, a 3D DHT network, a residual block network, a 3D IDHT network, etc. The brain image is processed by the backbone network, the hourglass network, and the 3D DHT network to obtain the first Hough space feature.
  • the first Hough space feature of the brain image passes through two residual block networks to further extract the features of the Hough space, and then undergoes inverse Hough transform through the 3D IDHT network to obtain the third image space feature of the brain image.
  • the electronic device fuses the third image space feature, the first image space feature and the initial feature of the brain image to obtain the first fusion feature.
  • the first fusion feature is processed by the hourglass network to obtain the second image space feature of the brain image; the second image space feature of the brain image is subjected to Hough transform through the 3D DHT network to obtain the second Hough space feature of the brain image; and the second Hough space feature undergoes a 1×1×1 convolution and upsampling to obtain the target image.
  • for details, please refer to the related introduction in FIG. 8, which will not be repeated here.
  • the images to be detected can be not only brain images, but also landscape images, road images, fetal images, cell images, etc.
  • for these images, the processing method of the image acquisition model is similar to its processing method for brain images, and the processing method of the image acquisition model for the image to be detected is similar to the processing method of the first network model for the sample image. For details, see the introduction of the first network model, which will not be repeated here.
  • Step 1103 the electronic device determines an image detection result of the image to be detected based on the target image.
  • the target image is a heat map of the image to be detected in Hough space.
  • a certain plane, line or ellipse of the image to be detected is determined to obtain the image detection result.
  • the electronic device determines the image detection result of the image to be detected based on the target image, including: the electronic device performs non-maximum value suppression processing on the target image to obtain at least one peak point in the target image; At least one peak point in the image determines the image detection result of the image to be detected.
  • Non-maximum suppression suppresses elements that are not maximum values, which can be understood as a local maximum search.
  • the target image is a heat map of the image to be detected in Hough space, and the target image includes at least one Gaussian sphere.
  • the maximum value in each Gaussian sphere is searched to obtain the peak point of the Gaussian sphere, that is, at least one peak point in the target image is obtained. Any peak point corresponds to a plane, line or ellipse.
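  A minimal sketch of this non-maximum suppression on a 3D Hough heat map follows; the window size and score threshold are illustrative, and a voxel counts as a peak point when it equals the maximum of its local neighborhood (the center of a Gaussian sphere):

```python
import torch
import torch.nn.functional as F

def hough_peaks(heatmap, threshold=0.5, window=5):
    # heatmap: 3D tensor over the Hough parameter space.
    x = heatmap[None, None]                                    # (1, 1, D, H, W)
    local_max = F.max_pool3d(x, window, stride=1, padding=window // 2)
    is_peak = (x == local_max) & (x > threshold)               # local maximum search
    return is_peak[0, 0].nonzero()                             # indices of the peak points
```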
  • the image to be detected is a brain image, and the image detection result is the brain midline plane of the brain image; the image to be detected is a fetal image, and the image detection result is at least one standard plane of the fetal image, where any standard plane represents any one of the standard plane of the fetal head, the standard plane of the fetal abdomen, and the standard plane of the fetal femur; the image to be detected is a cell image, and the image detection result is at least one ellipse of the cell image, where any ellipse represents the outline of a cell; the image to be detected is a photographic image, and the image detection result is at least one line of the photographic image, where the at least one line represents the structure of the object in the photographic image.
  • the images to be detected include but are not limited to brain images, fetal images, cell images, and photographic images.
  • the image detection results of the images to be detected are also different. In the following, different images to be detected and their image detection results are introduced in detail from the perspectives of the implementation modes B1 to B4 respectively.
  • Implementation mode B1: the image to be detected is a brain image
  • the electronic device determines the image detection result of the image to be detected based on the target image, including: determining a peak point in the target image; and determining a brain midline plane based on the peak point in the target image.
  • the brain image is input to the image acquisition model to obtain a target image output by the image acquisition model, wherein the target image is a heat map of a key point.
  • since the target image is a heat map of one key point, the peak point of that key point is determined to obtain the peak point in the target image.
  • a plane equation is determined based on the peak point in the target image, and the plane equation is a plane equation corresponding to the brain midline plane of the brain image.
  • FIG. 12 is a schematic diagram of processing a brain scan image provided by an embodiment of the present application.
  • the embodiment of the present application does not limit the structure of the image acquisition model.
  • the image acquisition model can be the model shown in Figure 8
  • the image to be detected is a brain scan image
  • the brain scan image includes at least one brain image .
  • the electronic device inputs the brain scan image to the image acquisition model, and the image acquisition model outputs the Hough key point heat map (ie, the target image), and calculates the peak point based on the Hough key point heat map.
  • the peak point here includes three parameters, namely θ, φ, and ρ, where θ, φ, and ρ are the three parameters of the plane in the polar coordinate system.
  • the brain midline offset is calculated based on the plane equation, and a two-dimensional visualization image is produced based on the brain midline offset.
  • the thickened line in the two-dimensional visualization image represents the brain midline.
  • three-dimensional visualization is performed based on the plane equation to obtain a three-dimensional visualization image, and the parallelogram in the three-dimensional visualization image represents the midline plane of the brain.
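  As an illustration, assuming the common spherical parameterization in which a peak (θ, φ, ρ) denotes a plane with unit normal n(θ, φ) at distance ρ from the origin (the text above only states that the three parameters define the plane in the polar coordinate system), the plane equation and the midline offset of a point could be recovered as follows:

```python
import numpy as np

def plane_from_peak(theta, phi, rho):
    # Unit normal of the plane in spherical coordinates (assumed convention).
    n = np.array([np.sin(phi) * np.cos(theta),
                  np.sin(phi) * np.sin(theta),
                  np.cos(phi)])
    return n, rho          # plane equation: n[0]*x + n[1]*y + n[2]*z = rho

def midline_offset(point, n, rho):
    # Signed distance of a voxel from the detected brain midline plane.
    return float(np.dot(n, point) - rho)
```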
  • Implementation mode B2: the image to be detected is a fetal image
  • the electronic device determines the image detection result of the image to be detected based on the target image, including: the electronic device determining each peak point in the target image; and determining, based on each peak point in the target image, the standard plane corresponding to each peak point, where the standard plane corresponding to any peak point represents any one of the standard plane of the fetal head, the standard plane of the fetal abdomen, and the standard plane of the fetal femur.
  • the fetal image is input into the image acquisition model to obtain a target image output by the image acquisition model, wherein the target image is a heat map of at least one key point.
  • since the target image is a heat map of at least one key point, the peak point of each key point is determined to obtain each peak point in the target image.
  • the plane equation corresponding to each peak point is determined based on each peak point in the target image, and the plane equation corresponding to any peak point is the plane equation corresponding to the standard plane of the fetal head, the standard plane of the fetal abdomen, or the standard plane of the fetal femur.
  • FIG. 13 is a schematic diagram of processing a fetal scan image provided by an embodiment of the present application.
  • the embodiment of the present application does not limit the structure of the image acquisition model.
  • the image acquisition model may be a model as shown in FIG. 8
  • the image to be detected is a fetal scan image
  • the fetal scan image includes at least one fetal image.
  • the electronic device inputs the scan image of the fetus into the image acquisition model, and the image acquisition model outputs a Hough key point heat map (that is, a target image), wherein the Hough key point heat map is a heat map of multiple key points.
  • Determine the plane equations corresponding to each peak point and obtain the plane equations of the femur, abdomen, head, etc. of the fetus, so as to quickly locate the plane, reduce the time of film reading, and obtain the standard planes of the femur, abdomen, head, etc. of the fetus.
  • fetal development parameters are measured based on the standard planes of the femur, abdomen, head, etc. of the fetus to evaluate the fetal development.
  • Implementation mode B3: the image to be detected is a cell image
  • the electronic device determines the image detection result of the image to be detected based on the target image, including: determining each peak point in the target image; and determining the ellipse corresponding to each peak point in the target image, where the ellipse corresponding to any peak point represents the outline of a cell.
  • the cell image is input to the image acquisition model to obtain a target image output by the image acquisition model, wherein the target image is a heat map of at least one key point.
  • since the target image is a heat map of at least one key point, the peak point of each key point is determined to obtain each peak point in the target image.
  • the ellipse equation corresponding to each peak point is determined based on each peak point in the target image, and the ellipse equation corresponding to any peak point is the ellipse equation corresponding to the contour of the cell.
  • the cell image includes at least one cell, and by determining the ellipse equation corresponding to each peak point, the ellipse equation corresponding to the outline of each cell in the cell image can be determined, thereby detecting each cell in the cell image.
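  Purely for illustration, if each peak is taken to encode the five parameters of an ellipse (center, semi-axes, rotation; the patent does not fix the Hough parameterization of the ellipse), the cell outline can be rendered from one peak as follows:

```python
import numpy as np

def ellipse_outline(cx, cy, a, b, angle, n=100):
    # Sample n points on the axis-aligned ellipse, then rotate and translate.
    t = np.linspace(0.0, 2.0 * np.pi, n)
    x, y = a * np.cos(t), b * np.sin(t)
    xr = cx + x * np.cos(angle) - y * np.sin(angle)
    yr = cy + x * np.sin(angle) + y * np.cos(angle)
    return np.stack([xr, yr], axis=1)    # points on the cell contour
```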
  • FIG. 14 is a schematic diagram of processing a cell image provided in an embodiment of the present application.
  • the embodiment of the present application does not limit the structure of the image acquisition model.
  • the image acquisition model may be the model shown in FIG. 10
  • the image to be detected is a cell image
  • the cell image includes multiple cells.
  • the electronic device inputs the cell image to the image acquisition model, and the image acquisition model outputs a heat map of Hough key points (that is, a target image), wherein the heat map of Hough key points is a heat map of multiple key points.
  • the ellipse equation corresponding to each peak point is determined, and the ellipse equation corresponding to the outline of each cell in the cell image is obtained, so as to detect each cell in the cell image and obtain a detection result.
  • Implementation mode B4: the image to be detected is a photographic image, and the electronic device determines the image detection result of the image to be detected based on the target image, including: determining each peak point in the target image; and determining the line corresponding to each peak point in the target image, where the lines corresponding to the peak points represent the structure of the photographic object.
  • the photographic image is input to the image acquisition model to obtain a target image output by the image acquisition model, wherein the target image is a heat map of at least one key point.
  • since the target image is a heat map of at least one key point, the peak point of each key point is determined to obtain each peak point in the target image.
  • the line corresponding to any peak point is the line of the photographic object, and the lines corresponding to each peak point can represent the structure of the photographic object.
  • the line corresponding to any peak point can be a line on the edge of the road or a line on a road sign (such as a straight road sign or a left-turn road sign), and the lines corresponding to the peak points can characterize the structure of the road.
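  For a 2D Hough peak, a line can be recovered with the standard parameterization x·cosθ + y·sinθ = ρ (an assumption; the text above only states that each peak corresponds to one line):

```python
import numpy as np

def line_from_peak(rho, theta, extent):
    a, b = np.cos(theta), np.sin(theta)
    x0, y0 = a * rho, b * rho                  # foot of the perpendicular from the origin
    p1 = (x0 - extent * b, y0 + extent * a)    # two endpoints far along the line
    p2 = (x0 + extent * b, y0 - extent * a)
    return p1, p2
```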
  • FIG. 15 is a schematic diagram of processing a photographic image provided by an embodiment of the present application.
  • the embodiment of the present application does not limit the structure of the image acquisition model.
  • the image acquisition model may be the model shown in FIG. 10
  • the image to be detected is a photographic image.
  • the electronic device inputs the photographic image to the image acquisition model, and the image acquisition model outputs a Hough key point heat map (that is, a target image), wherein the Hough key point heat map is a heat map of multiple key points.
  • the image acquisition model in the above method is trained based on the heat map of the sample image in the Hough space and the heat map of the sample image in the image space, so that the image acquisition model learns the key point features of the Hough space and the semantic features of the image space; it can quickly and accurately determine the heat map of the image in the Hough space and accurately determine the image detection result based on that heat map, thereby improving the accuracy of subsequent analysis and processing.
  • FIG. 16 is a schematic structural diagram of a training device for an image acquisition model provided in an embodiment of the present application. As shown in FIG. 16 , the device includes:
  • the first acquisition module 1601 is used to acquire the label image pair of the sample image, the label image pair includes the first label image and the second label image, the first label image is the heat map of the sample image in the image space, and the second label image is the sample image Heatmap of images in Hough space;
  • the second acquisition module 1602 is configured to acquire a predicted image pair of the sample image according to the first network model, the predicted image pair includes a first predicted image and a second predicted image, and the first predicted image is a sample image obtained by the first network model in The heat map of the image space, the second predicted image is the heat map of the sample image obtained by the first network model in the Hough space;
  • An adjustment module 1603, configured to adjust the first network model based on the label image pair and the predicted image pair to obtain a second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced;
  • a determining module 1604 configured to determine the second network model as the image acquisition model in response to satisfying the training termination condition.
  • the second acquisition module 1602 is configured to acquire the first image space feature of the sample image, the first image space feature characterizes the feature of the sample image in the image space; determine the first image space feature based on the first image space feature predicting an image; determining a first Hough space feature based on the first image space feature, the first Hough space feature characterizing the feature of the sample image in the Hough space; determining a second predicted image based on the first Hough space feature.
  • the second acquisition module 1602 is configured to rectify the first image space feature to obtain the rectified first image space feature, and determine the first Hough space feature based on the rectified first image space feature.
  • the adjustment module 1603 is configured to determine the loss value of the first network model based on the label image pair and the predicted image pair, and the loss value of the first network model represents the difference between the predicted image pair and the label image pair. difference; adjusting the first network model based on the loss value of the first network model to obtain a second network model, so that the difference between the predicted image pair and the label image pair obtained according to the second network model is reduced.
  • the second obtaining module 1602 is further configured to determine a second image space feature based on the first Hough space feature and the first image space feature; determine a third predicted image based on the second image space feature, The third prediction image is the heat map of the sample image obtained by the first network model in the image space; the second Hough space feature is determined based on the second image space feature; the fourth prediction image is determined based on the second Hough space feature, and the fourth prediction The image is a heat map of the sample image obtained by the first network model in Hough space;
  • An adjustment module 1603, configured to adjust the first network model based on the label image pair, the predicted image pair, the third predicted image, and the fourth predicted image to obtain the second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced, and the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair is reduced.
  • the second acquisition module 1602 is configured to determine the third image space feature based on the first Hough space feature; fuse the first image space feature and the third image space feature to obtain the first fusion features; determining second image space features based on the first fused features.
  • the adjustment module 1603 is configured to obtain a first loss value according to the label image pair and the predicted image pair, and the first loss value represents the difference between the predicted image pair and the label image pair; according to the label image pair pair, the third predicted image and the fourth predicted image to obtain a second loss value, the second loss value represents the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair; based on the first loss value and the second loss value to obtain the loss value of the first network model; adjust the first network model based on the loss value of the first network model to obtain the second network model, so that the predicted image pair obtained according to the second network model is consistent with the label image The difference between the pairs is reduced, and the difference between the obtained image pair composed of the third predicted image and the fourth predicted image and the label image pair is reduced.
  • the first image space feature includes at least two sub-image space features
  • the second obtaining module 1602 is configured to determine a first prediction image based on at least two sub-image spatial features.
  • the second acquisition module 1602 is configured to, for any one of the at least two sub-image space features, determine the sub-Hough space feature corresponding to any one of the sub-image space features, the first The Hough space features include sub-Hough space features corresponding to each sub-image space feature; the second predicted image is determined based on the sub-Hough space features corresponding to each sub-image space feature.
  • the second obtaining module 1602 is configured to, for any sub-image space feature except the first sub-image space feature among the at least two sub-image space features, determine the second fusion feature corresponding to that sub-image space feature based on that sub-image space feature and its previous sub-image space feature; and determine the sub-Hough space feature corresponding to that sub-image space feature based on the second fusion feature corresponding to that sub-image space feature and the previous sub-image space feature.
  • the second acquisition module 1602 is configured to determine the second fusion feature corresponding to the previous sub-image space feature based on the previous sub-image space feature; determine the fourth image space feature based on the second fusion feature corresponding to the previous sub-image space feature; determine the third Hough space feature based on the fourth image space feature; and determine the second fusion feature corresponding to any sub-image space feature based on that sub-image space feature, the third Hough space feature, and the second fusion feature corresponding to the previous sub-image space feature.
  • the second acquisition module 1602 is configured to determine the sub-Hough space feature corresponding to the previous sub-image space feature based on the previous sub-image space feature; determine the sub-Hough space feature corresponding to the second fusion feature of any sub-image space feature; and determine the sub-Hough space feature corresponding to any sub-image space feature based on the sub-Hough space feature corresponding to the second fusion feature and the sub-Hough space feature corresponding to the previous sub-image space feature.
  • in the above device, the image acquisition model is trained based on the heat map of the sample image in the Hough space and the heat map of the sample image in the image space, so that the image acquisition model learns the key point features of the Hough space and the semantic features of the image space; it can quickly and accurately determine the heat map of the image in the Hough space and realize accurate image detection results based on that heat map, thereby improving the accuracy of subsequent analysis and processing.
  • when the device provided in FIG. 16 realizes its functions, the division of the above functional modules is only used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the device provided by the above embodiment and the method embodiment belong to the same idea; the specific implementation process is detailed in the method embodiment and will not be repeated here.
  • FIG. 17 is a schematic structural diagram of an image detection device provided in an embodiment of the present application. As shown in FIG. 17, the device includes:
  • a first acquisition module 1701 configured to acquire an image to be detected
  • the second acquisition module 1702 is used to acquire a target image of the image to be detected according to the image acquisition model, and the target image is a heat map of the image to be detected in Hough space;
  • a determining module 1703 configured to determine an image detection result of the image to be detected based on the target image
  • the training process of the image acquisition model includes:
  • the label image pair of the sample image is obtained, where the label image pair includes the first label image and the second label image
  • the first label image is the heat map of the sample image in the image space
  • the second label image is the heat map of the sample image in the Hough space;
  • the predicted image pair of the sample image is obtained, the predicted image pair includes the first predicted image and the second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image The predicted image is the heat map of the sample image obtained by the first network model in Hough space;
  • the first network model is adjusted based on the label image pair and the predicted image pair to obtain the second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair is reduced; in response to satisfying the training termination condition, the second network model is determined as the image acquisition model.
  • the image to be detected is a brain image
  • the second acquisition module 1702 is configured to acquire the first image space feature of the brain image according to the image acquisition model, the first image space feature of the brain image Characterize the features of the brain image in the image space; determine the first Hough space feature of the brain image based on the first image space feature of the brain image, and the first Hough space feature of the brain image represents the brain image in the Hough space The feature of the brain image; determining the target image of the brain image based on the first Hough space feature of the brain image.
  • the second acquisition module 1702 is configured to determine the second image space feature of the brain image based on the first Hough space feature of the brain image and the first image space feature of the brain image;
  • the second image space feature of the brain image determines the second Hough space feature of the brain image;
  • the target image of the brain image is determined based on the second Hough space feature of the brain image.
  • the determination module 1703 is configured to perform non-maximum value suppression processing on the target image to obtain at least one peak point in the target image; determine the value of the image to be detected based on at least one peak point in the target image Image detection results.
  • in the above device, the image acquisition model is trained based on the heat map of the sample image in the Hough space and the heat map of the sample image in the image space, so that the image acquisition model learns the key point features of the Hough space and the semantic features of the image space; it can quickly and accurately determine the heat map of the image in the Hough space and realize accurate image detection results based on that heat map, thereby improving the accuracy of subsequent analysis and processing.
  • Fig. 18 shows a structural block diagram of a terminal device 1800 provided by an exemplary embodiment of the present application.
  • the terminal device 1800 can be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal device 1800 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal device 1800 includes: a processor 1801 and a memory 1802 .
  • the processor 1801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 1801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 1801 can also include a main processor and a coprocessor. The main processor is a processor for processing data in the wake-up state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 1801 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 1801 may also include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 1802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 1802 may also include high-speed random access memory, and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 1802 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 1801 to realize the image acquisition provided by the method embodiments in this application The training method of the model or the image detection method.
  • the terminal device 1800 may optionally further include: a peripheral device interface 1803 and at least one peripheral device.
  • the processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected through buses or signal lines.
  • Each peripheral device can be connected to the peripheral device interface 1803 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1804 and a display screen 1805 .
  • the peripheral device interface 1803 may be used to connect at least one peripheral device related to I/O (Input/Output, input/output) to the processor 1801 and the memory 1802 .
  • in some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 1804 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1804 communicates with the communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 1804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 1804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • the radio frequency circuit 1804 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area network and/or WiFi (Wireless Fidelity, Wireless Fidelity) network.
  • the radio frequency circuit 1804 may also include circuits related to NFC (Near Field Communication, short-range wireless communication), which is not limited in this application.
  • the display screen 1805 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 1805 also has the ability to collect touch signals on or above the surface of the display screen 1805.
  • the touch signal can be input to the processor 1801 as a control signal for processing.
  • the display screen 1805 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • there may be one display screen 1805, arranged on the front panel of the terminal device 1800; in other embodiments, there may be at least two display screens 1805, respectively arranged on different surfaces of the terminal device 1800 or in a folded design; in some other embodiments, the display screen 1805 may be a flexible display screen arranged on a curved or folded surface of the terminal device 1800. The display screen 1805 may even be configured as a non-rectangular irregular shape, that is, a shaped screen.
  • the display screen 1805 can be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
  • the structure shown in FIG. 18 does not constitute a limitation on the terminal device 1800; the terminal device may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
  • FIG. 19 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 1900 may have relatively large differences due to different configurations or performance, and may include one or more processors 1901 and one or more memories 1902, where at least one program code is stored in the one or more memories 1902 and is loaded and executed by the one or more processors 1901 to implement the image acquisition model training method or the image detection method provided by the above method embodiments; for example, the processor 1901 is a CPU.
  • the server 1900 may also have components such as wired or wireless network interfaces, keyboards, and input and output interfaces for input and output, and the server 1900 may also include other components for implementing device functions, which will not be described in detail here.
  • an electronic device is provided in an exemplary embodiment, including a processor and a memory, where at least one program code is stored in the memory and is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the label image pair includes the first label image and the second label image
  • the first label image is the heat map of the sample image in the image space
  • the second label image is the heat map of the sample image in the Hough space;
  • the predicted image pair of the sample image is obtained; the predicted image pair includes the first predicted image and the second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model;
  • the second network model is determined as the image acquisition model.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the first Hough space feature is determined based on the first image space feature, where the first Hough space feature characterizes the features of the sample image in the Hough space;
  • a second predicted image is determined based on the first Hough spatial features.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the first network model is adjusted based on the loss value of the first network model to obtain a second network model, so that the difference between the predicted image pair and the label image pair obtained according to the second network model is reduced.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the fourth prediction image is a heat map of the sample image obtained by the first network model in Hough space;
  • the first network model is adjusted based on the label image pair, the predicted image pair, the third predicted image and the fourth predicted image to obtain the second network model, so that the difference between the predicted image pair obtained according to the second network model and the label image pair decreases, and the difference between the obtained image pair composed of the third predicted image and the fourth predicted image and the label image pair decreases.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • a second image space feature is determined based on the first fused feature.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the first loss value represents the difference between the prediction image pair and the label image pair
  • the second loss value represents the difference between the image pair composed of the third predicted image and the fourth predicted image and the label image pair;
  • the first image space feature includes at least two sub-image space features; the at least one program code is loaded and executed by a processor, so that the electronic device implements the following operations:
  • a first predicted image is determined based on at least two sub-image spatial features.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the first Hough space feature includes the sub-Hough space feature corresponding to each sub-image space feature;
  • the second predicted image is determined based on the sub-Hough space features corresponding to each sub-image space feature.
  • the first image space feature includes at least two sub-image space features
  • the first Hough space feature includes sub-Hough space features corresponding to each sub-image space feature
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • for any sub-image space feature other than the first sub-image space feature among the at least two sub-image space features, a second fusion feature corresponding to that sub-image space feature is determined based on that sub-image space feature and the previous sub-image space feature of that sub-image space feature;
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the second fusion feature corresponding to any sub-image space feature is determined based on the third Hough space feature and the second fusion feature corresponding to the previous sub-image space feature.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • an electronic device is provided in an exemplary embodiment, including a processor and a memory, where at least one program code is stored in the memory and is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the target image of the image to be detected is obtained, and the target image is the heat map of the image to be detected in Hough space;
  • the training process of the image acquisition model includes:
  • the label image pair includes the first label image and the second label image
  • the first label image is the heat map of the sample image in the image space
  • the second label image is the heat map of the sample image in the Hough space;
  • the predicted image pair of the sample image is obtained; the predicted image pair includes the first predicted image and the second predicted image, the first predicted image is the heat map of the sample image in the image space obtained by the first network model, and the second predicted image is the heat map of the sample image in the Hough space obtained by the first network model;
  • the second network model is determined as the image acquisition model.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • the first image space feature of the brain image represents the feature of the brain image in the image space
  • the first Hough space feature of the brain image is determined based on the first image space feature of the brain image, and the first Hough space feature of the brain image represents the feature of the brain image in Hough space;
  • a target image of the brain image is determined based on the first Hough spatial feature of the brain image.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • a target image of the brain image is determined based on the second Hough space feature of the brain image.
  • the at least one program code is loaded and executed by the processor, so that the electronic device implements the following operations:
  • An image detection result of the image to be detected is determined based on at least one peak point in the target image.
  • a computer-readable storage medium is also provided, where at least one program code is stored in the storage medium and is loaded and executed by a processor, so that the electronic device implements any one of the above training methods for an image acquisition model or any one of the above image detection methods.
  • the above-mentioned computer-readable storage medium may be a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), a tape, a floppy disk, an optical data storage device, or the like.
  • a computer program or a computer program product is also provided, which stores at least one computer instruction; the at least one computer instruction is loaded and executed by a processor, so that the electronic device implements any one of the above training methods for an image acquisition model or any one of the above image detection methods.

Abstract

一种图像获取模型的训练方法、图像检测方法、装置及设备,属于图像处理技术领域。方法包括:获取样本图像的标签图像对(201);根据第一网络模型,获取样本图像的预测图像对(202);基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小(203);响应于满足训练终止条件,将第二网络模型确定为图像获取模型(204)。采用上述方法、装置及设备能够快速而准确的确定出图像在霍夫空间的热图,实现准确的基于图像在霍夫空间的热图确定图像检测结果,从而提高后续分析处理的准确性。

Description

图像获取模型的训练方法、图像检测方法、装置及设备
本申请要求于2021年10月15日提交、申请号为202111205913.3、发明名称为“图像获取模型的训练方法、图像检测方法、装置及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及图像处理技术领域,特别涉及一种图像获取模型的训练方法、图像检测方法、装置及设备。
背景技术
随着计算机的不断发展,图像检测技术的应用范围越来越广,图像检测技术的方法种类也越来越多,霍夫变换便是图像检测技术的方法之一。基于霍夫变换的原理能够检测出图像中对象的平面、直线、椭圆等,从而进行后续分析处理。
发明内容
本申请实施例提供了一种图像获取模型的训练方法、图像检测方法、装置及设备,能够提高图像在霍夫空间的热图的准确度,所述技术方案包括如下内容。
一方面,本申请实施例提供了一种图像获取模型的训练方法,所述方法包括:
电子设备获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在霍夫空间的热图;
所述电子设备根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
所述电子设备基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
所述电子设备响应于满足训练终止条件,将所述第二网络模型确定为图像获取模型。
另一方面,本申请实施例提供了一种图像检测方法,所述方法包括:
电子设备获取待检测图像;
所述电子设备根据图像获取模型,获取所述待检测图像的目标图像,所述目标图像是所述待检测图像在霍夫空间的热图;
所述电子设备基于所述目标图像确定所述待检测图像的图像检测结果;
其中,所述图像获取模型的训练过程,包括:
获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在所述霍夫空间的热图;
根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
响应于满足训练终止条件,将所述第二网络模型确定为所述图像获取模型。
另一方面,本申请实施例提供了一种图像获取模型的训练装置,所述装置包括:
第一获取模块,用于获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在霍夫空间的热图;
第二获取模块,用于根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
调整模块,用于基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
确定模块,用于响应于满足训练终止条件,将所述第二网络模型确定为图像获取模型。
另一方面,本申请实施例提供了一种图像检测装置,所述装置包括:
第一获取模块,用于获取待检测图像;
第二获取模块,用于根据图像获取模型,获取所述待检测图像的目标图像,所述目标图像是所述待检测图像在霍夫空间的热图;
确定模块,用于基于所述目标图像确定所述待检测图像的图像检测结果;
其中,所述图像获取模型的训练过程,包括:
获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在所述霍夫空间的热图;
根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
响应于满足训练终止条件,将所述第二网络模型确定为所述图像获取模型。
另一方面,本申请实施例提供了一种电子设备,所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以使所述电子设备实现上述任一所述的图像获取模型的训练方法或者实现上述任一所述的图像检测方法。
另一方面,还提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以使电子设备实现上述任一所述的图像获取模型的训练方法或者实现上述任一所述的图像检测方法。
另一方面,还提供了一种计算机程序或计算机程序产品,所述计算机程序或计算机程序产品中包括至少一条计算机指令,所述至少一条计算机指令由处理器加载并执行,以使电子设备实现上述任一种图像获取模型的训练方法或者实现上述任一所述的图像检测方法。
本申请实施例提供的技术方案至少带来如下有益效果:
本申请实施例提供的技术方案是基于样本图像在霍夫空间的热图和样本图像在图像空间的热图训练得到图像获取模型,使得图像获取模型学习到了霍夫空间的关键点特征和图像空间的语义特征,能够快速而准确的确定出图像在霍夫空间的热图,实现准确的基于图像在霍夫空间的热图确定图像检测结果,从而提高后续分析处理的准确性。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种图像获取模型的训练方法或者图像检测方法的实施环境示意图;
图2是本申请实施例提供的一种图像获取模型的训练方法的流程图;
图3是本申请实施例提供的一种图像示意图;
图4是本申请实施例提供的另一种图像示意图;
图5是本申请实施例提供的又一种图像示意图;
图6是本申请实施例提供的一种快速深度霍夫变换网络的示意图;
图7是本申请实施例提供的一种反向深度霍夫变换的示意图;
图8是本申请实施例提供的一种第一网络模型的示意图;
图9是本申请实施例提供的一种第二融合特征确定方法的流程图;
图10是本申请实施例提供的另一种第一网络模型的示意图;
图11是本申请实施例提供的一种图像检测方法的流程图;
图12是本申请实施例提供的一种脑部扫描图像的处理示意图;
图13是本申请实施例提供的一种胎儿扫描图像的处理示意图;
图14是本申请实施例提供的一种细胞图像的处理示意图;
图15是本申请实施例提供的一种摄影图像的处理示意图;
图16是本申请实施例提供的一种图像获取模型的训练装置的结构示意图;
图17是本申请实施例提供的一种图像检测装置的结构示意图;
图18是本申请实施例提供的一种终端设备的结构示意图;
图19是本申请实施例提供的一种服务器的结构示意图;
图20是本申请实施例提供的一种图像示意图;
图21是本申请实施例提供的另一种图像示意图;
图22是本申请实施例提供的又一种图像示意图;
图23是本申请实施例提供的另一种反向深度霍夫变换的示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
首先对本申请实施例涉及的名词进行解释。
图像空间(Image Space)是图像的三维几何空间。
霍夫空间(Hough Space)是图像的极坐标空间(也叫参数空间)。
霍夫变换(Hough Transform)是一种特征提取方法,给定直线、椭圆、平面等方程,在极坐标空间进行投票(Voting),通过检测累加的极坐标空间的局部峰值点,来定位和检测几何图案。根据方程的不同,霍夫变换可以分为直线霍夫变换(Line Hough Transform,LHT)、平面霍夫变换(Plane Hough Transform,PHT)、椭圆霍夫变换(Elliptic Hough Transform,EHT)。
深度霍夫变换(Deep Hough Transform,DHT)网络是一种集成有霍夫变换方法的卷积神经网络(Convolutional Neural Network,CNN)。利用DHT网络,基于样本图像在图像空间的特征确定该样本图像在霍夫空间的特征,以得到样本图像在霍夫空间的热图。其中,DHT网络分为三维深度霍夫变换(3-dimension Deep Hough Transform,3D DHT)网络和二维深度霍夫变换(2-dimension Deep Hough Transform,2D DHT)网络,3D DHT网络用于处理三维的样本图像,2D DHT网络用于处理二维的样本图像。
反向深度霍夫变换(Inverse Deep Hough Transform,IDHT)网络是一种集成有反向霍夫变换方法的CNN。利用IDHT网络,基于样本图像在霍夫空间的特征确定该样本图像在图像 空间的特征,以得到样本图像在图像空间的热图。IDHT网络分为三维反向深度霍夫变换(3-dimension Inverse Deep Hough Transform,3D IDHT)网络和二维反向深度霍夫变换(2-dimension Inverse Deep Hough Transform,2D IDHT)网络,3D IDHT网络用于处理三维的样本图像,2D IDHT网络用于处理二维的样本图像。
以基于霍夫变换的原理来检测图像中对象的平面为例,其原理在于利用点与平面的对偶性,将图像空间的点通过平面表达形式变为霍夫空间的一个平面,从而得到霍夫空间的热图。通过寻找霍夫空间中的峰值点,得到图像空间中的目标平面,即得到图像中对象的平面。
相关技术中,常采用累加器基于霍夫变换的原理将图像空间中的各个点变为霍夫空间的热图,以便基于霍夫空间的热图确定图像检测结果,即确定图像中对象的平面、直线、椭圆等。然而累加器是基于传统数学变换的方式实现的,需要不断的迭代参数,速度很慢,且精确度较低,影响后续分析处理的准确性。
图1是本申请实施例提供的一种图像获取模型的训练方法或者图像检测方法的实施环境示意图,如图1所示,该实施环境包括电子设备11,本申请实施例中的图像获取模型的训练方法或者图像检测方法可以由电子设备11执行。示例性地,电子设备11可以包括终端设备或者服务器中的至少一项。
终端设备可以是智能手机、游戏主机、台式计算机、平板电脑、电子书阅读器、MP3(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)播放器、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器和膝上型便携计算机中的至少一种。
服务器可以为一台服务器,或者为多台服务器组成的服务器集群,或者为云计算平台和虚拟化中心中的任意一种,本申请实施例对此不加以限定。服务器可以与终端设备通过有线网络或无线网络进行通信连接。服务器可以具有数据处理、数据存储以及数据收发等功能,在本申请实施例中不加以限定。
本申请各可选实施例的技术方案是基于人工智能技术实现的,人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习、自动驾驶、智慧交通等几大方向。
计算机视觉技术(Computer Vision,CV)是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、光学字符识别(Optical Character Recognition,OCR)、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D(3-Dimension,三维)技术、虚拟现实、增强现实、同步定位与地图构建、自动驾驶、智慧交通等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服、车联网、自动驾驶、智慧交通等,相信随着技术的发展,人工 智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
本申请实施例提供的方案涉及人工智能的计算机视觉等技术,具体通过如下实施例进行说明。
基于上述实施环境,本申请实施例提供了一种图像获取模型的训练方法,以图2所示的本申请实施例提供的一种图像获取模型的训练方法的流程图为例,该方法可由图1中的电子设备11执行。如图2所示,该方法包括步骤201至步骤204。
步骤201,电子设备获取样本图像的标签图像对。
其中,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图。
本申请实施例不对样本图像做限定,示例性的,样本图像包括但不限于脑部图像、风景图像、道路图像、胎儿图像、细胞图像等,样本图像的数量为多个。
第一标签图像是样本图像在图像空间的热图,第一标签图像中任一个像素点的值表征该像素点在目标直线上或者在目标椭圆上或者在目标平面上的概率。第二标签图像是样本图像在霍夫空间的热图,第二标签图像中任一个像素点的值表征该像素点为目标直线或者为目标椭圆或者为目标平面的概率。
请参考图3或图20,图3或图20是本申请实施例提供的一种图像示意图。其中,标号301所示的图像为样本图像。样本图像301在图像空间的热图如标号302所示,样本图像301在霍夫空间的热图如标号303所示,标号303对应的效果图如标号2002所示,标号304所示的图像为样本图像301与热图302叠加后的图像,标号304对应的效果图如标号2001所示。
请参见图4或图21,图4或图21是本申请实施例提供的另一种图像示意图,在实际情况中,样本图像301在图像空间的热图为一个条带。标号401所示的图像为样本图像与该样本图像在图像空间的热图叠加后的图像,标号401对应的效果图如标号2101所示。对比标号304所示的图像和标号401所示的图像,可以明显看出:在标号304所示的图像中,样本图像301在图像空间的热图为一条直线,在标号401所示的图像中,样本图像301在图像空间的热图为一个条带。标号402和标号403所示的图像均为该样本图像与该样本图像在图像空间的热图叠加后的图像,标号402对应的效果图如标号2102所示,标号403对应的效果图如标号2103所示。其中,标号401、标号402和标号403所示的三张图像为三个不同视角的图像。
请参见图5或图22,图5或图22是本申请实施例提供的又一种图像示意图。标号501所示的三张图像是不同视角下,样本图像在图像空间的热图与样本图像进行叠加后的图像,标号501对应的效果图如标号2201所示,标号502所示的三张图像是不同视角下,样本图像在图像空间的热图与样本图像叠加后的图像,标号502对应的效果图如标号2202所示。
步骤202,电子设备根据第一网络模型,获取样本图像的预测图像对。
其中,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图,该第一预测图像和第二预测图像也即是第一网络模型预测得到的热图。
本申请实施例中,将样本图像输入至第一网络模型,由第一网络模型输出第一预测图像和第二预测图像。其中,第一预测图像是样本图像在图像空间的热图,第一预测图像中任一个像素点的值表征该像素点在目标直线上或者在目标椭圆上或者在目标平面上的概率。第二预测图像是样本图像在霍夫空间的热图,第二预测图像中任一个像素点的值表征该像素点为目标直线或者为目标椭圆或者为目标平面的概率。
步骤203,电子设备基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小。
本申请实施例中，网络模型预测得到的预测图像对与标签图像对之间的差异越小，则网络模型的准确度越高，因此以减小网络模型得到的预测图像对与标签图像对之间的差异为目标，来对网络模型进行调整，以使调整后的网络模型更加准确。其中，预测图像对与标签图像对之间的差异是指第一标签图像与第一预测图像之间的差异以及第二标签图像与第二预测图像之间的差异。其中，根据第二网络模型得到的预测图像对与标签图像对之间的差异减小是指：相比于根据第一网络模型得到的预测图像对与标签图像对之间的差异，根据第二网络模型得到的预测图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,基于标签图像对和预测图像对,计算第一网络模型的损失值,基于第一网络模型的损失值调整第一网络模型的模型参数,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小。其中,第一网络模型的损失值表示第一网络模型得到的预测图像对与标签图像对之间的差异。
步骤204,电子设备响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
本申请实施例不对满足训练终止条件做限定,示例性的,满足训练终止条件为达到目标训练次数。目标训练次数的数值可以根据人工经验或者实际场景灵活设置,如目标训练次数为200次。
电子设备响应于满足训练终止条件,将第二网络模型确定为图像获取模型。电子设备响应于不满足训练终止条件,则按照步骤202和步骤203的方式,基于样本图像对第二网络模型进行训练,直至满足训练终止条件,得到图像获取模型。
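为便于理解步骤201至步骤204的整体流程，下面给出一个极简的训练循环示意（假设采用PyTorch实现；其中model、dataloader等名称均为示意性假设，并非本申请限定的实现方式）：

```python
import torch
import torch.nn.functional as F

def train(model, dataloader, num_epochs=200, lr=1e-4):
    # num_epochs对应"目标训练次数"这一训练终止条件（示例值200）
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(num_epochs):
        for sample, label_img, label_hough in dataloader:   # 步骤201：标签图像对
            pred_img, pred_hough = model(sample)            # 步骤202：预测图像对
            loss = (F.mse_loss(pred_img, label_img)         # 图像空间的差异
                    + F.mse_loss(pred_hough, label_hough))  # 霍夫空间的差异
            optimizer.zero_grad()
            loss.backward()                                 # 步骤203：基于损失值调整模型参数
            optimizer.step()
    return model  # 步骤204：满足训练终止条件后，即为图像获取模型
```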
相关技术中,通常利用图像在图像空间的热图训练图像获取模型,使得图像获取模型能够利用图像在图像空间的语义特征,确定出图像在图像空间的热图,但是由于考虑的因素较少,因此图像获取模型的精确度较差。
本申请实施例中,考虑到样本图像本身表征了图像的语义信息,因此,样本图像在图像空间的热图表征了图像空间的语义特征,又由于图像空间的线、平面等对应霍夫空间的点,因此,样本图像在霍夫空间的热图表征了霍夫空间的关键点特征。因此基于样本图像在霍夫空间的热图和样本图像在图像空间的热图训练得到图像获取模型,使得图像获取模型学习到了霍夫空间的关键点特征和图像空间的语义特征,不仅能够确定出图像在图像空间的热图,而且能够快速而准确的确定出图像在霍夫空间的热图。通过在图像空间和霍夫空间同时监督模型的学习,能够加快模型的收敛速度。并且,由于采用该方法进行训练图像获取模型能够学习到不同空间的特征,因此在生成热图时能够整合多重空间的特征,使得图像获取模型在学习到相关特征的同时,还进一步提高了该图像获取模型的精确度。后续即可基于该图像获取模型实现准确的基于图像在霍夫空间的热图确定图像检测结果,从而提高后续分析处理的准确性。
在一种可能的实现方式中，电子设备获取样本图像的预测图像对，包括：电子设备获取样本图像的第一图像空间特征，第一图像空间特征表征样本图像在图像空间的特征；电子设备基于第一图像空间特征确定第一预测图像；电子设备基于第一图像空间特征确定第一霍夫空间特征，第一霍夫空间特征表征样本图像在霍夫空间的特征；基于第一霍夫空间特征确定第二预测图像。
本申请实施例中,根据第一网络模型,先提取样本图像的第一图像空间特征。然后,一方面,根据第一网络模型,基于第一图像空间特征进行上采样,得到第一预测图像,另一方面,根据第一网络模型,基于第一图像空间特征进行霍夫变换,得到第一霍夫空间特征。之后,根据第一网络模型,基于第一霍夫空间特征进行上采样,得到第二预测图像。
其中,第一网络模型包括DHT网络,DHT网络用于对第一图像空间特征进行霍夫变换,得到第一霍夫空间特征。当样本图像为二维的样本图像时,DHT网络为2D DHT网络,当样本图像为三维的样本图像时,DHT网络为3D DHT网络。下面以3D DHT网络为例,详细介绍3D DHT网络的运算原理。
本申请实施例中，当样本图像为三维的样本图像时，图像获取模型主要检测三维的样本图像的某一个平面（如脑中线平面、胎儿的标准平面等）。平面在直角坐标系中的表达式为Ax+By+Cz+D=0，这个表达式可以转换为在极坐标系中的表达式：x sinφcosθ+y sinφsinθ+z cosφ=ρ。样本图像在图像空间的特征为R^(H×W×D)，样本图像在霍夫空间的特征为R^(φ×θ×ρ)。其中，A、B、C和D为四个参数，x、y和z为三个变量，φ、θ和ρ为三个参数，R为实数，H为高，W为宽，D为深度。霍夫变换是将图像空间中的任意一点转化为霍夫空间中的一个平面，通过这种方式，将图像空间中的各个点转化为霍夫空间中的各个平面，实现累加霍夫空间中的各个平面，得到霍夫关键点热图（即图像在霍夫空间的热图）。其中，霍夫关键点热图中的极大值点对应φ（即phi）、θ（即theta）和ρ（即rho）。基于上述信息，3D DHT网络按照下述实现方式A1和实现方式A2实现霍夫变换。
本申请实施例中,3D DHT网络的算法包括3D DHT网络的前向传播算法(见实现方式A1)和3D DHT网络的反向传播算法(见实现方式A2)。
实现方式A1,3D DHT网络的前向传播算法。
输入：volume,//样本图像在图像空间的特征R^(H×W×D)；
phi,//其取值范围为[phi_min,phi_max],采样个数为n_phi;
theta,//其取值范围为[theta_min,theta_max],采样个数为n_theta;
rho,//其取值范围为[rho_min,rho_max],采样个数为n_rho;
f。//平面方程;
输出：hough。//样本图像在霍夫空间的特征R^(φ×θ×ρ)，即采样尺寸为n_phi×n_theta×n_rho
实现算法如下（原文此处为算法附图，编号PCTCN2022121317-appb-000007，内容为前向投票的伪代码）：对图像空间中的每个体素，遍历(phi, theta)的各组采样值，按平面方程计算对应的rho，并将该体素的特征值累加到霍夫空间的相应单元。
本申请实施例能够通过实现方式A1,得到样本图像在霍夫空间的特征。
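作为实现方式A1的一个朴素参考实现（Python/NumPy示意，并非专利附图中的原始算法；假设rho为等间隔采样），前向投票可以写成：

```python
import numpy as np

def dht3d_forward(volume, phis, thetas, rhos):
    """3D DHT前向传播的朴素示意：图像空间特征 -> 霍夫空间特征。
    volume形状为(H, W, D)；phis/thetas/rhos为三个参数的采样值。
    返回hough，形状为(n_phi, n_theta, n_rho)。"""
    hough = np.zeros((len(phis), len(thetas), len(rhos)))
    rho_min, rho_step = rhos[0], rhos[1] - rhos[0]  # 假设rho为等间隔采样
    xs, ys, zs = np.nonzero(volume)  # 特征为0的体素投票贡献为0，可直接跳过
    vals = volume[xs, ys, zs]
    for i, phi in enumerate(phis):
        for j, theta in enumerate(thetas):
            # 平面方程：x·sinφ·cosθ + y·sinφ·sinθ + z·cosφ = ρ
            rho = xs*np.sin(phi)*np.cos(theta) + ys*np.sin(phi)*np.sin(theta) + zs*np.cos(phi)
            k = np.round((rho - rho_min) / rho_step).astype(int)
            ok = (k >= 0) & (k < len(rhos))
            np.add.at(hough[i, j], k[ok], vals[ok])  # 投票累加
    return hough
```

其中rho的离散化采用就近取整，属示意性选择；真实实现也可以用线性插值把投票分摊到相邻的rho单元。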
实现方式A2,3D DHT网络的反向传播算法。
输入：hough,//样本图像在霍夫空间的特征R^(φ×θ×ρ)；
phi,//其取值范围为[phi_min,phi_max],采样个数为n_phi;
theta,//其取值范围为[theta_min,theta_max],采样个数为n_theta;
rho,//其取值范围为[rho_min,rho_max],采样个数为n_rho;
f。//平面方程;
输出：volume。//样本图像在图像空间的特征R^(H×W×D)；
实现算法如下（原文此处为算法附图，编号PCTCN2022121317-appb-000009、PCTCN2022121317-appb-000010，内容为反向回投的伪代码）：对图像空间中的每个体素，遍历(phi, theta)的各组采样值，按平面方程计算对应的rho，并将霍夫空间相应单元的值累加回该体素。
本申请实施例能够通过实现方式A2,得到样本图像在图像空间的特征。
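对应地，实现方式A2的回投过程可以示意如下（Python/NumPy示意，并非专利附图中的原始算法；假设rho为等间隔采样）：

```python
import numpy as np

def dht3d_backward(hough, shape, phis, thetas, rhos):
    """3D DHT反向传播的朴素示意：把霍夫空间特征按平面方程回投到图像空间，
    得到形状为shape=(H, W, D)的特征。"""
    volume = np.zeros(shape)
    rho_min, rho_step = rhos[0], rhos[1] - rhos[0]  # 假设rho为等间隔采样
    xs, ys, zs = [a.ravel() for a in np.indices(shape)]
    for i, phi in enumerate(phis):
        for j, theta in enumerate(thetas):
            rho = xs*np.sin(phi)*np.cos(theta) + ys*np.sin(phi)*np.sin(theta) + zs*np.cos(phi)
            k = np.round((rho - rho_min) / rho_step).astype(int)
            ok = (k >= 0) & (k < len(rhos))
            volume[xs[ok], ys[ok], zs[ok]] += hough[i, j, k[ok]]  # 回投累加
    return volume
```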
本申请实施例中，当样本图像为二维的样本图像时，图像获取模型主要检测二维的样本图像的直线或者椭圆等。以直线为例，直线在直角坐标系中的表达式为Ax+By+C=0，这个表达式可以转换为在极坐标系中的表达式：x cosθ+y sinθ=ρ。样本图像在图像空间的特征为R^(H×W)，样本图像在霍夫空间的特征为R^(θ×ρ)。其中，A、B和C为三个参数，x和y为两个变量，θ和ρ为两个参数，R为实数，H为高，W为宽。基于上述信息，2D DHT网络实现霍夫变换。
2D DHT网络的算法包括2D DHT网络的前向传播算法和2D DHT网络的反向传播算法。2D DHT网络的前向传播算法与3D DHT网络的前向传播算法的原理相类似,详见实现方式A1,在此不再赘述。2D DHT网络的反向传播算法与3D DHT网络的反向传播算法的原理相类似,详见实现方式A2,在此也不再赘述。
可选地,电子设备基于第一图像空间特征确定第一霍夫空间特征,包括:电子设备对第一图像空间特征进行整流,得到整流后的第一图像空间特征;电子设备基于整流后的第一图像空间特征,确定第一霍夫空间特征。其中,整流是指对第一图像空间特征进行修正。
本申请实施例中,第一网络模型包括快速深度霍夫变换(Fast DHT)网络,Fast DHT网络包含整流网络和DHT网络。Fast DHT网络用于减少运算量,加快运算速度。
请参见图6,图6是本申请实施例提供的一种快速深度霍夫变换网络的示意图。第一图像空间特征输入至Fast DHT网络,由整流网络对该第一图像空间特征进行整流,得到整流后的第一图像空间特征,由DHT网络对整流后的第一图像空间特征进行霍夫变换,得到第一霍夫空间特征。
可选地,整流网络的公式为:f(x)=max(0,x)。其中,x为第一图像空间特征的数据,f(x)为整流后的第一图像空间特征的数据。通过整流网络,实现检测第一图像空间特征的各个数据。当数据小于0时,数据重置为0,当数据大于或者等于0时,数据保持不变,通过这种方式,增加了第一图像空间特征中的数据0,从而增加第一图像空间特征的稀疏性。第一图像空间特征中的数据对应像素点,相当于增加了特征为0的像素点,通过设置图像空间中特征为0的像素点不参与投票到霍夫空间,让图像空间中特征为非0的像素点参与投票到霍夫空间,减少参与投票的像素点数量,提高霍夫变换的速度。
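结合上文实现方式A1处的dht3d_forward示意（其中已令零值体素不参与投票），整流所带来的稀疏化加速可以示意为：

```python
import numpy as np

def fast_dht3d_forward(feature, phis, thetas, rhos):
    """Fast DHT的示意：先经整流网络f(x)=max(0, x)增加特征稀疏性，
    再沿用上文dht3d_forward的投票实现（仅非零体素参与投票）。"""
    rectified = np.maximum(feature, 0.0)  # 整流：小于0的数据重置为0
    return dht3d_forward(rectified, phis, thetas, rhos)
```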
请参见表1，表1是在显卡上测试的Fast DHT网络与DHT网络的运算时间，其中，本申请实施例测试了两种样本图像在Fast DHT网络与DHT网络的运算时间。一种样本图像在图像空间的热图所对应的图像尺寸为400×400，且该样本图像在霍夫空间的热图所对应的图像尺寸为180×400，这种样本图像对应的Fast DHT网络为Fast 2D DHT网络，这种样本图像对应的DHT网络为2D DHT网络。另一种样本图像在图像空间的热图所对应的图像尺寸为192×192×160，且该样本图像在霍夫空间的热图所对应的图像尺寸为180×180×160，这种样本图像对应的Fast DHT网络为Fast 3D DHT网络，这种样本图像对应的DHT网络为3D DHT网络。
表1
|  | 2D DHT | Fast 2D DHT | 3D DHT | Fast 3D DHT |
| --- | --- | --- | --- | --- |
| 时间(单位:秒) | 0.0193 | 0.0084 | 26.0 | 1.3 |
由表1可以明显看出,Fast 2D DHT网络的运算时间远低于2D DHT网络,Fast 3D DHT网络的运算时间远低于3D DHT网络,由此可知,Fast DHT网络能够减少运算量,加快运算速度。
在一种可能的实现方式中,电子设备基于第一图像空间特征确定第一霍夫空间特征之后,还包括:电子设备基于第一霍夫空间特征和第一图像空间特征确定第二图像空间特征,第二图像空间特征表征样本图像在图像空间的特征;电子设备基于第二图像空间特征确定第三预测图像,第三预测图像是第一网络模型得到的样本图像在图像空间的热图;电子设备基于第二图像空间特征确定第二霍夫空间特征,第二霍夫空间特征表征样本图像在霍夫空间的特征;电子设备基于第二霍夫空间特征确定第四预测图像,第四预测图像是第一网络模型得到的样本图像在霍夫空间的热图。则电子设备基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,包括:电子设备基于标签图像对、预测图像对、第三预测图像和第四预测图像调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异减小。
本申请实施例中,网络模型还得到第三预测图像和第四预测图像,且第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异越小,则网络模型的准确度越高,因此除了以减小网络模型得到的预测图像对与标签图像对之间的差异为目标之外,还以减小第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异为目标,来对网络模型进行调整,以使调整后的网络模型更加准确。其中,第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异是指第一标签图像与第三预测图像之间的差异以及第二标签图像与第四预测图像之间的差异。
本申请实施例中,根据第一网络模型,先基于第一图像空间特征和第一霍夫空间特征确定第二图像空间特征。然后,一方面,根据第一网络模型,基于第二图像空间特征进行上采样,得到第三预测图像,另一方面,根据第一网络模型,基于第二图像空间特征进行霍夫变换,得到第二霍夫空间特征。之后,根据第一网络模型,基于第二霍夫空间特征进行上采样,得到第四预测图像。
之后,基于标签图像对、预测图像对、第三预测图像和第四预测图像计算第一网络模型的损失值,基于第一网络模型的损失值调整第一网络模型的模型参数,得到第二网络模型。
本申请实施例中,基于第一图像空间特征能够得到第一预测图像,基于第一霍夫空间特征能够得到第二预测图像。之后,基于第一图像空间特征和第一霍夫空间特征确定第二图像空间特征,通过对特征做进一步处理,提高图像空间特征的准确性,使得基于第二图像空间特征得到的第三预测图像更准确。同样的,提高了基于第二图像空间特征确定的第二霍夫空间特征的准确性,使得基于第二霍夫空间特征得到的第四预测图像更准确。利用第一预测图像、第二预测图像、第三预测图像和第四预测图像调整第一网络模型,使得模型学习到了更多的信息,加快了模型的收敛速度。
可选地,电子设备基于第一霍夫空间特征和第一图像空间特征确定第二图像空间特征,包括:电子设备基于第一霍夫空间特征确定第三图像空间特征,第三图像空间特征表征样本图像在图像空间的特征;电子设备将第一图像空间特征和第三图像空间特征进行融合,得到 第一融合特征;电子设备基于第一融合特征确定第二图像空间特征。
本申请实施例中,第一网络模型包括IDHT网络,IDHT网络对第一霍夫空间特征进行反向霍夫变换,得到第三图像空间特征。其中,第一霍夫空间特征是基于第一图像空间特征得到的,也就是说,本申请实施例是先对第一图像空间特征进行处理,得到第一霍夫空间特征,再对第一霍夫空间特征进行处理,得到第三图像空间特征,该过程会导致特征丢失。通过将第三图像空间特征与第一图像空间特征进行融合,以减少特征丢失所带来的影响,得到第一融合特征。之后,基于第一融合特征确定第二图像空间特征。
请参见图7或图23,图7或图23是本申请实施例提供的一种反向深度霍夫变换的示意图,反向深度霍夫变换用于将图像从霍夫空间变换到图像空间。其中,标号701所示的三张图像为不同视角下样本图像在霍夫空间的热图,标号701对应的效果图如标号2301所示,标号702所示的三张图像为不同视角下样本图像在图像空间的热图,标号702对应的效果图如标号2302所示,通过IDHT网络,将标号701所示的三张图像所对应的霍夫空间特征,转换为标号702所示的三张图像所对应的图像空间特征。标号703所示的三张图像为不同视角下样本图像在霍夫空间的热图,标号703对应的效果图如标号2303所示,标号704所示的三张图像为不同视角下样本图像在图像空间的热图,标号704对应的效果图如标号2304所示,通过IDHT网络,将标号703所示的三张图像所对应的霍夫空间特征,转换为标号704所示的三张图像所对应的图像空间特征。
需要说明的是,IDHT网络的反向霍夫变换是DHT网络的霍夫变换的逆过程。IDHT网络包括2D IDHT网络和3D IDHT网络,2D IDHT网络包括2D IDHT网络的前向传播算法和2D IDHT网络的反向传播算法,3D IDHT网络包括3D IDHT网络的前向传播算法和3D IDHT网络的反向传播算法。其中,2D IDHT网络的前向传播算法、3D IDHT网络的前向传播算法与3D DHT网络反向传播算法的原理相同,详见上述实现方式A2,在此不再赘述。2D IDHT网络的反向传播算法、3D IDHT网络的反向传播算法与3D DHT网络的前向传播算法的原理相同,详见上述实现方式A1,在此也不再赘述。
在一种可能的实现方式中,电子设备基于标签图像对、预测图像对、第三预测图像和第四预测图像调整第一网络模型,得到第二网络模型,包括:电子设备根据标签图像对和预测图像对,得到第一损失值;电子设备根据标签图像对、第三预测图像和第四预测图像,得到第二损失值;电子设备基于第一损失值和第二损失值,得到第一网络模型的损失值;电子设备基于第一网络模型的损失值调整第一网络模型,得到第二网络模型。
其中,第一损失值表示预测图像对与标签图像对之间的差异,第二损失值表示第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异。
本申请实施例中,标签图像对包括第一标签图像和第二标签图像,预测图像对包括第一预测图像和第二预测图像。电子设备基于第一标签图像、第二标签图像、第一预测图像和第二预测图像计算第一损失值。基于第一标签图像、第二标签图像、第三预测图像和第四预测图像计算第二损失值,基于第一损失值和第二损失值计算第一网络模型的损失值,通过标签图像对、预测图像对、第三预测图像和第四预测图像训练得到图像获取模型,实现在图像空间和霍夫空间同时监督模型的学习,提高模型的准确性。又由于基于第一预测图像和第二预测图像计算第一损失值,基于第三预测图像和第四预测图像计算第二损失值,通过第一损失值和第二损失值调整第一网络模型,相当于基于两个阶段的预测图像监督模型的学习,使得模型学习到了更多的信息,能够加快模型的收敛速度,便于快速完成模型训练,从而提高图像检测的效率。其中,第一网络模型的损失值按照如下所示的公式(1)计算。
L_s1 = (1/N)·Σ_i(P_out1,i−P_gt,i)^2 + (1/M)·Σ_j(H_out1,j−H_gt,j)^2
L_s2 = (1/N)·Σ_i(P_out2,i−P_gt,i)^2 + (1/M)·Σ_j(H_out2,j−H_gt,j)^2
L_total = L_s1 + L_s2    公式(1)
其中，L_total为第一网络模型的损失值，L_s1为第一损失值，L_s2为第二损失值，N为样本图像在图像空间的热图中的像素点个数，P_out1为第一预测图像，P_gt为第一标签图像，M为样本图像在霍夫空间的热图中的像素点个数，H_out1为第二预测图像，H_gt为第二标签图像，P_out2为第三预测图像，H_out2为第四预测图像。
可选地,电子设备获取样本图像在图像空间的热图和样本图像在霍夫空间的热图,通过对样本图像在图像空间的热图进行高斯处理,得到第一标签图像,通过对样本图像在霍夫空间的热图进行高斯处理,得到第二标签图像。
以样本图像为脑部图像为例,脑部图像在图像空间的热图(该图像可以是脑部图像的金标准图像)为一条直线,通过高斯核卷积进行高斯处理,以将一条直线变为一个条带(也就是线束),并将像素值归一化到[0,1]之后得到第一标签图像。其中,高斯核卷积的参数不做限定,示例性的,高斯核卷积的参数为(5,5,5)。脑部图像在霍夫空间的热图(该图像也是脑部图像的金标准图像)中包括一个点,该点的位置为(phi,theta,rho),通过高斯处理,将该点转化为高斯球,其中,高斯球的尺寸不做限定,示例性的,高斯球的大小为(5,5,5)。
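标签图像对的高斯处理可以示意如下（Python示意，以三维情形为例，使用scipy.ndimage.gaussian_filter；sigma=(5, 5, 5)在此代表文中的高斯核参数，具体对应关系为示意性假设）：

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_label_pair(line_mask, hough_peak, hough_shape, sigma=(5, 5, 5)):
    """标签图像对生成的示意。
    line_mask: 图像空间的金标准二值图（如脑中线所在平面）；
    hough_peak: 金标准平面在霍夫空间的下标(phi, theta, rho)。"""
    first_label = gaussian_filter(line_mask.astype(float), sigma=sigma)  # 直线/平面 -> 条带
    first_label /= first_label.max() + 1e-8                              # 像素值归一化到[0,1]
    point = np.zeros(hough_shape)
    point[hough_peak] = 1.0
    second_label = gaussian_filter(point, sigma=sigma)   # 霍夫空间的点 -> 高斯球
    second_label /= second_label.max() + 1e-8
    return first_label, second_label
```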
公式(1)所示的损失函数为均方差损失函数,在应用时,可以选择除均方差损失函数之外的其他函数,作为第一网络模型的损失函数。
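公式(1)的计算可以示意如下（假设采用PyTorch；F.mse_loss默认取均值，分别对应1/N与1/M的平均）：

```python
import torch.nn.functional as F

def model_loss(p_out1, h_out1, p_out2, h_out2, p_gt, h_gt):
    """第一网络模型损失值的示意：L_total = L_s1 + L_s2。"""
    l_s1 = F.mse_loss(p_out1, p_gt) + F.mse_loss(h_out1, h_gt)  # 第一损失值
    l_s2 = F.mse_loss(p_out2, p_gt) + F.mse_loss(h_out2, h_gt)  # 第二损失值
    return l_s1 + l_s2
```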
电子设备在得到第一网络模型的损失值之后,基于第一网络模型的损失值调整第一网络模型的模型参数,得到第二网络模型,电子设备响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
本申请实施例提供了一种第一网络模型,如图8所示,图8是本申请实施例提供的一种第一网络模型的示意图。本申请实施例中,样本图像为脑部扫描图像,第一标签图像为脑部扫描图像在图像空间的热图,第二标签图像为脑部扫描图像在霍夫空间的热图。
其中,第一网络模型可以包括主干(Stem Block)网络、沙漏(Hourglass)网络、3D DHT网络。脑部扫描图像输入至第一网络模型,由主干网络提取脑部扫描图像的初始特征。其中,主干网络可以包括三个卷积层。接着,该初始特征经过沙漏网络进行特征处理后,得到第一个图像空间特征(即第一图像空间特征),其中,沙漏网络是一种用于关键点检测的卷积神经网络。
一方面,第一图像空间特征经过1×1×1的卷积和上采样后,得到脑部扫描图像在图像空间的热图(即第一预测图像),另一方面,第一图像空间特征经过3D DHT网络进行霍夫变换,得到第一个霍夫空间特征(即第一霍夫空间特征)。第一霍夫空间特征经过1×1×1的卷积和上采样后,得到脑部扫描图像在霍夫空间的热图(即第二预测图像)。
之后,电子设备基于第一标签图像、第二标签图像、第一预测图像和第二预测图像,得到第一网络模型的损失值,基于第一网络模型的损失值训练得到图像获取模型。由于图像获取模型包括3D DHT网络,因此,在训练第一网络模型时,会调整3D DHT网络的参数。
可选地,第一网络模型还包括残差块网络、3D IDHT网络等。第一霍夫空间特征先经过两个残差块网络,以进一步提取霍夫空间的特征,再经过3D IDHT网络进行反向霍夫变换,得到第三图像空间特征。
之后,电子设备将第三图像空间特征、第一图像空间特征和初始特征进行加处理(即融合处理),得到第一融合特征。第一融合特征经过沙漏网络进行特征处理后,得到第二个图像空间特征(即第二图像空间特征)。一方面,第二图像空间特征经过1×1×1的卷积和上采样后,得到脑部扫描图像在图像空间的热图(即第三预测图像),另一方面,第二图像空间特征经过3D DHT网络进行霍夫变换,得到第二个霍夫空间特征(即第二霍夫空间特征),第二霍夫空间特征经过1×1×1的卷积和上采样后,得到脑部扫描图像在霍夫空间的热图(即第四预测图像)。
之后,电子设备基于第一标签图像、第二标签图像、第一预测图像、第二预测图像、第三预测图像和第四预测图像,按照公式(1)得到第一网络模型的损失值。基于第一网络模型 的损失值训练得到图像获取模型。由于图像获取模型包括3D DHT网络和3D IDHT网络,因此,在训练第一网络模型的过程中,会同时调整3D DHT网络和3D IDHT网络的参数。
在一种可能的实现方式中,第一图像空间特征包括至少两个子图像空间特征;电子设备基于第一图像空间特征确定第一预测图像,包括:电子设备基于至少两个子图像空间特征确定第一预测图像。
本申请实施例中,第一图像空间特征包括至少两个子图像空间特征,至少两个子图像空间特征为金字塔层级特征,即各个子图像空间特征递进增大或者递进减少,第一网络模型能够基于至少两个子图像空间特征确定第一预测图像。
可选地,电子设备基于第一图像空间特征确定第一霍夫空间特征,包括:对于至少两个子图像空间特征中的任一个子图像空间特征,电子设备确定任一个子图像空间特征对应的子霍夫空间特征,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征。电子设备基于第一霍夫空间特征确定第二预测图像,包括:电子设备基于各个子图像空间特征对应的子霍夫空间特征确定第二预测图像。
对于至少两个子图像空间特征中的任一个子图像空间特征,电子设备对该子图像空间特征进行霍夫变换,得到该子图像空间特征对应的子霍夫空间特征。按照这种方式,能够得到各个子图像空间特征各自对应的子霍夫空间特征,即得到第一霍夫空间特征。该第一霍夫空间特征包括的各个子图像空间特征各自对应的子霍夫空间特征也为金字塔层级特征,第一网络模型能够基于各个子图像空间特征对应的子霍夫空间特征确定第二预测图像。
可选地,第一图像空间特征包括至少两个子图像空间特征,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征。电子设备基于第一图像空间特征确定第一霍夫空间特征,包括:对于至少两个子图像空间特征中除第一个子图像空间特征之外的任一个子图像空间特征,电子设备基于任一个子图像空间特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的第二融合特征;电子设备基于任一个子图像空间特征对应的第二融合特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
本申请实施例中,对于第一个子图像空间特征,电子设备基于第一个子图像空间特征确定第一个子图像空间特征对应的第二融合特征。
对于至少两个子图像特征中除第一个子图像空间特征之外的任一个子图像空间特征,电子设备基于该子图像空间特征和该子图像空间特征的上一个子图像空间特征,确定该子图像空间特征对应的第二融合特征。例如,对于第三个子图像空间特征,电子设备基于第三个子图像空间特征和第二个子图像空间特征,确定第三个子图像空间特征对应的第二融合特征。
可选地,电子设备基于任一个子图像空间特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的第二融合特征,包括:电子设备基于任一个子图像空间特征的上一个子图像空间特征,确定上一个子图像空间特征对应的第二融合特征;基于上一个子图像空间特征对应的第二融合特征,确定第四图像空间特征,第四图像空间特征表征样本图像在图像空间的特征;电子设备基于第四图像空间特征确定第三霍夫空间特征,第三霍夫空间特征表征样本图像在霍夫空间的特征;基于任一个子图像空间特征、第三霍夫空间特征和上一个子图像空间特征对应的第二融合特征,电子设备确定任一个子图像空间特征对应的第二融合特征。
对于至少两个子图像特征中除第一个子图像空间特征之外的任一个子图像空间特征，电子设备确定该子图像空间特征的上一个子图像空间特征对应的第二融合特征。其中，当上一个子图像空间特征为第一个子图像空间特征时，基于第一个子图像空间特征确定第一个子图像空间特征对应的第二融合特征，可选地，对第一个子图像空间特征进行卷积，得到第一个子图像空间特征对应的第二融合特征。当上一个子图像空间特征为除第一个子图像空间特征之外的任一个子图像空间特征时，则上一个子图像空间特征对应的第二融合特征的确定方式，与本申请实施例中任一个子图像空间特征对应的第二融合特征的确定方式相同，详细请见下述描述。
对于至少两个子图像特征中除第一个子图像空间特征之外的任一个子图像空间特征,当电子设备确定任一个子图像空间特征的上一个子图像空间特征对应的第二融合特征之后,对上一个子图像空间特征对应的第二融合特征进行卷积,得到第四图像空间特征。之后,电子设备对第四图像空间特征进行霍夫变换,得到第三霍夫空间特征,接着,基于任一个子图像空间特征、第三霍夫空间特征和上一个子图像空间特征对应的第二融合特征,得到该子图像空间特征对应的第二融合特征。
可选地,电子设备先对第三霍夫空间特征进行反向霍夫变换,得到第五图像空间特征,第五图像空间特征表征样本图像在图像空间的特征,接着,对第五图像空间特征进行归一化,得到归一化后的第五图像空间特征,之后,对归一化后的第五图像空间特征和任一个子图像空间特征进行相乘,得到乘积结果。对上一个子图像空间特征对应的第二融合特征进行卷积,得到第四图像空间特征,将乘积结果和第四图像空间特征进行相加,得到该子图像空间特征对应的第二融合特征。示例性地,归一化时可利用如Sigmoid等激活函数实现。
请参见图9,图9是本申请实施例提供的一种第二融合特征确定方法的流程图。由于上一个子图像空间特征对应的第二融合特征比任一个子图像空间特征大,因此,对上一个子图像空间特征对应的第二融合特征进行3×3、步长为2的卷积,以将上一个子图像空间特征对应的第二融合特征与任一个子图像空间特征的大小对齐,得到第四图像空间特征。基于DHT网络,对第四图像空间特征进行霍夫变换,得到第三霍夫空间特征。接着,先基于两个残差块网络,对第三霍夫空间特征进行特征提取,以进一步提取霍夫空间的特征,再基于IDHT网络,对特征提取后的第三霍夫空间特征进行反向霍夫变换,得到第五图像空间特征。接着,对第五图像空间特征进行1×1的卷积,对卷积后的第五图像空间特征进行归一化,得到归一化后的第五图像空间特征。
之后,电子设备对任一个子图像空间特征进行3×3的卷积,以将上一个子图像空间特征对应的第二融合特征与任一个子图像空间特征的大小对齐。将卷积后的任一个子图像空间特征与归一化后的第五图像空间特征进行相乘,得到乘积结果,将乘积结果和第四图像空间特征进行相加,得到该子图像空间特征对应的第二融合特征。
之后,电子设备基于任一个子图像空间特征对应的第二融合特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
本申请实施例中,基于第四图像空间特征,得到归一化后的第五图像空间特征的过程,可以用如下所示的公式(2)表示。
y=σ(h^(-1)(f(h(x))))    公式(2)
其中，y为归一化后的第五图像空间特征，σ为归一化的函数符号，h^(-1)为反向霍夫变换的函数符号，f为两个残差块网络进行特征提取的函数符号，h为霍夫变换的函数符号，x为第四图像空间特征。
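图9所示结构的一个极简可运行示意如下（以二维情形为例，假设采用PyTorch；dht与idht为示意性占位的可调用对象，假设其保持通道数不变、且idht输出与其输入特征同尺寸，两个残差块也以两层卷积简化代替）：

```python
import torch
import torch.nn as nn

class HoughAttention(nn.Module):
    """图9注意力结构的示意：y = sigmoid(IDHT(f(DHT(x))))，
    再与低层特征相乘、与第四图像空间特征相加。"""
    def __init__(self, channels, dht, idht):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # 3×3、步长2，对齐尺寸
        self.lateral = nn.Conv2d(channels, channels, 3, padding=1)          # 对低层特征做3×3卷积
        self.dht, self.idht = dht, idht
        self.res = nn.Sequential(  # 两个残差块的简化替身：进一步提取霍夫空间的特征
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.proj = nn.Conv2d(channels, channels, 1)  # 1×1卷积

    def forward(self, low_feat, prev_fused):
        x4 = self.down(prev_fused)                 # 第四图像空间特征
        h3 = self.dht(x4)                          # 第三霍夫空间特征
        x5 = self.idht(self.res(h3))               # 第五图像空间特征
        attn = torch.sigmoid(self.proj(x5))        # 归一化后的第五图像空间特征
        return attn * self.lateral(low_feat) + x4  # 乘积结果 + 第四图像空间特征
```

形状验证时，可暂以恒等映射（如lambda x: x）充当dht与idht占位。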
需要说明的是,图9所示的网络结构,可以作为第一网络模型中可插拔的一个网络结构。也就是说,第一网络模型可以包含图9所示的网络结构,也可以不包含图9所示的网络结构,图9所示的网络结构是一个基于注意力机制的网络结构。上一个子图像空间特征对应的第二融合特征为高层特征,任一个子图像空间特征为低层特征,通过对高层特征进行霍夫变换、反向霍夫变换等,得到第五图像空间特征,使得该网络结构更关注图像空间特征,提高图像空间特征的准确性,从而提高图像获取模型的精确度。
可选地,电子设备基于任一个子图像空间特征对应的第二融合特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征,包括:电子设备基于任一个子图像空间特征的上一个子图像空间特征,确定上一个子图像空间特征对应的子霍夫空间特征;电子设备基于任一个子图像空间特征对应的第二融合特征,确定第二融合特征对应的子霍夫空间特征;电子设备基于第二融合特征对应的子霍夫空间特征和上一个子图像空间特征对应的子霍夫空间特征,确定任一个子图像空间特征对应的子霍夫空间 特征。
对于至少两个子图像特征中除第一个子图像空间特征之外的任一个子图像空间特征,电子设备确定任一个子图像空间特征的上一个子图像空间特征对应的子霍夫空间特征。其中,当上一个子图像空间特征为第一个子图像空间特征时,对第一个子图像空间特征对应的第二融合特征进行霍夫变换,得到第一个子图像空间特征对应的子霍夫空间特征。当上一个子图像空间特征为除第一个子图像空间特征之外的任一个子图像空间特征时,上一个子图像空间特征对应的子霍夫空间特征的确定方式,与任一个子图像空间特征对应的子霍夫空间特征的确定方式相同,详细请见下文的描述。
对于至少两个子图像特征中除第一个子图像空间特征之外的任一个子图像空间特征,电子设备当确定了任一个子图像空间特征的上一个子图像空间特征对应的子霍夫空间特征之后,先对任一个子图像空间特征对应的第二融合特征进行霍夫变换,得到第二融合特征对应的子霍夫空间特征,然后,将第二融合特征对应的子霍夫空间特征和上一个子图像空间特征对应的子霍夫空间特征进行融合,得到任一个子图像空间特征对应的子霍夫空间特征。
基于上述方式,能够确定各个子图像空间特征对应的子霍夫空间特征。本申请实施例的第一图像空间特征包括至少两个子图像空间特征,基于至少两个子图像空间特征能够确定第一预测图像,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征,基于各个子图像空间特征对应的子霍夫空间特征能够确定第二预测图像。之后,基于标签图像对、第一预测图像和第二预测图像,调整第一网络模型,得到第二网络模型。
可选地,电子设备基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,包括:电子设备根据标签图像对和预测图像对,得到第一网络模型的损失值;基于第一网络模型的损失值调整第一网络模型,得到第二网络模型。其中,第一网络模型的损失值表示预测图像对与标签图像对之间的差异。
本申请实施例中,标签图像对包括第一标签图像和第二标签图像,预测图像对包括第一预测图像和第二预测图像。基于第一标签图像、第二标签图像、第一预测图像和第二预测图像计算第一网络模型的损失值,通过标签图像对和预测图像对训练得到图像获取模型,实现在图像空间和霍夫空间同时监督模型的学习,能够加快模型的收敛速度,提高图像检测的效率。
其中,第一网络模型的损失值按照如下所示的公式(3)计算。
L_spatial = (1/N)·Σ_i(l_i−l̂_i)^2，L_hough = (1/M)·Σ_j(h_j−ĥ_j)^2，L_total = L_spatial + L_hough    公式(3)
其中，L_spatial为图像空间的损失值，L_hough为霍夫空间的损失值，L_total为第一网络模型的损失值，N为样本图像在图像空间的像素点的个数，l为第一预测图像，l̂为第一标签图像，M为样本图像在霍夫空间的像素点的个数，h为第二预测图像，ĥ为第二标签图像。
在得到第一网络模型的损失值之后,电子设备基于第一网络模型的损失值调整第一网络模型的模型参数,得到第二网络模型,响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
本申请实施例提供了另一种第一网络模型,如图10所示,图10是本申请实施例提供的另一种第一网络模型的示意图。本申请实施例中,样本图像为摄影图像,第一标签图像为摄影图像在图像空间的热图(简称摄影构图),第二标签图像为摄影图像在霍夫空间的热图(简称霍夫关键点热图)。
其中,第一网络模型包括残差网络(如Resnet 50网络)、特征金字塔网络(Feature Pyramid Networks,FPN)、霍夫金字塔注意力网络(Hough Pyramid Attention Network,HPAN)、DHT网络和输出网络。其中,摄影图像输入至残差网络,由残差网络提取摄影图像的初始特征,该初始特征包括至少两个子初始特征,这至少两个子初始特征为金字塔层级特征。
以6个子初始特征为例进行说明。摄影图像输入至残差网络,残差网络的信息流是上游 (Up Stream)的信息流。残差网络先提取子初始特征C0,再基于子初始特征C0得到子初始特征C1,接着基于子初始特征C1得到子初始特征C2,以此类推,直至得到子初始特征C5。其中,子初始特征C2至C5输入至FPN,由FPN确定摄影图像的第一图像空间特征,第一图像空间特征包括至少两个子图像空间特征,FPN的信息流是下游(Down Stream)的信息流。
以4个子图像空间特征为例进行说明。FPN基于子初始特征C5确定子图像空间特征P5,FPN基于子初始特征C4和子图像空间特征P5确定子图像空间特征P4,FPN基于子初始特征C3和子图像空间特征P4确定子图像空间特征P3,FPN基于子初始特征C2和子图像空间特征P3确定子图像空间特征P2。一方面,输出网络基于子图像空间特征P2至P5确定并输出摄影构图,另一方面,HPAN确定各个子图像空间特征对应的第二融合特征,HPAN的信息流是上游的信息流。
HPAN基于子图像空间特征P2确定子图像空间特征P2对应的第二融合特征PA2,HPAN基于子图像空间特征P3和第二融合特征PA2确定子图像空间特征P3对应的第二融合特征PA3,HPAN基于子图像空间特征P4和第二融合特征PA3确定子图像空间特征P4对应的第二融合特征PA4,HPAN基于子图像空间特征P5和第二融合特征PA4确定子图像空间特征P5对应的第二融合特征PA5。
其中,HPAN是基于图9所示的方式确定第二融合特征,以HPAN基于子图像空间特征P3和第二融合特征PA2确定第二融合特征PA3为例进行说明。HPAN先对第二融合特征PA2进行3×3的卷积,得到第四图像空间特征,基于DHT网络,对第四图像空间特征进行霍夫变换,得到第三霍夫空间特征。接着,先基于两个残差块网络,对第三霍夫空间特征进行特征提取,再基于IDHT网络,对特征提取后的第三霍夫空间特征进行反向霍夫变换,得到第五图像空间特征。接着,对第五图像空间特征进行1×1的卷积,对卷积后的第五图像空间特征进行归一化,得到归一化后的第五图像空间特征。之后,对子图像空间特征P3进行3×3的卷积,将卷积后的子图像空间特征P3与归一化后的第五图像空间特征进行相乘,得到乘积结果,将乘积结果和第四图像空间特征进行相加,得到第二融合特征PA3。
在HPAN确定各个子图像空间特征对应的第二融合特征之后,DHT网络确定各个子图像空间特征对应的子霍夫空间特征,DHT网络的信息流是上游的信息流。
DHT网络对第二融合特征PA2进行霍夫变换,得到子图像空间特征P2对应的子霍夫空间特征H2。DHT网络对第二融合特征PA3进行霍夫变换,得到第二融合特征PA3对应的子霍夫空间特征,基于第二融合特征PA3对应的子霍夫空间特征和子霍夫空间特征H2,确定子图像空间特征P3对应的子霍夫空间特征H3。DHT网络对第二融合特征PA4进行霍夫变换得到第二融合特征PA4对应的子霍夫空间特征,基于第二融合特征PA4对应的子霍夫空间特征和子霍夫空间特征H3,确定子图像空间特征P4对应的子霍夫空间特征H4。DHT网络对第二融合特征PA5进行霍夫变换得到第二融合特征PA5对应的子霍夫空间特征,基于第二融合特征PA5对应的子霍夫空间特征和子霍夫空间特征H4,确定子图像空间特征P5对应的子霍夫空间特征H5。之后,输出网络基于子霍夫空间特征H2至H5,确定并输出霍夫关键点热图。
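上述H2至H5的逐级确定过程可以抽象为如下示意（Python伪实现；dht与merge为示意性占位，merge代表"基于本层霍夫特征与上一层子霍夫空间特征确定本层子霍夫空间特征"的融合操作，并非本申请给出的接口）：

```python
def hough_pyramid(fused_feats, dht, merge):
    """沿金字塔自低层向高层逐级确定子霍夫空间特征的示意。
    fused_feats为各层第二融合特征，例如[PA2, PA3, PA4, PA5]。"""
    sub_hough, prev = [], None
    for pa in fused_feats:
        h = dht(pa)                        # 对第二融合特征做霍夫变换
        h = h if prev is None else merge(h, prev)  # 与上一层子霍夫空间特征融合
        sub_hough.append(h)
        prev = h
    return sub_hough  # 交由输出网络确定霍夫关键点热图
```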
需要说明的是,当样本图像为三维的样本图像时,图像获取模型主要检测三维的样本图像的某一个平面,因此,样本图像在图像空间的热图可以作为样本图像的平面注意力图像。当样本图像为二维的样本图像时,图像获取模型主要检测二维的样本图像的直线或者椭圆,因此,样本图像在图像空间的热图可以作为样本图像的线条注意力图像。由于DHT网络和IDHT网络能够实现样本图像在图像空间与样本图像在霍夫空间的转换,使得图像获取模型能够便捷的提取线条、平面等几何特征,提高图像检测的准确性。
基于上述实施环境,本申请实施例还提供了一种图像检测方法,以图11所示的本申请实施例提供的一种图像检测方法的流程图为例,该方法可由图1中的电子设备11执行。如图11所示,该方法包括步骤1101至步骤1103。
步骤1101,电子设备获取待检测图像。
本申请实施例不对待检测图像做限定,示例性的,待检测图像包括但不限于脑部图像、风景图像、道路图像、胎儿图像、细胞图像等,待检测图像的数量为至少一个。
步骤1102,电子设备根据图像获取模型,获取待检测图像的目标图像。
其中,目标图像是待检测图像在霍夫空间的热图,图像获取模型是根据上述方法实施例所示的图像获取模型的训练方法得到的。
其中,图像获取模型的训练过程,包括:获取样本图像的标签图像对,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图;根据第一网络模型,获取样本图像的预测图像对,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图;基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小;响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
本申请实施例中,将待检测图像输入至图像获取模型,由图像获取模型输出待检测图像的目标图像对。其中,待检测图像的目标图像对包括第一目标图像和第二目标图像。第一目标图像是待检测图像在图像空间的热图,第一目标图像中任一个像素点的值表征该像素点在目标直线上或者在目标椭圆上或者在目标平面上的概率。第二目标图像是待检测图像在霍夫空间的热图,第二目标图像中任一个像素点的值表征该像素点为目标直线或者为目标椭圆或者为目标平面的概率。其中,第二目标图像即为步骤1102中提及的目标图像。
需要说明的是,第一目标图像中任一个像素点的值表征该像素点在目标直线上(或者在目标椭圆上或者在目标平面上)的概率,若基于第一目标图像确定目标直线,则需要从第一目标图像中确定多个点,利用多个点组成一条目标直线,由于每一个点的确定均可能存在误差,导致目标直线容易出现扭曲现象,多个点的误差累积会使得目标直线的准确性较低。第二目标图像中任一个像素点的值表征该像素点为目标直线(或者为目标椭圆或者为目标平面)的概率,通过从第二目标图像中确定一个像素点,即可确定出标准的目标直线,准确性较高。因此,本申请实施例提供的方法在图像检测时,采用第二目标图像作为目标图像,使得后续的图像检测结果的准确性更高。
在一种可能的实现方式中,待检测图像为脑部图像,电子设备根据图像获取模型获取待检测图像的目标图像,包括:电子设备根据图像获取模型,获取脑部图像的第一图像空间特征,脑部图像的第一图像空间特征表征脑部图像在图像空间的特征;基于脑部图像的第一图像空间特征确定脑部图像的第一霍夫空间特征,脑部图像的第一霍夫空间特征表征脑部图像在霍夫空间的特征;基于脑部图像的第一霍夫空间特征确定脑部图像的目标图像。
可选地,图像获取模型包括主干网络、沙漏网络和3D DHT网络,主干网络提取脑部图像的初始特征,脑部图像的初始特征经过沙漏网络进行特征处理后,得到脑部图像的第一图像空间特征,脑部图像的第一图像空间特征经过3D DHT网络进行霍夫变换后,得到脑部图像的第一霍夫空间特征,对第一霍夫空间特征进行1×1×1的卷积和上采样后,得到目标图像,详细请参见图8的相关介绍,在此不再赘述。
在一种可能的实现方式中,电子设备基于脑部图像的第一霍夫空间特征确定脑部图像的目标图像,包括:电子设备基于脑部图像的第一霍夫空间特征和脑部图像的第一图像空间特征确定脑部图像的第二图像空间特征;基于脑部图像的第二图像空间特征确定脑部图像的第二霍夫空间特征;基于脑部图像的第二霍夫空间特征确定脑部图像的目标图像。
本申请实施例中,图像获取模型在得到脑部图像的第一霍夫空间特征之后,先基于脑部图像的第一图像空间特征和脑部图像的第一霍夫空间特征确定脑部图像的第二图像空间特征。然后,基于脑部图像的第二图像空间特征进行霍夫变换,得到脑部图像的第二霍夫空间特征。之后,基于脑部图像的第二霍夫空间特征进行上采样,得到目标图像。
可选地,图像获取模型包括主干网络、沙漏网络、3D DHT网络、残差块网络、3D IDHT网络等,脑部图像经过主干网络、沙漏网络和3D DHT网络进行处理后,得到脑部图像的第一霍夫空间特征。脑部图像的第一霍夫空间特征先经过两个残差块网络,以进一步提取霍夫空间的特征,再经过3D IDHT网络进行反向霍夫变换,得到脑部图像的第三图像空间特征。
之后，电子设备将脑部图像的第三图像空间特征、第一图像空间特征和初始特征进行融合，得到第一融合特征。第一融合特征经过沙漏网络进行特征处理后，得到脑部图像的第二图像空间特征，脑部图像的第二图像空间特征经过3D DHT网络进行霍夫变换，得到脑部图像的第二霍夫空间特征，第二霍夫空间特征经过1×1×1的卷积和上采样后，得到目标图像，详细请参见图8的相关介绍，在此不再赘述。
需要说明的是,待检测图像除可以为脑部图像之外,还可以为风景图像、道路图像、胎儿图像、细胞图像等,图像获取模型对风景图像、道路图像、胎儿图像、细胞图像等的处理方式与图像获取模型对脑部图像的处理方式相类似,且图像获取模型对待检测图像的处理方式与第一网络模型对样本图像的处理方式相类似,详细可见有关第一网络模型的介绍,在此不再赘述。
步骤1103,电子设备基于目标图像确定待检测图像的图像检测结果。
本申请实施例中,目标图像是待检测图像在霍夫空间的热图,通过确定目标图像中的峰值点,确定待检测图像的某一个平面、直线或者椭圆,从而得到图像检测结果。
在一种可能的实现方式中,电子设备基于目标图像确定待检测图像的图像检测结果,包括:电子设备对目标图像进行非极大值抑制处理,得到目标图像中的至少一个峰值点;基于目标图像中的至少一个峰值点确定待检测图像的图像检测结果。
非极大值抑制是抑制不是极大值的元素,可以理解为局部最大值搜索。本申请实施例中,目标图像是待检测图像在霍夫空间的热图,目标图像中包括至少一个高斯球。通过非极大值抑制处理,搜索每一个高斯球中的最大值,得到该高斯球的峰值点,即得到目标图像中的至少一个峰值点。任一个峰值点对应一个平面、直线或者椭圆。
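非极大值抑制取峰值点的过程可以示意如下（Python示意，使用scipy.ndimage.maximum_filter；size与threshold为示意性超参数，并非本申请限定的取值）：

```python
import numpy as np
from scipy.ndimage import maximum_filter

def hough_peaks(target, size=5, threshold=0.3):
    """对目标图像（霍夫空间热图）做非极大值抑制，搜索局部最大值。"""
    local_max = maximum_filter(target, size=size) == target  # 局部最大值掩码
    peaks = np.argwhere(local_max & (target > threshold))    # 抑制低响应点
    return peaks  # 每个峰值点对应一个平面、直线或者椭圆的参数
```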
可选地,待检测图像为脑部图像,图像检测结果为脑部图像的脑中线平面;待检测图像为胎儿图像,图像检测结果为胎儿图像的至少一个标准平面,任一个标准平面表征胎儿头部的标准平面、胎儿腹部的标准平面、胎儿股骨的标准平面中的任一个;待检测图像为细胞图像,图像检测结果为细胞图像的至少一个椭圆,任一个椭圆表征细胞的轮廓;待检测图像为摄影图像,图像检测结果为摄影图像的至少一个线条,至少一个线条表征摄影图像中的对象的结构。
待检测图像包括但不限于脑部图像、胎儿图像、细胞图像和摄影图像,根据待检测图像的不同,待检测图像的图像检测结果也不同。下面分别从实现方式B1至实现方式B4的角度,详细介绍不同的待检测图像和其图像检测结果。
实现方式B1,待检测图像为脑部图像,电子设备基于目标图像确定待检测图像的图像检测结果,包括:确定目标图像中的峰值点;基于目标图像中的峰值点,确定脑中线平面。
本申请实施例中,当待检测图像为脑部图像时,将脑部图像输入至图像获取模型,得到图像获取模型输出的目标图像,其中,目标图像是一个关键点的热图。基于目标图像中的这个关键点,确定这个关键点的峰值点,得到目标图像中的峰值点。基于目标图像中的峰值点确定一个平面方程,该平面方程为脑部图像的脑中线平面所对应的平面方程。
请参见图12,图12是本申请实施例提供的一种脑部扫描图像的处理示意图。本申请实施例不对图像获取模型的结构做限定,示例性的,图像获取模型可以是如图8所示的模型,待检测图像为脑部扫描图像,脑部扫描图像包括至少一张脑部图像。
电子设备将脑部扫描图像输入至图像获取模型，由图像获取模型输出霍夫关键点热图（即目标图像），基于霍夫关键点热图计算峰值点，这里的峰值点包括三个参数，分别为φ、θ和ρ。其中，φ、θ和ρ为平面在极坐标系的三个参数。根据极坐标系的平面方程x sinφcosθ+y sinφsinθ+z cosφ=ρ，确定直角坐标系的平面方程Ax+By+Cz+D=0，得到脑部图像的脑中线平面所对应的平面方程。一方面，基于该平面方程计算脑中线偏移量，并基于脑中线偏移量制作二维可视化图像，该二维可视化图像中加粗的线表示脑中线。另一方面，基于该平面方程进行三维可视化，得到三维可视化图像，该三维可视化图像中的平行四边形表示脑中线平面。
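由峰值点参数(φ, θ, ρ)恢复直角坐标系平面方程的换算可以示意如下（Python示意；由极坐标平面方程直接展开可得，(A, B, C)即平面的单位法向量）：

```python
import numpy as np

def plane_from_peak(phi, theta, rho):
    """由霍夫空间峰值点(φ, θ, ρ)得到平面方程Ax+By+Cz+D=0的系数。"""
    A = np.sin(phi) * np.cos(theta)
    B = np.sin(phi) * np.sin(theta)
    C = np.cos(phi)
    D = -rho  # 即 x·A + y·B + z·C − ρ = 0
    return A, B, C, D
```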
实现方式B2,待检测图像为胎儿图像,电子设备基于目标图像确定待检测图像的图像检测结果,包括:电子设备确定目标图像中的各个峰值点;基于目标图像中的各个峰值点,确定各个峰值点对应的标准平面,任一个峰值点对应的标准平面表征胎儿头部的标准平面、胎儿腹部的标准平面、胎儿股骨的标准平面中的任一个。
本申请实施例中,当待检测图像为胎儿图像时,将胎儿图像输入至图像获取模型,得到图像获取模型输出的目标图像,其中,目标图像是至少一个关键点的热图。基于目标图像中的各个关键点,确定各个关键点的峰值点,得到目标图像中的各个峰值点。基于目标图像中的各个峰值点确定各个峰值点对应的平面方程,任一个峰值点对应的平面方程为胎儿头部的标准平面或者胎儿腹部的标准平面或者胎儿股骨的标准平面所对应的平面方程。
请参见图13,图13是本申请实施例提供的一种胎儿扫描图像的处理示意图。本申请实施例不对图像获取模型的结构做限定,示例性的,图像获取模型可以是如图8所示的模型,待检测图像为胎儿扫描图像,胎儿扫描图像包括至少一张胎儿图像。
电子设备将胎儿扫描图像输入至图像获取模型,由图像获取模型输出霍夫关键点热图(即目标图像),其中,霍夫关键点热图是多个关键点的热图。对霍夫关键点热图进行非极大抑制处理,以搜索局部的最大值,从而确定各个关键点的峰值点,得到多个峰值点。确定各个峰值点对应的平面方程,得到胎儿的股骨、腹部、头部等的平面方程,以便于快速定位平面,减少阅片时间,得到胎儿的股骨、腹部、头部等的标准平面。之后,基于胎儿的股骨、腹部、头部等的标准平面测量胎儿发育参数,以评估胎儿发育情况。
实现方式B3,待检测图像为细胞图像,电子设备基于目标图像确定待检测图像的图像检测结果,包括:确定目标图像中的各个峰值点;基于目标图像中的各个峰值点,确定各个峰值点对应的椭圆,任一个峰值点对应的椭圆表征细胞的轮廓。
本申请实施例中,当待检测图像为细胞图像时,将细胞图像输入至图像获取模型,得到图像获取模型输出的目标图像,其中,目标图像是至少一个关键点的热图。基于目标图像中的各个关键点,确定各个关键点的峰值点,得到目标图像中的各个峰值点。基于目标图像中的各个峰值点确定各个峰值点对应的椭圆方程,任一个峰值点对应的椭圆方程为细胞的轮廓所对应的椭圆方程。
可以理解的是,细胞图像中包括至少一个细胞,通过确定各个峰值点对应的椭圆方程,能够确定细胞图像中各个细胞的轮廓所对应的椭圆方程,从而检测出细胞图像中的各个细胞。
请参见图14,图14是本申请实施例提供的一种细胞图像的处理示意图。本申请实施例不对图像获取模型的结构做限定,示例性的,图像获取模型可以是如图10所示的模型,待检测图像为细胞图像,细胞图像中包括多个细胞。
电子设备将细胞图像输入至图像获取模型,由图像获取模型输出霍夫关键点热图(即目标图像),其中,霍夫关键点热图是多个关键点的热图。对霍夫关键点热图进行非极大抑制处理,以搜索局部的最大值,从而确定各个关键点的峰值点,得到多个峰值点。确定各个峰值点对应的椭圆方程,得到细胞图像中各个细胞的轮廓所对应的椭圆方程,从而检测出细胞图像中的各个细胞,得到检测结果。
实现方式B4,待检测图像为摄影图像,电子设备基于目标图像确定待检测图像的图像检测结果,包括:确定目标图像中的各个峰值点;基于目标图像中的各个峰值点,确定各个峰值点对应的线条,各个峰值点对应的线条表征摄影对象的结构。
本申请实施例中，当待检测图像为摄影图像时，将摄影图像输入至图像获取模型，得到图像获取模型输出的目标图像，其中，目标图像是至少一个关键点的热图。基于目标图像中的各个关键点，确定各个关键点的峰值点，得到目标图像中的各个峰值点。基于目标图像中的各个峰值点确定各个峰值点对应的直线方程或者椭圆方程等，从而得到各个峰值点对应的线条（直线或者椭圆等）。任一个峰值点对应的线条是摄影对象的线条，各个峰值点对应的线条可以表征摄影对象的结构。
可以理解的是,当摄影图像为道路图像时,任一个峰值点对应的线条可以是道路边缘的线条、道路中路标(如直行路标、左拐路标等)的线条,各个峰值点对应的线条能够表征道路的结构。
请参见图15,图15是本申请实施例提供的一种摄影图像的处理示意图。本申请实施例不对图像获取模型的结构做限定,示例性的,图像获取模型可以是如图10所示的模型,待检测图像为摄影图像。
电子设备将摄影图像输入至图像获取模型,由图像获取模型输出霍夫关键点热图(即目标图像),其中,霍夫关键点热图是多个关键点的热图。对霍夫关键点热图进行非极大抑制处理,以搜索局部的最大值,从而确定各个关键点的峰值点,得到多个峰值点。确定各个峰值点对应的直线方程,得到摄影对象的所对应的各个直线方程,从而得到多条直线,即得到摄影构图。
上述方法中的图像获取模型是基于样本图像在霍夫空间的热图和样本图像在图像空间的热图训练得到的，使得图像获取模型学习到了霍夫空间的关键点特征和图像空间的语义特征，能够快速而准确地确定出图像在霍夫空间的热图，实现准确地基于图像在霍夫空间的热图确定图像检测结果，从而提高后续分析处理的准确性。
图16所示为本申请实施例提供的一种图像获取模型的训练装置的结构示意图,如图16所示,该装置包括:
第一获取模块1601,用于获取样本图像的标签图像对,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图;
第二获取模块1602,用于根据第一网络模型,获取样本图像的预测图像对,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
调整模块1603,用于基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小;
确定模块1604,用于响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
在一种可能的实现方式中,第二获取模块1602,用于获取样本图像的第一图像空间特征,第一图像空间特征表征样本图像在图像空间的特征;基于第一图像空间特征确定第一预测图像;基于第一图像空间特征确定第一霍夫空间特征,第一霍夫空间特征表征样本图像在霍夫空间的特征;基于第一霍夫空间特征确定第二预测图像。
在一种可能的实现方式中,第二获取模块1602,用于对第一图像空间特征进行整流,得到整流后的第一图像空间特征;基于整流后的第一图像空间特征,确定第一霍夫空间特征。
在一种可能的实现方式中,调整模块1603,用于基于标签图像对和预测图像对确定第一网络模型的损失值,第一网络模型的损失值表示预测图像对与标签图像对之间的差异;基于第一网络模型的损失值调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,第二获取模块1602,还用于基于第一霍夫空间特征和第一图像空间特征确定第二图像空间特征;基于第二图像空间特征确定第三预测图像,第三预测图像是第一网络模型得到的样本图像在图像空间的热图;基于第二图像空间特征确定第二霍夫空间特征;基于第二霍夫空间特征确定第四预测图像,第四预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
调整模块1603,用于基于标签图像对、预测图像对、第三预测图像和第四预测图像调整 第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,第二获取模块1602,用于基于第一霍夫空间特征确定第三图像空间特征;将第一图像空间特征和第三图像空间特征进行融合,得到第一融合特征;基于第一融合特征确定第二图像空间特征。
在一种可能的实现方式中,调整模块1603,用于根据标签图像对和预测图像对,得到第一损失值,第一损失值表示预测图像对与标签图像对之间的差异;根据标签图像对、第三预测图像和第四预测图像,得到第二损失值,第二损失值表示第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异;基于第一损失值和第二损失值,得到第一网络模型的损失值;基于第一网络模型的损失值调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,第一图像空间特征包括至少两个子图像空间特征;
第二获取模块1602,用于基于至少两个子图像空间特征确定第一预测图像。
在一种可能的实现方式中,第二获取模块1602,用于对于至少两个子图像空间特征中的任一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征;基于各个子图像空间特征对应的子霍夫空间特征确定第二预测图像。
在一种可能的实现方式中,第二获取模块1602,用于对于至少两个子图像空间特征中除第一个子图像空间特征之外的任一个子图像空间特征,基于任一个子图像空间特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的第二融合特征;基于任一个子图像空间特征对应的第二融合特征和该上一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
在一种可能的实现方式中,第二获取模块1602,用于基于该上一个子图像空间特征,确定上一个子图像空间特征对应的第二融合特征;基于上一个子图像空间特征对应的第二融合特征,确定第四图像空间特征;基于第四图像空间特征确定第三霍夫空间特征;基于任一个子图像空间特征、第三霍夫空间特征和上一个子图像空间特征对应的第二融合特征,确定任一个子图像空间特征对应的第二融合特征。
在一种可能的实现方式中,第二获取模块1602,用于基于该上一个子图像空间特征,确定上一个子图像空间特征对应的子霍夫空间特征;基于任一个子图像空间特征对应的第二融合特征,确定第二融合特征对应的子霍夫空间特征;基于第二融合特征对应的子霍夫空间特征和上一个子图像空间特征对应的子霍夫空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
上述装置中的图像获取模型是基于样本图像在霍夫空间的热图和样本图像在图像空间的热图训练得到的，使得图像获取模型学习到了霍夫空间的关键点特征和图像空间的语义特征，能够快速而准确地确定出图像在霍夫空间的热图，实现准确地基于图像在霍夫空间的热图确定图像检测结果，从而提高后续分析处理的准确性。
应理解的是,上述图16提供的装置在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图17所示为本申请实施例提供的一种图像检测装置的结构示意图,如图17所示,该装置包括:
第一获取模块1701,用于获取待检测图像;
第二获取模块1702,用于根据图像获取模型,获取待检测图像的目标图像,目标图像是待检测图像在霍夫空间的热图;
确定模块1703,用于基于目标图像确定待检测图像的图像检测结果;
其中,图像获取模型的训练过程,包括:
获取样本图像的标签图像对,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图;
根据第一网络模型,获取样本图像的预测图像对,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小;
响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
在一种可能的实现方式中,待检测图像为脑部图像,第二获取模块1702,用于根据图像获取模型,获取脑部图像的第一图像空间特征,脑部图像的第一图像空间特征表征脑部图像在图像空间的特征;基于脑部图像的第一图像空间特征确定脑部图像的第一霍夫空间特征,脑部图像的第一霍夫空间特征表征脑部图像在霍夫空间的特征;基于脑部图像的第一霍夫空间特征确定脑部图像的目标图像。
在一种可能的实现方式中,第二获取模块1702,用于基于脑部图像的第一霍夫空间特征和脑部图像的第一图像空间特征确定脑部图像的第二图像空间特征;基于脑部图像的第二图像空间特征确定脑部图像的第二霍夫空间特征;基于脑部图像的第二霍夫空间特征确定脑部图像的目标图像。
在一种可能的实现方式中,确定模块1703,用于对目标图像进行非极大值抑制处理,得到目标图像中的至少一个峰值点;基于目标图像中的至少一个峰值点确定待检测图像的图像检测结果。
上述装置中的图像获取模型是基于样本图像在霍夫空间的热图和样本图像在图像空间的热图训练得到的，使得图像获取模型学习到了霍夫空间的关键点特征和图像空间的语义特征，能够快速而准确地确定出图像在霍夫空间的热图，实现准确地基于图像在霍夫空间的热图确定图像检测结果，从而提高后续分析处理的准确性。
应理解的是,上述图17提供的装置在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图18示出了本申请一个示例性实施例提供的终端设备1800的结构框图。该终端设备1800可以是便携式移动终端,比如:智能手机、平板电脑、MP3(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)播放器、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端设备1800还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端设备1800包括有:处理器1801和存储器1802。
处理器1801可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1801可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1801也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1801可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示 的内容的渲染和绘制。一些实施例中,处理器1801还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1802还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1802中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1801所执行以实现本申请中方法实施例提供的图像获取模型的训练方法或者图像检测方法。
在一些实施例中,终端设备1800还可选包括有:外围设备接口1803和至少一个外围设备。处理器1801、存储器1802和外围设备接口1803之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1803相连。具体地,外围设备包括:射频电路1804和显示屏1805中的至少一种。
外围设备接口1803可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1801和存储器1802。在一些实施例中,处理器1801、存储器1802和外围设备接口1803被集成在同一芯片或电路板上;在一些其他实施例中,处理器1801、存储器1802和外围设备接口1803中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路1804用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1804通过电磁信号与通信网络以及其他通信设备进行通信。射频电路1804将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路1804包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路1804可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路1804还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏1805用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1805是触摸显示屏时,显示屏1805还具有采集在显示屏1805的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1801进行处理。此时,显示屏1805还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1805可以为一个,设置在终端设备1800的前面板;在另一些实施例中,显示屏1805可以为至少两个,分别设置在终端设备1800的不同表面或呈折叠设计;在另一些实施例中,显示屏1805可以是柔性显示屏,设置在终端设备1800的弯曲表面上或折叠面上。甚至,显示屏1805还可以设置成非矩形的不规则图形,也即异形屏。显示屏1805可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
本领域技术人员可以理解,图18中示出的结构并不构成对终端设备1800的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
图19为本申请实施例提供的服务器的结构示意图,该服务器1900可因配置或性能不同而产生比较大的差异,可以包括一个或多个处理器1901和一个或多个的存储器1902,其中,该一个或多个存储器1902中存储有至少一条程序代码,该至少一条程序代码由该一个或多个处理器1901加载并执行以实现上述各个方法实施例提供的图像获取模型的训练方法或者图像检测方法,示例性的,处理器1901为CPU。当然,该服务器1900还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器1900还可以包括其他用于实现设备功能的部件,在此不做赘述。
在示例性实施例中，还提供了一种电子设备，电子设备包括处理器和存储器，存储器中存储有至少一条程序代码，该至少一条程序代码由处理器加载并执行，以使电子设备实现如下操作：
获取样本图像的标签图像对,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图;
根据第一网络模型,获取样本图像的预测图像对,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小;
响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
获取样本图像的第一图像空间特征,第一图像空间特征表征样本图像在图像空间的特征;
基于第一图像空间特征确定第一预测图像;
基于第一图像空间特征确定第一霍夫空间特征,第一霍夫空间特征表征样本图像在霍夫空间的特征;
基于第一霍夫空间特征确定第二预测图像。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
对第一图像空间特征进行整流,得到整流后的第一图像空间特征;
基于整流后的第一图像空间特征,确定第一霍夫空间特征。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于标签图像对和预测图像对确定第一网络模型的损失值,第一网络模型的损失值表示预测图像对与标签图像对之间的差异;
基于第一网络模型的损失值调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于第一霍夫空间特征和第一图像空间特征确定第二图像空间特征;
基于第二图像空间特征确定第三预测图像,第三预测图像是第一网络模型得到的样本图像在图像空间的热图;
基于第二图像空间特征确定第二霍夫空间特征;
基于第二霍夫空间特征确定第四预测图像,第四预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
基于标签图像对、预测图像对、第三预测图像和第四预测图像调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于第一霍夫空间特征确定第三图像空间特征;
将第一图像空间特征和第三图像空间特征进行融合,得到第一融合特征;
基于第一融合特征确定第二图像空间特征。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
根据标签图像对和预测图像对,得到第一损失值,第一损失值表示预测图像对与标签图 像对之间的差异;
根据标签图像对、第三预测图像和第四预测图像,得到第二损失值,第二损失值表示第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异;
基于第一损失值和第二损失值,得到第一网络模型的损失值;
基于第一网络模型的损失值调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与标签图像对之间的差异减小。
在一种可能的实现方式中,第一图像空间特征包括至少两个子图像空间特征;该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于至少两个子图像空间特征确定第一预测图像。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
对于至少两个子图像空间特征中的任一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征;
基于各个子图像空间特征对应的子霍夫空间特征确定第二预测图像。
在一种可能的实现方式中,第一图像空间特征包括至少两个子图像空间特征,第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征;该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
对于至少两个子图像空间特征中除第一个子图像空间特征之外的任一个子图像空间特征,基于任一个子图像空间特征和任一个子图像空间特征的上一个子图像空间特征,确定任一个子图像空间特征对应的第二融合特征;
基于任一个子图像空间特征对应的第二融合特征和上一个子图像空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于上一个子图像空间特征,确定上一个子图像空间特征对应的第二融合特征;
基于上一个子图像空间特征对应的第二融合特征,确定第四图像空间特征;
基于第四图像空间特征确定第三霍夫空间特征;
基于任一个子图像空间特征、第三霍夫空间特征和上一个子图像空间特征对应的第二融合特征,确定任一个子图像空间特征对应的第二融合特征。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于上一个子图像空间特征,确定上一个子图像空间特征对应的子霍夫空间特征;
基于任一个子图像空间特征对应的第二融合特征,确定第二融合特征对应的子霍夫空间特征;
基于第二融合特征对应的子霍夫空间特征和上一个子图像空间特征对应的子霍夫空间特征,确定任一个子图像空间特征对应的子霍夫空间特征。
在示例性实施例中,还提供了一种电子设备,电子设备包括处理器和存储器,存储器中存储有至少一条程序代码,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
获取待检测图像;
根据图像获取模型,获取待检测图像的目标图像,目标图像是待检测图像在霍夫空间的热图;
基于目标图像确定待检测图像的图像检测结果;
其中,图像获取模型的训练过程,包括:
获取样本图像的标签图像对,标签图像对包括第一标签图像和第二标签图像,第一标签图像是样本图像在图像空间的热图,第二标签图像是样本图像在霍夫空间的热图;
根据第一网络模型,获取样本图像的预测图像对,预测图像对包括第一预测图像和第二预测图像,第一预测图像是第一网络模型得到的样本图像在图像空间的热图,第二预测图像是第一网络模型得到的样本图像在霍夫空间的热图;
基于标签图像对和预测图像对调整第一网络模型,得到第二网络模型,以使根据第二网络模型得到的预测图像对与标签图像对之间的差异减小;
响应于满足训练终止条件,将第二网络模型确定为图像获取模型。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
根据图像获取模型获取脑部图像的第一图像空间特征,脑部图像的第一图像空间特征表征脑部图像在图像空间的特征;
基于脑部图像的第一图像空间特征确定脑部图像的第一霍夫空间特征,脑部图像的第一霍夫空间特征表征脑部图像在霍夫空间的特征;
基于脑部图像的第一霍夫空间特征确定脑部图像的目标图像。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
基于脑部图像的第一霍夫空间特征和脑部图像的第一图像空间特征确定脑部图像的第二图像空间特征;
基于脑部图像的第二图像空间特征确定脑部图像的第二霍夫空间特征;
基于脑部图像的第二霍夫空间特征确定脑部图像的目标图像。
在一种可能的实现方式中,该至少一条程序代码由处理器加载并执行,以使电子设备实现如下操作:
对目标图像进行非极大值抑制处理,得到目标图像中的至少一个峰值点;
基于目标图像中的至少一个峰值点确定待检测图像的图像检测结果。
在示例性实施例中,还提供了一种计算机可读存储介质,该存储介质中存储有至少一条程序代码,该至少一条程序代码由处理器加载并执行,以使电子设备实现上述任一种图像获取模型的训练方法或者图像检测方法。
可选地,上述计算机可读存储介质可以是只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)、磁带、软盘和光数据存储设备等。
在示例性实施例中,还提供了一种计算机程序或计算机程序产品,该计算机程序或计算机程序产品中存储有至少一条计算机指令,该至少一条计算机指令由处理器加载并执行,以使电子设备实现上述任一种图像获取模型的训练方法或者图像检测方法。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
以上仅为本申请的示例性实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (21)

  1. 一种图像获取模型的训练方法,所述方法包括:
    电子设备获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在霍夫空间的热图;
    所述电子设备根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
    所述电子设备基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
    所述电子设备响应于满足训练终止条件,将所述第二网络模型确定为图像获取模型。
  2. 根据权利要求1所述的方法,其中,所述电子设备获取所述样本图像的预测图像对,包括:
    所述电子设备获取所述样本图像的第一图像空间特征,所述第一图像空间特征表征所述样本图像在所述图像空间的特征;
    所述电子设备基于所述第一图像空间特征确定所述第一预测图像;
    所述电子设备基于所述第一图像空间特征确定第一霍夫空间特征,所述第一霍夫空间特征表征所述样本图像在所述霍夫空间的特征;
    所述电子设备基于所述第一霍夫空间特征确定所述第二预测图像。
  3. 根据权利要求2所述的方法,其中,所述电子设备基于所述第一图像空间特征确定第一霍夫空间特征,包括:
    所述电子设备对所述第一图像空间特征进行整流,得到整流后的第一图像空间特征;
    所述电子设备基于所述整流后的第一图像空间特征,确定所述第一霍夫空间特征。
  4. 根据权利要求1所述的方法,其中,所述电子设备基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小,包括:
    所述电子设备基于所述标签图像对和所述预测图像对确定所述第一网络模型的损失值,所述第一网络模型的损失值表示所述预测图像对与所述标签图像对之间的差异;
    所述电子设备基于所述第一网络模型的损失值调整所述第一网络模型,得到所述第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小。
  5. 根据权利要求2或3所述的方法,其中,所述方法还包括:
    所述电子设备基于所述第一霍夫空间特征和所述第一图像空间特征确定第二图像空间特征;
    所述电子设备基于所述第二图像空间特征确定第三预测图像,所述第三预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图;
    所述电子设备基于所述第二图像空间特征确定第二霍夫空间特征;
    所述电子设备基于所述第二霍夫空间特征确定第四预测图像,所述第四预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
    所述电子设备基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小,包括:
    所述电子设备基于所述标签图像对、所述预测图像对、所述第三预测图像和所述第四预测图像调整所述第一网络模型，得到所述第二网络模型，以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小，且得到的第三预测图像和第四预测图像组成的图像对与所述标签图像对之间的差异减小。
  6. 根据权利要求5所述的方法,其中,所述电子设备基于所述第一霍夫空间特征和所述第一图像空间特征确定第二图像空间特征,包括:
    所述电子设备基于所述第一霍夫空间特征确定第三图像空间特征;
    所述电子设备将所述第一图像空间特征和所述第三图像空间特征进行融合,得到第一融合特征;
    所述电子设备基于所述第一融合特征确定所述第二图像空间特征。
  7. 根据权利要求5所述的方法,其中,所述电子设备基于所述标签图像对、所述预测图像对、所述第三预测图像和所述第四预测图像调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与所述标签图像对之间的差异减小,包括:
    所述电子设备根据所述标签图像对和所述预测图像对,得到第一损失值,所述第一损失值表示所述预测图像对与所述标签图像对之间的差异;
    所述电子设备根据所述标签图像对、所述第三预测图像和所述第四预测图像,得到第二损失值,所述第二损失值表示所述第三预测图像和所述第四预测图像组成的图像对与所述标签图像对之间的差异;
    所述电子设备基于所述第一损失值和所述第二损失值,得到所述第一网络模型的损失值;
    所述电子设备基于所述第一网络模型的损失值调整所述第一网络模型,得到所述第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小,且得到的第三预测图像和第四预测图像组成的图像对与所述标签图像对之间的差异减小。
  8. 根据权利要求2或3所述的方法,其中,所述第一图像空间特征包括至少两个子图像空间特征;
    所述电子设备基于所述第一图像空间特征确定所述第一预测图像,包括:
    所述电子设备基于所述至少两个子图像空间特征确定所述第一预测图像。
  9. 根据权利要求8所述的方法,其中,所述电子设备基于所述第一图像空间特征确定第一霍夫空间特征,包括:
    对于所述至少两个子图像空间特征中的任一个子图像空间特征,所述电子设备确定所述任一个子图像空间特征对应的子霍夫空间特征,所述第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征;
    所述电子设备基于所述第一霍夫空间特征确定所述第二预测图像,包括:
    所述电子设备基于所述各个子图像空间特征对应的子霍夫空间特征确定所述第二预测图像。
  10. 根据权利要求2或3所述的方法,其中,所述第一图像空间特征包括至少两个子图像空间特征,所述第一霍夫空间特征包括各个子图像空间特征对应的子霍夫空间特征;
    所述电子设备基于所述第一图像空间特征确定第一霍夫空间特征,包括:
    对于所述至少两个子图像空间特征中除第一个子图像空间特征之外的任一个子图像空间特征,所述电子设备基于所述任一个子图像空间特征和所述任一个子图像空间特征的上一个子图像空间特征,确定所述任一个子图像空间特征对应的第二融合特征;
    所述电子设备基于所述任一个子图像空间特征对应的第二融合特征和所述上一个子图像空间特征,确定所述任一个子图像空间特征对应的子霍夫空间特征。
  11. 根据权利要求10所述的方法,其中,所述电子设备基于所述任一个子图像空间特征和所述任一个子图像空间特征的上一个子图像空间特征,确定所述任一个子图像空间特征对应的第二融合特征,包括:
    所述电子设备基于所述上一个子图像空间特征,确定所述上一个子图像空间特征对应的第二融合特征;
    所述电子设备基于所述上一个子图像空间特征对应的第二融合特征,确定第四图像空间特征;
    所述电子设备基于所述第四图像空间特征确定第三霍夫空间特征;
    所述电子设备基于所述任一个子图像空间特征、所述第三霍夫空间特征和所述上一个子图像空间特征对应的第二融合特征,确定所述任一个子图像空间特征对应的第二融合特征。
  12. 根据权利要求10所述的方法,其中,所述电子设备基于所述任一个子图像空间特征对应的第二融合特征和所述上一个子图像空间特征,确定所述任一个子图像空间特征对应的子霍夫空间特征,包括:
    所述电子设备基于所述上一个子图像空间特征,确定所述上一个子图像空间特征对应的子霍夫空间特征;
    所述电子设备基于所述任一个子图像空间特征对应的第二融合特征,确定所述第二融合特征对应的子霍夫空间特征;
    所述电子设备基于所述第二融合特征对应的子霍夫空间特征和所述上一个子图像空间特征对应的子霍夫空间特征,确定所述任一个子图像空间特征对应的子霍夫空间特征。
  13. 一种图像检测方法,所述方法包括:
    电子设备获取待检测图像;
    所述电子设备根据图像获取模型,获取所述待检测图像的目标图像,所述目标图像是所述待检测图像在霍夫空间的热图;
    所述电子设备基于所述目标图像确定所述待检测图像的图像检测结果;
    其中,所述图像获取模型的训练过程,包括:
    获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在所述霍夫空间的热图;
    根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
    基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
    响应于满足训练终止条件,将所述第二网络模型确定为所述图像获取模型。
  14. 根据权利要求13所述的方法,其中,所述待检测图像为脑部图像,所述电子设备根据图像获取模型,获取所述待检测图像的目标图像,包括:
    所述电子设备根据所述图像获取模型获取所述脑部图像的第一图像空间特征,所述脑部图像的第一图像空间特征表征所述脑部图像在所述图像空间的特征;
    所述电子设备基于所述脑部图像的第一图像空间特征确定所述脑部图像的第一霍夫空间特征,所述脑部图像的第一霍夫空间特征表征所述脑部图像在所述霍夫空间的特征;
    所述电子设备基于所述脑部图像的第一霍夫空间特征确定所述脑部图像的目标图像。
  15. 根据权利要求14所述的方法,其中,所述电子设备基于所述脑部图像的第一霍夫空间特征确定所述脑部图像的目标图像,包括:
    所述电子设备基于所述脑部图像的第一霍夫空间特征和所述脑部图像的第一图像空间特征确定所述脑部图像的第二图像空间特征;
    所述电子设备基于所述脑部图像的第二图像空间特征确定所述脑部图像的第二霍夫空间特征;
    所述电子设备基于所述脑部图像的第二霍夫空间特征确定所述脑部图像的目标图像。
  16. 根据权利要求13所述的方法,其中,所述电子设备基于所述目标图像确定所述待检测图像的图像检测结果,包括:
    所述电子设备对所述目标图像进行非极大值抑制处理,得到所述目标图像中的至少一个峰值点;
    所述电子设备基于所述目标图像中的至少一个峰值点确定所述待检测图像的图像检测结果。
  17. 一种图像获取模型的训练装置,所述装置包括:
    第一获取模块,用于获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在霍夫空间的热图;
    第二获取模块,用于根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
    调整模块,用于基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
    确定模块,用于响应于满足训练终止条件,将所述第二网络模型确定为图像获取模型。
  18. 一种图像检测装置,所述装置包括:
    第一获取模块,用于获取待检测图像;
    第二获取模块,用于根据图像获取模型,获取所述待检测图像的目标图像,所述目标图像是所述待检测图像在霍夫空间的热图;
    确定模块,用于基于所述目标图像确定所述待检测图像的图像检测结果;
    其中,所述图像获取模型的训练过程,包括:
    获取样本图像的标签图像对,所述标签图像对包括第一标签图像和第二标签图像,所述第一标签图像是所述样本图像在图像空间的热图,所述第二标签图像是所述样本图像在所述霍夫空间的热图;
    根据第一网络模型,获取所述样本图像的预测图像对,所述预测图像对包括第一预测图像和第二预测图像,所述第一预测图像是所述第一网络模型得到的所述样本图像在所述图像空间的热图,所述第二预测图像是所述第一网络模型得到的所述样本图像在所述霍夫空间的热图;
    基于所述标签图像对和所述预测图像对调整所述第一网络模型,得到第二网络模型,以使根据所述第二网络模型得到的预测图像对与所述标签图像对之间的差异减小;
    响应于满足训练终止条件,将所述第二网络模型确定为所述图像获取模型。
  19. 一种电子设备,所述电子设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以使所述电子设备实现如权利要求1至12任一所述的图像获取模型的训练方法或者实现如权利要求13至16任一所述的图像检测方法。
  20. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以使电子设备实现如权利要求1至12任一所述的图像获取模型的训练方法或者实现如权利要求13至16任一所述的图像检测方法。
  21. 一种计算机程序产品,所述计算机程序产品包括至少一条计算机指令,所述至少一条计算机指令由处理器加载并执行,以使电子设备实现如权利要求1至12任一所述的图像获取模型的训练方法或者实现如权利要求13至16任一所述的图像检测方法。
PCT/CN2022/121317 2021-10-15 2022-09-26 图像获取模型的训练方法、图像检测方法、装置及设备 WO2023061195A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22880135.3A EP4318314A1 (en) 2021-10-15 2022-09-26 Image acquisition model training method and apparatus, image detection method and apparatus, and device
US18/199,872 US20230298324A1 (en) 2021-10-15 2023-05-19 Image acquisition model training method and apparatus, image detection method and apparatus, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111205913.3A CN114298268A (zh) 2021-10-15 2021-10-15 图像获取模型的训练方法、图像检测方法、装置及设备
CN202111205913.3 2021-10-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/199,872 Continuation US20230298324A1 (en) 2021-10-15 2023-05-19 Image acquisition model training method and apparatus, image detection method and apparatus, and device

Publications (1)

Publication Number Publication Date
WO2023061195A1 (zh)

Family ID=80964458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121317 WO2023061195A1 (zh) 2021-10-15 2022-09-26 图像获取模型的训练方法、图像检测方法、装置及设备

Country Status (4)

Country Link
US (1) US20230298324A1 (zh)
EP (1) EP4318314A1 (zh)
CN (1) CN114298268A (zh)
WO (1) WO2023061195A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298268A (zh) * 2021-10-15 2022-04-08 腾讯科技(深圳)有限公司 Image acquisition model training method and apparatus, image detection method and apparatus, and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180322660A1 (en) * 2017-05-02 2018-11-08 Techcyte, Inc. Machine learning classification and training for digital microscopy images
CN111639755A (zh) * 2020-06-07 2020-09-08 电子科技大学中山学院 Network model training method and apparatus, electronic device, and storage medium
CN112307940A (zh) * 2020-10-28 2021-02-02 有半岛(北京)信息科技有限公司 Model training method, human body pose detection method, apparatus, device, and medium
CN113379687A (zh) * 2021-05-28 2021-09-10 上海联影智能医疗科技有限公司 Network training method, image detection method, and medium
CN114298268A (zh) * 2021-10-15 2022-04-08 腾讯科技(深圳)有限公司 Image acquisition model training method and apparatus, image detection method and apparatus, and device

Also Published As

Publication number Publication date
EP4318314A1 (en) 2024-02-07
CN114298268A (zh) 2022-04-08
US20230298324A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
US11734851B2 (en) Face key point detection method and apparatus, storage medium, and electronic device
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
US9697416B2 (en) Object detection using cascaded convolutional neural networks
EP3576017A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
CN111325726A Model training method, image processing method, apparatus, device, and storage medium
CN111563502A Image text recognition method and apparatus, electronic device, and computer storage medium
CN107301643B Salient object detection method based on robust sparse representation and Laplacian regularization
CN111598168B Image classification method and apparatus, computer device, and medium
CN111950570B Target image extraction method, neural network training method, and apparatus
WO2023151237A1 Face pose estimation method and apparatus, electronic device, and storage medium
CN112419326B Image segmentation data processing method, apparatus, device, and storage medium
WO2022161302A1 Action recognition method and apparatus, device, storage medium, and computer program product
WO2022257614A1 Object detection model training method, image detection method, and apparatus therefor
CN112163577A Text recognition method and apparatus for game images, electronic device, and storage medium
CN114092963A Keypoint detection and model training method, apparatus, device, and storage medium
WO2023061195A1 Image acquisition model training method and apparatus, image detection method and apparatus, and device
CN116580211B Keypoint detection method and apparatus, computer device, and storage medium
CN113723164A Method, apparatus, device, and storage medium for acquiring edge difference information
WO2023109086A1 Character recognition method and apparatus, device, and storage medium
CN113743186B Medical image processing method, apparatus, device, and storage medium
CN116048682A Terminal system interface layout comparison method and electronic device
CN114049674A Three-dimensional face reconstruction method and apparatus, and storage medium
CN113822871A Target detection method and apparatus based on dynamic detection head, storage medium, and device
CN113538537B Image registration and model training method, apparatus, device, server, and medium
CN115497112B Form recognition method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22880135; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2022880135; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2022880135; Country of ref document: EP; Effective date: 20231031)