CN116934728A - Hysteroscope image target detection acceleration method based on embedded AI processor - Google Patents

Hysteroscope image target detection acceleration method based on embedded AI processor Download PDF

Info

Publication number
CN116934728A
CN116934728A (application CN202310948368.XA)
Authority
CN
China
Prior art keywords
model
target detection
processor
loss
distillation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310948368.XA
Other languages
Chinese (zh)
Inventor
张云飞
蔡占毅
曹黎俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jiyuan Medical Technology Co ltd
Original Assignee
Jiangsu Jiyuan Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Jiyuan Medical Technology Co ltd filed Critical Jiangsu Jiyuan Medical Technology Co ltd
Priority to CN202310948368.XA priority Critical patent/CN116934728A/en
Publication of CN116934728A publication Critical patent/CN116934728A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • G06N10/20Models of quantum computing, e.g. quantum circuits or universal quantum computers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10068Endoscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Multimedia (AREA)
  • Computational Mathematics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hysteroscope image target detection acceleration method based on an embedded AI processor, belonging to the field of medical image processing. With the continuous development of artificial intelligence and deep learning, network models tend to become more complex, and their parameter counts and operation counts increase accordingly. The invention uses an embedded AI processor to perform real-time preprocessing and feature extraction on hysteroscope images, then uses a network model that has undergone large-model focal and global knowledge distillation to detect and classify abnormal targets in the images, and finally outputs the detection results to a display. The invention can effectively improve the speed and accuracy of hysteroscope image target detection, meet the real-time requirements of medical target detection, and reduce inference time and computing resource consumption.

Description

Hysteroscope image target detection acceleration method based on embedded AI processor
Technical Field
The invention relates to the field of medical image processing, in particular to a hysteroscope image target detection acceleration method based on an embedded AI processor.
Background
Hysteroscopes are medical devices for the examination and treatment of endometrial diseases; they transmit images of the uterine cavity through optical fibers to a display for observation and manipulation by a doctor. Hysteroscopic image target detection refers to locating and identifying regions of interest in a hysteroscopic image, such as endometrium, polyps, myomas and adhesions, for a doctor to diagnose and treat. It is a challenging task because hysteroscopic images typically suffer from low resolution, low contrast, high noise, occlusion and deformation, which make targets difficult to recognize. To improve the performance and accuracy of hysteroscopic image target detection, many deep learning-based methods, such as convolutional neural networks (CNNs), have emerged in recent years. These methods typically require a large amount of labeled data and computational resources to train complex models such as Faster R-CNN, YOLO and SSD. However, these models require long inference times and high memory consumption, and are therefore unsuitable for real-time medical target detection.
To address this problem, one possible approach is to use an embedded AI processor to accelerate hysteroscopic image target detection. An embedded AI processor is a chip specifically designed to perform artificial intelligence tasks, such as the NVIDIA Jetson Nano or the Google Coral Edge TPU. These chips have the advantages of low power consumption, high performance and small size, and can perform rapid image processing and inference on edge devices. However, deploying deep learning models directly onto embedded AI processors still faces problems such as oversized models, excessive computational cost and loss of accuracy. An effective method for solving these problems is large model knowledge distillation (Big Model Knowledge Distillation, BMKD). BMKD is a model compression technique that trains a lightweight small model (the student model) under the supervision of a larger, better-performing model (the teacher model), so that the small model achieves better performance and accuracy. BMKD can effectively reduce the parameter count and computation of the model, and improves its running efficiency and adaptability on an embedded AI processor.
The invention provides a method for accelerating hysteroscope image target detection based on an embedded AI processor and a BMKD technology.
Disclosure of Invention
The invention aims to provide a hysteroscope image target detection acceleration method based on an embedded AI processor, which mainly adopts a knowledge distillation method to use a large model to train a lightweight model with good real-time performance, short inference time and a certain level of accuracy. Current knowledge distillation methods are mostly used for image classification tasks; the invention improves the model loss function on this basis, so that the performance of knowledge distillation on target detection tasks is improved to a certain extent, which also facilitates acceleration with an AI processor.
The aim of the invention is realized by the following technical scheme:
a hysteroscope image target detection acceleration method based on an embedded AI processor comprises the following steps:
step 1: preparing a set of hysteroscopic images for model training, and preprocessing the images prior to training, comprising: image scaling, random horizontal flipping, random vertical flipping, random-angle rotation, gamma transformation, center cropping and standardization;
step 2: using a GAIA large model as a teacher network and using an arbitrary lightweight object detection model as a student network;
step 3: inputting training images into a teacher network and a student network which are mentioned in the step 2, wherein the teacher network obtains soft labels, and the student network obtains model prediction output;
step 4: capturing single-image global relation information from the Neck of the network through a GcBlock module, and calculating the global distillation loss L_global in the loss function;
step 5: calculating the binary mask M_{i,j}, the scale mask S_{i,j} and the attention masks A^C, A^S from the feature map output by the network Neck, and from these calculating the focal distillation loss L_focal in the loss function;
step 6: calculating the original loss function L_original from the soft labels required by distillation and the student network prediction output, and adding it to the target detection loss functions of step 4 and step 5 to obtain a comprehensive loss function;
step 7: updating the model parameters according to the loss functions L_global, L_focal and L_original;
step 8: repeating the steps 2-7 until the training times reach the expected value;
step 9: deploying the trained lightweight optimal model to a suitable AI processor and accelerating it with NNIE, using the strong acceleration capability of the AI processor to speed up the abnormality detection process and improve real-time performance.
The aim of the invention can be further achieved by the following technical measures:
according to the hysteroscope image target detection acceleration method based on the embedded AI processor, the student network can use a two-stage detection model, a one-stage anchor-based model and a one-stage anchor-free model.
In the above hysteroscope image target detection acceleration method based on the embedded AI processor, the Neck part adopts a GcBlock module, and the feature map is obtained from the Neck part of the model for calculating the global distillation loss. The calculation formula of the global distillation loss is given in step 4 of the detailed description.
In the hysteroscope image target detection acceleration method based on the embedded AI processor, a binary mask is used for separating the foreground and the background of the image.
According to the hysteroscope image target detection acceleration method based on the embedded AI processor, the influence of different scale targets on the model performance is balanced by using the scale mask.
The above hysteroscope image target detection acceleration method based on the embedded AI processor considers pixel and channel attention, calculates the pixel and channel attention maps G^S, G^C of the feature map, and from them calculates the attention masks A^S, A^C. The feature loss L_fea is calculated from the binary mask, the scale mask and the attention masks.
The above hysteroscope image target detection acceleration method based on the embedded AI processor uses the attention loss L_at to force the student detector to mimic the spatial and channel attention masks of the teacher detector. The focal distillation loss consists of the feature loss and the attention loss.
In the above hysteroscope image target detection acceleration method based on the embedded AI processor, the inference engine can adopt a 3588 embedded AI processor; the model optimized through large model distillation can be conveniently deployed, and rapid image processing and inference can be realized on edge devices.
Compared with the closest prior art, the technical scheme provided by the invention has the following beneficial effects:
the foreground and the background in the image are balanced by adopting the focus distillation loss function and the global distillation loss function, and meanwhile, the relation information among pixels is not lost, so that the distillation method can be well applied to target detection tasks. The light model with good performance is deployed on an AI processor, so that the reasoning speed can be increased, and the requirements of medical image detection and diagnosis on real-time performance are met.
Drawings
FIG. 1 is a flow chart of a hysteroscope image target detection acceleration method in the invention;
fig. 2 is an effect diagram of a hysteroscope image target detection method in the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-2, the present invention provides a technical solution:
the invention discloses a hysteroscope image target detection acceleration method based on an embedded AI processor, which comprises the following steps:
step 1: preparing a set of hysteroscopic images for model training, and preprocessing the images prior to training, comprising: image scaling, random horizontal flipping, random vertical flipping, random-angle rotation, gamma transformation, center cropping and standardization;
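As an illustration of the step-1 preprocessing, the following NumPy sketch applies random flips, a gamma transform, center cropping and standardization (the 224-pixel crop size, the gamma range and the function names are illustrative assumptions, not taken from the patent; scaling and random-angle rotation are omitted for brevity):

```python
import numpy as np

def center_crop(img, size):
    """Crop a (H, W, C) image to (size, size, C) around its center."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img, rng, crop=224, gamma_range=(0.8, 1.2)):
    """Sketch of training-time preprocessing: random flips,
    gamma transform, center crop, standardization."""
    if rng.random() < 0.5:                     # random horizontal flip
        img = img[:, ::-1]
    if rng.random() < 0.5:                     # random vertical flip
        img = img[::-1, :]
    gamma = rng.uniform(*gamma_range)          # gamma transform
    img = np.clip(img, 0.0, 1.0) ** gamma
    img = center_crop(img, crop)
    return (img - img.mean()) / (img.std() + 1e-8)  # standardization
```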
step 2: using a GAIA large model as a teacher network and using an arbitrary lightweight object detection model as a student network;
step 3: inputting training images into a teacher network and a student network which are mentioned in the step 2, wherein the teacher network obtains soft labels, and the student network obtains model prediction output;
step 4: capturing single-image global relation information from the Neck of the network through a GcBlock module, and calculating the global distillation loss L_global in the loss function; the global distillation loss is calculated as:

L_global = \lambda \sum \big( F^T - R(F^S) \big)^2

R(F) = F + W_{v2}\Big( \mathrm{ReLU}\Big( \mathrm{LN}\Big( W_{v1}\Big( \sum_{j=1}^{N_p} \frac{e^{W_k F_j}}{\sum_{m=1}^{N_p} e^{W_k F_m}} F_j \Big) \Big) \Big) \Big)

wherein W_{v1}, W_{v2} and W_k denote convolution layers, LN denotes Layer Normalization, N_p denotes the number of pixels in the feature map, and \lambda is a hyper-parameter for balancing the loss.
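The global relation modelling and loss above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the 1×1 convolutions W_k, W_v1, W_v2 are modelled as plain matrices over a flattened (C, N_p) feature map, and Layer Normalization is omitted for brevity:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gc_context(F, Wk):
    """GcBlock-style global context pooling: softmax over the pixel
    scores W_k F_j gives weights used to pool the (C, Np) feature map."""
    scores = Wk @ F                 # (1, Np) pixel attention scores
    w = softmax(scores.ravel())     # attention weights over Np pixels
    return F @ w                    # (C,) pooled global context

def global_distill_loss(Ft, Fs, Wk, Wv1, Wv2, lam=1.0):
    """Squared difference between teacher features Ft and student
    features Fs augmented with a transformed global context
    (W_v2(ReLU(W_v1(ctx))), layer norm omitted)."""
    ctx = gc_context(Fs, Wk)                    # global relation info
    delta = Wv2 @ np.maximum(Wv1 @ ctx, 0.0)    # W_v2(ReLU(W_v1(ctx)))
    Rs = Fs + delta[:, None]                    # broadcast over pixels
    return lam * np.sum((Ft - Rs) ** 2)
```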
Step 5: calculating the binary mask M_{i,j}, the scale mask S_{i,j} and the attention masks A^C, A^S from the feature map output by the network Neck, and from these calculating the focal distillation loss L_focal in the loss function; the binary mask is calculated as:

M_{i,j} = \begin{cases} 1, & (i,j) \in r \\ 0, & \text{otherwise} \end{cases}

wherein r denotes the ground-truth box and i, j denote the horizontal and vertical coordinates of the feature map; the mask is 1 if the coordinate falls within the ground-truth box and 0 otherwise.
The scale mask is calculated as:

S_{i,j} = \begin{cases} \dfrac{1}{H_r W_r}, & (i,j) \in r \\ \dfrac{1}{N_{bg}}, & \text{otherwise} \end{cases} \qquad N_{bg} = \sum_{i,j} (1 - M_{i,j})

wherein H_r and W_r denote the height and width of the ground-truth box; if a pixel belongs to several different target boxes, the smallest box is used to calculate S_{i,j}.
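The binary and scale masks can be sketched together as follows; the overlap handling uses the smallest covering box, as described above (the (y0, x0, y1, x1) pixel-coordinate convention for boxes is an assumption):

```python
import numpy as np

def fgd_masks(boxes, H, W):
    """Binary mask M (1 inside any ground-truth box, else 0) and scale
    mask S (1/(Hr*Wr) inside a box, using the smallest box where boxes
    overlap; 1/N_bg on the background)."""
    M = np.zeros((H, W))
    area = np.full((H, W), np.inf)      # smallest covering box area
    for (y0, x0, y1, x1) in boxes:      # boxes in pixel coordinates
        M[y0:y1, x0:x1] = 1.0
        box_area = (y1 - y0) * (x1 - x0)
        area[y0:y1, x0:x1] = np.minimum(area[y0:y1, x0:x1], box_area)
    S = np.zeros((H, W))
    n_bg = (M == 0).sum()               # number of background pixels
    S[M == 1] = 1.0 / area[M == 1]      # foreground: 1/(Hr*Wr)
    if n_bg:
        S[M == 0] = 1.0 / n_bg          # background weight
    return M, S
```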
The pixel and channel attention map calculation formula of the feature map is as follows:
H. w, C the height, width and channel number of the feature map, G S 、G C Spatial and channel attention patterns, respectively. The formula from which the attention mask is calculated is as follows:
A S (F)=H·W·softmax(G S (F)/T)
A C (F)=C·softmax(G C (F)/T)
t is a temperature super-parameter used to regulate distillation. By combining the above formulas, the feature loss can be calculated as follows:
where α, β are hyper-parameters for balancing foreground and background losses.
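A sketch of the attention masks and the feature loss. The temperature T = 0.5 and the weights alpha, beta are illustrative defaults, and the attention masks are taken from the teacher feature, which is a common choice in focal distillation but an assumption not stated explicitly above:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_masks(F, T=0.5):
    """A^S = H*W*softmax(G^S/T), A^C = C*softmax(G^C/T) for a
    (C, H, W) feature map, where G^S and G^C are mean absolute values
    over channels and over pixels respectively."""
    C, H, W = F.shape
    Gs = np.abs(F).mean(axis=0)           # (H, W) spatial attention
    Gc = np.abs(F).mean(axis=(1, 2))      # (C,) channel attention
    As = H * W * softmax(Gs.ravel() / T).reshape(H, W)
    Ac = C * softmax(Gc / T)
    return As, Ac

def feature_loss(Ft, Fs, M, S, alpha=1.0, beta=0.5, T=0.5):
    """Foreground/background squared error between teacher (Ft) and
    student (Fs) features, weighted by the binary mask, scale mask and
    the teacher's attention masks."""
    As, Ac = attention_masks(Ft, T)
    w = S * As * Ac[:, None, None]        # per-element weight, (C, H, W)
    d2 = (Ft - Fs) ** 2
    return alpha * np.sum(M * w * d2) + beta * np.sum((1 - M) * w * d2)
```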
The attention loss is used to force the student detector to imitate the spatial and channel attention masks of the teacher detector, and is calculated as:

L_at = \gamma \cdot \big( l(G^S_t, G^S_s) + l(G^C_t, G^C_s) \big)

wherein t and s denote the teacher and the student respectively, l denotes the L1 loss, and \gamma is a hyper-parameter for balancing the loss. From this, the focal distillation loss is obtained:

L_focal = L_fea + L_at
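The attention loss and the focal total can be sketched as follows; the L1 distance and the balancing factor gamma are assumptions consistent with the description of the attention loss above:

```python
import numpy as np

def attention_maps(F):
    """Spatial (H, W) and channel (C,) attention of a (C, H, W) map:
    mean absolute value over channels / over pixels."""
    return np.abs(F).mean(axis=0), np.abs(F).mean(axis=(1, 2))

def attention_loss(Ft, Fs, gamma=1.0):
    """L_at: L1 distance between teacher and student spatial and
    channel attention maps, forcing the student to mimic the teacher."""
    Gs_t, Gc_t = attention_maps(Ft)
    Gs_s, Gc_s = attention_maps(Fs)
    return gamma * (np.abs(Gs_t - Gs_s).mean() + np.abs(Gc_t - Gc_s).mean())

def focal_loss_total(L_fea, L_at):
    """L_focal = L_fea + L_at."""
    return L_fea + L_at
```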
step 6: calculating the original loss function L_original from the soft labels required by distillation and the student network prediction output, and adding it to the target detection loss functions of step 4 and step 5 to obtain a comprehensive loss function;
step 7: updating the model parameters according to the loss functions L_global, L_focal and L_original;
step 8: repeating the steps 2-7 until the training times reach the expected value;
step 9: deploying the trained lightweight optimal model to a suitable AI processor and accelerating it with NNIE, using the strong acceleration capability of the AI processor to speed up the abnormality detection process and improve real-time performance.
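Steps 6 and 7 combine the three losses and update the student's parameters; a minimal sketch follows, in which the equal weighting of the three terms and the plain SGD update are assumptions — in practice an autograd framework computes the gradients:

```python
import numpy as np

def total_loss(l_original, l_focal, l_global):
    """Step 6: the comprehensive loss is the sum of the original
    detection loss (against soft labels) and the two distillation
    losses; relative weighting is assumed to be 1:1:1 here."""
    return l_original + l_focal + l_global

def sgd_step(params, grads, lr=0.01):
    """Step 7 sketch: update model parameters from the gradient of the
    comprehensive loss (gradients supplied by an autograd framework)."""
    return [p - lr * g for p, g in zip(params, grads)]
```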
The aim of the invention can be further achieved by the following technical measures:
the student network can use a two-stage detection model, a one-stage Anchor-based model and a one-stage Anchor-free model.
The Neck part uses GcBlock modules, and the feature map is obtained from the Neck part of the model for calculating the global distillation loss. The calculation formula of the global distillation loss is given in step 4 above.
The acceleration method uses a binary mask to separate the foreground and background of the image. A scale mask is used to balance the impact of different scale targets on model performance.
The acceleration method considers pixel and channel attention, calculates the pixel and channel attention maps G^S, G^C of the feature map, and from them calculates the attention masks A^S, A^C. The feature loss L_fea is obtained from the binary mask, the scale mask and the attention masks.
The attention loss L_at is used to force the student detector to mimic the spatial and channel attention masks of the teacher detector. The focal distillation loss consists of the feature loss and the attention loss.
The inference engine of the acceleration method can adopt a 3588 embedded AI processor; the model optimized through large model distillation can be conveniently deployed, and rapid image processing and inference can be realized on edge devices.
The effect of the method in practical application is shown in fig. 2. The focal distillation loss function and the global distillation loss function balance the foreground and background in the image without losing the relation information between pixels, so the distillation method applies well to target detection tasks. Deploying the well-performing lightweight model on an AI processor speeds up inference and meets the real-time requirements of medical image detection and diagnosis.
In addition to the above embodiments, other embodiments of the present invention are possible, and all technical solutions formed by equivalent substitution or equivalent transformation are within the scope of the present invention.

Claims (5)

1. The hysteroscope image target detection acceleration method based on the embedded AI processor is characterized by comprising the following steps of:
step 1: preparing a set of hysteroscopic images for model training, and preprocessing the images prior to training, comprising: image scaling, random horizontal flipping, random vertical flipping, random-angle rotation, gamma transformation, center cropping and standardization;
step 2: using a GAIA large model as a teacher network and using an arbitrary lightweight object detection model as a student network;
step 3: inputting training images into a teacher network and a student network which are mentioned in the step 2, wherein the teacher network obtains soft labels, and the student network obtains model prediction output;
step 4: capturing single-image global relation information from the Neck of the network through a GcBlock module, and calculating the global distillation loss in the loss function;
step 5: calculating a binary mask, a scale mask and attention masks from the feature map output by the network Neck, and from these calculating the focal distillation loss in the loss function;
step 6: calculating the original loss function from the soft labels required by distillation and the student network prediction output, and adding it to the target detection loss functions of steps 4 and 5 to obtain a comprehensive loss function;
step 7: updating model parameters according to the loss function;
step 8: repeating the steps 2-7 until the training times reach the expected value;
step 9: deploying the trained lightweight optimal model to a suitable AI processor and accelerating it with NNIE, using the strong acceleration capability of the AI processor to speed up the abnormality detection process and improve real-time performance.
2. The method for accelerating hysteroscope image target detection based on an embedded AI processor as claimed in claim 1, wherein the hysteroscope image target detection uses the large model knowledge distillation model compression technique; the teacher model uses the GAIA vision large model for visual object detection, and the student model uses a lightweight target detection model with a convolutional neural network as its backbone.
3. The hysteroscope image target detection acceleration method based on the embedded AI processor as claimed in claim 1, wherein the loss functions used include the global distillation loss, the focal distillation loss and the original loss function; the focal distillation loss addresses the problem that ordinary knowledge distillation does not attend to the key information in the image, making knowledge distillation suitable for the target detection task; the global distillation loss considers the relation information among different pixels in the image, compensating for the global information that the focal distillation loss lacks.
4. The method for accelerating hysteroscopic image target detection based on an embedded AI processor of claim 1, wherein the student network uses one of a two-stage detection model, a one-stage Anchor-based model, and a one-stage anchor-free model.
5. The hysteroscope image target detection acceleration method based on the embedded AI processor as claimed in claim 1, wherein a GcBlock module is adopted by the Neck part, and a feature map is obtained from the Neck part of the model for calculating the global distillation loss; a binary mask is used to separate the foreground and background of the image; a scale mask is used to balance the effect of different-scale targets on model performance.
CN202310948368.XA 2023-07-31 2023-07-31 Hysteroscope image target detection acceleration method based on embedded AI processor Pending CN116934728A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310948368.XA CN116934728A (en) 2023-07-31 2023-07-31 Hysteroscope image target detection acceleration method based on embedded AI processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310948368.XA CN116934728A (en) 2023-07-31 2023-07-31 Hysteroscope image target detection acceleration method based on embedded AI processor

Publications (1)

Publication Number Publication Date
CN116934728A true CN116934728A (en) 2023-10-24

Family

ID=88387657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310948368.XA Pending CN116934728A (en) 2023-07-31 2023-07-31 Hysteroscope image target detection acceleration method based on embedded AI processor

Country Status (1)

Country Link
CN (1) CN116934728A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569737A (en) * 2019-08-15 2019-12-13 深圳华北工控软件技术有限公司 Face recognition deep learning method and face recognition acceleration camera
CN112365586A (en) * 2020-11-25 2021-02-12 厦门瑞为信息技术有限公司 3D face modeling and stereo judging method and binocular 3D face modeling and stereo judging method of embedded platform
CN114007037A (en) * 2021-09-18 2022-02-01 华中科技大学 Video front-end intelligent monitoring system and method, computer equipment and terminal

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569737A (en) * 2019-08-15 2019-12-13 深圳华北工控软件技术有限公司 Face recognition deep learning method and face recognition acceleration camera
CN112365586A (en) * 2020-11-25 2021-02-12 厦门瑞为信息技术有限公司 3D face modeling and stereo judging method and binocular 3D face modeling and stereo judging method of embedded platform
CN114007037A (en) * 2021-09-18 2022-02-01 华中科技大学 Video front-end intelligent monitoring system and method, computer equipment and terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINGYUAN BU ET AL: "GAIA: A Transfer Learning System of Object Detection that Fits Your Needs", 《IEEE》, pages 274 - 283 *
ZHENDONG YANG ET AL: "Focal and Global Knowledge Distillation for Detectors", 《ARXIV:2111.11837V2》, pages 1 - 11 *
FAN XIQUAN ET AL: "Principles and Design of Ground Unmanned Systems", 31 August 2021, pages: 218 - 235 *

Similar Documents

Publication Publication Date Title
CN112884760B (en) Intelligent detection method for multi-type diseases of near-water bridge and unmanned ship equipment
CN113077471B (en) Medical image segmentation method based on U-shaped network
CN108898175B (en) Computer-aided model construction method based on deep learning gastric cancer pathological section
Marcu et al. SafeUAV: Learning to estimate depth and safe landing areas for UAVs from synthetic data
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN114842365B (en) Unmanned aerial vehicle aerial photography target detection and identification method and system
US12106484B2 (en) Three-dimensional medical image segmentation method and system based on short-term and long-term memory self-attention model
CN111582182B (en) Ship name recognition method, system, computer equipment and storage medium
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN116071504B (en) Multi-view three-dimensional reconstruction method for high-resolution image
CN113158971B (en) Event detection model training method and event classification method and system
Li et al. Automatic tongue image segmentation for real-time remote diagnosis
CN116935332A (en) Fishing boat target detection and tracking method based on dynamic video
Liu et al. CAFFNet: channel attention and feature fusion network for multi-target traffic sign detection
Ren et al. Infrared small target detection via region super resolution generative adversarial network
CN117975101A (en) Traditional Chinese medicine disease classification method and system based on tongue picture and text information fusion
CN112907138B (en) Power grid scene early warning classification method and system from local to whole perception
CN118097709A (en) Pig posture estimation method and device
CN116934728A (en) Hysteroscope image target detection acceleration method based on embedded AI processor
CN104615987B (en) A kind of the wreckage of an plane intelligent identification Method and system based on error-duration model neutral net
CN116883961A (en) Target perception method and device
Wu et al. Unmanned Ship Identification Based on Improved YOLOv8s Algorithm
CN116229074A (en) Progressive boundary region optimized medical image small sample segmentation method
CN115249269A (en) Object detection method, computer program product, storage medium, and electronic device
Yulin et al. Recognition of side-scan sonar shipwreck image using convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination