WO2022141859A1 - Image detection method, apparatus, electronic device and storage medium - Google Patents

Image detection method, apparatus, electronic device and storage medium

Info

Publication number
WO2022141859A1
WO2022141859A1 PCT/CN2021/083708 CN2021083708W
Authority
WO
WIPO (PCT)
Prior art keywords
image
standard
module
student
teacher
Prior art date
Application number
PCT/CN2021/083708
Other languages
English (en)
French (fr)
Inventor
王健宗
瞿晓阳
李佳琳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022141859A1 publication Critical patent/WO2022141859A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Definitions

  • the present application relates to the technical field of image detection, and in particular, to an image detection method, apparatus, electronic device, and computer-readable storage medium.
  • a pedestrian detection system, as an important part of an Advanced Driver Assistance System (ADAS), is an important research field related to human life safety.
  • the inventor realized that in the current research and development of pedestrian detection systems, detection speed and accuracy are the two major difficulties and pain points that restrict the development of vehicle-mounted person recognition.
  • algorithms based on deep learning have higher feature extraction capabilities and faster detection speeds.
  • many deep convolutional networks used for target detection have large parameter counts and heavy computational costs.
  • Knowledge distillation is a standard teacher-student learning framework, which uses a larger pre-trained teacher model to guide the training of a lightweight student model, so that the student model can approach the performance of the teacher model and achieve the effect of model compression.
  • traditional knowledge distillation methods have the student model imitate the teacher model so as to approach the teacher model's performance as closely as possible. These methods need to define different kinds of knowledge based on the response of the teacher network, such as "softened" outputs, feature attention, etc.
  • in these methods, the teacher only serves as a target for the student to imitate and does not interact with the student; since the feature extraction ability of the student model is weaker than that of the teacher model, the knowledge the student model learns through imitation cannot reach the level of the teacher model, which affects the accuracy of image detection.
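As background, the "softened"-output distillation mentioned above can be sketched in NumPy; the temperature value and function names here are illustrative, not taken from the application:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature 'softens' the output."""
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student outputs,
    scaled by T^2 as in standard response-based knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # softened teacher output
    q = softmax(student_logits, temperature)  # softened student output
    return float(np.sum(p * (np.log(p) - np.log(q))) * temperature ** 2)
```

The loss is zero when student and teacher agree exactly and grows as their softened outputs diverge, which is the sense in which the student "imitates" the teacher.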
  • An image detection method provided by this application includes:
  • the image to be detected is detected by using the standard student model to obtain an image detection result.
  • the present application also provides an image detection device, the device comprising:
  • an image processing module used for acquiring an original image, performing spatial transformation and data enhancement processing on the original image to obtain a standard image
  • a teacher model building module used for using the standard image to train a pre-built teacher network to obtain a standard teacher model
  • a hybrid network building module used for constructing a hybrid module according to the standard teacher model and the pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
  • a student model training module for training the hybrid network by using the standard image to obtain a standard student model
  • the image detection module is used to detect the image to be detected by using the standard student model to obtain the image detection result.
  • the present application also provides an electronic device, the electronic device comprising:
  • a processor that executes the instructions stored in the memory to achieve the following steps:
  • the image to be detected is detected by using the standard student model to obtain an image detection result.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the following steps:
  • the image to be detected is detected by using the standard student model to obtain an image detection result.
  • FIG. 1 is a schematic flowchart of an image detection method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the detailed implementation flow of one of the steps in FIG. 1;
  • FIG. 3 is a schematic diagram of the detailed implementation flow of another step in FIG. 1;
  • FIG. 4 is a schematic diagram of the detailed implementation flow of another step in FIG. 1;
  • FIG. 5 is a schematic diagram of the detailed implementation flow of another step in FIG. 1;
  • FIG. 6 is a functional block diagram of an image detection apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device implementing the image detection method according to an embodiment of the present application.
  • the embodiments of the present application provide an image detection method.
  • the execution subject of the image detection method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server and a terminal.
  • the image detection method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the image detection method includes:
  • the original image may be an image in the KITTI pedestrian detection dataset.
  • the pedestrian in the original image is framed to obtain a ground-truth box, and a label is assigned according to the box. For example, if the pedestrian is at the upper right of the original image, the label "upper right" is assigned.
  • performing spatial transformation and data enhancement processing on the original image to obtain a standard image including:
  • the translation and rotation are corresponding translation or rotation processing according to a preset fixed point.
  • a function in Matlab can be used to perform spatial transformation on the original image, and after spatial transformation is performed on all original images, the labels of the obtained transformed images are also changed accordingly.
  • for example, the function B = imrotate(A, 180) performs rotation, rotating the original image A 180° counterclockwise about its center point to obtain B; if the label of the original image A is "upper right", the transformed image B is labeled "bottom left".
  • the Gaussian noise refers to a type of noise whose probability density function follows a Gaussian distribution (ie, a normal distribution).
  • Gaussian noises include fluctuating noise, cosmic noise, thermal noise, and shot noise.
  • the preset random function may be a randn() function.
  • the diversity of the image can be improved, so that the image information in the standard image is more abundant.
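The augmentation pipeline described above (spatial transformation followed by Gaussian noise) can be sketched as follows; the 180° rotation mirrors the imrotate example, while the noise standard deviation is an illustrative assumption:

```python
import numpy as np

def augment(image, sigma=0.05, rng=None):
    """Rotate the image 180 degrees about its centre, then add Gaussian noise.

    sigma is the noise standard deviation (illustrative value); pixel
    values are assumed to lie in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    rotated = np.rot90(image, k=2)                      # 180-degree rotation
    noise = sigma * rng.standard_normal(rotated.shape)  # Gaussian (normal) noise
    return np.clip(rotated + noise, 0.0, 1.0)
```

A pixel in the upper right of the input ends up in the bottom left of the output, matching the label change in the imrotate example above.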
  • the pre-built teacher network may be a YOLOv4 network
  • the YOLOv4 network includes an image feature extraction module (Backbone), an image feature enhancement module (Neck), a detection module (Head), and the like.
  • the S2 includes:
  • the preset teacher loss function may be the intersection-over-union loss L_IOU = 1 − IOU(y, ŷ):
  • L_IOU is the intersection-over-union loss function
  • y is the ground-truth box and ŷ is the predicted box
  • IOU represents the intersection-over-union of the ground-truth box and the predicted box.
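The intersection-over-union quantities above can be computed directly; a small sketch with axis-aligned boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)      # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(y_true, y_pred):
    """IoU loss: 1 - IoU(ground-truth box, predicted box)."""
    return 1.0 - iou(y_true, y_pred)
```

Identical boxes give a loss of 0; disjoint boxes give a loss of 1.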
  • the image feature extraction module may be a CSPDarknet53 network.
  • the image feature enhancement module may include SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network): the SPP extracts features of different sizes from the standard image, and the PANet fuses the features of different sizes.
  • the detection module (Head) can be a YOLOv3 network.
  • the pre-built teacher network is trained by the standard image, so that the standard teacher model obtained by training is more accurate in image detection.
  • the pre-built student network may be a YOLOv4-tiny network
  • the YoloV4-tiny network is a simplified version of YoloV4: a lightweight network that greatly improves detection speed.
  • the YoloV4-tiny network includes the following lightweight modules: a lightweight feature extraction module (Backbone), a lightweight feature enhancement module (Neck), a lightweight detection module (Head), and the like.
  • the constructing of a hybrid module according to the standard teacher model and the pre-built student network includes:
  • the teacher module includes: an image feature extraction module, an image feature enhancement module, a detection module, and the like.
  • the student module includes: a lightweight feature extraction module, a lightweight feature enhancement module, a lightweight detection module, and the like. After the teacher module and the corresponding student module are successfully matched, the obtained hybrid module is a dual-channel hybrid module.
  • the obtaining a hybrid network based on the hybrid module and the student network includes:
  • the probability that the teacher module replaces the student module is set by random selection to obtain a standard hybrid module
  • the student modules in the student network are replaced with the standard hybrid modules to obtain a hybrid network including the standard hybrid modules.
  • setting by random selection means that, in the hybrid module, each student module has the same probability of being replaced by the teacher module, so the teacher module at every position can guide the corresponding student module to learn.
  • the teacher module in the hybrid module is derived from the standard teacher model, that is, the parameters of the teacher module are fixed.
  • a hybrid module is constructed according to the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network. Since the hybrid module includes a teacher module and a student module, The interactive knowledge distillation of teacher module and student module is realized, which improves the efficiency of knowledge distillation.
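The random replacement of student modules by their frozen teacher counterparts can be sketched as follows; the stage names and the replacement probability are illustrative assumptions, not values fixed by the application:

```python
import random

def sample_hybrid_path(p_teacher=0.5, stages=("backbone", "neck", "head"), rng=None):
    """For each matched stage, keep the student block or swap in the frozen
    teacher block; every position uses the same probability p_teacher."""
    rng = random.Random() if rng is None else rng
    return {stage: ("teacher" if rng.random() < p_teacher else "student")
            for stage in stages}
```

Because each position uses the same probability, the teacher module at every position gets a chance to guide the corresponding student module.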
  • the S4 includes:
  • the preset loss function may be:
  • L is the loss function
  • y is the ground-truth box and ŷ is the predicted box.
  • the parameters of the teacher module are fixed and only the parameters of the student module are updated, which makes the teacher module a reference for the student module, so that the preset loss function can satisfy the preset loss threshold.
  • each training only updates the information of the student module with a small amount of parameters, which can speed up the convergence.
  • the hybrid network converges (that is, the preset loss function satisfies the preset loss threshold)
  • the teacher module in the hybrid network is deleted, and an efficient knowledge distillation student model is obtained.
  • the interactive knowledge distillation built with the hybrid module requires no additional distillation loss and no hyperparameter search for the loss function, and the input image data does not need to be processed separately by the student network and the teacher network, so the training process is faster and more efficient.
  • the robustness of the standard student model is stronger.
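The update rule described above (teacher parameters frozen, only student parameters trained, then the teacher modules deleted) can be sketched with a flat parameter dictionary; the "teacher."/"student." name prefixes are an illustrative convention:

```python
def train_step(params, grads, lr=0.1):
    """One gradient step on the hybrid network: parameters whose names start
    with 'teacher.' stay fixed; all others (student parameters) are updated."""
    updated = {}
    for name, value in params.items():
        if name.startswith("teacher."):
            updated[name] = value                            # frozen teacher weights
        else:
            updated[name] = value - lr * grads.get(name, 0.0)  # student update
    return updated

def export_student(params):
    """After convergence, delete the teacher modules to obtain the student model."""
    return {k: v for k, v in params.items() if not k.startswith("teacher.")}
```

Updating only the small student parameter set each step is what speeds up convergence, and dropping the `teacher.` entries at the end yields the standalone standard student model.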
  • the standard student model is a lightweight network, so it can be directly deployed in edge devices, such as an advanced driver assistance system (ADAS) of an automobile.
  • the use of the standard student model to detect the image to be detected to obtain the image detection result includes:
  • the to-be-detected image is recognized to obtain recognition boxes and labels, and the image detection result is obtained by summarizing the recognition boxes and labels.
  • the to-be-detected image may be image data obtained from a camera of an edge device.
  • the standard student model performs frame selection and classification of objects in the to-be-detected image.
  • an image to be detected includes: pedestrians, dogs, and bicycles.
  • the standard student model selects and recognizes pedestrians, dogs, and bicycles, respectively.
  • the obtained image detection result includes three recognition boxes and the corresponding labels.
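The final summarizing step (keeping recognition boxes and pairing them with labels) might look like the following sketch; the record layout and the confidence threshold are assumptions, not the application's actual post-processing:

```python
def summarize_detections(raw, score_threshold=0.5):
    """Keep detections whose confidence meets the threshold and pair each
    recognition box with its label."""
    return [{"box": box, "label": label, "score": score}
            for box, label, score in raw
            if score >= score_threshold]
```

For the pedestrian/dog/bicycle example above, each surviving entry carries one recognition box plus its class label.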
  • the present application obtains a standard image by performing spatial transformation and data enhancement processing on the original image, which can improve the diversity of the image and make the image information in the standard image more abundant.
  • a hybrid module is constructed according to the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network. Since the hybrid module includes the teacher module and the student module, the teacher module and the student module are realized.
  • the interactive knowledge distillation of the student module improves the efficiency of knowledge distillation.
  • using the standard image to train the hybrid network to obtain a standard student model: owing to the uncertainty of the teacher module in the hybrid module during training, the robustness of the standard student model is improved, so image detection accuracy is higher. Therefore, the implementation of the present application can solve the problem of low image detection accuracy.
  • FIG. 6 it is a functional block diagram of an image detection apparatus provided by an embodiment of the present application.
  • the image detection apparatus 100 described in this application can be installed in an electronic device. According to the implemented functions, the image detection apparatus 100 may include an image processing module 101, a teacher model building module 102, a hybrid network building module 103, a student model training module 104 and an image detection module 105.
  • the modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of an electronic device and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the image processing module 101 is configured to acquire an original image, perform spatial transformation and data enhancement processing on the original image, and obtain a standard image.
  • the original image may be an image in the KITTI pedestrian detection dataset.
  • the pedestrian in the original image is framed to obtain a ground-truth box, and a label is assigned according to the box. For example, if the pedestrian is at the upper right of the original image, the label "upper right" is assigned.
  • the image processing module 101 obtains a standard image through the following operations:
  • a preset random function is used to generate Gaussian noise, and the Gaussian noise is added to the converted image to obtain a standard image.
  • the translation and rotation are corresponding translation or rotation processing according to a preset fixed point.
  • a function in Matlab can be used to perform spatial transformation on the original image, and after spatial transformation is performed on all original images, the labels of the obtained transformed images are also changed accordingly.
  • for example, the function B = imrotate(A, 180) performs rotation, rotating the original image A 180° counterclockwise about its center point to obtain B; if the label of the original image A is "upper right", the transformed image B is labeled "bottom left".
  • the Gaussian noise refers to a type of noise whose probability density function follows a Gaussian distribution (ie, a normal distribution).
  • Gaussian noises include fluctuating noise, cosmic noise, thermal noise, and shot noise.
  • the preset random function may be a randn() function.
  • the diversity of the image can be improved, so that the image information in the standard image is more abundant.
  • the teacher model building module 102 is used to train a pre-built teacher network by using the standard image to obtain a standard teacher model.
  • the pre-built teacher network may be a YOLOv4 network
  • the YOLOv4 network includes an image feature extraction module (Backbone), an image feature enhancement module (Neck), a detection module (Head), and the like.
  • the teacher model building module 102 obtains a standard teacher model through the following operations:
  • the preset teacher loss function may be the intersection-over-union loss L_IOU = 1 − IOU(y, ŷ):
  • L_IOU is the intersection-over-union loss function
  • y is the ground-truth box and ŷ is the predicted box
  • IOU represents the intersection-over-union of the ground-truth box and the predicted box.
  • the image feature extraction module may be a CSPDarknet53 network.
  • the image feature enhancement module may include SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network): the SPP extracts features of different sizes from the standard image, and the PANet fuses the features of different sizes.
  • the detection module (Head) can be a YOLOv3 network.
  • the pre-built teacher network is trained by the standard image, so that the standard teacher model obtained by training is more accurate in image detection.
  • the hybrid network construction module 103 is configured to construct a hybrid module according to the standard teacher model and the pre-built student network, and obtain a hybrid network based on the hybrid module and the student network.
  • the pre-built student network may be a YOLOv4-tiny network
  • the YoloV4-tiny network is a simplified version of YoloV4: a lightweight network that greatly improves detection speed.
  • the YoloV4-tiny network includes the following lightweight modules: a lightweight feature extraction module (Backbone), a lightweight feature enhancement module (Neck), a lightweight detection module (Head), and the like.
  • hybrid network construction module 103 constructs a hybrid module through the following operations:
  • the modules in the standard teacher model are used as teacher modules, and the modules in the pre-built student network are used as student modules;
  • the teacher module and the corresponding student module are matched, and the hybrid module is obtained after the matching succeeds.
  • the teacher module includes: an image feature extraction module, an image feature enhancement module, a detection module, and the like.
  • the student module includes: a lightweight feature extraction module, a lightweight feature enhancement module, a lightweight detection module, and the like. After the teacher module and the corresponding student module are successfully matched, the obtained hybrid module is a dual-channel hybrid module.
  • the hybrid network building module 103 obtains a hybrid network through the following operations:
  • the probability that the teacher module replaces the student module is set by random selection to obtain a standard hybrid module
  • the student modules in the student network are replaced with the standard hybrid modules to obtain a hybrid network including the standard hybrid modules.
  • setting by random selection means that, in the hybrid module, each student module has the same probability of being replaced by the teacher module, so the teacher module at every position can guide the corresponding student module to learn.
  • the teacher module in the hybrid module is derived from the standard teacher model, that is, the parameters of the teacher module are fixed.
  • a hybrid module is constructed according to the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network. Since the hybrid module includes a teacher module and a student module, The interactive knowledge distillation of teacher module and student module is realized, which improves the efficiency of knowledge distillation.
  • the student model training module 104 is configured to use the standard image to train the hybrid network to obtain a standard student model.
  • the student model training module 104 obtains a standard student model through the following operations:
  • the parameter updated at this time is used as the parameter of the student module, and the teacher module in the hybrid module is deleted to obtain the standard student model.
  • the preset loss function may be:
  • L is the loss function
  • y is the ground-truth box and ŷ is the predicted box.
  • the parameters of the teacher module are fixed and only the parameters of the student module are updated, which makes the teacher module a reference for the student module, so that the preset loss function can satisfy the preset loss threshold.
  • each training only updates the information of the student module with a small amount of parameters, which can speed up the convergence.
  • the hybrid network converges (that is, the preset loss function satisfies the preset loss threshold)
  • the teacher module in the hybrid network is deleted, and an efficient knowledge distillation student model is obtained.
  • the interactive knowledge distillation built with the hybrid module requires no additional distillation loss and no hyperparameter search for the loss function, and the input image data does not need to be processed separately by the student network and the teacher network, so the training process is faster and more efficient.
  • the robustness of the standard student model is stronger.
  • the image detection module 105 is configured to use the standard student model to detect the image to be detected to obtain an image detection result.
  • the standard student model is a lightweight network, so it can be directly deployed in edge devices, such as an advanced driver assistance system (ADAS) of an automobile.
  • the image detection module 105 obtains the image detection result through the following operations:
  • the to-be-detected image is recognized to obtain recognition boxes and labels, and the image detection result is obtained by summarizing the recognition boxes and labels.
  • the to-be-detected image may be image data obtained from a camera of an edge device.
  • the standard student model performs frame selection and classification of objects in the to-be-detected image.
  • an image to be detected includes: pedestrians, dogs, and bicycles.
  • the standard student model selects and recognizes pedestrians, dogs, and bicycles, respectively.
  • the obtained image detection result includes three recognition boxes and the corresponding labels.
  • FIG. 7 it is a schematic structural diagram of an electronic device implementing an image detection method provided by an embodiment of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an image detection program 12.
  • the memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 .
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (Flash Card), etc., equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can not only be used to store application software installed in the electronic device 1 and various types of data, such as the code of the image detection program 12, etc., but also can be used to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPU), microprocessors, digital signal processing chips, graphics processors, combinations of various control chips, etc.
  • the processor 10 is the control core (Control Unit) of the electronic device; it uses various interfaces and lines to connect the components of the entire electronic device, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (such as the image detection program) and calling data stored in the memory 11.
  • the bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (Extended industry standard architecture, EISA for short) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is arranged to enable connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 7 only shows an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 7 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power supply (such as a battery) for powering the various components; preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface, optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a Bluetooth interface, etc.), which is usually used in the electronic device 1 Establish a communication connection with other electronic devices.
  • the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the image detection program 12 stored in the memory 11 in the electronic device 1 is a combination of multiple instructions, and when running in the processor 10, it can realize:
  • the image to be detected is detected by using the standard student model to obtain an image detection result.
  • the modules/units integrated in the electronic device 1 may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), etc.
  • the present application also provides a computer-readable storage medium, where the readable storage medium stores a computer program, and when executed by a processor of an electronic device, the computer program can realize:
  • the image to be detected is detected by using the standard student model to obtain an image detection result.
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image detection method, an image detection apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image (S1); training a pre-built teacher network with the standard image to obtain a standard teacher model (S2); constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network (S3); training the hybrid network with the standard image to obtain a standard student model (S4); and detecting an image to be detected with the standard student model to obtain an image detection result (S5). In addition, the image detection result may be stored in a node of a blockchain. The method can alleviate the problem of low image detection accuracy.

Description

Image detection method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on December 31, 2020, with application number CN202011645110.5 and entitled "Image detection method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image detection, and in particular to an image detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of modern technology and artificial intelligence, models suitable for edge devices are becoming increasingly important. For example, advanced driver-assistance systems (ADAS) have become a key research and development focus of the automotive industry, and pedestrian detection, as an important component of ADAS, is a research area directly related to human safety. The inventors realized that in current pedestrian detection research, detection speed and accuracy are the two main bottlenecks restricting the development of in-vehicle pedestrian recognition. 1. Deep-learning-based algorithms offer stronger feature extraction and faster detection than traditional algorithms. However, many deep convolutional networks used for object detection have large parameter counts and high computational cost: not only do the models occupy considerable storage space, but inference also requires a powerful graphics processing unit (GPU), making direct deployment and application on edge devices difficult. 2. Lightweight networks can be trained via knowledge distillation. Knowledge distillation is a standard teacher-student learning framework in which a larger pre-trained teacher model guides the training of a lightweight student model, so that the student approaches the teacher's performance and model compression is achieved. However, traditional knowledge distillation methods have the student imitate the teacher to match its performance as closely as possible; these methods need to define different kinds of knowledge based on the teacher network's responses, such as "softened" outputs or feature attention. In this process the teacher merely serves as a target for imitation and the student does not interact with the teacher; moreover, because the student model's feature extraction capability is weaker than the teacher model's, the knowledge the student learns by imitation cannot reach the teacher's level, which degrades image detection accuracy.
Summary
An image detection method provided by this application includes:
acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
training a pre-built teacher network with the standard image to obtain a standard teacher model;
constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
training the hybrid network with the standard image to obtain a standard student model;
detecting an image to be detected with the standard student model to obtain an image detection result.
This application also provides an image detection apparatus, the apparatus including:
an image processing module, configured to acquire an original image and perform spatial transformation and data augmentation processing on the original image to obtain a standard image;
a teacher model construction module, configured to train a pre-built teacher network with the standard image to obtain a standard teacher model;
a hybrid network construction module, configured to construct a hybrid module from the standard teacher model and a pre-built student network, and obtain a hybrid network based on the hybrid module and the student network;
a student model training module, configured to train the hybrid network with the standard image to obtain a standard student model;
an image detection module, configured to detect an image to be detected with the standard student model to obtain an image detection result.
This application also provides an electronic device, the electronic device including:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the following steps:
acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
training a pre-built teacher network with the standard image to obtain a standard teacher model;
constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
training the hybrid network with the standard image to obtain a standard student model;
detecting an image to be detected with the standard student model to obtain an image detection result.
This application also provides a computer-readable storage medium in which at least one instruction is stored, the at least one instruction being executed by a processor in an electronic device to implement the following steps:
acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
training a pre-built teacher network with the standard image to obtain a standard teacher model;
constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
training the hybrid network with the standard image to obtain a standard student model;
detecting an image to be detected with the standard student model to obtain an image detection result.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of an image detection method provided by an embodiment of this application;
Fig. 2 is a detailed flowchart of one of the steps in Fig. 1;
Fig. 3 is a detailed flowchart of another step in Fig. 1;
Fig. 4 is a detailed flowchart of another step in Fig. 1;
Fig. 5 is a detailed flowchart of another step in Fig. 1;
Fig. 6 is a functional block diagram of an image detection apparatus provided by an embodiment of this application;
Fig. 7 is a schematic structural diagram of an electronic device implementing the image detection method provided by an embodiment of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are intended only to explain this application and are not intended to limit it.
An embodiment of this application provides an image detection method. The execution subject of the image detection method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of this application. In other words, the image detection method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to Fig. 1, a schematic flowchart of an image detection method provided by an embodiment of this application, in this embodiment the image detection method includes:
S1. Acquire an original image, and perform spatial transformation and data augmentation processing on the original image to obtain a standard image.
In this embodiment of the application, the original image may be an image from the KITTI pedestrian detection dataset. In this embodiment, pedestrians in the original image are boxed to obtain ground-truth boxes, and labels are assigned according to the ground-truth boxes; for example, if a pedestrian is in the upper right of the original image, the label "upper right" is assigned.
Specifically, referring to Fig. 2, performing spatial transformation and data augmentation processing on the original image to obtain a standard image includes:
S10. translating and rotating the original image to obtain a transformed image;
S11. generating Gaussian noise with a preset random function, and adding the Gaussian noise to the transformed image to obtain the standard image.
Here, the translation and rotation are performed about a preset fixed point. In this embodiment, functions in Matlab may be used to spatially transform the original image, and after all original images have been spatially transformed, the labels of the resulting transformed images are changed accordingly. For example, the function B=imrotate(A,180°) may be used for rotation; it rotates the original image A counterclockwise by 180° about its center point to obtain B, and if the label of the original image A is "upper right", the label of the transformed image B is "lower left". Gaussian noise is a class of noise whose probability density function follows a Gaussian (i.e., normal) distribution; common examples include fluctuation noise, cosmic noise, thermal noise, and shot noise. The preset random function may be the randn() function. Spatially transforming the original images and adding Gaussian noise increase the diversity of the original images and introduce a certain amount of error, making the standard images more valuable for training.
By performing spatial transformation and data augmentation processing on the original images, this embodiment increases image diversity and enriches the image information in the standard images.
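The S10-S11 pipeline above (rotation about a fixed point, then additive Gaussian noise, with the position label updated to match) can be sketched in plain Python. The toy 2×2 image, the noise level sigma, and the helper names rotate_180, add_gaussian_noise, and flip_label are illustrative assumptions, not part of the patent:

```python
import random

def rotate_180(image):
    """Rotate an image (list of pixel rows) 180 degrees about its center."""
    return [row[::-1] for row in image[::-1]]

def add_gaussian_noise(image, sigma=5.0, seed=None):
    """Add zero-mean Gaussian noise to every pixel (randn-style augmentation)."""
    rng = random.Random(seed)
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]

def flip_label(label):
    """Rotating 180 degrees moves an 'upper right' pedestrian to 'lower left'."""
    flips = {"upper right": "lower left", "lower left": "upper right",
             "upper left": "lower right", "lower right": "upper left"}
    return flips[label]

original = [[10, 20], [30, 40]]           # toy 2x2 grayscale image
transformed = rotate_180(original)        # spatial transformation
standard = add_gaussian_noise(transformed, sigma=1.0, seed=0)
print(transformed)                # [[40, 30], [20, 10]]
print(flip_label("upper right"))  # lower left
```

With Matlab's imrotate(A,180°), as in the embodiment, the same label flip applies: a pedestrian labeled "upper right" in A is labeled "lower left" in B.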
S2. Train a pre-built teacher network with the standard images to obtain a standard teacher model.
In this embodiment of the application, the pre-built teacher network may be a YOLOv4 network, which includes an image feature extraction module (Backbone), an image feature enhancement module (Neck), a detection module (Head), and the like.
In detail, referring to Fig. 3, S2 includes:
S20. performing feature extraction and feature enhancement on the standard image with the image feature extraction module and the image feature enhancement module of the teacher network to obtain a feature image;
S21. obtaining predicted boxes of the feature image with the detection module of the teacher network, and computing a loss value from the predicted boxes and the ground-truth boxes with a preset teacher loss function until the loss value is smaller than a preset threshold, thereby obtaining the standard teacher model.
In this embodiment, the preset teacher loss function may be the intersection-over-union loss L_IOU:
L_IOU = 1 − IOU(y, ŷ)
where L_IOU is the intersection-over-union loss function, y is the ground-truth box, ŷ is the predicted box, and IOU denotes the intersection-over-union of the ground-truth box and the predicted box.
Specifically, in this embodiment, the image feature extraction module (Backbone) may be a CSPDarknet53 network. The image feature enhancement module (Neck) may include SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network): the SPP extracts features of different scales from the standard image, and the PANet fuses the features of different scales. The detection module (Head) may be a YOLOv3 network.
In this embodiment, training the pre-built teacher network with the standard images makes the resulting standard teacher model more accurate at image detection.
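The teacher loss in S21 compares each predicted box against the ground-truth box through their intersection-over-union; a minimal sketch, with illustrative corner-format boxes (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(true_box, pred_box):
    """L_IOU = 1 - IOU(y, y_hat): zero for a perfect prediction."""
    return 1.0 - iou(true_box, pred_box)

print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 - 1/7, about 0.857
```

Training then repeats forward passes and updates until this loss value falls below the preset threshold.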
S3. Construct a hybrid module from the standard teacher model and a pre-built student network, and obtain a hybrid network based on the hybrid module and the student network.
In this embodiment of the application, the pre-built student network may be a YOLOv4-tiny network. YOLOv4-tiny is a simplified version of YOLOv4 that greatly increases speed and is a lightweight network. The YOLOv4-tiny network includes the following lightweight modules: a lightweight feature extraction module (Backbone), a lightweight feature enhancement module (Neck), a lightweight detection module (Head), and the like.
In detail, referring to Fig. 4, constructing a hybrid module from the standard teacher model and a pre-built student network includes:
S30. taking the modules in the standard teacher model as teacher modules, and taking the modules in the pre-built student network as student modules;
S31. matching each teacher module with the corresponding student module, the hybrid module being obtained after successful matching.
Here, the teacher modules include the image feature extraction module, the image feature enhancement module, the detection module, and the like. The student modules include the lightweight feature extraction module, the lightweight feature enhancement module, the lightweight detection module, and the like. After a teacher module and the corresponding student module are successfully matched, the resulting hybrid module is a two-channel hybrid module.
Specifically, obtaining a hybrid network based on the hybrid module and the student network includes:
in the hybrid module, setting the probability that a teacher module replaces a student module by random selection, to obtain a standard hybrid module;
replacing the student modules in the student network with the standard hybrid modules, to obtain a hybrid network containing the standard hybrid modules.
Here, setting by random selection means that within the hybrid module, every student module has the same probability of being replaced by its teacher module, which means the teacher module at every position can guide the learning of the corresponding student module. Meanwhile, the teacher modules in the hybrid module come from the standard teacher model, i.e., the parameters of the teacher modules remain fixed.
In this embodiment, a hybrid module is constructed from the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network. Because the hybrid module contains both teacher modules and student modules, interactive knowledge distillation between the teacher modules and the student modules is realized, which improves the efficiency of knowledge distillation.
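The random teacher-for-student replacement described above can be sketched as a two-channel block. The callables standing in for the matched Backbone/Neck/Head pairs and the probability parameter p_teacher are illustrative assumptions:

```python
import random

class HybridBlock:
    """Two-channel hybrid module: on each forward pass, the (frozen) teacher
    branch replaces the student branch with probability p_teacher."""
    def __init__(self, teacher_fn, student_fn, p_teacher=0.5, seed=None):
        self.teacher_fn = teacher_fn   # fixed, taken from the standard teacher model
        self.student_fn = student_fn   # trainable student module
        self.p_teacher = p_teacher     # same probability at every position
        self.rng = random.Random(seed)

    def forward(self, x):
        if self.rng.random() < self.p_teacher:
            return self.teacher_fn(x)  # teacher guides this position
        return self.student_fn(x)

# Illustrative stand-ins for one matched module pair.
teacher_backbone = lambda x: x * 2
student_backbone = lambda x: x * 2 + 0.1   # imperfect student
block = HybridBlock(teacher_backbone, student_backbone, p_teacher=1.0)
print(block.forward(3))  # 6: with p_teacher=1.0 the teacher branch is always taken
```

A full hybrid network would chain one such block per matched module pair, with the same replacement probability at every position.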
S4. Train the hybrid network with the standard images to obtain a standard student model.
In detail, referring to Fig. 5, S4 includes:
S40. initializing the parameters of the student modules in the standard hybrid modules;
S41. training the student modules with the standard images, and adjusting the parameters of the student modules according to a preset loss function;
S42. when the preset loss function meets a preset loss threshold, taking the parameters updated at that time as the parameters of the student modules, and deleting the teacher modules from the hybrid modules to obtain the standard student model.
In this embodiment of the application, the preset loss function may be L(y, ŷ), where L is the loss function, y is the ground-truth box, and ŷ is the predicted box.
Specifically, during the training of the hybrid network, the parameters of the teacher modules are fixed and only the parameters of the student modules are updated; the teacher modules thus serve as a reference for the student modules. In this way, before the preset loss function meets the preset loss threshold, each training step only updates the information of the student modules, which have fewer parameters, thereby accelerating convergence. Once the hybrid network converges (i.e., the preset loss function meets the preset loss threshold), the teacher modules are deleted from the hybrid network, yielding an efficient knowledge-distilled student model. Moreover, as can be seen from the loss function, the interactive knowledge distillation built on the hybrid module requires no additional distillation loss and no operations such as hyperparameter search over the loss function, and the input image data does not need to be processed once each by the student network and the teacher network, so the training process is faster and more efficient.
In this embodiment, because of the uncertainty introduced by the teacher modules in the hybrid modules during training (i.e., the probability that a teacher module replaces a student module), the standard student model is more robust.
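The S40-S42 loop can be illustrated with a toy one-parameter student: the teacher parameter stays frozen and is never updated, only the student parameter is adjusted by the preset loss until it meets the threshold, and the teacher branch is then discarded. All values here are illustrative, and squared error stands in for the box loss:

```python
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, ground-truth) pairs
student_w = 0.0                                  # S40: initialize student parameters
teacher_w = 2.0   # frozen teacher parameter: present as a reference, never updated
threshold = 1e-6
lr = 0.05

for step in range(10000):                        # S41: update only the student
    grad = sum(2 * (student_w * x - y) * x for x, y in samples) / len(samples)
    student_w -= lr * grad
    current = sum((student_w * x - y) ** 2 for x, y in samples) / len(samples)
    if current < threshold:                      # S42: loss meets the threshold
        break

standard_student = student_w                     # teacher branch is now discarded
print(round(standard_student, 3))  # 2.0
```

Because only the small student branch is updated at each step, convergence is fast, mirroring the acceleration argument above.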
S5. Detect the image to be detected with the standard student model to obtain an image detection result.
In this embodiment of the application, the standard student model is a lightweight network and can therefore be deployed directly on edge devices, for example in the advanced driver-assistance system (ADAS) of a car. Moreover, because the standard student model is obtained through interactive knowledge distillation, its detection accuracy is higher.
In detail, detecting the image to be detected with the standard student model to obtain an image detection result includes:
performing box selection and classification on the image to be detected with the standard student model to obtain a detection image;
recognizing the detection image to obtain recognition boxes and annotations, and aggregating the recognition boxes and annotations to obtain the image detection result.
Here, the image to be detected may be image data acquired from a camera of an edge device. The standard student model performs box selection and classification on the objects in the image to be detected. For example, if an image to be detected contains a pedestrian, a dog, and a bicycle, the standard student model boxes and recognizes each of them, and the resulting image detection result includes three recognition boxes and the corresponding annotations.
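The box-selection, recognition, and aggregation flow just described can be sketched as a small data structure; the Detection class, its field names, and the result layout are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # recognition box (x1, y1, x2, y2)
    label: str                              # annotation after recognition

def aggregate(detections: List[Detection]) -> dict:
    """Aggregate recognition boxes and annotations into one detection result."""
    return {
        "boxes": [d.box for d in detections],
        "labels": [d.label for d in detections],
        "count": len(detections),
    }

# The pedestrian / dog / bicycle example from the text, with made-up coordinates.
dets = [Detection((10, 10, 50, 120), "pedestrian"),
        Detection((60, 80, 90, 110), "dog"),
        Detection((100, 40, 180, 130), "bicycle")]
result = aggregate(dets)
print(result["count"])   # 3
print(result["labels"])  # ['pedestrian', 'dog', 'bicycle']
```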
By performing spatial transformation and data augmentation processing on the original image to obtain the standard image, this application increases image diversity and enriches the image information in the standard image. A hybrid module is constructed from the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network; because the hybrid module contains both teacher modules and student modules, interactive knowledge distillation between the teacher modules and the student modules is realized, which improves the efficiency of knowledge distillation. Meanwhile, the hybrid network is trained with the standard images to obtain the standard student model; the uncertainty introduced by the teacher modules in the hybrid module during training improves the robustness of the standard student model, yielding higher image detection accuracy. Therefore, the embodiments of this application can alleviate the problem of low image detection accuracy.
As shown in Fig. 6, it is a functional block diagram of an image detection apparatus provided by an embodiment of this application.
The image detection apparatus 100 described in this application can be installed in an electronic device. According to the implemented functions, the image detection apparatus 100 may include an image processing module 101, a teacher model construction module 102, a hybrid network construction module 103, a student model training module 104, and an image detection module 105. The modules described in this application, which may also be called units, are series of computer program segments that can be executed by the processor of an electronic device and can perform fixed functions, and they are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The image processing module 101 is configured to acquire an original image and perform spatial transformation and data augmentation processing on the original image to obtain a standard image.
In this embodiment of the application, the original image may be an image from the KITTI pedestrian detection dataset. In this embodiment, pedestrians in the original image are boxed to obtain ground-truth boxes, and labels are assigned according to the ground-truth boxes; for example, if a pedestrian is in the upper right of the original image, the label "upper right" is assigned.
Specifically, the image processing module 101 obtains the standard image through the following operations:
translating and rotating the original image to obtain a transformed image;
generating Gaussian noise with a preset random function, and adding the Gaussian noise to the transformed image to obtain the standard image.
Here, the translation and rotation are performed about a preset fixed point. In this embodiment, functions in Matlab may be used to spatially transform the original image, and after all original images have been spatially transformed, the labels of the resulting transformed images are changed accordingly. For example, the function B=imrotate(A,180°) may be used for rotation; it rotates the original image A counterclockwise by 180° about its center point to obtain B, and if the label of the original image A is "upper right", the label of the transformed image B is "lower left". Gaussian noise is a class of noise whose probability density function follows a Gaussian (i.e., normal) distribution; common examples include fluctuation noise, cosmic noise, thermal noise, and shot noise. The preset random function may be the randn() function. Spatially transforming the original images and adding Gaussian noise increase the diversity of the original images and introduce a certain amount of error, making the standard images more valuable for training.
By performing spatial transformation and data augmentation processing on the original images, this embodiment increases image diversity and enriches the image information in the standard images.
The teacher model construction module 102 is configured to train a pre-built teacher network with the standard images to obtain a standard teacher model.
In this embodiment of the application, the pre-built teacher network may be a YOLOv4 network, which includes an image feature extraction module (Backbone), an image feature enhancement module (Neck), a detection module (Head), and the like.
In this embodiment, the teacher model construction module 102 obtains the standard teacher model through the following operations:
performing feature extraction and feature enhancement on the standard image with the image feature extraction module and the image feature enhancement module of the teacher network to obtain a feature image;
obtaining predicted boxes of the feature image with the detection module of the teacher network, and computing a loss value from the predicted boxes and the ground-truth boxes with a preset teacher loss function until the loss value is smaller than a preset threshold, thereby obtaining the standard teacher model.
In this embodiment, the preset teacher loss function may be the intersection-over-union loss L_IOU:
L_IOU = 1 − IOU(y, ŷ)
where L_IOU is the intersection-over-union loss function, y is the ground-truth box, ŷ is the predicted box, and IOU denotes the intersection-over-union of the ground-truth box and the predicted box.
Specifically, in this embodiment, the image feature extraction module (Backbone) may be a CSPDarknet53 network. The image feature enhancement module (Neck) may include SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network): the SPP extracts features of different scales from the standard image, and the PANet fuses the features of different scales. The detection module (Head) may be a YOLOv3 network.
In this embodiment, training the pre-built teacher network with the standard images makes the resulting standard teacher model more accurate at image detection.
The hybrid network construction module 103 is configured to construct a hybrid module from the standard teacher model and a pre-built student network, and obtain a hybrid network based on the hybrid module and the student network.
In this embodiment of the application, the pre-built student network may be a YOLOv4-tiny network. YOLOv4-tiny is a simplified version of YOLOv4 that greatly increases speed and is a lightweight network. The YOLOv4-tiny network includes the following lightweight modules: a lightweight feature extraction module (Backbone), a lightweight feature enhancement module (Neck), a lightweight detection module (Head), and the like.
In detail, the hybrid network construction module 103 constructs the hybrid module through the following operations:
taking the modules in the standard teacher model as teacher modules, and taking the modules in the pre-built student network as student modules;
matching each teacher module with the corresponding student module, the hybrid module being obtained after successful matching.
Here, the teacher modules include the image feature extraction module, the image feature enhancement module, the detection module, and the like. The student modules include the lightweight feature extraction module, the lightweight feature enhancement module, the lightweight detection module, and the like. After a teacher module and the corresponding student module are successfully matched, the resulting hybrid module is a two-channel hybrid module.
In detail, the hybrid network construction module 103 obtains the hybrid network through the following operations:
in the hybrid module, setting the probability that a teacher module replaces a student module by random selection, to obtain a standard hybrid module;
replacing the student modules in the student network with the standard hybrid modules, to obtain a hybrid network containing the standard hybrid modules.
Here, setting by random selection means that within the hybrid module, every student module has the same probability of being replaced by its teacher module, which means the teacher module at every position can guide the learning of the corresponding student module. Meanwhile, the teacher modules in the hybrid module come from the standard teacher model, i.e., the parameters of the teacher modules remain fixed.
In this embodiment, a hybrid module is constructed from the standard teacher model and the pre-built student network, and a hybrid network is obtained based on the hybrid module and the student network. Because the hybrid module contains both teacher modules and student modules, interactive knowledge distillation between the teacher modules and the student modules is realized, which improves the efficiency of knowledge distillation.
The student model training module 104 is configured to train the hybrid network with the standard images to obtain a standard student model.
In detail, the student model training module 104 obtains the standard student model through the following operations:
initializing the parameters of the student modules in the standard hybrid modules;
training the student modules with the standard images, and adjusting the parameters of the student modules according to a preset loss function;
when the preset loss function meets a preset loss threshold, taking the parameters updated at that time as the parameters of the student modules, and deleting the teacher modules from the hybrid modules to obtain the standard student model.
In this embodiment of the application, the preset loss function may be L(y, ŷ), where L is the loss function, y is the ground-truth box, and ŷ is the predicted box.
Specifically, during the training of the hybrid network, the parameters of the teacher modules are fixed and only the parameters of the student modules are updated; the teacher modules thus serve as a reference for the student modules. In this way, before the preset loss function meets the preset loss threshold, each training step only updates the information of the student modules, which have fewer parameters, thereby accelerating convergence. Once the hybrid network converges (i.e., the preset loss function meets the preset loss threshold), the teacher modules are deleted from the hybrid network, yielding an efficient knowledge-distilled student model. Moreover, as can be seen from the loss function, the interactive knowledge distillation built on the hybrid module requires no additional distillation loss and no operations such as hyperparameter search over the loss function, and the input image data does not need to be processed once each by the student network and the teacher network, so the training process is faster and more efficient.
In this embodiment, because of the uncertainty introduced by the teacher modules in the hybrid modules during training (i.e., the probability that a teacher module replaces a student module), the standard student model is more robust.
The image detection module 105 is configured to detect an image to be detected with the standard student model to obtain an image detection result.
In this embodiment of the application, the standard student model is a lightweight network and can therefore be deployed directly on edge devices, for example in the advanced driver-assistance system (ADAS) of a car. Moreover, because the standard student model is obtained through interactive knowledge distillation, its detection accuracy is higher.
In detail, the image detection module 105 obtains the image detection result through the following operations:
performing box selection and classification on the image to be detected with the standard student model to obtain a detection image;
recognizing the detection image to obtain recognition boxes and annotations, and aggregating the recognition boxes and annotations to obtain the image detection result.
Here, the image to be detected may be image data acquired from a camera of an edge device. The standard student model performs box selection and classification on the objects in the image to be detected. For example, if an image to be detected contains a pedestrian, a dog, and a bicycle, the standard student model boxes and recognizes each of them, and the resulting image detection result includes three recognition boxes and the corresponding annotations.
As shown in Fig. 7, it is a schematic structural diagram of an electronic device implementing the image detection method provided by an embodiment of this application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and runnable on the processor 10, such as an image detection program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memories (e.g., SD or DX memory), magnetic memories, magnetic disks, optical discs, and the like. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the code of the image detection program 12, but also to temporarily store data that has been output or will be output.
In some embodiments the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (such as the image detection program) and calling the data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to realize connection and communication between the memory 11 and the at least one processor 10.
Fig. 7 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in Fig. 7 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers the components; preferably, the power supply may be logically connected to the at least one processor 10 through a power management apparatus, which implements functions such as charge management, discharge management, and power consumption management. The power supply may further include one or more DC or AC power sources, recharging apparatuses, power failure detection circuits, power converters or inverters, power status indicators, and other components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described here.
Further, the electronic device 1 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface, which may be a display (Display) or an input unit (such as a keyboard (Keyboard)); optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be appropriately called a display screen or a display unit, and is used for displaying the information processed in the electronic device 1 and for displaying a visualized user interface.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The image detection program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run by the processor 10, can implement:
acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
training a pre-built teacher network with the standard image to obtain a standard teacher model;
constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
training the hybrid network with the standard image to obtain a standard student model;
detecting an image to be detected with the standard student model to obtain an image detection result.
Specifically, for the concrete implementation of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiments corresponding to Figs. 1 to 5, which is not repeated here.
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory).
This application also provides a computer-readable storage medium, where the readable storage medium stores a computer program, and the computer program, when executed by a processor of an electronic device, can implement:
acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
training a pre-built teacher network with the standard image to obtain a standard teacher model;
constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
training the hybrid network with the standard image to obtain a standard student model;
detecting an image to be detected with the standard student model to obtain an image detection result.
In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the modules is only a logical functional division, and there may be other division schemes in actual implementation.
The modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be realized in other specific forms without departing from the spirit or basic features of this application.
Therefore, in every respect the embodiments should be regarded as exemplary and non-limiting, and the scope of this application is defined by the appended claims rather than by the above description; all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be embraced in this application. Any reference signs in the claims shall not be construed as limiting the claims concerned.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and so on.
In addition, it is obvious that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses stated in the system claims may also be implemented by one unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application and not to limit them; although this application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application can be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. An image detection method, wherein the method comprises:
    acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
    training a pre-built teacher network with the standard image to obtain a standard teacher model;
    constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
    training the hybrid network with the standard image to obtain a standard student model;
    detecting an image to be detected with the standard student model to obtain an image detection result.
  2. The image detection method according to claim 1, wherein the acquiring an original image and performing spatial transformation and data augmentation processing on the original image to obtain a standard image comprises:
    translating and rotating the original image to obtain a transformed image;
    generating Gaussian noise with a preset random function, and adding the Gaussian noise to the transformed image to obtain the standard image.
  3. The image detection method according to claim 1, wherein the training a pre-built teacher network with the standard image to obtain a standard teacher model comprises:
    performing feature extraction and feature enhancement on the standard image with an image feature extraction module and an image feature enhancement module of the teacher network to obtain a feature image;
    obtaining predicted boxes of the feature image with a detection module of the teacher network, and computing a loss value from the predicted boxes and ground-truth boxes with a preset teacher loss function until the loss value is smaller than a preset threshold, thereby obtaining the standard teacher model.
  4. The image detection method according to claim 1, wherein the constructing a hybrid module from the standard teacher model and a pre-built student network comprises:
    taking the modules in the standard teacher model as teacher modules, and taking the modules in the pre-built student network as student modules;
    matching each teacher module with the corresponding student module, the hybrid module being obtained after successful matching.
  5. The image detection method according to claim 4, wherein the obtaining a hybrid network based on the hybrid module and the student network comprises:
    in the hybrid module, setting the probability that a teacher module replaces a student module by random selection, to obtain a standard hybrid module;
    replacing the student modules in the student network with the standard hybrid module, to obtain a hybrid network containing the standard hybrid module.
  6. The image detection method according to claim 5, wherein the training the hybrid network with the standard image to obtain a standard student model comprises:
    initializing parameters of the student modules in the standard hybrid module;
    training the student modules with the standard image, and adjusting the parameters of the student modules according to a preset loss function;
    when the preset loss function meets a preset loss threshold, taking the parameters updated at that time as the parameters of the student modules, and deleting the teacher modules from the hybrid module to obtain the standard student model.
  7. The image detection method according to any one of claims 1 to 6, wherein the detecting an image to be detected with the standard student model to obtain an image detection result comprises:
    performing box selection and classification on the image to be detected with the standard student model to obtain a detection image;
    recognizing the detection image to obtain recognition boxes and annotations, and aggregating the recognition boxes and annotations to obtain the image detection result.
  8. An image detection apparatus, wherein the apparatus comprises:
    an image processing module, configured to acquire an original image and perform spatial transformation and data augmentation processing on the original image to obtain a standard image;
    a teacher model construction module, configured to train a pre-built teacher network with the standard image to obtain a standard teacher model;
    a hybrid network construction module, configured to construct a hybrid module from the standard teacher model and a pre-built student network, and obtain a hybrid network based on the hybrid module and the student network;
    a student model training module, configured to train the hybrid network with the standard image to obtain a standard student model;
    an image detection module, configured to detect an image to be detected with the standard student model to obtain an image detection result.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following steps:
    acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
    training a pre-built teacher network with the standard image to obtain a standard teacher model;
    constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
    training the hybrid network with the standard image to obtain a standard student model;
    detecting an image to be detected with the standard student model to obtain an image detection result.
  10. The electronic device according to claim 9, wherein the acquiring an original image and performing spatial transformation and data augmentation processing on the original image to obtain a standard image comprises:
    translating and rotating the original image to obtain a transformed image;
    generating Gaussian noise with a preset random function, and adding the Gaussian noise to the transformed image to obtain the standard image.
  11. The electronic device according to claim 9, wherein the training a pre-built teacher network with the standard image to obtain a standard teacher model comprises:
    performing feature extraction and feature enhancement on the standard image with an image feature extraction module and an image feature enhancement module of the teacher network to obtain a feature image;
    obtaining predicted boxes of the feature image with a detection module of the teacher network, and computing a loss value from the predicted boxes and ground-truth boxes with a preset teacher loss function until the loss value is smaller than a preset threshold, thereby obtaining the standard teacher model.
  12. The electronic device according to claim 9, wherein the constructing a hybrid module from the standard teacher model and a pre-built student network comprises:
    taking the modules in the standard teacher model as teacher modules, and taking the modules in the pre-built student network as student modules;
    matching each teacher module with the corresponding student module, the hybrid module being obtained after successful matching.
  13. The electronic device according to claim 12, wherein the obtaining a hybrid network based on the hybrid module and the student network comprises:
    in the hybrid module, setting the probability that a teacher module replaces a student module by random selection, to obtain a standard hybrid module;
    replacing the student modules in the student network with the standard hybrid module, to obtain a hybrid network containing the standard hybrid module.
  14. The electronic device according to claim 13, wherein the training the hybrid network with the standard image to obtain a standard student model comprises:
    initializing parameters of the student modules in the standard hybrid module;
    training the student modules with the standard image, and adjusting the parameters of the student modules according to a preset loss function;
    when the preset loss function meets a preset loss threshold, taking the parameters updated at that time as the parameters of the student modules, and deleting the teacher modules from the hybrid module to obtain the standard student model.
  15. The electronic device according to any one of claims 9 to 14, wherein the detecting an image to be detected with the standard student model to obtain an image detection result comprises:
    performing box selection and classification on the image to be detected with the standard student model to obtain a detection image;
    recognizing the detection image to obtain recognition boxes and annotations, and aggregating the recognition boxes and annotations to obtain the image detection result.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring an original image, and performing spatial transformation and data augmentation processing on the original image to obtain a standard image;
    training a pre-built teacher network with the standard image to obtain a standard teacher model;
    constructing a hybrid module from the standard teacher model and a pre-built student network, and obtaining a hybrid network based on the hybrid module and the student network;
    training the hybrid network with the standard image to obtain a standard student model;
    detecting an image to be detected with the standard student model to obtain an image detection result.
  17. The computer-readable storage medium according to claim 16, wherein the acquiring an original image and performing spatial transformation and data augmentation processing on the original image to obtain a standard image comprises:
    translating and rotating the original image to obtain a transformed image;
    generating Gaussian noise with a preset random function, and adding the Gaussian noise to the transformed image to obtain the standard image.
  18. The computer-readable storage medium according to claim 16, wherein the training a pre-built teacher network with the standard image to obtain a standard teacher model comprises:
    performing feature extraction and feature enhancement on the standard image with an image feature extraction module and an image feature enhancement module of the teacher network to obtain a feature image;
    obtaining predicted boxes of the feature image with a detection module of the teacher network, and computing a loss value from the predicted boxes and ground-truth boxes with a preset teacher loss function until the loss value is smaller than a preset threshold, thereby obtaining the standard teacher model.
  19. The computer-readable storage medium according to claim 16, wherein the constructing a hybrid module from the standard teacher model and a pre-built student network comprises:
    taking the modules in the standard teacher model as teacher modules, and taking the modules in the pre-built student network as student modules;
    matching each teacher module with the corresponding student module, the hybrid module being obtained after successful matching.
  20. The computer-readable storage medium according to claim 19, wherein the obtaining a hybrid network based on the hybrid module and the student network comprises:
    in the hybrid module, setting the probability that a teacher module replaces a student module by random selection, to obtain a standard hybrid module;
    replacing the student modules in the student network with the standard hybrid module, to obtain a hybrid network containing the standard hybrid module.
PCT/CN2021/083708 2020-12-31 2021-03-30 Image detection method and apparatus, electronic device and storage medium WO2022141859A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011645110.5A CN112767320A (zh) Image detection method and apparatus, electronic device and storage medium
CN202011645110.5 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022141859A1 true WO2022141859A1 (zh) 2022-07-07

Family

ID=75698783

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083708 WO2022141859A1 (zh) Image detection method and apparatus, electronic device and storage medium 2020-12-31 2021-03-30

Country Status (2)

Country Link
CN (1) CN112767320A (zh)
WO (1) WO2022141859A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082690A (zh) * 2022-07-12 2022-09-20 北京百度网讯科技有限公司 Target recognition method, target recognition model training method and apparatus
CN115131747A (zh) * 2022-08-25 2022-09-30 合肥中科类脑智能技术有限公司 Knowledge-distillation-based target detection method and system for engineering vehicles in power transmission corridors
CN116071608A (zh) * 2023-03-16 2023-05-05 浙江啄云智能科技有限公司 Target detection method, apparatus, device and storage medium
CN116977919A (zh) * 2023-06-21 2023-10-31 北京卓视智通科技有限责任公司 Dress-code recognition method, system, storage medium and electronic device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284164A (zh) * 2021-05-19 2021-08-20 中国农业大学 Automatic shrimp-swarm counting method and apparatus, electronic device and storage medium
CN115631178B (zh) * 2022-11-03 2023-11-10 昆山润石智能科技有限公司 Automatic wafer defect detection method, system, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (zh) * 2019-09-13 2020-01-10 东南大学 Joint face and facial keypoint detection method based on transfer learning
CN111027403A (zh) * 2019-11-15 2020-04-17 深圳市瑞立视多媒体科技有限公司 Gesture estimation method, apparatus, device and computer-readable storage medium
CN111950638A (zh) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and apparatus based on model distillation, and electronic device
CN112116030A (zh) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector normalization and knowledge distillation
CN112115783A (zh) * 2020-08-12 2020-12-22 中国科学院大学 Facial landmark detection method, apparatus and device based on deep knowledge transfer

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674714A (zh) * 2019-09-13 2020-01-10 东南大学 Joint face and facial keypoint detection method based on transfer learning
CN111027403A (zh) * 2019-11-15 2020-04-17 深圳市瑞立视多媒体科技有限公司 Gesture estimation method, apparatus, device and computer-readable storage medium
CN112115783A (zh) * 2020-08-12 2020-12-22 中国科学院大学 Facial landmark detection method, apparatus and device based on deep knowledge transfer
CN111950638A (zh) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and apparatus based on model distillation, and electronic device
CN112116030A (zh) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector normalization and knowledge distillation

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082690A (zh) * 2022-07-12 2022-09-20 北京百度网讯科技有限公司 Target recognition method, target recognition model training method and apparatus
CN115131747A (zh) * 2022-08-25 2022-09-30 合肥中科类脑智能技术有限公司 Knowledge-distillation-based target detection method and system for engineering vehicles in power transmission corridors
CN116071608A (zh) * 2023-03-16 2023-05-05 浙江啄云智能科技有限公司 Target detection method, apparatus, device and storage medium
CN116071608B (zh) * 2023-03-16 2023-06-06 浙江啄云智能科技有限公司 Target detection method, apparatus, device and storage medium
CN116977919A (zh) * 2023-06-21 2023-10-31 北京卓视智通科技有限责任公司 Dress-code recognition method, system, storage medium and electronic device
CN116977919B (zh) * 2023-06-21 2024-01-26 北京卓视智通科技有限责任公司 Dress-code recognition method, system, storage medium and electronic device

Also Published As

Publication number Publication date
CN112767320A (zh) 2021-05-07

Similar Documents

Publication Publication Date Title
WO2022141859A1 (zh) Image detection method and apparatus, electronic device and storage medium
US10726304B2 (en) Refining synthetic data with a generative adversarial network using auxiliary inputs
WO2022105179A1 (zh) Biometric image recognition method and apparatus, electronic device and readable storage medium
WO2022116424A1 (zh) Traffic flow prediction model training method and apparatus, electronic device and storage medium
TWI814984B Split network acceleration architecture
WO2021212683A1 (zh) Query method and apparatus based on a legal knowledge graph, electronic device and medium
EP3746934A1 Face synthesis
WO2022141858A1 (zh) Pedestrian detection method and apparatus, electronic device and storage medium
WO2023159755A1 (zh) Fake news detection method, apparatus, device and storage medium
US10732694B2 Power state control of a mobile device
CN113096242A (zh) Virtual anchor generation method and apparatus, electronic device and storage medium
WO2022142106A1 (zh) Text analysis method and apparatus, electronic device and readable storage medium
WO2022227218A1 (zh) Drug name recognition method and apparatus, computer device and storage medium
WO2023134069A1 (zh) Entity relationship recognition method, device and readable storage medium
CN113157739B (zh) Cross-modal retrieval method and apparatus, electronic device and storage medium
WO2022227192A1 (zh) Image classification method and apparatus, electronic device and medium
CN113591881B (zh) Intent recognition method and apparatus based on model fusion, electronic device and medium
US11321397B2 Composition engine for analytical models
CN112269875B (zh) Text classification method and apparatus, electronic device and storage medium
US11276249B2 Method and system for video action classification by mixing 2D and 3D features
CN112860851A (zh) Course recommendation method, apparatus, device and medium based on root cause analysis
Yi et al. Improving synthetic to realistic semantic segmentation with parallel generative ensembles for autonomous urban driving
WO2023178798A1 (zh) Image classification method, apparatus, device and medium
WO2022222228A1 (zh) Method and apparatus for identifying harmful text information, electronic device and storage medium
CN113419951B (zh) Artificial intelligence model optimization method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912629

Country of ref document: EP

Kind code of ref document: A1