CN113688675A - Target detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113688675A
Authority
CN
China
Prior art keywords
pupil
training
image
detection module
target detection
Prior art date
Legal status
Pending
Application number
CN202110814956.5A
Other languages
Chinese (zh)
Inventor
陈荡荡
和超
张大磊
Current Assignee
Shanghai Eaglevision Medical Technology Co Ltd
Beijing Airdoc Technology Co Ltd
Original Assignee
Shanghai Eaglevision Medical Technology Co Ltd
Beijing Airdoc Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eaglevision Medical Technology Co Ltd and Beijing Airdoc Technology Co Ltd


Classifications

    • G06F18/214 Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Pattern recognition; analysing; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/08 Computing arrangements based on biological models; neural networks; learning methods

Abstract

An embodiment of the application provides a target detection method and apparatus, an electronic device, and a storage medium. The method comprises: processing a target pupil image with a first target detection module to obtain a first result comprising preliminary position information of the pupil to be located in the target pupil image; cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located; and processing the pupil region image with a second target detection module to obtain a second result comprising the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil. Both the first and the second target detection module can be modeled with a simple convolutional neural network, which has lower computational complexity than more complex target detection algorithms such as YOLO, Faster RCNN, and SSD. The method is therefore suitable for embedded devices with limited computing performance, such as a fundus camera, and achieves a higher detection speed.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of machine learning, and in particular to a target detection method and apparatus, an electronic device, and a storage medium.
Background
When analyzing the eye health of a user, it is often necessary to locate the position of the user's pupil. Currently, target detection networks such as YOLO (You Only Look Once), Faster RCNN, and SSD are typically used to locate the user's pupils. However, such target detection algorithms have high computational complexity, making them unsuitable for embedded devices with low computing performance, such as an intelligent fundus camera, and their detection speed is slow.
Disclosure of Invention
The application provides a target detection method, a target detection device, an electronic device and a storage medium.
According to a first aspect of embodiments of the present application, there is provided a target detection method, including:
processing a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
processing the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
According to a second aspect of embodiments of the present application, there is provided a target detection apparatus, including:
a first detection unit configured to process a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
a cropping unit configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
a second detection unit configured to process the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
The target detection method and apparatus, electronic device, and storage medium provided by the embodiment of the application process the target pupil image with the first target detection module to obtain a first result comprising: preliminary position information of the pupil to be located in the target pupil image; crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located; and process the pupil region image with the second target detection module to obtain a second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil. Both the first and the second target detection module can be modeled with a simple convolutional neural network, which has lower computational complexity than detecting targets with more complex target detection algorithms such as YOLO, Faster RCNN, and SSD, so the method is suitable for embedded devices with limited computing performance, such as a fundus camera, and has a higher detection speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a target detection method provided in an embodiment of the present application;
fig. 2 is a schematic diagram illustrating the rectangular box output by the first target detection module and the rectangular box output by the second target detection module;
fig. 3 is a schematic diagram illustrating the speed of target detection on pupil images using the target detection method provided in an embodiment of the present application;
fig. 4 is a block diagram illustrating a structure of an object detection apparatus provided in an embodiment of the present application;
fig. 5 shows a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of a target detection method provided in an embodiment of the present application, where the method includes:
Step 101: process a target pupil image with a first target detection module to obtain a first result.
The pupil images may be acquired by a smart fundus camera, for example by a secondary camera of the smart fundus camera. Each pupil image contains one eye of the photographed subject and therefore exactly one pupil.
In the present application, "target pupil image" does not refer to one particular pupil image: whenever the position of the pupil in a pupil image needs to be determined, that pupil image is treated as the target pupil image.
The pupil in the target pupil image may be referred to as the pupil to be located.
When the first target detection module processes the target pupil image, the target pupil image is input into the first target detection module, which outputs a preliminary predicted rectangular box for the pupil to be located. The coordinates of the box's center point serve as the preliminary center coordinates of the pupil to be located, its width as the pupil's preliminary width, and its height as the pupil's preliminary height. The preliminary position information of the pupil to be located may thus include: the preliminary center coordinates, the preliminary width, and the preliminary height of the pupil to be located.
In this application, the first target detection module may include a branch for predicting the pupil position. This branch may be a neural network, such as a convolutional neural network.
After the target pupil image is input into the first target detection module, it serves as the input of the branch for predicting the pupil position, and that branch outputs the preliminary position information of the pupil to be located.
In some embodiments, the first target detection module comprises a convolutional neural network for predicting the pupil position or a logistic regression model for predicting the pupil position.
The branch in the first target detection module for predicting the pupil position may be a convolutional neural network or a logistic regression model.
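Purely for illustration, the position branch may be a very small convolutional network. The following sketch (Python with PyTorch; all layer sizes and the name first_stage are assumptions, not taken from the application) maps a pupil image to four box values:

import torch.nn as nn

# Minimal sketch of the first module's position branch; every layer size
# here is an assumption rather than a detail of the application.
first_stage = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),  # center x, center y, width, height
)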
In this application, the pupil images used for training the first target detection module may be referred to as first training pupil images. The first target detection module is trained in advance using the first training pupil images and the annotation data of each first training pupil image.
Each training iteration of the first target detection module uses one first training pupil image and the annotation data of that image, and a different first training pupil image is used in each iteration.
In each training iteration, the first training pupil image used in that iteration is input into the first target detection module, whose branch for predicting the pupil position outputs the predicted position information of the pupil in that image.
The predicted position information of the pupil in a first training pupil image may include: the predicted coordinates of the pupil's center point, the predicted width of the pupil, and the predicted height of the pupil.
For each first training pupil image, the annotation information of that image may include the annotated position information of the pupil in the image.
The annotated position information of the pupil in a first training pupil image may include: the annotated coordinates of the pupil's center point, the annotated width of the pupil, and the annotated height of the pupil.
In each training iteration, the loss between the predicted position information of the pupil in the first training pupil image and the annotated position information of the pupil in that image can be calculated, and the parameters of the branch for predicting the pupil position are updated according to the calculated loss.
In some embodiments, for each first training pupil image, each item in the annotated position information of the pupil is converted to obtain a converted value of that item, and a training label for the first training pupil image is generated that contains the converted value of each item.
For each first training pupil image, this training label is used for calculating the loss: when the image is used to train the first target detection module, the loss is calculated between the predicted position information of the pupil and the training label of the image.
Each item of the annotated position information of the pupil in a first training pupil image may be converted using the following formulas:
d1=min((x+w*0.5)/img_w,1)
d2=min((y+h*0.5)/img_h,1)
d3=-log(w/img_w)
d4=-log(h/img_h)
where d1 is the converted value of the x-axis coordinate of the annotated center point of the pupil in the first training pupil image, d2 is the converted value of the y-axis coordinate of the annotated center point, d3 is the converted value of the annotated width of the pupil, d4 is the converted value of the annotated height of the pupil, x and y are the x-axis and y-axis coordinate values of the annotated center point of the pupil, w and h are the annotated width and height of the pupil, and img_w and img_h are the width and height of the first training pupil image.
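For illustration, the conversion may be implemented as follows (a minimal Python sketch; the function name encode_pupil_label is illustrative and not part of the application):

import math

def encode_pupil_label(x, y, w, h, img_w, img_h):
    # Convert one annotated pupil box (center x and y, width, height)
    # into the training label (d1..d4) using the formulas above.
    d1 = min((x + w * 0.5) / img_w, 1)  # normalized, clipped at 1
    d2 = min((y + h * 0.5) / img_h, 1)
    d3 = -math.log(w / img_w)           # log-scaled width
    d4 = -math.log(h / img_h)           # log-scaled height
    return d1, d2, d3, d4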
Step 102: crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located.
In the application, the pupil region to be located can be determined from the preliminary position information of the pupil to be located, which may include: the preliminary center coordinates, the preliminary width, and the preliminary height of the pupil. The pupil region to be located is a rectangle whose center point has the preliminary coordinates, whose width is the preliminary width, and whose height is the preliminary height. After the pupil region to be located is determined in the target pupil image, the pupil region image can be cropped out of the target pupil image. In other words, the pupil region image is the image block of the target pupil image occupied by the pupil region to be located.
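A minimal sketch of this cropping step (Python with NumPy; the function name is illustrative, and clamping the rectangle to the image bounds is an assumption the application does not spell out):

import numpy as np

def crop_pupil_region(image: np.ndarray, cx: float, cy: float,
                      w: float, h: float) -> np.ndarray:
    # Cut the rectangle centered at (cx, cy) with the preliminary width
    # and height out of the target pupil image, clamped to the bounds.
    img_h, img_w = image.shape[:2]
    x0 = max(int(round(cx - w / 2)), 0)
    y0 = max(int(round(cy - h / 2)), 0)
    x1 = min(int(round(cx + w / 2)), img_w)
    y1 = min(int(round(cy + h / 2)), img_h)
    return image[y0:y1, x0:x1]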
Step 103: process the pupil region image with a second target detection module to obtain a second result.
In the application, when the second target detection module processes the pupil region image, the pupil region image is input into the second target detection module, which outputs a final predicted rectangular box for the pupil to be located and the probability that the object in the pupil region image is a pupil. The coordinates of the box's center point serve as the final center coordinates of the pupil to be located, its width as the pupil's final width, and its height as the pupil's final height. The second result therefore includes: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, where the final position information comprises the final center coordinates, the final width, and the final height of the pupil to be located.
Please refer to fig. 2, which shows the rectangular box output by the first target detection module and the rectangular box output by the second target detection module.
In fig. 2, the dotted rectangle is the box output by the first target detection module, i.e. the preliminary predicted rectangular box for the pupil to be located, and the solid rectangle is the box output by the second target detection module, i.e. the final predicted rectangular box for the pupil to be located.
Please refer to fig. 3, which illustrates the speed of target detection on pupil images using the target detection method provided by the embodiment of the present application.
The processor executing the target detection method provided by the embodiment of the present application may be an ARM processor, for example an ARM A7 at 1.2 GHz.
In fig. 3, the detection speed of target detection is shown for each of 10 pupil images. Each pupil image may be 800 × 600 in size, and the average detection speed is about 19 ms.
In this application, the second target detection module may include: a regression branch for predicting the pupil position, and a classification branch for predicting the probability that the object in the pupil region image is a pupil. Each branch may be a neural network, such as a convolutional neural network.
After the pupil region image is input into the second target detection module, it serves as the input of both the regression branch, which outputs the final position information of the pupil to be located, and the classification branch, which outputs the probability that the object in the pupil region image is a pupil.
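Purely as an illustration, such a two-branch module could be organized as follows (Python with PyTorch; the class name and all layer sizes are assumptions, not details of the application):

import torch.nn as nn

class SecondStageDetector(nn.Module):
    # Shared convolutional trunk feeding a regression head (box values)
    # and a classification head (pupil probability); sizes are illustrative.
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.regression = nn.Linear(32, 4)  # center x, center y, width, height
        self.classification = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        features = self.trunk(x)
        return self.regression(features), self.classification(features)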
In some embodiments, the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
The regression branch in the second target detection module may be a convolutional neural network or a logistic regression model for predicting the pupil position.
In some embodiments, processing the pupil region image with the second target detection module to obtain the second result comprises: adjusting the size of the pupil region image to a preset size, and processing the resized pupil region image with the second target detection module to obtain the second result.
For example, if the preset size is 48 × 48, the pupil region image is resized to 48 × 48 and then input into the second target detection module, which outputs the second result.
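This resizing step might be sketched as follows (Python with OpenCV and PyTorch; the normalization and the channel layout are assumptions):

import cv2
import numpy as np
import torch

def prepare_region(pupil_region: np.ndarray, size: int = 48) -> torch.Tensor:
    # Resize the cropped pupil region image to the preset size and pack
    # it as a batch of one for the second target detection module.
    resized = cv2.resize(pupil_region, (size, size), interpolation=cv2.INTER_LINEAR)
    tensor = torch.from_numpy(resized).float().permute(2, 0, 1) / 255.0
    return tensor.unsqueeze(0)  # shape (1, 3, 48, 48)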
In the present application, the second target detection module is trained in advance on the training data corresponding to each second training pupil image.
During training of the second target detection module, the regression branch and the classification branch may both be trained using the training data corresponding to each second training pupil image.
In the present application, the pupil images used for training the second target detection module may be referred to as second training pupil images.
For each second training pupil image, the training data corresponding to that image includes training data corresponding to each rectangular box of the pupil in the image. The training data for a rectangular box comprises: the difference information of the rectangular box, the image block of the second training pupil image enclosed by the rectangular box, and the annotation data of the second training pupil image, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the image.
For each second training pupil image, the annotation information of that image includes the annotated position information of the pupil in the image.
The annotated position information of the pupil in a second training pupil image may include: the annotated coordinates of the pupil's center point, the annotated width of the pupil, and the annotated height of the pupil.
In this application, for each second training pupil image, a preset number of rectangular boxes different from the pupil's annotation box may be generated around the annotation box of the pupil in the image.
For each second training pupil image, the rectangular boxes generated around the pupil's annotation box, each different from that annotation box, are the rectangular boxes corresponding to the pupil in the image.
For each second training pupil image, the annotation box of the pupil is a rectangle whose center point has the annotated coordinates of the pupil's center point, whose width is the annotated width of the pupil, and whose height is the annotated height of the pupil.
When the second target detection module is trained, for a rectangular box corresponding to a second training pupil image, the image block in the training data of the box serves as the input of the second target detection module, and the difference information in the training data of the box can serve as the supervision for calculating the regression loss. If the intersection-over-union (IoU) between the rectangular box and the pupil's annotation box in the second training pupil image is greater than a preset classification threshold, the box may be used as a positive sample for training the classification branch; if the IoU is smaller than the preset classification threshold, the box may be used as a negative sample for training the classification branch.
In some embodiments, for a second training pupil image, a rectangular box corresponding to the pupil whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch; a box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch only; and a box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch. Here the IoU of a rectangular box is the intersection-over-union of that box and the pupil's annotation box in the second training pupil image, the first IoU threshold is larger than the second, and the second is larger than the third.
For each rectangular box corresponding to the pupil in a second training pupil image, the IoU of the box and the pupil's annotation box may be calculated, and this IoU determines whether the box is used for training the regression branch and/or the classification branch.
For example, with a first IoU threshold of 0.65, a second of 0.5, and a third of 0.3: if a rectangular box has an IoU greater than 0.65, it is used for training the regression branch and serves as a positive sample for training the classification branch; if its IoU is less than 0.65 and greater than 0.5, it is used for training the regression branch; and if its IoU is less than 0.3, it serves as a negative sample for training the classification branch.
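The threshold logic above might be sketched as follows (Python; representing boxes as (x0, y0, x1, y1) corner coordinates is an assumption about the implementation):

def iou(box_a, box_b):
    # Intersection-over-union of two (x0, y0, x1, y1) boxes.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_roles(sample_box, label_box, t1=0.65, t2=0.5, t3=0.3):
    # Map a generated rectangular box to its training role(s) using the
    # example thresholds above.
    v = iou(sample_box, label_box)
    if v > t1:
        return ["regression", "classification-positive"]
    if v > t2:
        return ["regression"]
    if v < t3:
        return ["classification-negative"]
    return []  # boxes with IoU between t3 and t2 are not used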
When the second target detection module is trained, for a rectangular box corresponding to a second training pupil image, the image block in the training data of the box serves as the input of the second target detection module. If the box is used for training the regression branch, the regression loss can be calculated using the difference information in the training data of the box as supervision.
Referring to fig. 4, a block diagram of a target detection apparatus according to an embodiment of the present disclosure is shown. The target detection apparatus includes: a first detection unit 401, a cropping unit 402, and a second detection unit 403.
The first detection unit 401 is configured to process the target pupil image with the first target detection module to obtain a first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image.
The cropping unit 402 is configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located.
The second detection unit 403 is configured to process the pupil region image with the second target detection module to obtain a second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
In some embodiments, a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch; a box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch only; and a box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch. Here the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second, and the second is larger than the third.
In some embodiments, the annotation information of a first training pupil image comprises the annotated position information of the pupil in that image, and the apparatus further comprises:
a label generating unit configured to convert each item of the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item, and to generate a training label for the first training pupil image that includes the converted value of each item.
In some embodiments, the first target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position, and the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
In some embodiments, the second detection unit 403 is further configured to adjust the size of the pupil region image to a preset size and to process the resized pupil region image with the second target detection module to obtain the second result.
Fig. 5 is a block diagram of an electronic device provided in an embodiment of the present application. The electronic device includes a processing component 522, which further includes one or more processors, and memory resources, represented by a memory 532, for storing instructions executable by the processing component 522, for example application programs. The application programs stored in the memory 532 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 522 is configured to execute the instructions to perform the above-described method.
The electronic device may also include a power supply component 526 configured to perform power management of the electronic device, a wired or wireless network interface 550 configured to connect the electronic device to a network, and an input/output (I/O) interface 558. The electronic device may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by an electronic device to perform the above-described object detection method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A target detection method, the method comprising:
processing a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
processing the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
2. The method of claim 1, wherein a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch of the second target detection module; a rectangular box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch; and a rectangular box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch, wherein the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second IoU threshold, and the second IoU threshold is larger than the third IoU threshold.
3. The method of claim 1, wherein the annotation information of the first training pupil image comprises: the annotated position information of the pupil in the first training pupil image; the method further comprising:
converting each item in the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item;
generating a training label for the first training pupil image, the training label including the converted value of each item.
4. The method of claim 1, wherein the first target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position, and the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
5. The method of claim 1, wherein processing the pupil region image with a second target detection module to obtain a second result comprises:
adjusting the size of the pupil region image to a preset size;
and processing the resized pupil region image with the second target detection module to obtain the second result.
6. A target detection apparatus, characterized in that the apparatus comprises:
a first detection unit configured to process a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
a cropping unit configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
a second detection unit configured to process the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
7. The apparatus of claim 6, wherein a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch of the second target detection module; a rectangular box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch; and a rectangular box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch, wherein the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second IoU threshold, and the second IoU threshold is larger than the third IoU threshold.
8. The apparatus of claim 6, wherein the annotation information of the first training pupil image comprises: the annotated position information of the pupil in the first training pupil image; the apparatus further comprising:
a label generating unit configured to convert each item of the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item, and to generate a training label for the first training pupil image that includes the converted value of each item.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 5.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1 to 5.
CN202110814956.5A 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium Pending CN113688675A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110814956.5A CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium
PCT/CN2022/105895 WO2023001063A1 (en) 2021-07-19 2022-07-15 Target detection method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110814956.5A CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113688675A (en) 2021-11-23

Family

ID=78577417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110814956.5A Pending CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113688675A (en)
WO (1) WO2023001063A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115856B (en) * 2023-08-02 2024-04-05 珠海微度芯创科技有限责任公司 Target detection method based on image fusion, human body security inspection equipment and storage medium
CN117523650B (en) * 2024-01-04 2024-04-02 山东大学 Eyeball motion tracking method and system based on rotation target detection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4795737B2 (en) * 2005-07-12 2011-10-19 富士フイルム株式会社 Face detection method, apparatus, and program
CN108717693A (en) * 2018-04-24 2018-10-30 浙江工业大学 A kind of optic disk localization method based on RPN
CN111598091A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN112598650A (en) * 2020-12-24 2021-04-02 苏州大学 Combined segmentation method for optic cup optic disk in fundus medical image
CN113688675A (en) * 2021-07-19 2021-11-23 北京鹰瞳科技发展股份有限公司 Target detection method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023001063A1 (en) * 2021-07-19 2023-01-26 北京鹰瞳科技发展股份有限公司 Target detection method and apparatus, electronic device, and storage medium
CN114298912A (en) * 2022-03-08 2022-04-08 北京万里红科技有限公司 Image acquisition method and device, electronic equipment and storage medium
CN114937086A (en) * 2022-07-19 2022-08-23 北京鹰瞳科技发展股份有限公司 Training method and detection method for multi-image target detection and related products
CN114937086B (en) * 2022-07-19 2022-11-01 北京鹰瞳科技发展股份有限公司 Training method and detection method for multi-image target detection and related products

Also Published As

Publication number Publication date
WO2023001063A1 (en) 2023-01-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination