CN113688675A - Target detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113688675A
Authority
CN
China
Prior art keywords
pupil
training
image
detection module
target detection
Prior art date
Legal status
Pending
Application number
CN202110814956.5A
Other languages
Chinese (zh)
Inventor
陈荡荡
和超
张大磊
Current Assignee
Shanghai Eaglevision Medical Technology Co Ltd
Beijing Airdoc Technology Co Ltd
Original Assignee
Shanghai Eaglevision Medical Technology Co Ltd
Beijing Airdoc Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eaglevision Medical Technology Co Ltd and Beijing Airdoc Technology Co Ltd


Classifications

    • G06F18/214 Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Pattern recognition; analysing; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/08 Computing arrangements based on biological models; neural networks; learning methods

Abstract

An embodiment of the application provides a target detection method and apparatus, an electronic device, and a storage medium. The method comprises: processing a target pupil image with a first target detection module to obtain a first result comprising preliminary position information of the pupil to be located in the target pupil image; cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located; and processing the pupil region image with a second target detection module to obtain a second result comprising the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil. Both the first and the second target detection module can be modeled with a simple convolutional neural network, which has lower computational complexity than more complex target detection algorithms such as YOLO, Faster RCNN, and SSD. The method is therefore suitable for embedded devices with limited computing performance, such as a fundus camera, and achieves a higher detection speed.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of machine learning, and in particular to a target detection method and apparatus, an electronic device, and a storage medium.
Background
When analyzing the eye health of a user, it is often necessary to locate the position of the user's pupil. Currently, target detection networks such as YOLO (You Only Look Once), Faster RCNN, and SSD are typically used to locate the user's pupils. However, such target detection algorithms have high computational complexity, making them unsuitable for embedded devices with low computing performance, such as an intelligent fundus camera, and their detection speed is slow.
Disclosure of Invention
The application provides a target detection method, a target detection device, an electronic device and a storage medium.
According to a first aspect of embodiments of the present application, there is provided a target detection method, including:
processing a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
processing the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
According to a second aspect of embodiments of the present application, there is provided a target detection apparatus, including:
a first detection unit configured to process a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
a cropping unit configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
a second detection unit configured to process the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
The target detection method and apparatus, electronic device, and storage medium provided by the embodiment of the application process the target pupil image with the first target detection module to obtain a first result comprising: preliminary position information of the pupil to be located in the target pupil image; crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located; and process the pupil region image with the second target detection module to obtain a second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil. Both the first and the second target detection module can be modeled with a simple convolutional neural network, which has lower computational complexity than detecting targets with more complex target detection algorithms such as YOLO, Faster RCNN, and SSD, so the method is suitable for embedded devices with limited computing performance, such as a fundus camera, and has a higher detection speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a target detection method provided in an embodiment of the present application;
fig. 2 is a schematic diagram illustrating the rectangular box output by the first target detection module and the rectangular box output by the second target detection module;
fig. 3 is a schematic diagram illustrating the speed of target detection on pupil images using the target detection method provided in an embodiment of the present application;
fig. 4 is a block diagram illustrating a structure of an object detection apparatus provided in an embodiment of the present application;
fig. 5 shows a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of a target detection method provided in an embodiment of the present application, where the method includes:
Step 101: process a target pupil image with a first target detection module to obtain a first result.
The pupil images may be acquired by a smart fundus camera, for example by a secondary camera of the smart fundus camera. Each pupil image contains one eye of the photographed subject and therefore exactly one pupil.
In the present application, "target pupil image" does not refer to one particular pupil image: whenever the position of the pupil in a pupil image needs to be determined, that pupil image is treated as the target pupil image.
The pupil in the target pupil image may be referred to as the pupil to be located.
When the first target detection module processes the target pupil image, the target pupil image is input into the first target detection module, which outputs a preliminary predicted rectangular box for the pupil to be located. The coordinates of the box's center point serve as the preliminary center coordinates of the pupil to be located, its width as the pupil's preliminary width, and its height as the pupil's preliminary height. The preliminary position information of the pupil to be located may thus include: the preliminary center coordinates, the preliminary width, and the preliminary height of the pupil to be located.
In this application, the first target detection module may include a branch for predicting the pupil position. This branch may be a neural network, such as a convolutional neural network.
After the target pupil image is input into the first target detection module, it serves as the input of the branch for predicting the pupil position, and that branch outputs the preliminary position information of the pupil to be located.
In some embodiments, the first target detection module comprises a convolutional neural network for predicting the pupil position or a logistic regression model for predicting the pupil position.
The branch in the first target detection module for predicting the pupil position may be a convolutional neural network or a logistic regression model.
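Purely for illustration, the position branch may be a very small convolutional network. The following sketch (Python with PyTorch; all layer sizes and the name first_stage are assumptions, not taken from the application) maps a pupil image to four box values:

import torch.nn as nn

# Minimal sketch of the first module's position branch; every layer size
# here is an assumption rather than a detail of the application.
first_stage = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),  # center x, center y, width, height
)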
In this application, the pupil images used for training the first target detection module may be referred to as first training pupil images. The first target detection module is trained in advance using the first training pupil images and the annotation data of each first training pupil image.
Each training iteration of the first target detection module uses one first training pupil image and the annotation data of that image, and a different first training pupil image is used in each iteration.
In each training iteration, the first training pupil image used in that iteration is input into the first target detection module, whose branch for predicting the pupil position outputs the predicted position information of the pupil in that image.
The predicted position information of the pupil in a first training pupil image may include: the predicted coordinates of the pupil's center point, the predicted width of the pupil, and the predicted height of the pupil.
For each first training pupil image, the annotation information of that image may include the annotated position information of the pupil in the image.
The annotated position information of the pupil in a first training pupil image may include: the annotated coordinates of the pupil's center point, the annotated width of the pupil, and the annotated height of the pupil.
In each training iteration, the loss between the predicted position information of the pupil in the first training pupil image and the annotated position information of the pupil in that image can be calculated, and the parameters of the branch for predicting the pupil position are updated according to the calculated loss.
In some embodiments, for each first training pupil image, each item in the annotated position information of the pupil is converted to obtain a converted value of that item, and a training label for the first training pupil image is generated that contains the converted value of each item.
For each first training pupil image, this training label is used for calculating the loss: when the image is used to train the first target detection module, the loss is calculated between the predicted position information of the pupil and the training label of the image.
Each item of the annotated position information of the pupil in a first training pupil image may be converted using the following formulas:
d1=min((x+w*0.5)/img_w,1)
d2=min((y+h*0.5)/img_h,1)
d3=-log(w/img_w)
d4=-log(h/img_h)
where d1 is the converted value of the x-axis coordinate of the annotated center point of the pupil in the first training pupil image, d2 is the converted value of the y-axis coordinate of the annotated center point, d3 is the converted value of the annotated width of the pupil, d4 is the converted value of the annotated height of the pupil, x and y are the x-axis and y-axis coordinate values of the annotated center point of the pupil, w and h are the annotated width and height of the pupil, and img_w and img_h are the width and height of the first training pupil image.
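For illustration, the conversion may be implemented as follows (a minimal Python sketch; the function name encode_pupil_label is illustrative and not part of the application):

import math

def encode_pupil_label(x, y, w, h, img_w, img_h):
    # Convert one annotated pupil box (center x and y, width, height)
    # into the training label (d1..d4) using the formulas above.
    d1 = min((x + w * 0.5) / img_w, 1)  # normalized, clipped at 1
    d2 = min((y + h * 0.5) / img_h, 1)
    d3 = -math.log(w / img_w)           # log-scaled width
    d4 = -math.log(h / img_h)           # log-scaled height
    return d1, d2, d3, d4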
Step 102: crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located.
In the application, the pupil region to be located can be determined from the preliminary position information of the pupil to be located, which may include: the preliminary center coordinates, the preliminary width, and the preliminary height of the pupil. The pupil region to be located is a rectangle whose center point has the preliminary coordinates, whose width is the preliminary width, and whose height is the preliminary height. After the pupil region to be located is determined in the target pupil image, the pupil region image can be cropped out of the target pupil image. In other words, the pupil region image is the image block of the target pupil image occupied by the pupil region to be located.
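A minimal sketch of this cropping step (Python with NumPy; the function name is illustrative, and clamping the rectangle to the image bounds is an assumption the application does not spell out):

import numpy as np

def crop_pupil_region(image: np.ndarray, cx: float, cy: float,
                      w: float, h: float) -> np.ndarray:
    # Cut the rectangle centered at (cx, cy) with the preliminary width
    # and height out of the target pupil image, clamped to the bounds.
    img_h, img_w = image.shape[:2]
    x0 = max(int(round(cx - w / 2)), 0)
    y0 = max(int(round(cy - h / 2)), 0)
    x1 = min(int(round(cx + w / 2)), img_w)
    y1 = min(int(round(cy + h / 2)), img_h)
    return image[y0:y1, x0:x1]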
Step 103: process the pupil region image with a second target detection module to obtain a second result.
In the application, when the second target detection module processes the pupil region image, the pupil region image is input into the second target detection module, which outputs a final predicted rectangular box for the pupil to be located and the probability that the object in the pupil region image is a pupil. The coordinates of the box's center point serve as the final center coordinates of the pupil to be located, its width as the pupil's final width, and its height as the pupil's final height. The second result therefore includes: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, where the final position information comprises the final center coordinates, the final width, and the final height of the pupil to be located.
Please refer to fig. 2, which shows the rectangular box output by the first target detection module and the rectangular box output by the second target detection module.
In fig. 2, the dotted rectangle is the box output by the first target detection module, i.e. the preliminary predicted rectangular box for the pupil to be located, and the solid rectangle is the box output by the second target detection module, i.e. the final predicted rectangular box for the pupil to be located.
Please refer to fig. 3, which illustrates the speed of target detection on pupil images using the target detection method provided by the embodiment of the present application.
The processor executing the target detection method provided by the embodiment of the present application may be an ARM processor, for example an ARM A7 at 1.2 GHz.
In fig. 3, the detection speed of target detection is shown for each of 10 pupil images. Each pupil image may be 800 × 600 in size, and the average detection speed is about 19 ms.
In this application, the second target detection module may include: a regression branch for predicting the pupil position, and a classification branch for predicting the probability that the object in the pupil region image is a pupil. Each branch may be a neural network, such as a convolutional neural network.
After the pupil region image is input into the second target detection module, it serves as the input of both the regression branch, which outputs the final position information of the pupil to be located, and the classification branch, which outputs the probability that the object in the pupil region image is a pupil.
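Purely as an illustration, such a two-branch module could be organized as follows (Python with PyTorch; the class name and all layer sizes are assumptions, not details of the application):

import torch.nn as nn

class SecondStageDetector(nn.Module):
    # Shared convolutional trunk feeding a regression head (box values)
    # and a classification head (pupil probability); sizes are illustrative.
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.regression = nn.Linear(32, 4)  # center x, center y, width, height
        self.classification = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        features = self.trunk(x)
        return self.regression(features), self.classification(features)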
In some embodiments, the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
The regression branch in the second target detection module may be a convolutional neural network or a logistic regression model for predicting the pupil position.
In some embodiments, processing the pupil region image with the second target detection module to obtain the second result comprises: adjusting the size of the pupil region image to a preset size, and processing the resized pupil region image with the second target detection module to obtain the second result.
For example, if the preset size is 48 × 48, the pupil region image is resized to 48 × 48 and then input into the second target detection module, which outputs the second result.
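This resizing step might be sketched as follows (Python with OpenCV and PyTorch; the normalization and the channel layout are assumptions):

import cv2
import numpy as np
import torch

def prepare_region(pupil_region: np.ndarray, size: int = 48) -> torch.Tensor:
    # Resize the cropped pupil region image to the preset size and pack
    # it as a batch of one for the second target detection module.
    resized = cv2.resize(pupil_region, (size, size), interpolation=cv2.INTER_LINEAR)
    tensor = torch.from_numpy(resized).float().permute(2, 0, 1) / 255.0
    return tensor.unsqueeze(0)  # shape (1, 3, 48, 48)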
In the present application, the second target detection module is trained in advance on the training data corresponding to each second training pupil image.
During training of the second target detection module, the regression branch and the classification branch may both be trained using the training data corresponding to each second training pupil image.
In the present application, the pupil images used for training the second target detection module may be referred to as second training pupil images.
For each second training pupil image, the training data corresponding to that image includes training data corresponding to each rectangular box of the pupil in the image. The training data for a rectangular box comprises: the difference information of the rectangular box, the image block of the second training pupil image enclosed by the rectangular box, and the annotation data of the second training pupil image, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the image.
For each second training pupil image, the annotation information of that image includes the annotated position information of the pupil in the image.
The annotated position information of the pupil in a second training pupil image may include: the annotated coordinates of the pupil's center point, the annotated width of the pupil, and the annotated height of the pupil.
In this application, for each second training pupil image, a preset number of rectangular boxes different from the pupil's annotation box may be generated around the annotation box of the pupil in the image.
For each second training pupil image, the rectangular boxes generated around the pupil's annotation box, each different from that annotation box, are the rectangular boxes corresponding to the pupil in the image.
For each second training pupil image, the annotation box of the pupil is a rectangle whose center point has the annotated coordinates of the pupil's center point, whose width is the annotated width of the pupil, and whose height is the annotated height of the pupil.
When the second target detection module is trained, for a rectangular box corresponding to a second training pupil image, the image block in the training data of the box serves as the input of the second target detection module, and the difference information in the training data of the box can serve as the supervision for calculating the regression loss. If the intersection-over-union (IoU) between the rectangular box and the pupil's annotation box in the second training pupil image is greater than a preset classification threshold, the box may be used as a positive sample for training the classification branch; if the IoU is smaller than the preset classification threshold, the box may be used as a negative sample for training the classification branch.
In some embodiments, for a second training pupil image, a rectangular box corresponding to the pupil whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch; a box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch only; and a box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch. Here the IoU of a rectangular box is the intersection-over-union of that box and the pupil's annotation box in the second training pupil image, the first IoU threshold is larger than the second, and the second is larger than the third.
For each rectangular box corresponding to the pupil in a second training pupil image, the IoU of the box and the pupil's annotation box may be calculated, and this IoU determines whether the box is used for training the regression branch and/or the classification branch.
For example, with a first IoU threshold of 0.65, a second of 0.5, and a third of 0.3: if a rectangular box has an IoU greater than 0.65, it is used for training the regression branch and serves as a positive sample for training the classification branch; if its IoU is less than 0.65 and greater than 0.5, it is used for training the regression branch; and if its IoU is less than 0.3, it serves as a negative sample for training the classification branch.
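The threshold logic above might be sketched as follows (Python; representing boxes as (x0, y0, x1, y1) corner coordinates is an assumption about the implementation):

def iou(box_a, box_b):
    # Intersection-over-union of two (x0, y0, x1, y1) boxes.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def assign_roles(sample_box, label_box, t1=0.65, t2=0.5, t3=0.3):
    # Map a generated rectangular box to its training role(s) using the
    # example thresholds above.
    v = iou(sample_box, label_box)
    if v > t1:
        return ["regression", "classification-positive"]
    if v > t2:
        return ["regression"]
    if v < t3:
        return ["classification-negative"]
    return []  # boxes with IoU between t3 and t2 are not used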
When the second target detection module is trained, for a rectangular box corresponding to a second training pupil image, the image block in the training data of the box serves as the input of the second target detection module. If the box is used for training the regression branch, the regression loss can be calculated using the difference information in the training data of the box as supervision.
Referring to fig. 4, a block diagram of a target detection apparatus according to an embodiment of the present disclosure is shown. The target detection apparatus includes: a first detection unit 401, a cropping unit 402, and a second detection unit 403.
The first detection unit 401 is configured to process the target pupil image with the first target detection module to obtain a first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image.
The cropping unit 402 is configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located.
The second detection unit 403 is configured to process the pupil region image with the second target detection module to obtain a second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
In some embodiments, a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch; a box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch only; and a box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch. Here the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second, and the second is larger than the third.
In some embodiments, the annotation information of a first training pupil image comprises the annotated position information of the pupil in that image, and the apparatus further comprises:
a label generating unit configured to convert each item of the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item, and to generate a training label for the first training pupil image that includes the converted value of each item.
In some embodiments, the first target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position, and the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
In some embodiments, the second detection unit 403 is further configured to adjust the size of the pupil region image to a preset size and to process the resized pupil region image with the second target detection module to obtain the second result.
Fig. 5 is a block diagram of an electronic device provided in an embodiment of the present application. The electronic device includes a processing component 522, which further includes one or more processors, and memory resources, represented by a memory 532, for storing instructions executable by the processing component 522, for example application programs. The application programs stored in the memory 532 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 522 is configured to execute the instructions to perform the above-described method.
The electronic device may also include a power supply component 526 configured to perform power management of the electronic device, a wired or wireless network interface 550 configured to connect the electronic device to a network, and an input/output (I/O) interface 558. The electronic device may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by an electronic device to perform the above-described object detection method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A target detection method, the method comprising:
processing a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
cropping a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
processing the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
2. The method of claim 1, wherein a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch of the second target detection module; a rectangular box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch; and a rectangular box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch, wherein the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second IoU threshold, and the second IoU threshold is larger than the third IoU threshold.
3. The method of claim 1, wherein the annotation information of the first training pupil image comprises: the annotated position information of the pupil in the first training pupil image; the method further comprising:
converting each item in the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item;
generating a training label for the first training pupil image, the training label including the converted value of each item.
4. The method of claim 1, wherein the first target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position, and the second target detection module comprises a convolutional neural network for predicting pupil position or a logistic regression model for predicting pupil position.
5. The method of claim 1, wherein processing the pupil region image with a second target detection module to obtain a second result comprises:
adjusting the size of the pupil region image to a preset size;
and processing the resized pupil region image with the second target detection module to obtain the second result.
6. A target detection apparatus, characterized in that the apparatus comprises:
a first detection unit configured to process a target pupil image with a first target detection module to obtain a first result, the first result comprising: preliminary position information of the pupil to be located in the target pupil image, wherein the first target detection module is trained in advance on first training pupil images and the annotation data of each first training pupil image;
a cropping unit configured to crop a pupil region image out of the target pupil image based on the preliminary position information of the pupil to be located;
a second detection unit configured to process the pupil region image with a second target detection module to obtain a second result, the second result comprising: the final position information of the pupil to be located and the probability that the object in the pupil region image is a pupil, wherein the second target detection module is trained in advance on training data corresponding to each second training pupil image; the training data corresponding to a second training pupil image comprises training data corresponding to each rectangular box of the pupil in that image, and the training data for a rectangular box comprises: the difference information of the rectangular box and the image block of the second training pupil image enclosed by the rectangular box, where the difference information indicates the difference between the rectangular box and the annotation box of the pupil in the second training pupil image.
7. The apparatus of claim 6, wherein a rectangular box whose IoU is greater than a first IoU threshold is used for training the regression branch of the second target detection module and serves as a positive sample for training the classification branch of the second target detection module; a rectangular box whose IoU is smaller than the first IoU threshold but larger than a second IoU threshold is used for training the regression branch; and a rectangular box whose IoU is smaller than a third IoU threshold serves as a negative sample for training the classification branch, wherein the IoU of a rectangular box is the intersection-over-union of that box and the annotation box, the first IoU threshold is larger than the second IoU threshold, and the second IoU threshold is larger than the third IoU threshold.
8. The apparatus of claim 6, wherein the annotation information of the first training pupil image comprises: the annotated position information of the pupil in the first training pupil image; the apparatus further comprising:
a label generating unit configured to convert each item of the annotated position information of the pupil in the first training pupil image to obtain a converted value of each item, and to generate a training label for the first training pupil image that includes the converted value of each item.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 5.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1 to 5.
CN202110814956.5A 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium Pending CN113688675A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110814956.5A CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium
PCT/CN2022/105895 WO2023001063A1 (en) 2021-07-19 2022-07-15 Target detection method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110814956.5A CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113688675A (en) 2021-11-23

Family

ID=78577417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110814956.5A Pending CN113688675A (en) 2021-07-19 2021-07-19 Target detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113688675A (en)
WO (1) WO2023001063A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115856B (en) * 2023-08-02 2024-04-05 珠海微度芯创科技有限责任公司 Target detection method based on image fusion, human body security inspection equipment and storage medium
CN117523650B (en) * 2024-01-04 2024-04-02 山东大学 Eyeball motion tracking method and system based on rotation target detection

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4795737B2 (en) * 2005-07-12 2011-10-19 富士フイルム株式会社 Face detection method, apparatus, and program
CN108717693A (en) * 2018-04-24 2018-10-30 浙江工业大学 A kind of optic disk localization method based on RPN
CN111598091A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN112598650A (en) * 2020-12-24 2021-04-02 苏州大学 Combined segmentation method for optic cup optic disk in fundus medical image
CN113688675A (en) * 2021-07-19 2021-11-23 北京鹰瞳科技发展股份有限公司 Target detection method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023001063A1 (en) * 2021-07-19 2023-01-26 北京鹰瞳科技发展股份有限公司 Target detection method and apparatus, electronic device, and storage medium
CN114298912A (en) * 2022-03-08 2022-04-08 北京万里红科技有限公司 Image acquisition method and device, electronic equipment and storage medium
CN114937086A (en) * 2022-07-19 2022-08-23 北京鹰瞳科技发展股份有限公司 Training method and detection method for multi-image target detection and related products
CN114937086B (en) * 2022-07-19 2022-11-01 北京鹰瞳科技发展股份有限公司 Training method and detection method for multi-image target detection and related products

Also Published As

Publication number Publication date
WO2023001063A1 (en) 2023-01-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination