CN111160242A

CN111160242A - Image target detection method, system, electronic terminal and storage medium

Info

Publication number: CN111160242A
Application number: CN201911381794.XA
Authority: CN
Inventors: 周康明; 蒋章
Original assignee: Shanghai Eye Control Technology Co Ltd
Current assignee: Shanghai Eye Control Technology Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-05-15

Abstract

The application provides an image target detection method, a system, an electronic terminal and a storage medium, wherein the image target detection method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected. The anchor point frame of the universal detection model is improved, so that the image target detection network model can predict various parameters of the target elliptical frame, the output candidate elliptical frame is more fit with the edge of the target, common targets such as people, animals, vehicles and the like, a foundation is laid for text recognition, the input of redundant background information is reduced, the recognition accuracy is improved, and the overall performance of the target detection network is improved.

Description

Image target detection method, system, electronic terminal and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method and a system for detecting an image target, an electronic terminal, and a storage medium.

Background

In a task of detecting a natural scene picture target, various objects with odd shapes, from animals to plants and other objects, are often seen in daily life, but a candidate frame used in a target detection network is a quadrangle (rectangle, trapezoid and the like), but because the candidate frame usually contains sharp corner points, the coincidence degree of the candidate frame and the target is not high, usually the corner points are not filled with the target, even no target exists (for example, a minimum horizontal circumscribed rectangle of a fish is taken, and the corner points of the rectangle do not have fish pixels), so that matching of the candidate frame is not accurate enough and is not an optimal candidate frame, and the network finally cannot obtain the best detection effect on the best premise.

Generally, the existing method is based on the detection of a common horizontal rectangular frame, and assuming that common texts are all horizontally placed and have small inclination, a common detection model is used, and finally a horizontal rectangular text detection result frame is output in an ideal state. This clearly presents two problems:

1) the text in the photo is horizontally placed, the assumption that the inclination angle is not large is too ideal, in many cases, the shooting angle or the random placement of the text object can generate a text image with an extremely irregular outline, usually, the text image also has the influence of distortion of perspective transformation and the like, the text image can be square, the noise of a model learning sample is too much, and finally, the detection model is difficult to train, and the effect is not good.

2) Because the detection is carried out according to the horizontal rectangle, the detection result is also the horizontal rectangle, and the detection result may contain a lot of unnecessary backgrounds or omit the originally highlighted text part, thereby increasing the difficulty of subsequent text recognition and finally causing errors.

Content of application

In view of the above-mentioned shortcomings of the prior art, the present application aims to provide an image object detection method, system, electronic terminal and storage medium, which are used to solve the technical problem of poor detection effect caused by the fact that the candidate frame used by the object detection network is a quadrilateral in the prior art.

To achieve the above and other related objects, a first aspect of the present application provides an image object detecting method, including: acquiring an image to be detected; inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected.

In some embodiments of the first aspect of the present application, the predicting, by the image target detection network model, a target in the image to be detected based on a fixed oval frame includes: predicting central point offset, horizontal angle, focal length gain and major axis gain according to the central point, horizontal angle, focal length and major axis of the fixed oval frame; and forming an oval external frame of the target in the image to be detected according to the predicted central point offset, the horizontal angle, the focal length gain and the major axis gain.

In some embodiments of the first aspect of the present application, the predicting, by the target detection network model, a target in the image to be detected based on a fixed oval frame, and outputting an oval outer frame of the target in the image to be detected includes: the image target detection network model predicts a target in the image to be detected based on a fixed oval frame and outputs a candidate oval frame; and inputting the candidate elliptical frame into the image target detection network model for prediction, and outputting an elliptical external frame of the target in the image to be detected.

In some embodiments of the first aspect of the present application, forming an oval bounding box of a target in the image to be detected according to a predicted center point offset, a horizontal angle, a focal length gain, and a major axis gain includes: forming the candidate elliptical frame according to the predicted central point offset, the horizontal angle, the focal length gain and the long axis gain; predicting the offset of the center point, the horizontal angle, the focal length gain and the long axis gain of the candidate elliptical frame; and forming an elliptical external frame of the target in the image to be detected according to the predicted central point offset, horizontal angle, focal length gain and major axis gain of the candidate elliptical frame.

In some embodiments of the first aspect of the present application, the training generation manner of the target detection network model includes: acquiring an oval external frame marked with a target and a training sample image of a target class; inputting the marked training sample image into an RPN network for training, and outputting a training result; inputting the training result into a Faster-RCNN network for training; and repeating the steps to carry out multiple times of iterative training on the RPN and the Faster-RCNN to obtain the target detection network model.

In some embodiments of the first aspect of the present application, in the RPN network and the fast-RCNN network: position loss function smooth_L1(x) Comprises the following steps:

the classification loss function Lcls is:

wherein, x is the offset of the central coordinate of the calibration ellipse of the training data relative to the central position of the anchor ellipse, or the gain quantity of the major axis and the minor axis of the calibration ellipse relative to the major axis and the minor axis of the anchor ellipse, or the offset of the major axis of the calibration ellipse relative to the horizontal direction and the network prediction, or the difference of the gain quantity or the prediction major axis relative to the horizontal angle; y is an input picture; and p (y) is the type of the input picture y calibration, and q (y) is the predicted value of the vector output by the network full-connection layer of the input picture y corresponding to the calibration type component.

To achieve the above and other related objects, a second aspect of the present application provides a training generation method for an image target detection network model, including: acquiring an oval external frame marked with a target and a training sample image of a target class; inputting the marked training sample image into an RPN network for training, and outputting a training result; inputting the training result into a Faster-RCNN network for training; and repeatedly carrying out multiple iterative training on the RPN and the Faster-RCNN to obtain a target detection network model.

To achieve the above and other related objects, a third aspect of the present application provides an image object detecting system comprising: the image acquisition module is used for acquiring an image to be detected; and the image detection module is used for inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected.

In some embodiments of the third aspect of the present application, the image target detection system further comprises: the training module is used for acquiring an oval external frame marked with a target and a training sample image of a target type, inputting the marked training sample image into an RPN (resilient packet network) for training, outputting a training result, and inputting the training result into a Faster-RCNN (fast-RCNN) network for training; and repeatedly carrying out repeated iterative training on the RPN and the Faster-RCNN to obtain the target detection network model.

To achieve the above and other related objects, a fourth aspect of the present application provides an electronic terminal comprising: a processor and a memory; the memory is used for storing a computer program; the processor is configured to execute the computer program stored in the memory to enable the electronic terminal to execute the image object detection method as described above.

To achieve the above and other related objects, a fifth aspect of the present application is a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image object detection method as described above.

As described above, the image target detection method, system, electronic terminal and storage medium of the present application have the following beneficial effects:

the anchor point frame of the universal detection model is improved, so that the image target detection network model can predict various parameters of the target elliptical frame, the output candidate elliptical frame is more fit with the edge of the target, common targets such as people, animals, vehicles and the like, a foundation is laid for text recognition, the input of redundant background information is reduced, the recognition accuracy is improved, and the overall detection performance of the target detection network is improved.

Drawings

Fig. 1 is a schematic overall flow chart of an image target detection method in an embodiment of the present application.

Fig. 2 is a schematic flow chart illustrating an elliptical circumscribing frame for acquiring a target in the image target detection method according to an embodiment of the present application.

Fig. 3 is a schematic diagram illustrating an elliptical outline for acquiring a target in the image target detection method according to an embodiment of the present application.

Fig. 4 is a flowchart illustrating an elliptical outline for obtaining a target by fine-tuning in an image target detection method according to an embodiment of the present application.

Fig. 5 is a schematic diagram illustrating an elliptical outline for obtaining a target by fine adjustment in the image target detection method according to an embodiment of the present application.

Fig. 6 is a flowchart illustrating a training generation method of a target detection network model according to an embodiment of the present application.

Fig. 7 is a schematic diagram illustrating a training sample image in a training generation method of a target detection network model according to an embodiment of the present application.

Fig. 8 is a schematic block diagram of an image target detection system in an embodiment of the present application.

Fig. 9 is a schematic block diagram of an image target detection system according to an embodiment of the present application.

Fig. 10 is a schematic structural diagram of an electronic terminal according to an embodiment of the present application.

Description of the element reference numerals

100 image target detection system

110 image acquisition module

120 image detection module

130 training module

1101 processor

1102 memory

S100 to S200

S210-S220

S221 to S223 steps

S310 to S330

Detailed Description

The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and/or "comprising," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence, or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and/or "as used herein are to be construed as inclusive or meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "either: a; b; c; a and B; a and C; b and C; A. b and C ". An exception to this definition will occur only when a combination of elements, functions or operations are inherently mutually exclusive in some way.

The embodiment provides an image target detection method, an image target detection system, an electronic terminal and a storage medium, which are used for solving the technical problem of poor detection effect caused by the fact that a candidate frame used by a target detection network is a quadrilateral in the prior art.

The embodiment provides a method for improving the matching effect of the candidate frame of the existing target detection network by using the elliptical anchor point frame, and takes the Faster-rcnn target detection network as an example, so as to finally obtain a better matching result, thereby improving the overall detection performance of the target detection network.

The principles and embodiments of an image target detection method, system, electronic terminal and storage medium according to the present embodiment will be described in detail below, so that those skilled in the art can understand the image target detection method, system, electronic terminal and storage medium according to the present embodiment without creative work.

Fig. 1 is a schematic flow chart showing an image target detection method according to an embodiment of the invention.

It should be noted that the image target detection method can be applied to various types of hardware devices. The hardware device is, for example, a controller, specifically, an arm (advanced RISC machines) controller, an fpga (field programmable Gate array) controller, a soc (system on chip) controller, a dsp (digital signal processing) controller, or an mcu (micro controller unit) controller, etc. The hardware devices may also be, for example, a computer that includes components such as memory, a memory controller, one or more processing units (CPUs), a peripheral interface, RF circuitry, audio circuitry, speakers, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports; the computer includes, but is not limited to, Personal computers such as desktop computers, notebook computers, tablet computers, smart phones, smart televisions, Personal Digital Assistants (PDAs), and the like. In other embodiments, the hardware device may also be a server, where the server may be arranged on one or more entity servers according to various factors such as functions and loads, or may be formed by a distributed or centralized server cluster, and this embodiment is not limited in this embodiment.

As shown in fig. 1, in the present embodiment, the image object detection method includes steps S100 to S200.

S100, acquiring an image to be detected;

and S200, inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected.

The following describes steps S100 to S200 of the image object detection method in this embodiment in detail.

Step S100, an image to be detected is obtained.

The original RPN network is a part of a fast-RCNN network, and its function is to predict candidate frames (frames in which a target object may exist) based on the input feature map, where the predicted information is the positions of n candidate frames, the meaning of the candidate frames is based on the position of a fixed anchor, (the offset dx, dy of the anchor centroid coordinate and the variation dw, dh of the length and width of the frame), such as that the fixed centroid of the anchor is (16, 16) width and height is (20, 10), if the centroid offset of the candidate frame prediction parameter is (2, -1) and the width and height offset is (1.2, 0.8), the actual positions and sizes of the candidate frames are (16+2 ═ 18, 16-1 ═ 15), (20 ═ 1.2 ═ 24, 10 ═ 0.8 ═ 8), and each of the rectangular prediction frames is a level relative to the original. The improved RPN network in this embodiment will output elliptical candidate frames.

Specifically, as shown in fig. 2, in this embodiment, the predicting, by the image target detection network model, a target in the image to be detected based on a fixed oval frame includes:

step S210, predicting central point offset, horizontal angle, focal length gain and major axis gain according to the central point, horizontal angle, focal length and major axis of the fixed oval frame.

Specifically, the predicted ellipse frame includes a center point offset (dx, dy), and a focal length gain multiple df, a (a is-90 to 90 degrees) between the major axis and the horizontal, and a major axis length gain dc.

For example, as shown in fig. 3, the fixed center point of the anchor is (16, 16), the horizontal angle, the focal length, and the long axis is (0, 10, 20), the predicted center point offset (dx, dy) is (2, -2), the horizontal angle, the focal length gain, and the long axis gain are (30, 1.2, 0.8), then the center point of the resulting oval frame is (16+2, 16-2), and the horizontal angle, the focal length, and the long axis are calculated as (30, 12, 16).

The image target detection method of the present embodiment uses an improved anchor point frame, i.e., an elliptical frame, compared to the existing method. And (3) the RPN network regresses the parameters of the ellipse of the target, wherein the parameters comprise the horizontal and vertical coordinates of the center point of the ellipse frame, the focal length of the ellipse, the long axis and the horizontal included angle.

And S220, forming an oval external frame of the target in the image to be detected according to the offset of the predicted central point, the horizontal angle, the focal length gain and the long axis gain.

After the original RPN acquires the candidate frame of any quadrilateral, the Faster-RCNN classifies and fine-tunes the candidate frame again. The original fine adjustment is equivalent to the operation similar to the original RPN network, a horizontal rectangle is still regressed, after the RPN is improved, the input of the classification and position fine adjustment network is elliptical information, and at the moment, the mode similar to the improved RPN is used. But only the relative offsets (dxri, dyri) of the top left, top right, bottom right, and bottom left corner are predicted, unlike the previous improvement of RPN, since these four points can already determine the position of the next box.

Specifically, in this embodiment, the target detection network model predicts the target in the image to be detected based on a fixed elliptical frame, and outputting an elliptical circumscribed frame of the target in the image to be detected includes:

the image target detection network model predicts a target in the image to be detected based on a fixed oval frame and outputs a candidate oval frame;

and inputting the candidate elliptical frame into the image target detection network model for prediction, and outputting an elliptical external frame of the target in the image to be detected.

Specifically, as shown in fig. 4, forming an oval bounding box of the target in the image to be detected according to the predicted central point offset, the horizontal angle, the focal length gain, and the major axis gain includes:

step S221, forming the candidate elliptical frame according to the predicted central point offset, the horizontal angle, the focal length gain and the long axis gain;

step S222, predicting the central point offset, the horizontal angle, the focal length gain and the long axis gain of the candidate elliptical frame;

and step S223, forming an oval external frame of the target in the image to be detected according to the predicted central point offset, horizontal angle, focal length gain and major axis gain of the candidate oval frame.

Continuing with the previous example, as shown in fig. 5, the fixed ellipse frame is a horizontal ellipse, the center coordinates of the ellipse candidate frame (tilted small ellipse) predicted by RPN are (18, 15), the centroid offset (dx, dy) of the horizontal angle, the focal length, the major axis is (30, 12, 16) is (0, -1), the horizontal angle, the focal length gain, and the major axis gain are (5, 1.1, 0.9), and the candidate frame after fine adjustment, here called the prediction target center coordinates are (18, 14), the horizontal angle, the focal length, and the major axis are (35, 13.1, 14.4). In fig. 5, the horizontal ellipse is a fixed anchor frame, the tilted small ellipse is a candidate ellipse frame output by the RPN, and the tilted large ellipse is an ellipse circumscribed frame output by the final fine-tuning network.

In this embodiment, as shown in fig. 6, the training generation method of the target detection network model includes:

step S310, acquiring the oval external frame marked with the target and the training sample image of the target category.

As shown in fig. 7, the oval frame is a schematic diagram of the present embodiment using the oval anchor frame, and the relative position on the target picture is marked, that is, the following information of the minimum circumscribed oval of each target: the center point of each ellipse is identified by its x y coordinates, the angle a between the major axis of each ellipse and the horizontal, the focal length f, and the length c of the major axis, thus defining an ellipse in the plane. In addition, the category of the frame needs to be marked, and people, dogs, sheep and people are respectively arranged in the lower graph from left to right.

Step S320, inputting the marked training sample image into an RPN network for training, and outputting a training result;

step S330, inputting the training result into a Faster-RCNN network for training;

and repeating the steps to carry out multiple times of iterative training on the RPN and the Faster-RCNN to obtain the target detection network model.

In this embodiment, in the RPN network and the fast-RCNN network:

position loss function smooth_L1(x) Comprises the following steps:

the classification loss function Lcls is:

That is, in this embodiment, the coordinates of the center point of the circumscribed ellipse of the target, the included angle between the major axis and the horizontal line, the focal length, and the length of the major axis need to be marked, the RPN network is trained first, the classification and fine-tuning parameter network is not moved later, and then the whole fast-RCNN network is trained.

In this embodiment, an image is input to the target detection network model, and forward transfer is performed to directly obtain an elliptical circumscribed frame and a category of a predicted target.

Therefore, in this embodiment, the anchor point frame of the RPN network of the universal detection model is improved, and by taking the typical fast-RCNN target detection as an example, the RPN network of the RPN network is improved, so that the RPN network can predict various parameters of the target elliptical frame, and therefore, the candidate elliptical frame output by the RPN can be more attached to the edge of the target, a good foundation is laid for text recognition, the input of redundant background information is reduced, and the recognition accuracy is improved.

As shown in fig. 8, the present embodiment further provides an image target detection system, which includes: the device comprises an image acquisition module and an image detection module.

The image acquisition module is used for acquiring an image to be detected; the image detection module is used for inputting the image to be detected into a pre-trained image target detection network model, the image target detection network model predicts the target in the image to be detected based on a fixed oval frame, and outputs an oval outer frame of the target in the image to be detected.

As shown in fig. 9, in the present embodiment, the image target detection system further includes: the training module is used for acquiring an oval external frame marked with a target and a training sample image of a target type, inputting the marked training sample image into an RPN (resilient packet network) for training, outputting a training result, and inputting the training result into a Faster-RCNN (fast-RCNN) network for training; and repeatedly carrying out repeated iterative training on the RPN and the Faster-RCNN to obtain the target detection network model.

The technical features of the specific implementation of the image target detection system of this embodiment are substantially the same as those of the image target detection method in the foregoing embodiments, and the general technical contents between the embodiments are not repeated.

It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the X module may be a processing element separately set up, or may be implemented by being integrated in a chip of an electronic terminal, or may be stored in a memory of the terminal in the form of program code, and the function of the tracking calculation module is called and executed by a processing element of the terminal. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as: one or more Application Specific Integrated Circuits (ASICs), or one or more microprocessors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of a Processing element scheduler code, the Processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or other processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

As shown in fig. 10, a schematic structural diagram of an electronic terminal in an embodiment of the present application is shown, where the electronic terminal includes a processor 1101 and a memory 1102; the memory 1102 is connected to the processor 1101 through a system bus to complete communication between the processor 1102 and the memory 1101, the memory 1102 is used for storing computer programs, and the processor 1101 is used for operating the computer programs, so that the electronic terminal executes the image object detection method. The image target detection method has already been described in detail above, and is not described herein again.

It should be noted that the above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access system and other devices (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.

The Processor 1101 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.

Furthermore, the present embodiment also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the image object detection method. The image target detection method has already been described in detail above, and is not described herein again.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

In summary, the anchor point frame of the universal detection model is improved, so that the image target detection network model can predict various parameters of the target elliptical frame, the output candidate elliptical frame is more fit with the edge of the target, and common targets such as people, animals, vehicles and the like play a good foundation for text recognition, the input of redundant background information is reduced, the recognition accuracy is improved, and the overall performance of the target detection network is further improved. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims

1. An image target detection method, characterized by: the image target detection method comprises the following steps:

acquiring an image to be detected;

inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected.

2. The image object detection method according to claim 1, characterized in that: the image target detection network model predicts the target in the image to be detected based on the fixed oval frame, and comprises the following steps:

predicting central point offset, horizontal angle, focal length gain and major axis gain according to the central point, horizontal angle, focal length and major axis of the fixed oval frame;

and forming an oval external frame of the target in the image to be detected according to the predicted central point offset, the horizontal angle, the focal length gain and the major axis gain.

3. The image object detection method according to claim 2, characterized in that: the target detection network model predicts the target in the image to be detected based on a fixed oval frame, and outputs an oval external frame of the target in the image to be detected, wherein the oval external frame comprises the following steps:

4. The image object detection method according to claim 3, characterized in that: forming an oval outer frame of the target in the image to be detected according to the offset of the predicted central point, the horizontal angle, the focal length gain and the major axis gain, and comprising the following steps:

forming the candidate elliptical frame according to the predicted central point offset, the horizontal angle, the focal length gain and the long axis gain;

predicting the offset of the center point, the horizontal angle, the focal length gain and the long axis gain of the candidate elliptical frame;

and forming an elliptical external frame of the target in the image to be detected according to the predicted central point offset, horizontal angle, focal length gain and major axis gain of the candidate elliptical frame.

5. The image object detection method according to claim 1, characterized in that: the training generation mode of the target detection network model comprises the following steps:

acquiring an oval external frame marked with a target and a training sample image of a target class;

inputting the marked training sample image into an RPN network for training, and outputting a training result;

inputting the training result into a Faster-RCNN network for training;

6. The image object detection method according to claim 5, characterized in that: in the RPN network and the Faster-RCNN network:

position loss function smooth_L1(x) Comprises the following steps:

the classification loss function Lcls is:

7. A training generation method of an image target detection network model is characterized by comprising the following steps: the training generation method of the image target detection network model comprises the following steps:

inputting the training result into a Faster-RCNN network for training;

and repeatedly carrying out multiple iterative training on the RPN and the Faster-RCNN to obtain a target detection network model.

8. An image object detection system characterized by: the image target detection system includes:

the image acquisition module is used for acquiring an image to be detected;

and the image detection module is used for inputting the image to be detected into a pre-trained image target detection network model, predicting the target in the image to be detected based on a fixed oval frame by the image target detection network model, and outputting the oval outer frame of the target in the image to be detected.

9. The image object detection system of claim 8, wherein: the image target detection system further comprises:

the training module is used for acquiring an oval external frame marked with a target and a training sample image of a target type, inputting the marked training sample image into an RPN (resilient packet network) for training, outputting a training result, and inputting the training result into a Faster-RCNN (fast-RCNN) network for training; and repeatedly carrying out repeated iterative training on the RPN and the Faster-RCNN to obtain the target detection network model.

10. An electronic terminal, characterized by: the method comprises the following steps: a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to execute the computer program stored in the memory to cause the electronic terminal to execute the image object detection method according to any one of claims 1 to 6.

11. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the image object detection method of any one of claims 1 to 6.