CN109784131B - Object detection method, device, storage medium and processor - Google Patents

Object detection method, device, storage medium and processor

Info

Publication number
CN109784131B
CN109784131B
Authority
CN
China
Prior art keywords
network
suggestion
target
detection
regional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711133580.1A
Other languages
Chinese (zh)
Other versions
CN109784131A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kuang Chi Innovative Technology Ltd
Shenzhen Kuang Chi Hezhong Technology Ltd
Original Assignee
Kuang Chi Innovative Technology Ltd
Shenzhen Kuang Chi Hezhong Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kuang Chi Innovative Technology Ltd, Shenzhen Kuang Chi Hezhong Technology Ltd filed Critical Kuang Chi Innovative Technology Ltd
Priority to CN201711133580.1A priority Critical patent/CN109784131B/en
Priority to PCT/CN2018/079852 priority patent/WO2019095596A1/en
Publication of CN109784131A publication Critical patent/CN109784131A/en
Application granted granted Critical
Publication of CN109784131B publication Critical patent/CN109784131B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an object detection method, an object detection device, a storage medium and a processor. The method comprises the following steps: generating a suggestion box through a region suggestion network, wherein the region suggestion network is used to identify the region where an object in a picture is located, and the suggestion box is used to display that region; acquiring a detection network according to the suggestion box, wherein the detection network is used to detect the object in the picture, and no convolution layer is yet shared between the region suggestion network and the detection network; fine-tuning the detection network and the region suggestion network according to the suggestion box to obtain a target detection network and a target region suggestion network with the same shared convolution layers; and detecting a target object in a target picture according to the target detection network and the target region suggestion network. The application solves the technical problem of low object detection accuracy.

Description

Object detection method, device, storage medium and processor
Technical Field
The present application relates to the field of computers, and in particular, to an object detection method, an object detection device, a storage medium, and a processor.
Background
With the rapid development of computer technology, computers can perform more and more tasks, such as object detection. In existing object detection technology, detection methods can recognize only a limited range of objects and have low detection accuracy.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides an object detection method, an object detection device, a storage medium and a processor, which are used for at least solving the technical problem of low object detection accuracy.
According to an aspect of an embodiment of the present application, there is provided an object detection method including: generating a suggestion box through a region suggestion network, wherein the region suggestion network is used to identify the region where an object in a picture is located, and the suggestion box is used to display that region; acquiring a detection network according to the suggestion box, wherein the detection network is used to detect the object in the picture, and no convolution layer is yet shared between the region suggestion network and the detection network; fine-tuning the detection network and the region suggestion network according to the suggestion box to obtain a target detection network and a target region suggestion network with the same shared convolution layers; and detecting a target object in a target picture according to the target detection network and the target region suggestion network.
Optionally, fine-tuning the detection network and the region suggestion network according to the suggestion box to obtain the target detection network and the target region suggestion network with the same shared convolution layers includes: keeping the suggestion box fixed, and alternately fine-tuning the region suggestion network and the detection network to obtain the target detection network and the target region suggestion network with the same shared convolution layers.
Optionally, keeping the suggestion box fixed and alternately fine-tuning the region suggestion network and the detection network includes: keeping the suggestion box fixed, and fine-tuning the convolution layers unique to the region suggestion network until the detection network and the region suggestion network share convolution layers, so as to obtain the target region suggestion network; and keeping the shared convolution layers fixed, and fine-tuning the FC layers of the detection network so that the detection network and the region suggestion network have the same shared convolution layers, so as to obtain the target detection network.
Optionally, the regional suggestion network is an RPN network, and the detection network is a Fast R-CNN network.
Optionally, before generating the suggestion box through the regional suggestion network, the method further comprises: and carrying out initialization training through an ImageNet pre-training model to obtain the RPN.
Optionally, acquiring the detection network according to the suggestion box includes: training an ImageNet pre-trained model with Fast R-CNN according to the suggestion box to obtain the Fast R-CNN network, where no convolution layer is yet shared between the Fast R-CNN network and the RPN network.
Optionally, detecting the target object in the target picture according to the target detection network and the target area suggestion network includes: acquiring the target picture; and inputting the target picture into the target area suggestion network and the target detection network to obtain the target object.
Optionally, inputting the target picture into the target area suggestion network and the target detection network, and obtaining the target object includes: inputting the target picture into the target area suggestion network to obtain a picture carrying a target suggestion frame; and inputting the picture carrying the target suggestion frame into the target detection network to obtain the target object detected by the target detection network from the target suggestion frame.
According to another aspect of the embodiment of the present application, there is also provided an object detection apparatus including: a generation module, configured to generate a suggestion box through a region suggestion network, wherein the region suggestion network is used to identify the region where an object in a picture is located, and the suggestion box is used to display that region; an acquisition module, configured to acquire a detection network according to the suggestion box, wherein the detection network is used to detect the object in the picture, and no convolution layer is yet shared between the region suggestion network and the detection network; a fine-tuning module, configured to fine-tune the detection network and the region suggestion network according to the suggestion box to obtain a target detection network and a target region suggestion network with the same shared convolution layers; and a detection module, configured to detect a target object in a target picture according to the target detection network and the target region suggestion network.
Optionally, the fine-tuning module includes: a fine-tuning unit, configured to keep the suggestion box fixed and alternately fine-tune the region suggestion network and the detection network to obtain the target detection network and the target region suggestion network with the same shared convolution layers.
Optionally, the fine-tuning unit includes: a first fine-tuning subunit, configured to keep the suggestion box fixed and fine-tune the convolution layers unique to the region suggestion network until the detection network and the region suggestion network share convolution layers, so as to obtain the target region suggestion network; and a second fine-tuning subunit, configured to keep the shared convolution layers fixed and fine-tune the FC layers of the detection network so that the detection network and the region suggestion network have the same shared convolution layers, so as to obtain the target detection network.
Optionally, the regional suggestion network is an RPN network, and the detection network is a Fast R-CNN network.
Optionally, the apparatus further comprises: and the initialization training module is used for performing initialization training through an ImageNet pre-training model to obtain the RPN.
Optionally, the acquisition module is configured to: train an ImageNet pre-trained model with Fast R-CNN according to the suggestion box to obtain the Fast R-CNN network, where no convolution layer is yet shared between the Fast R-CNN network and the RPN network.
Optionally, the detection module includes: an acquisition unit, configured to acquire the target picture; and the input unit is used for inputting the target picture into the target area suggestion network and the target detection network to obtain the target object.
Optionally, the input unit includes: the first input subunit is used for inputting the target picture into the target area suggestion network to obtain a picture carrying a target suggestion frame; and the second input subunit is used for inputting the picture carrying the target suggestion frame into the target detection network to obtain the target object detected by the target detection network from the target suggestion frame.
According to another aspect of the embodiment of the present application, there is further provided a storage medium, where the storage medium includes a stored program, where the program, when executed, controls a device in which the storage medium is located to perform any one of the methods described above.
According to another aspect of the embodiment of the present application, there is further provided a processor, where the processor is configured to execute a program, where the program executes the method according to any one of the preceding claims.
In the embodiment of the application, a suggestion box is generated through a region suggestion network, wherein the region suggestion network is used to identify the region where an object in a picture is located, and the suggestion box is used to display that region; a detection network is acquired according to the suggestion box, wherein the detection network is used to detect the object in the picture, and no convolution layer is yet shared between the region suggestion network and the detection network; the detection network and the region suggestion network are fine-tuned according to the suggestion box to obtain a target detection network and a target region suggestion network with the same shared convolution layers; and the target object in the target picture is detected according to the target detection network and the target region suggestion network. Because the suggestion box is generated by the region suggestion network, the detection network is acquired, and both networks are fine-tuned according to the suggestion box, the target detection network and the target region suggestion network come to share the same convolution layers and can be unified. This improves the accuracy of object detection and thereby solves the technical problem of low object detection accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an object detection method according to an embodiment of the present application;
FIG. 2 is a block diagram of an object detection apparatus according to an embodiment of the present application;
FIG. 3 is a block diagram II of an object detection device according to an embodiment of the present application;
fig. 4 is a block diagram III of an object detection apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of an object detection apparatus according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present application, a method embodiment of object detection is provided. It should be noted that the steps shown in the flowchart of the figures may be performed in a computer system as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order other than that shown or described herein.
Fig. 1 is a schematic diagram of an object detection method according to an embodiment of the present application, as shown in fig. 1, the method includes the steps of:
step S102, generating a suggestion frame through a regional suggestion network, wherein the regional suggestion network is used for identifying the region where the object in the picture is located, and the suggestion frame is used for displaying the region where the object is located;
step S104, acquiring a detection network according to a suggestion frame, wherein the detection network is used for detecting an object from a picture, and a convolution layer which is not shared between the regional suggestion network and the detection network is formed;
step S106, fine-tuning the detection network and the regional suggestion network according to the suggestion frame to obtain a target detection network and a target regional suggestion network with the same shared convolution layer;
step S108, detecting the target object in the target picture according to the target detection network and the target area suggestion network.
Optionally, in this embodiment, the above object detection method may be applied to, but is not limited to, object detection scenarios. For example, such objects may include, but are not limited to: tableware, fruit, stationery, kitchen ware, clothing, ornaments, and the like.
Alternatively, in the present embodiment, the above-described object detection method may be applied to, but is not limited to, a terminal device. The terminal device may include, but is not limited to: a cell phone, tablet computer, PC computer, smart wearable device, smart home device, etc.
Optionally, in this embodiment, the above-mentioned region suggestion network is used to identify a region where an object in the picture is located. For example: the picture is input into a regional suggestion network, the regional suggestion network identifies an object in the picture, the region where the object is located is given, and the region can be displayed by using a suggestion frame.
Alternatively, in the present embodiment, the suggestion box may be, but is not limited to, a rectangular suggestion box. It should be noted that the shape of the suggestion box may be, but not limited to, any shape, such as: rectangular, circular, oval, diamond, etc., the present embodiment is not limited.
Optionally, in this embodiment, the detection network is used to detect an object from a picture and may, but is not limited to, use the Fast R-CNN detection model for detection.
Through the above steps, a suggestion box is generated through the region suggestion network, the detection network is acquired, and the region suggestion network and the detection network are fine-tuned according to the suggestion box, so that the target detection network and the target region suggestion network share the same convolution layers. The two networks can thus be unified, which improves the accuracy of object detection and solves the technical problem of low object detection accuracy.
Optionally, to unify the region suggestion network and the detection network, the two networks may be trained with the suggestion box kept fixed, alternating between fine-tuning the region suggestion network and fine-tuning the detection network. For example: in the above step S106, the suggestion box may be kept fixed, and the region suggestion network and the detection network may be alternately fine-tuned to obtain the target detection network and the target region suggestion network with the same shared convolution layers.
Optionally, the region suggestion network and the detection network may be fine-tuned as follows: keeping the suggestion box fixed, the convolution layers unique to the region suggestion network are fine-tuned until the detection network and the region suggestion network share convolution layers, so as to obtain the target region suggestion network; and keeping the shared convolution layers fixed, the FC layers of the detection network are fine-tuned so that the detection network and the region suggestion network have the same shared convolution layers, so as to obtain the target detection network.
Alternatively, the obtained target picture may be used as input of the target area suggestion network and the target detection network to detect the target object in the target picture. For example: in the step S108, a target picture may be acquired, and then the target picture is input into the target area suggestion network and the target detection network to obtain the target object.
Optionally, the obtained target picture may first be used as input to the target region suggestion network to generate a target suggestion box on the target picture, and the picture carrying the target suggestion box may then be used as input to the target detection network to obtain the detected target object. For example: the target picture is input into the target region suggestion network to obtain the target suggestion box it outputs, which yields a picture carrying the target suggestion box; this picture is then input into the target detection network to obtain the target object detected from the target suggestion box.
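As a concrete illustration of this two-stage inference flow, the following is a minimal, self-contained sketch. The two functions are toy stand-ins for the target region suggestion network and the target detection network (a real implementation would use the trained RPN and Fast R-CNN); their names, the blob-based suggestion logic, and the intensity-based classification are illustrative assumptions only.

```python
import numpy as np

def target_region_suggestion_network(picture):
    """Toy stand-in for the target region suggestion network: returns one
    suggestion box (x1, y1, x2, y2) bounding the bright pixels in the picture."""
    ys, xs = np.nonzero(picture > 0)
    return [(xs.min(), ys.min(), xs.max(), ys.max())] if len(xs) else []

def target_detection_network(picture, boxes):
    """Toy stand-in for the target detection network: classifies the contents
    of each suggestion box (here simply by mean intensity)."""
    labels = []
    for x1, y1, x2, y2 in boxes:
        patch = picture[y1:y2 + 1, x1:x2 + 1]
        labels.append("object" if patch.mean() > 0.5 else "background")
    return labels

# Two-stage flow: picture -> suggestion boxes -> detected objects.
picture = np.zeros((8, 8))
picture[2:5, 3:6] = 1.0  # one bright "object"
boxes = target_region_suggestion_network(picture)
objects = target_detection_network(picture, boxes)
```

The structure, not the toy logic, is the point: the first network only proposes where to look, and the second network classifies what is inside each proposed box.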
Alternatively, the regional suggestion network may be, but is not limited to being, a regional generation network (RPN network), and the detection network may be, but is not limited to being, a Fast regional convolutional neural network (Fast R-CNN network).
Alternatively, the regional suggestion network may be trained first, but not limited to, prior to generating the suggestion box. For example: before the step S102, an initialization training may be performed through an ImageNet pre-training model to obtain a regional suggestion network.
Optionally, in the above step S104, Fast R-CNN may train the ImageNet pre-trained model according to the suggestion box to obtain a detection network that does not yet share convolution layers with the region suggestion network.
In an alternative embodiment, the region suggestion network (RPN) may take an image of arbitrary size as input and output a set of rectangular suggestion boxes, each with an objectness score. The detection network may use a Fast R-CNN model, which detects targets from the high-quality region suggestion boxes generated by the RPN.
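To make the RPN's output concrete: at every position of the convolutional feature map, a fixed set of rectangular boxes of several scales and aspect ratios is proposed, and each is then given an objectness score by a small network head. The sketch below only generates the candidate rectangles; the stride, scales, and ratios are typical values assumed for illustration, not values fixed by this application.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate one set of candidate boxes (x1, y1, x2, y2) per feature-map cell.

    At every position of the shared convolutional feature map, rectangles of
    several scales and aspect ratios are proposed, centred on that position
    mapped back into input-image coordinates.
    """
    anchors = []
    for cy in range(feat_h):
        for cx in range(feat_w):
            # Centre of this cell in input-image coordinates.
            x, y = cx * stride + stride / 2, cy * stride + stride / 2
            for s in scales:
                for r in ratios:
                    # Width/height chosen so the box area stays ~s*s at ratio r.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([x - w / 2, y - h / 2, x + w / 2, y + h / 2])
    return np.array(anchors)

# A 2x3 feature map with 3 scales * 3 ratios gives 2*3*9 = 54 candidates.
anchors = generate_anchors(2, 3)
```

In the full RPN each of these candidates would receive an objectness score, and the highest-scoring boxes become the suggestion boxes passed to the detection network.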
In order to unify the object detection of the RPN and Fast R-CNN, this alternative embodiment proposes a training scheme for the two networks: keeping the suggestion boxes fixed while alternating between fine-tuning the region suggestion network and fine-tuning the detection network. A 4-step training algorithm may be employed to learn the shared features through alternating optimization. The 4-step training algorithm comprises the following steps:
in the first step, the RPN is trained, the network is initialized with an ImageNet pre-trained model, and fine-tuned end-to-end for regional advice tasks.
In the second step, a separate detection network is trained with Fast R-CNN using the suggestion boxes generated by the RPN of the first step; this detection network may also be initialized with the ImageNet pre-trained model. At this point the two networks do not yet share convolution layers.
In the third step, RPN training is initialized with the detection network, the shared convolution layers are fixed, and only the layers unique to the RPN are fine-tuned. The two networks now share convolution layers.
In the fourth step, the shared convolution layers are kept fixed and the FC layers of Fast R-CNN are fine-tuned. The two networks thus share the same convolution layers and form a unified network.
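The four steps above can be sketched schematically. In this toy sketch, convolution layers and network heads are just lists of arrays and `tune` stands in for gradient-based fine-tuning; all names and shapes are illustrative assumptions, not the patent's implementation. The point is the sharing structure: after the third step the RPN runs on the detector's convolution layers, and the third and fourth steps never modify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrained_convs():
    """Stand-in for ImageNet-pretrained convolutional weights."""
    return [rng.standard_normal((3, 3)) for _ in range(2)]

def tune(weights):
    """Stand-in for gradient updates during fine-tuning."""
    return [w + 0.01 for w in weights]

# Step 1: initialize the RPN from the pre-trained model and fine-tune it
# end-to-end for the region suggestion task.
rpn_convs = tune(pretrained_convs())
rpn_head = tune([rng.standard_normal(3)])

# Step 2: train a separate Fast R-CNN detection network on the suggestion
# boxes from step 1, also initialized from the pre-trained model. At this
# point the two networks do NOT share convolution layers.
det_convs = tune(pretrained_convs())
det_fc = tune([rng.standard_normal(3)])

# Step 3: re-initialize RPN training from the detection network (the RPN's
# own convolutions from step 1 are discarded), fix the shared convolution
# layers, and fine-tune only the RPN-specific layers.
shared_convs = det_convs          # now shared by both networks
rpn_head = tune(rpn_head)         # shared_convs stay untouched

# Step 4: keep the shared convolution layers fixed and fine-tune only the
# fully connected layers of Fast R-CNN. The two networks now form a
# unified network over the same convolution layers.
det_fc = tune(det_fc)             # shared_convs again stay untouched
```

After step 4, a single forward pass through `shared_convs` can serve both the suggestion head and the detection head, which is exactly the unification the scheme is after.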
According to another embodiment of the present application, there is provided an embodiment of an apparatus for object detection, fig. 2 is a block diagram of an apparatus for object detection according to an embodiment of the present application, as shown in fig. 2, including:
the generating module 22 is configured to generate a suggestion box through a region suggestion network, where the region suggestion network is configured to identify a region where an object in the picture is located, and the suggestion box is configured to display the region where the object is located;
an acquisition module 24, coupled to the generation module 22, for acquiring a detection network according to the suggestion frame, wherein the detection network is used for detecting the object from the picture, and there is no shared convolution layer between the regional suggestion network and the detection network;
a trimming module 26, coupled to the obtaining module 24, for trimming the detection network and the area suggestion network according to the suggestion box, to obtain a target detection network and a target area suggestion network having the same shared convolution layer;
a detection module 28, coupled to the fine tuning module 26, is configured to detect the target object in the target picture according to the target detection network and the target area suggestion network.
Optionally, in this embodiment, the above object detection apparatus may be applied to, but is not limited to, object detection scenarios. For example, such objects may include, but are not limited to: tableware, fruit, stationery, kitchen ware, clothing, ornaments, and the like.
Alternatively, in the present embodiment, the above-described object detection apparatus may be applied to, but is not limited to, a terminal device. The terminal device may include, but is not limited to: a cell phone, tablet computer, PC computer, smart wearable device, smart home device, etc.
Optionally, in this embodiment, the above-mentioned region suggestion network is used to identify a region where an object in the picture is located. For example: the picture is input into a regional suggestion network, the regional suggestion network identifies an object in the picture, the region where the object is located is given, and the region can be displayed by using a suggestion frame.
Alternatively, in the present embodiment, the suggestion box may be, but is not limited to, a rectangular suggestion box. It should be noted that the shape of the suggestion box may be, but not limited to, any shape, such as: rectangular, circular, oval, diamond, etc., the present embodiment is not limited.
Optionally, in this embodiment, the detection network is used to detect an object from a picture, and may, but is not limited to, use the Fast RCNN detection model to detect an object.
Through the above apparatus, a suggestion box is generated through the region suggestion network, the detection network is acquired, and the region suggestion network and the detection network are fine-tuned according to the suggestion box, so that the target detection network and the target region suggestion network share the same convolution layers. The two networks can thus be unified, which improves the accuracy of object detection and solves the technical problem of low object detection accuracy.
Fig. 3 is a block diagram two of an object detection apparatus according to an embodiment of the present application, as shown in fig. 3, optionally, the fine adjustment module 26 includes:
and a fine-tuning unit 32, configured to keep the suggestion box fixed and alternately fine-tune the region suggestion network and the detection network, so as to obtain a target detection network and a target region suggestion network with the same shared convolution layers.
Optionally, to unify the region suggestion network and the detection network, the two networks may be trained with the suggestion box kept fixed, alternating between fine-tuning the region suggestion network and fine-tuning the detection network.
Fig. 4 is a block diagram III of an object detection apparatus according to an embodiment of the present application, and as shown in fig. 4, optionally, the fine adjustment unit 32 includes:
a first fine-tuning subunit 42, configured to keep the suggestion box fixed and fine-tune the convolution layers unique to the region suggestion network until the detection network and the region suggestion network share convolution layers, so as to obtain the target region suggestion network;
a second fine-tuning subunit 44, configured to keep the shared convolution layers fixed and fine-tune the FC layers of the detection network so that the detection network and the region suggestion network have the same shared convolution layers, so as to obtain the target detection network.
Fig. 5 is a block diagram of an object detection apparatus according to an embodiment of the present application, and as shown in fig. 5, optionally, the detection module 28 includes:
an acquisition unit 52 for acquiring a target picture;
an input unit 54, coupled to the obtaining unit 52, is configured to input the target picture into the target area suggestion network and the target detection network to obtain the target object.
Optionally, the input unit 54 includes: the first input subunit is used for inputting the target picture into the target area suggestion network to obtain a picture carrying a target suggestion frame; and the second input subunit is used for inputting the picture carrying the target suggestion frame into the target detection network to obtain the target object detected by the target detection network from the target suggestion frame.
Alternatively, the regional advice network may be, but is not limited to, an RPN network and the detection network may be, but is not limited to, a Fast R-CNN network.
Optionally, the apparatus further includes: and the training module is used for carrying out initialization training through the ImageNet pre-training model to obtain the regional suggestion network.
Optionally, the acquisition module 24 is further configured to: train the ImageNet pre-trained model with Fast R-CNN according to the suggestion box to obtain a detection network that does not yet share convolution layers with the region suggestion network.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, each embodiment has its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present application. It should be noted that several modifications and refinements may be made by those skilled in the art without departing from the principles of the present application, and such modifications and refinements shall also fall within the protection scope of the present application.

Claims (14)

1. An object detection method, comprising:
generating a suggestion frame through a regional suggestion network, wherein the regional suggestion network is used for identifying a region where an object in a picture is located, and the suggestion frame is used for displaying the region where the object is located;
acquiring a detection network according to the suggestion frame, wherein the detection network is used for detecting the object from the picture, and the regional suggestion network and the detection network do not share a convolution layer;
fine-tuning the detection network and the regional suggestion network according to the suggestion frame to obtain a target detection network and a target regional suggestion network with the same shared convolution layer;
detecting a target object in a target picture according to the target detection network and the target area suggestion network;
wherein fine-tuning the detection network and the regional suggestion network according to the suggestion frame to obtain the target detection network and the target regional suggestion network with the same shared convolution layer comprises: keeping the suggestion frame fixed, and interactively fine-tuning the regional suggestion network and the detection network to obtain the target detection network and the target regional suggestion network with the same shared convolution layer;
wherein keeping the suggestion frame fixed and interactively fine-tuning the regional suggestion network and the detection network comprises: keeping the suggestion frame fixed, and fine-tuning the convolution layer unique to the regional suggestion network until the detection network and the regional suggestion network have a shared convolution layer, so as to obtain the target regional suggestion network; and keeping the shared convolution layer fixed, and fine-tuning the FC layer of the detection network until the detection network and the regional suggestion network have the same shared convolution layer, so as to obtain the target detection network.
2. The method of claim 1, wherein the regional suggestion network is an RPN network and the detection network is a Fast R-CNN network.
3. The method of claim 2, wherein prior to generating the suggestion box over the regional suggestion network, the method further comprises:
and carrying out initialization training through an ImageNet pre-training model to obtain the RPN.
4. The method of claim 2, wherein obtaining the detection network according to the suggestion box comprises:
training an ImageNet pre-training model through Fast R-CNN according to the suggestion frame to obtain the Fast R-CNN network, wherein the Fast R-CNN network and the RPN network do not share a convolution layer.
5. The method according to any one of claims 1 to 4, wherein detecting the target object in the target picture according to the target detection network and the target area suggestion network comprises:
acquiring the target picture;
and inputting the target picture into the target area suggestion network and the target detection network to obtain the target object.
6. The method of claim 5, wherein inputting the target picture into the target region suggestion network and the target detection network to obtain the target object comprises:
inputting the target picture into the target area suggestion network to obtain a picture carrying a target suggestion frame;
and inputting the picture carrying the target suggestion frame into the target detection network to obtain the target object detected by the target detection network from the target suggestion frame.
7. An object detection apparatus, comprising:
the generation module is used for generating a suggestion frame through a regional suggestion network, wherein the regional suggestion network is used for identifying the region where the object in the picture is located, and the suggestion frame is used for displaying the region where the object is located;
the acquisition module is used for acquiring a detection network according to the suggestion frame, wherein the detection network is used for detecting the object from the picture, and the regional suggestion network and the detection network do not share a convolution layer;
the fine tuning module is used for fine tuning the detection network and the regional suggestion network according to the suggestion frame to obtain a target detection network and a target regional suggestion network with the same shared convolution layer;
the detection module is used for detecting a target object in a target picture according to the target detection network and the target area suggestion network;
wherein the fine tuning module comprises: a fine tuning unit for keeping the suggestion frame fixed, interactively fine tuning the area suggestion network and the detection network to obtain the target detection network and the target area suggestion network with the same shared convolution layer,
wherein the fine tuning unit includes: a first fine-tuning subunit, configured to keep the suggestion frame fixed and fine-tune the convolution layer unique to the regional suggestion network until the detection network and the regional suggestion network have a shared convolution layer, so as to obtain the target regional suggestion network; and a second fine-tuning subunit, configured to keep the shared convolution layer fixed and fine-tune the FC layer of the detection network until the detection network and the regional suggestion network have the same shared convolution layer, so as to obtain the target detection network.
8. The apparatus of claim 7, wherein the regional suggestion network is an RPN network and the detection network is a Fast R-CNN network.
9. The apparatus of claim 8, wherein the apparatus further comprises:
and the initialization training module is used for performing initialization training through an ImageNet pre-training model to obtain the RPN.
10. The apparatus of claim 8, wherein the acquisition module is configured to: train an ImageNet pre-training model through Fast R-CNN according to the suggestion frame to obtain the Fast R-CNN network, wherein the Fast R-CNN network and the RPN network do not share a convolution layer.
11. The apparatus according to any one of claims 7 to 10, wherein the detection module comprises:
an acquisition unit, configured to acquire the target picture;
and the input unit is used for inputting the target picture into the target area suggestion network and the target detection network to obtain the target object.
12. The apparatus of claim 11, wherein the input unit comprises:
the first input subunit is used for inputting the target picture into the target area suggestion network to obtain a picture carrying a target suggestion frame;
and the second input subunit is used for inputting the picture carrying the target suggestion frame into the target detection network to obtain the target object detected by the target detection network from the target suggestion frame.
13. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the method of any one of claims 1 to 6.
14. A processor for running a program, wherein the program when run performs the method of any one of claims 1 to 6.
CN201711133580.1A 2017-11-15 2017-11-15 Object detection method, device, storage medium and processor Active CN109784131B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711133580.1A CN109784131B (en) 2017-11-15 2017-11-15 Object detection method, device, storage medium and processor
PCT/CN2018/079852 WO2019095596A1 (en) 2017-11-15 2018-03-21 Object detection method, device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711133580.1A CN109784131B (en) 2017-11-15 2017-11-15 Object detection method, device, storage medium and processor

Publications (2)

Publication Number Publication Date
CN109784131A CN109784131A (en) 2019-05-21
CN109784131B true CN109784131B (en) 2023-08-22

Family

ID=66495370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711133580.1A Active CN109784131B (en) 2017-11-15 2017-11-15 Object detection method, device, storage medium and processor

Country Status (2)

Country Link
CN (1) CN109784131B (en)
WO (1) WO2019095596A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340092B (en) * 2020-02-21 2023-09-22 浙江大华技术股份有限公司 Target association processing method and device
US20230237788A1 (en) * 2020-04-15 2023-07-27 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Method for training shallow convolutional neural networks for infrared target detection using a two-phase learning strategy

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106372577A (en) * 2016-08-23 2017-02-01 北京航空航天大学 Deep learning-based traffic sign automatic identifying and marking method
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN106910188A (en) * 2017-02-16 2017-06-30 苏州中科天启遥感科技有限公司 The detection method of airfield runway in remote sensing image based on deep learning
CN107194318A (en) * 2017-04-24 2017-09-22 北京航空航天大学 The scene recognition method of target detection auxiliary

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images


Non-Patent Citations (1)

Title
Pedestrian Detection Based on Region Proposal Network; Wang Qinfang; Ying Na; Communications Technology (03); full text *

Also Published As

Publication number Publication date
CN109784131A (en) 2019-05-21
WO2019095596A1 (en) 2019-05-23

Similar Documents

Publication Publication Date Title
EP2950551B1 (en) Method for recommending multimedia resource and apparatus thereof
CN106055710A (en) Video-based commodity recommendation method and device
CN105574910A (en) Electronic Device and Method for Providing Filter in Electronic Device
CN108037823B (en) Information recommendation method, Intelligent mirror and computer readable storage medium
CN105938557A (en) Image recognition method and image recognition device
CN106202316A (en) Merchandise news acquisition methods based on video and device
CN106575450A (en) Augmented reality content rendering via albedo models, systems and methods
CN108986197B (en) 3D skeleton line construction method and device
US20140279341A1 (en) Method and system to utilize an intra-body area network
CN109993824B (en) Image processing method, intelligent terminal and device with storage function
CN109492607B (en) Information pushing method, information pushing device and terminal equipment
CN106202304A (en) Method of Commodity Recommendation based on video and device
CN109784131B (en) Object detection method, device, storage medium and processor
CN110858279A (en) Food material identification method and device
JP6941800B2 (en) Emotion estimation device, emotion estimation method and program
CN108876484A (en) Method of Commodity Recommendation and device
WO2015153240A1 (en) Directed recommendations
CN112906806A (en) Data optimization method and device based on neural network
CN109658501B (en) Image processing method, image processing device and terminal equipment
CN109726632A (en) Background recommended method and Related product
CN110264544B (en) Picture processing method and device, storage medium and electronic device
CN106358006A (en) Video correction method and video correction device
CN111104952A (en) Method, system and device for identifying food types and refrigerator
CN109697746A (en) Self-timer video cartoon head portrait stacking method and Related product
CN113869330A (en) Underwater fish target detection method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant