CN111523533B - Method and device for determining area of object from image - Google Patents

Method and device for determining area of object from image

Info

Publication number
CN111523533B
Authority
CN
China
Prior art keywords
convolution
map data
feature map
candidate
convolution kernel
Prior art date
Legal status
Active
Application number
CN201910106122.1A
Other languages
Chinese (zh)
Other versions
CN111523533A (en)
Inventor
杨攸奕
武元琪
李名杨
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910106122.1A
Publication of CN111523533A
Application granted
Publication of CN111523533B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for determining the area where an object is located from an image. The method comprises the following steps: acquiring an image to be identified; acquiring feature map data of the image to be identified; obtaining object parts in the feature map data; and determining the area where the object in the image to be identified is located according to the object parts. The method improves the computational efficiency of object recognition.

Description

Method and device for determining area of object from image
Technical Field
The present invention relates to the field of object recognition, and in particular, to a method and an apparatus for determining an area where an object is located from an image.
Background
Currently, region-based object recognition methods are the dominant approach in the field of object recognition. Most state-of-the-art region-based object detectors are implemented using a region-of-interest pooling layer in combination with a region proposal network.
By learning embedded spatial information from high-level convolutional neural network (Convolutional Neural Network, CNN) features, region-based object detectors have made tremendous progress over traditional non-CNN approaches such as deformable parts models (Deformable Parts Model, DPM). Recently, region-based fully convolutional networks (Region-based Fully Convolutional Network, R-FCN) have further improved the performance and computational efficiency of object recognition by applying the region-of-interest pooling layer to the final FCN (Fully Convolutional Network) feature map, reducing the overhead of duplicated predictions.
However, these current region-based object recognition methods must use a large amount of bounding-box-annotated training data for model training to obtain object parts, which results in low computational efficiency of object recognition. Especially in advanced region proposal networks (RPNs), when the size of the convolution kernel of the CNN used increases, the amount of computation for the training parameters becomes very large, further reducing the computational efficiency of object recognition.
Disclosure of Invention
The application provides a method and a device for determining an area where an object is located from an image, so as to improve the calculation efficiency of object identification.
The application provides a method for determining an area where an object is located from an image, which comprises the following steps:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
and determining the area where the object in the image to be identified is located according to the object part.
Optionally, the acquiring the feature map data of the image to be identified includes:
and extracting features of the image to be identified by using a convolutional neural network, and acquiring feature map data of the image to be identified.
Optionally, obtaining the object part in the feature map data includes:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object parts in the feature map data according to the convolution result.
Optionally, the method for determining the area where the object is located in the image further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the amplified size information of the candidate convolution kernel includes the amplified height information of the candidate convolution kernel and the amplified width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel, wherein the amplifying comprises the following steps: amplifying the height of the candidate convolution kernel to obtain amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain amplified width information of the candidate convolution kernel.
Optionally, the performing convolution operation on the feature map data and the convolution kernel to obtain a convolution result includes:
judging whether filling processing is needed for the feature map data according to the attribute of the feature map data and the convolution kernel;
if yes, filling the feature map data, and carrying out convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the performing convolution operation on the feature map data and the convolution kernel to obtain a convolution result includes:
performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes;
the obtaining the object parts in the feature map data according to the convolution result includes: obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
Optionally, the method for determining the area where the object is located in the image further includes:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes includes:
and synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data.
Optionally, the synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining object parts of an interested region in the feature map data according to the spliced feature vector, wherein the interested region is a region possibly containing the object in the feature map data.
Optionally, the region of interest is determined in the feature map data according to a size of a convolution kernel corresponding to the feature map data.
Optionally, the determining, according to the object portion, an area where the object in the image to be identified is located includes:
determining a target area containing the object in the feature map data from the region of interest according to the object score;
and determining the area where the object in the image to be identified is located according to the target area.
The application provides a device for determining an area where an object is located from an image, which comprises the following steps:
the image acquisition unit is used for acquiring an image to be identified;
the feature acquisition unit is used for acquiring feature map data of the image to be identified;
an object part obtaining unit, configured to obtain object parts in the feature map data;
and the determining unit is used for determining the area where the object in the image to be identified is located according to the object parts.
Optionally, the feature acquisition unit is specifically configured to:
and extracting features of the image to be identified by using a convolutional neural network, and acquiring feature map data of the image to be identified.
Optionally, the object part obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object parts in the feature map data according to the convolution result.
Optionally, the device for determining the area where the object is located from the image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
based on the initial height, the initial width, the candidate height and the candidate width, obtaining size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the step of amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel includes: amplifying the height of the candidate convolution kernel to obtain amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain amplified width information of the candidate convolution kernel.
Optionally, the convolution kernel obtaining unit is further configured to:
judging whether filling processing is needed for the feature map data according to the attribute of the feature map data and the convolution kernel;
if yes, filling the feature map data, and carrying out convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the object part obtaining unit is specifically configured to:
performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes;
the obtaining the object parts in the feature map data according to the convolution result includes: obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
Optionally, the object part obtaining unit further includes a different-size convolution kernel obtaining unit, configured to:
obtain the first number of convolution kernels of different sizes according to the attribute of the feature map data.
Optionally, the object part obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data.
Optionally, the object part obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining object parts of an interested region in the feature map data according to the spliced feature vector, wherein the interested region is a region possibly containing the object in the feature map data.
Optionally, the object part obtaining unit is further configured to:
determine the region of interest in the feature map data according to the size of the convolution kernel corresponding to the feature map data.
Optionally, the determining unit is specifically configured to:
determining a target area containing the object in the feature map data from the region of interest according to the object score;
and determining the area where the object in the image to be identified is located according to the target area.
The application provides an electronic device, the electronic device includes:
a processor;
a memory for storing a processing program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
and determining the area where the object in the image to be identified is located according to the object part.
The present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
and determining the area where the object in the image to be identified is located according to the object part.
The application provides a method for determining an area where an object is located from an object to be identified, which comprises the following steps:
acquiring an object to be identified;
acquiring feature map data of the object to be identified;
obtaining object parts in the feature map data;
and determining the area where the object in the object to be identified is located according to the object part.
Optionally, the obtaining the object part in the feature map data includes:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object parts in the characteristic map data according to the convolution result.
Optionally, the method for determining the area where the object is located from the object to be identified further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the performing convolution operation on the feature map data and the convolution kernel to obtain a convolution result includes:
performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes;
the obtaining the object parts in the feature map data according to the convolution result includes: obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
Compared with the prior art, the application has the following advantages:
with the method for determining the area of the object from the image provided by the application, the object parts in the feature map data are obtained directly, so that model training with a large amount of bounding-box-annotated training data is not needed in the object recognition process, which improves the computational efficiency of object recognition.
Drawings
FIG. 1 is a flow chart of a method of a first embodiment of the present application;
FIG. 2 is a schematic illustration of determining an area of an object from an image according to a first embodiment of the present application;
fig. 3 is a network architecture diagram of a BING (Binarized Normed Gradients)-RPN (Region Proposal Network) according to a first embodiment of the present application;
FIG. 4 is a schematic diagram of an object identification branch of a BING-RPN network according to a first embodiment of the present application;
FIG. 5 is a schematic diagram of convolution computation in a BING-RPN network according to a first embodiment of the present application;
fig. 6 is a schematic view of an apparatus according to a second embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, this application can be embodied in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
The first embodiment of the application provides a method for determining an area where an object is located from an image. Referring to fig. 1, a flowchart of a method according to a first embodiment of the present application is shown. A method for determining an area where an object is located from an image according to a first embodiment of the present application will be described in detail with reference to fig. 1. The method comprises the following steps:
Step S101: and acquiring an image to be identified.
The method is used for acquiring the image to be identified.
Fig. 2 is a schematic diagram of a process of determining an area of an object from an image according to the present embodiment. The first row in fig. 2 provides an image comprising one bird, the background of the image comprising green grass. The object to be identified is an image containing birds. In general, birds in an image are taken as objects to be identified. By adopting the method provided by the embodiment, the target area comprising the object to be identified is obtained from the image, and the object is identified.
Referring to fig. 3, a network architecture diagram of a BING-RPN employing the method provided in this embodiment is shown. In fig. 3, the image to be identified is the leftmost image of the figure that contains the dog.
Unlike conventional BING, this embodiment provides a modified BING method, please refer to FIGS. 4 and 5. Fig. 4 is a schematic diagram of an object recognition branch of the BING-RPN network provided in this embodiment. FIG. 5 is a schematic diagram of convolution computation in a BING-RPN network. The following step description will be described in more detail with reference to fig. 4 and 5.
Step S102: and acquiring the feature map data of the image to be identified.
The step is used for acquiring the feature map data of the image to be identified.
The obtaining the feature map data of the image to be identified comprises the following steps:
and extracting features of the image to be identified by using a convolutional neural network, and acquiring feature map data of the image to be identified.
The convolutional neural network in this embodiment may be a commonly used ResNet, Inception V3, Inception V4, etc. Referring to fig. 3, the backbone CNN network is the convolutional neural network.
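For illustration only, the following minimal Python sketch shows one way Step S102 could be realized with an off-the-shelf backbone; the choice of torchvision's resnet50, the input resolution, and the layer cut are assumptions of this sketch, not requirements of the embodiment:

```python
import torch
import torchvision.models as models

# Build a backbone CNN and keep everything up to the last convolutional stage
# (the average-pooling and fully-connected layers are dropped).
backbone = models.resnet50(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

image = torch.randn(1, 3, 256, 256)         # stand-in for the image to be identified
with torch.no_grad():
    feature_map = feature_extractor(image)  # feature map data, shape (1, C, h, w)
print(feature_map.shape)                    # torch.Size([1, 2048, 8, 8])
```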
Step S103: and obtaining the object parts in the characteristic diagram data.
The step is used for obtaining the object parts in the characteristic map data by utilizing the negative Laplace operator.
The laplace operator is a tool commonly used in image processing for sharpening of images. The method provided by the embodiment uses the negative Laplace operator in the calculation of the object parts, and obtains higher object recognition efficiency.
An object score is used in an RPN network and refers to the probability that a candidate region in the feature map data corresponding to the image to be identified contains the object to be identified. The object score takes a value between 0 and 1; the greater the value, the greater the probability that the candidate region contains the object to be identified.
The obtaining the object part in the feature map data by using the negative Laplacian comprises the following steps:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object parts in the feature map data according to the convolution result.
The feature map data x in the present embodiment can be expressed as
x ∈ R^(h×w×C),
where h represents the height of the feature map data, w represents the width of the feature map data, and C represents the number of channels of the feature map data. For example, the feature map data x may be an 8×8×1536 data block as used in Inception-V4.
The BING-RPN provided in this embodiment performs multiple parallel depthwise convolution operations on the feature map data x, where the set of convolution kernel sizes of the depthwise convolution operations is obtained using the following formula:
Θ = {(h_θ, w_θ)},
where h_θ = 2^i (i = 0, 1, 2, 3) and w_θ = 2^j (j = 0, 1, 2, 3).
Referring to fig. 4, the BING-RPN provided in this embodiment outputs N area candidates in the C dimension, where N is obtained using the following formula:
N = Σ_{θ∈Θ} (h − h_θ + 1) · (w − w_θ + 1),
where h is the height of the input feature map data, w is the width of the input feature map data, and h_θ, w_θ are the heights and widths of the convolution kernels of different sizes, obtained by the doubling strategy starting from 1×1 described below.
The method for determining the area of the object from the image further comprises the following steps:
and obtaining the first number of convolution kernels of different sizes according to the attribute of the feature map data.
From the above formula for N, it can be seen that the first number N can be obtained from the height and width of the feature map data.
The method for determining the area of the object from the image further comprises the following steps:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
The initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
For example, according to the above method, convolution kernel sizes of 1×1, 2×1, 4×1, 8×1, 1×2, 2×2, 4×2, 8×2, 1×4, 2×4, 4×4, 8×4, 1×8, 2×8, 4×8, 8×8 can be obtained. By adopting this doubling search strategy starting from 1, the acquisition efficiency of the convolution kernels is improved. Please refer to fig. 5 (a) and (b). In fig. 5, the bold frame represents the convolution kernel, and the quadruple to the right of each frame denotes (x, y, convolution kernel width, convolution kernel height), where x, y are the coordinates of the top-left corner of the frame.
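A minimal sketch of this size-doubling strategy and of the resulting candidate count N is given below; the function names are illustrative only, and the count follows the formula for N given above:

```python
def candidate_kernel_sizes(initial=1, multiple=2, steps=4):
    # Doubling strategy: sides 1, 2, 4, 8 for both height and width.
    sides = [initial * multiple ** i for i in range(steps)]
    return [(h, w) for h in sides for w in sides]  # 16 (height, width) pairs

def num_region_candidates(h, w, sizes):
    # Each valid placement of an (h_t, w_t) kernel on an h x w map is one candidate.
    return sum((h - h_t + 1) * (w - w_t + 1) for h_t, w_t in sizes)

sizes = candidate_kernel_sizes()
print(len(sizes))                          # 16
print(num_region_candidates(8, 8, sizes))  # N = 441 for an 8x8 feature map
```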
The obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplacian, and obtaining the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Because objects are generally considered to have well-closed boundaries, the receptive field obtained from the region of interest alone is insufficient for object recognition. The method for determining the area where the object is located in the image provided by this embodiment therefore enlarges the size of the convolution kernel.
The size information of the candidate convolution kernel comprises the height information of the candidate convolution kernel and the width information of the candidate convolution kernel, and the amplified size information of the candidate convolution kernel comprises the amplified height information of the candidate convolution kernel and the amplified width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
The step of amplifying the size of the candidate convolution kernel according to the size of the negative Laplacian to obtain amplified size information of the candidate convolution kernel includes: amplifying the height of the candidate convolution kernel according to the height of the negative Laplacian to obtain amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel according to the width of the negative Laplacian to obtain amplified width information of the candidate convolution kernel.
The step of carrying out convolution operation on the feature map data and the convolution kernel to obtain a convolution result comprises the following steps:
judging whether filling processing is needed for the feature map data according to the attribute of the feature map data and the convolution kernel;
if yes, filling the feature map data, and carrying out convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Since the enlargement of the convolution kernel causes non-uniformity of the convolution calculation, a zero-filling operation is required for the feature map data. Please refer to fig. 5 (c) and (d). Taking (d) in fig. 5 as an example, pad (2, 4) represents that, on the basis of the 8×8 feature map data, 2 rows are added in each of the upward and downward directions in the height direction, and 4 columns are added in each of the left and right directions in the width direction. The added rows and columns are filled with zeros.
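For one channel of the feature map, the pad (2, 4) example above corresponds to the following NumPy sketch (the array contents are placeholders):

```python
import numpy as np

feature_map = np.ones((8, 8))   # one channel of the 8x8 feature map data
# Add 2 zero rows above and below, and 4 zero columns left and right.
padded = np.pad(feature_map, ((2, 2), (4, 4)), mode="constant", constant_values=0)
print(padded.shape)             # (12, 16)
```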
The performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian, includes:
performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, wherein the weights of the first number of convolution kernels of different sizes are negative Laplacians;
the obtaining the object parts in the feature map data according to the convolution result includes: obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
Given a negative Laplacian with height and width k, if a receptive field enlarged k×k times is to be obtained, a convolution kernel of size (k·h_θ)×(k·w_θ) can be used, with the feature map zero-filled by (k−1)/2·h_θ in height and (k−1)/2·w_θ in width. Based on this analysis, the BING-RPN provided by this embodiment can be expressed using the following formula:
BING(x; Θ, k) = {Conv_depthwise(x'; θ) | θ ∈ Θ(k)}   (Equation 1)
where x' is the data obtained by zero-filling the input feature map data x by (k−1)/2·h_θ in height and (k−1)/2·w_θ in width, k is the enlargement ratio of the receptive field, and Θ(k) is the set of convolution kernels that varies with k. When k = 1, the algorithm is equivalent to a BING algorithm without zero-filling. Please refer to fig. 4. In fig. 4, the bottom row is the 8×8×C feature map data, and the second-to-last row is the size of the filling; for example, a filling size of (2, 1) represents expanding two rows in each of the upward and downward directions and one column in each of the left and right directions, with the expanded rows and columns zero-filled. The third-to-last row shows the 16 convolution kernels obtained by enlarging the previously obtained 1×1, 2×1, 4×1, 8×1, 1×2, 2×2, 4×2, 8×2, 1×4, 2×4, 4×4, 8×4, 1×8, 2×8, 4×8, 8×8 convolution kernels by k = 3. The weights of the convolution kernels adopt a manually given negative Laplacian.
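The following sketch illustrates one term of Equation 1: a depthwise convolution of the zero-filled feature map with a fixed, hand-given negative-Laplacian kernel. The 3×3 kernel and the channel count are assumptions for the demonstration; the enlarged kernels of the embodiment follow the same pattern at size (k·h_θ)×(k·w_θ):

```python
import torch
import torch.nn.functional as F

C = 16                                        # channel count, assumed for the demo
x = torch.randn(1, C, 8, 8)                   # feature map data x

# Negative of the standard 3x3 Laplacian; the weights are fixed, not learned.
neg_laplacian = -torch.tensor([[0., 1., 0.],
                               [1., -4., 1.],
                               [0., 1., 0.]])
weight = neg_laplacian.view(1, 1, 3, 3).repeat(C, 1, 1, 1)

x_padded = F.pad(x, (1, 1, 1, 1))             # zero-fill so output resolution is kept
out = F.conv2d(x_padded, weight, groups=C)    # depthwise: one filter per channel
print(out.shape)                              # torch.Size([1, 16, 8, 8])
```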
In existing BING implementations, a trained model is used to obtain object scores. In developing the method provided by this embodiment, it was found feasible to calculate object scores using the negative Laplace operator, as demonstrated on the ImageNet 2012 experimental dataset commonly used in the industry. Based on the above analysis, the convolution kernel may be selected as follows:
Θ_obj' = {−L(σ_x, σ_y)}   (Equation 2)
where L(σ_x, σ_y) is a two-dimensional Laplacian filter, and the variables σ_x, σ_y are proportional to the size of the Laplacian filter. Setting [σ_x, σ_y] = [1.4, 1.4], the corresponding convolution kernel has a size of 9×9. Substituting Θ_obj' from Equation 2 into Equation 1 generates an N×C-dimensional vector, where each element corresponds to an object part in a channel. The minimum size of the Laplace filter is 3×3, so k = 3 is used to maintain the resolution of the output. Object recognition can be achieved by BING(x; Θ_obj', 3). Fig. 4 shows the BING(x; Θ_obj', 3) calculation process. As can be seen from fig. 4, the calculation process includes: first acquiring the feature map data, then performing filling processing, then performing convolution operations with a plurality of convolution kernels of different sizes, and finally synthesizing the obtained convolution results to obtain the object parts.
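Since the filter itself appears in the original only as an image, the sketch below constructs a plausible hand-given kernel: the negative of a standard two-dimensional Laplacian-of-Gaussian with σ_x = σ_y = 1.4 and size 9×9, consistent with the parameters stated above. The exact closed form used by the embodiment is an assumption here:

```python
import numpy as np

def negative_laplacian_kernel(size=9, sigma=1.4):
    # Negative of the Laplacian-of-Gaussian, up to a positive normalization constant.
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    s2 = sigma ** 2
    log = ((x ** 2 + y ** 2 - 2 * s2) / s2 ** 2) * np.exp(-(x ** 2 + y ** 2) / (2 * s2))
    return -log

kernel = negative_laplacian_kernel()
print(kernel.shape)   # (9, 9)
```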
The obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes includes:
and synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data.
The synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining object parts of the region of interest in the feature map data according to the spliced feature vector, wherein the region of interest is a region possibly containing the object in the feature map data.
The region of interest is determined in the feature map data according to the size of the convolution kernel corresponding to the feature map data.
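As an illustrative sketch, each valid placement of a kernel of a given size can be mapped to one region of interest in the feature map, using the (x, y, width, height) frame notation of fig. 5; the function name is hypothetical:

```python
def regions_of_interest(h, w, kernel_sizes):
    # One ROI per valid placement of each (kernel height, kernel width) pair.
    rois = []
    for h_t, w_t in kernel_sizes:
        for y in range(h - h_t + 1):
            for x in range(w - w_t + 1):
                rois.append((x, y, w_t, h_t))  # top-left corner plus kernel size
    return rois

rois = regions_of_interest(8, 8, [(1, 1), (2, 2), (4, 4), (8, 8)])
print(len(rois))   # 139 candidate regions for these four sizes
```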
Step S104: and determining the area where the object in the image to be identified is located according to the object part.
The method comprises the step of determining the area where the object in the image to be identified is located according to the object part.
The determining the area where the object in the image to be identified is located according to the object part comprises the following steps:
determining a target area containing the object in the feature map data from the region of interest according to the object score;
and determining the area where the object in the image to be identified is located according to the target area.
Referring to fig. 3, the target area is a frame including a dog.
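A minimal sketch of this selection is given below: the candidate region with the highest object score is taken as the target area and mapped back to image coordinates. The stride value of 32 is an assumption corresponding to a typical backbone downsampling factor:

```python
def select_target_area(rois, scores, stride=32):
    # Pick the region of interest with the highest object score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    x, y, w_t, h_t = rois[best]
    # Map feature-map coordinates back to the original image.
    return (x * stride, y * stride, w_t * stride, h_t * stride)

rois = [(0, 0, 2, 2), (1, 1, 4, 4), (2, 2, 1, 1)]
scores = [0.10, 0.80, 0.30]
print(select_target_area(rois, scores))   # (32, 32, 128, 128)
```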
In the foregoing embodiment, a method for determining the area where an object is located in an image is provided; correspondingly, the application also provides a device for determining the area where an object is located in an image. Referring to fig. 6, a schematic diagram of an embodiment of a device for determining the area of an object from an image is shown. Since this embodiment, i.e. the second embodiment, is substantially similar to the method embodiment, the description is relatively brief; for relevant points, please refer to the description of the method embodiment. The device embodiments described below are merely illustrative.
The device for determining the area of the object from the image comprises:
an image acquisition unit 601, configured to acquire an image to be identified;
a feature acquisition unit 602, configured to acquire feature map data of the image to be identified;
an object part obtaining unit 603, configured to obtain object parts in the feature map data by using a negative Laplacian operator;
and the determining unit 604 is used for determining the area where the object in the image to be identified is located according to the object part.
In this embodiment, the feature acquiring unit is specifically configured to:
and extracting features of the image to be identified by using a convolutional neural network, and acquiring feature map data of the image to be identified.
In this embodiment, the object part obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object parts in the feature map data according to the convolution result.
In this embodiment, the apparatus for determining an area where an object is located from an image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
In this embodiment, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplacian, and obtaining the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
The determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the step of amplifying the size of the candidate convolution kernel according to the size of the negative Laplacian to obtain amplified size information of the candidate convolution kernel, including: amplifying the height of the candidate convolution kernel according to the height of the negative Laplacian to obtain amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel according to the width of the negative Laplacian to obtain amplified width information of the candidate convolution kernel.
In this embodiment, the convolution kernel obtaining unit is further configured to:
judging whether filling processing is needed for the feature map data according to the attribute of the feature map data and the convolution kernel;
if yes, filling the feature map data, and carrying out convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
In this embodiment, the object part obtaining unit is specifically configured to:
performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, wherein the weights of the first number of convolution kernels of different sizes are negative Laplacians;
the obtaining the object parts in the feature map data according to the convolution result includes: obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
In this embodiment, the object part obtaining unit further includes a different-size convolution kernel obtaining unit, configured to:
and obtaining the first number of convolution kernels of different sizes according to the attribute of the feature map data.
Optionally, the object part obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object parts in the feature map data.
In this embodiment, the object part obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain a spliced feature vector of the feature map data;
and obtaining object parts of an interested region in the feature map data according to the spliced feature vector, wherein the interested region is a region possibly containing the object in the feature map data.
In this embodiment, the object part obtaining unit is further configured to:
the region of interest is determined in the feature map data according to the size of the convolution kernel corresponding to the feature map data.
In this embodiment, the determining unit is specifically configured to:
determining a target area containing the object in the feature map data from the region of interest according to the object score;
and determining the area where the object in the image to be identified is located according to the target area.
A third embodiment of the present application provides an electronic device, including:
a processor;
a memory for storing a processing program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
And determining the area where the object in the image to be identified is located according to the object part.
A fourth embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
and determining the area where the object in the image to be identified is located according to the object part.
A fifth embodiment of the present application provides a method for determining an area where an object is located from an object to be identified. This embodiment is substantially similar to the first embodiment, so the description is relatively brief; please refer to the relevant parts of the first embodiment for details.
The method for determining the area where the object is located from the object to be identified comprises the following steps:
acquiring an object to be identified;
acquiring feature map data of the object to be identified;
obtaining object parts in the feature map data by using a negative Laplacian operator;
and determining the area where the object in the object to be identified is located according to the object parts. The object to be identified can be a static picture or a video acquired by video acquisition equipment such as a camera.
In this embodiment, the obtaining the object portion in the feature map data by using the negative laplacian operator includes:
performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object parts in the characteristic map data according to the convolution result.
In this embodiment, the method for determining the area where the object is located from the object to be identified further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
While preferred embodiments have been described above, they are not intended to limit the invention. Any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be defined by the claims of the present application.
In one typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (17)

1. A method for determining an area of an object from an image, comprising:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object parts in the feature map data;
determining the region where the object in the image to be identified is located according to the object part;
wherein the obtaining the object parts in the feature map data includes: performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, and obtaining the object parts in the feature map data according to the convolution result, which specifically includes: performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, and obtaining the object parts in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
2. The method for determining an area where an object is located from an image according to claim 1, wherein the acquiring feature map data of the image to be identified includes:
and extracting features of the image to be identified by using a convolutional neural network, and acquiring feature map data of the image to be identified.
3. The method of determining an area of an object from an image of claim 1, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
4. A method of determining an area of an object from an image according to claim 3, wherein the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
5. A method for determining a region of an object from an image according to claim 3, wherein the obtaining the convolution kernel according to the size information of the candidate convolution kernel comprises:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
6. The method for determining an area where an object is located in an image according to claim 5, wherein the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the size information of the enlarged candidate convolution kernel includes the height information of the enlarged candidate convolution kernel and the width information of the enlarged candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
The step of amplifying the size of the candidate convolution kernel to obtain amplified size information of the candidate convolution kernel includes: amplifying the height of the candidate convolution kernel to obtain amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain amplified width information of the candidate convolution kernel.
7. The method for determining an area where an object is located from an image according to claim 3, wherein the performing of a convolution operation on the feature map data with a convolution kernel to obtain a convolution result includes:
determining, according to attributes of the feature map data and the convolution kernel, whether padding is needed for the feature map data;
and if so, padding the feature map data, and performing the convolution operation on the padded feature map data with the convolution kernel to obtain the convolution result.
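For illustration only: one plausible reading of claim 7 is that padding keeps the convolution result the same spatial size as the feature map, so that results from different kernel sizes remain comparable. That criterion, and the use of PyTorch, are assumptions.

```python
import torch
import torch.nn.functional as F

def conv_with_optional_padding(feature_map, kernel):
    kh, kw = kernel.shape[-2:]
    # Judged from the kernel size relative to the map: 1x1 kernels need no padding.
    needs_padding = kh > 1 or kw > 1
    if needs_padding:
        # Pad (left, right, top, bottom) so the output keeps the input's spatial size.
        feature_map = F.pad(feature_map, (kw // 2, (kw - 1) // 2, kh // 2, (kh - 1) // 2))
    return F.conv2d(feature_map, kernel)

fm = torch.randn(1, 256, 32, 32)
k = torch.randn(1, 256, 4, 4)            # a 4x4 candidate kernel
out = conv_with_optional_padding(fm, k)  # shape (1, 1, 32, 32)
```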
8. The method for determining an area where an object is located from an image according to claim 1, further comprising:
obtaining the first number of convolution kernels of different sizes according to attributes of the feature map data.
9. The method for determining an area where an object is located from an image according to claim 1, wherein the obtaining of the object scores in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes includes:
synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object scores in the feature map data.
10. The method for determining an area where an object is located from an image according to claim 9, wherein the synthesizing of the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object scores in the feature map data includes:
splicing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain a spliced feature vector of the feature map data;
and obtaining object scores of a region of interest in the feature map data according to the spliced feature vector, wherein the region of interest is a region in the feature map data that may contain the object.
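For illustration only: claims 9 and 10 combine the per-kernel convolution results into one spliced feature vector and derive object scores for regions of interest from it. The sketch below assumes the results were padded to a common spatial size (claim 7) and uses a 1x1 scoring layer of its own invention.

```python
import torch
import torch.nn as nn

results = [torch.randn(1, 1, 32, 32) for _ in range(3)]  # one result per kernel size
spliced = torch.cat(results, dim=1)                      # spliced feature vector, (1, 3, 32, 32)
score_layer = nn.Conv2d(3, 1, kernel_size=1)             # maps spliced features to one score map
object_scores = torch.sigmoid(score_layer(spliced))      # per-location object scores in [0, 1]
```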
11. The method for determining an area where an object is located from an image according to claim 10, wherein the region of interest is determined in the feature map data according to a size of a convolution kernel corresponding to the feature map data.
12. The method for determining an area where an object is located from an image according to claim 10, wherein the determining of the area where the object in the image to be identified is located according to the object scores comprises:
determining, from the region of interest according to the object scores, a target area containing the object in the feature map data;
and determining the area where the object in the image to be identified is located according to the target area.
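For illustration only: claim 12 keeps the region of interest with the strongest object score and maps it back to the image to be identified. The (x, y, w, h, score) representation and the feature-map stride below are assumptions made for the example.

```python
def select_target_region(rois, stride=16):
    """rois: list of (x, y, w, h, score) tuples in feature-map coordinates."""
    best = max(rois, key=lambda r: r[4])  # target area: the highest object score
    x, y, w, h, _ = best
    # Map feature-map coordinates back to the image to be identified.
    return x * stride, y * stride, w * stride, h * stride

rois = [(2, 3, 4, 4, 0.91), (10, 8, 2, 2, 0.40)]
print(select_target_region(rois))  # -> (32, 48, 64, 64)
```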
13. An apparatus for determining an area of an object from an image, comprising:
an image acquisition unit, configured to acquire an image to be identified;
a feature acquisition unit, configured to acquire feature map data of the image to be identified;
an object score obtaining unit, configured to obtain object scores in the feature map data;
and a determining unit, configured to determine the area where the object in the image to be identified is located according to the object scores;
wherein the obtaining of the object scores in the feature map data comprises: performing a convolution operation on the feature map data with a convolution kernel to obtain a convolution result, and obtaining the object scores in the feature map data according to the convolution result; specifically, performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, and obtaining the object scores in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
14. An electronic device, the electronic device comprising:
a processor;
a memory for storing a processing program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
determining the area where the object in the image to be identified is located according to the object scores;
wherein the obtaining of the object scores in the feature map data comprises: performing a convolution operation on the feature map data with a convolution kernel to obtain a convolution result, and obtaining the object scores in the feature map data according to the convolution result; specifically, performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, and obtaining the object scores in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
15. A computer readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
determining the area where the object in the image to be identified is located according to the object scores;
wherein the obtaining of the object scores in the feature map data comprises: performing a convolution operation on the feature map data with a convolution kernel to obtain a convolution result, and obtaining the object scores in the feature map data according to the convolution result; specifically, performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, and obtaining the object scores in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
16. A method for determining an area where an object is located from an object to be identified, comprising:
acquiring an object to be identified;
acquiring feature map data of the object to be identified;
obtaining object scores in the feature map data;
determining the area where the object in the object to be identified is located according to the object scores;
wherein the obtaining of the object scores in the feature map data comprises: performing a convolution operation on the feature map data with a convolution kernel to obtain a convolution result, and obtaining the object scores in the feature map data according to the convolution result; specifically, performing convolution operations on the feature map data with a first number of convolution kernels of different sizes, respectively, to obtain convolution results corresponding to the first number of convolution kernels of different sizes, and obtaining the object scores in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes.
17. The method for determining an area where an object is located from an object to be identified according to claim 16, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
multiplying the initial height by a first multiple to obtain a candidate height of the candidate convolution kernel;
multiplying the initial width by a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
CN201910106122.1A 2019-02-01 2019-02-01 Method and device for determining area of object from image Active CN111523533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910106122.1A CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Publications (2)

Publication Number Publication Date
CN111523533A CN111523533A (en) 2020-08-11
CN111523533B 2023-07-07

Family

ID=71900036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910106122.1A Active CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Country Status (1)

Country Link
CN (1) CN111523533B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378873A (en) * 2021-01-13 2021-09-10 杭州小创科技有限公司 Algorithm for determining attribution or classification of target object

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063719A (en) * 2014-06-27 2014-09-24 深圳市赛为智能股份有限公司 Method and device for pedestrian detection based on depth convolutional network
CN104933722A (en) * 2015-06-29 2015-09-23 电子科技大学 Image edge detection method based on Spiking-convolution network model
CN107315995A (en) * 2017-05-18 2017-11-03 中国科学院上海微系统与信息技术研究所 A kind of face identification method based on Laplce's logarithm face and convolutional neural networks
CN109086656A (en) * 2018-06-06 2018-12-25 平安科技(深圳)有限公司 Airport foreign matter detecting method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111523533A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US10803554B2 (en) Image processing method and device
US9418458B2 (en) Graph image representation from convolutional neural networks
CN111161349B (en) Object posture estimation method, device and equipment
CN110852349B (en) Image processing method, detection method, related equipment and storage medium
JP2019087252A (en) Apparatus and method for performing deconvolution operation in neural network
US20130287250A1 (en) Method and apparatus for tracking object in image data, and storage medium storing the same
CN111091123A (en) Text region detection method and equipment
US20120163704A1 (en) Apparatus and method for stereo matching
CN108961180B (en) Infrared image enhancement method and system
CN111696110B (en) Scene segmentation method and system
CN110176024B (en) Method, device, equipment and storage medium for detecting target in video
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN108022244B (en) Hypergraph optimization method for significant target detection based on foreground and background seeds
WO2017050083A1 (en) Element identification method and device
CN111738036B (en) Image processing method, device, equipment and storage medium
EP3942462B1 (en) Convolution neural network based landmark tracker
CN109255382B (en) Neural network system, method and device for picture matching positioning
CN110910445B (en) Object size detection method, device, detection equipment and storage medium
CN109902588B (en) Gesture recognition method and device and computer readable storage medium
CN114241388A (en) Video instance segmentation method and segmentation device based on space-time memory information
CN111523533B (en) Method and device for determining area of object from image
CN111914596A (en) Lane line detection method, device, system and storage medium
CN111027551B (en) Image processing method, apparatus and medium
US20140023260A1 (en) Biological unit segmentation with ranking based on similarity applying a geometric shape and scale model
CN116797830A (en) Image risk classification method and device based on YOLOv7

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant