CN111523533A - Method and device for determining region of object from image

Method and device for determining region of object from image

Info

Publication number
CN111523533A
Authority
CN
China
Prior art keywords
convolution kernel
candidate
map data
feature map
convolution
Prior art date
Legal status
Granted
Application number
CN201910106122.1A
Other languages
Chinese (zh)
Other versions
CN111523533B (en)
Inventor
杨攸奕
武元琪
李名杨
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910106122.1A priority Critical patent/CN111523533B/en
Publication of CN111523533A publication Critical patent/CN111523533A/en
Application granted granted Critical
Publication of CN111523533B publication Critical patent/CN111523533B/en
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The application provides a method and a device for determining the area where an object is located from an image. The method for determining the area where the object is located from the image comprises the following steps: acquiring an image to be recognized; acquiring feature map data of the image to be recognized; obtaining object scores in the feature map data; and determining the area where the object in the image to be recognized is located according to the object scores. By this method, the calculation efficiency of object recognition is improved.

Description

Method and device for determining region of object from image
Technical Field
The present application relates to the field of object recognition, and in particular, to a method and an apparatus for determining an area where an object is located from an image.
Background
At present, region-based object recognition methods remain the mainstream in the field of object recognition. Most state-of-the-art region-based object detectors are implemented using a region-of-interest pooling layer in conjunction with a region candidate network.
By learning embedded spatial information from higher-order Convolutional Neural Network (CNN) features, region-based object detectors have made tremendous advances over traditional non-CNN methods such as Deformable Part Models (DPMs). More recently, Region-based Fully Convolutional Networks (R-FCNs) reduce the overhead of repeated prediction by applying a region-of-interest pooling layer to the final FCN (Fully Convolutional Network) feature map, further improving the performance and computational efficiency of object recognition.
However, in these current region-based object recognition methods, a large amount of box-annotated training data must be used for model training in order to obtain object scores, which results in low calculation efficiency of object recognition. Especially in advanced region candidate networks (RPNs), when the size of the convolution kernel of the CNN used increases, the amount of computation for the training parameters becomes very large, further reducing the efficiency of object recognition calculation.
Disclosure of Invention
The application provides a method and a device for determining an area where an object is located from an image, so as to improve the calculation efficiency of object identification.
The application provides a method for determining the area of an object from an image, which comprises the following steps:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
Optionally, the acquiring the feature map data of the image to be recognized includes:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
Optionally, obtaining the object score in the feature map data includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the method for determining the region where the object is located from the image further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes height information of the enlarged candidate convolution kernel and width information of the enlarged candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel, including: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Optionally, the method for determining the region where the object is located from the image further includes:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the obtaining, according to the convolution results corresponding to the convolution kernels with the first number and different sizes, the object score in the feature map data includes:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
Optionally, the synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain the object score in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
Optionally, the region of interest is determined in the feature map data according to a size of a convolution kernel corresponding to the feature map data.
Optionally, the determining, according to the object score, the area where the object in the image to be recognized is located includes:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
The application provides a device for determining the area of an object from an image, which comprises:
the image acquisition unit is used for acquiring an image to be identified;
the characteristic acquisition unit is used for acquiring characteristic map data of the image to be identified;
an object score obtaining unit, configured to obtain an object score in the feature map data;
and the determining unit is used for determining the area where the object in the image to be identified is located according to the object score.
Optionally, the feature obtaining unit is specifically configured to:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
Optionally, the object obtaining unit is specifically configured to:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the apparatus for determining the region where the object is located from the image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel includes: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Optionally, the convolution kernel obtaining unit is further configured to:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the object obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Optionally, the object obtaining unit further includes convolution kernel obtaining units with different sizes, and the convolution kernel obtaining units with different sizes are configured to:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the object obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
Optionally, the object obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
Optionally, the object obtaining unit is further configured to:
the region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
Optionally, the determining unit is specifically configured to:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
The application provides an electronic device, the electronic device includes:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The application provides a method for determining the area of an object from an object to be identified, which comprises the following steps:
acquiring an object to be identified;
acquiring characteristic diagram data of the object to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the object to be recognized according to the object score.
Optionally, the obtaining the object score in the feature map data includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the method for determining the area where the object is located from the object to be recognized further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Compared with the prior art, the method has the following advantages:
by adopting the method for determining the area where the object is located from the image, the object score in the feature map data is directly obtained, so that in the process of object recognition, a large amount of frame marking training data is not needed for model training, and the calculation efficiency of object recognition is improved.
Drawings
FIG. 1 is a flow chart of a method of a first embodiment of the present application;
FIG. 2 is a schematic diagram of a first embodiment of the present application for determining an area of an object from an image;
FIG. 3 is a network architecture diagram of a BING (Binarized Normed Gradients)-RPN (Region Candidate Network) according to the first embodiment of the present application;
FIG. 4 is a schematic diagram of an object recognition branch of the BING-RPN network according to the first embodiment of the present application;
FIG. 5 is a diagram illustrating convolution calculations in the BING-RPN network according to the first embodiment of the present application;
FIG. 6 is a schematic view of an apparatus according to a second embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present application; therefore, the present application is not limited to the specific implementations disclosed below.
The first embodiment of the application provides a method for determining the area where an object is located from an image. Please refer to fig. 1, which is a flowchart illustrating a method according to a first embodiment of the present application. A method for determining a region where an object is located from an image according to a first embodiment of the present application is described in detail below with reference to fig. 1. The method comprises the following steps:
step S101: and acquiring an image to be identified.
This step is used to obtain the image to be identified.
Please refer to fig. 2, which is a schematic diagram illustrating the process of determining the area of an object from an image according to the present embodiment. The first row in fig. 2 provides an image containing a bird against a background of green grass. Generally, the bird in the image is taken as the object to be recognized. By adopting the method provided by the present embodiment, a target area containing the object to be recognized is obtained from the image, and the object is then recognized.
Please refer to fig. 3, which is a network architecture diagram of the BING-RPN according to the method of the present embodiment. In fig. 3, the image to be recognized is the leftmost image in the figure, which contains a dog.
Unlike conventional BING, the present embodiment provides an improved version of the BING method; please refer to fig. 4 and 5. Fig. 4 is a schematic diagram of the object recognition branch of the BING-RPN network provided in this embodiment. Fig. 5 is a diagram of the convolution calculations in the BING-RPN network. The following steps are described in more detail with reference to fig. 4 and 5.
Step S102: and acquiring the characteristic diagram data of the image to be identified.
This step is used to obtain the feature map data of the image to be recognized.
The acquiring of the feature map data of the image to be recognized includes:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
The convolutional neural network in this embodiment may be a commonly used ResNet, Inception V3, Inception V4, or the like. Referring to fig. 3, the backbone CNN network is the convolutional neural network.
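As an illustration of this feature-extraction step, the following minimal sketch keeps only the convolutional trunk of a torchvision ResNet-50. The choice of backbone, input size, and layer split are assumptions for the example; the embodiment only names ResNet and Inception as options.

import torch
import torchvision.models as models

# Keep only the convolutional trunk of a (hypothetical) ResNet-50 backbone;
# the last two children (global average pool and fc classifier) are dropped.
backbone = models.resnet50(weights=None)
features = torch.nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 256, 256)   # image to be recognized, as a batch of 1
fmap = features(x)                # feature map data of the image
print(fmap.shape)                 # torch.Size([1, 2048, 8, 8])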
Step S103: and obtaining the object score in the characteristic map data.
The step is used for obtaining the object score in the feature map data by using a negative Laplacian operator.
The Laplacian operator is a tool commonly used in image processing for sharpening images. The method provided by this embodiment uses the negative Laplacian operator in the calculation of the object scores and thereby obtains high object recognition efficiency.
The object score is a concept used in RPN networks: it refers to the probability that a candidate region in the feature map data corresponding to the image to be recognized contains the object to be recognized. The object score takes a value between 0 and 1; the larger the value, the greater the probability that the region contains the object to be recognized.
The obtaining the object score in the feature map data by using the negative laplacian operator includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
The feature map data x in the present embodiment can be expressed by the following formula:
x ∈ R^(h × w × C)
where h denotes the height of the feature map data, w denotes the width of the feature map data, and C denotes the number of channels of the feature map data. For example, the feature map data x may be an 8 × 8 × 1536 data block as produced by Inception-V4.
In the BING-RPN provided in this embodiment, the feature map data x is subjected to multiple parallel depthwise convolution operations, and the convolution kernels of the depthwise convolution operations are obtained using the following formula:
Θ = {θ_(h_θ × w_θ)}
where
h_θ = 2^i, i ∈ {0, 1, 2, 3}
and
w_θ = 2^j, j ∈ {0, 1, 2, 3}
Referring to fig. 4, the BING-based RPN provided in this embodiment outputs N C-dimensional region candidates, where N is obtained using the following formula:
N = Σ_(θ ∈ Θ) (h − h_θ + 1) × (w − w_θ + 1)
where h is the height of the input feature map data, w is the width of the input feature map data, and h_θ, w_θ are the height and width of the convolution kernels of different sizes obtained by doubling from 1 × 1 as described above.
The method for determining the area where the object is located from the image further comprises the following steps:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Through the above formula for obtaining N, it can be seen that the first number N can be obtained from the height and width data of the feature map data.
The method for determining the area where the object is located from the image further comprises the following steps:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
The initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
For example, convolution kernel sizes of 1 × 1, 2 × 1, 4 × 1, 8 × 1, 1 × 2, 2 × 2, 4 × 2, 8 × 2, 1 × 4, 2 × 4, 4 × 4, 8 × 4, 1 × 8, 2 × 8, 4 × 8, and 8 × 8 can be obtained according to the above method. By adopting this search strategy, in which the kernel sizes are doubled starting from 1 × 1, the efficiency of obtaining the convolution kernels is improved. Please refer to (a) and (b) in fig. 5. In fig. 5, the bold boxes represent convolution kernels, and the notation (0, 0, 1, 1) on the right represents (x, y, convolution kernel width, convolution kernel height), where x, y are the coordinates of the top-left corner of the box.
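For illustration, the following Python sketch (not part of the patent) enumerates the 16 candidate kernel sizes by doubling from 1 × 1, and counts the N region candidates under the reading, given above, that N sums the valid sliding positions of every kernel over an h × w feature map.

def candidate_kernel_sizes(init_h=1, init_w=1, mult_h=2, mult_w=2, steps=4):
    # Doubling search: heights and widths take the values 1, 2, 4, 8
    # (initial size 1 x 1, first and second multiples both 2).
    return [(init_h * mult_h ** i, init_w * mult_w ** j)
            for j in range(steps) for i in range(steps)]

def num_region_candidates(h, w, sizes):
    # N = sum over kernel sizes of the number of valid sliding positions.
    return sum((h - kh + 1) * (w - kw + 1) for kh, kw in sizes)

sizes = candidate_kernel_sizes()            # 16 sizes: 1x1, 2x1, ..., 8x8
print(num_region_candidates(8, 8, sizes))   # 441 for an 8 x 8 feature map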
The obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplace operator to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Because objects are generally considered to have well-closed boundaries, the receptive field obtained from a region of interest (RoI) alone is not sufficient for object recognition. The method for determining the region where the object is located from the image provided by this embodiment therefore enlarges the size of the convolution kernel.
The size information of the candidate convolution kernel comprises height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the amplified size information of the candidate convolution kernel comprises the amplified height information of the candidate convolution kernel and the amplified width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel according to the size of the negative laplacian operator to obtain the amplified size information of the candidate convolution kernel includes: and according to the height of the negative Laplace operator, amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and according to the width of the negative Laplace operator, amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, including:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Since the enlargement of the convolution kernel may cause misalignment in the convolution calculation, a zero-padding operation needs to be performed on the feature map data. Please refer to (c) and (d) in fig. 5. Taking fig. 5(d) as an example, pad(2, 4) indicates that, on the basis of the 8 × 8 feature map data, 2 rows are added in the height direction on each of the top and bottom, and 4 columns are added in the width direction on each of the left and right. The added rows and columns are filled with zeros.
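A minimal sketch of this padding rule, assuming (consistently with the pad(2, 4) example) that ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns of zeros are added on each side:

import numpy as np

def pad_for_kernel(x, kh, kw, k=3):
    # Zero-pad an (h, w) feature map so that a kernel enlarged to
    # (k*kh) x (k*kw) yields the same output resolution as the
    # original kh x kw kernel.
    ph, pw = (k - 1) // 2 * kh, (k - 1) // 2 * kw
    return np.pad(x, ((ph, ph), (pw, pw)), mode="constant")

x = np.ones((8, 8))
print(pad_for_kernel(x, 2, 4).shape)   # (12, 16): pad(2, 4) around 8 x 8 data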
Performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative laplacian operator, and the method comprises the following steps:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes, wherein weights of the first number of convolution kernels with different sizes are negative Laplace operators;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Assuming that the height and width of a given negative Laplacian operator are both k, to obtain a field of view enlarged k × k times, a convolution kernel of size (k·h_θ) × (k·w_θ) may be used, with a zero-padding process of ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns performed on the feature map data. According to this analysis, the BING-RPN provided by the present embodiment can be expressed using the following formula:
BING(x; Θ, k) = {Conv_depthwise(x′; θ) | θ ∈ Θ(k)}    (Equation 1)
where x′ is the data obtained after performing zero-padding of ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns on the input feature map data x, k is the magnification ratio of the field of view, and Θ(k) is the set of convolution kernels that varies with k. When k = 1, the algorithm is equivalent to the BING algorithm without zero-padding. Please refer to fig. 4. In fig. 4, the bottom is the 8 × 8 × C feature map data; the second-to-last row shows the padding sizes: for example, a padding size of (2, 1) represents adding two rows on each of the top and bottom and one column on each of the left and right, with the added data filled with zeros. The third-to-last row shows the 16 convolution kernel sizes 3 × 3, 6 × 3, 12 × 3, 24 × 3, 3 × 6, 6 × 6, 12 × 6, 24 × 6, 3 × 12, 6 × 12, 12 × 12, 24 × 12, 3 × 24, 6 × 24, 12 × 24, and 24 × 24, obtained by enlarging the previously obtained convolution kernels of sizes 1 × 1, 2 × 1, 4 × 1, 8 × 1, 1 × 2, 2 × 2, 4 × 2, 8 × 2, 1 × 4, 2 × 4, 4 × 4, 8 × 4, 1 × 8, 2 × 8, 4 × 8, and 8 × 8 with k = 3. The weights of the convolution kernels adopt a manually specified negative Laplacian operator.
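The following sketch illustrates Equation 1 with a plain depthwise (per-channel) convolution in NumPy/SciPy; the kernel list, shapes, and padding convention are illustrative assumptions, not the patent's implementation.

import numpy as np
from scipy.signal import correlate2d

def bing(x, kernels, k=3):
    # Equation 1 sketch: depthwise convolution of the zero-padded feature map
    # x' with each enlarged kernel theta in Theta(k). x has shape (h, w, C);
    # each entry of kernels is (theta, (kh, kw)), theta of shape (k*kh, k*kw).
    outputs = []
    for theta, (kh, kw) in kernels:
        ph, pw = (k - 1) // 2 * kh, (k - 1) // 2 * kw
        xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)), mode="constant")
        out = np.stack([correlate2d(xp[:, :, c], theta, mode="valid")
                        for c in range(x.shape[2])], axis=-1)
        outputs.append(out)   # shape (h - kh + 1, w - kw + 1, C)
    return outputs

x = np.random.rand(8, 8, 4)
theta = np.ones((6, 3))                      # an enlarged 2 x 1 kernel, k = 3
print(bing(x, [(theta, (2, 1))])[0].shape)   # (7, 8, 4)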
In existing BING implementations, a trained model is used to obtain the object scores. This embodiment finds that calculating the object scores using the negative Laplacian operator is feasible, which has been verified on the ImageNet 2012 experimental dataset commonly used in the industry. Based on the above analysis, the convolution kernel can be selected as follows:
Θ_obj′ = {−L_(σ_x, σ_y)}    (Equation 2)
where L_(σ_x, σ_y) is a two-dimensional Laplace filter, and the variables σ_x, σ_y are proportional to the size of the Laplace filter. For example, [σ_x, σ_y] = [1.4, 1.4] may be set, in which case the size of the corresponding convolution kernel is 9 × 9. Substituting the θ_obj expressed by Equation 2 into Equation 1 yields a vector of dimension N × C, each element of which corresponds to an object score in a channel. The Laplace filter has a minimum size of 3 × 3, so k = 3 is used to maintain the resolution of the output, i.e., BING(x; Θ_obj′, 3), which is shown in fig. 4. As can be seen from fig. 4, the calculation process is as follows: first, the feature map data is obtained; padding is then performed; convolution operations are performed with the convolution kernels of different sizes; and the obtained convolution results are synthesized to obtain the object scores.
The obtaining the object scores in the feature map data according to the convolution results corresponding to the convolution kernels of the first number and different sizes includes:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
The synthesizing the convolution results corresponding to the convolution kernels with the first number of different sizes to obtain the object scores in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
The region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
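A sketch of the splicing-and-scoring step described above, assuming each convolution result is flattened to one row per region of interest and that the squashing to (0, 1) is a sigmoid (the text only requires scores between 0 and 1):

import numpy as np

def object_scores(conv_results):
    # conv_results: list of arrays of shape (h_i, w_i, C), one per kernel size.
    # Splice them into one (N, C) matrix, one row per region of interest.
    spliced = np.concatenate([r.reshape(-1, r.shape[-1]) for r in conv_results])
    return 1.0 / (1.0 + np.exp(-spliced))   # (N, C) object scores in (0, 1)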
Step S104: and determining the area of the object in the image to be recognized according to the object score.
The step is used for determining the area where the object in the image to be identified is located according to the object score.
The determining the area where the object in the image to be recognized is located according to the object score comprises the following steps:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
Referring to fig. 3, the target area is the box containing the dog.
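A simplified stand-in for step S104 is sketched below, assuming the target area is the region of interest with the highest object score and that feature-map coordinates map back to the image by the backbone stride (here taken as 32); both assumptions are illustrative.

import numpy as np

def target_area(scores, rois, stride=32):
    # scores: (N,) object scores; rois: (N, 4) boxes (x, y, w, h) on the
    # feature map. Returns the selected box in image coordinates.
    i = int(np.argmax(scores))
    return tuple(v * stride for v in rois[i])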
The above embodiment provides a method for determining the area where an object is located from an image; correspondingly, the present application also provides an apparatus for determining the area where an object is located from an image. Please refer to fig. 6, which is a schematic diagram of an apparatus embodiment of the present application. Since this second embodiment is substantially similar to the method embodiment, the description is relatively simple; for relevant details, reference may be made to the corresponding parts of the method embodiment. The apparatus embodiment described below is merely illustrative.
The device for determining the area where the object is located from the image comprises:
an image acquisition unit 601 configured to acquire an image to be recognized;
a feature obtaining unit 602, configured to obtain feature map data of the image to be identified;
an object score obtaining unit 603, configured to obtain an object score in the feature map data by using a negative laplacian operator;
a determining unit 604, configured to determine, according to the object score, an area where an object in the image to be identified is located.
In this embodiment, the feature obtaining unit is specifically configured to:
and performing feature extraction on the object to be identified by using a convolutional neural network to obtain feature map data of the object to be identified.
In this embodiment, the object obtaining unit is specifically configured to:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
In this embodiment, the apparatus for determining the region where the object is located from the image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
In this embodiment, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplace operator to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the height information of the enlarged candidate convolution kernel and the width information of the enlarged candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel according to the size of the negative laplacian operator to obtain the amplified size information of the candidate convolution kernel includes: and according to the height of the negative Laplace operator, amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and according to the width of the negative Laplace operator, amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
In this embodiment, the convolution kernel obtaining unit is further configured to:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
In this embodiment, the object obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes, wherein weights of the first number of convolution kernels with different sizes are negative Laplace operators;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
In this embodiment, the object obtaining unit further includes convolution kernel obtaining units with different sizes, and the convolution kernel obtaining units with different sizes are configured to:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the object obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
In this embodiment, the object obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
In this embodiment, the object obtaining unit is further configured to:
the region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
In this embodiment, the determining unit is specifically configured to:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area where the object in the image to be recognized is located according to the target area.
A third embodiment of the present application provides an electronic apparatus, including:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
A fourth embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The fifth embodiment of the present application provides a method for determining the area where an object is located from an object to be recognized. This embodiment is substantially similar to the first embodiment, so the description is relatively simple; please refer to the relevant parts of the first embodiment for details.
The application provides a method for determining the area where an object is located from an object to be recognized, which comprises the following steps:
acquiring an object to be identified;
acquiring characteristic diagram data of the object to be identified;
obtaining object scores in the feature map data by using a negative Laplacian operator;
and determining the area where the object in the object to be recognized is located according to the object scores. The object to be recognized can be a static picture, or a video captured by video acquisition equipment such as a camera.
In this embodiment, the obtaining the object score in the feature map data by using the negative laplacian includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
In this embodiment, the method for determining the area where the object is located from the object to be recognized further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the appended claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (20)

1. A method for determining a region of an object from an image, comprising:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
2. The method according to claim 1, wherein the obtaining of the feature map data of the image to be recognized comprises:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
3. The method according to claim 1, wherein the obtaining the object score in the feature map data comprises:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
4. The method of claim 3, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
5. The method of claim 4, wherein the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
6. The method according to claim 4, wherein the obtaining the convolution kernel according to the size information of the candidate convolution kernel comprises:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
7. The method according to claim 6, wherein the size information of the candidate convolution kernel comprises height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel comprises the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel includes: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
8. The method according to claim 4, wherein the convolving the feature map data with a convolution kernel to obtain a convolution result comprises:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
9. The method according to claim 3, wherein the obtaining a convolution result by performing a convolution operation on the feature map data and a convolution kernel comprises:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
10. The method of claim 9, further comprising:
obtaining the first number of convolution kernels of different sizes according to an attribute of the feature map data.
11. The method according to claim 9, wherein the obtaining the object score in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes comprises:
synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object score in the feature map data.
12. The method according to claim 11, wherein the synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object score in the feature map data comprises:
concatenating (splicing) the convolution results corresponding to the convolution kernels of different sizes to obtain a spliced feature vector of the feature map data;
and obtaining an object score of a region of interest in the feature map data according to the spliced feature vector, wherein the region of interest is a region of the feature map data that may contain the object.
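The sketch below strings claims 9-12 together: convolve the feature map with several differently sized kernels, concatenate ("splice") the flattened results, and score the spliced vector. The linear scorer and its weights are hypothetical additions; the claims do not say how the spliced vector is turned into a score.

```python
import numpy as np
from scipy.signal import correlate2d

# Hypothetical sketch of claims 9-12. 'same'-mode correlation keeps each
# result at the feature map's spatial size, so the spliced vector has a
# fixed length regardless of kernel size.

def object_score(fmap, kernels, weights, bias=0.0):
    results = [correlate2d(fmap, k, mode="same") for k in kernels]
    spliced = np.concatenate([r.ravel() for r in results])  # spliced feature vector
    return float(spliced @ weights + bias)                  # scalar object score

fmap = np.random.rand(8, 8)
kernels = [np.ones((h, w)) for h, w in [(1, 1), (2, 2), (4, 4)]]  # first number = 3
weights = np.random.rand(3 * 8 * 8)  # hypothetical learned scoring weights
print(object_score(fmap, kernels, weights))
```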
13. The method of claim 12, wherein the region of interest is determined from the feature map data based on a size of a convolution kernel associated with the feature map data.
14. The method according to claim 12, wherein the determining the area of the object in the image to be recognized according to the object score comprises:
determining, from the region of interest according to the object score, a target area containing the object in the feature map data;
and determining the area of the object in the image to be recognized according to the target area.
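A small sketch of claims 13-14 under explicit assumptions: the highest-scoring region of interest is taken as the target area, and feature-map coordinates are mapped back to image coordinates by the network's downsampling stride. Both the argmax selection and the stride-based mapping are illustrative choices; the claims only say the image region is determined from the target area.

```python
# Hypothetical sketch of claims 13-14. The argmax selection and the
# stride-based coordinate mapping are assumptions for illustration.

def best_region_in_image(rois, scores, stride=16):
    """rois: (x, y, w, h) boxes in feature-map coordinates."""
    idx = max(range(len(scores)), key=scores.__getitem__)  # target area: best score
    x, y, w, h = rois[idx]
    return (x * stride, y * stride, w * stride, h * stride)  # image coordinates

rois = [(0, 0, 2, 2), (1, 1, 4, 4)]
print(best_region_in_image(rois, [0.3, 0.9]))  # -> (16, 16, 64, 64)
```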
15. An apparatus for determining a region of an object from an image, comprising:
an image acquisition unit, configured to acquire an image to be identified;
a feature acquisition unit, configured to acquire feature map data of the image to be identified;
an object score obtaining unit, configured to obtain an object score in the feature map data;
and a determining unit, configured to determine, according to the object score, the area where the object in the image to be identified is located.
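One way to read the apparatus of claim 15 is as a pipeline of four responsibilities; the sketch below models it as a class whose unit implementations are injected. All names and the callable interfaces are assumptions, since the claim only names the units and their roles.

```python
# Hypothetical sketch of the claim-15 apparatus as a pipeline class.
# The injected callables stand in for the units; their internals are
# not specified by the claim.

class ObjectRegionDetector:
    def __init__(self, acquire_image, extract_features, score_objects, decode_region):
        self.acquire_image = acquire_image        # image acquisition unit
        self.extract_features = extract_features  # feature acquisition unit
        self.score_objects = score_objects        # object score obtaining unit
        self.decode_region = decode_region        # determining unit

    def detect(self, source):
        image = self.acquire_image(source)
        fmap = self.extract_features(image)
        scores = self.score_objects(fmap)
        return self.decode_region(fmap, scores)
```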
16. An electronic device, comprising:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
17. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
18. A method for determining an area where an object is located from an object to be identified, comprising:
acquiring an object to be identified;
acquiring feature map data of the object to be identified;
obtaining an object score in the feature map data;
and determining, according to the object score, the area where the object is located in the object to be identified.
19. The method for determining the area where the object is located from the object to be identified according to claim 18, wherein the obtaining the object score in the feature map data comprises:
performing a convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
20. The method for determining the area where the object is located from the object to be identified according to claim 19, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing a multiplication operation by a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing a multiplication operation by a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
CN201910106122.1A 2019-02-01 2019-02-01 Method and device for determining area of object from image Active CN111523533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910106122.1A CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Publications (2)

Publication Number Publication Date
CN111523533A (en) 2020-08-11
CN111523533B (en) 2023-07-07

Family

ID=71900036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910106122.1A Active CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Country Status (1)

Country Link
CN (1) CN111523533B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378873A (en) * 2021-01-13 2021-09-10 杭州小创科技有限公司 Algorithm for determining attribution or classification of target object

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063719A (en) * 2014-06-27 2014-09-24 深圳市赛为智能股份有限公司 Method and device for pedestrian detection based on depth convolutional network
CN104933722A (en) * 2015-06-29 2015-09-23 电子科技大学 Image edge detection method based on Spiking-convolution network model
CN107315995A (en) * 2017-05-18 2017-11-03 中国科学院上海微系统与信息技术研究所 A kind of face identification method based on Laplce's logarithm face and convolutional neural networks
CN109086656A (en) * 2018-06-06 2018-12-25 平安科技(深圳)有限公司 Airport foreign matter detecting method, device, computer equipment and storage medium

Similar Documents

Publication Title
JP2019087252A (en) Apparatus and method for performing deconvolution operation in neural network
CN111091123A (en) Text region detection method and equipment
US9053540B2 (en) Stereo matching by census transform and support weight cost aggregation
US20130287250A1 (en) Method and apparatus for tracking object in image data, and storage medium storing the same
CN110852349A (en) Image processing method, detection method, related equipment and storage medium
CN111696110B (en) Scene segmentation method and system
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN114998595B (en) Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN110827292B (en) Video instance segmentation method and device based on convolutional neural network
CN114359665A (en) Training method and device of full-task face recognition model and face recognition method
CN113689434A (en) Image semantic segmentation method based on strip pooling
CN110209863B (en) Method and equipment for searching similar pictures
CN111523533B (en) Method and device for determining area of object from image
CN114241388A (en) Video instance segmentation method and segmentation device based on space-time memory information
CN111027551B (en) Image processing method, apparatus and medium
CN116091784A (en) Target tracking method, device and storage medium
CN113963236A (en) Target detection method and device
US20200372280A1 (en) Apparatus and method for image processing for machine learning
CN114724175A (en) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN113947524A (en) Panoramic picture saliency prediction method and device based on full-convolution graph neural network
JP2018010359A (en) Information processor, information processing method, and program
CN111626305B (en) Target detection method, device and equipment
CN112634286A (en) Image cropping method and device
CN113362351A (en) Image processing method and device, electronic equipment and storage medium
CN110866431B (en) Training method of face recognition model, and face recognition method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant