CN111523533A - Method and device for determining region of object from image

Method and device for determining region of object from image

Info

Publication number
CN111523533A
Authority
CN
China
Prior art keywords
convolution kernel
candidate
map data
feature map
convolution
Prior art date
Legal status
Granted
Application number
CN201910106122.1A
Other languages
Chinese (zh)
Other versions
CN111523533B (en)
Inventor
杨攸奕
武元琪
李名杨
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910106122.1A priority Critical patent/CN111523533B/en
Publication of CN111523533A publication Critical patent/CN111523533A/en
Application granted granted Critical
Publication of CN111523533B publication Critical patent/CN111523533B/en
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The application provides a method and a device for determining the area where an object is located from an image. The method for determining the area where the object is located from the image comprises the following steps: acquiring an image to be recognized; acquiring feature map data of the image to be recognized; obtaining object scores in the feature map data; and determining the area where the object in the image to be recognized is located according to the object scores. By this method, the calculation efficiency of object recognition is improved.

Description

Method and device for determining region of object from image
Technical Field
The present application relates to the field of object recognition, and in particular, to a method and an apparatus for determining an area where an object is located from an image.
Background
At present, region-based object recognition methods remain the mainstream in the field of object recognition. Most state-of-the-art region-based object detectors are implemented using a region-of-interest pooling layer in conjunction with a region candidate network.
By learning embedded spatial information from higher-order Convolutional Neural Network (CNN) features, region-based object detectors have made tremendous advances over traditional non-CNN methods such as Deformable Part Models (DPMs). More recently, Region-based Fully Convolutional Networks (R-FCNs) reduce the overhead of repeated prediction by applying a region-of-interest pooling layer to the final FCN (Fully Convolutional Network) feature map, further improving the performance and computational efficiency of object recognition.
However, in these current region-based object recognition methods, a large amount of box-annotated training data must be used for model training in order to obtain object scores, which results in low calculation efficiency of object recognition. Especially in advanced region candidate networks (RPNs), when the size of the convolution kernel of the CNN used increases, the amount of computation for the training parameters becomes very large, further reducing the efficiency of object recognition calculation.
Disclosure of Invention
The application provides a method and a device for determining an area where an object is located from an image, so as to improve the calculation efficiency of object identification.
The application provides a method for determining the area of an object from an image, which comprises the following steps:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
Optionally, the acquiring the feature map data of the image to be recognized includes:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
Optionally, obtaining the object score in the feature map data includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the method for determining the region where the object is located from the image further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes height information of the enlarged candidate convolution kernel and width information of the enlarged candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel, including: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Optionally, the method for determining the region where the object is located from the image further includes:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the obtaining, according to the convolution results corresponding to the convolution kernels with the first number and different sizes, the object score in the feature map data includes:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
Optionally, the synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain the object score in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
Optionally, the region of interest is determined in the feature map data according to a size of a convolution kernel corresponding to the feature map data.
Optionally, the determining, according to the object score, the area where the object in the image to be recognized is located includes:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
The application provides a device for determining the area of an object from an image, which comprises:
the image acquisition unit is used for acquiring an image to be identified;
the characteristic acquisition unit is used for acquiring characteristic map data of the image to be identified;
an object score obtaining unit, configured to obtain an object score in the feature map data;
and the determining unit is used for determining the area where the object in the image to be identified is located according to the object score.
Optionally, the feature obtaining unit is specifically configured to:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
Optionally, the object obtaining unit is specifically configured to:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the apparatus for determining the region where the object is located from the image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Optionally, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel includes: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Optionally, the convolution kernel obtaining unit is further configured to:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Optionally, the object obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Optionally, the object obtaining unit further includes convolution kernel obtaining units with different sizes, and the convolution kernel obtaining units with different sizes are configured to:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the object obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
Optionally, the object obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
Optionally, the object obtaining unit is further configured to:
the region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
Optionally, the determining unit is specifically configured to:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
The application provides an electronic device, the electronic device includes:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The application provides a method for determining the area of an object from an object to be identified, which comprises the following steps:
acquiring an object to be identified;
acquiring characteristic diagram data of the object to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the object to be recognized according to the object score.
Optionally, the obtaining the object score in the feature map data includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
Optionally, the method for determining the area where the object is located from the object to be recognized further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Optionally, the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
Optionally, the performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result includes:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Compared with the prior art, the method has the following advantages:
by adopting the method for determining the area where the object is located from the image, the object score in the feature map data is directly obtained, so that in the process of object recognition, a large amount of frame marking training data is not needed for model training, and the calculation efficiency of object recognition is improved.
Drawings
FIG. 1 is a flow chart of a method of a first embodiment of the present application;
FIG. 2 is a schematic diagram of a first embodiment of the present application for determining an area of an object from an image;
FIG. 3 is a network architecture diagram of a BING (Binarized Normed Gradients)-RPN (Region Candidate Network) according to the first embodiment of the present application;
FIG. 4 is a schematic diagram of an object recognition branch of the BING-RPN network according to the first embodiment of the present application;
FIG. 5 is a diagram illustrating convolution calculations in the BING-RPN network according to the first embodiment of the present application;
FIG. 6 is a schematic view of an apparatus according to a second embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the present application; therefore, the present application is not limited to the specific implementations disclosed below.
The first embodiment of the application provides a method for determining the area where an object is located from an image. Please refer to fig. 1, which is a flowchart illustrating a method according to a first embodiment of the present application. A method for determining a region where an object is located from an image according to a first embodiment of the present application is described in detail below with reference to fig. 1. The method comprises the following steps:
step S101: and acquiring an image to be identified.
This step is used to obtain the image to be identified.
Please refer to fig. 2, which is a schematic diagram illustrating the process of determining the area of an object from an image according to the present embodiment. The first row in fig. 2 provides an image containing a bird against a background of green grass. Generally, the bird in the image is taken as the object to be recognized. By adopting the method provided by the present embodiment, a target area containing the object to be recognized is obtained from the image, and the object is then recognized.
Please refer to fig. 3, which is a network architecture diagram of the BING-RPN according to the method of the present embodiment. In fig. 3, the image to be recognized is the leftmost image in the figure, which contains a dog.
Unlike conventional BING, the present embodiment provides an improved version of the BING method; please refer to fig. 4 and 5. Fig. 4 is a schematic diagram of the object recognition branch of the BING-RPN network provided in this embodiment. Fig. 5 is a diagram of the convolution calculations in the BING-RPN network. The following steps are described in more detail with reference to fig. 4 and 5.
Step S102: and acquiring the characteristic diagram data of the image to be identified.
This step is used to obtain the feature map data of the image to be recognized.
The acquiring of the feature map data of the image to be recognized includes:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
The convolutional neural network in this embodiment may be a commonly used ResNet, Inception V3, Inception V4, or the like. Referring to fig. 3, the backbone CNN network is the convolutional neural network.
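As an illustration of this feature-extraction step, the following minimal sketch keeps only the convolutional trunk of a torchvision ResNet-50. The choice of backbone, input size, and layer split are assumptions for the example; the embodiment only names ResNet and Inception as options.

import torch
import torchvision.models as models

# Keep only the convolutional trunk of a (hypothetical) ResNet-50 backbone;
# the last two children (global average pool and fc classifier) are dropped.
backbone = models.resnet50(weights=None)
features = torch.nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 256, 256)   # image to be recognized, as a batch of 1
fmap = features(x)                # feature map data of the image
print(fmap.shape)                 # torch.Size([1, 2048, 8, 8])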
Step S103: and obtaining the object score in the characteristic map data.
The step is used for obtaining the object score in the feature map data by using a negative Laplacian operator.
The Laplacian operator is a tool commonly used in image processing for sharpening images. The method provided by this embodiment uses the negative Laplacian operator in the calculation of the object scores and thereby obtains high object recognition efficiency.
The object score is a concept used in RPN networks: it refers to the probability that a candidate region in the feature map data corresponding to the image to be recognized contains the object to be recognized. The object score takes a value between 0 and 1; the larger the value, the greater the probability that the region contains the object to be recognized.
The obtaining the object score in the feature map data by using the negative laplacian operator includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
The feature map data x in the present embodiment can be expressed by the following formula:
x ∈ R^(h × w × C)
where h denotes the height of the feature map data, w denotes the width of the feature map data, and C denotes the number of channels of the feature map data. For example, the feature map data x may be an 8 × 8 × 1536 data block as produced by Inception-V4.
In the BING-RPN provided in this embodiment, the feature map data x is subjected to multiple parallel depthwise convolution operations, and the convolution kernels of the depthwise convolution operations are obtained using the following formula:
Θ = {θ_(h_θ × w_θ)}
where
h_θ = 2^i, i ∈ {0, 1, 2, 3}
and
w_θ = 2^j, j ∈ {0, 1, 2, 3}
Referring to fig. 4, the BING-based RPN provided in this embodiment outputs N C-dimensional region candidates, where N is obtained using the following formula:
N = Σ_(θ ∈ Θ) (h − h_θ + 1) × (w − w_θ + 1)
where h is the height of the input feature map data, w is the width of the input feature map data, and h_θ, w_θ are the height and width of the convolution kernels of different sizes obtained by doubling from 1 × 1 as described above.
The method for determining the area where the object is located from the image further comprises the following steps:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Through the above formula for obtaining N, it can be seen that the first number N can be obtained from the height and width data of the feature map data.
The method for determining the area where the object is located from the image further comprises the following steps:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
The initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
For example, convolution kernel sizes of 1 × 1, 2 × 1, 4 × 1, 8 × 1, 1 × 2, 2 × 2, 4 × 2, 8 × 2, 1 × 4, 2 × 4, 4 × 4, 8 × 4, 1 × 8, 2 × 8, 4 × 8, and 8 × 8 can be obtained according to the above method. By adopting this search strategy, in which the kernel sizes are doubled starting from 1 × 1, the efficiency of obtaining the convolution kernels is improved. Please refer to (a) and (b) in fig. 5. In fig. 5, the bold boxes represent convolution kernels, and the notation (0, 0, 1, 1) on the right represents (x, y, convolution kernel width, convolution kernel height), where x, y are the coordinates of the top-left corner of the box.
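For illustration, the following Python sketch (not part of the patent) enumerates the 16 candidate kernel sizes by doubling from 1 × 1, and counts the N region candidates under the reading, given above, that N sums the valid sliding positions of every kernel over an h × w feature map.

def candidate_kernel_sizes(init_h=1, init_w=1, mult_h=2, mult_w=2, steps=4):
    # Doubling search: heights and widths take the values 1, 2, 4, 8
    # (initial size 1 x 1, first and second multiples both 2).
    return [(init_h * mult_h ** i, init_w * mult_w ** j)
            for j in range(steps) for i in range(steps)]

def num_region_candidates(h, w, sizes):
    # N = sum over kernel sizes of the number of valid sliding positions.
    return sum((h - kh + 1) * (w - kw + 1) for kh, kw in sizes)

sizes = candidate_kernel_sizes()            # 16 sizes: 1x1, 2x1, ..., 8x8
print(num_region_candidates(8, 8, sizes))   # 441 for an 8 x 8 feature map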
The obtaining the convolution kernel according to the size information of the candidate convolution kernel includes:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplace operator to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
Because objects are generally considered to have well-closed boundaries, the receptive field obtained from a region of interest (RoI) alone is not sufficient for object recognition. The method for determining the region where the object is located from the image provided by this embodiment therefore enlarges the size of the convolution kernel.
The size information of the candidate convolution kernel comprises height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the amplified size information of the candidate convolution kernel comprises the amplified height information of the candidate convolution kernel and the amplified width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel according to the size of the negative laplacian operator to obtain the amplified size information of the candidate convolution kernel includes: and according to the height of the negative Laplace operator, amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and according to the width of the negative Laplace operator, amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
Performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, including:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
Since the enlargement of the convolution kernel may cause misalignment in the convolution calculation, a zero-padding operation needs to be performed on the feature map data. Please refer to (c) and (d) in fig. 5. Taking fig. 5(d) as an example, pad(2, 4) indicates that, on the basis of the 8 × 8 feature map data, 2 rows are added in the height direction on each of the top and bottom, and 4 columns are added in the width direction on each of the left and right. The added rows and columns are filled with zeros.
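A minimal sketch of this padding rule, assuming (consistently with the pad(2, 4) example) that ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns of zeros are added on each side:

import numpy as np

def pad_for_kernel(x, kh, kw, k=3):
    # Zero-pad an (h, w) feature map so that a kernel enlarged to
    # (k*kh) x (k*kw) yields the same output resolution as the
    # original kh x kw kernel.
    ph, pw = (k - 1) // 2 * kh, (k - 1) // 2 * kw
    return np.pad(x, ((ph, ph), (pw, pw)), mode="constant")

x = np.ones((8, 8))
print(pad_for_kernel(x, 2, 4).shape)   # (12, 16): pad(2, 4) around 8 x 8 data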
Performing convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative laplacian operator, and the method comprises the following steps:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes, wherein weights of the first number of convolution kernels with different sizes are negative Laplace operators;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
Assuming that the height and width of a given negative Laplacian operator are both k, to obtain a field of view enlarged k × k times, a convolution kernel of size (k·h_θ) × (k·w_θ) may be used, with a zero-padding process of ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns performed on the feature map data. According to this analysis, the BING-RPN provided by the present embodiment can be expressed using the following formula:
BING(x; Θ, k) = {Conv_depthwise(x′; θ) | θ ∈ Θ(k)}    (Equation 1)
where x′ is the data obtained after performing zero-padding of ((k − 1)/2)·h_θ rows and ((k − 1)/2)·w_θ columns on the input feature map data x, k is the magnification ratio of the field of view, and Θ(k) is the set of convolution kernels that varies with k. When k = 1, the algorithm is equivalent to the BING algorithm without zero-padding. Please refer to fig. 4. In fig. 4, the bottom is the 8 × 8 × C feature map data; the second-to-last row shows the padding sizes: for example, a padding size of (2, 1) represents adding two rows on each of the top and bottom and one column on each of the left and right, with the added data filled with zeros. The third-to-last row shows the 16 convolution kernel sizes 3 × 3, 6 × 3, 12 × 3, 24 × 3, 3 × 6, 6 × 6, 12 × 6, 24 × 6, 3 × 12, 6 × 12, 12 × 12, 24 × 12, 3 × 24, 6 × 24, 12 × 24, and 24 × 24, obtained by enlarging the previously obtained convolution kernels of sizes 1 × 1, 2 × 1, 4 × 1, 8 × 1, 1 × 2, 2 × 2, 4 × 2, 8 × 2, 1 × 4, 2 × 4, 4 × 4, 8 × 4, 1 × 8, 2 × 8, 4 × 8, and 8 × 8 with k = 3. The weights of the convolution kernels adopt a manually specified negative Laplacian operator.
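The following sketch illustrates Equation 1 with a plain depthwise (per-channel) convolution in NumPy/SciPy; the kernel list, shapes, and padding convention are illustrative assumptions, not the patent's implementation.

import numpy as np
from scipy.signal import correlate2d

def bing(x, kernels, k=3):
    # Equation 1 sketch: depthwise convolution of the zero-padded feature map
    # x' with each enlarged kernel theta in Theta(k). x has shape (h, w, C);
    # each entry of kernels is (theta, (kh, kw)), theta of shape (k*kh, k*kw).
    outputs = []
    for theta, (kh, kw) in kernels:
        ph, pw = (k - 1) // 2 * kh, (k - 1) // 2 * kw
        xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)), mode="constant")
        out = np.stack([correlate2d(xp[:, :, c], theta, mode="valid")
                        for c in range(x.shape[2])], axis=-1)
        outputs.append(out)   # shape (h - kh + 1, w - kw + 1, C)
    return outputs

x = np.random.rand(8, 8, 4)
theta = np.ones((6, 3))                      # an enlarged 2 x 1 kernel, k = 3
print(bing(x, [(theta, (2, 1))])[0].shape)   # (7, 8, 4)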
In existing BING implementations, a trained model is used to obtain the object scores. This embodiment finds that calculating the object scores using the negative Laplacian operator is feasible, which has been verified on the ImageNet 2012 experimental dataset commonly used in the industry. Based on the above analysis, the convolution kernel can be selected as follows:
Θ_obj′ = {−L_(σ_x, σ_y)}    (Equation 2)
where L_(σ_x, σ_y) is a two-dimensional Laplace filter, and the variables σ_x, σ_y are proportional to the size of the Laplace filter. For example, [σ_x, σ_y] = [1.4, 1.4] may be set, in which case the size of the corresponding convolution kernel is 9 × 9. Substituting the θ_obj expressed by Equation 2 into Equation 1 yields a vector of dimension N × C, each element of which corresponds to an object score in a channel. The Laplace filter has a minimum size of 3 × 3, so k = 3 is used to maintain the resolution of the output, i.e., BING(x; Θ_obj′, 3), which is shown in fig. 4. As can be seen from fig. 4, the calculation process is as follows: first, the feature map data is obtained; padding is then performed; convolution operations are performed with the convolution kernels of different sizes; and the obtained convolution results are synthesized to obtain the object scores.
The obtaining the object scores in the feature map data according to the convolution results corresponding to the convolution kernels of the first number and different sizes includes:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
The synthesizing the convolution results corresponding to the convolution kernels with the first number of different sizes to obtain the object scores in the feature map data includes:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
The region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
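A sketch of the splicing-and-scoring step described above, assuming each convolution result is flattened to one row per region of interest and that the squashing to (0, 1) is a sigmoid (the text only requires scores between 0 and 1):

import numpy as np

def object_scores(conv_results):
    # conv_results: list of arrays of shape (h_i, w_i, C), one per kernel size.
    # Splice them into one (N, C) matrix, one row per region of interest.
    spliced = np.concatenate([r.reshape(-1, r.shape[-1]) for r in conv_results])
    return 1.0 / (1.0 + np.exp(-spliced))   # (N, C) object scores in (0, 1)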
Step S104: and determining the area of the object in the image to be recognized according to the object score.
The step is used for determining the area where the object in the image to be identified is located according to the object score.
The determining the area where the object in the image to be recognized is located according to the object score comprises the following steps:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area of the object in the image to be recognized according to the target area.
Referring to fig. 3, the target area is the box containing the dog.
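A simplified stand-in for step S104 is sketched below, assuming the target area is the region of interest with the highest object score and that feature-map coordinates map back to the image by the backbone stride (here taken as 32); both assumptions are illustrative.

import numpy as np

def target_area(scores, rois, stride=32):
    # scores: (N,) object scores; rois: (N, 4) boxes (x, y, w, h) on the
    # feature map. Returns the selected box in image coordinates.
    i = int(np.argmax(scores))
    return tuple(v * stride for v in rois[i])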
The above embodiment provides a method for determining the area where an object is located from an image; correspondingly, the present application also provides an apparatus for determining the area where an object is located from an image. Please refer to fig. 6, which is a schematic diagram of an apparatus embodiment of the present application. Since this second embodiment is substantially similar to the method embodiment, the description is relatively simple; for relevant details, reference may be made to the corresponding parts of the method embodiment. The apparatus embodiment described below is merely illustrative.
The device for determining the area where the object is located from the image comprises:
an image acquisition unit 601 configured to acquire an image to be recognized;
a feature obtaining unit 602, configured to obtain feature map data of the image to be identified;
an object score obtaining unit 603, configured to obtain an object score in the feature map data by using a negative laplacian operator;
a determining unit 604, configured to determine, according to the object score, an area where an object in the image to be identified is located.
In this embodiment, the feature obtaining unit is specifically configured to:
and performing feature extraction on the object to be identified by using a convolutional neural network to obtain feature map data of the object to be identified.
In this embodiment, the object obtaining unit is specifically configured to:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
In this embodiment, the apparatus for determining the region where the object is located from the image further includes a convolution kernel obtaining unit, configured to:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, an initial height of the candidate convolution kernel is 1, an initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
In this embodiment, the convolution kernel obtaining unit is further configured to:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel according to the size of the negative Laplace operator to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
In this embodiment, in the convolution kernel obtaining unit, the size information of the candidate convolution kernel includes height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel includes the height information of the enlarged candidate convolution kernel and the width information of the enlarged candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel according to the size of the negative laplacian operator to obtain the amplified size information of the candidate convolution kernel includes: and according to the height of the negative Laplace operator, amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and according to the width of the negative Laplace operator, amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
In this embodiment, the convolution kernel obtaining unit is further configured to:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
In this embodiment, the object obtaining unit is specifically configured to:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes, wherein weights of the first number of convolution kernels with different sizes are negative Laplace operators;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
In this embodiment, the object obtaining unit further includes convolution kernel obtaining units with different sizes, and the convolution kernel obtaining units with different sizes are configured to:
and obtaining the first number of convolution kernels with different sizes according to the attribute of the feature map data.
Optionally, the object obtaining unit is specifically configured to:
and synthesizing the convolution results corresponding to the convolution kernels with different sizes in the first number to obtain object scores in the feature map data.
In this embodiment, the object obtaining unit is further configured to:
splicing convolution results corresponding to the convolution kernels with different sizes to obtain spliced feature vectors of the feature map data;
and obtaining an object score of an interest region in the feature map data according to the spliced feature vector, wherein the interest region is a region which may contain the object in the feature map data.
In this embodiment, the object obtaining unit is further configured to:
the region of interest is determined in the feature map data based on a size of a convolution kernel corresponding to the feature map data.
In this embodiment, the determining unit is specifically configured to:
according to the object score, determining a target area containing the object in the feature map data from the region of interest;
and determining the area where the object in the image to be recognized is located according to the target area.
A third embodiment of the present application provides an electronic apparatus, including:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
A fourth embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
The fifth embodiment of the present application provides a method for determining the area where an object is located from an object to be recognized. This embodiment is substantially similar to the first embodiment, so the description is relatively simple; please refer to the relevant parts of the first embodiment for details.
The application provides a method for determining the area where an object is located from an object to be recognized, which comprises the following steps:
acquiring an object to be identified;
acquiring characteristic diagram data of the object to be identified;
obtaining object scores in the feature map data by using a negative Laplacian operator;
and determining the area where the object in the object to be recognized is located according to the object scores. The object to be recognized can be a static picture, or a video captured by video acquisition equipment such as a camera.
In this embodiment, the obtaining the object score in the feature map data by using the negative laplacian includes:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result, wherein the weight of the convolution kernel is a negative Laplacian;
and obtaining the object score in the feature map data according to the convolution result.
In this embodiment, the method for determining the area where the object is located from the object to be recognized further includes:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
Although the present application has been described with reference to preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the appended claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (20)

1. A method for determining a region of an object from an image, comprising:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
2. The method according to claim 1, wherein the obtaining of the feature map data of the image to be recognized comprises:
and performing feature extraction on the image to be recognized by using a convolutional neural network to obtain feature map data of the image to be recognized.
3. The method according to claim 1, wherein the obtaining the object score in the feature map data comprises:
carrying out convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
4. The method of claim 3, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, and performing multiplication operation of a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing multiplication operation of a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
5. The method of claim 4, wherein the initial height of the candidate convolution kernel is 1, the initial width of the candidate convolution kernel is 1, the first multiple is 2, and the second multiple is 2.
6. The method according to claim 4, wherein the obtaining the convolution kernel according to the size information of the candidate convolution kernel comprises:
determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel;
amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel;
and obtaining the convolution kernel according to the amplified size information of the candidate convolution kernel.
7. The method according to claim 6, wherein the size information of the candidate convolution kernel comprises height information of the candidate convolution kernel and width information of the candidate convolution kernel, and the enlarged size information of the candidate convolution kernel comprises the enlarged height information of the candidate convolution kernel and the enlarged width information of the candidate convolution kernel;
the determining the size of the candidate convolution kernel according to the size information of the candidate convolution kernel includes: determining the height of the candidate convolution kernel according to the height information of the candidate convolution kernel, and determining the width of the candidate convolution kernel according to the width information of the candidate convolution kernel;
the amplifying the size of the candidate convolution kernel to obtain the amplified size information of the candidate convolution kernel includes: and amplifying the height of the candidate convolution kernel to obtain the amplified height information of the candidate convolution kernel, and amplifying the width of the candidate convolution kernel to obtain the amplified width information of the candidate convolution kernel.
8. The method according to claim 4, wherein the convolving the feature map data with a convolution kernel to obtain a convolution result comprises:
judging whether filling processing needs to be carried out on the feature map data according to the attribute of the feature map data and the convolution kernel;
if so, performing filling processing on the feature map data, and performing convolution operation on the filled feature map data and the convolution kernel to obtain a convolution result.
9. The method according to claim 3, wherein the obtaining a convolution result by performing a convolution operation on the feature map data and a convolution kernel comprises:
performing convolution operation on the feature map data and a first number of convolution kernels with different sizes respectively to obtain convolution results corresponding to the first number of convolution kernels with different sizes;
the obtaining the object score in the feature map data according to the convolution result includes: and obtaining object scores in the feature map data according to convolution results corresponding to the convolution kernels with the first number and different sizes.
10. The method of claim 9, further comprising:
obtaining the first number of convolution kernels of different sizes according to an attribute of the feature map data.
11. The method according to claim 9, wherein the obtaining the object score in the feature map data according to the convolution results corresponding to the first number of convolution kernels of different sizes comprises:
synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object score in the feature map data.
12. The method according to claim 11, wherein the synthesizing the convolution results corresponding to the first number of convolution kernels of different sizes to obtain the object score in the feature map data comprises:
concatenating (splicing) the convolution results corresponding to the convolution kernels of different sizes to obtain a spliced feature vector of the feature map data;
and obtaining an object score of a region of interest in the feature map data according to the spliced feature vector, wherein the region of interest is a region of the feature map data that may contain the object.
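The sketch below strings claims 9-12 together: convolve the feature map with several differently sized kernels, concatenate ("splice") the flattened results, and score the spliced vector. The linear scorer and its weights are hypothetical additions; the claims do not say how the spliced vector is turned into a score.

```python
import numpy as np
from scipy.signal import correlate2d

# Hypothetical sketch of claims 9-12. 'same'-mode correlation keeps each
# result at the feature map's spatial size, so the spliced vector has a
# fixed length regardless of kernel size.

def object_score(fmap, kernels, weights, bias=0.0):
    results = [correlate2d(fmap, k, mode="same") for k in kernels]
    spliced = np.concatenate([r.ravel() for r in results])  # spliced feature vector
    return float(spliced @ weights + bias)                  # scalar object score

fmap = np.random.rand(8, 8)
kernels = [np.ones((h, w)) for h, w in [(1, 1), (2, 2), (4, 4)]]  # first number = 3
weights = np.random.rand(3 * 8 * 8)  # hypothetical learned scoring weights
print(object_score(fmap, kernels, weights))
```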
13. The method of claim 12, wherein the region of interest is determined from the feature map data based on a size of a convolution kernel associated with the feature map data.
14. The method according to claim 12, wherein the determining the area of the object in the image to be recognized according to the object score comprises:
determining, from the region of interest according to the object score, a target area containing the object in the feature map data;
and determining the area of the object in the image to be recognized according to the target area.
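A small sketch of claims 13-14 under explicit assumptions: the highest-scoring region of interest is taken as the target area, and feature-map coordinates are mapped back to image coordinates by the network's downsampling stride. Both the argmax selection and the stride-based mapping are illustrative choices; the claims only say the image region is determined from the target area.

```python
# Hypothetical sketch of claims 13-14. The argmax selection and the
# stride-based coordinate mapping are assumptions for illustration.

def best_region_in_image(rois, scores, stride=16):
    """rois: (x, y, w, h) boxes in feature-map coordinates."""
    idx = max(range(len(scores)), key=scores.__getitem__)  # target area: best score
    x, y, w, h = rois[idx]
    return (x * stride, y * stride, w * stride, h * stride)  # image coordinates

rois = [(0, 0, 2, 2), (1, 1, 4, 4)]
print(best_region_in_image(rois, [0.3, 0.9]))  # -> (16, 16, 64, 64)
```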
15. An apparatus for determining a region of an object from an image, comprising:
an image acquisition unit, configured to acquire an image to be identified;
a feature acquisition unit, configured to acquire feature map data of the image to be identified;
an object score obtaining unit, configured to obtain an object score in the feature map data;
and a determining unit, configured to determine, according to the object score, the area where the object in the image to be identified is located.
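One way to read the apparatus of claim 15 is as a pipeline of four responsibilities; the sketch below models it as a class whose unit implementations are injected. All names and the callable interfaces are assumptions, since the claim only names the units and their roles.

```python
# Hypothetical sketch of the claim-15 apparatus as a pipeline class.
# The injected callables stand in for the units; their internals are
# not specified by the claim.

class ObjectRegionDetector:
    def __init__(self, acquire_image, extract_features, score_objects, decode_region):
        self.acquire_image = acquire_image        # image acquisition unit
        self.extract_features = extract_features  # feature acquisition unit
        self.score_objects = score_objects        # object score obtaining unit
        self.decode_region = decode_region        # determining unit

    def detect(self, source):
        image = self.acquire_image(source)
        fmap = self.extract_features(image)
        scores = self.score_objects(fmap)
        return self.decode_region(fmap, scores)
```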
16. An electronic device, comprising:
a processor;
a memory for storing a program which, when read and executed by the processor, performs the following operations:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
17. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the steps of:
acquiring an image to be identified;
acquiring feature map data of the image to be identified;
obtaining object scores in the feature map data;
and determining the area of the object in the image to be recognized according to the object score.
18. A method for determining an area where an object is located from an object to be identified, comprising:
acquiring an object to be identified;
acquiring feature map data of the object to be identified;
obtaining an object score in the feature map data;
and determining, according to the object score, the area where the object is located in the object to be identified.
19. The method for determining the area where the object is located from the object to be identified according to claim 18, wherein the obtaining the object score in the feature map data comprises:
performing a convolution operation on the feature map data and a convolution kernel to obtain a convolution result;
and obtaining the object score in the feature map data according to the convolution result.
20. The method for determining the area where the object is located from the object to be identified according to claim 19, further comprising:
determining an initial height of a candidate convolution kernel and an initial width of the candidate convolution kernel;
taking the initial height as a reference, performing a multiplication operation by a first multiple to obtain a candidate height of the candidate convolution kernel;
taking the initial width as a reference, performing a multiplication operation by a second multiple to obtain a candidate width of the candidate convolution kernel;
obtaining size information of the candidate convolution kernel according to the initial height, the initial width, the candidate height and the candidate width;
and obtaining the convolution kernel according to the size information of the candidate convolution kernel.
CN201910106122.1A 2019-02-01 2019-02-01 Method and device for determining area of object from image Active CN111523533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910106122.1A CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Publications (2)

Publication Number Publication Date
CN111523533A (en) 2020-08-11
CN111523533B (en) 2023-07-07

Family

ID=71900036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910106122.1A Active CN111523533B (en) 2019-02-01 2019-02-01 Method and device for determining area of object from image

Country Status (1)

Country Link
CN (1) CN111523533B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378873A (en) * 2021-01-13 2021-09-10 杭州小创科技有限公司 Algorithm for determining attribution or classification of target object

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063719A (en) * 2014-06-27 2014-09-24 深圳市赛为智能股份有限公司 Method and device for pedestrian detection based on depth convolutional network
CN104933722A (en) * 2015-06-29 2015-09-23 电子科技大学 Image edge detection method based on Spiking-convolution network model
CN107315995A (en) * 2017-05-18 2017-11-03 中国科学院上海微系统与信息技术研究所 A kind of face identification method based on Laplce's logarithm face and convolutional neural networks
CN109086656A (en) * 2018-06-06 2018-12-25 平安科技(深圳)有限公司 Airport foreign matter detecting method, device, computer equipment and storage medium

Similar Documents

Publication Title
JP2019087252A (en) Apparatus and method for performing deconvolution operation in neural network
CN111091123A (en) Text region detection method and equipment
US9053540B2 (en) Stereo matching by census transform and support weight cost aggregation
US20130287250A1 (en) Method and apparatus for tracking object in image data, and storage medium storing the same
CN110852349A (en) Image processing method, detection method, related equipment and storage medium
CN111696110B (en) Scene segmentation method and system
CN111814905A (en) Target detection method, target detection device, computer equipment and storage medium
CN114998595B (en) Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN110827292B (en) Video instance segmentation method and device based on convolutional neural network
CN114359665A (en) Training method and device of full-task face recognition model and face recognition method
CN113689434A (en) Image semantic segmentation method based on strip pooling
CN110209863B (en) Method and equipment for searching similar pictures
CN111523533B (en) Method and device for determining area of object from image
CN114241388A (en) Video instance segmentation method and segmentation device based on space-time memory information
CN111027551B (en) Image processing method, apparatus and medium
CN116091784A (en) Target tracking method, device and storage medium
CN113963236A (en) Target detection method and device
US20200372280A1 (en) Apparatus and method for image processing for machine learning
CN114724175A (en) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN113947524A (en) Panoramic picture saliency prediction method and device based on full-convolution graph neural network
JP2018010359A (en) Information processor, information processing method, and program
CN111626305B (en) Target detection method, device and equipment
CN112634286A (en) Image cropping method and device
CN113362351A (en) Image processing method and device, electronic equipment and storage medium
CN110866431B (en) Training method of face recognition model, and face recognition method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant