CN111462121A - Image cropping method, system, device and medium based on image semantic understanding - Google Patents


Info

Publication number
CN111462121A
Authority
CN
China
Prior art keywords
image
cropping
width
height
semantic segmentation
Prior art date
Legal status
Pending
Application number
CN202010206880.3A
Other languages
Chinese (zh)
Inventor
罗超
黄小虎
吉聪睿
李巍
Current Assignee
Shanghai Ctrip Business Co Ltd
Original Assignee
Shanghai Ctrip Business Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Ctrip Business Co Ltd filed Critical Shanghai Ctrip Business Co Ltd
Priority to CN202010206880.3A
Publication of CN111462121A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image cropping method, system, device and medium based on image semantic understanding. The image cropping method based on image semantic understanding comprises the following steps: acquiring an image semantic segmentation model; inputting an image to be cropped into the image semantic segmentation model to obtain an image semantic segmentation result; obtaining a region of interest of the image to be cropped according to the image semantic segmentation result; acquiring the center of gravity of the region of interest; and taking the center of gravity of the region of interest as a cropping center, obtaining a cropping window, and cropping the image to be cropped. Because the image is cropped based on image semantic understanding, the main subject of the cropped image is kept more complete.

Description

Image cropping method, system, device and medium based on image semantic understanding
Technical Field
The invention relates to the technical field of image semantic understanding and image segmentation, and in particular to an image cropping method, system, device and medium based on image semantic understanding.
Background
As a direct and efficient display medium, images have a direct and important influence on user experience and order conversion; the way hotel images are displayed is particularly influential. Usually, when a picture uploaded by a hotel at the back end reaches a front-end display page, it is cropped to different sizes for different display pages. Specifically, images on different display pages in an APP (Application) are often displayed in tiles of different aspect ratios. Current cropping algorithms simply take the center of the original image as the cropping point; the cropping is rough, key content and key targets are lost after cropping, the user is easily misled, and user experience is poor.
Disclosure of Invention
The invention aims to overcome the defect of the prior-art image-center cropping method, in which the target subject is displayed incompletely, and provides an image cropping method, system, device and medium based on image semantic understanding.
The invention solves the technical problems through the following technical scheme:
the invention provides an image cropping method based on image semantic understanding, which comprises the following steps:
acquiring an image semantic segmentation model;
inputting an image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
obtaining an interested area of the image to be cut according to the semantic segmentation result of the image;
acquiring the center of gravity of the region of interest;
and taking the gravity center of the region of interest as a cutting center, obtaining a cutting window, and cutting the image to be cut.
Preferably, the step of obtaining the image semantic segmentation model comprises:
acquiring an image, scaling the image, and performing feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN structure;
performing feature fusion based on the feature image to obtain a fusion feature image;
performing convolution operation on the fusion characteristic image, and acquiring the maximum probability value of each dimensionality through a softmax function to obtain an image semantic segmentation model to be trained;
and acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
Preferably, the step of performing feature fusion based on the feature image to obtain a fused feature image includes:
performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
connecting the four groups of first characteristic images to a convolution kernel of 1 × 1 × C/4, and performing convolution respectively to obtain four corresponding groups of second characteristic images;
upsampling the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
and connecting according to channel dimensions based on the four groups of third feature images and the feature images to obtain the fusion feature image.
Preferably, the step of obtaining the region of interest of the image to be cropped according to the semantic segmentation result of the image includes:
different interesting categories are preset for different types of images to be cut;
the image semantic segmentation result comprises the category of each pixel point in the image to be cut and different areas divided according to the categories of different pixel points;
and determining the area matched with the interested category corresponding to the image to be cropped as the interested area.
Preferably, the step of obtaining a cropping window with the center of gravity of the region of interest as a cropping center and cropping the image to be cropped includes:
based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after moving each time;
and selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
Preferably, the step of calculating the width and height of the cropping window according to the front-end display page size based on the center of gravity of the region of interest includes:
acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
calculating the width and the height of the cutting window according to a calculation formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
In the formula, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
The invention provides an image cropping system based on image semantic understanding, which comprises:
the first acquisition module is used for acquiring an image semantic segmentation model;
the input module is used for inputting the image to be cropped into the image semantic segmentation model so as to obtain an image semantic segmentation result;
the second acquisition module is used for acquiring the region of interest of the image to be cut according to the semantic segmentation result of the image;
the third acquisition module is used for acquiring the gravity center of the region of interest;
and the cutting module is used for acquiring a cutting window by taking the gravity center of the region of interest as a cutting center and cutting the image to be cut.
Preferably, the first obtaining module includes:
the first acquisition unit is used for acquiring an image, zooming the image and extracting the features of the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit is used for carrying out feature fusion based on the feature image to obtain a fusion feature image;
the first calculation unit is used for performing convolution operation on the fusion characteristic image and acquiring the maximum probability value of each dimensionality through a softmax function so as to obtain an image semantic segmentation model to be trained;
and the second acquisition unit is used for acquiring the image to be trained and inputting the image into the image semantic segmentation model to be trained for training so as to obtain the image semantic segmentation model.
Preferably, the fusion unit includes:
the pooling subunit is used for performing pooling operation on the characteristic images by adopting four pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
a convolution subunit, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit is used for upsampling the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
and the connecting subunit is used for connecting the four groups of third characteristic images and the characteristic images according to channel dimensions to obtain the fusion characteristic image.
Preferably, the second obtaining module includes:
the preset unit is used for correspondingly presetting different interesting categories for images to be cut of different categories;
the dividing unit is used for enabling the image semantic segmentation result to comprise the category of each pixel point in the image to be cut and different areas divided according to the categories of different pixel points;
and the determining unit is used for determining an area matched with the interested category corresponding to the image to be cropped as the interested area.
Preferably, the cutting module comprises:
the second calculation unit is used for calculating the width and the height of the cropping window according to the size of the front-end display page based on the gravity center of the region of interest;
the third calculating unit is used for moving the cropping window for multiple times and calculating the sum of the distances from the four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and the selection unit is used for selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
Preferably, the third calculation unit includes:
the third acquisition unit is used for acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
the calculating unit is used for calculating the width and the height of the cutting window according to a calculating formula of the width and the height of the cutting window;
the width and height of the cropping window are calculated as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;
if R0 < R1, then W2 = W0 and H2 = W0 / R1;
where W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio (width/height) of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the aforementioned image cropping method based on image semantic understanding.
The invention also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the aforementioned image cropping method based on image semantic understanding.
The positive progress effects of the invention are as follows:
the method comprises the steps of obtaining an image semantic segmentation model; inputting an image to be cut into an image semantic segmentation model to obtain an image semantic segmentation result; obtaining an interested area of an image to be cut according to an image semantic segmentation result and obtaining the gravity center of the interested area; and taking the gravity center of the region of interest as a cropping center, acquiring a cropping window, and cropping the image to be cropped. Compared with the image center based cutting method in the prior art, the image semantic understanding based cutting method provided by the invention can be used for cutting the image, so that the cut image main body is more complete.
Drawings
Fig. 1 is a flowchart of an image cropping method based on image semantic understanding according to embodiment 1 of the present invention.
FIG. 2 is a flowchart of step 101 in example 1 of the present invention.
FIG. 3 is a flowchart of step 1012 in embodiment 1 of the present invention.
FIG. 4 is a flowchart of step 103 in example 1 of the present invention.
FIG. 5 is a flowchart of step 105 in example 1 of the present invention.
FIG. 6 is a block diagram of an image cropping system based on image semantic understanding according to embodiment 2 of the present invention.
Fig. 7 is a schematic structural diagram of a first obtaining module in embodiment 2 of the present invention.
Fig. 8 is a schematic structural diagram of a fusion unit in embodiment 2 of the present invention.
Fig. 9 is a schematic structural diagram of a second obtaining module in embodiment 2 of the present invention.
Fig. 10 is a schematic structural diagram of a cutting module in embodiment 2 of the present invention.
Fig. 11 is a schematic structural diagram of an electronic device in embodiment 3 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment discloses an image cropping method based on image semantic understanding, which includes the following steps:
s101, acquiring an image semantic segmentation model;
s102, inputting an image to be cropped into the image semantic segmentation model to obtain an image semantic segmentation result;
s103, acquiring an interested area of the image to be cut according to the semantic segmentation result of the image;
s104, acquiring the gravity center of the region of interest;
and step S105, obtaining a cropping window by taking the gravity center of the region of interest as a cropping center, and cropping the image to be cropped.
As shown in fig. 2, in this embodiment, step S101 further includes the following steps:
s1011, acquiring an image, scaling it to 300 × 300, 400 × 400, 500 × 500 or another size, and performing feature extraction through a backbone network to obtain a feature image, wherein the backbone network comprises a multi-layer CNN structure;
in this embodiment, the backbone network may be any multi-layer CNN network structure, and in this embodiment, the renet 50 with residual connection is used as a backbone network for semantic segmentation, a full connection layer is removed, and the output of the last convolutional layer is used as an image feature.
Step S1012, performing feature fusion based on the feature image to obtain a fusion feature image;
in this embodiment, the feature fusion includes four operations of pooling (Pool), convolution (Conv), upsampling (Upsample), and fusion (Concat).
S1013, performing a convolution operation on the fused feature image with a 1 × 1 × 150 convolution kernel to obtain a W × H × 150 feature map, obtaining the maximum probability value of each dimension through a softmax function, and finally obtaining a W × H × 2 output image, namely the segmentation result of the image semantic segmentation model to be trained, thereby obtaining the image semantic segmentation model to be trained;
and S1014, acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
In this embodiment, the image semantic segmentation model to be trained is trained on the public data set ADE20K until the model converges, and the model M is stored. The data set has about 20,000 training images, and the images contain annotations of buildings, beds, etc.
As shown in fig. 3, in the present embodiment, step S1012 includes the following sub-steps:
step S10121, performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
in this embodiment, in order to obtain global and local semantic features of an image, in this embodiment, four sets of pooling layers with different receptive fields are used, and the feature maps obtained in the previous step are respectively pooled to obtain feature maps with four different sizes, 1 × 1 × C, 2 × 02 × 1C, 3 × 23 × 3C, and 6 × 46 × 5C, where 1 × 1 × C corresponds to the global semantic features of an original drawing, and 2 × 2 × C, 3 × 3 × C, and 6 × 6 × C correspond to the local semantic features of the original drawing.
Step S10122, connecting the four groups of first feature images to convolution kernels of 1 × 1 × C/4, performing convolution respectively to obtain four corresponding groups of second feature images, wherein the sizes of the compressed second feature images after convolution are 1 × 1 × C/4, 2 × 2 × C/4, 3 × 3 × C/4 and 6 × 6 × C/4 respectively;
s10123, sampling the four groups of second characteristic images by a bilinear difference method to obtain four corresponding groups of third characteristic images, wherein the sizes of the third characteristic images after sampling operation are W × H × C/4, W × H × C/4, W × H × C/4 and W × H × C/4 respectively;
and S10124, based on the four groups of third feature images and the feature images, connecting according to channel dimensions to obtain a fusion feature image of the global semantic features and the local semantic features of the fusion image, wherein the feature dimension of the fusion feature image is W × H × 2C.
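The pooling, convolution, upsampling and concatenation steps above (a PSPNet-style pyramid pooling module) can be sketched in plain numpy. This is an illustrative sketch, not the patented implementation: nearest-neighbour upsampling stands in for bilinear interpolation, a channel slice stands in for the learned 1 × 1 × C/4 convolution, and the function names are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(feat, n):
    # feat: H x W x C; average-pool to an n x n x C grid
    H, W, C = feat.shape
    out = np.zeros((n, n, C))
    for i in range(n):
        for j in range(n):
            h0, h1 = i * H // n, (i + 1) * H // n
            w0, w1 = j * W // n, (j + 1) * W // n
            out[i, j] = feat[h0:h1, w0:w1].mean(axis=(0, 1))
    return out

def upsample_nearest(feat, H, W):
    # nearest-neighbour stand-in for the bilinear interpolation step
    h, w, _ = feat.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return feat[rows][:, cols]

def pyramid_fusion(feat):
    H, W, C = feat.shape
    branches = []
    for n in (1, 2, 3, 6):          # the four receptive fields
        p = adaptive_avg_pool(feat, n)
        p = p[..., : C // 4]        # stand-in for the 1 x 1 x C/4 convolution
        branches.append(upsample_nearest(p, H, W))
    # concatenate along the channel dimension: C + 4 * C/4 = 2C channels
    return np.concatenate([feat] + branches, axis=-1)

fused = pyramid_fusion(np.ones((12, 12, 8)))
```

With C = 8 input channels, each of the four branches contributes C/4 = 2 channels, so the fused map has 2C = 16 channels, matching the W × H × 2C dimension stated above.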
As shown in fig. 4, in the present embodiment, step S103 includes the following steps:
step S1031, presetting different interest categories for different types of images to be cut;
step S1032, the image semantic segmentation result comprises the category of each pixel point in the image to be segmented and different areas segmented according to the categories of different pixel points;
in this embodiment, the pixel of the region of interest is set to 255, and the pixel of the region of no interest is set to 0, so as to obtain the binarization result, i.e. the result of region of interest extraction. The region of interest for the hotel appearance map is the hotel appearance and the region of interest for the room type map is the hotel room type.
Step S1033, determining an area matching the category of interest corresponding to the image to be cropped as the area of interest.
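The binarization and centre-of-gravity steps can be illustrated with a small pure-Python sketch (a hypothetical helper, not from the patent): the per-pixel class map is turned into a 255/0 mask, and the centroid is the mean coordinate of the interest pixels.

```python
def roi_centroid(class_map, interest_classes):
    """Build the 255/0 binary mask from a per-pixel class map and return
    (mask, centre_of_gravity) of the region of interest."""
    xs = ys = n = 0
    mask = []
    for y, row in enumerate(class_map):
        mask.append([255 if c in interest_classes else 0 for c in row])
        for x, c in enumerate(row):
            if c in interest_classes:
                xs += x
                ys += y
                n += 1
    return mask, ((xs / n, ys / n) if n else None)

# Tiny 2 x 2 class map: class 1 marks the region of interest.
mask, centroid = roi_centroid([[0, 1], [1, 1]], {1})
```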
As shown in fig. 5, in the present embodiment, step S105 includes the following steps:
step S1051, based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
in the embodiment, the width and the height of the cutting window are calculated according to the width, the height and the aspect ratio of the image to be cut, the width, the height and the aspect ratio of the front-end display page and a calculation formula of the width and the height of the cutting window; the calculation formula of the width and the height of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
In the formula, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
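The window-size formula can be written directly as a small function (a sketch; `crop_window_size` is a hypothetical name, and the R0 = R1 case, which the formula leaves out, is handled by returning the original size):

```python
def crop_window_size(W0, H0, W1, H1):
    """Width/height of the cropping window for a W0 x H0 image shown on a
    W1 x H1 front-end page. R0, R1 are width/height aspect ratios."""
    R0, R1 = W0 / H0, W1 / H1
    if R0 > R1:            # image relatively wider: keep full height
        return H0 * R1, H0
    if R0 < R1:            # image relatively taller: keep full width
        return W0, W0 / R1
    return W0, H0          # equal ratios: the window is the whole image

W2, H2 = crop_window_size(400, 200, 100, 100)  # wide image, square page
```

In both branches the resulting window has the page's aspect ratio R1 while spanning the full height (or width) of the image, so no more of the image is discarded than necessary.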
Step 1052, moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and S1053, selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
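Steps S1052 and S1053 amount to a brute-force search over window positions, minimizing the summed distance from the four window vertices to the ROI centre of gravity. A minimal sketch (hypothetical helper; a coarser `step` can be used to reduce the search cost):

```python
import math

def best_crop(W0, H0, W2, H2, gx, gy, step=1):
    """Slide a W2 x H2 window over a W0 x H0 image and return the top-left
    corner whose four vertices have the smallest summed Euclidean distance
    to the region-of-interest centre of gravity (gx, gy)."""
    best_d, best_xy = float("inf"), (0, 0)
    for x in range(0, int(W0 - W2) + 1, step):
        for y in range(0, int(H0 - H2) + 1, step):
            corners = [(x, y), (x + W2, y), (x, y + H2), (x + W2, y + H2)]
            d = sum(math.hypot(cx - gx, cy - gy) for cx, cy in corners)
            if d < best_d:
                best_d, best_xy = d, (x, y)
    return best_xy
```

Because the distance sum is minimized when the window is centred on the centroid, this search effectively centres the crop on the region of interest, clamped to the image borders when the centroid lies near an edge.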
In the image cropping method based on image semantic understanding disclosed in this embodiment, the image to be cropped is input into an image semantic segmentation model to obtain an image semantic segmentation result; the region of interest of the image to be cropped and its center of gravity are obtained according to the segmentation result; and, taking the center of gravity of the region of interest as the cropping center, a cropping window is obtained and the image to be cropped is cropped. Compared with the prior-art method of cropping about the image center, cropping based on image semantic understanding ensures that the main subject of the cropped image is more complete.
Example 2
As shown in FIG. 6, the embodiment discloses an image cropping system based on image semantic understanding, which comprises a first acquiring module 1, an input module 2, a second acquiring module 3, a third acquiring module 4 and a cropping module 5. The first acquisition module 1 is used for acquiring an image semantic segmentation model;
the input module 2 is used for inputting the image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
the second obtaining module 3 is configured to obtain an interesting region of the image to be cropped according to the semantic segmentation result of the image;
the third obtaining module 4 is used for obtaining the gravity center of the region of interest;
the cutting module 5 is used for obtaining a cutting window by taking the gravity center of the region of interest as a cutting center, and cutting the image to be cut.
As shown in fig. 7, the first acquiring module 1 includes a first acquiring unit 11, a fusing unit 12, a first calculating unit 13, and a second acquiring unit 14:
the first obtaining unit 11 is configured to obtain an image, scale the image, and perform feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit 12 is configured to perform feature fusion based on the feature image to obtain a fusion feature image;
the first calculating unit 13 is configured to perform convolution operation on the fusion feature image, and acquire a maximum probability value of each dimension through a softmax function to obtain an image semantic segmentation model to be trained;
the second obtaining unit 14 is configured to obtain an image to be trained, and input the image into the image semantic segmentation model to be trained for training, so as to obtain the image semantic segmentation model.
As shown in fig. 8, the fusion unit 12 includes:
the pooling subunit 111 is configured to perform pooling operation on the feature images by using four pooling layers with different receptive fields to obtain four corresponding groups of first feature images;
a convolution subunit 112, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit 113 is configured to upsample the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
a connection subunit 114, configured to connect according to a channel dimension based on the four sets of third feature images and the feature images to obtain the fused feature image.
As shown in fig. 9, the second obtaining module 3 includes:
the presetting unit 31 is used for presetting different interest categories for images to be cut of different categories;
the dividing unit 32 is configured to obtain the semantic segmentation result of the image, where the semantic segmentation result includes a category of each pixel in the image to be cut, and different regions divided according to categories of different pixels;
a determining unit 33, configured to determine, as the region of interest, a region that matches the category of interest corresponding to the image to be cropped.
As shown in fig. 10, the cutting module 5 includes:
a second calculating unit 51, configured to calculate a width and a height of the cropping window according to a front-end display page size based on the center of gravity of the region of interest;
in the embodiment, the width and the height of the cutting window are calculated according to the width, the height and the aspect ratio of the image to be cut, the width, the height and the aspect ratio of the front-end display page and a calculation formula of the width and the height of the cutting window; the calculation formula of the width and the height of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
Formula (II)In, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
A third calculating unit 52, configured to move the cropping window multiple times, and calculate a sum of distances from four vertices of the cropping window to a center of gravity of the region of interest after each movement;
and the selecting unit 53 is configured to select the corresponding cutting window when the sum of the distances is the minimum value, and cut the image to be cut according to the selected cutting window.
In the image cropping system based on image semantic understanding provided by this embodiment, the image to be cropped is input into an image semantic segmentation model to obtain an image semantic segmentation result; the region of interest of the image to be cropped and its center of gravity are obtained according to the segmentation result; and, taking the center of gravity of the region of interest as the cropping center, a cropping window is obtained and the image to be cropped is cropped. Compared with the prior-art method of cropping about the image center, cropping based on image semantic understanding ensures that the main subject of the cropped image is more complete.
Example 3
Fig. 11 is a schematic structural diagram of an electronic device according to embodiment 3 of the present invention. The electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, it implements the image cropping method based on image semantic understanding provided in embodiment 1. The electronic device 60 shown in fig. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 11, the electronic device 60 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 60 may include, but are not limited to: the at least one processor 61, the at least one memory 62, and a bus 63 connecting the various system components (including the memory 62 and the processor 61).
The bus 63 includes a data bus, an address bus, and a control bus.
The memory 62 may include volatile memory, such as Random Access Memory (RAM)621 and/or cache memory 622, and may further include Read Only Memory (ROM) 623.
The memory 62 may also include a program/utility 625 having a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 61 executes various functional applications and data processing, such as an image cropping method based on semantic understanding of an image provided in embodiment 1 of the present invention, by running a computer program stored in the memory 62.
The electronic device 60 may also communicate with one or more external devices 64 (e.g., a keyboard, a pointing device, etc.); such communication may occur through an input/output (I/O) interface 65. The electronic device 60 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 66. As shown, the network adapter 66 communicates with the other modules of the electronic device 60 through the bus 63. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 60, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module; conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the image cropping method based on image semantic understanding provided in embodiment 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present invention can also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the image cropping method based on image semantic understanding provided in embodiment 1.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (14)

1. An image cropping method based on image semantic understanding, which is characterized by comprising the following steps:
acquiring an image semantic segmentation model;
inputting an image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
obtaining a region of interest of the image to be cut according to the image semantic segmentation result;
acquiring the center of gravity of the region of interest;
and taking the gravity center of the region of interest as a cutting center, obtaining a cutting window, and cutting the image to be cut.
2. The image cropping method based on image semantic understanding of claim 1, wherein the step of obtaining an image semantic segmentation model comprises:
acquiring an image, zooming the image, and performing feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
performing feature fusion based on the feature image to obtain a fusion feature image;
performing convolution operation on the fusion characteristic image, and acquiring the maximum probability value of each dimensionality through a softmax function to obtain an image semantic segmentation model to be trained;
and acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
3. The image cropping method based on image semantic understanding according to claim 2, wherein the step of performing feature fusion based on the feature image to obtain a fused feature image comprises:
performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
connecting the four groups of first characteristic images to a convolution kernel of 1 × 1 × C/4, and performing convolution respectively to obtain four corresponding groups of second characteristic images;
sampling the four groups of second characteristic images by a bilinear interpolation method to obtain four corresponding groups of third characteristic images;
and connecting according to channel dimensions based on the four groups of third feature images and the feature images to obtain the fusion feature image.
4. The image cropping method based on image semantic understanding of claim 1, wherein the step of obtaining the region of interest of the image to be cropped according to the image semantic segmentation result comprises:
presetting different categories of interest correspondingly for different types of images to be cut;
the image semantic segmentation result comprises the category of each pixel point in the image to be cut and different regions divided according to the categories of the pixel points;
and determining the region matching the category of interest corresponding to the image to be cut as the region of interest.
5. The image cropping method based on image semantic understanding of claim 1, wherein the step of taking the center of gravity of the region of interest as a cropping center, acquiring a cropping window, and cropping the image to be cropped comprises:
based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after moving each time;
and selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
6. The image cropping method based on image semantic understanding of claim 5, wherein the step of calculating the width and height of the cropping window according to the front-end presentation page size based on the center of gravity of the region of interest comprises:
acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
calculating the width and the height of the cutting window according to a calculation formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;

if R0 < R1, then W2 = W0 and H2 = W0 / R1;

in the formulas, W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
7. An image cropping system based on image semantic understanding, characterized in that the image cropping system based on image semantic understanding comprises:
the first acquisition module is used for acquiring an image semantic segmentation model;
the input module is used for inputting the image to be cut into the image semantic segmentation model so as to obtain an image semantic segmentation result;
the second acquisition module is used for acquiring the region of interest of the image to be cut according to the semantic segmentation result of the image;
the third acquisition module is used for acquiring the gravity center of the region of interest;
and the cutting module is used for acquiring a cutting window by taking the gravity center of the region of interest as a cutting center and cutting the image to be cut.
8. The image cropping system based on image semantic understanding of claim 7, wherein the first acquisition module comprises:
the first acquisition unit is used for acquiring an image, zooming the image and extracting features of the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit is used for carrying out feature fusion based on the feature image to obtain a fusion feature image;
the first calculation unit is used for performing convolution operation on the fusion characteristic image and acquiring the maximum probability value of each dimensionality through a softmax function so as to obtain an image semantic segmentation model to be trained;
and the second acquisition unit is used for acquiring the image to be trained and inputting the image into the image semantic segmentation model to be trained for training so as to obtain the image semantic segmentation model.
9. The image cropping system based on image semantic understanding of claim 8, wherein the fusion unit comprises:
the pooling subunit is used for performing pooling operation on the characteristic images by adopting four pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
a convolution subunit, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit is used for performing a sampling operation on the four groups of second characteristic images by a bilinear interpolation method to obtain four corresponding groups of third characteristic images;
and the connecting subunit is used for connecting the four groups of third characteristic images and the characteristic images according to channel dimensions to obtain the fusion characteristic image.
10. The image cropping system based on image semantic understanding of claim 7, wherein the second acquisition module comprises:
the preset unit is used for correspondingly presetting different categories of interest for different types of images to be cut;
the dividing unit is used for enabling the image semantic segmentation result to comprise the category of each pixel point in the image to be cut and different regions divided according to the categories of the pixel points;
and the determining unit is used for determining the region matching the category of interest corresponding to the image to be cut as the region of interest.
11. The image cropping system based on image semantic understanding of claim 7, wherein the cropping module comprises:
the second calculation unit is used for calculating the width and the height of the cropping window according to the size of the front-end display page based on the gravity center of the region of interest;
the third calculating unit is used for moving the cropping window for multiple times and calculating the sum of the distances from the four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and the selection unit is used for selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
12. The image cropping system based on image semantic understanding of claim 11, wherein the second calculating unit comprises:
the third acquisition unit is used for acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
the calculating unit is used for calculating the width and the height of the cutting window according to a calculating formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;

if R0 < R1, then W2 = W0 and H2 = W0 / R1;

in the formulas, W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image cropping method based on image semantic understanding as claimed in any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the image cropping method based on image semantic understanding according to any one of claims 1 to 6.
CN202010206880.3A 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding Pending CN111462121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010206880.3A CN111462121A (en) 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding


Publications (1)

Publication Number Publication Date
CN111462121A true CN111462121A (en) 2020-07-28

Family

ID=71685662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010206880.3A Pending CN111462121A (en) 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding

Country Status (1)

Country Link
CN (1) CN111462121A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082490A (en) * 2022-08-23 2022-09-20 腾讯科技(深圳)有限公司 Anomaly prediction method, and training method, device and equipment of anomaly prediction model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473737A (en) * 2012-06-06 2013-12-25 索尼公司 Image processing device, image processing method, and program
CN103914689A (en) * 2014-04-09 2014-07-09 百度在线网络技术(北京)有限公司 Picture cropping method and device based on face recognition
CN105357436A (en) * 2015-11-03 2016-02-24 广东欧珀移动通信有限公司 Image cropping method and system for image shooting
CN106920141A (en) * 2015-12-28 2017-07-04 阿里巴巴集团控股有限公司 Page presentation content processing method and device
CN107610131A (en) * 2017-08-25 2018-01-19 百度在线网络技术(北京)有限公司 A kind of image cropping method and image cropping device
CN108537292A (en) * 2018-04-10 2018-09-14 上海白泽网络科技有限公司 Semantic segmentation network training method, image, semantic dividing method and device
CN108776970A (en) * 2018-06-12 2018-11-09 北京字节跳动网络技术有限公司 Image processing method and device
CN109447990A (en) * 2018-10-22 2019-03-08 北京旷视科技有限公司 Image, semantic dividing method, device, electronic equipment and computer-readable medium
CN110136142A (en) * 2019-04-26 2019-08-16 微梦创科网络科技(中国)有限公司 A kind of image cropping method, apparatus, electronic equipment
CN110188817A (en) * 2019-05-28 2019-08-30 厦门大学 A kind of real-time high-performance street view image semantic segmentation method based on deep learning
CN110377204A (en) * 2019-06-30 2019-10-25 华为技术有限公司 A kind of method and electronic equipment generating user's head portrait
CN110456960A (en) * 2019-05-09 2019-11-15 华为技术有限公司 Image processing method, device and equipment


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HENGSHUANG ZHAO ET AL: "Pyramid Scene Parsing Network" *
JIANSHENG CHEN ET AL: "Automatic Image Cropping : A Computational Complexity Study" *
张鹏飞: "基于深度学习的图像缩略图生成技术及其应用" *


Similar Documents

Publication Publication Date Title
JP7238139B2 (en) Image area recognition method by artificial intelligence, model training method, image processing device, terminal device, server, computer device and computer program
CN109977192B (en) Unmanned aerial vehicle tile map rapid loading method, system, equipment and storage medium
JP7265034B2 (en) Method and apparatus for human body detection
CN112102411A (en) Visual positioning method and device based on semantic error image
CN108876706B (en) Thumbnail generation from panoramic images
CN111932546A (en) Image segmentation model training method, image segmentation method, device, equipment and medium
US11715186B2 (en) Multi-image-based image enhancement method and device
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
CN113792651B (en) Gesture interaction method, device and medium integrating gesture recognition and fingertip positioning
CN110874591A (en) Image positioning method, device, equipment and storage medium
CN108648149B (en) Image splicing method, system, equipment and storage medium based on augmented reality
CN113934297A (en) Interaction method and device based on augmented reality, electronic equipment and medium
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN113297986A (en) Handwritten character recognition method, device, medium and electronic equipment
CN112287144A (en) Picture retrieval method, equipment and storage medium
CN111191553A (en) Face tracking method and device and electronic equipment
CN111462121A (en) Image cropping method, system, device and medium based on image semantic understanding
CN114238541A (en) Sensitive target information acquisition method and device and computer equipment
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN111062388B (en) Advertisement character recognition method, system, medium and equipment based on deep learning
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN110516094A (en) De-weight method, device, electronic equipment and the storage medium of class interest point data
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
CN114119365A (en) Application detection method, device, equipment and storage medium
CN113963289A (en) Target detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728