CN111462121A - Image cropping method, system, device and medium based on image semantic understanding - Google Patents


Info

Publication number
CN111462121A
Authority
CN
China
Prior art keywords
image
cropping
width
height
semantic segmentation
Prior art date
Legal status
Pending
Application number
CN202010206880.3A
Other languages
Chinese (zh)
Inventor
罗超
黄小虎
吉聪睿
李巍
Current Assignee
Shanghai Ctrip Business Co Ltd
Original Assignee
Shanghai Ctrip Business Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Ctrip Business Co Ltd filed Critical Shanghai Ctrip Business Co Ltd
Priority to CN202010206880.3A
Publication of CN111462121A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image cropping method, system, device and medium based on image semantic understanding. The image cropping method based on image semantic understanding comprises the following steps: acquiring an image semantic segmentation model; inputting an image to be cropped into the image semantic segmentation model to obtain an image semantic segmentation result; obtaining a region of interest of the image to be cropped according to the image semantic segmentation result; acquiring the center of gravity of the region of interest; and taking the center of gravity of the region of interest as a cropping center, obtaining a cropping window, and cropping the image to be cropped. Because the image is cropped based on image semantic understanding, the main subject of the cropped image is kept more complete.

Description

Image cropping method, system, device and medium based on image semantic understanding
Technical Field
The invention relates to the technical field of image semantic understanding and image segmentation, and in particular to an image cropping method, system, device and medium based on image semantic understanding.
Background
As a direct and efficient display medium, images have a direct and important influence on user experience and order conversion; the way hotel images are displayed is particularly influential. Usually, when a picture uploaded by a hotel at the back end reaches a front-end display page, it is cropped to different sizes for different display pages. Specifically, images on different display pages in an APP (Application) are often displayed in tiles of different aspect ratios. Current cropping algorithms simply take the center of the original image as the cropping point; the cropping is rough, key content and key targets are lost after cropping, the user is easily misled, and user experience is poor.
Disclosure of Invention
The invention aims to overcome the defect of the prior-art image-center cropping method, in which the target subject is displayed incompletely, and provides an image cropping method, system, device and medium based on image semantic understanding.
The invention solves the technical problems through the following technical scheme:
the invention provides an image cropping method based on image semantic understanding, which comprises the following steps:
acquiring an image semantic segmentation model;
inputting an image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
obtaining an interested area of the image to be cut according to the semantic segmentation result of the image;
acquiring the center of gravity of the region of interest;
and taking the gravity center of the region of interest as a cutting center, obtaining a cutting window, and cutting the image to be cut.
Preferably, the step of obtaining the image semantic segmentation model comprises:
acquiring an image, scaling the image, and performing feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN structure;
performing feature fusion based on the feature image to obtain a fusion feature image;
performing convolution operation on the fusion characteristic image, and acquiring the maximum probability value of each dimensionality through a softmax function to obtain an image semantic segmentation model to be trained;
and acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
Preferably, the step of performing feature fusion based on the feature image to obtain a fused feature image includes:
performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
connecting the four groups of first characteristic images to a convolution kernel of 1 × 1 × C/4, and performing convolution respectively to obtain four corresponding groups of second characteristic images;
upsampling the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
and connecting according to channel dimensions based on the four groups of third feature images and the feature images to obtain the fusion feature image.
Preferably, the step of obtaining the region of interest of the image to be cropped according to the semantic segmentation result of the image includes:
different interesting categories are preset for different types of images to be cut;
the image semantic segmentation result comprises the category of each pixel point in the image to be cut and different areas divided according to the categories of different pixel points;
and determining the area matched with the interested category corresponding to the image to be cropped as the interested area.
Preferably, the step of obtaining a cropping window with the center of gravity of the region of interest as a cropping center and cropping the image to be cropped includes:
based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after moving each time;
and selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
Preferably, the step of calculating the width and height of the cropping window according to the front-end display page size based on the center of gravity of the region of interest includes:
acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
calculating the width and the height of the cutting window according to a calculation formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
In the formula, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
The invention provides an image cropping system based on image semantic understanding, which comprises:
the first acquisition module is used for acquiring an image semantic segmentation model;
the input module is used for inputting the image to be cropped into the image semantic segmentation model so as to obtain an image semantic segmentation result;
the second acquisition module is used for acquiring the region of interest of the image to be cut according to the semantic segmentation result of the image;
the third acquisition module is used for acquiring the gravity center of the region of interest;
and the cutting module is used for acquiring a cutting window by taking the gravity center of the region of interest as a cutting center and cutting the image to be cut.
Preferably, the first obtaining module includes:
the first acquisition unit is used for acquiring an image, zooming the image and extracting the features of the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit is used for carrying out feature fusion based on the feature image to obtain a fusion feature image;
the first calculation unit is used for performing convolution operation on the fusion characteristic image and acquiring the maximum probability value of each dimensionality through a softmax function so as to obtain an image semantic segmentation model to be trained;
and the second acquisition unit is used for acquiring the image to be trained and inputting the image into the image semantic segmentation model to be trained for training so as to obtain the image semantic segmentation model.
Preferably, the fusion unit includes:
the pooling subunit is used for performing pooling operation on the characteristic images by adopting four pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
a convolution subunit, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit is used for upsampling the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
and the connecting subunit is used for connecting the four groups of third characteristic images and the characteristic images according to channel dimensions to obtain the fusion characteristic image.
Preferably, the second obtaining module includes:
the preset unit is used for correspondingly presetting different interesting categories for images to be cut of different categories;
the dividing unit is used for enabling the image semantic segmentation result to comprise the category of each pixel point in the image to be cut and different areas divided according to the categories of different pixel points;
and the determining unit is used for determining an area matched with the interested category corresponding to the image to be cropped as the interested area.
Preferably, the cutting module comprises:
the second calculation unit is used for calculating the width and the height of the cropping window according to the size of the front-end display page based on the gravity center of the region of interest;
the third calculating unit is used for moving the cropping window for multiple times and calculating the sum of the distances from the four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and the selection unit is used for selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
Preferably, the third calculation unit includes:
the third acquisition unit is used for acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
the calculating unit is used for calculating the width and the height of the cutting window according to a calculating formula of the width and the height of the cutting window;
the width and height of the cropping window are calculated as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;
if R0 < R1, then W2 = W0 and H2 = W0 / R1;
where W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio (width/height) of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the aforementioned image cropping method based on image semantic understanding.
The invention also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the aforementioned image cropping method based on image semantic understanding.
The positive progress effects of the invention are as follows:
the method comprises the steps of obtaining an image semantic segmentation model; inputting an image to be cut into an image semantic segmentation model to obtain an image semantic segmentation result; obtaining an interested area of an image to be cut according to an image semantic segmentation result and obtaining the gravity center of the interested area; and taking the gravity center of the region of interest as a cropping center, acquiring a cropping window, and cropping the image to be cropped. Compared with the image center based cutting method in the prior art, the image semantic understanding based cutting method provided by the invention can be used for cutting the image, so that the cut image main body is more complete.
Drawings
Fig. 1 is a flowchart of an image cropping method based on image semantic understanding according to embodiment 1 of the present invention.
FIG. 2 is a flowchart of step 101 in example 1 of the present invention.
FIG. 3 is a flowchart of step 1012 in embodiment 1 of the present invention.
FIG. 4 is a flowchart of step 103 in example 1 of the present invention.
FIG. 5 is a flowchart of step 105 in example 1 of the present invention.
FIG. 6 is a block diagram of an image cropping system based on image semantic understanding according to embodiment 2 of the present invention.
Fig. 7 is a schematic structural diagram of a first obtaining module in embodiment 2 of the present invention.
Fig. 8 is a schematic structural diagram of a fusion unit in embodiment 2 of the present invention.
Fig. 9 is a schematic structural diagram of a second obtaining module in embodiment 2 of the present invention.
Fig. 10 is a schematic structural diagram of a cutting module in embodiment 2 of the present invention.
Fig. 11 is a schematic structural diagram of an electronic device in embodiment 3 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment discloses an image cropping method based on image semantic understanding, which includes the following steps:
s101, acquiring an image semantic segmentation model;
s102, inputting an image to be cropped into the image semantic segmentation model to obtain an image semantic segmentation result;
s103, acquiring an interested area of the image to be cut according to the semantic segmentation result of the image;
s104, acquiring the gravity center of the region of interest;
and step S105, obtaining a cropping window by taking the gravity center of the region of interest as a cropping center, and cropping the image to be cropped.
As shown in fig. 2, in this embodiment, step S101 further includes the following steps:
s1011, acquiring an image, scaling it to 300 × 300, 400 × 400, 500 × 500 or another size, and performing feature extraction through a backbone network to obtain a feature image, wherein the backbone network comprises a multi-layer CNN structure;
in this embodiment, the backbone network may be any multi-layer CNN network structure, and in this embodiment, the renet 50 with residual connection is used as a backbone network for semantic segmentation, a full connection layer is removed, and the output of the last convolutional layer is used as an image feature.
Step S1012, performing feature fusion based on the feature image to obtain a fusion feature image;
in this embodiment, the feature fusion includes four operations of pooling (Pool), convolution (Conv), upsampling (Upsample), and fusion (Concat).
S1013, performing a convolution operation on the fused feature image with a 1 × 1 × 150 convolution kernel to obtain a W × H × 150 feature map, obtaining the maximum probability value of each dimension through a softmax function, and finally obtaining a W × H × 2 output image, namely the segmentation result of the image semantic segmentation model to be trained, thereby obtaining the image semantic segmentation model to be trained;
and S1014, acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
In this embodiment, the image semantic segmentation model to be trained is trained on the public data set ADE20K until the model converges, and the model M is stored. The data set has about 20,000 training images, and the images contain annotations of buildings, beds, etc.
As shown in fig. 3, in the present embodiment, step S1012 includes the following sub-steps:
step S10121, performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
in this embodiment, in order to obtain global and local semantic features of an image, in this embodiment, four sets of pooling layers with different receptive fields are used, and the feature maps obtained in the previous step are respectively pooled to obtain feature maps with four different sizes, 1 × 1 × C, 2 × 02 × 1C, 3 × 23 × 3C, and 6 × 46 × 5C, where 1 × 1 × C corresponds to the global semantic features of an original drawing, and 2 × 2 × C, 3 × 3 × C, and 6 × 6 × C correspond to the local semantic features of the original drawing.
Step S10122, connecting the four groups of first feature images to convolution kernels of 1 × 1 × C/4, performing convolution respectively to obtain four corresponding groups of second feature images, wherein the sizes of the compressed second feature images after convolution are 1 × 1 × C/4, 2 × 2 × C/4, 3 × 3 × C/4 and 6 × 6 × C/4 respectively;
s10123, sampling the four groups of second characteristic images by a bilinear difference method to obtain four corresponding groups of third characteristic images, wherein the sizes of the third characteristic images after sampling operation are W × H × C/4, W × H × C/4, W × H × C/4 and W × H × C/4 respectively;
and S10124, based on the four groups of third feature images and the feature images, connecting according to channel dimensions to obtain a fusion feature image of the global semantic features and the local semantic features of the fusion image, wherein the feature dimension of the fusion feature image is W × H × 2C.
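The pooling, convolution, upsampling and concatenation steps above (a PSPNet-style pyramid pooling module) can be sketched in plain numpy. This is an illustrative sketch, not the patented implementation: nearest-neighbour upsampling stands in for bilinear interpolation, a channel slice stands in for the learned 1 × 1 × C/4 convolution, and the function names are hypothetical.

```python
import numpy as np

def adaptive_avg_pool(feat, n):
    # feat: H x W x C; average-pool to an n x n x C grid
    H, W, C = feat.shape
    out = np.zeros((n, n, C))
    for i in range(n):
        for j in range(n):
            h0, h1 = i * H // n, (i + 1) * H // n
            w0, w1 = j * W // n, (j + 1) * W // n
            out[i, j] = feat[h0:h1, w0:w1].mean(axis=(0, 1))
    return out

def upsample_nearest(feat, H, W):
    # nearest-neighbour stand-in for the bilinear interpolation step
    h, w, _ = feat.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return feat[rows][:, cols]

def pyramid_fusion(feat):
    H, W, C = feat.shape
    branches = []
    for n in (1, 2, 3, 6):          # the four receptive fields
        p = adaptive_avg_pool(feat, n)
        p = p[..., : C // 4]        # stand-in for the 1 x 1 x C/4 convolution
        branches.append(upsample_nearest(p, H, W))
    # concatenate along the channel dimension: C + 4 * C/4 = 2C channels
    return np.concatenate([feat] + branches, axis=-1)

fused = pyramid_fusion(np.ones((12, 12, 8)))
```

With C = 8 input channels, each of the four branches contributes C/4 = 2 channels, so the fused map has 2C = 16 channels, matching the W × H × 2C dimension stated above.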
As shown in fig. 4, in the present embodiment, step S103 includes the following steps:
step S1031, presetting different interest categories for different types of images to be cut;
step S1032, the image semantic segmentation result comprises the category of each pixel point in the image to be segmented and different areas segmented according to the categories of different pixel points;
in this embodiment, the pixel of the region of interest is set to 255, and the pixel of the region of no interest is set to 0, so as to obtain the binarization result, i.e. the result of region of interest extraction. The region of interest for the hotel appearance map is the hotel appearance and the region of interest for the room type map is the hotel room type.
Step S1033, determining an area matching the category of interest corresponding to the image to be cropped as the area of interest.
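The binarization and centre-of-gravity steps can be illustrated with a small pure-Python sketch (a hypothetical helper, not from the patent): the per-pixel class map is turned into a 255/0 mask, and the centroid is the mean coordinate of the interest pixels.

```python
def roi_centroid(class_map, interest_classes):
    """Build the 255/0 binary mask from a per-pixel class map and return
    (mask, centre_of_gravity) of the region of interest."""
    xs = ys = n = 0
    mask = []
    for y, row in enumerate(class_map):
        mask.append([255 if c in interest_classes else 0 for c in row])
        for x, c in enumerate(row):
            if c in interest_classes:
                xs += x
                ys += y
                n += 1
    return mask, ((xs / n, ys / n) if n else None)

# Tiny 2 x 2 class map: class 1 marks the region of interest.
mask, centroid = roi_centroid([[0, 1], [1, 1]], {1})
```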
As shown in fig. 5, in the present embodiment, step S105 includes the following steps:
step S1051, based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
in the embodiment, the width and the height of the cutting window are calculated according to the width, the height and the aspect ratio of the image to be cut, the width, the height and the aspect ratio of the front-end display page and a calculation formula of the width and the height of the cutting window; the calculation formula of the width and the height of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
In the formula, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
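The window-size formula can be written directly as a small function (a sketch; `crop_window_size` is a hypothetical name, and the R0 = R1 case, which the formula leaves out, is handled by returning the original size):

```python
def crop_window_size(W0, H0, W1, H1):
    """Width/height of the cropping window for a W0 x H0 image shown on a
    W1 x H1 front-end page. R0, R1 are width/height aspect ratios."""
    R0, R1 = W0 / H0, W1 / H1
    if R0 > R1:            # image relatively wider: keep full height
        return H0 * R1, H0
    if R0 < R1:            # image relatively taller: keep full width
        return W0, W0 / R1
    return W0, H0          # equal ratios: the window is the whole image

W2, H2 = crop_window_size(400, 200, 100, 100)  # wide image, square page
```

In both branches the resulting window has the page's aspect ratio R1 while spanning the full height (or width) of the image, so no more of the image is discarded than necessary.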
Step 1052, moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and S1053, selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
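Steps S1052 and S1053 amount to a brute-force search over window positions, minimizing the summed distance from the four window vertices to the ROI centre of gravity. A minimal sketch (hypothetical helper; a coarser `step` can be used to reduce the search cost):

```python
import math

def best_crop(W0, H0, W2, H2, gx, gy, step=1):
    """Slide a W2 x H2 window over a W0 x H0 image and return the top-left
    corner whose four vertices have the smallest summed Euclidean distance
    to the region-of-interest centre of gravity (gx, gy)."""
    best_d, best_xy = float("inf"), (0, 0)
    for x in range(0, int(W0 - W2) + 1, step):
        for y in range(0, int(H0 - H2) + 1, step):
            corners = [(x, y), (x + W2, y), (x, y + H2), (x + W2, y + H2)]
            d = sum(math.hypot(cx - gx, cy - gy) for cx, cy in corners)
            if d < best_d:
                best_d, best_xy = d, (x, y)
    return best_xy
```

Because the distance sum is minimized when the window is centred on the centroid, this search effectively centres the crop on the region of interest, clamped to the image borders when the centroid lies near an edge.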
In the image cropping method based on image semantic understanding disclosed in this embodiment, the image to be cropped is input into an image semantic segmentation model to obtain an image semantic segmentation result; the region of interest of the image to be cropped and its center of gravity are obtained according to the segmentation result; and, taking the center of gravity of the region of interest as the cropping center, a cropping window is obtained and the image to be cropped is cropped. Compared with the prior-art method of cropping about the image center, cropping based on image semantic understanding ensures that the main subject of the cropped image is more complete.
Example 2
As shown in FIG. 6, the embodiment discloses an image cropping system based on image semantic understanding, which comprises a first acquiring module 1, an input module 2, a second acquiring module 3, a third acquiring module 4 and a cropping module 5. The first acquisition module 1 is used for acquiring an image semantic segmentation model;
the input module 2 is used for inputting the image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
the second obtaining module 3 is configured to obtain an interesting region of the image to be cropped according to the semantic segmentation result of the image;
the third obtaining module 4 is used for obtaining the gravity center of the region of interest;
the cutting module 5 is used for obtaining a cutting window by taking the gravity center of the region of interest as a cutting center, and cutting the image to be cut.
As shown in fig. 7, the first acquiring module 1 includes a first acquiring unit 11, a fusing unit 12, a first calculating unit 13, and a second acquiring unit 14:
the first obtaining unit 11 is configured to obtain an image, scale the image, and perform feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit 12 is configured to perform feature fusion based on the feature image to obtain a fusion feature image;
the first calculating unit 13 is configured to perform convolution operation on the fusion feature image, and acquire a maximum probability value of each dimension through a softmax function to obtain an image semantic segmentation model to be trained;
the second obtaining unit 14 is configured to obtain an image to be trained, and input the image into the image semantic segmentation model to be trained for training, so as to obtain the image semantic segmentation model.
As shown in fig. 8, the fusion unit 12 includes:
the pooling subunit 111 is configured to perform pooling operation on the feature images by using four pooling layers with different receptive fields to obtain four corresponding groups of first feature images;
a convolution subunit 112, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit 113 is configured to upsample the four groups of second feature images by bilinear interpolation to obtain four corresponding groups of third feature images;
a connection subunit 114, configured to connect according to a channel dimension based on the four sets of third feature images and the feature images to obtain the fused feature image.
As shown in fig. 9, the second obtaining module 3 includes:
the presetting unit 31 is used for presetting different interest categories for images to be cut of different categories;
the dividing unit 32 is configured to obtain the semantic segmentation result of the image, where the semantic segmentation result includes a category of each pixel in the image to be cut, and different regions divided according to categories of different pixels;
a determining unit 33, configured to determine, as the region of interest, a region that matches the category of interest corresponding to the image to be cropped.
As shown in fig. 10, the cutting module 5 includes:
a second calculating unit 51, configured to calculate a width and a height of the cropping window according to a front-end display page size based on the center of gravity of the region of interest;
in the embodiment, the width and the height of the cutting window are calculated according to the width, the height and the aspect ratio of the image to be cut, the width, the height and the aspect ratio of the front-end display page and a calculation formula of the width and the height of the cutting window; the calculation formula of the width and the height of the cutting window is as follows:
if R is0>R1,W2=H0×R1,H2=H0
If R is0<R1,W2=W0,H2=W0/R1
Formula (II)In, W2Indicating width of cutting window H2Height of cutting window R0Representing the aspect ratio, R, of the image to be cropped1Aspect ratio, H, representing the front-end presentation page0High, W, representing the image to be cropped0Indicating the width of the image to be cropped.
A third calculating unit 52, configured to move the cropping window multiple times, and calculate a sum of distances from four vertices of the cropping window to a center of gravity of the region of interest after each movement;
and the selecting unit 53 is configured to select the corresponding cutting window when the sum of the distances is the minimum value, and cut the image to be cut according to the selected cutting window.
In the image cropping system based on image semantic understanding provided by this embodiment, the image to be cropped is input into an image semantic segmentation model to obtain an image semantic segmentation result; the region of interest of the image to be cropped and its center of gravity are obtained according to the segmentation result; and, taking the center of gravity of the region of interest as the cropping center, a cropping window is obtained and the image to be cropped is cropped. Compared with the prior-art method of cropping about the image center, cropping based on image semantic understanding ensures that the main subject of the cropped image is more complete.
Example 3
Fig. 11 is a schematic structural diagram of an electronic device according to embodiment 3 of the present invention. The electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, it implements the image cropping method based on image semantic understanding provided in embodiment 1. The electronic device 60 shown in fig. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 11, the electronic device 60 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 60 may include, but are not limited to: the at least one processor 61, the at least one memory 62, and a bus 63 connecting the various system components (including the memory 62 and the processor 61).
The bus 63 includes a data bus, an address bus, and a control bus.
The memory 62 may include volatile memory, such as Random Access Memory (RAM)621 and/or cache memory 622, and may further include Read Only Memory (ROM) 623.
The memory 62 may also include a program/utility 625 having a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 61 executes various functional applications and data processing, such as an image cropping method based on semantic understanding of an image provided in embodiment 1 of the present invention, by running a computer program stored in the memory 62.
The electronic device 60 may also communicate with one or more external devices 64 (e.g., a keyboard, a pointing device, etc.); such communication may occur through an input/output (I/O) interface 65. The electronic device 60 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 66. As shown, the network adapter 66 communicates with the other modules of the electronic device 60 through the bus 63. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 60, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module; conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the image cropping method based on image semantic understanding provided in embodiment 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present invention can also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps of the image cropping method based on image semantic understanding provided in embodiment 1.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (14)

1. An image cropping method based on image semantic understanding, which is characterized by comprising the following steps:
acquiring an image semantic segmentation model;
inputting an image to be cut into the image semantic segmentation model to obtain an image semantic segmentation result;
obtaining a region of interest of the image to be cut according to the image semantic segmentation result;
acquiring the center of gravity of the region of interest;
and taking the gravity center of the region of interest as a cutting center, obtaining a cutting window, and cutting the image to be cut.
2. The image cropping method based on image semantic understanding of claim 1, wherein the step of obtaining an image semantic segmentation model comprises:
acquiring an image, zooming the image, and performing feature extraction on the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
performing feature fusion based on the feature image to obtain a fusion feature image;
performing convolution operation on the fusion characteristic image, and acquiring the maximum probability value of each dimensionality through a softmax function to obtain an image semantic segmentation model to be trained;
and acquiring an image to be trained, and inputting the image into the image semantic segmentation model to be trained for training to obtain the image semantic segmentation model.
3. The image cropping method based on image semantic understanding according to claim 2, wherein the step of performing feature fusion based on the feature image to obtain a fused feature image comprises:
performing pooling operation on the characteristic images by adopting four groups of pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
connecting the four groups of first characteristic images to a convolution kernel of 1 × 1 × C/4, and performing convolution respectively to obtain four corresponding groups of second characteristic images;
sampling the four groups of second characteristic images by a bilinear interpolation method to obtain four corresponding groups of third characteristic images;
and connecting according to channel dimensions based on the four groups of third feature images and the feature images to obtain the fusion feature image.
4. The image cropping method based on image semantic understanding of claim 1, wherein the step of obtaining the region of interest of the image to be cropped according to the image semantic segmentation result comprises:
presetting different categories of interest correspondingly for different types of images to be cut;
the image semantic segmentation result comprises the category of each pixel point in the image to be cut and different regions divided according to the categories of the pixel points;
and determining the region matching the category of interest corresponding to the image to be cut as the region of interest.
5. The image cropping method based on image semantic understanding of claim 1, wherein the step of taking the center of gravity of the region of interest as a cropping center, acquiring a cropping window, and cropping the image to be cropped comprises:
based on the gravity center of the region of interest, calculating the width and height of the cropping window according to the size of the front-end display page;
moving the cropping window for multiple times, and calculating the sum of the distances from four vertexes of the cropping window to the gravity center of the region of interest after moving each time;
and selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
6. The image cropping method based on image semantic understanding of claim 5, wherein the step of calculating the width and height of the cropping window according to the front-end presentation page size based on the center of gravity of the region of interest comprises:
acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
calculating the width and the height of the cutting window according to a calculation formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;

if R0 < R1, then W2 = W0 and H2 = W0 / R1;

in the formulas, W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
7. An image cropping system based on image semantic understanding, characterized in that the image cropping system based on image semantic understanding comprises:
the first acquisition module is used for acquiring an image semantic segmentation model;
the input module is used for inputting the image to be cut into the image semantic segmentation model so as to obtain an image semantic segmentation result;
the second acquisition module is used for acquiring the region of interest of the image to be cut according to the semantic segmentation result of the image;
the third acquisition module is used for acquiring the gravity center of the region of interest;
and the cutting module is used for acquiring a cutting window by taking the gravity center of the region of interest as a cutting center and cutting the image to be cut.
8. The image cropping system based on image semantic understanding of claim 7, wherein the first acquisition module comprises:
the first acquisition unit is used for acquiring an image, zooming the image and extracting features of the image through a backbone network to obtain a feature image; the backbone network comprises a multi-layer CNN network structure;
the fusion unit is used for carrying out feature fusion based on the feature image to obtain a fusion feature image;
the first calculation unit is used for performing convolution operation on the fusion characteristic image and acquiring the maximum probability value of each dimensionality through a softmax function so as to obtain an image semantic segmentation model to be trained;
and the second acquisition unit is used for acquiring the image to be trained and inputting the image into the image semantic segmentation model to be trained for training so as to obtain the image semantic segmentation model.
9. The image cropping system based on image semantic understanding of claim 8, wherein the fusion unit comprises:
the pooling subunit is used for performing pooling operation on the characteristic images by adopting four pooling layers with different receptive fields to obtain four corresponding groups of first characteristic images;
a convolution subunit, configured to connect the four sets of first feature images to convolution kernels of 1 × 1 × C/4, and perform convolution respectively to obtain four corresponding sets of second feature images;
the sampling subunit is used for performing a sampling operation on the four groups of second characteristic images by a bilinear interpolation method to obtain four corresponding groups of third characteristic images;
and the connecting subunit is used for connecting the four groups of third characteristic images and the characteristic images according to channel dimensions to obtain the fusion characteristic image.
10. The image cropping system based on image semantic understanding of claim 7, wherein the second acquisition module comprises:
the preset unit is used for correspondingly presetting different categories of interest for different types of images to be cut;
the dividing unit is used for enabling the image semantic segmentation result to comprise the category of each pixel point in the image to be cut and different regions divided according to the categories of the pixel points;
and the determining unit is used for determining the region matching the category of interest corresponding to the image to be cut as the region of interest.
11. The image cropping system based on image semantic understanding of claim 7, wherein the cropping module comprises:
the second calculation unit is used for calculating the width and the height of the cropping window according to the size of the front-end display page based on the gravity center of the region of interest;
the third calculating unit is used for moving the cropping window for multiple times and calculating the sum of the distances from the four vertexes of the cropping window to the gravity center of the region of interest after each movement;
and the selection unit is used for selecting the corresponding cutting window when the sum of the distances is the minimum value, and cutting the image to be cut according to the selected cutting window.
12. The image cropping system based on image semantic understanding of claim 11, wherein the second calculating unit comprises:
the third acquisition unit is used for acquiring the width, height and width-height ratio of the image to be cut and the width, height and width-height ratio of the front-end display page;
the calculating unit is used for calculating the width and the height of the cutting window according to a calculating formula of the width and the height of the cutting window;
the width and height calculation formula of the cutting window is as follows:
if R0 > R1, then W2 = H0 × R1 and H2 = H0;

if R0 < R1, then W2 = W0 and H2 = W0 / R1;

in the formulas, W2 denotes the width of the cropping window, H2 the height of the cropping window, R0 the aspect ratio of the image to be cropped, R1 the aspect ratio of the front-end display page, H0 the height of the image to be cropped, and W0 the width of the image to be cropped.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image cropping method based on image semantic understanding as claimed in any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the image cropping method based on image semantic understanding according to any one of claims 1 to 6.
CN202010206880.3A 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding Pending CN111462121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010206880.3A CN111462121A (en) 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding


Publications (1)

Publication Number Publication Date
CN111462121A true CN111462121A (en) 2020-07-28

Family

ID=71685662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010206880.3A Pending CN111462121A (en) 2020-03-23 2020-03-23 Image cropping method, system, device and medium based on image semantic understanding

Country Status (1)

Country Link
CN (1) CN111462121A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082490A (en) * 2022-08-23 2022-09-20 腾讯科技(深圳)有限公司 Anomaly prediction method, and training method, device and equipment of anomaly prediction model

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473737A (en) * 2012-06-06 2013-12-25 索尼公司 Image processing device, image processing method, and program
CN103914689A (en) * 2014-04-09 2014-07-09 百度在线网络技术(北京)有限公司 Picture cropping method and device based on face recognition
CN105357436A (en) * 2015-11-03 2016-02-24 广东欧珀移动通信有限公司 Image cropping method and system for image shooting
CN106920141A (en) * 2015-12-28 2017-07-04 阿里巴巴集团控股有限公司 Page presentation content processing method and device
CN107610131A (en) * 2017-08-25 2018-01-19 百度在线网络技术(北京)有限公司 A kind of image cropping method and image cropping device
CN108537292A (en) * 2018-04-10 2018-09-14 上海白泽网络科技有限公司 Semantic segmentation network training method, image, semantic dividing method and device
CN108776970A (en) * 2018-06-12 2018-11-09 北京字节跳动网络技术有限公司 Image processing method and device
CN109447990A (en) * 2018-10-22 2019-03-08 北京旷视科技有限公司 Image, semantic dividing method, device, electronic equipment and computer-readable medium
CN110136142A (en) * 2019-04-26 2019-08-16 微梦创科网络科技(中国)有限公司 A kind of image cropping method, apparatus, electronic equipment
CN110188817A (en) * 2019-05-28 2019-08-30 厦门大学 A kind of real-time high-performance street view image semantic segmentation method based on deep learning
CN110377204A (en) * 2019-06-30 2019-10-25 华为技术有限公司 A kind of method and electronic equipment generating user's head portrait
CN110456960A (en) * 2019-05-09 2019-11-15 华为技术有限公司 Image processing method, device and equipment


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HENGSHUANG ZHAO ET AL: "Pyramid Scene Parsing Network" *
JIANSHENG CHEN ET AL: "Automatic Image Cropping : A Computational Complexity Study" *
张鹏飞: "基于深度学习的图像缩略图生成技术及其应用" *


Similar Documents

Publication Publication Date Title
JP7238139B2 (en) Image area recognition method by artificial intelligence, model training method, image processing device, terminal device, server, computer device and computer program
CN109977192B (en) Unmanned aerial vehicle tile map rapid loading method, system, equipment and storage medium
JP7265034B2 (en) Method and apparatus for human body detection
CN112102411A (en) Visual positioning method and device based on semantic error image
CN108876706B (en) Thumbnail generation from panoramic images
CN111932546A (en) Image segmentation model training method, image segmentation method, device, equipment and medium
US11715186B2 (en) Multi-image-based image enhancement method and device
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
CN113792651B (en) Gesture interaction method, device and medium integrating gesture recognition and fingertip positioning
CN110874591A (en) Image positioning method, device, equipment and storage medium
CN108648149B (en) Image splicing method, system, equipment and storage medium based on augmented reality
CN113934297A (en) Interaction method and device based on augmented reality, electronic equipment and medium
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN113297986A (en) Handwritten character recognition method, device, medium and electronic equipment
CN112287144A (en) Picture retrieval method, equipment and storage medium
CN111191553A (en) Face tracking method and device and electronic equipment
CN111462121A (en) Image cropping method, system, device and medium based on image semantic understanding
CN114238541A (en) Sensitive target information acquisition method and device and computer equipment
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN111062388B (en) Advertisement character recognition method, system, medium and equipment based on deep learning
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN110516094A (en) De-weight method, device, electronic equipment and the storage medium of class interest point data
CN112085842A (en) Depth value determination method and device, electronic equipment and storage medium
CN114119365A (en) Application detection method, device, equipment and storage medium
CN113963289A (en) Target detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728