WO2023093851A1 - Image cropping method and apparatus, and electronic device - Google Patents

Image cropping method and apparatus, and electronic device

Info

Publication number
WO2023093851A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
cropping
saliency
training
model
Prior art date
Application number
PCT/CN2022/134366
Other languages
French (fr)
Chinese (zh)
Inventor
Liu Xin (刘鑫)
Original Assignee
Vivo Mobile Communication Co., Ltd. (维沃移动通信有限公司)
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Publication of WO2023093851A1 publication Critical patent/WO2023093851A1/en

Classifications

    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06T 7/11 Region-based segmentation
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20132 Image cropping

Definitions

  • the present application relates to the field of communication technologies, and in particular to an image cropping method, device and electronic equipment.
  • the purpose of the embodiments of the present application is to provide an image cropping method, device and electronic equipment, so as to solve the problem of poor image quality obtained during image cropping in the prior art.
  • an image cropping method including:
  • acquiring image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate areas, and the second image feature is associated with a second image area in the target image other than the first image area;
  • an image cropping device including:
  • a determining module configured to determine a plurality of cropping candidate regions corresponding to the target image
  • the first acquisition module is configured to acquire image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area;
  • the second acquisition module is configured to input the image features corresponding to the target image into the image evaluation network model and acquire feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of the aesthetic features and salient features of the cropping candidate regions;
  • a processing module configured to determine at least one candidate target cropping area according to the feature scores corresponding to the plurality of candidate cropping areas, and crop the target image according to the candidate target cropping area.
  • the embodiment of the present application provides an electronic device, the electronic device includes a processor and a memory, the memory stores programs or instructions that can run on the processor, and when the programs or instructions are executed by the processor, the steps of the method described in the first aspect are implemented.
  • an embodiment of the present application provides a readable storage medium, on which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
  • the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions, so as to implement the method described in the first aspect.
  • an embodiment of the present application provides a computer program product, the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the method described in the first aspect.
  • the embodiment of the present application provides an electronic device configured to execute the method described in the first aspect.
  • in the embodiments of the present application, by determining a plurality of cropping candidate regions corresponding to the target image, the image features corresponding to the target image are acquired, including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-cropping-candidate area; the image features are input into the image evaluation network model to obtain feature scores characterizing at least one of the aesthetic features and salient features of the multiple cropping candidate regions; and, according to the feature scores corresponding to the multiple cropping candidate regions, at least one target cropping candidate region is determined and the target image is cropped. In this way, the information in the image can be efficiently mined based on aesthetic and/or salient features, ensuring the image cropping effect and obtaining a cropped image with good image quality.
  • FIG. 1 shows a schematic diagram of an image cropping method provided in an embodiment of the present application
  • Figure 2 shows a schematic diagram of the network architecture of the aesthetic evaluation task model provided by the embodiment of the present application
  • FIG. 3 shows a schematic diagram of the network architecture of the saliency task model provided by the embodiment of the present application
  • Fig. 4 shows a schematic diagram of the image evaluation network model obtained based on strategy one provided by the embodiment of the present application;
  • Fig. 5 shows a schematic diagram of the image evaluation network model obtained based on strategy two provided by the embodiment of the present application;
  • FIG. 6 shows a schematic diagram of an image cropping device provided in an embodiment of the present application.
  • Fig. 7 is a first schematic block diagram of the electronic device provided by the embodiment of the present application.
  • FIG. 8 is a second schematic block diagram of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides an image cropping method, as shown in FIG. 1, including:
  • Step 101 Determine multiple cropping candidate regions corresponding to the target image.
  • a plurality of cropping candidate regions corresponding to the target image may be determined, so as to determine a final cropping region among the multiple cropping candidate regions.
  • the preset composition principles can be determined based on various photographic composition principles.
  • the various photographic composition principles can include, but are not limited to, the triangular composition principle, the diagonal composition principle, the rule-of-thirds composition principle, the principle of leaving blank space above the subject, the principle of leaving blank space in the direction of motion, the principle of balanced and stable composition, etc.
  • Step 102 Obtain image features corresponding to the target image, the image features include first image features and second image features, the first image features are associated with the first image areas corresponding to the plurality of cropping candidate areas, The second image feature is associated with a second image area in the target image other than the first image area.
  • image data processing is first performed on the target image; a process of image data processing is briefly introduced below.
  • bilinear interpolation is used to resize the target image to 256×256, and data enhancement is performed, where data enhancement may include mirroring, random rotation, Gaussian noise, normalization, etc.
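  • as a non-authoritative sketch of this preprocessing step (the 256×256 bilinear resize and the listed augmentation types come from the description above; the torchvision pipeline, the noise scale and the normalization statistics are assumptions):

```python
import torch
import torchvision.transforms as T

# Hypothetical preprocessing pipeline: bilinear resize to 256x256 plus the
# augmentations named in the text (mirroring, random rotation, Gaussian
# noise, normalization). ImageNet statistics are an assumed choice.
preprocess = T.Compose([
    T.Resize((256, 256), interpolation=T.InterpolationMode.BILINEAR),
    T.RandomHorizontalFlip(p=0.5),                        # mirror processing
    T.RandomRotation(degrees=10),                         # random rotation
    T.ToTensor(),
    T.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),   # Gaussian noise
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```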
  • after the image data processing of the target image is completed, the target image is input into the backbone network (such as MobileNetv2); the backbone network outputs features at various scales, and the output features are spliced to obtain the image features corresponding to the target image.
  • the image features corresponding to the target image may include first image features and second image features.
  • the first image feature is the feature corresponding to the plurality of cropping candidate areas, which are associated with the first image area of the target image; that is, the first image feature is the feature associated with the first image area corresponding to the multiple cropping candidate areas;
  • the second image feature is a feature associated with the second image area, and the second image area is an image area in the target image that is different from the first image area.
  • Step 103 Input the image features corresponding to the target image into the image evaluation network model, and obtain the feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of the aesthetic features and salient features of the cropping candidate regions.
  • the image features corresponding to the target image can be input into the image evaluation network model, and the image evaluation network model outputs the feature scores respectively corresponding to the plurality of cropping candidate regions.
  • the image evaluation network model is obtained through model training based on the image saliency information of multiple training images and/or the image aesthetic information of multiple training images, and is used for scoring the cropping candidate regions of the image.
  • the feature score of a cropping candidate region of the target image is used to characterize at least one of the aesthetic feature and the salient feature of that cropping candidate region.
  • after the feature scores respectively corresponding to the plurality of cropping candidate regions are obtained based on the image evaluation network model, step 104 is executed.
  • Step 104 Determine at least one candidate target cropping area according to the feature scores respectively corresponding to the plurality of candidate cropping areas, and crop the target image according to the candidate target cropping area.
  • At least one target cropping candidate region is determined among the plurality of cropping candidate regions, and the target image is cropped based on the determined at least one target cropping candidate region to obtain the cropped image.
  • in some embodiments, step 101 of determining a plurality of cropping candidate regions corresponding to the target image includes:
  • determining, based on preset composition principles, at least one target grid in the target image in grid anchor form; and expanding the at least one target grid according to at least one expansion ratio to determine the plurality of cropping candidate regions.
  • the process of determining the cropping candidate region is introduced.
  • the target image is divided by two horizontal lines and two vertical lines into nine large grid blocks of the same size (a nine-square grid). The small grids that the four thirds lines pass through, together with all the small grids contained in the large grid at the center of the target image, are determined as the target grids, and the center of each target grid is used as the center of a cropping candidate region.
  • At least one determined target grid is expanded according to at least one expansion ratio, so as to determine a plurality of cropping candidate regions.
  • the target grid center can be expanded according to various expansion ratios to obtain cropping candidate regions.
  • the expansion ratio refers to the aspect ratio of the expanded cropping candidate region; by expanding from the target grid, a cropping candidate region containing the target grid and its neighboring grids can be delineated on the basis of the target grid.
  • the upper left corner and the lower right corner of the cropping candidate region obtained by expanding are located at the center of the small grid.
  • the area ratio of the cropping candidate regions to the original image is reasonable (for example, the area ratio can be greater than 0.4).
  • by determining at least one target grid in the target image in grid anchor form and then expanding from the target grid, the cropping candidate regions can be determined on the basis of the target grid, as the sketch below illustrates.
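  • the following sketch shows one possible reading of this grid-anchor scheme (the thirds-based centers, the expansion at several aspect ratios and the area-ratio filter of 0.4 come from the text; the specific center set, the ratios and the size steps are assumptions):

```python
from itertools import product

def crop_candidates(img_w, img_h, ratios=((1, 1), (4, 3), (16, 9)), steps=4):
    """Generate crop candidate boxes by expanding around target-grid centers.

    Centers are taken at the thirds intersections and the image center
    (a simplification of the small-grid selection in the text); each center
    is expanded at several aspect ratios and sizes, and candidates covering
    less than 40% of the image area are discarded.
    """
    centers = [(img_w * fx, img_h * fy)
               for fx, fy in product((1/3, 1/2, 2/3), repeat=2)]
    candidates = []
    for (cx, cy), (rw, rh), k in product(centers, ratios, range(1, steps + 1)):
        w = img_w * k / steps                  # expand to successively larger sizes
        h = w * rh / rw
        x0, y0 = cx - w / 2, cy - h / 2
        x1, y1 = cx + w / 2, cy + h / 2
        if x0 < 0 or y0 < 0 or x1 > img_w or y1 > img_h:
            continue                           # box must stay inside the image
        if (w * h) / (img_w * img_h) <= 0.4:
            continue                           # area-ratio filter from the text
        candidates.append((x0, y0, x1, y1))
    return candidates

print(len(crop_candidates(1920, 1080)))
```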
  • the method also includes:
  • model training is performed according to at least one of the image saliency information and the image aesthetic information corresponding to a plurality of training images, to obtain the image evaluation network model.
  • the embodiment of the present application needs to obtain multiple training images, and perform model training based on at least one of image saliency information and image aesthetic information of the multiple training images to obtain an image evaluation network model. Since the image evaluation network model is used to score images, by acquiring the image evaluation network model, feature scores corresponding to multiple cropping candidate regions of the target image can be obtained based on the image evaluation network model.
  • performing model training according to at least one of image saliency information and image aesthetic information corresponding to a plurality of training images to obtain the image evaluation network model includes:
  • acquiring the image aesthetic information of the multiple cropping candidate regions corresponding to each of the training images, the image aesthetic information including labeling scores and prediction scores of the cropping candidate regions;
  • corresponding cropping candidate regions may be determined for each training image.
  • the process of determining the cropping candidate region corresponding to the training image is the same as the process of determining the cropping candidate region corresponding to the target image, and will not be further elaborated here.
  • labeling scores and prediction scores of the corresponding multiple cropping candidate regions may be acquired for each training image.
  • for each training image, the corresponding image saliency information can also be obtained. Model training is then performed according to the labeling scores and prediction scores of the cropping candidate regions corresponding to the multiple training images and/or the image saliency information corresponding to the multiple training images, and the image evaluation network model is obtained through this training.
  • the image aesthetic information of a training image includes the labeling scores and prediction scores of the cropping candidate regions of the training image.
  • the labeling score is the score obtained by the labeling personnel aesthetically labeling the cropping candidate region of the training image based on their own aesthetic standards.
  • the prediction score is the score obtained by predicting the features of the cropping candidate regions of the training image based on the multi-layer convolutional layer.
  • in this way, by acquiring the image saliency information of multiple training images and/or the image aesthetic information including the labeling scores and prediction scores of the cropping candidate regions, and performing model training according to those labeling scores and prediction scores and/or the image saliency information, an image evaluation network model can be obtained that is based on at least one of the salient features and aesthetic features of the training images and that scores images on aesthetic features and/or salient features.
  • the acquiring the image aesthetic information of the multiple cropping candidate regions corresponding to each of the training images includes:
  • for each of the training images, obtaining the screening results of at least two rounds of screening performed by the annotators on the multiple cropping candidate regions corresponding to the training image, and determining, according to the screening results, the labeling scores corresponding to the multiple cropping candidate regions;
  • for each of the training images, obtaining the feature map of the training image, extracting the region of interest (RoI) features and region of discard (RoD) features of the cropping candidate regions on the feature map and combining them into target features, and obtaining, according to the target features, the prediction scores corresponding to the multiple cropping candidate regions.
  • for each training image, the screening results of at least two rounds of screening performed by the annotators on the corresponding multiple cropping candidate regions can be obtained, and the labeling scores corresponding to the multiple cropping candidate regions of the current training image are then determined based on the obtained screening results.
  • the specific process of obtaining the labeling score corresponding to the clipping candidate region is introduced below.
  • specifically, a benchmark model first scores the region images corresponding to the multiple cropping candidate regions of the training image; for each expansion ratio, the N cropping candidate regions with the top scores are output, and K cropping candidate regions are randomly selected from the remaining unselected ones; these are then labeled by the annotators, so that part of the cropping candidate regions are pre-filtered based on the benchmark model.
  • the output cropping candidate regions can be combined into a first candidate pool, and for each expansion ratio the annotators select n (for example, 3 to 5) cropping candidate regions from the first candidate pool.
  • n and N can be the same or different.
  • the selected cropping candidate regions and some randomly mixed-in cropping candidate regions form the second candidate pool; a secondary selection is then performed on this pool to select the m (for example, 3 to 5) optimal cropping candidate regions.
  • the candidate cropping regions may be screened based on region images corresponding to the candidate cropping regions.
  • the electronic device can determine the labeling scores corresponding to the multiple cropping candidate regions according to the screening results; for example, the labeling score corresponding to a cropping candidate region selected in both rounds is 2 points, that of a region selected in only one round is 1 point, and that of a region never selected is 0 points. It should be noted that, considering aesthetic differences, the cropping candidate regions of each training image can be screened by multiple annotators, with the final screening result determined from the selections of the multiple annotators.
  • the first backbone network can be used to extract features of different scales for the training image, and then perform feature splicing to form a feature map.
  • for example, the outputs of the 7th layer, the 14th layer and the last layer of MobileNetv2 are processed by upsampling and downsampling and spliced along the channel dimension to form a feature map; a 1×1 convolution is then used for channel dimensionality reduction.
  • the final feature is sent to the multi-layer convolutional layer, which outputs the prediction score s_score of the cropping candidate region.
  • the multi-layer convolutional layer is part of the aesthetic evaluation task model.
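  • a minimal PyTorch sketch of this feature path (the MobileNetv2 layer choices, channel-wise splicing, 1×1 reduction and RoI/RoD combination follow the text; the feature width, the RoIAlign output size, the head depth and the exact RoD masking are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2
from torchvision.ops import roi_align

class AestheticHead(nn.Module):
    """Splice MobileNetV2 features from the 7th, 14th and last layers, reduce
    channels with a 1x1 conv, then score each crop candidate from its RoI
    feature (inside the box) and RoD feature (the rest of the image)."""

    def __init__(self, dim=256):
        super().__init__()
        self.backbone = mobilenet_v2(weights="DEFAULT").features
        self.reduce = nn.Conv2d(64 + 160 + 1280, dim, 1)   # 1x1 channel reduction
        self.head = nn.Sequential(                         # multi-layer conv head
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, 1, 3, padding=1), nn.AdaptiveAvgPool2d(1))

    def forward(self, img, boxes):       # boxes: [K, 5] = (batch_idx, x0, y0, x1, y1)
        feats, x = [], img
        for i, layer in enumerate(self.backbone):
            x = layer(x)
            if i in (7, 14, len(self.backbone) - 1):
                feats.append(x)
        size = feats[1].shape[-2:]       # resample every scale to the 14th-layer size
        fmap = self.reduce(torch.cat(
            [F.interpolate(f, size=size) for f in feats], dim=1))
        scale = size[-1] / img.shape[-1]
        roi = roi_align(fmap, boxes, output_size=7, spatial_scale=scale)
        rods = []                        # RoD: zero the box region, pool the rest
        for b in boxes:
            m = fmap[int(b[0])].clone()
            x0, y0, x1, y1 = (b[1:] * scale).long()
            m[:, y0:y1, x0:x1] = 0
            rods.append(F.adaptive_avg_pool2d(m, 7))
        rod = torch.stack(rods)
        return self.head(torch.cat([roi, rod], dim=1)).flatten()   # s_score per box
```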
  • through the above implementation process, the labeling scores corresponding to the cropping candidate regions can be determined according to the annotators' screening results, the target features can be determined based on the RoI and RoD features of the cropping candidate regions, and the prediction scores of the cropping candidate regions can be obtained based on the target features, thereby obtaining the image aesthetic information of the training image.
  • obtaining the image evaluation network model includes one of the following solutions, where the image saliency information includes the saliency grayscale map and the saliency map prediction result:
  • the aesthetic evaluation task training can be performed according to the labeling scores and prediction scores of the cropping candidate regions corresponding to multiple training images respectively.
  • the aesthetic evaluation task model trained at this time is the image evaluation network model.
  • alternatively, the saliency task training can be performed according to the saliency grayscale maps and saliency map prediction results corresponding to the multiple training images to determine the saliency task model; the saliency task model trained in this way is the image evaluation network model.
  • the image saliency information includes the saliency grayscale map and the prediction result of the saliency map.
  • as a further alternative, model training can be performed according to both the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images; the image evaluation network model obtained in this way can score both the aesthetic features and the salient features of an image.
  • three models can be trained according to different features, so that image scoring can be performed based on any one of the models.
  • with the aesthetic evaluation task model, an image can be scored on aesthetic features; with the saliency task model, an image can be scored on salient features; and with the model trained on both image aesthetic information and image saliency information, an image can be scored on both aesthetic and salient features.
  • for the aesthetic evaluation task model, performing the aesthetic evaluation task training according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the plurality of training images and determining the aesthetic evaluation task model includes:
  • the aesthetic evaluation task loss is determined according to the annotation scores and prediction scores of multiple cropping candidate regions corresponding to the training images;
  • the model parameters of the aesthetic evaluation task model are updated according to the aesthetic evaluation task loss, so as to perform model training and determine the aesthetic evaluation task model.
  • specifically, for the first training image, the aesthetic evaluation task loss can be determined according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to that training image, and the model parameters of the aesthetic evaluation task model are updated based on the aesthetic evaluation task loss.
  • the updated model then produces prediction scores for the multiple cropping candidate regions of the second training image; the corresponding aesthetic evaluation task loss is determined from the labeling scores and prediction scores of those regions, and the model parameters are updated again. This process of obtaining prediction scores for the next training image, determining the aesthetic evaluation task loss and updating the model parameters is repeated until the aesthetic evaluation task loss satisfies a preset condition, at which point the model training is determined to be successful.
  • the network architecture of the aesthetic evaluation task model is shown in Figure 2: the aesthetic evaluation task model includes a first backbone network, a multi-layer convolutional layer and a feature acquisition architecture between the two. The first backbone network acquires features of the image at different scales, the feature acquisition architecture splices them into a feature map, the RoI and RoD features of the candidate regions are extracted from the feature map and combined into target features, and the target features are input into the multi-layer convolutional layer to obtain the prediction scores of the multiple cropping candidate regions of the image.
  • the process of updating the model parameters of the aesthetic evaluation task model can be understood as the process of updating the parameters of the first backbone network and the parameters of the multi-layer convolutional layer.
  • the process of determining the loss of the aesthetic evaluation task according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the training images is described below.
  • first, the labeling scores s_gd of the cropping candidate regions are subtracted pairwise to obtain the labeling score difference matrix S_gd (a square matrix whose size equals the number of cropping candidate regions; for example, with 5 cropping candidate regions it is a 5×5 matrix).
  • the first matrix is then constructed as follows: the diagonal elements are set to 0, elements for pairs with the same expansion ratio are set to 1, elements for the differences between the optimal cropping candidate region and the other cropping candidate regions are set to 2, and elements corresponding to zeros in the labeling score difference matrix S_gd are set to 0; this finally yields the valid image-pair matrix P, namely the first matrix.
  • the prediction scores s_score of the cropping candidate regions are subtracted pairwise to obtain the prediction score difference matrix S (also square).
  • the margin matrix M is obtained as M = G * S_gd.
  • the ranking loss RankLoss and the score loss ScoreLoss can be calculated.
  • the ranking loss RankLoss is determined from the prediction score difference matrix S, the second matrix G, the first matrix P and the margin matrix M. Specifically, the product of S and -G is computed and added to M; the resulting matrix is multiplied elementwise by P to obtain the target matrix; each element of the target matrix is compared with 0 and replaced by the larger of the two. The elements of the updated target matrix are accumulated to obtain a first value, the elements of the first matrix P are accumulated to obtain a second value, and the ratio of the first value to the second value is the ranking loss RankLoss.
  • the calculation expression of the ranking loss can therefore be written as: RankLoss = Σ max(0, P ⊙ (M - G ⊙ S)) / Σ P, where ⊙ denotes elementwise multiplication and the sums run over all matrix elements.
  • the score loss ScoreLoss is determined according to the prediction scores s_score and the labeling scores s_gd of the cropping candidate regions; specifically, it is calculated based on the smooth L1 loss function.
  • the calculation expression of the score loss can be:
  • ScoreLoss = SmoothL1Loss(s_score, s_gd), where SmoothL1Loss is the smooth L1 loss function,
  • s_score is the prediction score of the cropping candidate region, and
  • s_gd is the labeling score of the cropping candidate region.
  • the aesthetic evaluation task loss is the weighted sum of ranking loss and score loss.
  • through the above implementation process, the prediction score difference matrix and the labeling score difference matrix can be determined from the prediction scores and labeling scores of the cropping candidate regions, the ranking loss can be determined from the two matrices, the score loss can be calculated from the prediction scores and labeling scores, the aesthetic evaluation task loss can be determined as the weighted sum of the ranking loss and the score loss, and the parameters can be adjusted based on this loss to train the aesthetic evaluation task model.
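  • a PyTorch sketch of this loss computation (the pairwise differences, the margin matrix M = G * S_gd, the hinge-style ratio and the smooth L1 score loss follow the text; the second matrix G is not fully defined in the excerpt and is assumed here to be the sign of S_gd, a common choice in pairwise ranking losses):

```python
import torch
import torch.nn.functional as F

def rank_and_score_loss(s_score, s_gd, pair_mask):
    """Ranking and score losses for one training image.

    s_score:   [K] prediction scores of the K cropping candidate regions
    s_gd:      [K] labeling scores from the annotators' screening
    pair_mask: [K, K] valid image-pair matrix P (0 on the diagonal, 1 / 2
               weights per the construction rules in the text, 0 where the
               labeling score difference is 0)
    """
    S_gd = s_gd[:, None] - s_gd[None, :]       # labeling score difference matrix
    S = s_score[:, None] - s_score[None, :]    # prediction score difference matrix
    G = torch.sign(S_gd)                       # assumed form of the second matrix
    M = G * S_gd                               # margin matrix M = G * S_gd
    hinge = torch.clamp(pair_mask * (M - G * S), min=0)
    rank_loss = hinge.sum() / pair_mask.sum().clamp(min=1)   # first / second value
    score_loss = F.smooth_l1_loss(s_score, s_gd)             # SmoothL1Loss(s_score, s_gd)
    return rank_loss, score_loss

# the aesthetic evaluation task loss is then the weighted sum of the two terms
```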
  • in some embodiments, performing the saliency task training according to the saliency grayscale images and saliency map prediction results respectively corresponding to the plurality of training images and determining the saliency task model includes:
  • for each training image, determining the saliency task loss according to the saliency grayscale map and the saliency map prediction result corresponding to the training image, and updating the model parameters of the saliency task model according to the saliency task loss, so as to perform model training and determine the saliency task model.
  • specifically, the saliency task loss can be determined according to the saliency grayscale map corresponding to the training image and the saliency map prediction result, and the model parameters of the saliency task model can then be updated based on the saliency task loss.
  • the saliency grayscale image can be determined based on a well-trained large salient object detection (SOD) network model
  • the initial saliency task model obtains the saliency map prediction result corresponding to the first training image, and determines the saliency task loss of the first training image according to the saliency grayscale map and the saliency map prediction result.
  • the model parameters of the saliency task model are updated according to the saliency task loss; the saliency map prediction result corresponding to the second training image is then obtained based on the updated saliency task model, the saliency task loss is determined again based on the saliency grayscale map and saliency map prediction result corresponding to the second training image, and this process is repeated for subsequent training images until the saliency task loss satisfies a preset condition. The loss can be expressed as:
  • SalLoss = BCEWithLogitsLoss(s_pred, s_sod), where
  • BCEWithLogitsLoss is the binary cross-entropy-with-logits loss function,
  • SalLoss is the saliency task loss,
  • s_pred is the saliency map prediction result, and
  • s_sod is the saliency grayscale map.
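  • rendered directly in PyTorch, assuming the prediction is raw logits and the SOD grayscale map is scaled to [0, 1]:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
s_pred = torch.randn(2, 1, 64, 64)   # saliency map logits from the saliency task model
s_sod = torch.rand(2, 1, 64, 64)     # pseudo ground truth from a pretrained SOD model
sal_loss = criterion(s_pred, s_sod)  # SalLoss
```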
  • the network architecture of the saliency task model is shown in Figure 3: the saliency task model includes the first backbone network, cross-stage convolutions and multi-layer convolutional layers (which may differ from the multi-layer convolutional layers of the aesthetic evaluation task model). The first backbone network obtains features of the image at different scales, and cross-stage convolution is used for feature fusion.
  • the features at each scale are processed by a set of parallel convolutions with different dilation rates, the highest-resolution features are then generated through cross-stage 1×1 convolutions, and the saliency map prediction result is finally output through the multi-layer convolutional layers.
  • the process of updating the model parameters of the saliency task model can be understood as the process of updating the parameters of the first backbone network, the parameters of the cross-stage convolution and the parameters of the multi-layer convolution layer.
  • in this way, the multi-scale features of the first backbone network are fused by cross-stage convolution to predict the salient regions in the image, the saliency task loss is computed with the cross-entropy loss function from the saliency map prediction result and the saliency grayscale map, and model training is performed based on the saliency task loss.
  • in the case where the saliency task model is the image evaluation network model, inputting the image features corresponding to the target image into the image evaluation network model and obtaining the feature scores respectively corresponding to the plurality of cropping candidate regions includes: obtaining the salient feature information corresponding to the target image, and determining the feature score corresponding to each cropping candidate region according to that information.
  • specifically, when the saliency task model is the image evaluation network model, the image features corresponding to the target image can be input into the saliency task model to obtain the salient feature information corresponding to each pixel of the target image; it can be understood that each pixel of the target image corresponds to a saliency feature value.
  • the feature score in this case characterizes the salient feature: the higher the proportion of pixels with high saliency feature values contained in a cropping candidate region, the higher the feature score corresponding to that cropping candidate region.
  • that is, after the salient feature information corresponding to the target image is obtained, the feature score corresponding to a cropping candidate region can be determined based on the saliency feature information of the pixels covered by that region, i.e., based on the position of the cropping candidate region on the target image.
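  • the text does not fix an exact scoring formula; one simple sketch consistent with it is to average the per-pixel saliency values inside each candidate box:

```python
import torch

def saliency_feature_scores(sal_map, boxes):
    """sal_map: [H, W] per-pixel saliency values in [0, 1];
    boxes: iterable of (x0, y0, x1, y1) pixel coordinates.
    Regions covering a larger share of high-saliency pixels score higher."""
    return torch.tensor([sal_map[y0:y1, x0:x1].mean() for x0, y0, x1, y1 in boxes])
```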
  • in some embodiments, performing model training according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images to obtain the image evaluation network model includes:
  • determining the aesthetic evaluation task model and the saliency task model, where the image saliency information includes the saliency grayscale map and the saliency map prediction result;
  • Joint training is performed based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
  • specifically, the aesthetic evaluation task training can be carried out according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the multiple training images to determine the aesthetic evaluation task model; the saliency task training can be carried out according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images to determine the saliency task model (the image saliency information here includes the saliency grayscale map and the saliency map prediction result); joint training is then performed based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
  • both task models can be optimized simultaneously.
  • the saliency task model urges the model to focus on and preserve visually salient regions in the original image
  • the aesthetic evaluation task model urges the model to focus on better aesthetic compositions.
  • the joint learning of the two task models can ensure that the crop candidate region with the highest aesthetic evaluation score is selected among all crop candidate regions containing salient features, so that the cropped image has a high aesthetic score while retaining the original salient features of the image.
  • the following steps are included when performing joint training based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model:
  • determining the target loss according to the aesthetic evaluation task loss corresponding to the aesthetic evaluation task model and the saliency task loss corresponding to the saliency task model, and updating the model parameters of the aesthetic evaluation task model and the saliency task model according to the target loss, so as to perform model training and obtain the image evaluation network model.
  • the target loss can be determined according to the aesthetic evaluation task loss corresponding to the aesthetic evaluation task model and the saliency task loss corresponding to the saliency task model.
  • the aesthetic evaluation task loss is the final loss corresponding to the aesthetic evaluation task model
  • the saliency task loss is the final loss corresponding to the saliency task model.
  • the target loss can be expressed as loss = SalLoss + α*RankLoss + β*ScoreLoss, where α and β are trade-off parameters, and usually both α and β can take the value 1.
  • the process of updating the model parameters of the image evaluation network model based on the target loss is to update the parameters of the first backbone network, the cross-stage convolution and the multi-layer convolutional layer of the saliency task model, and to update the parameters of the aesthetic evaluation task model.
  • the saliency task model and the aesthetic evaluation task model can share the first backbone network.
  • after each update, the aesthetic evaluation task loss and the saliency task loss can be re-determined, the target loss is determined again, the model parameters continue to be updated based on the target loss, and the above process is repeated until the target loss meets the preset condition, at which point the model training is determined to be successful.
  • the image evaluation network model can be used to crop the target image.
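  • a toy end-to-end step under this target loss (the stand-in modules below replace the real shared backbone, cross-stage saliency head and RoI/RoD aesthetic head, which are only sketched here; rank_and_score_loss is the sketch given earlier, and α = β = 1 as in the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv2d(3, 8, 3, padding=1)   # toy stand-in for the shared first backbone
saliency_head = nn.Conv2d(8, 1, 1)         # stand-in for cross-stage conv + conv layers
aesthetic_head = nn.Linear(8, 1)           # stand-in for the RoI/RoD scoring head

opt = torch.optim.Adam([*backbone.parameters(), *saliency_head.parameters(),
                        *aesthetic_head.parameters()], lr=1e-4)
alpha, beta = 1.0, 1.0                     # trade-off parameters, both 1

img = torch.randn(2, 3, 64, 64)            # dummy batch
s_sod = torch.rand(2, 1, 64, 64)           # SOD pseudo ground truth
s_gd = torch.tensor([2., 1., 0.])          # labeling scores of 3 crop candidates
pair_mask = (s_gd[:, None] != s_gd[None, :]).float()   # simplified P

feats = backbone(img)
sal_loss = F.binary_cross_entropy_with_logits(saliency_head(feats), s_sod)
cand = feats.mean(dim=(0, 2, 3)).expand(3, 8)          # crude per-candidate features
s_score = aesthetic_head(cand).squeeze(-1)
rank_loss, score_loss = rank_and_score_loss(s_score, s_gd, pair_mask)
loss = sal_loss + alpha * rank_loss + beta * score_loss   # target loss
opt.zero_grad(); loss.backward(); opt.step()
```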
  • Step 401 for multiple training images, determine multiple cropping candidate regions corresponding to each training image.
  • Step 402 for each training image, determine labeling scores of multiple cropping candidate regions corresponding to the training image.
  • Step 403 Perform image data processing on the training image, and acquire features of different scales of the training image based on the first backbone network.
  • Step 404 and step 405 are respectively executed after step 403 .
  • Step 404 Perform aesthetic evaluation task model training based on different scale features of the training image and labeling scores of cropping candidate regions corresponding to the training image.
  • Step 405 Perform saliency task model training based on different scale features of the training image and the saliency grayscale image of the training image.
  • the saliency grayscale image may be acquired in advance.
  • Step 406 performing joint training based on the aesthetic evaluation task model and the saliency task model to determine an image evaluation network model.
  • the above process of determining the image evaluation network model based on strategy 1 uses the saliency task as a subtask of joint learning to train the deep network, which can effectively incorporate the saliency information of the image without increasing the complexity or inference cost of the network, so that the trained image evaluation network model can output cropped images that combine aesthetic and salient features well.
  • in other embodiments, performing model training according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images to obtain the image evaluation network model includes:
  • for each training image, generating salient image features according to the training image and the saliency grayscale map corresponding to the training image, where the image saliency information includes the salient image features, and the salient image features include the RoI features and RoD features of the cropping candidate regions;
  • the image saliency information includes the salient image features generated by splicing, and the salient image features include the RoI features and RoD features of the training image, and can further include the RoI features and RoD features of the cropping candidate area.
  • model training is performed according to the annotation scores and updated prediction scores of multiple cropping candidate regions corresponding to multiple training images to obtain an image evaluation network model, and the process of model training will not be described here.
  • Step 501 For multiple training images, determine multiple cropping candidate regions corresponding to each training image.
  • Step 502 for each training image, determine the labeling scores of multiple cropping candidate regions corresponding to the training image.
  • Step 503 perform image data processing on the training image. After step 503, step 504 and step 505 are executed respectively.
  • Step 504 acquiring different scale features of the training image based on the first backbone network.
  • Step 505 Obtain a saliency grayscale image of the training image, and obtain salient features of different scales corresponding to the saliency grayscale image based on the second backbone network.
  • Step 506 is executed after step 504 and step 505 .
  • Step 506 Concatenate salient features of different scales corresponding to the saliency grayscale image with features of different scales of the training image to obtain salient image features.
  • Step 507 Input the salient image features of the training images into the aesthetic evaluation task model for model training, and obtain the image evaluation network model.
  • the above process of determining the image evaluation network model based on strategy 2 directly obtains the saliency grayscale map of the original image, extracts high-level salient features from it through a backbone network, splices the salient features with the features of the original image, and sends them into the aesthetic evaluation task model for learning, which achieves a more direct use of salient features.
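  • a sketch of the splicing in strategy 2 (the second backbone for the saliency grayscale map and the channel-wise concatenation mirror steps 505 and 506; the backbone modules here are hypothetical placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

first_backbone = nn.Conv2d(3, 8, 3, padding=1)    # placeholder for the image backbone
second_backbone = nn.Conv2d(1, 4, 3, padding=1)   # placeholder, fed the saliency grayscale map

def salient_image_features(img, sal_gray):
    img_feats = first_backbone(img)               # step 504: image features
    sal_feats = second_backbone(sal_gray)         # step 505: high-level salient features
    sal_feats = F.interpolate(sal_feats, size=img_feats.shape[-2:])
    return torch.cat([img_feats, sal_feats], dim=1)   # step 506: channel-wise splice

feats = salient_image_features(torch.randn(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
print(feats.shape)   # torch.Size([1, 12, 256, 256])
```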
  • the determining at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions includes:
  • At least one target feature score greater than a preset score threshold is screened out;
  • the cropping candidate region corresponding to the target feature score is determined as the target cropping candidate region.
  • specifically, for each expansion ratio, the corresponding feature scores can be sorted from high to low; based on the sorting result, the target feature scores greater than the preset score threshold are screened out, so that at least one target feature score is determined for each expansion ratio. The cropping candidate regions corresponding to the target feature scores are then determined as the target cropping candidate regions, and the original image is cropped based on them to obtain a cropped image reflecting aesthetic and/or salient features.
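  • a small sketch of this selection rule (the per-expansion-ratio sort and threshold follow the text; the grouping key and the threshold value are assumptions):

```python
from collections import defaultdict

def select_target_regions(candidates, threshold=0.6):
    """candidates: iterable of (expansion_ratio, box, feature_score) tuples.
    For each expansion ratio, sort scores high to low and keep the boxes
    whose feature score exceeds the preset score threshold."""
    by_ratio = defaultdict(list)
    for ratio, box, score in candidates:
        by_ratio[ratio].append((score, box))
    selected = []
    for scored in by_ratio.values():
        scored.sort(key=lambda t: t[0], reverse=True)
        selected.extend(box for score, box in scored if score > threshold)
    return selected
```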
  • the above is the image cropping method provided by the embodiment of the present application.
  • by determining a plurality of cropping candidate regions corresponding to the target image, the image features corresponding to the target image are obtained, including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-cropping-candidate area; the image features are input into the image evaluation network model to obtain the feature scores characterizing at least one of the aesthetic features and salient features of the multiple cropping candidate regions; and, according to the feature scores respectively corresponding to the multiple cropping candidate regions, at least one target cropping candidate region is determined and the target image is cropped. Based on aesthetic and/or salient features, the information in the image is thus efficiently mined, ensuring the image cropping effect and obtaining a cropped image with good image quality.
  • in addition, by determining at least one target grid in the target image and expanding it, the cropping candidate regions can be determined on the basis of the target grid, realizing the selection of cropping candidate regions.
  • by obtaining the image saliency information of multiple training images and/or the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions, and performing model training based on the training images, an image evaluation network model based on the salient features and/or aesthetic features of images can be obtained for scoring images on aesthetic features and/or salient features.
  • in this way, the determination method of the image evaluation network model is enriched; by determining the target cropping candidate regions based on the feature scores and then performing image cropping, the information in the image can be efficiently mined based on aesthetic and/or salient features, ensuring the image cropping effect.
  • the image cropping method provided in the embodiment of the present application may be executed by an image cropping device.
  • the image cropping device provided in the embodiment of the present application is described by taking the image cropping method performed by the image cropping device as an example.
  • the embodiment of the present application also provides an image cropping device, as shown in FIG. 6 , including:
  • a determining module 601, configured to determine a plurality of cropping candidate regions corresponding to the target image
  • the first acquiring module 602 is configured to acquire image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area;
  • the second acquiring module 603 is configured to input the image features corresponding to the target image into the image evaluation network model and acquire feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of the aesthetic features and salient features of the cropping candidate regions;
  • the processing module 604 is configured to determine at least one candidate target cropping area according to feature scores respectively corresponding to the multiple candidate cropping areas, and crop the target image according to the candidate target cropping areas.
  • the determination module includes:
  • the first determination submodule is used to determine at least one target grid in the target image in the grid anchor form based on preset composition principles
  • the second determination sub-module is configured to respectively expand the at least one target grid according to at least one expansion ratio, and determine the plurality of cropping candidate regions.
  • the device also includes:
  • the training acquisition module is used to perform model training according to at least one of image saliency information and image aesthetic information corresponding to the plurality of training images to acquire the image evaluation network model.
  • the training acquisition module includes:
  • An acquisition sub-module configured to acquire the image aesthetic information of multiple cropping candidate regions corresponding to each of the training images, where the image aesthetic information includes labeling scores and prediction scores of the cropping candidate regions;
  • the training acquisition sub-module is used to perform model training according to at least one of the image aesthetic information of the multiple cropping candidate regions corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, so as to obtain the image evaluation network model.
  • the acquisition submodule includes:
  • the first processing unit is configured to, for each of the training images, obtain the screening results obtained by annotators screening the multiple cropping candidate regions corresponding to the training image at least twice, and determine, according to the screening results, the labeling scores corresponding to the multiple cropping candidate regions;
  • the second processing unit is used to obtain, for each of the training images, the feature map of the training image, extract the RoI features and RoD features of the cropping candidate regions on the feature map and combine them into target features, and obtain, according to the target features, the prediction scores corresponding to the multiple cropping candidate regions.
  • the training acquisition submodule includes one of the following units:
  • the first training unit is configured to perform aesthetic evaluation task training and determine an aesthetic evaluation task model according to the labeling scores and prediction scores of a plurality of clipping candidate regions respectively corresponding to the plurality of training images, and the aesthetic evaluation task model is the image evaluation network model;
  • the second training unit is configured to perform saliency task training and determine a saliency task model according to the saliency grayscale images and saliency map prediction results respectively corresponding to the plurality of training images, and the saliency task model is the An image evaluation network model, wherein the image saliency information includes a saliency grayscale map and a saliency map prediction result;
  • the third training unit is configured to perform model training according to the labeling scores and prediction scores of the plurality of clipping candidate regions respectively corresponding to the plurality of training images and the image saliency information corresponding to the plurality of training images respectively, to obtain The image evaluation network model.
  • in the case where the saliency task model is the image evaluation network model, the second acquisition module is further used for:
  • obtaining the salient feature information corresponding to the target image, and determining the feature score corresponding to each cropping candidate region according to the salient feature information of the pixels covered by that region.
  • the first training unit includes:
  • the first determining subunit is configured to determine an aesthetic evaluation task loss for each of the training images according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the training images;
  • the first update subunit is configured to update the model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, so as to perform model training to determine the aesthetic evaluation task model.
  • the second training unit includes:
  • the second determination subunit is configured to determine the saliency task loss for each of the training images according to the saliency grayscale map corresponding to the training image and the saliency map prediction result;
  • the second updating subunit is configured to update the model parameters of the saliency task model according to the saliency task loss, so as to perform model training to determine the saliency task model.
  • the third training unit includes:
  • the third determining subunit is used to determine the aesthetic evaluation task model according to the labeling scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images;
  • the fourth determining subunit is used to determine the saliency task model according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, where the image saliency information includes the saliency grayscale map and the saliency map prediction result;
  • the first obtaining subunit is configured to perform joint training based on the aesthetic evaluation task model and the saliency task model, and obtain the image evaluation network model.
  • the third training unit includes:
  • a generation subunit is configured to, for each of the training images, generate salient image features according to the training image and the saliency grayscale map corresponding to the training image, the image saliency information including the salient image features, and the salient image features including the RoI features and RoD features of the cropping candidate regions;
  • a third updating subunit configured to, for each of the training images, update the prediction scores of the plurality of cropping candidate regions corresponding to the training images according to the salient image features corresponding to the training images;
  • the second acquisition subunit is configured to perform model training according to the labeling scores and updated prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, and acquire the image evaluation network model.
  • the image cropping device provided in the embodiment of the present application, by determining a plurality of cropping candidate regions corresponding to the target image, obtains the image features corresponding to the target image, including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-cropping-candidate area; inputs the image features into the image evaluation network model to obtain the feature scores characterizing at least one of the aesthetic features and salient features of the multiple cropping candidate regions; and, according to the feature scores respectively corresponding to the multiple cropping candidate regions, determines at least one target cropping candidate region and crops the target image. Based on the aesthetic and/or salient features, the information in the image can be efficiently mined to ensure the image cropping effect and obtain a cropped image with good image quality.
  • in addition, by determining at least one target grid in the target image and expanding it, the cropping candidate regions can be determined on the basis of the target grid, realizing the selection of cropping candidate regions.
  • by obtaining the image saliency information of multiple training images and/or the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions, and performing model training based on the training images, an image evaluation network model based on the salient features and/or aesthetic features of images can be obtained for scoring images on aesthetic features and/or salient features.
  • in this way, the determination method of the image evaluation network model is enriched; by determining the target cropping candidate regions based on the feature scores and then performing image cropping, the information in the image can be efficiently mined based on aesthetic and/or salient features, ensuring the image cropping effect.
  • the image cropping apparatus in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, such as an integrated circuit or a chip.
  • the electronic device may be a terminal, or other devices other than the terminal.
  • the electronic device can be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), etc.
  • the image cropping device in the embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in this embodiment of the present application.
  • The image cropping device provided in the embodiment of the present application can implement the various processes of the image cropping method embodiment shown in FIG. 1; details are not repeated here to avoid repetition.
  • The embodiment of the present application further provides an electronic device 700, including a processor 701, a memory 702, and programs or instructions stored in the memory 702 and operable on the processor 701. When the programs or instructions are executed by the processor 701, the various processes of the above-mentioned image cropping method embodiment can be implemented, and the same technical effects can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • The electronic device 800 includes, but is not limited to: a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, a processor 810, and the like.
  • The electronic device 800 may further include a power supply (such as a battery) for supplying power to various components, and the power supply may be logically connected to the processor 810 through a power management system, so as to implement functions such as charging management, discharging management, and power consumption management through the power management system.
  • The structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown in the figure, combine some components, or arrange components differently, and details are not repeated here.
  • The processor 810 is configured to: determine a plurality of cropping candidate regions corresponding to the target image; obtain image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area;
  • input the image features into the image evaluation network model to obtain feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of the aesthetic features and salient features of the cropping candidate regions; and determine at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and crop the target image according to the target cropping candidate region.
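As a rough illustration of the processing the processor 810 is configured to perform, the following Python sketch strings the four stages together. The helper names `generate_candidates`, `extract_features`, and `evaluation_model` are hypothetical placeholders, not names from the patent.

```python
import torch

def crop_image(image, evaluation_model, generate_candidates, extract_features, top_k=1):
    """image: (C, H, W) tensor. Returns the top_k cropped sub-images."""
    # Step 1: candidate boxes as (x1, y1, x2, y2) in pixel coordinates.
    candidates = generate_candidates(image)
    # Step 2: features covering both the candidate (RoI) areas and the
    # remaining (RoD) area of the target image.
    features = extract_features(image, candidates)
    # Step 3: one aesthetic/saliency feature score per candidate region.
    with torch.no_grad():
        scores = evaluation_model(features)          # shape: (num_candidates,)
    # Step 4: keep the top-k candidates and crop the image accordingly.
    best = torch.topk(scores, k=top_k).indices.tolist()
    return [image[..., y1:y2, x1:x2]
            for x1, y1, x2, y2 in (candidates[i] for i in best)]
```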
  • The processor 810 is further configured to: divide the target image into a grid-anchor form; determine at least one target grid in the grid-anchor-form target image based on a preset composition principle; and, for the at least one target grid, expand according to at least one expansion ratio respectively, to determine the plurality of cropping candidate regions.
  • the processor 810 is further configured to: perform model training according to at least one of image saliency information and image aesthetic information corresponding to a plurality of training images to obtain the image evaluation network model.
  • When performing model training according to at least one of the image saliency information and the image aesthetic information corresponding to the multiple training images to obtain the image evaluation network model, the processor 810 is further configured to: obtain the image aesthetic information of the multiple cropping candidate regions corresponding to each training image, where the image aesthetic information includes the labeling scores and prediction scores of the cropping candidate regions; and perform model training according to at least one of the image aesthetic information of the multiple cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model.
  • When acquiring the image aesthetic information of the plurality of cropping candidate regions corresponding to each of the training images, the processor 810 is further configured to: for each training image, obtain the screening results of at least two rounds of screening performed on the corresponding multiple cropping candidate regions, and determine the labeling scores respectively corresponding to the multiple cropping candidate regions according to the screening results; and, for each training image, obtain the feature map of the training image, extract the RoI features and RoD features of the cropping candidate regions on the feature map and combine them into target features, and obtain the prediction scores respectively corresponding to the multiple cropping candidate regions according to the target features.
  • The processor 810 is further configured to perform one of the following solutions: perform aesthetic evaluation task training according to the labeling scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, and determine an aesthetic evaluation task model, where the aesthetic evaluation task model is the image evaluation network model; perform saliency task training according to the saliency grayscale images and saliency map prediction results respectively corresponding to the multiple training images, and determine a saliency task model, where the saliency task model is the image evaluation network model, and the image saliency information includes the saliency grayscale maps and saliency map prediction results; or perform model training according to the labeling scores and prediction scores of the cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model.
  • The processor 810 is further configured to: input the image features corresponding to the target image into the saliency task model, and obtain the salient feature information corresponding to each pixel of the target image; and, for each cropping candidate region of the target image, determine the feature score corresponding to the cropping candidate region according to the salient feature information corresponding to the pixels included in the cropping candidate region.
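A minimal sketch of this step follows, assuming the per-pixel salient feature information takes the form of a single-channel saliency map and that a candidate's feature score is the mean saliency of the pixels it contains; the aggregation function is an assumption, since the text only says the score is determined from those pixels.

```python
import torch

def candidate_saliency_scores(saliency_map, boxes):
    """saliency_map: (H, W) tensor in [0, 1]; boxes: list of (x1, y1, x2, y2)."""
    scores = []
    for x1, y1, x2, y2 in boxes:
        region = saliency_map[y1:y2, x1:x2]
        scores.append(region.mean())   # mean saliency over the covered pixels
    return torch.stack(scores)
```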
  • When performing aesthetic evaluation task training and determining the aesthetic evaluation task model according to the labeling scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, the processor 810 is further configured to: for each training image, determine an aesthetic evaluation task loss according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the training image; and update the model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, to perform model training and determine the aesthetic evaluation task model.
  • When performing saliency task training and determining the saliency task model according to the saliency grayscale images and saliency map prediction results respectively corresponding to the plurality of training images, the processor 810 is further configured to: for each training image, determine a saliency task loss according to the saliency grayscale image and the saliency map prediction result corresponding to the training image; and update the model parameters of the saliency task model according to the saliency task loss, to perform model training and determine the saliency task model.
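The patent does not name the concrete loss functions; as a hedged sketch, the following uses a smooth-L1 regression between predicted and annotated candidate scores for the aesthetic evaluation task loss, and a binary cross-entropy between the predicted saliency map and the saliency grayscale ground truth for the saliency task loss. Both choices are assumptions.

```python
import torch.nn.functional as F

def aesthetic_task_loss(pred_scores, annotated_scores):
    # pred_scores / annotated_scores: (num_candidates,) for one training image.
    return F.smooth_l1_loss(pred_scores, annotated_scores)

def saliency_task_loss(pred_map_logits, gt_gray_map):
    # pred_map_logits: raw saliency logits (H, W); gt_gray_map: grayscale in [0, 1].
    return F.binary_cross_entropy_with_logits(pred_map_logits, gt_gray_map)
```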
  • When performing model training according to the labeling scores and prediction scores of the plurality of cropping candidate regions corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images to obtain the image evaluation network model, the processor 810 is further configured to: determine the aesthetic evaluation task model according to the labeling scores and prediction scores of the multiple cropping candidate regions corresponding to the multiple training images; determine the saliency task model according to the saliency grayscale images and saliency map prediction results respectively corresponding to the multiple training images, where the image saliency information includes the saliency grayscale maps and saliency map prediction results; and perform joint training based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
  • When performing model training according to the labeling scores and prediction scores of the plurality of cropping candidate regions corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images to obtain the image evaluation network model, the processor 810 is further configured to: for each training image, generate salient image features according to the training image and the saliency grayscale image corresponding to the training image, where the image saliency information includes the salient image features, and the salient image features include the RoI features and RoD features of the cropping candidate regions; for each training image, update the prediction scores of the plurality of cropping candidate regions corresponding to the training image according to the salient image features corresponding to the training image; and perform model training according to the labeling scores and updated prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
  • In the electronic device provided in the embodiment of the present application, a plurality of cropping candidate regions corresponding to the target image are determined; the image features corresponding to the target image, including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-candidate region, are obtained and input into the image evaluation network model; the feature scores, representing at least one of aesthetic features and salient features, respectively corresponding to the multiple cropping candidate regions are obtained; and at least one target cropping candidate region is determined according to the feature scores and the target image is cropped accordingly. Based on aesthetic and/or salient features, the information in the image can thus be efficiently mined, ensuring the image cropping effect and obtaining a cropped image with good image quality.
  • The cropping candidate regions can be determined on the basis of the target grids, realizing the selection of cropping candidate regions.
  • By obtaining the image saliency information of multiple training images and/or the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions, and performing model training based on the training images, an image evaluation network model that scores images on saliency features and/or aesthetic features can be obtained.
  • This enriches the ways of determining the image evaluation network model; by determining the target cropping candidate region based on the feature scores and then performing image cropping, the information in the image can be efficiently mined based on aesthetic and/or salient features to ensure the image cropping effect.
  • The input unit 804 may include a graphics processing unit (Graphics Processing Unit, GPU) 8041 and a microphone 8042, and the graphics processor 8041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 806 may include a display panel 8061, and the display panel 8061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072.
  • the touch panel 8071 is also called a touch screen.
  • the touch panel 8071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 8072 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • Memory 809 may be used to store software programs as well as various data, including but not limited to application programs and operating systems.
  • The memory 809 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system, and application programs or instructions required by at least one function (such as a sound playing function or an image playback function), etc.
  • The memory 809 may include a volatile memory or a non-volatile memory, or the memory 809 may include both volatile and non-volatile memories.
  • The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • The volatile memory may be a random access memory (Random Access Memory, RAM), a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), or a direct rambus random access memory (Direct Rambus RAM, DRRAM).
  • The processor 810 may include one or more processing units; optionally, the processor 810 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication signals, such as a baseband processor. It can be understood that the modem processor may alternatively not be integrated into the processor 810.
  • The embodiment of the present application also provides a readable storage medium storing programs or instructions. When the programs or instructions are executed by a processor, the various processes of the above-mentioned image cropping method embodiments can be implemented, and the same technical effects can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device described in the above embodiments.
  • The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
  • The embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the various processes of the above image cropping method embodiment, with the same technical effects achieved; to avoid repetition, details are not repeated here.
  • It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-a-chip.
  • The embodiment of the present application provides a computer program product. The program product is stored in a storage medium, and the program product is executed by at least one processor to implement the various processes of the above image cropping method embodiment, with the same technical effects achieved; to avoid repetition, details are not repeated here.
  • It should be noted that the terms “comprise”, “include”, or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase “comprising a ...” does not preclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
  • The scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are an image cropping method and apparatus, and an electronic device. The image cropping method comprises: determining a plurality of cropping candidate regions corresponding to a target image; obtaining image features corresponding to the target image, the image features comprising a first image feature and a second image feature, the first image feature being associated with a first image region corresponding to the plurality of cropping candidate regions, and the second image feature being associated with a second image region, other than the first image region, in the target image; inputting the image features corresponding to the target image into an image evaluation network model to obtain feature scores respectively corresponding to the plurality of cropping candidate regions, the feature score being used for representing at least one of an aesthetic feature and a salient feature of the cropping candidate region; and determining at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and cropping the target image according to the target cropping candidate region.

Description

Image cropping method, apparatus, and electronic device
Cross-Reference to Related Applications
This application claims priority to the Chinese patent application No. 202111435959.4, entitled "Image Cropping Method, Apparatus, and Electronic Device", filed with the China Patent Office on November 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of communication technologies, and in particular, to an image cropping method, apparatus, and electronic device.
Background
With the rapid development of electronic devices, capturing images with electronic devices has become a common image acquisition method. The photo albums of electronic devices store images taken by users; since most of these images are not taken by professional photographers and are presented in varying frames, their aesthetic quality is uneven.
When the images in an album are used in certain specific scenarios, they need to be cropped to show their image characteristics. For example, in scenarios displaying desktop widgets, album thumbnails, memory albums, or covers of album collections, the display outlets often have different aspect ratios; if an image is simply cropped to generate a desktop widget, album thumbnail, or album cover, the effect is often poor and the image quality is low.
Overview
The purpose of the embodiments of the present application is to provide an image cropping method, apparatus, and electronic device, so as to solve the problem in the prior art that the image obtained by image cropping has poor quality.
In a first aspect, an embodiment of the present application provides an image cropping method, including:
determining a plurality of cropping candidate regions corresponding to a target image;
obtaining image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area;
inputting the image features corresponding to the target image into an image evaluation network model, and obtaining feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of aesthetic features and salient features of the cropping candidate regions; and
determining at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and cropping the target image according to the target cropping candidate region.
In a second aspect, an embodiment of the present application provides an image cropping apparatus, including:
a determining module, configured to determine a plurality of cropping candidate regions corresponding to a target image;
a first acquisition module, configured to obtain image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area;
a second acquisition module, configured to input the image features corresponding to the target image into an image evaluation network model and obtain feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of aesthetic features and salient features of the cropping candidate regions; and
a processing module, configured to determine at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and crop the target image according to the target cropping candidate region.
In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including a processor and a memory, where the memory stores programs or instructions that can run on the processor, and the programs or instructions, when executed by the processor, implement the steps of the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which programs or instructions are stored, where the programs or instructions, when executed by a processor, implement the steps of the method described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, the chip including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the method described in the first aspect.
In a seventh aspect, an embodiment of the present application provides an electronic device configured to execute the method described in the first aspect.
In the embodiments of the present application, a plurality of cropping candidate regions corresponding to a target image are determined; the image features corresponding to the target image, including a first image feature associated with the cropping candidate regions and a second image feature associated with the non-candidate region, are obtained; the image features are input into an image evaluation network model to obtain feature scores, representing at least one of aesthetic features and salient features, respectively corresponding to the multiple cropping candidate regions; and at least one target cropping candidate region is determined according to the feature scores corresponding to the multiple cropping candidate regions, and the target image is cropped. Based on aesthetic and/or salient features, the information in the image can be efficiently mined, ensuring the image cropping effect and obtaining a cropped image with good image quality.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an image cropping method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the network architecture of an aesthetic evaluation task model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the network architecture of a saliency task model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of obtaining the image evaluation network model based on strategy one according to an embodiment of the present application;
FIG. 5 is a schematic diagram of obtaining the image evaluation network model based on strategy two according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image cropping apparatus according to an embodiment of the present application;
FIG. 7 is a first schematic block diagram of an electronic device according to an embodiment of the present application;
FIG. 8 is a second schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. The objects distinguished by "first", "second", and the like are usually of one type, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The image cropping method provided by the embodiments of the present application will be described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings.
An embodiment of the present application provides an image cropping method, as shown in FIG. 1, including:
Step 101: Determine a plurality of cropping candidate regions corresponding to a target image.
In the image cropping method provided in this embodiment, the plurality of cropping candidate regions corresponding to the target image may first be determined, so that a final cropping region can be selected among them. The multiple cropping candidate regions may be determined according to a preset composition principle, which may be based on various photographic composition principles, including but not limited to the triangular composition principle, the diagonal composition principle, the rule-of-thirds composition principle, the sky-headroom composition principle, the motion-space composition principle, the balanced and stable composition principle, and the like.
Step 102: Obtain image features corresponding to the target image, where the image features include a first image feature and a second image feature, the first image feature is associated with a first image area corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image area in the target image other than the first image area.
For the target image, the corresponding image features need to be obtained. Before acquiring these image features, the target image is first subjected to image data processing, which is briefly introduced below. The target image is resized to 256×256 using bilinear interpolation, and data augmentation is performed, where the data augmentation may include mirroring, random rotation, Gaussian noise, normalization, and the like.
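A sketch of this preprocessing using torchvision is shown below; the rotation range, the noise level, and the ImageNet normalization statistics are assumptions, since the description only names the operation types.

```python
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Additive Gaussian noise on a tensor image."""
    def __init__(self, std=0.01):
        self.std = std
    def __call__(self, x):
        return x + torch.randn_like(x) * self.std

preprocess = transforms.Compose([
    transforms.Resize((256, 256),
                      interpolation=transforms.InterpolationMode.BILINEAR),
    transforms.RandomHorizontalFlip(),       # mirroring
    transforms.RandomRotation(degrees=10),   # random rotation (range assumed)
    transforms.ToTensor(),
    AddGaussianNoise(std=0.01),              # Gaussian noise (level assumed)
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats (assumed)
                         std=[0.229, 0.224, 0.225]),
])
```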
After the image data processing, the target image is input into a backbone network (such as MobileNetV2), which outputs features at multiple scales; the output features are concatenated to obtain the image features corresponding to the target image. The image features corresponding to the target image may include the first image feature and the second image feature. The first image feature corresponds to the multiple cropping candidate regions, which are associated with the first image area of the target image; that is, the first image feature is associated with the first image area corresponding to the multiple cropping candidate regions. The second image feature is associated with the second image area, which is the image area in the target image other than the first image area.
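A minimal sketch of such multi-scale feature extraction with a MobileNetV2 backbone follows. Which intermediate layers are tapped and the common spatial size used for resampling are assumptions based on the later description of the training pipeline (layers 7, 14, and the last layer).

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights="DEFAULT").features  # 19 sequential blocks

def extract_multiscale_features(image):
    """image: (N, 3, 256, 256). Returns channel-concatenated multi-scale features."""
    taps, feats = {7, 14, 18}, []
    x = image
    for i, block in enumerate(backbone):
        x = block(x)
        if i in taps:
            feats.append(x)
    # Resample every tapped feature map to a common spatial size, then
    # concatenate along the channel dimension.
    size = feats[1].shape[-2:]
    feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
             for f in feats]
    return torch.cat(feats, dim=1)
```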
Step 103: Input the image features corresponding to the target image into the image evaluation network model, and obtain feature scores respectively corresponding to the plurality of cropping candidate regions, where the feature scores are used to characterize at least one of aesthetic features and salient features of the cropping candidate regions.
After the image features corresponding to the target image are obtained, they can be input into the image evaluation network model, which outputs the feature scores respectively corresponding to the multiple cropping candidate regions. The image evaluation network model is obtained through model training based on the image saliency information of multiple training images and/or the image aesthetic information of multiple training images, and is used to score the cropping candidate regions of an image. The feature score of a cropping candidate region of the target image is used to characterize at least one of the aesthetic features and salient features of the cropping candidate region.
After the feature scores respectively corresponding to the multiple cropping candidate regions are obtained based on the image evaluation network model, step 104 is executed.
Step 104: Determine at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and crop the target image according to the target cropping candidate region.
After the feature scores respectively corresponding to the multiple cropping candidate regions are determined, at least one target cropping candidate region is determined among them according to these feature scores, and the target image is cropped based on the determined target cropping candidate region(s) to obtain the cropped image.
In the above implementation process of the present application, a plurality of cropping candidate regions corresponding to the target image are determined; the image features corresponding to the target image, including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-candidate region, are obtained; the image features are input into the image evaluation network model to obtain the feature scores, representing at least one of aesthetic features and salient features, respectively corresponding to the multiple cropping candidate regions; and at least one target cropping candidate region is determined according to the feature scores and the target image is cropped. Based on aesthetic and/or salient features, the information in the image can be efficiently mined, ensuring the image cropping effect and obtaining a cropped image with good image quality. Determining the plurality of cropping candidate regions corresponding to the target image in step 101 includes:
dividing the target image into a grid-anchor form;
determining at least one target grid in the grid-anchor-form target image based on a preset composition principle; and
for the at least one target grid, expanding according to at least one expansion ratio respectively, to determine the plurality of cropping candidate regions.
When determining the multiple cropping candidate regions corresponding to the target image, the target image needs to be divided into a grid-anchor form, e.g., into H×W small grid blocks, and at least one target grid is then determined in the grid-anchor-form target image based on the preset composition principle. In this embodiment, the process of determining cropping candidate regions is introduced taking the rule of thirds as the preset composition principle. After the target image is divided into the grid-anchor form, according to the rule of thirds, the target image is divided by two horizontal lines and two vertical lines into nine large grid blocks of the same size (a nine-square grid). The small grids crossed by the four third-lines, together with all small grids contained in the central large grid of the target image, are determined as target grids, and the center of each target grid serves as the center of a cropping candidate region.
After the target grids are determined, each of the at least one target grid is expanded according to at least one expansion ratio to determine the multiple cropping candidate regions. During expansion, the region may be grown from the grid center of the target grid at multiple expansion ratios to obtain cropping candidate regions. Since all grids are of equal size, the expansion actually extends the length and width of the grid to obtain a cropping candidate region; that is, the expansion ratio refers to the aspect ratio of the resulting cropping candidate region. By expanding from a target grid, a cropping candidate region covering the target grid and its neighboring grids can be delineated on the basis of the target grid.
It should be noted that the upper-left and lower-right corners of the cropping candidate regions obtained by expansion are located at the centers of small grids. To balance the number of cropping candidate regions against the content integrity of the original image, the area ratio between a cropping candidate region and the original image also needs to be kept reasonable (for example, the area ratio may be greater than 0.4).
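The following Python sketch illustrates a simplified version of this grid-anchor candidate generation: candidate centers are taken on the third-lines and in the central block, boxes of several aspect ratios and scales are grown around each center, and boxes are filtered by the minimum area ratio. The grid resolution is abstracted away, and the concrete aspect-ratio and scale sets are assumptions.

```python
from itertools import product

def generate_candidates(img_w, img_h,
                        aspect_ratios=((1, 1), (4, 3), (3, 4), (16, 9)),
                        scales=(0.6, 0.75, 0.9),
                        min_area_ratio=0.4):
    # Candidate centers: points on the two third-lines and the image center.
    centers = [(img_w * fx, img_h * fy)
               for fx, fy in product((1/3, 1/2, 2/3), repeat=2)]
    boxes = []
    for (cx, cy), (rw, rh), s in product(centers, aspect_ratios, scales):
        # Grow a box of the given aspect ratio and scale around the center.
        base = s * min(img_w, img_h)
        w, h = base * rw / max(rw, rh), base * rh / max(rw, rh)
        x1, y1, x2, y2 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
        # Keep only boxes inside the image whose area ratio to the original
        # image is large enough (the text suggests > 0.4).
        inside = x1 >= 0 and y1 >= 0 and x2 <= img_w and y2 <= img_h
        if inside and (w * h) / (img_w * img_h) > min_area_ratio:
            boxes.append((int(x1), int(y1), int(x2), int(y2)))
    return boxes
```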
In the above implementation process of the present application, by dividing the target image into a grid-anchor form, determining at least one target grid in the grid-anchor-form target image based on the preset composition principle, and then expanding from the target grids, the cropping candidate regions can be determined on the basis of the target grids.
In an optional embodiment of the present application, the method further includes:
performing model training according to at least one of image saliency information and image aesthetic information corresponding to multiple training images, to obtain the image evaluation network model.
In this embodiment, multiple training images need to be obtained, and model training is performed based on at least one of the image saliency information and image aesthetic information of the multiple training images to obtain the image evaluation network model. Since the image evaluation network model is used to score images, once it is obtained, the feature scores respectively corresponding to the multiple cropping candidate regions of the target image can be obtained based on it.
The performing model training according to at least one of image saliency information and image aesthetic information corresponding to multiple training images to obtain the image evaluation network model includes:
obtaining the image aesthetic information of the multiple cropping candidate regions corresponding to each training image, where the image aesthetic information includes the labeling scores and prediction scores of the cropping candidate regions; and
performing model training according to at least one of the image aesthetic information of the multiple cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model.
When performing model training according to the image saliency information of multiple training images and/or the image aesthetic information of multiple training images, the corresponding cropping candidate regions may be determined for each training image. The process of determining the cropping candidate regions corresponding to a training image is the same as that for the target image and is not elaborated further here.
After the corresponding cropping candidate regions are determined for each training image, the labeling scores and prediction scores of the corresponding multiple cropping candidate regions may be obtained for each training image, as well as the corresponding image saliency information. Model training is then performed according to the labeling scores and prediction scores of the cropping candidate regions corresponding to the multiple training images and/or the image saliency information corresponding to the multiple training images, and the image evaluation network model is obtained through the training.
The image aesthetic information of a training image includes the labeling scores and prediction scores of its cropping candidate regions: a labeling score is obtained by annotators aesthetically labeling a cropping candidate region of the training image based on their own aesthetic standards, and a prediction score is obtained by performing feature prediction on the cropping candidate region based on multiple convolutional layers.
In the above implementation process of the present application, by obtaining the image saliency information of multiple training images and/or the image aesthetic information including the labeling scores and prediction scores of the cropping candidate regions, and performing model training accordingly, an image evaluation network model for scoring images on aesthetic features and/or salient features can be obtained based on at least one of the saliency features and aesthetic features of the training images.
Optionally, the obtaining the image aesthetic information of the multiple cropping candidate regions corresponding to each training image includes:
for each training image, obtaining the screening results of at least two rounds of screening performed by annotators on the multiple cropping candidate regions corresponding to the training image, and determining, according to the screening results, the labeling scores respectively corresponding to the multiple cropping candidate regions; and
for each training image, obtaining the feature map of the training image, extracting the region-of-interest (region of interest, RoI) features and region-of-discard (region of discard, RoD) features of the cropping candidate regions on the feature map and combining them into target features, and obtaining, according to the target features, the prediction scores respectively corresponding to the multiple cropping candidate regions.
When obtaining, for each training image, the labeling scores of the multiple cropping candidate regions corresponding to the training image, the screening results of at least two rounds of screening performed by annotators on the multiple cropping candidate regions can be obtained, and the labeling scores respectively corresponding to the multiple cropping candidate regions of the current training image are then determined based on the obtained screening results.
The specific process of obtaining the labeling scores is introduced below. First, for each training image, the region images corresponding to the multiple cropping candidate regions are scored based on an existing baseline model; for each expansion ratio, the top-N scoring candidate boxes are output, and K additional candidate boxes are randomly selected from the unselected candidates and also output. The annotators then perform the scoring, so that part of the candidate boxes are filtered out based on the baseline model.
Considering the high similarity between cropping candidate regions, directly asking annotators to score them (for example, from 0 to 5) would be rather difficult. Therefore, for each expansion ratio, the output cropping candidate regions may first form a first candidate pool, and annotators select n (for example, 3 to 5) cropping candidate regions from each first candidate pool according to the expansion ratio; the values of n and N may be the same or different. After this selection, the selected candidates and some randomly mixed-in unselected candidates form a second candidate pool, from which a second selection is made to pick m (for example, 3 to 5) optimal cropping candidate regions. During screening, the candidates may be screened based on the region images corresponding to the cropping candidate regions.
After the annotators complete the screening, the electronic device can determine the labeling scores respectively corresponding to the multiple cropping candidate regions according to the screening results: for example, a candidate selected twice gets a labeling score of 2, a candidate selected once gets 1, and an unselected candidate gets 0. It should be noted that, considering aesthetic differences, the cropping candidate regions of each training image may be screened by multiple annotators, and the final screening result is determined based on their selections.
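The mapping from screening results to labeling scores is simple enough to state directly in code; in the sketch below, a candidate's labeling score is the number of screening rounds in which it was selected, matching the 2/1/0 example above (candidates are assumed hashable, e.g., box tuples).

```python
def annotation_scores(candidates, first_round_selected, second_round_selected):
    """Score each candidate by how many screening rounds selected it (0, 1, or 2)."""
    first, second = set(first_round_selected), set(second_round_selected)
    return [int(c in first) + int(c in second) for c in candidates]
```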
When obtaining, for each training image, the prediction scores of the multiple cropping candidate regions corresponding to the training image, a first backbone network may be used to extract features at different scales from the training image, which are then concatenated into a feature map. For example, the outputs of the 7th layer, the 14th layer, and the last layer of MobileNetV2 are up-sampled and down-sampled and concatenated along the channel dimension to form the feature map; a 1×1 convolution is then applied for channel dimension reduction. According to the cropping candidate regions, the region-of-interest operator (RoIAlign) and the region-of-discard operator (RoDAlign) are applied on the feature map to extract the RoI features and RoD features of the cropping candidate regions; the RoI features and RoD features are concatenated into the final (target) features, which are fed into multiple convolutional layers to output the prediction score s_score of each cropping candidate region. The multiple convolutional layers are part of the aesthetic evaluation task model.
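A hedged sketch of this scoring head follows, using torchvision's `roi_align` for RoIAlign. Implementing RoDAlign by zeroing a candidate's area in a copy of the feature map and pooling over the whole image is an assumption (the patent does not define the operator), and the layer widths of the convolutional head are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class ScoreHead(nn.Module):
    """Predict one aesthetic score per cropping candidate from RoI + RoD features."""
    def __init__(self, in_channels=256, pooled=8):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 64, kernel_size=1)  # 1x1 channel reduction
        self.head = nn.Sequential(                               # "multiple conv layers"
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )
        self.pooled = (pooled, pooled)

    def forward(self, feat, boxes, spatial_scale):
        """feat: (1, C, H, W) feature map; boxes: (K, 4) float boxes in image coords."""
        feat = self.reduce(feat)
        idx = torch.zeros(len(boxes), 1)                 # all boxes belong to batch 0
        roi = roi_align(feat, torch.cat([idx, boxes], dim=1), self.pooled, spatial_scale)
        # RoD features: zero out each candidate's area in a copy of the feature
        # map, then pool over the whole image (an assumed form of "RoDAlign").
        h, w = feat.shape[-2:]
        full = torch.tensor([[0.0, 0.0, 0.0, w / spatial_scale, h / spatial_scale]])
        rods = []
        for x1, y1, x2, y2 in (boxes * spatial_scale).long():
            masked = feat.clone()
            masked[..., y1:y2, x1:x2] = 0
            rods.append(roi_align(masked, full, self.pooled, spatial_scale))
        rod = torch.cat(rods, dim=0)
        fused = torch.cat([roi, rod], dim=1)    # (K, 128, pooled, pooled) target features
        return self.head(fused).squeeze(-1)     # one prediction score s_score per candidate
```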
In the above implementation process of the present application, the labeling scores respectively corresponding to the cropping candidate regions can be determined according to the annotators' screening results, the target features can be determined based on the RoI features and RoD features of the cropping candidate regions, and the prediction scores of the cropping candidate regions can be obtained based on the target features, so that the image aesthetic information of the training images can be obtained.
Optionally, the performing model training according to at least one of the image aesthetic information of the multiple cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images to obtain the image evaluation network model includes one of the following solutions:
performing aesthetic evaluation task training according to the labeling scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, and determining an aesthetic evaluation task model, where the aesthetic evaluation task model is the image evaluation network model;
performing saliency task training according to the saliency grayscale images and saliency map prediction results respectively corresponding to the multiple training images, and determining a saliency task model, where the saliency task model is the image evaluation network model, and the image saliency information includes the saliency grayscale maps and saliency map prediction results; and
performing model training according to the labeling scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model.
When model training is performed according to the image aesthetic information of the multiple cropping candidate regions respectively corresponding to the multiple training images, aesthetic evaluation task training may be performed according to the annotation scores and prediction scores of the cropping candidate regions respectively corresponding to the multiple training images to determine the aesthetic evaluation task model; the aesthetic evaluation task model obtained by training in this case is the image evaluation network model.
When model training is performed according to the image saliency information respectively corresponding to the multiple training images, saliency task training may be performed according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images to determine the saliency task model; the saliency task model obtained by training in this case is the image evaluation network model. In this implementation, the image saliency information includes the saliency grayscale map and the saliency map prediction result.
When model training is performed according to both the image aesthetic information and the image saliency information respectively corresponding to the multiple training images, model training may be performed according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images as well as the image saliency information respectively corresponding to the multiple training images; the image evaluation network model obtained in this case can score images on both aesthetic features and saliency features.
Through the above process, three models can be trained according to different features, so that image scoring can be performed based on any one of the models. The aesthetic evaluation task model scores images on aesthetic features; the saliency task model scores images on saliency features; and the model trained on both image aesthetic information and image saliency information scores images on both aesthetic features and saliency features.
The training processes of the three models are introduced below. For the aesthetic evaluation task model, performing aesthetic evaluation task training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, and determining the aesthetic evaluation task model, includes:
for each training image, determining an aesthetic evaluation task loss according to the annotation scores and prediction scores of the multiple cropping candidate regions corresponding to the training image;
updating model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, so as to perform model training and determine the aesthetic evaluation task model.
When model training is performed according to the training images to determine the aesthetic evaluation task model, for each training image, the aesthetic evaluation task loss may be determined according to the annotation scores and prediction scores of the multiple cropping candidate regions corresponding to the training image, and the model parameters of the aesthetic evaluation task model are updated based on the aesthetic evaluation task loss. Specifically: after the aesthetic evaluation task loss corresponding to a first training image is obtained, the model parameters of the aesthetic evaluation task model are updated according to that loss; then, based on the aesthetic evaluation task model with updated parameters, the prediction scores of the multiple cropping candidate regions corresponding to a second training image are obtained, the corresponding aesthetic evaluation task loss is determined based on the annotation scores and prediction scores of the multiple cropping candidate regions corresponding to the second training image, and the model parameters are updated again according to that loss. The process of obtaining, based on the updated model, the prediction scores of the multiple cropping candidate regions corresponding to the next training image, determining the aesthetic evaluation task loss, and updating the model parameters based on that loss is repeated until the aesthetic evaluation task loss is determined to satisfy a preset condition, at which point the model training is determined to be successful.
The network architecture of the aesthetic evaluation task model may be as shown in Fig. 2. That is, the aesthetic evaluation task model includes the first backbone network, the multiple convolutional layers, and a feature acquisition architecture between the two. The first backbone network is used to obtain features of different scales from the image; the feature acquisition architecture concatenates these features to obtain a feature map, extracts the RoI feature and the RoD feature of each cropping candidate region based on the feature map, and combines them into the target feature; the target feature is input into the multiple convolutional layers to obtain the prediction scores of the multiple cropping candidate regions of the image. Updating the model parameters of the aesthetic evaluation task model can be understood as updating the parameters of the first backbone network and the parameters of the multiple convolutional layers.
The process of determining the aesthetic evaluation task loss according to the annotation scores and prediction scores of the multiple cropping candidate regions corresponding to a training image is described below. For a training image, pairwise differences of the annotation scores s_gd of the cropping candidate regions are computed to obtain an annotation score difference matrix S_gd (a square matrix whose size equals the number of cropping candidate regions; for example, with 5 cropping candidate regions it is a 5×5 matrix). A first all-zero matrix of the same shape as S_gd is constructed and modified according to a first principle to obtain a first matrix. The first principle is: diagonal elements are set to 0; elements corresponding to pairs of cropping candidate regions with the same expansion ratio are set to 1; the difference elements between the optimal cropping candidate region and the other cropping candidate regions are set to 2; and elements of the first matrix corresponding to zero elements of S_gd are set to 0. This finally yields the valid picture-pair matrix P, that is, the first matrix.
For the training image, pairwise differences of the prediction scores s_score of the cropping candidate regions are computed to obtain a prediction score difference matrix S (also square). A second all-zero matrix of the same shape as S is constructed and modified according to a second principle to obtain a second matrix G. The second principle is: where an element of the prediction score difference matrix S is greater than 0, the corresponding element of G is set to 1; otherwise it is set to -1. A margin matrix M is obtained as the elementwise product G*S_gd.
After the annotation score difference matrix S_gd, the first matrix P, the prediction score difference matrix S, the second matrix G and the margin matrix M are determined, the ranking loss RankLoss and the score loss ScoreLoss can be calculated.
The ranking loss RankLoss is determined according to the prediction score difference matrix S, the second matrix G, the first matrix P and the margin matrix M. Specifically: the elementwise product of S and -G is computed and added to M; the resulting matrix is multiplied elementwise by P to obtain a target matrix; each element of the target matrix is compared with 0 and the maximum is taken, thereby updating the elements of the target matrix. Then, the elements of the updated target matrix are accumulated to obtain a first value, the elements of the first matrix P are accumulated to obtain a second value, and the ratio of the first value to the second value gives the ranking loss RankLoss. The ranking loss can be expressed as:
RankLoss = sum(max((S*(-G)+M)*P, 0)) / sum(P)
In the above expression, -G denotes negating the elements of the second matrix; (S*(-G)+M) denotes the elementwise product of S and -G, added to M; the resulting matrix is then multiplied elementwise by the matrix P to obtain the target matrix. max((S*(-G)+M)*P, 0) denotes comparing each element of the target matrix with 0 and taking the maximum, thereby updating the elements of the target matrix. sum(max((S*(-G)+M)*P, 0)) denotes accumulating the elements of the updated target matrix, and sum(P) denotes accumulating the elements of the first matrix.
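An illustrative sketch of this ranking loss for one training image, assuming all matrix products are elementwise and using hypothetical helper inputs (a same-expansion-ratio mask and the index of the best-annotated candidate):

```python
import torch

def rank_loss(s_score, s_gd, same_ratio, best_idx):
    """Sketch of RankLoss for one training image.
    s_score: (N,) predicted scores; s_gd: (N,) annotation scores.
    same_ratio: (N, N) bool, True where two candidates share an expansion
    ratio; best_idx: index of the best-annotated candidate."""
    S_gd = s_gd[:, None] - s_gd[None, :]      # annotation score differences
    S = s_score[:, None] - s_score[None, :]   # prediction score differences

    # First matrix P: valid picture pairs, built per the "first principle".
    P = torch.zeros_like(S_gd)
    P[same_ratio] = 1.0                        # same-expansion-ratio pairs
    P[best_idx, :] = 2.0                       # best vs. all other candidates
    P[:, best_idx] = 2.0
    P.fill_diagonal_(0.0)
    P[S_gd == 0] = 0.0                         # tied pairs carry no ordering

    G = torch.where(S > 0, 1.0, -1.0)          # second matrix
    M = G * S_gd                               # margin matrix
    target = (S * (-G) + M) * P
    # clamp(min=1) only guards against division by zero when P is empty.
    return torch.clamp(target, min=0).sum() / P.sum().clamp(min=1)
```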
The score loss ScoreLoss is determined according to the prediction scores s_score of the cropping candidate regions and the annotation scores s_gd of the cropping candidate regions; specifically, it is calculated based on the smooth L1 loss function. The score loss can be expressed as:
ScoreLoss = SmoothL1Loss(s_score, s_gd), where SmoothL1Loss is the smooth L1 loss function.
Here, s_score is the prediction score of a cropping candidate region, s_gd is its annotation score, and the aesthetic evaluation task loss is a weighted sum of the ranking loss and the score loss.
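Correspondingly, a sketch of the score loss and the weighted aesthetic evaluation task loss, reusing rank_loss from the sketch above (the weight names mirror the trade-off parameters α and β used later in the joint loss):

```python
import torch.nn.functional as F

def aesthetic_task_loss(s_score, s_gd, same_ratio, best_idx,
                        alpha=1.0, beta=1.0):
    """Aesthetic evaluation task loss: weighted sum of the ranking loss
    and the smooth-L1 score loss between predicted and annotated scores."""
    score_loss = F.smooth_l1_loss(s_score, s_gd)
    return alpha * rank_loss(s_score, s_gd, same_ratio, best_idx) + beta * score_loss
```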
Through the above implementation process of the present application, the prediction score difference matrix and the annotation score difference matrix can be determined based on the prediction scores and annotation scores of the cropping candidate regions; the ranking loss is determined based on the two matrices; the score loss is calculated from the prediction scores and annotation scores of the cropping candidate regions; the aesthetic evaluation task loss is determined as a weighted sum of the ranking loss and the score loss; and the parameters are adjusted based on the aesthetic evaluation task loss to train the aesthetic evaluation task model.
For the saliency task model, performing saliency task training according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images, and determining the saliency task model, includes:
for each training image, determining a saliency task loss according to the saliency grayscale map and the saliency map prediction result corresponding to the training image;
updating model parameters of the saliency task model according to the saliency task loss, so as to perform model training and determine the saliency task model. When determining the saliency task model, for each training image, the saliency task loss may be determined according to the saliency grayscale map and the saliency map prediction result corresponding to the training image, and the model parameters of the saliency task model are then updated based on the saliency task loss. Specifically: for a first training image, the saliency grayscale map of the first training image is obtained (the saliency grayscale map may be determined based on a well-trained large salient object detection (SOD) network model); the initial saliency task model is then used to obtain the saliency map prediction result corresponding to the first training image, and the saliency task loss of the first training image is determined according to the saliency grayscale map and the saliency map prediction result. The model parameters of the saliency task model are updated according to the saliency task loss; then, based on the saliency task model with updated parameters, the saliency map prediction result corresponding to a second training image is obtained, the saliency task loss of the second training image is determined based on the saliency grayscale map and saliency map prediction result corresponding to the second training image, and the saliency task model is updated again according to that loss. The process of obtaining, based on the updated model, the saliency map prediction result corresponding to the next training image, determining the saliency task loss, and updating the model parameters based on that loss is repeated until the saliency task loss is determined to satisfy a preset condition, at which point the model training is determined to be successful.
The saliency task loss is calculated as SalLoss = BCEWithLogitsLoss(s_pred, s_sod), where BCEWithLogitsLoss is the cross-entropy loss function (binary cross-entropy applied to logits), SalLoss is the saliency task loss, s_pred is the saliency map prediction result, and s_sod is the saliency grayscale map.
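In PyTorch this is a one-liner, assuming the prediction is an unnormalized logit map and the SOD grayscale map is scaled to [0, 1] with the same spatial size:

```python
import torch.nn.functional as F

def saliency_task_loss(pred_logits, sod_gray):
    """SalLoss: binary cross-entropy with logits between the predicted
    saliency map and the SOD-derived grayscale ground truth."""
    return F.binary_cross_entropy_with_logits(pred_logits, sod_gray)
```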
The network architecture of the saliency task model may be as shown in Fig. 3. That is, the saliency task model includes the first backbone network, cross-stage convolutions and multiple convolutional layers (which may be distinct from the multiple convolutional layers of the aesthetic evaluation task model). The first backbone network is used to obtain features of different scales from the image, and cross-stage convolutions are used for feature fusion. To extract multi-scale features at a granular level, each scale of features is processed by a set of parallel convolutions with different dilation rates; the highest-resolution features are then generated by cross-stage 1×1 convolutions, and the saliency map prediction result is finally output by the multiple convolutional layers. Updating the model parameters of the saliency task model can be understood as updating the parameters of the first backbone network, the cross-stage convolutions and the multiple convolutional layers.
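A minimal sketch of such a fusion head follows; since Fig. 3 is not reproduced here, the channel counts, number of scales and dilation rates are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyHead(nn.Module):
    """Sketch: fuse multi-scale backbone features with parallel dilated
    convolutions per scale, merge with a cross-stage 1x1 conv at the
    highest resolution, and predict a one-channel saliency logit map."""

    def __init__(self, in_channels=(32, 96, 1280), mid=32,
                 dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.ModuleList(
                nn.Conv2d(c, mid, 3, padding=d, dilation=d)
                for d in dilations)
            for c in in_channels)
        self.merge = nn.Conv2d(len(in_channels) * mid, mid, 1)  # cross-stage 1x1
        self.out = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 1))

    def forward(self, feats):
        # feats: list of backbone feature maps, highest resolution first.
        size = feats[0].shape[-2:]
        fused = []
        for f, branch in zip(feats, self.branches):
            y = sum(conv(f) for conv in branch)    # parallel dilated convs
            fused.append(F.interpolate(y, size=size, mode='bilinear',
                                       align_corners=False))
        return self.out(self.merge(torch.cat(fused, dim=1)))  # logit map
```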
Through the above implementation process of the present application, cross-stage convolutions are used to fuse the multi-level features of the first backbone network over the entire image to predict the salient regions in the image, and the cross-entropy loss function is applied to compute the saliency task loss based on the saliency map prediction result and the saliency grayscale map, so that model training can be performed based on the saliency task loss.
In the case where the saliency task model is the image evaluation network model, inputting the image features corresponding to the target image into the image evaluation network model and obtaining the feature scores respectively corresponding to the multiple cropping candidate regions includes:
inputting the image features corresponding to the target image into the saliency task model, and obtaining saliency feature information corresponding to each pixel of the target image;
for each cropping candidate region of the target image, determining the feature score corresponding to the cropping candidate region according to the saliency feature information corresponding to the pixels included in the cropping candidate region.
For the case where the saliency task model is the image evaluation network model, when the feature scores respectively corresponding to the multiple cropping candidate regions of the target image are obtained based on the saliency task model, the image features corresponding to the target image may be input into the saliency task model to obtain the saliency feature information corresponding to each pixel of the target image. This can be understood as each pixel of the target image corresponding to a saliency feature value. Then, for each cropping candidate region of the target image, the feature score corresponding to the cropping candidate region is determined according to the saliency feature information corresponding to the pixels included in the cropping candidate region (the saliency feature value corresponding to each included pixel); the feature score in this case characterizes the saliency feature. The higher the proportion of pixels with high saliency feature values included in a cropping candidate region, the higher the feature score corresponding to that cropping candidate region.
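The exact aggregation rule is not specified, but one scoring rule consistent with this description (the threshold value and the averaging are assumptions) is:

```python
import torch

def crop_saliency_score(sal_map, box, high=0.5):
    """Score a cropping candidate region by the fraction of its pixels
    whose saliency value exceeds a threshold. sal_map: (H, W) tensor in
    [0, 1]; box: (x1, y1, x2, y2) pixel coordinates, assumed non-empty."""
    x1, y1, x2, y2 = box
    region = sal_map[y1:y2, x1:x2]
    return (region > high).float().mean().item()
```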
Through the above implementation process of the present application, when the saliency task model is used to determine the feature scores corresponding to the cropping candidate regions, the saliency feature information corresponding to the target image can be obtained, and the feature score corresponding to each cropping candidate region is determined based on the saliency feature information of the pixels corresponding to the cropping candidate region, so that the feature score can be determined based on the position of the cropping candidate region on the target image.
The process of performing model training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images as well as the image saliency information respectively corresponding to the multiple training images is introduced below. This process corresponds to two implementation strategies; implementation strategy 1 is introduced first.
Performing model training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images as well as the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model, includes:
determining the aesthetic evaluation task model according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images;
determining the saliency task model according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images, the image saliency information including the saliency grayscale maps and the saliency map prediction results;
performing joint training based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
When model training is performed according to the training images, aesthetic evaluation task training may be performed according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images to determine the aesthetic evaluation task model; saliency task training may be performed according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images to determine the saliency task model, the image saliency information in this case including the saliency grayscale maps and the saliency map prediction results; joint training is then performed based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
Through joint learning based on the saliency task model and the aesthetic evaluation task model, the two task models can be optimized simultaneously. The saliency task model drives the model to attend to and preserve the visually salient regions of the original image, while the aesthetic evaluation task model drives the model to attend to better aesthetic composition. Joint learning of the two task models ensures that, among all cropping candidate regions containing the salient features, the cropping candidate region with the highest aesthetic evaluation score is selected, so that the cropped image has a high aesthetic score while retaining the original salient features of the image.
After the aesthetic evaluation task model and the saliency task model are determined, performing joint training based on the two models to obtain the image evaluation network model includes the following steps:
determining a target loss according to the aesthetic evaluation task loss and the saliency task loss;
updating the model parameters of the aesthetic evaluation task model and the saliency task model according to the target loss, so as to perform model training and obtain the image evaluation network model.
After the aesthetic evaluation task model and the saliency task model are determined, the target loss may be determined according to the aesthetic evaluation task loss corresponding to the aesthetic evaluation task model and the saliency task loss corresponding to the saliency task model. Here, the aesthetic evaluation task loss is the final loss corresponding to the aesthetic evaluation task model, and the saliency task loss is the final loss corresponding to the saliency task model.
The target loss is loss = SalLoss + α·RankLoss + β·ScoreLoss, where α and β are trade-off parameters; typically, α and β may both take the value 1.
The process of updating the model parameters of the image evaluation network model based on the target loss is the process of updating the parameters of the first backbone network, the cross-stage convolutions and the multiple convolutional layers of the saliency task model, and the parameters of the first backbone network and the multiple convolutional layers of the aesthetic evaluation task model. The saliency task model and the aesthetic evaluation task model may share the first backbone network.
After the model parameters of the image evaluation network model are updated, the aesthetic evaluation task loss and the saliency task loss can be re-determined, the target loss is determined accordingly, and the model parameters continue to be updated based on the target loss; this process is repeated until the target loss satisfies a preset condition, at which point the model training is determined to be successful. After the image evaluation network model is obtained, it can be used to crop the target image.
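Putting the pieces together, a schematic joint training step over the shared backbone and the two heads might look as follows, reusing the loss sketches above (the module interfaces are assumptions about the architecture, not a definitive implementation):

```python
import torch

def joint_train_step(backbone, aesthetic_head, saliency_head, optimizer,
                     image, boxes, s_gd, same_ratio, best_idx, sod_gray,
                     alpha=1.0, beta=1.0):
    """One optimization step on the joint target loss
    loss = SalLoss + alpha * RankLoss + beta * ScoreLoss."""
    feats = backbone(image)                      # shared first backbone
    s_score = aesthetic_head(feats, boxes)       # per-candidate scores
    pred_logits = saliency_head(feats)           # saliency logit map
    loss = (saliency_task_loss(pred_logits, sod_gray)
            + alpha * rank_loss(s_score, s_gd, same_ratio, best_idx)
            + beta * torch.nn.functional.smooth_l1_loss(s_score, s_gd))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```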
The process of obtaining the image evaluation network model corresponding to strategy 1 is introduced below through a specific implementation flow, as shown in Fig. 4, which includes the following steps:
Step 401: for multiple training images, determine the multiple cropping candidate regions respectively corresponding to each training image.
Step 402: for each training image, determine the annotation scores of the multiple cropping candidate regions corresponding to the training image.
Step 403: perform image data processing on the training image, and obtain the different-scale features of the training image based on the first backbone network. Step 404 and step 405 are executed separately after step 403.
Step 404: perform aesthetic evaluation task model training based on the different-scale features of the training image and the annotation scores of the cropping candidate regions corresponding to the training image.
Step 405: perform saliency task model training based on the different-scale features of the training image and the saliency grayscale map of the training image; the saliency grayscale map may be obtained in advance.
Step 406: perform joint training based on the aesthetic evaluation task model and the saliency task model to determine the image evaluation network model.
In the above process of determining the image evaluation network model based on strategy 1, the saliency task is used as a subtask of joint learning to train the deep network, which can effectively incorporate the saliency information of the image without increasing the complexity of the network or its inference cost, so that the trained image evaluation network model can output cropped images that well combine aesthetic and saliency features.
The process of obtaining the image evaluation network model based on strategy 1 has been introduced above; the process of obtaining the image evaluation network model based on strategy 2 is introduced below. Performing model training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images as well as the image saliency information respectively corresponding to the multiple training images, to obtain the image evaluation network model, includes:
for each training image, generating salient image features according to the training image and the saliency grayscale map corresponding to the training image, the image saliency information including the salient image features, and the salient image features including the RoI features and RoD features of the cropping candidate regions;
for each training image, updating the prediction scores of the multiple cropping candidate regions corresponding to the training image according to the salient image features corresponding to the training image;
performing model training according to the annotation scores and the updated prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, to obtain the image evaluation network model.
When model training is performed to obtain the image evaluation network model, for each of the multiple training images, the saliency grayscale map of the training image may first be obtained based on a well-trained large SOD network model; the saliency grayscale map is then input into a second backbone network to obtain salient features of different scales, the training image is input into the first backbone network to obtain features of different scales of the training image, and the salient features of different scales output by the second backbone network are concatenated with the features of different scales output by the first backbone network to generate the salient image features. In this case, the image saliency information includes the salient image features generated by concatenation, and the salient image features include the RoI features and RoD features of the training image, and thus can include the RoI features and RoD features of the cropping candidate regions.
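A minimal sketch of this two-backbone feature path (the backbone interfaces, the scale alignment of the two feature lists, and the channel replication of the grayscale map are all assumptions):

```python
import torch

def salient_image_features(image, sod_gray, backbone_rgb, backbone_sal):
    """Strategy 2 feature path: extract multi-scale features from the RGB
    image and from its SOD saliency grayscale map, then concatenate the
    scale-matched feature maps along the channel dimension."""
    rgb_feats = backbone_rgb(image)                # list of (1, C_i, H_i, W_i)
    # Replicate the single-channel map if the backbone expects 3 channels.
    sal_feats = backbone_sal(sod_gray.expand_as(image))
    return [torch.cat([r, s], dim=1) for r, s in zip(rgb_feats, sal_feats)]
```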
For each training image, when the prediction scores of the multiple cropping candidate regions corresponding to the current training image are updated according to the salient image features corresponding to the current training image: since obtaining the prediction scores of the multiple cropping candidate regions corresponding to a training image requires extracting the RoI features and RoD features of the cropping candidate regions, and the salient image features include the RoI features and RoD features of the cropping candidate regions, the RoI features and RoD features of the cropping candidate regions of the training image can be updated based on the salient image features of the training image, thereby updating the prediction scores of the cropping candidate regions.
Then, model training is performed according to the annotation scores and the updated prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, to obtain the image evaluation network model; the model training process is not repeated here.
The process of obtaining the image evaluation network model corresponding to strategy 2 is introduced below through a specific implementation flow, as shown in Fig. 5, which includes the following steps:
Step 501: for multiple training images, determine the multiple cropping candidate regions respectively corresponding to each training image.
Step 502: for each training image, determine the annotation scores of the multiple cropping candidate regions corresponding to the training image.
Step 503: perform image data processing on the training image. Step 504 and step 505 are executed separately after step 503.
Step 504: obtain the different-scale features of the training image based on the first backbone network.
Step 505: obtain the saliency grayscale map of the training image, and obtain the salient features of different scales corresponding to the saliency grayscale map based on the second backbone network.
Step 506 is executed after step 504 and step 505.
Step 506: concatenate the salient features of different scales corresponding to the saliency grayscale map with the different-scale features of the training image to obtain the salient image features.
Step 507: input the salient image features of the training images into the aesthetic evaluation task model for model training, to obtain the image evaluation network model.
In the above process of determining the image evaluation network model based on strategy 2, the saliency grayscale map of the original image is obtained directly, high-level salient features are then extracted through the backbone network, and the salient features are concatenated with the features of the original image and fed into the aesthetic evaluation task model for learning, which achieves a more direct use of the saliency features.
In an optional embodiment of the present application, determining at least one target cropping candidate region according to the feature scores respectively corresponding to the multiple cropping candidate regions includes:
screening out, from the multiple feature scores, at least one target feature score greater than a preset score threshold;
determining the cropping candidate region corresponding to the target feature score as the target cropping candidate region.
When at least one target cropping candidate region is determined according to the multiple feature scores of the multiple cropping candidate regions corresponding to the target image, for each expansion ratio, the corresponding feature scores may be sorted from high to low, and the target feature scores greater than the preset score threshold are screened out based on the sorting result, so that at least one target feature score is determined for each expansion ratio; the cropping candidate regions corresponding to the target feature scores are then determined as the target cropping candidate regions, so that the original image is cropped based on the cropping candidate regions to obtain cropped images that embody aesthetic and/or saliency features.
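For illustration (the grouping key and the threshold value are assumptions), the per-expansion-ratio selection could be implemented as:

```python
def select_target_crops(candidates, threshold=0.8):
    """candidates: list of (box, expansion_ratio, feature_score).
    For each expansion ratio, sort by score from high to low and keep
    the candidates whose score exceeds the preset threshold."""
    by_ratio = {}
    for box, ratio, score in candidates:
        by_ratio.setdefault(ratio, []).append((score, box))
    selected = []
    for ratio, items in by_ratio.items():
        items.sort(key=lambda t: t[0], reverse=True)
        selected += [(box, ratio, score)
                     for score, box in items if score > threshold]
    return selected
```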
The above is the image cropping method provided by the embodiments of the present application. By determining the multiple cropping candidate regions corresponding to the target image; obtaining the image features corresponding to the target image, which include the first image features associated with the cropping candidate regions and the second image features associated with the non-cropping-candidate regions; inputting the image features corresponding to the target image into the image evaluation network model to obtain, for each of the multiple cropping candidate regions, a feature score characterizing at least one of aesthetic features and saliency features; and determining at least one target cropping candidate region according to the feature scores respectively corresponding to the multiple cropping candidate regions and cropping the target image accordingly, the information in the image can be efficiently mined based on aesthetic and/or saliency features, so that the image cropping effect is guaranteed and a cropped image of good quality is obtained.
Furthermore, by dividing the target image into a grid-anchor form, determining at least one target grid in the grid-anchor-form target image based on a preset composition principle, and then expanding the target grids, the cropping candidate regions can be determined on the basis of the target grids, achieving a refined selection of cropping candidate regions.
By obtaining the image saliency information of multiple training images and/or the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions, and performing model training according to the training images to obtain the image evaluation network model, an image evaluation network model for scoring images on aesthetic features and/or saliency features can be obtained based on the saliency features and/or aesthetic features of the images.
By determining the image evaluation network model in different ways, the ways of determining the image evaluation network model are enriched; by determining the target cropping candidate regions based on the feature scores and then performing image cropping, the information in the image can be efficiently mined based on aesthetic and/or saliency features, so that the image cropping effect is guaranteed.
For the image cropping method provided by the embodiments of the present application, the execution subject may be an image cropping apparatus. In the embodiments of the present application, the image cropping apparatus provided by the embodiments of the present application is described by taking the image cropping apparatus executing the image cropping method as an example.
An embodiment of the present application further provides an image cropping apparatus, as shown in Fig. 6, including:
a determining module 601, configured to determine multiple cropping candidate regions corresponding to a target image;
a first acquiring module 602, configured to acquire image features corresponding to the target image, the image features including first image features and second image features, the first image features being associated with first image regions corresponding to the multiple cropping candidate regions, and the second image features being associated with second image regions in the target image other than the first image regions;
a second acquiring module 603, configured to input the image features corresponding to the target image into an image evaluation network model and acquire feature scores respectively corresponding to the multiple cropping candidate regions, the feature scores being used to characterize at least one of aesthetic features and saliency features of the cropping candidate regions;
a processing module 604, configured to determine at least one target cropping candidate region according to the feature scores respectively corresponding to the multiple cropping candidate regions, and crop the target image according to the target cropping candidate region.
Optionally, the determining module includes:
a dividing submodule, configured to divide the target image into a grid-anchor form;
a first determining submodule, configured to determine at least one target grid in the grid-anchor-form target image based on a preset composition principle;
a second determining submodule, configured to expand the at least one target grid according to at least one expansion ratio respectively, and determine the multiple cropping candidate regions.
Optionally, the apparatus further includes:
a training acquisition module, configured to perform model training according to at least one of image saliency information and image aesthetic information corresponding to multiple training images, to acquire the image evaluation network model.
Optionally, the training acquisition module includes:
an acquisition submodule, configured to acquire the image aesthetic information of the multiple cropping candidate regions corresponding to each training image, the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions;
a training acquisition submodule, configured to perform model training according to at least one of the image aesthetic information of the multiple cropping candidate regions respectively corresponding to the multiple training images and the image saliency information respectively corresponding to the multiple training images, to acquire the image evaluation network model.
Optionally, the acquisition submodule includes:
a first processing unit, configured to, for each training image, acquire the screening results obtained by annotators screening the multiple cropping candidate regions corresponding to the training image at least twice, and determine the annotation scores respectively corresponding to the multiple cropping candidate regions according to the screening results;
a second processing unit, configured to, for each training image, acquire the feature map of the training image, extract the RoI features and RoD features of the cropping candidate regions on the feature map and combine them into target features, and acquire the prediction scores respectively corresponding to the multiple cropping candidate regions according to the target features.
Optionally, the training acquisition submodule includes one of the following units:
a first training unit, configured to perform aesthetic evaluation task training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, and determine an aesthetic evaluation task model, the aesthetic evaluation task model being the image evaluation network model;
a second training unit, configured to perform saliency task training according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images, and determine a saliency task model, the saliency task model being the image evaluation network model, the image saliency information including the saliency grayscale maps and the saliency map prediction results;
a third training unit, configured to perform model training according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images as well as the image saliency information respectively corresponding to the multiple training images, to acquire the image evaluation network model.
Optionally, in the case where the saliency task model is the image evaluation network model, the second acquiring module is further configured to:
input the image features corresponding to the target image into the saliency task model, and acquire the saliency feature information corresponding to each pixel of the target image;
for each cropping candidate region of the target image, determine the feature score corresponding to the cropping candidate region according to the saliency feature information corresponding to the pixels included in the cropping candidate region.
Optionally, the first training unit includes:
a first determining subunit, configured to, for each training image, determine the aesthetic evaluation task loss according to the annotation scores and prediction scores of the multiple cropping candidate regions corresponding to the training image;
a first updating subunit, configured to update the model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, so as to perform model training and determine the aesthetic evaluation task model.
Optionally, the second training unit includes:
a second determining subunit, configured to, for each training image, determine the saliency task loss according to the saliency grayscale map and the saliency map prediction result corresponding to the training image;
a second updating subunit, configured to update the model parameters of the saliency task model according to the saliency task loss, so as to perform model training and determine the saliency task model.
Optionally, the third training unit includes:
a third determining subunit, configured to determine the aesthetic evaluation task model according to the annotation scores and prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images;
a fourth determining subunit, configured to determine the saliency task model according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the multiple training images, the image saliency information including the saliency grayscale maps and the saliency map prediction results;
a first acquiring subunit, configured to perform joint training based on the aesthetic evaluation task model and the saliency task model, and acquire the image evaluation network model.
Optionally, the third training unit includes:
a generating subunit, configured to, for each training image, generate salient image features according to the training image and the saliency grayscale map corresponding to the training image, the image saliency information including the salient image features, and the salient image features including the RoI features and RoD features of the cropping candidate regions;
a third updating subunit, configured to, for each training image, update the prediction scores of the multiple cropping candidate regions corresponding to the training image according to the salient image features corresponding to the training image;
a second acquiring subunit, configured to perform model training according to the annotation scores and the updated prediction scores of the multiple cropping candidate regions respectively corresponding to the multiple training images, and acquire the image evaluation network model.
The image cropping apparatus provided by the embodiments of the present application determines the multiple cropping candidate regions corresponding to the target image; acquires the image features corresponding to the target image, which include the first image features associated with the cropping candidate regions and the second image features associated with the non-cropping-candidate regions; inputs the image features corresponding to the target image into the image evaluation network model to acquire, for each of the multiple cropping candidate regions, a feature score characterizing at least one of aesthetic features and saliency features; and determines at least one target cropping candidate region according to the feature scores respectively corresponding to the multiple cropping candidate regions and crops the target image accordingly. In this way, the information in the image can be efficiently mined based on aesthetic and/or saliency features, so that the image cropping effect is guaranteed and a cropped image of good quality is obtained.
Furthermore, by dividing the target image into a grid-anchor form, determining at least one target grid in the grid-anchor-form target image based on a preset composition principle, and then expanding the target grids, the cropping candidate regions can be determined on the basis of the target grids, achieving a refined selection of cropping candidate regions.
By obtaining the image saliency information of multiple training images and/or the image aesthetic information including the annotation scores and prediction scores of the cropping candidate regions, and performing model training according to the training images to obtain the image evaluation network model, an image evaluation network model for scoring images on aesthetic features and/or saliency features can be obtained based on the saliency features and/or aesthetic features of the images.
By determining the image evaluation network model in different ways, the ways of determining the image evaluation network model are enriched; by determining the target cropping candidate regions based on the feature scores and then performing image cropping, the information in the image can be efficiently mined based on aesthetic and/or saliency features, so that the image cropping effect is guaranteed.
The image cropping apparatus in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. Exemplarily, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and may also be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.

The image cropping apparatus in the embodiments of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.

The image cropping apparatus provided in the embodiments of the present application can implement each process implemented by the image cropping method embodiment shown in FIG. 1; to avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 7, an embodiment of the present application further provides an electronic device 700, including a processor 701, a memory 702, and a program or instructions stored in the memory 702 and executable on the processor 701. When the program or instructions are executed by the processor 701, each process of the above image cropping method embodiment is implemented, with the same technical effect; to avoid repetition, details are not repeated here.

It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.

FIG. 8 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 800 includes, but is not limited to, a radio frequency unit 801, a network module 802, an audio output unit 803, an input unit 804, a sensor 805, a display unit 806, a user input unit 807, an interface unit 808, a memory 809, and a processor 810.

Those skilled in the art will understand that the electronic device 800 may further include a power supply (such as a battery) supplying power to the components. The power supply may be logically connected to the processor 810 through a power management system, which implements functions such as managing charging, discharging, and power consumption. The structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which is not repeated here.
The processor 810 is configured to: determine a plurality of cropping candidate regions corresponding to a target image; obtain image features corresponding to the target image, the image features including a first image feature and a second image feature, the first image feature being associated with a first image region corresponding to the plurality of cropping candidate regions, and the second image feature being associated with a second image region in the target image other than the first image region; input the image features corresponding to the target image into an image evaluation network model to obtain feature scores respectively corresponding to the plurality of cropping candidate regions, the feature scores characterizing at least one of an aesthetic feature and a saliency feature of the cropping candidate regions; and determine at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and crop the target image according to the target cropping candidate region.
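For concreteness, the inference flow configured on the processor 810 can be sketched as follows. This is a minimal illustration only: the three callables passed in (candidate generation, feature extraction, and the evaluation model) are hypothetical stand-ins for the components described above, and the top-k selection rule is an assumed way of picking target regions from the feature scores.

```python
import numpy as np

def crop_image(image, generate_candidates, extract_features,
               evaluation_model, top_k=1):
    """Sketch of the cropping pipeline; the three callables are
    hypothetical stand-ins, not part of the claimed implementation."""
    # Step 1: determine the cropping candidate regions.
    candidates = generate_candidates(image)          # [(x1, y1, x2, y2), ...]
    # Step 2: obtain the first (RoI) and second (RoD) image features.
    features = extract_features(image, candidates)
    # Step 3: score every candidate on aesthetics and/or saliency.
    scores = np.asarray(evaluation_model(features))  # shape (N,)
    # Step 4: keep the best-scoring region(s) and crop.
    best = np.argsort(scores)[::-1][:top_k]
    return [image[y1:y2, x1:x2]
            for x1, y1, x2, y2 in (candidates[i] for i in best)]
```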
Optionally, when determining the plurality of cropping candidate regions corresponding to the target image, the processor 810 is further configured to: divide the target image into a grid-anchor form; determine at least one target grid in the grid-anchor-form target image based on a preset composition principle; and expand the at least one target grid according to at least one expansion ratio, respectively, to determine the plurality of cropping candidate regions.
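A small sketch of how such grid-anchor candidate generation might look is given below. The grid size, the use of rule-of-thirds intersections as the preset composition principle, and the expansion ratios are all assumed values chosen for illustration; the embodiment does not prescribe them.

```python
def grid_anchor_candidates(width, height, grid=(12, 12),
                           ratios=(0.6, 0.75, 0.9)):
    """Generate crop candidates from grid anchors (illustrative).

    The image is divided into a grid; target grids are chosen here
    at the rule-of-thirds intersections (one possible preset
    composition principle), then each anchor is expanded by several
    ratios of the image size to form candidate regions.
    """
    cell_w, cell_h = width / grid[0], height / grid[1]
    # Rule-of-thirds anchors, snapped to grid-cell centres.
    anchors = [((round(grid[0] * fx) + 0.5) * cell_w,
                (round(grid[1] * fy) + 0.5) * cell_h)
               for fx in (1/3, 2/3) for fy in (1/3, 2/3)]
    candidates = []
    for cx, cy in anchors:
        for r in ratios:
            w, h = width * r, height * r
            # Centre the expanded box on the anchor, clamped to the image.
            x1 = min(max(cx - w / 2, 0), width - w)
            y1 = min(max(cy - h / 2, 0), height - h)
            candidates.append((int(x1), int(y1),
                               int(x1 + w), int(y1 + h)))
    return candidates
```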
Optionally, the processor 810 is further configured to perform model training according to at least one of image saliency information and image aesthetic information corresponding to a plurality of training images, to obtain the image evaluation network model.

Optionally, when performing model training according to at least one of the image saliency information and the image aesthetic information corresponding to the plurality of training images to obtain the image evaluation network model, the processor 810 is further configured to: obtain the image aesthetic information of the plurality of cropping candidate regions corresponding to each training image, the image aesthetic information including annotation scores and prediction scores of the cropping candidate regions; and perform model training according to at least one of the image aesthetic information of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
Optionally, when obtaining the image aesthetic information of the plurality of cropping candidate regions corresponding to each training image, the processor 810 is further configured to: for each training image, obtain screening results produced by annotators screening the plurality of cropping candidate regions corresponding to the training image at least twice, and determine, according to the screening results, the annotation scores respectively corresponding to the plurality of cropping candidate regions; and, for each training image, obtain a feature map of the training image, extract RoI features and RoD features of the cropping candidate regions on the feature map, combine them into target features, and obtain, according to the target features, the prediction scores respectively corresponding to the plurality of cropping candidate regions.
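To make the RoI/RoD construction concrete, the sketch below pools a region-of-interest (RoI) feature inside a candidate crop and a region-of-discard (RoD) feature from the rest of the feature map, then concatenates them into the target feature. Plain average pooling over a CHW NumPy array stands in for the RoIAlign-style pooling a real model would use; that simplification is an assumption made for illustration.

```python
import numpy as np

def roi_rod_feature(feature_map, box, scale):
    """Combine RoI and RoD features for one candidate (sketch).

    feature_map: (C, H, W) backbone output.
    box:         (x1, y1, x2, y2) in image coordinates.
    scale:       image-to-feature-map downsampling factor.
    """
    c, h, w = feature_map.shape
    x1, y1, x2, y2 = (int(v / scale) for v in box)
    mask = np.zeros((h, w), dtype=bool)
    mask[y1:y2, x1:x2] = True
    # RoI feature: pooled inside the candidate region.
    roi = feature_map[:, mask].mean(axis=1)
    # RoD feature: pooled over the discarded region outside it.
    rod = (feature_map[:, ~mask].mean(axis=1)
           if (~mask).any() else np.zeros(c))
    # Target feature: concatenation of the RoI and RoD parts.
    return np.concatenate([roi, rod])
```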
Optionally, when performing model training according to at least one of the image aesthetic information of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, the processor 810 is further configured to perform one of the following: perform aesthetic evaluation task training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, and determine an aesthetic evaluation task model, the aesthetic evaluation task model being the image evaluation network model; perform saliency task training according to saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, and determine a saliency task model, the saliency task model being the image evaluation network model, the image saliency information including the saliency grayscale maps and the saliency map prediction results; or perform model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model.

Optionally, in the case where the saliency task model is the image evaluation network model, when inputting the image features corresponding to the target image into the image evaluation network model to obtain the feature scores respectively corresponding to the plurality of cropping candidate regions, the processor 810 is further configured to: input the image features corresponding to the target image into the saliency task model to obtain saliency feature information corresponding to each pixel of the target image; and, for each cropping candidate region of the target image, determine the feature score corresponding to the cropping candidate region according to the saliency feature information corresponding to the pixels included in that region.
Optionally, when performing aesthetic evaluation task training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, and determining the aesthetic evaluation task model, the processor 810 is further configured to: for each training image, determine an aesthetic evaluation task loss according to the annotation scores and prediction scores of the plurality of cropping candidate regions corresponding to the training image; and update the model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, so as to perform model training and determine the aesthetic evaluation task model.
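One plausible form of the aesthetic evaluation task loss is a regression loss between each candidate's prediction score and its annotation score, such as the smooth L1 (Huber) loss sketched below. The choice of smooth L1 is an assumption; the embodiment only specifies that the loss is determined from the annotation and prediction scores.

```python
import numpy as np

def aesthetic_task_loss(pred_scores, anno_scores, beta=1.0):
    """Smooth L1 loss between predicted and annotated candidate
    scores (one assumed instantiation of the aesthetic loss)."""
    pred = np.asarray(pred_scores, dtype=np.float64)
    anno = np.asarray(anno_scores, dtype=np.float64)
    diff = np.abs(pred - anno)
    # Quadratic near zero, linear beyond beta.
    loss = np.where(diff < beta,
                    0.5 * diff ** 2 / beta,
                    diff - 0.5 * beta)
    return float(loss.mean())
```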
Optionally, when performing saliency task training according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, and determining the saliency task model, the processor 810 is further configured to: for each training image, determine a saliency task loss according to the saliency grayscale map and the saliency map prediction result corresponding to the training image; and update the model parameters of the saliency task model according to the saliency task loss, so as to perform model training and determine the saliency task model.
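The saliency task loss can likewise be instantiated as a pixel-wise binary cross-entropy between the predicted saliency map and the ground-truth saliency grayscale map. BCE is an assumed choice here; other dense losses would fit the same description.

```python
import numpy as np

def saliency_task_loss(pred_map, gt_gray, eps=1e-7):
    """Pixel-wise BCE between the saliency map prediction and the
    saliency grayscale ground truth, both in [0, 1] (assumed form)."""
    p = np.clip(np.asarray(pred_map, dtype=np.float64), eps, 1 - eps)
    g = np.asarray(gt_gray, dtype=np.float64)
    return float(-(g * np.log(p) + (1 - g) * np.log(1 - p)).mean())
```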
Optionally, when performing model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, the processor 810 is further configured to: determine the aesthetic evaluation task model according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images; determine the saliency task model according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, the image saliency information including the saliency grayscale maps and the saliency map prediction results; and perform joint training based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
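Joint training of the two task models can be sketched as optimizing a weighted sum of the two task losses; the combined objective below reuses the two loss sketches above, and the balancing factor lam is an assumed hyperparameter rather than a value taken from the embodiment.

```python
def joint_loss(pred_scores, anno_scores, pred_map, gt_gray, lam=0.5):
    """Weighted sum of the aesthetic and saliency task losses for
    joint training (assumed combination; reuses the sketches above)."""
    return (aesthetic_task_loss(pred_scores, anno_scores)
            + lam * saliency_task_loss(pred_map, gt_gray))
```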
Optionally, when performing model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, the processor 810 is further configured to: for each training image, generate salient image features according to the training image and its corresponding saliency grayscale map, the image saliency information including the salient image features, and the salient image features including the RoI features and RoD features of the cropping candidate regions; for each training image, update the prediction scores of the plurality of cropping candidate regions corresponding to the training image according to the salient image features corresponding to the training image; and perform model training according to the annotation scores and the updated prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
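One way to realize the salient image features described here is to weight the backbone feature map by the saliency grayscale map before the RoI/RoD pooling, so that the updated prediction scores reflect saliency. The sketch below does exactly that, reusing roi_rod_feature from the earlier sketch; the multiplicative weighting is an assumed mechanism, not one specified by the embodiment.

```python
def salient_roi_rod_feature(feature_map, saliency_gray, box, scale):
    """Weight the (C, H, W) feature map by the saliency grayscale
    map (resized beforehand to H x W, values in [0, 1]) and pool
    RoI/RoD features for the candidate box (assumed mechanism)."""
    weighted = feature_map * saliency_gray[None, :, :]
    return roi_rod_feature(weighted, box, scale)
```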
In this way, by determining a plurality of cropping candidate regions corresponding to the target image, obtaining image features corresponding to the target image (including the first image feature associated with the cropping candidate regions and the second image feature associated with the non-candidate regions), inputting those image features into the image evaluation network model to obtain, for each cropping candidate region, a feature score characterizing at least one of an aesthetic feature and a saliency feature, and determining at least one target cropping candidate region and cropping the target image according to the feature scores, the information in the image can be mined efficiently on the basis of aesthetic and/or saliency features, ensuring the cropping effect and producing a cropped image of good quality.

Further, by dividing the target image into a grid-anchor form, determining at least one target grid in the grid-anchor-form target image based on a preset composition principle, and then expanding the target grids, the cropping candidate regions can be determined on the basis of those grids, yielding a refined selection of cropping candidate regions.

By obtaining, for a plurality of training images, image saliency information and/or image aesthetic information that includes annotation scores and prediction scores of cropping candidate regions, and performing model training on these training images, an image evaluation network model can be obtained that scores images on aesthetic features and/or saliency features.

Determining the image evaluation network model in different ways enriches the manners in which it can be obtained; determining the target cropping candidate region based on the feature scores and then cropping accordingly allows the information in the image to be mined efficiently on the basis of aesthetic and/or saliency features, ensuring the cropping effect.
It should be understood that, in this embodiment of the present application, the input unit 804 may include a graphics processing unit (GPU) 8041 and a microphone 8042. The graphics processor 8041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 806 may include a display panel 8061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 807 includes at least one of a touch panel 8071 and other input devices 8072. The touch panel 8071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 8072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not repeated here. The memory 809 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 810 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 810.

The memory 809 may be used to store software programs and various data. The memory 809 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system and application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 809 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synch-link dynamic random access memory (SLDRAM), or a direct Rambus random access memory (DRRAM). The memory 809 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

The processor 810 may include one or more processing units. Optionally, the processor 810 integrates an application processor and a modem processor, where the application processor mainly handles operations related to the operating system, user interface, application programs, and the like, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 810.
An embodiment of the present application further provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, each process of the above image cropping method embodiment is implemented, with the same technical effect; to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

An embodiment of the present application further provides a chip, including a processor and a communication interface. The communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above image cropping method embodiment, with the same technical effect; to avoid repetition, details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-a-chip.

An embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement each process of the above image cropping method embodiment, with the same technical effect; to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; it may also include performing functions in a substantially simultaneous manner or in the reverse order according to the functions involved. For example, the described methods may be performed in an order different from the described order, and various steps may be added, omitted, or combined. In addition, features described with reference to some examples may be combined in other examples.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above. The specific implementations described above are merely illustrative, not restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (17)

  1. An image cropping method, comprising:
    determining a plurality of cropping candidate regions corresponding to a target image;
    obtaining image features corresponding to the target image, wherein the image features comprise a first image feature and a second image feature, the first image feature is associated with a first image region corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image region in the target image other than the first image region;
    inputting the image features corresponding to the target image into an image evaluation network model to obtain feature scores respectively corresponding to the plurality of cropping candidate regions, wherein the feature scores are used to characterize at least one of an aesthetic feature and a saliency feature of the cropping candidate regions; and
    determining at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and cropping the target image according to the target cropping candidate region.
  2. The method according to claim 1, wherein the determining a plurality of cropping candidate regions corresponding to a target image comprises:
    dividing the target image into a grid-anchor form;
    determining at least one target grid in the grid-anchor-form target image based on a preset composition principle; and
    expanding the at least one target grid according to at least one expansion ratio, respectively, to determine the plurality of cropping candidate regions.
  3. The method according to claim 1, further comprising:
    performing model training according to at least one of image saliency information and image aesthetic information corresponding to a plurality of training images, to obtain the image evaluation network model.
  4. The method according to claim 3, wherein the performing model training according to at least one of image saliency information and image aesthetic information corresponding to a plurality of training images, to obtain the image evaluation network model, comprises:
    obtaining the image aesthetic information of a plurality of cropping candidate regions corresponding to each of the training images, wherein the image aesthetic information comprises annotation scores and prediction scores of the cropping candidate regions; and
    performing model training according to at least one of the image aesthetic information of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
  5. The method according to claim 4, wherein the obtaining the image aesthetic information of the plurality of cropping candidate regions corresponding to each of the training images comprises:
    for each of the training images, obtaining screening results produced by annotators screening the plurality of cropping candidate regions corresponding to the training image at least twice, and determining, according to the screening results, the annotation scores respectively corresponding to the plurality of cropping candidate regions; and
    for each of the training images, obtaining a feature map of the training image, extracting RoI features and RoD features of the cropping candidate regions on the feature map and combining them into target features, and obtaining, according to the target features, the prediction scores respectively corresponding to the plurality of cropping candidate regions.
  6. The method according to claim 5, wherein the performing model training according to at least one of the image aesthetic information of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, comprises one of the following:
    performing aesthetic evaluation task training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, and determining an aesthetic evaluation task model, wherein the aesthetic evaluation task model is the image evaluation network model;
    performing saliency task training according to saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, and determining a saliency task model, wherein the saliency task model is the image evaluation network model, and the image saliency information comprises the saliency grayscale maps and the saliency map prediction results; or
    performing model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
  7. The method according to claim 6, wherein, in a case where the saliency task model is the image evaluation network model, the inputting the image features corresponding to the target image into the image evaluation network model to obtain the feature scores respectively corresponding to the plurality of cropping candidate regions comprises:
    inputting the image features corresponding to the target image into the saliency task model to obtain saliency feature information corresponding to each pixel of the target image; and
    for each cropping candidate region of the target image, determining the feature score corresponding to the cropping candidate region according to the saliency feature information corresponding to the pixels included in the cropping candidate region.
  8. The method according to claim 6, wherein the performing aesthetic evaluation task training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, and determining an aesthetic evaluation task model, comprises:
    for each of the training images, determining an aesthetic evaluation task loss according to the annotation scores and prediction scores of the plurality of cropping candidate regions corresponding to the training image; and
    updating model parameters of the aesthetic evaluation task model according to the aesthetic evaluation task loss, so as to perform model training and determine the aesthetic evaluation task model.
  9. The method according to claim 6, wherein the performing saliency task training according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, and determining a saliency task model, comprises:
    for each of the training images, determining a saliency task loss according to the saliency grayscale map and the saliency map prediction result corresponding to the training image; and
    updating model parameters of the saliency task model according to the saliency task loss, so as to perform model training and determine the saliency task model.
  10. The method according to claim 6, wherein the performing model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, comprises:
    determining the aesthetic evaluation task model according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images;
    determining the saliency task model according to the saliency grayscale maps and saliency map prediction results respectively corresponding to the plurality of training images, wherein the image saliency information comprises the saliency grayscale maps and the saliency map prediction results; and
    performing joint training based on the aesthetic evaluation task model and the saliency task model to obtain the image evaluation network model.
  11. The method according to claim 6, wherein the performing model training according to the annotation scores and prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images and the image saliency information respectively corresponding to the plurality of training images, to obtain the image evaluation network model, comprises:
    for each of the training images, generating salient image features according to the training image and the saliency grayscale map corresponding to the training image, wherein the image saliency information comprises the salient image features, and the salient image features comprise the RoI features and RoD features of the cropping candidate regions;
    for each of the training images, updating the prediction scores of the plurality of cropping candidate regions corresponding to the training image according to the salient image features corresponding to the training image; and
    performing model training according to the annotation scores and updated prediction scores of the plurality of cropping candidate regions respectively corresponding to the plurality of training images, to obtain the image evaluation network model.
  12. An image cropping apparatus, comprising:
    a determining module, configured to determine a plurality of cropping candidate regions corresponding to a target image;
    a first obtaining module, configured to obtain image features corresponding to the target image, wherein the image features comprise a first image feature and a second image feature, the first image feature is associated with a first image region corresponding to the plurality of cropping candidate regions, and the second image feature is associated with a second image region in the target image other than the first image region;
    a second obtaining module, configured to input the image features corresponding to the target image into an image evaluation network model to obtain feature scores respectively corresponding to the plurality of cropping candidate regions, wherein the feature scores are used to characterize at least one of an aesthetic feature and a saliency feature of the cropping candidate regions; and
    a processing module, configured to determine at least one target cropping candidate region according to the feature scores respectively corresponding to the plurality of cropping candidate regions, and crop the target image according to the target cropping candidate region.
  13. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the image cropping method according to any one of claims 1 to 11.
  14. A readable storage medium, storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the image cropping method according to any one of claims 1 to 11.
  15. A chip, comprising a processor and a communication interface, the communication interface being coupled to the processor, wherein the processor is configured to run a program or instructions to implement the image cropping method according to any one of claims 1 to 11.
  16. A computer program product, wherein the program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the image cropping method according to any one of claims 1 to 11.
  17. An electronic device, wherein the electronic device is configured to execute the image cropping method according to any one of claims 1 to 11.
PCT/CN2022/134366 2021-11-29 2022-11-25 Image cropping method and apparatus, and electronic device WO2023093851A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111435959.4A CN114119373A (en) 2021-11-29 2021-11-29 Image cropping method and device and electronic equipment
CN202111435959.4 2021-11-29

Publications (1)

Publication Number Publication Date
WO2023093851A1 true WO2023093851A1 (en) 2023-06-01

Family

ID=80367853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134366 WO2023093851A1 (en) 2021-11-29 2022-11-25 Image cropping method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114119373A (en)
WO (1) WO2023093851A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119373A (en) * 2021-11-29 2022-03-01 广东维沃软件技术有限公司 Image cropping method and device and electronic equipment
CN116309627B (en) * 2022-12-15 2023-09-15 北京航空航天大学 Image cropping method and device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178291A1 (en) * 2014-10-09 2017-06-22 Adobe Systems Incorporated Image Cropping Suggestion Using Multiple Saliency Maps
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN110909724A (en) * 2019-10-08 2020-03-24 华北电力大学 Multi-target image thumbnail generation method
CN113159028A (en) * 2020-06-12 2021-07-23 杭州喔影网络科技有限公司 Saliency-aware image cropping method and apparatus, computing device, and storage medium
CN114119373A (en) * 2021-11-29 2022-03-01 广东维沃软件技术有限公司 Image cropping method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZENG HUI; LI LIDA; CAO ZISHENG; ZHANG LEI: "Reliable and Efficient Image Cropping: A Grid Anchor Based Approach", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 15 June 2019 (2019-06-15), pages 5942 - 5950, XP033686820, DOI: 10.1109/CVPR.2019.00610 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152409A (en) * 2023-08-07 2023-12-01 中移互联网有限公司 Image clipping method, device and equipment based on multi-mode perception modeling

Also Published As

Publication number Publication date
CN114119373A (en) 2022-03-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897933

Country of ref document: EP

Kind code of ref document: A1