WO2019109268A1 - Method and apparatus for automatically cropping pictures based on reinforcement learning - Google Patents

Method and apparatus for automatically cropping pictures based on reinforcement learning

Info

Publication number
WO2019109268A1
WO2019109268A1 PCT/CN2017/114795 CN2017114795W WO2019109268A1 WO 2019109268 A1 WO2019109268 A1 WO 2019109268A1 CN 2017114795 W CN2017114795 W CN 2017114795W WO 2019109268 A1 WO2019109268 A1 WO 2019109268A1
Authority
WO
WIPO (PCT)
Prior art keywords
cropping
picture
training
reinforcement learning
current
Prior art date
Application number
PCT/CN2017/114795
Other languages
English (en)
French (fr)
Inventor
黄凯奇
张俊格
李德榜
Original Assignee
中国科学院自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所
Priority to PCT/CN2017/114795
Publication of WO2019109268A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis

Definitions

  • the present invention relates to the field of image processing, and in particular, to a method and apparatus for automatically cropping pictures based on reinforcement learning.
  • the conventional method is a sliding-window based method, which mainly extracts candidate regions through a sliding window, then extracts features from each candidate region and scores them, and takes the region with the highest score as the final result.
  • however, this method generates a large number of candidate windows, and the shape and size of each candidate window are relatively fixed. Cropping windows of fixed shape and size cannot cover all cases, so not only are the cropping results poor, but a large amount of computing resources and time is also consumed.
  • the present invention provides a method and apparatus for automatically cropping pictures based on reinforcement learning.
  • a method for automatically cropping pictures based on reinforcement learning in the present invention includes:
  • the reinforcement learning model is a model constructed based on a convolutional neural network.
  • the cropping strategy is obtained by:
  • the LSTM unit in the reinforcement learning model records the historical observation information of the training picture, and combines the historical observation information of the training picture with the current observation information of the training picture as a current state representation of the training picture;
  • the reinforcement learning model is trained according to the reward function to obtain the cropping strategy.
  • the reward function is calculated according to the following formula, where:
  • reward is the reward function
  • aspect ratio is the aspect ratio of the current cropping window
  • sign is the sign function
  • score is the quality score of the current cropping window
  • previous_score is the quality score of the cropping window preceding the current cropping window
  • t is the number of steps used by the reinforcement learning model in the cropping process.
  • the cropping actions are set according to the cropping task and the current state representation, and include position-change actions, shape-change actions, scale-change actions and a stop action;
  • the position-change actions are used to adjust the position of the cropping window
  • the shape-change actions are used to adjust the shape of the cropping window
  • the scale-change actions are used to adjust the size of the cropping window
  • the stop action is used to make the reinforcement learning model stop cropping and output the current window as the cropping result.
  • the training and optimization method of the ranking model is: a set of high-quality pictures is randomly cropped to obtain low-quality pictures corresponding to the high-quality pictures, and these are used together with the high-quality pictures as a paired picture training set;
  • the preset ranking model is then trained with the paired picture training set.
  • the apparatus for automatically cropping pictures based on reinforcement learning in the present invention comprises:
  • the extraction module is configured to perform feature extraction on the current cropping window by using a reinforcement learning model to obtain local features, splice them with the global features of the picture to be cropped to obtain a new feature vector, and use the new feature vector as current observation information;
  • the combining module is configured to record the historical observation information by using the LSTM unit in the reinforcement learning model, and combine the historical observation information with the current observation information as a current state representation;
  • a cropping module configured to serially perform cropping actions on the picture to be cropped according to the cropping strategy and the current state representation, to obtain a cropping result
  • the reinforcement learning model is a model constructed based on a convolutional neural network.
  • the apparatus further includes a cropping strategy acquisition module, where the cropping strategy acquisition module includes:
  • the splicing unit is configured to perform feature extraction on the training picture by using the reinforcement learning model to obtain local features of the training picture, splice them with the global features of the training picture to obtain a first feature vector, and use the first feature vector as the current observation information of the training picture;
  • the combining unit is configured to record the historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combine the historical observation information of the training picture with the current observation information of the training picture as the current state representation of the training picture
  • a cropping unit configured to crop the training picture with a preset cropping action according to the current state representation of the training picture, to obtain a cropped training picture
  • a setting unit configured to obtain the quality scores of the training picture before and after cropping by using the ranking model, and set a reward function according to the quality scores of the training picture before and after cropping;
  • a first training unit configured to train the reinforcement learning model according to the reward function to obtain the cropping strategy.
  • the apparatus further includes a ranking model training module, the ranking model training module comprising:
  • a random cropping unit configured to randomly crop a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and use these together with the high-quality pictures as a paired picture training set;
  • a second training unit configured to train the ranking model with the paired picture training set.
  • in the storage device of the present invention, a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the method for automatically cropping pictures based on reinforcement learning according to the above technical solution.
  • the processing apparatus of the present invention includes
  • a processor adapted to execute various programs
  • a storage device adapted to store a plurality of programs
  • the programs are adapted to be loaded and executed by a processor to implement the method for automatically cropping pictures based on reinforcement learning according to the above technical solution.
  • the historical observation information and the current observation information are combined as the current state representation, and cropping actions are serially performed on the picture to be cropped according to the current state representation and the cropping strategy. Only a few candidate windows are needed to obtain the final result, which greatly reduces the amount of computation and the time required.
  • the quality scores of the pictures before and after cropping are obtained by the ranking model and used as a benchmark to set the reward function, and the reinforcement learning model is trained according to the reward function; after extensive training, the reinforcement learning model obtains a more precise cropping strategy, which greatly improves the accuracy of picture cropping.
  • the size and position of the cropping window can be adjusted arbitrarily through the set cropping actions, which not only enables the cropping window to cover the corresponding region more accurately, but also makes the cropping process more flexible.
  • FIG. 1 is a schematic diagram of main steps of a method for automatically cropping pictures based on reinforcement learning according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the cropping action space of a reinforcement learning model according to an embodiment of the present invention.
  • a ranking model and a reinforcement learning model are designed, the quality scores of the picture before and after cropping obtained by the ranking model are used as a benchmark to set a reward function, and the reinforcement learning model is trained according to the reward function, thereby obtaining a cropping strategy that improves the aesthetic quality of the picture; finally, the corresponding cropping actions are executed serially based on the cropping strategy and the current state representation, ultimately yielding high-quality cropping results.
  • Figure 1 exemplarily shows the main steps of a method for automatically cropping pictures based on reinforcement learning.
  • the method for automatically cropping pictures based on reinforcement learning in this embodiment may include step S1, step S2, and step S3.
  • step S1 the feature extraction is performed on the current cropping window by using the reinforcement learning model to obtain local features, which are spliced with the global features of the picture to be cropped to obtain a new feature vector, and the new feature vector is used as the current observation information.
  • the reinforcement learning model in the present embodiment is a model constructed based on a convolutional neural network, and the reinforcement learning model is used to extract features of the current cropping window to obtain local features, and the extracted local features are spliced with the global features of the entire image.
  • a new feature vector is obtained, and the new feature vector is used as current observation information, wherein the current observation information is an observation of the state of the image to be cropped at the current time.
  • step S2 the historical observation information is recorded by the LSTM unit in the reinforcement learning model, and the historical observation information is combined with the current observation information as the current state representation.
  • the reinforcement learning model records observation information from the start to the current time through its LSTM unit, and records it as historical observation information.
  • the LSTM unit integrates historical observation information and current observation information into a current state representation.
  • for example, at time t, the observation information recorded by the LSTM unit includes {o_1, o_2, ..., o_{t-1}, o_t}, where o_i represents the observation at time i.
  • the LSTM unit integrates the recorded observation information to obtain the current state representation s_t.
  • the reinforcement learning model performs the corresponding cropping action according to the state representation of each moment, and obtains the cropped image.
  • step S3 according to the cropping strategy and the current state representation, cropping actions are performed serially on the picture to be cropped, and the cropping result is obtained.
  • the method for acquiring the cropping strategy in this embodiment may include step S31, step S32, step S33, step S34, and step S35.
  • Step S31 using the reinforcement learning model to extract the feature of the training picture to obtain the local feature of the training picture, and splicing it with the global feature of the training picture to obtain the first feature vector, and using the first feature vector as the current observation information of the training picture .
  • the current observation information of the training picture is an observation of the state of the training picture at the current time.
  • Step S32 Recording the historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combining the historical observation information of the training picture with the current observation information of the training picture as the current state representation of the training picture.
  • Step S33 according to the current state representation of the training picture, the training picture is cropped by using a preset cropping action, and the cropped training picture is obtained.
  • the preset cropping action is set according to the cropping task.
  • Step S34 using the ranking model to obtain the quality scores of the training picture before and after cropping, and setting the reward function according to the quality scores of the training picture before and after cropping.
  • before the ranking model is used, it needs to be trained and optimized, and the method for optimizing the ranking model includes step A1 and step A2.
  • step A1 the set of high-quality pictures is randomly cropped to obtain low-quality pictures corresponding to the high-quality pictures, and these are used together with the high-quality pictures as a paired picture training set.
  • the pictures in a large-scale high-quality picture data set can be randomly cropped to obtain low-quality pictures corresponding to the high-quality pictures, and these paired pictures are then used as the picture training set.
  • step A2 the preset ranking model is trained by using the paired picture training set.
  • the preset ranking model in this embodiment is a model constructed based on a convolutional neural network, and the ranking model can be used to score the aesthetic quality of the picture.
  • the reward function can be calculated according to the following formula (1), where:
  • reward is the reward function
  • aspect ratio is the aspect ratio of the current cropping window
  • sign is the sign function
  • score is the quality score of the current cropping window
  • previous_score is the quality score of the cropping window preceding the current cropping window
  • t is the number of steps used by the reinforcement learning model in the cropping process.
  • in order to limit the shape of the cropped picture, the model receives a reverse (negative) reward when the aspect ratio of the cropping window goes beyond [0.5, 2]; during training, when the aesthetic quality of the cropping window improves, the reinforcement learning model receives a reward of +1, and otherwise it receives a reward of -1, which ensures that the reinforcement learning model can learn a cropping strategy that improves the aesthetic quality of the picture; -0.001*t is included as part of the reward function so that the model learns to crop the image quickly.
  • Step S35 training the reinforcement learning model according to the reward function, and obtaining a cropping strategy.
  • Figure 2 exemplarily shows the cropping action space of the reinforcement learning model.
  • the cropping action is set according to the cropping task and the current state.
  • the cropping action includes a position-change action, a shape-change action, a scale-change action, and a stop action;
  • the position change action is used to adjust the position of the crop window
  • the shape change action is used to adjust the shape of the crop window
  • the scale change action is used to adjust the size of the crop window
  • the stop action is used to stop the reinforcement learning model from cropping and output the current window as the crop result.
  • the cropping action space includes a total of 14 actions, and each adjustment of the cropping window uses 0.05 of the original image size as the adjustment distance.
  • the traditional automatic cropping algorithm needs to use a sliding-window method to densely select candidate windows on the picture to be cropped, then perform feature extraction and scoring on the corresponding windows, and select the cropping result according to the scores.
  • the traditional method obtains a large number of candidate regions, and it takes a huge amount of computation and time to perform feature extraction and scoring for each candidate region.
  • the reinforcement learning model can adjust the cropping window to any size and position, so that the cropping window can cover the corresponding region more accurately, solving the problem in the traditional method that the best cropping window cannot be found because the window size and shape are fixed.
  • the reinforcement learning model greatly reduces the number of candidate windows in the cropping process, and solves the problem that the traditional method consumes a large amount of computing resources and time during cropping.
  • the cropping strategy learned by the reinforcement learning model can perform precise cropping operations with very few candidate windows, while the time consumed is greatly reduced compared with the traditional method.
  • the embodiment of the present invention also provides an apparatus for automatically cropping pictures based on reinforcement learning.
  • the apparatus for automatically cropping pictures based on reinforcement learning will be specifically described below.
  • the apparatus for automatically cropping pictures based on reinforcement learning in this embodiment may include an extraction module, a combining module, and a cropping module.
  • the extraction module may be configured to perform feature extraction on the current cropping window by using the reinforcement learning model to obtain local features, splice them with the global features of the picture to be cropped to obtain a new feature vector, and use the new feature vector as the current observation information.
  • the combining module can be configured to record historical observation information by using the LSTM unit in the reinforcement learning model, and combine the historical observation information with the current observation information as the current state representation.
  • the cropping module can be configured to serially perform cropping actions on the picture to be cropped according to the cropping strategy and the current state representation, and obtain the cropping result.
  • the reinforcement learning model is a model constructed based on a convolutional neural network.
  • the apparatus for automatically cropping the image based on the reinforcement learning in the embodiment may further include a cropping strategy acquisition module, where the cropping strategy acquisition module includes a splicing unit, a combining unit, a cropping unit, a setting unit, and a first training unit.
  • the splicing unit may be configured to perform feature extraction on the training picture by using the reinforcement learning model to obtain local features of the training picture, splice them with the global features of the training picture to obtain a first feature vector, and use the first feature vector as the current observation information of the training picture.
  • the combining unit can be configured to record the historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combine the historical observation information of the training picture with the current observation information of the training picture as the current state representation of the training picture.
  • the cropping unit may be configured to crop the training picture with a preset cropping action by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture.
  • the setting unit may be configured to obtain the quality scores of the training picture before and after cropping by using the ranking model, and set the reward function according to the quality scores of the training picture before and after cropping.
  • the first training unit may be configured to train the reinforcement learning model according to the reward function to obtain a cropping strategy.
  • the apparatus for automatically cropping pictures based on reinforcement learning in this embodiment may further include a ranking model training module, and the ranking model training module includes a random cropping unit and a second training unit.
  • the random cropping unit may be configured to randomly crop a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and use these together with the high-quality pictures as a paired picture training set.
  • the second training unit can be configured to train the ranking model with the paired picture training set.
  • the above-mentioned apparatus for automatically cropping pictures based on reinforcement learning further includes some other well-known structures, such as processors, controllers and memories, where the memories include but are not limited to random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, registers and the like, and the processors include but are not limited to CPLD/FPGA, DSP, ARM processors, MIPS processors and the like; in order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown.
  • the modules in the apparatus of an embodiment can be adaptively changed and arranged in one or more apparatuses different from that embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, the abstract and the drawings) and all processes or units of any method or device so disclosed may be combined in any combination.
  • each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
  • based on the above embodiment of the method for automatically cropping pictures based on reinforcement learning, the present invention also provides a storage device.
  • in the storage device of this embodiment, a plurality of programs are stored, and the programs are adapted to be loaded and executed by a processor to implement the above-described method for automatically cropping pictures based on reinforcement learning.
  • based on the above embodiment of the method for automatically cropping pictures based on reinforcement learning, the present invention also provides a processing apparatus.
  • the processing device in this embodiment may include a processor and a storage device, wherein the processor is adapted to execute programs, the storage device is adapted to store a plurality of programs, and the programs are adapted to be loaded and executed by the processor to implement the above-described method for automatically cropping pictures based on reinforcement learning.
  • the various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor may be used in practice to implement some or all of the functions of some or all of the components of the server and client according to embodiments of the present invention.
  • the invention may also be implemented as a device or apparatus program (e.g., a PC program and a PC program product) for performing part or all of the methods described herein.
  • a program implementing the present invention may be stored on a PC-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for automatically cropping pictures based on reinforcement learning, belonging to the field of image processing. The method includes the steps of: performing feature extraction on a current cropping window by using a reinforcement learning model to obtain local features, splicing them with global features of the picture to be cropped to obtain a new feature vector, and using the new feature vector as current observation information (S1); recording historical observation information by using an LSTM unit in the reinforcement learning model, and combining the historical observation information with the current observation information as a current state representation (S2); and serially executing cropping actions on the picture to be cropped according to a cropping strategy and the current state representation to obtain a cropping result (S3); wherein the reinforcement learning model is a model constructed based on a convolutional neural network. The method can quickly obtain accurate picture cropping results.

Description

Method and apparatus for automatically cropping pictures based on reinforcement learning
Technical Field
The present invention relates to the field of image processing, and in particular to a method and apparatus for automatically cropping pictures based on reinforcement learning.
Background Art
With the rapid development of the field of image processing, automatic image cropping, as an important part of this field, has also advanced considerably. Automatic image cropping requires a computer to automatically pick out well-composed regions from an input picture; compared with the original picture, these regions have higher aesthetic quality.
The conventional method is a sliding-window based method, which mainly extracts candidate regions through a sliding window, then extracts features from each candidate region and scores them, and takes the region with the highest score as the final result. However, this method generates a large number of candidate windows, and the shape and size of each candidate window are relatively fixed. Cropping windows of fixed shape and size cannot cover all cases, so the cropping results are poor and a large amount of computing resources and time are consumed.
Summary of the Invention
In order to solve the above problems in the prior art, that is, to solve the technical problem of how to quickly obtain accurate picture cropping results, the present invention provides a method and apparatus for automatically cropping pictures based on reinforcement learning.
In a first aspect, the method for automatically cropping pictures based on reinforcement learning of the present invention includes:
performing feature extraction on a current cropping window by using a reinforcement learning model to obtain local features, and splicing them with global features of the picture to be cropped to obtain a new feature vector, the new feature vector being used as current observation information;
recording historical observation information by using an LSTM unit in the reinforcement learning model, and combining the historical observation information with the current observation information as a current state representation;
serially executing cropping actions on the picture to be cropped according to a cropping strategy and the current state representation to obtain a cropping result;
wherein the reinforcement learning model is a model constructed based on a convolutional neural network.
Preferably, the cropping strategy is obtained by:
performing feature extraction on a training picture by using the reinforcement learning model to obtain local features of the training picture, and splicing them with global features of the training picture to obtain a first feature vector, the first feature vector being used as current observation information of the training picture;
recording historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combining the historical observation information of the training picture with the current observation information of the training picture as a current state representation of the training picture;
cropping the training picture with preset cropping actions by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture;
obtaining quality scores of the training picture before and after cropping by using a ranking model, and setting a reward function according to the quality scores of the training picture before and after cropping;
training the reinforcement learning model according to the reward function to obtain the cropping strategy.
Preferably, the reward function is calculated according to the following formula:
reward = sign(score - previous_score) - 0.001*t, if the aspect ratio of the current cropping window lies in [0.5, 2]
reward = -1 - 0.001*t, otherwise
where reward is the reward function, aspect ratio is the aspect ratio of the current cropping window, sign is the sign function, score is the quality score of the current cropping window, previous_score is the quality score of the cropping window preceding the current cropping window, and t is the number of steps used by the reinforcement learning model in the cropping process.
Preferably, the cropping actions are set according to the cropping task and the current state representation, and include position-change actions, shape-change actions, scale-change actions and a stop action;
wherein,
the position-change actions are used to adjust the position of the cropping window;
the shape-change actions are used to adjust the shape of the cropping window;
the scale-change actions are used to adjust the size of the cropping window;
the stop action is used to make the reinforcement learning model stop cropping and output the current window as the cropping result.
Preferably, the ranking model is trained and optimized by:
randomly cropping a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and using these together with the high-quality pictures as a paired picture training set;
training the preset ranking model with the paired picture training set.
In a second aspect, the apparatus for automatically cropping pictures based on reinforcement learning of the present invention includes:
an extraction module, configured to perform feature extraction on a current cropping window by using a reinforcement learning model to obtain local features, splice them with global features of the picture to be cropped to obtain a new feature vector, and use the new feature vector as current observation information;
a combining module, configured to record historical observation information by using an LSTM unit in the reinforcement learning model, and combine the historical observation information with the current observation information as a current state representation;
a cropping module, configured to serially execute cropping actions on the picture to be cropped according to a cropping strategy and the current state representation to obtain a cropping result;
wherein the reinforcement learning model is a model constructed based on a convolutional neural network.
Preferably, the apparatus further includes a cropping strategy acquisition module, and the cropping strategy acquisition module includes:
a splicing unit, configured to perform feature extraction on a training picture by using the reinforcement learning model to obtain local features of the training picture, splice them with global features of the training picture to obtain a first feature vector, and use the first feature vector as current observation information of the training picture;
a combining unit, configured to record historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combine the historical observation information of the training picture with the current observation information of the training picture as a current state representation of the training picture;
a cropping unit, configured to crop the training picture with preset cropping actions by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture;
a setting unit, configured to obtain quality scores of the training picture before and after cropping by using a ranking model, and set a reward function according to the quality scores of the training picture before and after cropping;
a first training unit, configured to train the reinforcement learning model according to the reward function to obtain the cropping strategy.
Preferably, the apparatus further includes a ranking model training module, and the ranking model training module includes:
a random cropping unit, configured to randomly crop a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and use these together with the high-quality pictures as a paired picture training set;
a second training unit, configured to train the ranking model with the paired picture training set.
In a third aspect, the storage device of the present invention stores a plurality of programs adapted to be loaded and executed by a processor to implement the method for automatically cropping pictures based on reinforcement learning according to the above technical solution.
In a fourth aspect, the processing device of the present invention includes
a processor adapted to execute programs; and
a storage device adapted to store a plurality of programs;
wherein the programs are adapted to be loaded and executed by the processor to implement the method for automatically cropping pictures based on reinforcement learning according to the above technical solution.
Compared with the closest prior art, the above technical solutions have at least the following beneficial effects:
1. In the method for automatically cropping pictures based on reinforcement learning of the present invention, the historical observation information and the current observation information are combined as the current state representation, and cropping actions are serially executed on the picture to be cropped according to the current state representation and the cropping strategy; only a few candidate windows are needed to obtain the final result, which greatly reduces the amount of computation and the time required.
2. In the method for automatically cropping pictures based on reinforcement learning of the present invention, the quality scores of the pictures before and after cropping are obtained by the ranking model and used as a benchmark to set the reward function, and the reinforcement learning model is trained according to this reward function; after extensive training, the reinforcement learning model obtains a more precise cropping strategy, which greatly improves the accuracy of picture cropping.
3. In the method for automatically cropping pictures based on reinforcement learning of the present invention, the size and position of the cropping window can be adjusted arbitrarily through the set cropping actions, which not only enables the resulting cropping window to cover the corresponding region more accurately, but also makes the cropping process more flexible.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the main steps of a method for automatically cropping pictures based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the cropping action space of a reinforcement learning model according to an embodiment of the present invention.
Detailed Description of the Embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.
In the present invention, a ranking model and a reinforcement learning model are designed. The quality scores of pictures before and after cropping obtained by the ranking model are used as a benchmark to set a reward function, and the reinforcement learning model is trained according to the reward function, thereby obtaining a cropping strategy that improves the aesthetic quality of pictures. Finally, the corresponding cropping actions are executed serially according to the cropping strategy and the current state representation, ultimately producing high-quality cropping results.
The method for automatically cropping pictures based on reinforcement learning in the embodiments of the present invention is described below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 exemplarily shows the main steps of the method for automatically cropping pictures based on reinforcement learning. As shown in FIG. 1, the method for automatically cropping pictures based on reinforcement learning in this embodiment may include step S1, step S2 and step S3.
Step S1: feature extraction is performed on the current cropping window by using the reinforcement learning model to obtain local features, which are spliced with the global features of the picture to be cropped to obtain a new feature vector, and the new feature vector is used as the current observation information.
Specifically, the reinforcement learning model in this embodiment is a model constructed based on a convolutional neural network. The reinforcement learning model is used to perform feature extraction on the current cropping window to obtain local features, and the extracted local features are spliced with the global features of the whole picture to obtain a new feature vector, which is used as the current observation information; the current observation information is an observation of the state of the picture to be cropped at the current time.
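As an illustration of step S1, the sketch below shows one way the local and global features could be extracted and spliced into an observation vector. It is a minimal sketch rather than the patented implementation: the small convolutional backbone, the feature dimension and the bilinear resizing are assumptions introduced for this example only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureExtractor(nn.Module):
        """Shared CNN backbone that maps a fixed-size image patch to a feature vector."""
        def __init__(self, feat_dim=1000):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),
            )
            self.fc = nn.Linear(64 * 4 * 4, feat_dim)

        def forward(self, x):
            return self.fc(self.conv(x).flatten(1))

    def observe(image, window, backbone, size=224):
        """Step S1: local features of the crop window spliced (concatenated)
        with global features of the whole picture form the observation."""
        x0, y0, x1, y1 = (int(round(v)) for v in window)
        local_patch = image[:, :, y0:y1, x0:x1]                 # region inside the cropping window
        local_patch = F.interpolate(local_patch, (size, size),
                                    mode='bilinear', align_corners=False)
        global_patch = F.interpolate(image, (size, size),
                                     mode='bilinear', align_corners=False)
        local_feat = backbone(local_patch)                      # local features
        global_feat = backbone(global_patch)                    # global features
        return torch.cat([local_feat, global_feat], dim=1)      # new feature vector = observation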
Step S2: historical observation information is recorded by the LSTM unit in the reinforcement learning model, and the historical observation information is combined with the current observation information as the current state representation.
Specifically, in this embodiment the reinforcement learning model records the observation information from the start up to the current time through its LSTM unit, which is recorded as the historical observation information. The LSTM unit integrates the historical observation information and the current observation information into the current state representation. For example, at time t, the observation information recorded by the LSTM unit includes {o_1, o_2, ..., o_{t-1}, o_t}, where o_i denotes the observation at time i. The LSTM unit integrates the recorded observation information to obtain the current state representation s_t. The reinforcement learning model executes the corresponding cropping action according to the state representation at each time step and obtains the cropped image.
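A minimal sketch of how step S2 could be realized with an LSTM cell: the observation o_t from step S1 is fed into the cell, whose hidden state carries the history {o_1, ..., o_{t-1}} and therefore serves as the current state representation s_t. The dimensions and the use of torch.nn.LSTMCell are assumptions of this example, not details fixed by the patent.

    import torch
    import torch.nn as nn

    class StateEncoder(nn.Module):
        """LSTM unit that folds the observation history into the current state s_t."""
        def __init__(self, obs_dim=2000, state_dim=1024):
            super().__init__()
            self.cell = nn.LSTMCell(obs_dim, state_dim)

        def reset(self, batch_size=1, device='cpu'):
            # fresh hidden and cell states at the start of an episode
            h = torch.zeros(batch_size, self.cell.hidden_size, device=device)
            c = torch.zeros(batch_size, self.cell.hidden_size, device=device)
            return h, c

        def forward(self, o_t, hc):
            # h_t summarises {o_1, ..., o_t}; it is used as the state representation s_t
            h_t, c_t = self.cell(o_t, hc)
            return h_t, (h_t, c_t)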
Step S3: according to the cropping strategy and the current state representation, cropping actions are executed serially on the picture to be cropped to obtain the cropping result.
Further, the method for acquiring the cropping strategy in this embodiment may include step S31, step S32, step S33, step S34 and step S35.
Step S31: feature extraction is performed on a training picture by using the reinforcement learning model to obtain local features of the training picture, which are spliced with the global features of the training picture to obtain a first feature vector, and the first feature vector is used as the current observation information of the training picture.
Specifically, the current observation information of the training picture is an observation of the state of the training picture at the current time.
Step S32: the historical observation information of the training picture is recorded by the LSTM unit in the reinforcement learning model, and the historical observation information of the training picture is combined with the current observation information of the training picture as the current state representation of the training picture.
Step S33: according to the current state representation of the training picture, the training picture is cropped with preset cropping actions by using the reinforcement learning model, and the cropped training picture is obtained.
Specifically, in this embodiment the preset cropping actions are set according to the cropping task.
Step S34: the quality scores of the training picture before and after cropping are obtained by using the ranking model, and the reward function is set according to the quality scores of the training picture before and after cropping.
Further, in this embodiment, before the ranking model is used to obtain the quality scores of the training picture before and after cropping, the ranking model needs to be trained and optimized. The method for training and optimizing the ranking model includes step A1 and step A2.
Step A1: a set of high-quality pictures is randomly cropped to obtain low-quality pictures corresponding to the high-quality pictures, and these are used together with the high-quality pictures as a paired picture training set.
Specifically, in this embodiment, pictures in a large-scale high-quality picture data set can be randomly cropped to obtain low-quality pictures corresponding to the high-quality pictures, and these paired pictures are then used as the picture training set.
Step A2: the preset ranking model is trained with the paired picture training set.
Specifically, the preset ranking model in this embodiment is a model constructed based on a convolutional neural network, and the ranking model can be used to score the aesthetic quality of pictures.
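The following sketch illustrates how a ranking model of this kind could be trained on the paired picture training set from steps A1 and A2: each high-quality picture and its randomly cropped low-quality counterpart form a pair, and a pairwise margin loss pushes the score of the former above the score of the latter. The reuse of the FeatureExtractor backbone, the margin value and the random-crop routine are assumptions of this example, not specifics of the patent.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RankingModel(nn.Module):
        """CNN that maps a picture to a scalar aesthetic-quality score."""
        def __init__(self, backbone, feat_dim=1000):
            super().__init__()
            self.backbone = backbone
            self.head = nn.Linear(feat_dim, 1)

        def forward(self, x):
            return self.head(self.backbone(x)).squeeze(1)

    def random_bad_crop(image, min_frac=0.3, max_frac=0.7, size=224):
        """Step A1: degrade a high-quality picture by taking a random sub-window."""
        _, _, h, w = image.shape
        ch = int(h * random.uniform(min_frac, max_frac))
        cw = int(w * random.uniform(min_frac, max_frac))
        y0 = random.randint(0, h - ch)
        x0 = random.randint(0, w - cw)
        crop = image[:, :, y0:y0 + ch, x0:x0 + cw]
        return F.interpolate(crop, (size, size), mode='bilinear', align_corners=False)

    def ranking_loss(model, high_q, low_q, margin=1.0):
        """Step A2: the high-quality picture should out-score its cropped version."""
        s_high, s_low = model(high_q), model(low_q)
        return F.relu(margin - (s_high - s_low)).mean()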
Further, in this embodiment the reward function can be calculated according to the following formula (1):
reward = sign(score - previous_score) - 0.001*t, if the aspect ratio of the current cropping window lies in [0.5, 2]
reward = -1 - 0.001*t, otherwise    (1)
where reward is the reward function, aspect ratio is the aspect ratio of the current cropping window, sign is the sign function, score is the quality score of the current cropping window, previous_score is the quality score of the cropping window preceding the current cropping window, and t is the number of steps used by the reinforcement learning model in the cropping process.
Specifically, in this embodiment, in order to limit the shape of the cropped picture, the model receives a reverse (negative) reward when the aspect ratio of the cropping window goes beyond [0.5, 2]. During training, when the aesthetic quality of the cropping window improves, the reinforcement learning model receives a reward of +1; otherwise, it receives a reward of -1. This setting ensures that the reinforcement learning model can learn a cropping strategy that improves the aesthetic quality of pictures. The term -0.001*t is included as part of the reward function so that the model learns to crop the image quickly.
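A direct Python transcription of the reward just described may help: the in-range case follows the text (sign of the score change minus 0.001*t), while the exact magnitude of the reverse reward applied when the aspect ratio leaves [0.5, 2] is not spelled out here, so the -1 used below is an assumption of this sketch.

    def compute_reward(score, previous_score, aspect_ratio, t):
        """Reward for one cropping step of the reinforcement learning model."""
        time_penalty = -0.001 * t                  # encourages cropping in few steps
        if aspect_ratio < 0.5 or aspect_ratio > 2.0:
            return -1.0 + time_penalty             # reverse reward: window shape out of range
        if score > previous_score:
            delta = 1.0                            # aesthetic quality improved
        elif score < previous_score:
            delta = -1.0                           # aesthetic quality got worse
        else:
            delta = 0.0
        return delta + time_penalty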
Step S35: the reinforcement learning model is trained according to the reward function, and the cropping strategy is obtained.
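The text only states that the reinforcement learning model is trained with the reward function and does not fix a particular algorithm; the sketch below therefore uses a plain REINFORCE-style policy-gradient update as one possible choice. The agent and env interfaces (policy, initial_state, reset, step) and the discount factor are assumptions of this example.

    import torch

    def train_episode(agent, optimizer, env, gamma=0.99):
        """One training episode: roll out a cropping trajectory on a training
        picture, then apply a policy-gradient update weighted by the
        discounted returns built from the reward function above."""
        log_probs, rewards = [], []
        obs, hc = env.reset(), agent.initial_state()
        done = False
        while not done:
            dist, hc = agent.policy(obs, hc)        # action distribution from the state s_t
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env.step(action)    # apply the cropping action, collect the reward
            rewards.append(reward)

        returns, g = [], 0.0                        # discounted returns, accumulated backwards
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)

        loss = -(torch.cat(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()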
Referring to FIG. 2, FIG. 2 exemplarily shows the cropping action space of the reinforcement learning model. In this embodiment the cropping actions are set according to the cropping task and the current state representation. As shown in FIG. 2, the cropping actions include position-change actions, shape-change actions, scale-change actions and a stop action;
wherein the position-change actions are used to adjust the position of the cropping window;
the shape-change actions are used to adjust the shape of the cropping window;
the scale-change actions are used to adjust the size of the cropping window;
the stop action is used to make the reinforcement learning model stop cropping and output the current window as the cropping result.
Specifically, in this embodiment, the cropping action space contains 14 actions in total, and each adjustment of the cropping window uses 0.05 of the original image size as the adjustment distance.
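Applying such an action to the current window can be kept very simple, as the sketch below shows, using 0.05 of the original image size as the adjustment step. The concrete action names listed here are only an illustrative subset chosen for the example; the full 14-action space of the embodiment is the one shown in FIG. 2.

    ACTIONS = ['stop', 'move_left', 'move_right', 'move_up', 'move_down',
               'wider', 'taller', 'enlarge', 'shrink']   # illustrative subset only

    def apply_action(window, action, img_w, img_h):
        """Apply one cropping action to the window (x0, y0, x1, y1).
        Returns the new window and whether the agent chose to stop."""
        dx, dy = 0.05 * img_w, 0.05 * img_h           # 0.05 of the original image size
        x0, y0, x1, y1 = window
        if action == 'stop':                          # output the current window as the result
            return window, True
        elif action == 'move_left':                   # position-change actions
            x0, x1 = x0 - dx, x1 - dx
        elif action == 'move_right':
            x0, x1 = x0 + dx, x1 + dx
        elif action == 'move_up':
            y0, y1 = y0 - dy, y1 - dy
        elif action == 'move_down':
            y0, y1 = y0 + dy, y1 + dy
        elif action == 'wider':                       # shape-change actions
            x0, x1 = x0 - dx, x1 + dx
        elif action == 'taller':
            y0, y1 = y0 - dy, y1 + dy
        elif action == 'enlarge':                     # scale-change actions
            x0, y0, x1, y1 = x0 - dx, y0 - dy, x1 + dx, y1 + dy
        elif action == 'shrink':
            x0, y0, x1, y1 = x0 + dx, y0 + dy, x1 - dx, y1 - dy
        # clamp the window to the image and reject adjustments that collapse it
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(img_w, x1), min(img_h, y1)
        if x1 - x0 < dx or y1 - y0 < dy:
            return window, False
        return (x0, y0, x1, y1), False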
A traditional automatic cropping algorithm needs to use a sliding-window method to densely select candidate windows on the picture to be cropped, then perform feature extraction and scoring on the corresponding windows, and select the cropping result according to the scores. However, the traditional method obtains a large number of candidate regions, and performing feature extraction and scoring on every candidate region requires an enormous amount of computation and time. In the present technical solution, by designing a rich action space for the reinforcement learning model, the reinforcement learning model can adjust the cropping window to any size and any position, so that the resulting cropping window can cover the corresponding region more accurately, which solves the problem in the traditional method that the best cropping window cannot be found because the window size and shape are fixed. In addition, the reinforcement learning model greatly reduces the number of candidate windows in the cropping process, which solves the problem that the traditional method consumes a large amount of computing resources and time during cropping.
By designing an accurate state representation, a rich action space and a guiding reward function for the reinforcement learning model, the cropping strategy learned by the reinforcement learning model can complete accurate cropping operations with very few candidate windows, and the time consumed is greatly reduced compared with the traditional method.
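Putting steps S1 to S3 together, inference reduces to a short loop: observe, update the LSTM state, pick an action from the learned cropping strategy, adjust the window, and stop when the stop action is chosen. This sketch reuses the helpers from the earlier examples; the policy_head (assumed to be a layer mapping the state to scores over ACTIONS), the greedy argmax selection and the cap on the number of steps are assumptions rather than requirements of the method.

    def crop_image(image, backbone, state_encoder, policy_head, max_steps=50):
        """Serially execute cropping actions on the picture to be cropped (steps S1-S3)."""
        _, _, img_h, img_w = image.shape
        window = (0, 0, img_w, img_h)                       # start from the full image
        hc = state_encoder.reset(device=image.device)
        for t in range(max_steps):
            o_t = observe(image, window, backbone)          # step S1: current observation
            s_t, hc = state_encoder(o_t, hc)                # step S2: state representation
            action = policy_head(s_t).argmax(dim=1).item()  # step S3: learned cropping strategy
            window, stop = apply_action(window, ACTIONS[action], img_w, img_h)
            if stop:
                break
        return window                                       # final cropping result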
Based on the same technical concept as the embodiments of the method for automatically cropping pictures based on reinforcement learning, an embodiment of the present invention further provides an apparatus for automatically cropping pictures based on reinforcement learning. The apparatus for automatically cropping pictures based on reinforcement learning is described in detail below.
The apparatus for automatically cropping pictures based on reinforcement learning in this embodiment may include an extraction module, a combining module and a cropping module.
The extraction module may be configured to perform feature extraction on the current cropping window by using the reinforcement learning model to obtain local features, splice them with the global features of the picture to be cropped to obtain a new feature vector, and use the new feature vector as the current observation information.
The combining module may be configured to record historical observation information by using the LSTM unit in the reinforcement learning model, and combine the historical observation information with the current observation information as the current state representation.
The cropping module may be configured to serially execute cropping actions on the picture to be cropped according to the cropping strategy and the current state representation to obtain the cropping result.
The reinforcement learning model is a model constructed based on a convolutional neural network.
Further, the apparatus for automatically cropping pictures based on reinforcement learning in this embodiment may further include a cropping strategy acquisition module, and the cropping strategy acquisition module includes a splicing unit, a combining unit, a cropping unit, a setting unit and a first training unit.
The splicing unit may be configured to perform feature extraction on a training picture by using the reinforcement learning model to obtain local features of the training picture, splice them with the global features of the training picture to obtain a first feature vector, and use the first feature vector as the current observation information of the training picture.
The combining unit may be configured to record the historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combine the historical observation information of the training picture with the current observation information of the training picture as the current state representation of the training picture.
The cropping unit may be configured to crop the training picture with preset cropping actions by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture.
The setting unit may be configured to obtain the quality scores of the training picture before and after cropping by using the ranking model, and set the reward function according to the quality scores of the training picture before and after cropping.
The first training unit may be configured to train the reinforcement learning model according to the reward function to obtain the cropping strategy.
Further, the apparatus for automatically cropping pictures based on reinforcement learning in this embodiment may further include a ranking model training module, and the ranking model training module includes a random cropping unit and a second training unit.
The random cropping unit may be configured to randomly crop a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and use these together with the high-quality pictures as a paired picture training set.
The second training unit may be configured to train the ranking model with the paired picture training set.
The technical principles, technical problems solved and technical effects of the above apparatus embodiments are similar to those of the embodiments of the method for automatically cropping pictures based on reinforcement learning. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process and related description of the apparatus for automatically cropping pictures based on reinforcement learning described above may refer to the foregoing method for automatically cropping pictures based on reinforcement learning, and will not be repeated here.
Those skilled in the art can understand that the above apparatus for automatically cropping pictures based on reinforcement learning also includes some other well-known structures, such as processors, controllers and memories, where the memories include but are not limited to random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, registers and the like, and the processors include but are not limited to CPLD/FPGA, DSP, ARM processors, MIPS processors and the like. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown.
Those skilled in the art can understand that the modules in the apparatus of an embodiment can be adaptively changed and arranged in one or more apparatuses different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and they may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Based on the above embodiments of the method for automatically cropping pictures based on reinforcement learning, the present invention further provides a storage device. The storage device in this embodiment stores a plurality of programs, and the programs are adapted to be loaded and executed by a processor to implement the above method for automatically cropping pictures based on reinforcement learning.
Based on the above embodiments of the method for automatically cropping pictures based on reinforcement learning, the present invention further provides a processing device. The processing device in this embodiment may include a processor and a storage device, wherein the processor is adapted to execute programs, the storage device is adapted to store a plurality of programs, and the programs are adapted to be loaded and executed by the processor to implement the above method for automatically cropping pictures based on reinforcement learning.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process and related description of the storage device and processing device described above may refer to the corresponding process in the foregoing embodiments of the method for automatically cropping pictures based on reinforcement learning, and will not be repeated here.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the server and client according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a PC program and a PC program product) for executing part or all of the methods described herein. Such a program implementing the present invention may be stored on a PC-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
In addition, those skilled in the art can understand that although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims of the present invention, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed PC. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and the like does not indicate any order; these words may be interpreted as names.
Heretofore, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily appreciate that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.

Claims (10)

  1. A method for automatically cropping pictures based on reinforcement learning, characterized in that the method comprises:
    performing feature extraction on a current cropping window by using a reinforcement learning model to obtain local features, and splicing them with global features of a picture to be cropped to obtain a new feature vector, the new feature vector being used as current observation information;
    recording historical observation information by using an LSTM unit in the reinforcement learning model, and combining the historical observation information with the current observation information as a current state representation;
    serially executing cropping actions on the picture to be cropped according to a cropping strategy and the current state representation to obtain a cropping result;
    wherein the reinforcement learning model is a model constructed based on a convolutional neural network.
  2. The method for automatically cropping pictures based on reinforcement learning according to claim 1, characterized in that the cropping strategy is obtained by:
    performing feature extraction on a training picture by using the reinforcement learning model to obtain local features of the training picture, and splicing them with global features of the training picture to obtain a first feature vector, the first feature vector being used as current observation information of the training picture;
    recording historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combining the historical observation information of the training picture with the current observation information of the training picture as a current state representation of the training picture;
    cropping the training picture with preset cropping actions by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture;
    obtaining quality scores of the training picture before and after cropping by using a ranking model, and setting a reward function according to the quality scores of the training picture before and after cropping;
    training the reinforcement learning model according to the reward function to obtain the cropping strategy.
  3. The method for automatically cropping pictures based on reinforcement learning according to claim 2, characterized in that the reward function is calculated according to the following formula:
    reward = sign(score - previous_score) - 0.001*t, if the aspect ratio of the current cropping window lies in [0.5, 2]
    reward = -1 - 0.001*t, otherwise
    where reward is the reward function, aspect ratio is the aspect ratio of the current cropping window, sign is the sign function, score is the quality score of the current cropping window, previous_score is the quality score of the cropping window preceding the current cropping window, and t is the number of steps used by the reinforcement learning model in the cropping process.
  4. The method for automatically cropping pictures based on reinforcement learning according to claim 2, characterized in that the cropping actions are set according to the cropping task and the current state representation, and comprise position-change actions, shape-change actions, scale-change actions and a stop action;
    wherein,
    the position-change actions are used to adjust the position of the cropping window;
    the shape-change actions are used to adjust the shape of the cropping window;
    the scale-change actions are used to adjust the size of the cropping window;
    the stop action is used to make the reinforcement learning model stop cropping and output the current window as the cropping result.
  5. The method for automatically cropping pictures based on reinforcement learning according to any one of claims 2-4, characterized in that the ranking model is trained and optimized by:
    randomly cropping a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and using these together with the high-quality pictures as a paired picture training set;
    training the preset ranking model with the paired picture training set.
  6. An apparatus for automatically cropping pictures based on reinforcement learning, characterized in that the apparatus comprises:
    an extraction module, configured to perform feature extraction on a current cropping window by using a reinforcement learning model to obtain local features, splice them with global features of a picture to be cropped to obtain a new feature vector, and use the new feature vector as current observation information;
    a combining module, configured to record historical observation information by using an LSTM unit in the reinforcement learning model, and combine the historical observation information with the current observation information as a current state representation;
    a cropping module, configured to serially execute cropping actions on the picture to be cropped according to a cropping strategy and the current state representation to obtain a cropping result;
    wherein the reinforcement learning model is a model constructed based on a convolutional neural network.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises a cropping strategy acquisition module, and the cropping strategy acquisition module comprises:
    a splicing unit, configured to perform feature extraction on a training picture by using the reinforcement learning model to obtain local features of the training picture, splice them with global features of the training picture to obtain a first feature vector, and use the first feature vector as current observation information of the training picture;
    a combining unit, configured to record historical observation information of the training picture by using the LSTM unit in the reinforcement learning model, and combine the historical observation information of the training picture with the current observation information of the training picture as a current state representation of the training picture;
    a cropping unit, configured to crop the training picture with preset cropping actions by using the reinforcement learning model according to the current state representation of the training picture, to obtain a cropped training picture;
    a setting unit, configured to obtain quality scores of the training picture before and after cropping by using a ranking model, and set a reward function according to the quality scores of the training picture before and after cropping;
    a first training unit, configured to train the reinforcement learning model according to the reward function to obtain the cropping strategy.
  8. The apparatus according to any one of claims 6-7, characterized in that the apparatus further comprises a ranking model training module, and the ranking model training module comprises:
    a random cropping unit, configured to randomly crop a set of high-quality pictures to obtain low-quality pictures corresponding to the high-quality pictures, and use these together with the high-quality pictures as a paired picture training set;
    a second training unit, configured to train the ranking model with the paired picture training set.
  9. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to implement the method for automatically cropping pictures based on reinforcement learning according to any one of claims 1-5.
  10. A processing device, comprising:
    a processor adapted to execute programs; and
    a storage device adapted to store a plurality of programs;
    characterized in that the programs are adapted to be loaded and executed by the processor to implement the method for automatically cropping pictures based on reinforcement learning according to any one of claims 1-5.
PCT/CN2017/114795 2017-12-06 2017-12-06 Method and apparatus for automatically cropping pictures based on reinforcement learning WO2019109268A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/114795 WO2019109268A1 (zh) 2017-12-06 2017-12-06 Method and apparatus for automatically cropping pictures based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/114795 WO2019109268A1 (zh) 2017-12-06 2017-12-06 Method and apparatus for automatically cropping pictures based on reinforcement learning

Publications (1)

Publication Number Publication Date
WO2019109268A1 true WO2019109268A1 (zh) 2019-06-13

Family

ID=66751282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/114795 WO2019109268A1 (zh) 2017-12-06 2017-12-06 Method and apparatus for automatically cropping pictures based on reinforcement learning

Country Status (1)

Country Link
WO (1) WO2019109268A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11540096B2 (en) * 2018-06-27 2022-12-27 Niantic, Inc. Multi-sync ensemble model for device localization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090297060A1 (en) * 2008-06-02 2009-12-03 Canon Kabushiki Kaisha Image processing apparatus and image processing method
CN103996186A (zh) * 2014-04-29 2014-08-20 小米科技有限责任公司 图像裁剪方法及装置
CN106446930A (zh) * 2016-06-28 2017-02-22 沈阳工业大学 基于深层卷积神经网络的机器人工作场景识别方法
CN107016366A (zh) * 2017-03-29 2017-08-04 浙江师范大学 一种基于自适应滑动窗口和卷积神经网络的路牌检测方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAO, YUEYING ET AL.: "Automatic Image Cropping with Aesthetic Map and Gradient Energy Map", 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 5 March 2017 (2017-03-05), pages 1983 - 1985, XP033258763 *

Similar Documents

Publication Publication Date Title
CN108154464B (zh) 基于强化学习的图片自动裁剪的方法及装置
WO2020252917A1 (zh) 一种模糊人脸图像识别方法、装置、终端设备及介质
CN111161311A (zh) 一种基于深度学习的视觉多目标跟踪方法及装置
US10600171B2 (en) Image-blending via alignment or photometric adjustments computed by a neural network
CN108664920B (zh) 一种实时的大规模级联人脸聚类方法和装置
US20230049135A1 (en) Deep learning-based video editing method, related device, and storage medium
EP3473016B1 (en) Method and system for automatically producing video highlights
JP2015087903A (ja) 情報処理装置及び情報処理方法
CN108960269B (zh) 数据集的特征获取方法、装置及计算设备
Lasseck Large-scale Identification of Birds in Audio Recordings.
TWI764287B (zh) 用於組織分割之機器學習模型的交互式訓練
CN112529913A (zh) 图像分割模型训练方法、图像处理方法及装置
US11314970B1 (en) Reinforcement learning techniques for automated video summarization
WO2022132912A1 (en) Remote farm damage assessment system and method
CN114449313B (zh) 视频的音画面播放速率调整方法及装置
CN112597909A (zh) 一种用于人脸图片质量评价的方法与设备
CN112307900A (zh) 面部图像质量的评估方法、装置和电子设备
WO2019109268A1 (zh) 基于强化学习的图片自动裁剪的方法及装置
CN111325212A (zh) 模型训练方法、装置、电子设备和计算机可读存储介质
WO2017201907A1 (zh) 检索词分类方法及装置
CN111767424B (zh) 图像处理方法、装置、电子设备及计算机存储介质
CN116704591A (zh) 眼轴预测模型的训练方法、眼轴预测方法和装置
CN108615235B (zh) 一种对颞耳图像进行处理的方法及装置
CN115937372A (zh) 一种面部表情的模拟方法、装置、设备及存储介质
CN114841955A (zh) 一种生物物种的识别方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17934076

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17934076

Country of ref document: EP

Kind code of ref document: A1