CN111191059B - Image processing method, device, computer storage medium and electronic equipment

Publication number
CN111191059B
Authority
CN
China
Prior art keywords: image, processed, information, historical, interest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911422014.1A
Other languages
Chinese (zh)
Other versions
CN111191059A (en)
Inventor
梁涛
张晗
马连洋
衡阵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911422014.1A
Publication of CN111191059A
Application granted
Publication of CN111191059B
Status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53: Querying
    • G06F 16/535: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The disclosure provides an image processing method and device, relating to the field of artificial intelligence. The method comprises the following steps: acquiring an image set to be processed and a historical image set, wherein the historical image set comprises historical interest images of a user; extracting features of the historical interest images through an image processing model, and acquiring a historical interest information description based on the extracted features; meanwhile, extracting features of the images to be processed in the image set to be processed through the image processing model, so as to obtain feature information corresponding to each image to be processed; and determining interest weights corresponding to the images to be processed according to the feature information and the historical interest information description, and determining a cover image from the image set to be processed based on the feature information and the interest weights corresponding to the images to be processed. The method mines the inherent relevance between the historical interest images and the images to be processed from the historical interest information description and the feature information of the images to be processed, and then determines the cover image from the image set to be processed, thereby improving the efficiency and accuracy of image recommendation.

Description

Image processing method, device, computer storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to an image processing method, an image processing apparatus, a computer storage medium, and an electronic device.
Background
With the rapid development of computer technology, platforms have begun to build personalized recommendation systems in order to adapt to different user demands and improve click-through rates. A personalized recommendation system, a product of the development of the Internet and electronic commerce, is an advanced business intelligence platform built on large-scale data mining that provides personalized information services and decision support for its users.
At present, when using user historical data, existing recommendation schemes merely compare the superficial similarity between the features of the historical data and the data to be recommended, and cannot mine the inherent association between the two.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method, an image processing apparatus, a computer storage medium and an electronic device, so that the inherent relevance between a user's historical interest images and an image set to be processed can be mined at least to a certain extent, and personalized recommendation can be performed for the user in a more targeted and accurate manner.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to an aspect of the embodiments of the present disclosure, there is provided an image processing method including: acquiring an image set to be processed and a historical image set, wherein the historical image set comprises historical interest images of a user; extracting features of the historical interest images through an image processing model, and acquiring a historical interest information description based on the extracted features; meanwhile, extracting features of the images to be processed in the image set to be processed through the image processing model, so as to obtain feature information corresponding to the images to be processed; and determining interest weights corresponding to the images to be processed according to the feature information and the historical interest information description, and determining a cover image from the image set to be processed based on the feature information and the interest weights corresponding to the images to be processed.
According to an aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including: an acquisition module, configured to acquire an image set to be processed and a historical image set, wherein the historical image set comprises historical interest images of a user; a feature extraction module, configured to extract features of the historical interest images through an image processing model and acquire a historical interest information description based on the extracted features, and meanwhile extract features of the images to be processed in the image set to be processed through the image processing model, so as to obtain feature information corresponding to the images to be processed; and an image determining module, configured to determine interest weights corresponding to the images to be processed according to the feature information and the historical interest information description, and determine a cover image from the image set to be processed based on the feature information and the interest weights corresponding to the images to be processed.
In some embodiments of the present disclosure, the image processing model includes a first bidirectional gated recurrent unit; based on the foregoing, the image determining module is configured to: input the feature information and the historical interest information description into the first bidirectional gated recurrent unit, and extract features from both through the first bidirectional gated recurrent unit; and determine the degree of overlap between the feature information and the historical interest information description according to their respective extracted features, and take the degree of overlap as the interest weight.
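The "degree of overlap" between a candidate image's feature vector and the historical interest description can be illustrated with a minimal sketch. Note that the patent derives this weight with a bidirectional GRU; the cosine similarity used here is only a stand-in to show the overlap idea, and the function name is hypothetical.

```python
import math

def interest_weight(feature_vec, history_desc):
    """Illustrative interest weight: cosine similarity between an
    image's feature vector and the historical interest description.
    (The patent computes the overlap with a bidirectional GRU; this
    stand-in only demonstrates the concept.)"""
    dot = sum(a * b for a, b in zip(feature_vec, history_desc))
    na = math.sqrt(sum(a * a for a in feature_vec))
    nb = math.sqrt(sum(b * b for b in history_desc))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)
```

An image whose features align with the user's historical interests receives a weight near 1, while an orthogonal one receives a weight near 0.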
In some embodiments of the present disclosure, the image processing model includes a first fully connected layer and a first normalization layer; based on the foregoing, the image determining module includes: a weighting unit, configured to weight the feature information according to the interest weight so as to acquire interest point feature information; a fully connected unit, configured to input the interest point feature information to the first fully connected layer, which applies a nonlinear transformation to it so as to acquire fully connected feature information corresponding to the image to be processed; a normalization unit, configured to input the fully connected feature information to the first normalization layer, which normalizes it so as to acquire a probability value corresponding to the image to be processed; and an image acquisition unit, configured to take the image to be processed with the maximum probability value as the cover image.
In some embodiments of the disclosure, based on the foregoing scheme, the fully connected unit is configured to perform a nonlinear transformation on the interest point feature information according to formula (1):

Z = f(w1 · X + b1)    (1)

where f(·) is an activation function, w1 is a weight matrix, b1 is a bias constant, X is the interest point feature information, and Z is the fully connected feature information.

The normalization unit is configured to normalize the fully connected feature information according to formula (2):

σ(z)_j = exp(z_j) / Σ_{k=1..K} exp(z_k)    (2)

where z_j is the j-th piece of sub-feature information in the fully connected feature information, K is the total number of pieces of sub-feature information, and σ(z)_j is the probability value corresponding to the image to be processed.
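Formulas (1) and (2), together with the final maximum-probability selection, can be sketched in a few lines of plain Python. The ReLU activation and the function names are illustrative assumptions; the patent does not specify which activation f(·) is used.

```python
import math

def fully_connected(x, w, b):
    """Formula (1): Z = f(w1*X + b1). w is a K x D weight matrix,
    x a length-D feature vector, b a length-K bias; ReLU is used
    here as an example activation f()."""
    z = [sum(wi * xi for wi, xi in zip(row, x)) + bi
         for row, bi in zip(w, b)]
    return [max(0.0, v) for v in z]

def softmax(z):
    """Formula (2): sigma(z)_j = exp(z_j) / sum_k exp(z_k)."""
    m = max(z)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def pick_cover(probabilities):
    """Return the index of the image with the maximum probability,
    i.e. the image taken as the cover image."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```

For example, applying `softmax` to the fully connected output yields one probability per candidate image, and `pick_cover` selects the cover.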
In some embodiments of the present disclosure, the image processing model includes a first convolution unit and a second bidirectional gated recurrent unit; based on the foregoing, the feature extraction module is configured to: extract features of the historical interest images through the first convolution unit to obtain interest image feature information; and input the interest image feature information into the second bidirectional gated recurrent unit, which extracts features from it so as to acquire the historical interest information description.
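A hedged PyTorch sketch of this convolution-plus-bidirectional-GRU arrangement is shown below. The class name, layer sizes, and pooling choices are all illustrative assumptions; the patent does not disclose concrete architectures.

```python
import torch
import torch.nn as nn

class HistoryInterestEncoder(nn.Module):
    """Sketch of the first convolution unit + second bidirectional
    gated recurrent unit: per-image CNN features are fed as a sequence
    to a BiGRU, whose final hidden states form the historical interest
    information description. All sizes are illustrative."""
    def __init__(self, feat_dim=64, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        self.bigru = nn.GRU(feat_dim, hidden,
                            batch_first=True, bidirectional=True)

    def forward(self, images):              # images: (N, 3, H, W)
        feats = self.conv(images)           # (N, feat_dim), one per image
        _, h = self.bigru(feats.unsqueeze(0))   # treat N images as a sequence
        # concatenate forward and backward final states: (1, 2*hidden)
        return torch.cat([h[0], h[1]], dim=-1)
```

Feeding N historical interest images produces a single fixed-length description vector regardless of N.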
In some embodiments of the present disclosure, the image processing model includes a second convolution unit; based on the foregoing, the feature extraction module includes: and the first feature extraction unit is used for carrying out feature extraction on the image to be processed through the second convolution unit so as to acquire feature information corresponding to the image to be processed.
In some embodiments of the disclosure, based on the foregoing aspect, the first feature extraction unit is configured to: acquiring a plurality of target images to be processed from the set of images to be processed according to a preset rule; and extracting the characteristics of each target to-be-processed image through the second convolution unit so as to acquire characteristic information corresponding to each target to-be-processed image.
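The "preset rule" for selecting target images is not specified by the patent; one plausible rule, sketched here as an assumption, is to sample a fixed number of images spread uniformly across the set.

```python
def select_targets(items, num_targets):
    """One possible 'preset rule' (an assumption, not from the patent):
    pick num_targets items spread uniformly across the image set."""
    if num_targets >= len(items):
        return list(items)
    step = len(items) / num_targets
    return [items[int(i * step)] for i in range(num_targets)]
```

Each selected target image would then be passed through the second convolution unit for feature extraction.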
In some embodiments of the present disclosure, based on the foregoing scheme, the image processing apparatus is further configured to: acquiring a video to be processed, and analyzing the video to be processed to acquire an image frame; and forming the image set to be processed according to all the image frames in the video to be processed.
In some embodiments of the present disclosure, based on the foregoing aspect, the image processing apparatus further includes: a sample acquisition module, configured to acquire a plurality of groups of training samples, wherein each group of training samples comprises a video sample, a video category sample, a video cover sample and a historical interest image sample; and a model training module, configured to train an image processing model to be trained according to the video sample, the video category sample, the video cover sample and the historical interest image sample, so as to acquire the image processing model.
In some embodiments of the present disclosure, the image processing model to be trained includes a first training branch, a second training branch, and a third training branch; based on the foregoing, the model training module includes: a second feature extraction unit, configured to extract features of the historical interest image samples through the first training branch, so as to obtain an interest distribution information description corresponding to the historical interest image samples; extract features of the video sample through the second training branch, and acquire recommended image information corresponding to the video sample based on the extracted features and the interest distribution information description; and extract features of the video sample through the third training branch, so as to obtain category information corresponding to the video sample; a loss function determining unit, configured to determine a first loss function according to the recommended image information and the video cover sample, and determine a second loss function according to the category information and the video category sample; and a parameter adjusting unit, configured to adjust parameters of the image processing model to be trained so as to minimize the weighted sum of the first loss function and the second loss function and acquire the image processing model.
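The training objective described above, a weighted sum of the cover-determination loss and the category-identification loss, is simple enough to state directly. The weight values below are illustrative hyperparameters, not values from the patent.

```python
def multitask_loss(cover_loss, category_loss, alpha=0.7, beta=0.3):
    """Weighted sum of the first (cover) and second (category) loss
    functions, the quantity minimized during training. alpha and beta
    are illustrative weights; the patent does not specify them."""
    return alpha * cover_loss + beta * category_loss
```

Training adjusts the model parameters to drive this combined value down, so the category task regularizes and complements the cover task.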
In some embodiments of the present disclosure, the first training branch includes a first convolution unit to be trained and a second bidirectional gated recurrent unit to be trained; based on the foregoing, the second feature extraction unit is configured to: extract features of the historical interest image samples through a convolution layer in the first convolution unit to be trained, so as to obtain interest distribution feature information corresponding to the historical interest image samples; and input this interest distribution feature information into the second bidirectional gated recurrent unit to be trained, which processes it so as to acquire the interest distribution information description.
In some embodiments of the present disclosure, the second training branch includes a second convolution unit to be trained, a first bidirectional gated recurrent unit to be trained, a first fully connected layer to be trained, and a first normalization layer to be trained; based on the foregoing, the second feature extraction unit includes: a first feature extraction subunit, configured to extract features of the image samples in the video sample through a convolution layer in the second convolution unit to be trained, so as to acquire image sample feature information; a second feature extraction subunit, configured to input the image sample feature information and the interest distribution information description into the first bidirectional gated recurrent unit to be trained, which extracts features from both so as to acquire an interest distribution weight; a weighting unit, configured to acquire a plurality of pieces of weighted feature information according to the image sample feature information and the interest distribution weight; and a post-processing unit, configured to apply a nonlinear transformation to the weighted feature information through the first fully connected layer to be trained, and normalize the transformed feature information through the first normalization layer to be trained, so as to acquire the recommended image information.
In some embodiments of the disclosure, based on the foregoing scheme, the first feature extraction subunit is configured to: determining a plurality of target image samples from the image set samples according to a preset rule; and extracting the characteristics of each target image sample through a convolution layer in the second convolution unit to be trained so as to acquire the characteristic information of the image sample.
In some embodiments of the present disclosure, the third training branch includes a second full connection layer to be trained and a second normalization layer to be trained; based on the foregoing, the second feature extraction unit is configured to: acquiring the characteristic information of the image sample output by the second convolution unit to be trained; adding and averaging the image sample feature information to obtain a video feature description corresponding to the video sample; and carrying out nonlinear transformation processing on the video feature description through the second full-connection layer to be trained, and carrying out normalization processing on the video feature description after the nonlinear transformation processing through the second normalization layer to be trained so as to acquire category information corresponding to the video sample.
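The "adding and averaging" step of the category branch reduces the per-frame feature vectors to one video feature description; a minimal sketch (function name assumed for illustration):

```python
def video_descriptor(frame_features):
    """Sum and average the per-frame feature vectors to obtain the
    video feature description consumed by the category branch's
    fully connected and normalization layers."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]
```

The resulting vector has the same dimensionality as a single frame's features, independent of the number of frames.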
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method as described in the above embodiments.
According to one aspect of an embodiment of the present disclosure, there is provided an electronic device including one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image processing method as described in the above embodiments.
In the technical scheme provided by the embodiment of the disclosure, firstly, an image set to be processed and a historical image set containing historical interesting images of a user are obtained; then, carrying out feature extraction on the historical image set to obtain historical interest information description, and carrying out feature extraction on the image set to be processed to obtain feature information corresponding to the image to be processed; then determining interest weights corresponding to the images to be processed according to the feature information and the historical interest information description corresponding to the images to be processed; and finally, determining the cover image from the image set to be processed according to the feature information and the interest weight corresponding to the image to be processed. According to the technical scheme, the historical interest information description of the user can be determined according to the historical interest image of the user, the mining of the inherent relevance between the historical interest image and the image to be processed can be realized based on the historical interest information description and the characteristic information of the image set to be processed, and further the cover image can be accurately determined from the image set to be processed and recommended to the user, so that the efficiency and the accuracy of image recommendation are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow diagram of an image processing method according to one embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural schematic of an image processing model according to one embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow framework diagram of acquiring a description of historical interest information according to one embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow frame diagram for acquiring feature information corresponding to an image to be processed according to one embodiment of the disclosure;
FIG. 6 schematically illustrates a flow frame diagram for acquiring feature information corresponding to a target image to be processed according to one embodiment of the disclosure;
FIG. 7 schematically illustrates a flow diagram for acquiring a cover image according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates the composition of a training sample according to one embodiment of the present disclosure;
FIG. 9 schematically illustrates a structural schematic of an image processing model to be trained according to one embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow diagram for training an image processing model to be trained in accordance with one embodiment of the present disclosure;
FIG. 11 schematically illustrates a process framework diagram of a fully connected layer according to one embodiment of the disclosure;
fig. 12 schematically illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
fig. 13 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks and servers as practical. For example, the server 103 may be a server cluster formed by a plurality of servers. The terminal device 101 may be a terminal device such as a notebook, a portable computer, a desktop computer, or the like.
In one embodiment of the present disclosure, a set of images to be processed may be collected by the terminal device 101, together with a historical image set containing the user's historical interest images, and both sets are transmitted to the server 103 through the network 102. After receiving the two sets, the server 103 can extract features from the user's historical interest images through the first convolution unit and the second bidirectional gated recurrent unit of the trained image processing model, so as to acquire the user's historical interest information description. Features of the images to be processed are extracted through the second convolution unit of the image processing model to obtain the corresponding feature information; the feature information and the historical interest information description are then processed through the first bidirectional gated recurrent unit to acquire the interest weights corresponding to the images to be processed; the feature information is weighted by the interest weights to obtain interest point feature information; finally, the interest point feature information is passed through the first fully connected layer for a nonlinear transformation to obtain the fully connected feature information, which is normalized by the first normalization layer to obtain a probability value for each image to be processed. This probability value represents the probability that the image accords with the user's interest and can serve as the cover image of the image set; the image with the maximum probability value is selected as the cover image.

Before the image processing model is used for image processing, the model to be trained must first be trained to obtain a stable image processing model. In the embodiment of the present disclosure, the image processing model to be trained is a multi-task learning model comprising a cover image determination task and an image set/video category identification task; the category identification task enables the model to extract feature information specific to different types of image sets/videos, providing complementary information for the cover determination task. According to this technical scheme, user interest point distribution features can be built over different data types, the inherent relevance between the user's historical interest images and the image set to be processed can be mined, and the cover image corresponding to the user can be determined more accurately and more pertinently.
It should be noted that, the image processing method provided in the embodiments of the present disclosure is generally executed by a server, and accordingly, the image processing apparatus is generally disposed in the server. However, in other embodiments of the present disclosure, the image processing method provided by the embodiments of the present disclosure may also be performed by a terminal device.
In the related art, common personalized recommendation schemes construct user interest features from user historical data or user attribute information in order to recommend new data, but they fail to consider the attribute information of the recommended data itself: because different data types exist, users exhibit different interest feature distributions, and existing methods cannot extract type-specific information features in a targeted manner when facing different types of data. In addition, when using user historical data, existing methods generally only compare the superficial similarity between historical data features and the recommended data, and cannot mine the inherent association between the two.
In view of the problems in the related art, the embodiments of the present disclosure provide an image processing method implemented based on artificial intelligence (AI). Artificial intelligence is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Research in artificial intelligence covers the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers in place of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on a target, and to further perform graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, artificial intelligence will be applied in ever more fields and play an increasingly important role.
The scheme provided by the embodiment of the disclosure relates to an image processing technology of artificial intelligence, and is specifically described by the following embodiments:
The embodiments of the present disclosure first provide an image processing method. Fig. 2 schematically illustrates a flowchart of the image processing method according to an embodiment of the present disclosure; the method may be performed by a server, which may be the server 103 illustrated in fig. 1. Referring to fig. 2, the image processing method includes at least steps S210 to S230, described in detail as follows:
In step S210, an image set to be processed and a historical image set including the user's historical interest images are acquired.
In one embodiment of the disclosure, the image set to be processed may be an album including a plurality of images to be processed, or a video including a plurality of image frames. After an album or a video is produced, one representative image frame, or a combination of several representative image frames, is usually selected as its cover in order to attract users to click and view it. For example, when a user browses videos or albums on a news platform or a video interaction platform, the user first sees the cover images; if the user is interested in a cover image, the corresponding link is triggered to view the whole video or album. However, the number of videos and/or albums on a platform is huge, and users usually do not click on every one of them, but instead choose to watch the videos or albums they are interested in; the cover of a video or album is therefore crucial to whether a user clicks on it. Moreover, for the same video or album, different users have different points of interest, so generating different cover images for different users can greatly improve the click and play volume of the video or album.
For example, suppose a video is a clip selected from a popular television drama, and the roles appearing in the clip include the male lead and the second male lead; some users like the male lead while others like the second male lead. Therefore, when generating the video cover, the users' points of interest need to be mined. For example, if user A usually likes watching the male lead and has previously clicked on that actor's other videos, then when the video is pushed to user A, a representative image frame may be selected as the video cover from the image frames in which the male lead appears, so that user A is attracted by the cover and clicks to watch the clip. Similarly, for a user B who likes the second male lead, a representative image frame may be selected as the video cover from the image frames in which the second male lead appears, so as to attract user B to click and watch the clip. To facilitate understanding of the technical solution of the present disclosure, the image processing method of the present disclosure will be described below by taking video cover recommendation as an example.
In one embodiment of the disclosure, the image set to be processed is acquired first. After the video to be processed, i.e. the video for which a cover needs to be recommended, is obtained, it can be decoded to obtain its corresponding image frames, and the image set to be processed can be formed from all of these image frames. A historical image set also needs to be acquired, which includes the user's historical interest images. When collecting the user's historical interest images, the cover images of the videos clicked by the user within a preset time period before the current moment may be collected, or the cover images of the videos clicked by the user from the first login to the platform up to the current moment may be collected. The distribution of a user's points of interest differs across video categories, and the historical image set clarifies the user's interest feature distribution over the different categories, so that the cover image can be accurately determined from the video to be processed.
In step S220, feature extraction is performed on the historical interest images through an image processing model, and a historical interest information description is acquired based on the extracted features; meanwhile, feature extraction is performed on the images to be processed in the image set to be processed through the image processing model, so as to acquire the feature information corresponding to the images to be processed.
In one embodiment of the present disclosure, after the image set to be processed and the historical image set are acquired, they may be processed by an image processing model to obtain the cover image. Specifically, feature extraction may be performed on the historical image set through the image processing model and the historical interest information description acquired from the extracted features; at the same time, feature extraction may be performed on the image set to be processed through the image processing model to acquire the feature information corresponding to each image to be processed.
In one embodiment of the present disclosure, fig. 3 shows a schematic structural diagram of the image processing model. As shown in fig. 3, the image processing model 300 includes a first convolution unit 301, a second bi-directional gated recurrent unit 302, a second convolution unit 303, a first bi-directional gated recurrent unit 304, a first fully connected layer 305, and a first normalization layer 306. The first convolution unit 301 and the second bi-directional gated recurrent unit 302 are used to perform feature extraction on the historical image set and acquire the historical interest information description based on the extracted features; the second convolution unit 303 is configured to perform feature extraction on the image set to be processed, so as to acquire the feature information corresponding to the image set to be processed.
In one embodiment of the present disclosure, the historical image set includes a plurality of the user's historical interest images. In order to obtain the user's historical interest information description, a plurality of convolution layers may be provided in the first convolution unit 301. Fig. 4 shows a schematic flowchart of acquiring the historical interest information description. As shown in fig. 4, the first convolution unit 301 includes convolution layers 401-1, 401-2, … …, 401-N, and the historical interest images include HF-1, HF-2, … …, HF-N, where the number of convolution layers is equal to the number of historical interest images. Feature extraction is performed on each historical interest image through the corresponding convolution layer to obtain the interest image feature information HFV-1, HFV-2, … …, HFV-N; the interest image feature information HFV-1, HFV-2, … …, HFV-N may then be input to the second bi-directional gated recurrent unit 302, which performs feature extraction on it to obtain the historical interest information description HFV.
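The history branch described above can be sketched as follows. This is a minimal numpy sketch under stated assumptions, not the patented implementation: each convolution layer 401-i is replaced by a stub linear projection, and the second bi-directional gated recurrent unit is a from-scratch Bi-GRU whose final forward and backward hidden states are concatenated into the description HFV. All dimensions and helper names (`conv_stub`, `gru_step`, `bi_gru_description`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8          # feature size per image, GRU hidden size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_stub(image, W):
    # Stand-in for a convolution layer: flatten the image and project it
    # to a D-dimensional feature vector (HFV-i in the disclosure).
    return np.tanh(W @ image.ravel())

def gru_step(x, h, P):
    # One GRU step: reset gate r, update gate z, candidate state h_tilde.
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1.0 - z) * h + z * h_tilde

def gru_params():
    # Input-to-hidden matrices W* are (H, D); hidden-to-hidden U* are (H, H).
    return {k: rng.standard_normal((H, D if k[0] == "W" else H)) * 0.1
            for k in ["Wr", "Wz", "Wh", "Ur", "Uz", "Uh"]}

def bi_gru_description(features, fwd, bwd):
    # Run the sequence forwards and backwards; concatenate the two
    # final hidden states into the historical interest description HFV.
    h_f = np.zeros(H)
    for x in features:
        h_f = gru_step(x, h_f, fwd)
    h_b = np.zeros(H)
    for x in reversed(features):
        h_b = gru_step(x, h_b, bwd)
    return np.concatenate([h_f, h_b])

# N historical interest images HF-1 .. HF-N (toy 8x8 grayscale frames).
history_images = [rng.standard_normal((8, 8)) for _ in range(5)]
W_conv = rng.standard_normal((D, 64)) * 0.1
hfv_sequence = [conv_stub(img, W_conv) for img in history_images]   # HFV-i
HFV = bi_gru_description(hfv_sequence, gru_params(), gru_params())  # HFV
print(HFV.shape)  # (16,)
```

The one-convolution-layer-per-image layout of fig. 4 corresponds here to applying `conv_stub` once per historical image before the shared Bi-GRU summary.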
The historical interest information description HFV is mainly used to represent the distribution of the user's points of interest over various image sets. For example, if the user likes watching gunfight TV dramas or movies, the interest information description concerning TV dramas or movies in HFV may be descriptions related to guns, gun battles, and the like; if the user likes watching variety shows attended by stars, the interest information description concerning variety shows in HFV may be the stars; if the user likes watching financial news, the interest information description concerning news programs in HFV may be descriptions related to finance, and so on.
The gated recurrent unit (Gate Recurrent Unit, GRU) is a newer type of recurrent neural network, very similar to the LSTM; it dispenses with the cell state and uses the hidden state to transfer information. A GRU has only two gates, a reset gate and an update gate: the reset gate decides how much past information to forget, and the update gate decides which information to discard and which new information to add. A bi-directional GRU (Bi-GRU) includes a forward GRU and a backward GRU. When feature extraction is performed on the interest image feature information by the second bi-directional gated recurrent unit 302, the information in the interest image feature information is processed through the reset gate and the update gate to obtain the historical interest information description HFV.
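The reset-gate/update-gate behaviour just described corresponds to the standard GRU update from the recurrent-network literature (this formulation is a general reference, not an equation taken from the disclosure). Writing the input at step t as x_t and the previous hidden state as h_{t-1}:

```latex
\begin{aligned}
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)}\\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

The bi-directional unit runs one such recurrence forward and one backward over the sequence and combines the two resulting hidden states.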
In one embodiment of the present disclosure, the image set to be processed likewise includes a plurality of images to be processed; that is, the video to be processed includes a plurality of image frames. In order to obtain the feature information corresponding to the images to be processed, a plurality of convolution layers may be provided in the second convolution unit 303. Fig. 5 shows a schematic flowchart of acquiring the feature information corresponding to the images to be processed. As shown in fig. 5, the second convolution unit 303 includes convolution layers 501-1, 501-2, … …, 501-S, and the video to be processed includes image frames PF-1, PF-2, … …, PF-S, where the number of convolution layers is equal to the number of image frames in the video to be processed. Feature extraction is performed on each image frame through the corresponding convolution layer to obtain the feature information PFV-1, PFV-2, … …, PFV-S corresponding to the images to be processed.
In one embodiment of the present disclosure, the number of image frames contained in a video to be processed may be large: for example, a video with a duration of 5 minutes at 25 frames per second contains 7500 frames in total. Performing feature extraction on every frame would require a long processing time and reduce the efficiency of image processing. Therefore, in an embodiment of the present disclosure, a plurality of target images to be processed may first be acquired from the video to be processed according to a preset rule, and feature extraction is then performed on the target images through the convolution layers in the second convolution unit 303 to acquire the feature information corresponding to each target image to be processed. The preset rule may be to take one image frame every 1 s, 3 s, or 5 s as a target image to be processed, or to divide the video to be processed into a plurality of scenes and take one image frame from each scene as a target image to be processed; the target images may also be determined according to other rules, which is not particularly limited in the embodiments of the present disclosure.
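The interval-based preset rule can be sketched as follows; the helper name and the fixed-interval policy are illustrative assumptions, since the disclosure deliberately leaves the rule open:

```python
def sample_target_frames(num_frames, fps, interval_seconds):
    """Pick one frame every `interval_seconds` as a target image to be processed."""
    step = int(round(fps * interval_seconds))
    return list(range(0, num_frames, step))

# A 5-minute video at 25 fps has 5 * 60 * 25 = 7500 frames;
# sampling one frame per second keeps only 300 of them.
indices = sample_target_frames(num_frames=7500, fps=25, interval_seconds=1)
print(len(indices))  # 300
```

With a 5 s interval the same video yields 60 target frames, illustrating how the rule trades coverage against processing time.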
In one embodiment of the present disclosure, fig. 6 shows a schematic flowchart of acquiring the feature information corresponding to the target images to be processed. As shown in fig. 6, key frame extraction is performed on the video to be processed to obtain a plurality of target images KF-1, KF-2, … …, KF-V, and the second convolution unit 303 includes convolution layers 601-1, 601-2, … …, 601-V, where the number of convolution layers is equal to the number of target images to be processed. Feature extraction is performed on each target image through the corresponding convolution layer to obtain the feature information KFV-1, KFV-2, … …, KFV-V corresponding to the images to be processed.
In step S230, the interest weights corresponding to the images to be processed are determined according to the feature information and the historical interest information description, and the cover image is determined from the image set to be processed based on the feature information and the interest weights corresponding to the images to be processed.
In one embodiment of the present disclosure, in order to determine the cover image according to the user's interests, the distribution of the user's points of interest over each image to be processed needs to be determined; the one or more images over which the user's points of interest are most widely distributed are the images the user is most interested in. Specifically, feature extraction may be performed by the first bi-directional gated recurrent unit 304 shown in fig. 3 on the feature information corresponding to an image to be processed together with the historical interest information description, and the degree of agreement between the feature information and the historical interest information description may be determined from their corresponding features; this degree of agreement is the interest weight corresponding to that image to be processed. The first bi-directional gated recurrent unit 304 performs the above operation on the feature information of each image to be processed and the historical interest information description, so that the interest weight corresponding to each image to be processed is obtained and used to characterize the distribution of the user's points of interest over the images to be processed.
In one embodiment of the present disclosure, after the interest weights corresponding to the feature information of the images to be processed are determined, the cover image may be determined from the image set to be processed based on the feature information corresponding to the images to be processed and the corresponding interest weights. Fig. 7 shows a schematic flowchart of acquiring the cover image. As shown in fig. 7, after the feature information KFV-1, KFV-2, … …, KFV-V corresponding to the images to be processed is acquired as in fig. 6, it is input to the first bi-directional gated recurrent unit 304 together with the historical interest information description HFV output by the second bi-directional gated recurrent unit 302; the first bi-directional gated recurrent unit 304 processes the feature information KFV-1, KFV-2, … …, KFV-V and the historical interest information description HFV to acquire the corresponding interest weights γ1, γ2, … …, γV. The feature information corresponding to the images to be processed is then weighted by the interest weights to obtain the interest point feature information WKFV-1, WKFV-2, … …, WKFV-V. The interest point feature information WKFV-1, WKFV-2, … …, WKFV-V is next input to the first fully connected layer 305, which performs a nonlinear transformation on it to obtain the fully connected feature information corresponding to the images to be processed. Finally, the fully connected feature information is input to the first normalization layer 306, which normalizes it to obtain the probability value corresponding to each image to be processed; this probability value is the likelihood that the image can serve as the cover image. The image to be processed with the maximum probability value is then taken as the cover image of the image set to be processed.
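The weighting-and-selection flow of fig. 7 can be sketched as below. This is a simplified numpy stand-in under stated assumptions: the Bi-GRU agreement score of unit 304 is replaced by a scaled dot product between each frame feature KFV-i and the description HFV, the first fully connected layer by a single tanh layer, and the first normalization layer by a softmax; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
V, D = 6, 16                        # number of target frames, feature size

KFV = rng.standard_normal((V, D))   # frame features KFV-1 .. KFV-V
HFV = rng.standard_normal(D)        # historical interest description

# Interest weights γ1..γV: agreement between each frame and the history
# (scaled dot product through a sigmoid, as a stand-in for unit 304).
gamma = 1.0 / (1.0 + np.exp(-(KFV @ HFV) / np.sqrt(D)))

WKFV = gamma[:, None] * KFV                    # weighted features WKFV-i

W_fc = rng.standard_normal((1, D)) * 0.1       # first fully connected layer
logits = np.tanh(WKFV @ W_fc.T).ravel()        # nonlinear transformation

probs = np.exp(logits) / np.exp(logits).sum()  # normalization layer (softmax)
cover_index = int(np.argmax(probs))            # frame with maximum probability
print(cover_index, float(probs.sum()))
```

The frame at `cover_index` plays the role of the image to be processed with the maximum probability value, i.e. the cover image.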
In addition, the image processing model fully mines the inherent relevance between the user's historical interest images and the images to be processed when determining the cover image. Moreover, during training, in addition to mining this inherent relevance, the model is also trained to identify the category of the image set to be processed based on the feature information of the images to be processed, so that the model can determine the distribution of the user's points of interest over images of different categories.
In one embodiment of the present disclosure, before the image processing model is used for cover image prediction, the image processing model to be trained needs to be trained to obtain a stable image processing model. When training the image processing model to be trained, multiple sets of training samples can be acquired; specifically, data of multiple users can be collected, each user corresponding to multiple sets of training samples, and each set of training samples including a video sample, a video category sample, a video cover sample, and the user's historical interest image samples. Fig. 8 shows the composition of the training samples. As shown in fig. 8, there are multiple sets of training samples. The first set is: a clip of the TV drama The Legend of Sword and Fairy, the category "TV drama", a cover image containing Zhao Ling'er, and the user's historical interest image samples. The second set is: a clip of the variety show Go Fighting!, the category "variety", a cover image containing several stars, and the user's historical interest image samples. The last set is: a clip of the program Legal Report, the category "legal program", a cover image containing the program host, and the user's historical interest image samples.
In one embodiment of the present disclosure, after the training samples are obtained, the image processing model to be trained may be trained according to the video samples, video category samples, video cover samples, and historical interest image samples in each set of training samples to obtain the image processing model. Fig. 9 shows a schematic structural diagram of the image processing model to be trained. As shown in fig. 9, the image processing model to be trained 900 includes a first training branch 901, a second training branch 902, and a third training branch 903. The first training branch 901 includes a first convolution unit to be trained 904 and a second bi-directional gated recurrent unit to be trained 905; the second training branch 902 includes a second convolution unit to be trained 906, a first bi-directional gated recurrent unit to be trained 907, a first fully connected layer to be trained 908, and a first normalization layer to be trained 909; the third training branch 903 includes a second fully connected layer to be trained 910 and a second normalization layer to be trained 911.
In one embodiment of the present disclosure, based on the structural diagram of the image processing model to be trained shown in fig. 9, fig. 10 shows a schematic flowchart of training the image processing model to be trained. The flow includes at least steps S1001 to S1003, specifically:
In step S1001, feature extraction is performed on the historical interest image samples through the first training branch to acquire the interest distribution information description corresponding to the historical interest image samples; feature extraction is performed on the video sample through the second training branch, and the recommended image information corresponding to the video sample is acquired based on the extracted features and the interest distribution information description; and feature extraction is performed on the video sample through the third training branch to acquire the category information corresponding to the video sample.
In one embodiment of the present disclosure, the historical interest image samples may be the cover images of videos that the user has clicked on, and there may be a plurality of them; accordingly, a plurality of convolution layers may be provided in the first convolution unit to be trained 904 to perform feature extraction on the historical interest image samples, the number of convolution layers being the same as the number of samples. When feature extraction is performed on the historical interest image samples through the first training branch to acquire the interest distribution information description, feature extraction may specifically be performed on each historical interest image sample HFS-1, HFS-2, … …, HFS-N through the convolution layers in the first convolution unit to be trained 904 to obtain the interest distribution feature information HFSV-1, HFSV-2, … …, HFSV-N corresponding to each sample; the interest distribution feature information corresponding to each sample is then input to the second bi-directional gated recurrent unit to be trained 905, which processes it to acquire the interest distribution information description HFSV.
In one embodiment of the present disclosure, when feature extraction is performed on the video sample through the second training branch and the recommended image information is acquired based on the extracted features and the interest distribution information description, feature extraction may be performed on each image sample PS-1, PS-2, … …, PS-V in the image set sample through the convolution layers in the second convolution unit to be trained 906 to obtain the image sample feature information PSV-1, PSV-2, … …, PSV-V. The image sample feature information PSV-1, PSV-2, … …, PSV-V and the interest distribution information description HFSV are then input to the first bi-directional gated recurrent unit to be trained 907, which performs feature extraction on them to acquire the interest distribution weights; the weighted feature information WPSV-1, WPSV-2, … …, WPSV-V is then acquired according to the image sample feature information and the interest distribution weights. Finally, the first fully connected layer to be trained 908 performs a nonlinear transformation on the weighted feature information, and the first normalization layer to be trained 909 normalizes the nonlinearly transformed weighted feature information to obtain the recommended image information. The specific implementation details of each step are the same as those of the embodiment for acquiring the cover image and are not repeated here.
In one embodiment of the present disclosure, considering the number of images and the image processing efficiency, when feature extraction is performed on the video sample through the second training branch and the recommended image information is acquired based on the extracted features and the interest distribution information description, a plurality of target image samples may first be determined from the video sample according to a preset rule. Feature extraction is then performed on each target image sample through the convolution layers in the second convolution unit to be trained 906 to acquire the image sample feature information; the image sample feature information and the interest distribution information description are input to the first bi-directional gated recurrent unit to be trained 907, which performs feature extraction on them to acquire the interest distribution weights; the weighted feature information is then acquired according to the image sample feature information and the interest distribution weights. Finally, the first fully connected layer to be trained 908 performs a nonlinear transformation on the weighted feature information, and the first normalization layer to be trained 909 normalizes the nonlinearly transformed weighted feature information to obtain the recommended image information. The preset rule may specifically be to take one image frame from the video sample at a certain time interval as a target image sample, or to select one or more frames from each scene contained in the video sample as target image samples, or another rule, which is not specifically limited in the embodiments of the present disclosure.
In one embodiment of the present disclosure, when feature extraction is performed on the video sample through the third training branch to acquire the category information corresponding to the video sample, the image sample feature information output by the second convolution unit to be trained 906 may first be acquired; the image sample feature information is then summed and averaged to obtain the video feature description corresponding to the video sample; finally, the second fully connected layer to be trained 910 performs a nonlinear transformation on the video feature description, and the second normalization layer to be trained 911 normalizes the nonlinearly transformed video feature description to obtain the category information corresponding to the video sample.
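The auxiliary branch just described (sum-and-average the frame features, then a fully connected layer and a normalization layer) can be sketched as below; toy dimensions, the tanh nonlinearity, and the softmax normalization are illustrative assumptions standing in for layers 910 and 911.

```python
import numpy as np

rng = np.random.default_rng(2)
V, D, C = 6, 16, 7                 # frames, feature size, number of categories

PSV = rng.standard_normal((V, D))  # image sample feature information PSV-1..PSV-V

# Sum and average the frame features into one video feature description.
video_descr = PSV.mean(axis=0)

W2 = rng.standard_normal((C, D)) * 0.1   # second fully connected layer (910)
logits = np.tanh(W2 @ video_descr)       # nonlinear transformation

# Second normalization layer (911): softmax over the category logits.
category_probs = np.exp(logits) / np.exp(logits).sum()
predicted_category = int(np.argmax(category_probs))
print(predicted_category, float(category_probs.sum()))
```

Because the branch consumes the same PSV features as the cover branch, this is where the two tasks share the second convolution unit before splitting.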
In one embodiment of the present disclosure, the image processing model to be trained 900 is a multi-task learning model in which the primary learning task is to obtain the recommended image information and the auxiliary learning task is to identify the category of the image set. Taking video cover recommendation as an example, the primary learning task is to recommend the video cover, and the auxiliary learning task is to identify the category of the video, i.e. to determine whether the video is a TV drama (movie), variety show, finance, legal, military, agriculture, or other category. The category identification task enables the image processing model to be trained to extract, in a more targeted manner, the feature information appropriate to videos of different categories, providing complementary information for the primary learning task; combining the primary learning task with this complementary information, the model mines the inherent connection between the user's historical interest images and the images to be processed in a targeted way, and constructs the user's interest point distribution over images of different categories.
In one embodiment of the present disclosure, the main learning task and the auxiliary learning task share the second convolution unit 903 of the image processing model to be trained and then split into two learning branches, so that the structure of the image processing model to be trained is simpler and easier to train.
In step S1002, a first loss function is determined from the recommended image information and the video cover sample, and a second loss function is determined from the category information and the video category sample.
In one embodiment of the present disclosure, after the recommended image information generated by the second training branch and the category information corresponding to the video sample generated by the third training branch are obtained, the first loss function loss-1 may be determined according to the recommended image information and the video cover sample, and the second loss function loss-2 may be determined according to the category information and the video category sample; the first loss function loss-1 and the second loss function loss-2 may be the same loss function or different loss functions. After the first loss function loss-1 and the second loss function loss-2 are obtained, they may be weighted to obtain a model loss function loss, specifically loss = α·loss-1 + β·loss-2, where α and β are weighting parameters of the image processing model to be trained.
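The weighted combination loss = α·loss-1 + β·loss-2 can be sketched directly; the function name below is assumed for illustration:

```python
def model_loss(loss_1, loss_2, alpha, beta):
    """Model loss as the weighted sum of the first loss function
    (cover recommendation) and the second loss function
    (category identification): loss = alpha*loss_1 + beta*loss_2."""
    return alpha * loss_1 + beta * loss_2

total = model_loss(0.8, 0.4, alpha=0.7, beta=0.3)  # 0.56 + 0.12 = 0.68
```

During training, α and β control how strongly the auxiliary category-identification task influences the parameters shared with the primary cover-recommendation task.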
In step S1003, the image processing model is acquired by adjusting the parameters of the image processing model to be trained so as to minimize the weighted sum of the first loss function and the second loss function.
In one embodiment of the present disclosure, the parameters of the image processing model to be trained are adjusted and the model is updated; when the model loss function loss is minimized, training of the image processing model to be trained is complete, and it can be used as a stable image processing model to process an image set to be processed to determine a cover image.
In one embodiment of the present disclosure, both when training the image processing model to be trained and when recommending a cover image by using the image processing model, the first fully connected layer to be trained and the second fully connected layer to be trained are used for performing nonlinear transformation processing on input feature information. Fig. 11 shows a processing frame diagram of the fully connected layer. As shown in fig. 11, the frame includes an input layer 1101, a fully connected neuron layer 1102, and an output layer 1103. The input layer 1101 sends received information, such as interest point feature information, weighted feature information, and a feature description of an image set, to the fully connected neuron layer 1102, and the fully connected neuron layer 1102 performs a nonlinear transformation on the received information, specifically according to calculation formula (1):
Z = f(w1·X + b1)    (1)

where f(·) is an activation function, w1 is a weight matrix, b1 is a bias constant, X is the received information, and Z is the fully connected feature information. After Z is obtained, Z is output through the output layer 1103 to the normalization layer connected to the fully connected layer for normalization processing.
In one embodiment of the disclosure, the first normalization layer to be trained and the second normalization layer to be trained are softmax layers, which can convert the result output by the fully connected layer into the probability that each image is recommended as the cover image, or the probability that the video corresponds to each category. Specifically, the calculation can be performed according to calculation formula (2):
σ(z)j = exp(zj) / Σ(k=1..K) exp(zk)    (2)

where zj is the j-th piece of sub-feature information in the fully connected feature information, K is the total number of pieces of sub-feature information contained in the fully connected feature information, and σ(z)j is the probability value corresponding to the image to be processed, that is, the probability that the image is recommended as the cover image, which is likewise the probability that the video corresponds to each category.
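Formula (2) is the standard softmax; a sketch with the usual max-subtraction for numerical stability (the helper name is assumed):

```python
import numpy as np

def softmax(z):
    """Formula (2): sigma(z)_j = exp(z_j) / sum over k of exp(z_k)."""
    e = np.exp(z - np.max(z))  # subtracting the max does not change the result
    return e / e.sum()

# Three candidate probabilities; the largest input gets the largest probability
p = softmax(np.array([2.0, 1.0, 0.1]))
```

The outputs are all positive and sum to 1, so each value can be read directly as a probability of being the cover image (or of a video category).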
In one embodiment of the disclosure, the image processing model to be trained is divided into a main learning task and an auxiliary learning task: the main learning task is cover image recommendation, and the auxiliary learning task is video category identification. The auxiliary learning task can provide complementary information for the main learning task. When the historical interest information of a user is extracted, the internal relevance among the historical information is mined, and the distribution of the user's interest points over the images to be processed is built based on the obtained relevance information and the complementary information of the auxiliary learning task, i.e. the interest point weights are obtained; the images to be processed are then weighted accordingly to obtain data information containing the user's interest point distribution, which helps the image processing model recommend personalized cover images more accurately.
The image processing method in the embodiment of the disclosure can be applied to a plurality of scenes, for example, after a user makes a video or an atlas, a cover image can be recommended for the video or the atlas made by the user through the method in the embodiment of the disclosure; when a video interaction platform or other platform for playing videos and showing an atlas pushes videos and atlas to users, different cover images can be recommended for the videos or atlas to be pushed for different users through the method of the embodiment of the disclosure, and then the videos or atlas with different cover images are pushed to different users, so that the click rate of the videos or atlas is improved.
Taking the example of an atlas display platform pushing an atlas to different users: first, data generated when a target user logs in to the atlas display platform can be collected; the data mainly comprises historical interesting images of the target user, which can be cover images of atlases viewed by the target user in the past. After the historical interesting images of the target user are acquired, the historical interesting images and the atlas to be pushed can be input into an image processing model and processed respectively by the image processing model, so as to determine one or more images from the atlas to be pushed as the cover image of the atlas to be pushed. Specifically, a first convolution unit in the image processing model performs feature extraction on the historical interesting images of the target user, and a second bidirectional gating circulation unit acquires the historical interest information description of the target user according to the extracted features; at the same time, a second convolution unit performs feature extraction on the images in the atlas to be pushed to acquire image feature information. Then, the interest weight corresponding to each image can be determined by the first bidirectional gating circulation unit according to the image feature information of each image and the historical interest information description of the target user, where the interest weight is the coincidence degree between the image and the historical interest information description of the target user; the image feature information of each image is then weighted according to the interest weight to acquire the interest point feature information corresponding to each image. Finally, nonlinear transformation processing is performed on the interest point feature information corresponding to each image through the first fully connected layer, and normalization processing is performed on the transformed interest point feature information through the first normalization layer, so as to acquire the probability value of each image serving as the cover image; the image with the maximum probability value is the cover image. Because the historical interest information description of the target user is introduced in the process of determining the cover image, that is, the inherent association information between the historical interesting images of the target user and each image in the atlas to be pushed is deeply mined, the finally determined cover image is the image that best accords with the interest of the target user; if the atlas with this cover image is pushed to the target user, the target user will click to view it with high probability, thereby improving the click rate.
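The selection steps described above can be sketched as follows. This is an illustrative stand-in, not the disclosed model: reducing each image to a scalar score by summation, the tanh activation, and all names are assumptions made only so the flow from interest weights to the recommended cover is runnable:

```python
import numpy as np

def pick_cover(image_features, interest_weights, w1, b1):
    """Weight each image's feature vector by its interest weight, apply
    a fully connected transform, reduce each image to a scalar score
    (summation here is an assumption), softmax over the candidate
    images, and return the index of the most probable cover image."""
    weighted = image_features * interest_weights[:, None]  # per-image weighting
    scores = np.tanh(weighted @ w1 + b1).sum(axis=1)       # FC transform + pooling
    e = np.exp(scores - scores.max())
    probs = e / e.sum()                                    # softmax over images
    return int(np.argmax(probs)), probs

# Three identical candidate images; the one with the highest interest
# weight should be recommended as the cover
feats = np.ones((3, 4))
idx, probs = pick_cover(feats, np.array([0.1, 0.9, 0.2]),
                        w1=np.eye(4), b1=np.zeros(4))  # idx -> 1
```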
The image processing model in the embodiment of the disclosure can perform multi-task learning, in which the learning task of identifying the category of the image set to be processed provides complementary information for the learning task of recommending cover images for the image set to be processed. Meanwhile, the historical interest information description determined according to the historical interesting images of users is introduced, so that the model can mine the inherent relevance between the historical interesting images of users and the images in the image set to be processed, and construct the users' interest distribution over the image set to be processed according to the feature information identified by category; recommendation of personalized cover images can thus be performed accurately for different users, further improving the click rate of the image set to be processed.
The following describes an embodiment of an apparatus of the present disclosure that may be used to perform the image processing method in the above-described embodiment of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the image processing method described above in the present disclosure.
Fig. 12 schematically illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
Referring to fig. 12, an image processing apparatus 1200 according to an embodiment of the present disclosure includes: an acquisition module 1201, a feature extraction module 1202, and an image determination module 1203.
The acquisition module 1201 is configured to acquire a to-be-processed image set and a historical image set, where the historical image set includes historical interesting images of a user; the feature extraction module 1202 is configured to perform feature extraction on the historical interesting images through an image processing model and acquire a historical interest information description based on the extracted features, and simultaneously perform feature extraction on the images to be processed in the to-be-processed image set through the image processing model so as to obtain feature information corresponding to the images to be processed; the image determining module 1203 is configured to determine an interest weight corresponding to the images to be processed according to the feature information and the historical interest information description, and determine a cover image from the to-be-processed image set based on the feature information and the interest weight corresponding to the images to be processed.
In one embodiment of the present disclosure, the image processing model includes a first bi-directional gating loop unit; the image determination module 1203 is configured to: inputting the characteristic information and the historical interest information description into the first bidirectional gating circulation unit, and extracting the characteristic information and the historical interest information description through the first bidirectional gating circulation unit; and determining the coincidence ratio between the characteristic information and the historical interest information description according to the characteristic corresponding to the characteristic information and the characteristic corresponding to the historical interest information description, and taking the coincidence ratio as the interest weight.
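The disclosure derives the coincidence degree from the first bidirectional gating circulation unit; purely as a stand-in illustration (the function and its cosine-similarity formulation are assumptions, not the disclosed computation), an overlap score between an image's feature information and the historical interest information description can be sketched as:

```python
import numpy as np

def coincidence_degree(feature_info, history_desc):
    """Stand-in overlap score (cosine similarity) between feature
    information and the historical interest information description;
    the disclosed model instead derives this degree from a
    bidirectional gating circulation unit."""
    a = np.asarray(feature_info, dtype=float)
    b = np.asarray(history_desc, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

w_same = coincidence_degree([1.0, 2.0], [2.0, 4.0])  # parallel vectors
w_orth = coincidence_degree([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
```

A score near 1 indicates high coincidence with the user's historical interests, and the score is then used as the interest weight for the image.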
In one embodiment of the present disclosure, the image processing model includes a first fully connected layer and a first normalized layer; the image determination module 1203 includes: the weighting unit is used for carrying out weighting processing on the feature information according to the interest weight so as to acquire the feature information of the interest point; the full-connection unit is used for inputting the feature information of the interest points to the first full-connection layer, and performing nonlinear transformation processing on the feature information of the interest points through the first full-connection layer so as to acquire the feature information of the full-connection corresponding to the image to be processed; the normalization unit is used for inputting the full-connection characteristic information to the first normalization layer, and carrying out normalization processing on the full-connection characteristic information through the first normalization layer so as to acquire a probability value corresponding to the image to be processed; and the image acquisition unit is used for acquiring the image to be processed with the maximum probability value and taking the image to be processed with the maximum probability value as the cover image.
In one embodiment of the present disclosure, the fully connected unit is configured to: and (3) carrying out nonlinear transformation processing on the interest point characteristic information according to a formula (1):
Z = f(w1·X + b1)    (1)

where f(·) is an activation function, w1 is a weight matrix, b1 is a bias constant, X is the interest point feature information, and Z is the fully connected feature information;
the normalization unit is configured to: and (3) carrying out normalization processing on the full-connection characteristic information according to a formula (2):
σ(z)j = exp(zj) / Σ(k=1..K) exp(zk)    (2)

where zj is the j-th piece of sub-feature information in the fully connected feature information, K is the total number of pieces of sub-feature information contained in the fully connected feature information, and σ(z)j is the probability value corresponding to the image to be processed.
In one embodiment of the present disclosure, the image processing model includes a first convolution unit and a second bi-directional gating loop unit; the feature extraction module 1202 is configured to: extracting features of the historical interesting image through the first convolution unit to obtain interesting image feature information; and inputting the interested image characteristic information into the second bi-directional gating circulation unit, and extracting the characteristics of the interested image characteristic information through the second bi-directional gating circulation unit so as to acquire the historical interest information description.
In one embodiment of the present disclosure, the image processing model includes a second convolution unit; the feature extraction module 1202 includes: and the first feature extraction unit is used for carrying out feature extraction on the image to be processed through the second convolution unit so as to acquire feature information corresponding to the image to be processed.
In one embodiment of the present disclosure, the first feature extraction unit is configured to: acquiring a plurality of target images to be processed from the set of images to be processed according to a preset rule; and extracting the characteristics of each target to-be-processed image through the second convolution unit so as to acquire characteristic information corresponding to each target to-be-processed image.
In one embodiment of the present disclosure, the image processing apparatus 1200 is further configured to: acquiring a video to be processed, and analyzing the video to be processed to acquire an image frame; and forming the image set to be processed according to all the image frames in the video to be processed.
In one embodiment of the present disclosure, the image processing apparatus 1200 further includes: a sample acquisition module, used for acquiring a plurality of groups of training samples, where each group of training samples includes a video sample, a video category sample, a video cover sample, and a historical interesting image sample; and a model training module, used for training an image processing model to be trained according to the video sample, the video category sample, the video cover sample, and the historical interesting image sample so as to acquire the image processing model.
In one embodiment of the present disclosure, the image processing model to be trained includes a first training branch, a second training branch, and a third training branch; the model training module comprises: the second feature extraction unit is used for carrying out feature extraction on the historical interesting image sample through the first training branch so as to obtain interest distribution information description corresponding to the historical interesting image sample; extracting features of the video sample through the second training branch, and acquiring recommended image information corresponding to the video sample based on the extracted features and the interest distribution feature information; extracting characteristics of the video sample through the third training branch so as to obtain category information corresponding to the video sample; a loss function determining unit, configured to determine a first loss function according to the recommended image information and the video cover sample, and determine a second loss function according to the category information and the video category sample; and the parameter adjusting unit is used for adjusting parameters of the image processing model to be trained so as to minimize the weighted sum of the first loss function and the second loss function and acquire the image processing model.
In one embodiment of the present disclosure, the first training branch includes a first convolution unit to be trained and a second bi-directional gating cycle unit to be trained; the second feature extraction unit is configured to: extracting features of the historical interesting image samples through a convolution layer in the first convolution unit to be trained to obtain interesting distribution feature information corresponding to the historical interesting image samples; and inputting the interest distribution characteristic information corresponding to the historical interest image sample into the second bidirectional gating circulation unit to be trained, and processing the interest distribution characteristic information through the second bidirectional gating circulation unit to be trained so as to acquire the interest distribution information description.
In one embodiment of the present disclosure, the second training branch includes a second convolution unit to be trained, a first bi-directional gating cycle unit to be trained, a first full connection layer to be trained, and a first normalization layer to be trained; the second feature extraction unit includes: the first feature extraction subunit is used for extracting features of the image samples in the video samples through the convolution layer in the second convolution unit to be trained so as to acquire feature information of the image samples; the second feature extraction subunit is used for inputting the feature information of the image sample and the interest distribution information description into the first bidirectional gating circulation unit to be trained, and extracting features of the feature information of the image sample and the interest distribution information description through the first bidirectional gating circulation unit to be trained so as to acquire an interest distribution weight; the weighting unit is used for acquiring a plurality of weighted characteristic information according to the image sample characteristic information and the interest distribution weight; and the post-processing unit is used for carrying out nonlinear transformation processing on the weighted characteristic information through the first full-connection layer to be trained, and carrying out normalization processing on the weighted characteristic information after the nonlinear transformation processing through the first normalization layer to be trained so as to acquire the recommended image information.
In one embodiment of the present disclosure, the first feature extraction subunit is configured to: determining a plurality of target image samples from the image set samples according to a preset rule; and extracting the characteristics of each target image sample through a convolution layer in the second convolution unit to be trained so as to acquire the characteristic information of the image sample.
In one embodiment of the present disclosure, the third training branch includes a second full connection layer to be trained and a second normalization layer to be trained; the second feature extraction unit is configured to: acquiring the characteristic information of the image sample output by the second convolution unit to be trained; adding and averaging the image sample feature information to obtain a video feature description corresponding to the video sample; and carrying out nonlinear transformation processing on the video feature description through the second full-connection layer to be trained, and carrying out normalization processing on the video feature description after the nonlinear transformation processing through the second normalization layer to be trained so as to acquire category information corresponding to the video sample.
Fig. 13 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
It should be noted that, the computer system 1300 of the electronic device shown in fig. 13 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 13, the computer system 1300 includes a central processing unit (Central Processing Unit, CPU) 1301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 1302 or a program loaded from a storage portion 1308 into a random access memory (Random Access Memory, RAM) 1303, implementing the image processing method described in the above embodiments. In the RAM 1303, various programs and data required for system operation are also stored. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (Input/Output, I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a cathode ray tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage portion 1308 including a hard disk or the like; and a communication section 1309 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1309 performs communication processing via a network such as the internet. The drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 1310 as needed, so that a computer program read therefrom is installed into the storage portion 1308 as needed.
In particular, according to embodiments of the present disclosure, the processes described below with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1309 and/or installed from the removable medium 1311. When executed by a Central Processing Unit (CPU) 1301, performs the various functions defined in the system of the present disclosure.
It should be noted that, the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present disclosure also provides a computer-readable medium that may be contained in the image processing apparatus described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An image processing method, comprising:
acquiring a to-be-processed image set and a historical image set, wherein the historical image set comprises historical interesting images of a user;
extracting features of the historical interesting images through an image processing model, and acquiring historical interesting information description based on the extracted features; simultaneously, extracting features of the images to be processed in the image set to be processed through the image processing model so as to obtain feature information corresponding to the images to be processed;
and determining the coincidence degree between the characteristic information and the historical interest information description according to the characteristic information and the historical interest information description, taking the coincidence degree as an interest weight corresponding to the image to be processed, and determining a cover image from the image set to be processed based on the characteristic information corresponding to the image to be processed and the interest weight.
2. The image processing method according to claim 1, wherein the image processing model comprises a first bidirectional gated recurrent unit;
the determining of the degree of coincidence between the feature information and the historical interest information description comprises:
inputting the feature information and the historical interest information description into the first bidirectional gated recurrent unit, and performing feature extraction on the feature information and the historical interest information description through the first bidirectional gated recurrent unit;
and determining the degree of coincidence between the feature information and the historical interest information description according to the features corresponding to the feature information and the features corresponding to the historical interest information description.
3. The image processing method according to claim 1, wherein the image processing model comprises a first fully-connected layer and a first normalization layer;
the determining of the cover image from the image set to be processed based on the feature information corresponding to the image to be processed and the interest weight comprises:
weighting the feature information according to the interest weight to obtain interest point feature information;
inputting the interest point feature information into the first fully-connected layer, and performing nonlinear transformation processing on the interest point feature information through the first fully-connected layer to obtain fully-connected feature information corresponding to the image to be processed;
inputting the fully-connected feature information into the first normalization layer, and normalizing the fully-connected feature information through the first normalization layer to obtain a probability value corresponding to the image to be processed;
and acquiring the image to be processed with the largest probability value, and taking that image as the cover image.
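The steps of claim 3 — weight, fully-connected transform, normalization, argmax — can be sketched end to end. The ReLU activation and the toy parameters below are assumptions; the claim only requires some nonlinear transformation followed by normalization to probabilities.

```python
import math

def interest_weighted(features, weights):
    # Weight each candidate's feature vector by its interest weight.
    return [[w * x for x in f] for f, w in zip(features, weights)]

def fc_logit(x, w, b):
    # One scalar logit per image: f(w·x + b), with ReLU as the (assumed)
    # activation standing in for the fully-connected layer.
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def softmax(z):
    # Normalization layer: turn logits into probability values.
    e = [math.exp(v - max(z)) for v in z]
    return [v / sum(e) for v in e]

features = [[0.2, 0.4], [0.9, 0.3], [0.1, 0.1]]
weights = [0.5, 1.0, 0.2]              # interest weights from claim 1
w, b = [1.0, 1.0], 0.0                 # toy fully-connected parameters
logits = [fc_logit(x, w, b) for x in interest_weighted(features, weights)]
probs = softmax(logits)
cover_index = probs.index(max(probs))  # image with the largest probability
print(cover_index)  # → 1
```

Note the probabilities sum to 1 by construction, so "largest probability value" is well defined across the candidate set.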
4. The image processing method according to claim 3, wherein the performing of the nonlinear transformation processing on the interest point feature information through the first fully-connected layer comprises:
performing nonlinear transformation processing on the interest point feature information according to formula (1):

Z = f(w_1 X + b_1)    (1)

wherein f(·) is an activation function, w_1 is a weight matrix, b_1 is a bias constant, X is the interest point feature information, and Z is the fully-connected feature information;
and the normalizing of the fully-connected feature information through the first normalization layer comprises:
normalizing the fully-connected feature information according to formula (2):

σ(z)_j = exp(z_j) / Σ_{k=1..K} exp(z_k)    (2)

wherein z_j is the j-th piece of sub-feature information in the fully-connected feature information, K is the total number of pieces of sub-feature information contained in the fully-connected feature information, and σ(z)_j is the probability value corresponding to the image to be processed.
5. The image processing method according to claim 1, wherein the image processing model comprises a first convolution unit and a second bidirectional gated recurrent unit;
the performing of feature extraction on the historical interest images through the image processing model, and the acquiring of the historical interest information description based on the extracted features, comprise:
performing feature extraction on the historical interest images through the first convolution unit to obtain interest image feature information;
and inputting the interest image feature information into the second bidirectional gated recurrent unit, and performing feature extraction on the interest image feature information through the second bidirectional gated recurrent unit to acquire the historical interest information description.
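The bidirectional aggregation in claim 5 can be illustrated with a drastically simplified recurrent unit: a plain exponential-smoothing update stands in for a gated recurrent unit (a real GRU has learned update/reset gates, e.g. `torch.nn.GRU(..., bidirectional=True)`), and the sequence is scanned in both directions.

```python
def scan(seq, h0=0.0, alpha=0.5):
    # Toy recurrent update h_t = alpha*h_{t-1} + (1-alpha)*x_t, standing in
    # for a gated recurrent unit over per-image feature summaries.
    h = h0
    states = []
    for x in seq:
        h = alpha * h + (1 - alpha) * x
        states.append(h)
    return states

def bidirectional_description(image_features):
    # Concatenate the forward and backward final states, as a bidirectional
    # recurrent layer does, to summarize the historical interest images
    # into one "historical interest information description".
    fwd = scan(image_features)
    bwd = scan(list(reversed(image_features)))
    return [fwd[-1], bwd[-1]]

print(bidirectional_description([1.0, 2.0, 3.0]))  # → [2.125, 1.375]
```

The backward pass lets the description reflect early history as strongly as recent history, which is the usual motivation for choosing a bidirectional unit here.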
6. The image processing method according to claim 1, wherein the image processing model comprises a second convolution unit;
the performing of feature extraction on the images to be processed in the image set to be processed through the image processing model to obtain the feature information corresponding to the images to be processed comprises:
acquiring a plurality of target images to be processed from the image set to be processed according to a preset rule;
and performing feature extraction on each target image to be processed through the second convolution unit to acquire feature information corresponding to each target image to be processed.
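Claim 6 leaves the "preset rule" unspecified; one plausible instantiation, sketched here as an assumption, is uniform temporal sampling with a cap on the number of candidates.

```python
def sample_targets(frames, stride=3, max_targets=5):
    # One possible "preset rule": take every `stride`-th frame, capped at
    # `max_targets` candidate images (assumed rule; the claim does not fix it).
    return frames[::stride][:max_targets]

print(sample_targets(list(range(20))))  # → [0, 3, 6, 9, 12]
```

Capping the candidate count keeps the per-video cost of the convolution unit bounded regardless of video length.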
7. The image processing method according to claim 1, characterized in that the method further comprises:
acquiring a video to be processed, and parsing the video to be processed to acquire image frames;
and forming the image set to be processed from all the image frames in the video to be processed.
8. The image processing method according to claim 1 or 7, characterized in that the method further comprises:
obtaining a plurality of groups of training samples, wherein each group of training samples comprises a video sample, a video category sample, a video cover sample, and a historical interest image sample;
and training an image processing model to be trained according to the video sample, the video category sample, the video cover sample, and the historical interest image sample to obtain the image processing model.
9. The image processing method according to claim 8, wherein the image processing model to be trained comprises a first training branch, a second training branch, and a third training branch;
the training of the image processing model to be trained according to the video sample, the video category sample, the video cover sample, and the historical interest image sample to obtain the image processing model comprises:
performing feature extraction on the historical interest image sample through the first training branch to obtain an interest distribution information description corresponding to the historical interest image sample; performing feature extraction on the video sample through the second training branch, and acquiring recommended image information corresponding to the video sample based on the extracted features and the interest distribution information description; and performing feature extraction on the video sample through the third training branch to obtain category information corresponding to the video sample;
determining a first loss function according to the recommended image information and the video cover sample, and determining a second loss function according to the category information and the video category sample;
and adjusting parameters of the image processing model to be trained so as to minimize a weighted sum of the first loss function and the second loss function, thereby obtaining the image processing model.
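The joint objective of claim 9 — minimizing a weighted sum of the cover-recommendation loss and the category loss — can be shown with a toy one-parameter example. The quadratic losses, learning rate, and trade-off weight `lam` are all illustrative assumptions standing in for the real branch losses.

```python
def train_step(p, lr=0.1, lam=0.5):
    # Gradient step on L(p) = L1(p) + lam*L2(p), where L1 = (p - 1)^2 is a
    # toy stand-in for the cover loss and L2 = (p - 3)^2 for the category
    # loss; lam is the assumed weighting between the two branches.
    grad = 2 * (p - 1) + lam * 2 * (p - 3)
    return p - lr * grad

p = 0.0
for _ in range(200):
    p = train_step(p)
print(round(p, 3))  # → 1.667
```

The parameter settles at 5/3, between the two single-loss optima (1 and 3), closer to the cover objective because it carries the larger weight; that compromise is exactly what minimizing a weighted sum of the two losses produces.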
10. The image processing method according to claim 9, wherein the first training branch comprises a first convolution unit to be trained and a second bidirectional gated recurrent unit to be trained;
the performing of feature extraction on the historical interest image sample through the first training branch to obtain the interest distribution information description corresponding to the historical interest image sample comprises:
performing feature extraction on the historical interest image sample through a convolution layer in the first convolution unit to be trained to obtain interest distribution feature information corresponding to the historical interest image sample;
and inputting the interest distribution feature information corresponding to the historical interest image sample into the second bidirectional gated recurrent unit to be trained, and processing the interest distribution feature information through the second bidirectional gated recurrent unit to be trained to acquire the interest distribution information description.
11. The image processing method according to claim 10, wherein the second training branch comprises a second convolution unit to be trained, a first bidirectional gated recurrent unit to be trained, a first fully-connected layer to be trained, and a first normalization layer to be trained;
the performing of feature extraction on the video sample through the second training branch, and the acquiring of the recommended image information corresponding to the video sample based on the extracted features and the interest distribution information description, comprise:
performing feature extraction on the image samples in the video sample through a convolution layer in the second convolution unit to be trained to obtain image sample feature information;
inputting the image sample feature information and the interest distribution information description into the first bidirectional gated recurrent unit to be trained, and performing feature extraction on the image sample feature information and the interest distribution information description through the first bidirectional gated recurrent unit to be trained to acquire interest distribution weights;
acquiring a plurality of pieces of weighted feature information according to the image sample feature information and the interest distribution weights;
and performing nonlinear transformation processing on the weighted feature information through the first fully-connected layer to be trained, and normalizing the nonlinearly transformed weighted feature information through the first normalization layer to be trained to acquire the recommended image information.
12. The image processing method according to claim 11, wherein the third training branch comprises a second fully-connected layer to be trained and a second normalization layer to be trained;
the performing of feature extraction on the video sample through the third training branch to obtain the category information corresponding to the video sample comprises:
acquiring the image sample feature information output by the second convolution unit to be trained;
adding the image sample feature information and averaging the sum to obtain a video feature description corresponding to the video sample;
and performing nonlinear transformation processing on the video feature description through the second fully-connected layer to be trained, and normalizing the nonlinearly transformed video feature description through the second normalization layer to be trained to acquire the category information corresponding to the video sample.
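The add-and-average pooling in claim 12 collapses the per-frame features into one video-level description; a minimal sketch (illustrative dimensions, not the patented implementation):

```python
def video_description(frame_features):
    # Claim 12: sum the per-frame feature vectors element-wise and average,
    # yielding a single video-level feature description.
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

print(video_description([[1.0, 2.0], [3.0, 4.0]]))  # → [2.0, 3.0]
```

Averaging (rather than concatenating) keeps the description's dimensionality fixed regardless of how many frames the video contains, so the downstream fully-connected layer can have a fixed input size.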
13. An image processing apparatus, comprising:
an acquisition module, configured to acquire an image set to be processed and a historical image set, wherein the historical image set comprises historical interest images of a user;
a feature extraction module, configured to perform feature extraction on the historical interest images through an image processing model and acquire a historical interest information description based on the extracted features, and to perform feature extraction on the images to be processed in the image set to be processed through the image processing model to obtain feature information corresponding to the images to be processed;
and an image determination module, configured to determine a degree of coincidence between the feature information and the historical interest information description, take the degree of coincidence as an interest weight corresponding to the image to be processed, and determine a cover image from the image set to be processed based on the feature information corresponding to the image to be processed and the interest weight.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the image processing method of any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image processing method of any one of claims 1 to 12.
CN201911422014.1A 2019-12-31 2019-12-31 Image processing method, device, computer storage medium and electronic equipment Active CN111191059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911422014.1A CN111191059B (en) 2019-12-31 2019-12-31 Image processing method, device, computer storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN111191059A CN111191059A (en) 2020-05-22
CN111191059B true CN111191059B (en) 2023-05-05

Family

ID=70708038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911422014.1A Active CN111191059B (en) 2019-12-31 2019-12-31 Image processing method, device, computer storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111191059B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898032B (en) * 2020-08-13 2024-04-30 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN113688266A (en) * 2021-09-13 2021-11-23 上海联影医疗科技股份有限公司 Image priority determining method, image processing device, image processing apparatus, and recording medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615613A (en) * 2014-04-30 2015-05-13 北京大学 Global feature descriptor polymerization method
CN106408469A (en) * 2015-08-03 2017-02-15 天脉聚源(北京)科技有限公司 Course offline downloading and extraction method and system thereof
CN107911719A (en) * 2017-10-30 2018-04-13 中国科学院自动化研究所 Video Dynamic recommendation device
CN108682007A (en) * 2018-04-28 2018-10-19 华中师范大学 Jpeg image resampling automatic testing method based on depth random forest
CN108833942A (en) * 2018-06-28 2018-11-16 北京达佳互联信息技术有限公司 Video cover choosing method, device, computer equipment and storage medium
CN110019933A (en) * 2018-01-02 2019-07-16 阿里巴巴集团控股有限公司 Video data handling procedure, device, electronic equipment and storage medium
CN110442788A (en) * 2019-07-23 2019-11-12 北京奇艺世纪科技有限公司 A kind of information recommendation method and device
CN110503047A (en) * 2019-08-26 2019-11-26 西南交通大学 A kind of rds data processing method and processing device based on machine learning
CN110555435A (en) * 2019-09-10 2019-12-10 深圳一块互动网络技术有限公司 Point-reading interaction realization method
CN110572711A (en) * 2019-09-27 2019-12-13 北京达佳互联信息技术有限公司 Video cover generation method and device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2014240213B2 (en) * 2014-09-30 2016-12-08 Canon Kabushiki Kaisha System and Method for object re-identification
WO2017088340A1 (en) * 2015-11-25 2017-06-01 腾讯科技(深圳)有限公司 Method and apparatus for processing image information, and computer storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A hybrid algorithm combining AutoEncoder and CNN for image feature extraction; Liu Xingwang et al.; Application Research of Computers; 2017-01-23; pp. 3839-3842, 3847 *

Also Published As

Publication number Publication date
CN111191059A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111930992B (en) Neural network training method and device and electronic equipment
CN111898696A (en) Method, device, medium and equipment for generating pseudo label and label prediction model
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
CN111046275B (en) User label determining method and device based on artificial intelligence and storage medium
CN112119388A (en) Training image embedding model and text embedding model
CN111563192B (en) Entity alignment method, device, electronic equipment and storage medium
CN111382283B (en) Resource category label labeling method and device, computer equipment and storage medium
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
US20230353828A1 (en) Model-based data processing method and apparatus
CN112074828A (en) Training image embedding model and text embedding model
CN116756574B (en) Training method, using method, device and equipment of multi-mode pre-training model
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN112131430A (en) Video clustering method and device, storage medium and electronic equipment
CN111191059B (en) Image processing method, device, computer storage medium and electronic equipment
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
CN114239730A (en) Cross-modal retrieval method based on neighbor sorting relation
CN116578729B (en) Content search method, apparatus, electronic device, storage medium, and program product
CN116935170A (en) Processing method and device of video processing model, computer equipment and storage medium
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN113568983B (en) Scene graph generation method and device, computer readable medium and electronic equipment
CN111914201B (en) Processing method and device of network page
CN111091198B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant