CN112766285B - Image sample generation method and device and electronic equipment - Google Patents

Image sample generation method and device and electronic equipment

Info

Publication number
CN112766285B
CN112766285B (application CN202110107446.4A)
Authority
CN
China
Prior art keywords
image
image sample
saliency
click rate
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110107446.4A
Other languages
Chinese (zh)
Other versions
CN112766285A (en)
Inventor
周杰
刘畅
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110107446.4A priority Critical patent/CN112766285B/en
Publication of CN112766285A publication Critical patent/CN112766285A/en
Application granted granted Critical
Publication of CN112766285B publication Critical patent/CN112766285B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

Embodiments of the invention disclose an image sample generation method, an image sample generation device and electronic equipment. One embodiment of the method comprises the following steps: importing a first image sample into a pre-trained classification model, where the classification model comprises at least one feature extraction layer and is used to characterize the correspondence between images and predefined types; determining a salient region in the first image sample according to the feature map output by the feature extraction layer; and establishing a correspondence between a saliency region label indicating the salient region and the first image sample, where the saliency region label and the first image sample are used to train a saliency detection model. A new image sample generation method is thereby provided.

Description

Image sample generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of internet technology, and in particular to an image sample generation method and device, and electronic equipment.
Background
With the development of the internet, users increasingly use terminal devices to browse various kinds of information. When faced with natural scenes, the human visual system can quickly search for and locate objects of interest; this visual attention mechanism is an important mechanism for processing visual information in daily life. With the explosion of data brought by the internet, how to quickly obtain important information from massive image and video data has become a key problem in the field of computer vision.
Disclosure of Invention
This summary is provided to introduce, in simplified form, concepts that are further described in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides an image sample generation method, including: importing a first image sample into a pre-trained classification model, wherein the classification model comprises at least one feature extraction layer, and the classification model is used for representing the corresponding relation between the image and a predefined type; determining a salient region in the first image sample according to the feature map output by the feature extraction layer; and establishing a corresponding relation between the saliency region label indicating the saliency region and the first image sample, wherein the saliency region label and the first image sample are used for training to obtain a saliency detection model.
In a second aspect, an embodiment of the present disclosure provides an image sample generation apparatus, including: an importing unit, configured to import a first image sample into a pre-trained classification model, where the classification model includes at least one feature extraction layer, and the classification model is used to characterize a correspondence between an image and a predefined type; the determining unit is used for determining a salient region in the first image sample according to the feature map output by the feature extraction layer; the establishing unit is used for establishing a corresponding relation between the saliency area label indicating the saliency area and the first image sample, wherein the saliency area label and the first image sample are used for training to obtain a saliency detection model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image sample generation method as described in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of the image sample generation method according to the first aspect.
According to the image sample generation method and device and the electronic equipment provided by the embodiments, the first image sample is processed with the classification model, the salient region is determined on the basis of an intermediate result of the classification model (the feature map), and the feature extraction capability of the classification model is thereby used to determine the salient region in the first image sample. In other words, a classification model that has completed training can distinguish the primary content of an image from the secondary content and highlight the primary content during feature extraction. A method by which a computer automatically determines the salient region in an image can therefore be provided, reducing both the difficulty and the cost of generating samples for the saliency detection model.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of one embodiment of an image sample generation method according to the present disclosure;
FIG. 2 is a flow chart of an exemplary implementation according to the present disclosure;
FIG. 3 is a schematic illustration of an application scenario according to the present disclosure;
FIG. 4 is a schematic structural view of one embodiment of an image sample generation device according to the present disclosure;
FIG. 5 is an exemplary system architecture to which an image sample generation method of one embodiment of the present disclosure may be applied;
fig. 6 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Referring to fig. 1, a flow of one embodiment of an image sample generation method according to the present disclosure is shown. The image sample generation method as shown in fig. 1 includes the steps of:
step 101, a first image sample is imported into a pre-trained classification model.
In this embodiment, the execution subject (e.g., server) of the image sample generation method may import the first image sample into a pre-trained classification model.
In this embodiment, the classification model may include at least one feature extraction layer.
Here, the classification model may be used to classify the image imported into it, and the classification result may be one of the predefined types.
In this embodiment, a feature extraction layer may be used to extract image features. Here, the feature image output by the feature extraction layer may be referred to as a feature map.
In this embodiment, the classification model may be constructed based on a neural network. The classification model can comprise a feature extraction layer, a pooling layer, a fully connected layer and the like. The specific structure of the classification model can be set according to the actual application scene, and is not limited herein.
In this embodiment, the classification model is used to characterize the correspondence between the image and the predefined type.
In this embodiment, the predefined types are types defined in advance. Their specific content, number and representation can be set according to the actual application scenario and are not limited herein.
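For illustration only, and not as part of the claimed method, the structure described above might be realized as a small convolutional classifier whose feature extraction layers expose the feature map used later; the class name, layer sizes, and the binary (high/low click rate) output head below are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ClickRateClassifier(nn.Module):
    """Hypothetical classification model: convolutional feature extraction
    layers followed by a pooling layer and a fully connected layer."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(           # feature extraction layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)       # pooling layer
        self.fc = nn.Linear(64, num_classes)      # fully connected layer

    def forward(self, x):
        fmap = self.features(x)                   # feature map used later for saliency
        logits = self.fc(self.pool(fmap).flatten(1))
        return logits, fmap

# Importing a first image sample into the (here untrained) model:
model = ClickRateClassifier()
model.eval()
first_image_sample = torch.rand(1, 3, 224, 224)   # placeholder image tensor
with torch.no_grad():
    _, feature_map = model(first_image_sample)    # feature map output by the last extraction layer
```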
Step 102, determining a salient region in the first image sample according to the feature map output by the feature extraction layer.
In this embodiment, a feature map (feature map) may be used to characterize the features of the first image sample.
In some related art, visual saliency (visual attention mechanism, VA) refers to the way humans, when facing a scene, automatically process regions of interest, referred to as salient regions, while selectively ignoring regions that are not of interest. For pictures whose salient regions are unknown, saliency detection can be performed to predict the regions people are likely to be interested in and use them as salient regions.
And step 103, establishing a corresponding relation between the saliency area label indicating the saliency area and the first image sample.
In this embodiment, the above saliency region label may indicate a saliency region in the first image sample.
In this embodiment, the correspondence between the salient region label and the first image sample is established, which can be understood as adding the salient region label to the first image sample.
In this embodiment, the saliency region tag and the first image sample are used for training to obtain a saliency detection model.
It should be noted that, in the image sample generation method provided in this embodiment, the first image sample is processed with the classification model and the salient region is determined on the basis of an intermediate result of the classification model (the feature map), so that the feature extraction capability of the classification model is used to determine the salient region in the first image sample. In other words, a classification model that has completed training can distinguish the primary content of an image from the secondary content and highlight the primary content during feature extraction. A method by which a computer automatically determines the salient region in an image can therefore be provided, reducing both the difficulty and the cost of generating samples for the saliency detection model.
In some embodiments, the step 102 may include: determining a region with the response score rate larger than a preset response score rate threshold value from the feature map; from the determined regions, a salient region is determined.
Here, the pixel value of each pixel in the feature map may be referred to as a response score.
Here, the determined response score of the region may be a sum of response scores of pixels in the region.
Here, the response score rate may be a ratio between the response score of the region and the total score of the response in the feature map.
Here, the area where the response score rate meets the above requirement (greater than the preset response score rate threshold) may be determined in various manners.
Here, the number of the determined regions may be several, and then the salient region may be determined from among the several regions. For example, from a plurality of satisfactory regions, a region with the smallest area may be selected as the salient region.
It should be noted that determining the salient region from the feature map according to the response score keeps the area of the salient region as small as possible while ensuring that its response score remains relatively large; a salient region with as small an area as possible is thus obtained, which improves the accuracy of the salient region.
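As a rough illustration of the response-score idea, and not the exact region search of this embodiment, the following sketch greedily keeps the highest-response pixels of the feature map until their cumulative response score rate exceeds a threshold and returns their bounding box; the channel summation, threshold value, and greedy strategy are all assumptions.

```python
import numpy as np

def salient_region_from_feature_map(feature_map, score_rate_threshold=0.6):
    """Return a small bounding box whose summed response exceeds
    score_rate_threshold of the total response in the feature map.

    feature_map: (C, H, W) array output by the feature extraction layer.
    Returns (top, left, bottom, right) in feature-map coordinates.
    """
    response = feature_map.sum(axis=0)             # per-pixel response score
    total_score = response.sum()
    # Rank pixels by response and keep the smallest set whose cumulative
    # response score rate exceeds the threshold (a greedy approximation).
    flat = response.ravel()
    order = np.argsort(flat)[::-1]
    cumulative = np.cumsum(flat[order])
    k = int(np.searchsorted(cumulative, score_rate_threshold * total_score)) + 1
    rows, cols = np.unravel_index(order[:k], response.shape)
    return rows.min(), cols.min(), rows.max() + 1, cols.max() + 1

# Example on a random stand-in for a feature map:
fmap = np.random.rand(64, 14, 14).astype(np.float32)
print(salient_region_from_feature_map(fmap))
```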
In some embodiments, the classification model may be trained through a first step.
In this embodiment, the first step may include step 201 and step 202.
Step 201, the second image sample is imported into the initial classification model to obtain a classification result.
Here, the initial classification model may be constructed according to an actual application scenario. The initial classification model may output a classification result for the second image sample.
Here, the initial classification model may be a neural network that has not been trained or whose training has not been completed.
Here, the second image sample has a type tag, which may indicate an image click rate.
Alternatively, the image click rate may be represented as a continuous number or as a click rate level. For example, two levels, high click rate and low click rate, may be used as type tags, and high and low may be divided on various bases.
Step 202, adjusting an initial classification model based on the classification result and the type label of the second image sample.
Here, the type tag of the second image sample may indicate an image click rate.
Here, the determination of the image click rate may be performed in various ways, which are not limited herein. For example, the image click rate may be determined from historical search presentation results.
It should be noted that training the initial classification model with type labels that indicate image click rate exploits the association between image click rate and user attention, and improves the classification model's ability to extract features from the image regions users pay attention to. In other words, the purpose of salient region detection is to find image regions that users may be interested in, and user click rate reflects, to a certain extent, how much attention users pay to an image. A classification model trained with type labels indicating image click rate can therefore accurately extract the regions users are likely to attend to, which improves the accuracy of salient region determination.
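A minimal training-loop sketch for the first step, assuming the two-output classifier from the earlier sketch and a data loader yielding (second image sample, type label) pairs; the loss, optimizer, and hyperparameters are placeholders rather than values specified by this embodiment.

```python
import torch
import torch.nn as nn

# Hypothetical type labels: 1 = high click rate, 0 = low click rate.
def train_classification_model(model, loader, epochs=5, lr=1e-3):
    """Adjust the initial classification model based on the classification
    result and the click-rate type label (standard supervised loop)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for second_image_sample, type_label in loader:
            logits, _ = model(second_image_sample)   # classification result
            loss = criterion(logits, type_label)     # compare with type label
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```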
In some embodiments, the second image sample and the corresponding type tag may be generated by a second step. Here, the second step may include: leading the candidate images into a click model to obtain estimated click rate; and determining a second image sample from the candidate images according to the estimated click rate, and generating a type label corresponding to the second image sample.
Here, the number of candidate images may be plural. The click model can obtain the estimated click rate corresponding to each candidate image according to each candidate image.
Here, a click model (Click Model) models users' click behavior: based on users' historical click information, their interests and behavior are modeled in order to predict their future click behavior and improve relevance.
It should be noted that using the click model to select the second image samples and generating their type labels from the estimated click rate makes use of the click model's ability to predict user attention, so that candidate images that have never been displayed can also be labeled, which expands the range of candidate images. Moreover, selecting the second image samples and generating the type labels with the click model reduces manual work, speeds up processing, and saves time and labor cost.
In some embodiments, the determining the second image sample from the candidate image according to the estimated click rate and generating the type tag corresponding to the second image sample may include: determining candidate images with estimated click rate larger than a first click rate threshold as second candidate images corresponding to the first type labels; and determining the candidate images with estimated click rate smaller than a second click rate threshold as second candidate images corresponding to the second type labels.
Here, the first click rate threshold may be not less than the second click rate threshold. In other words, the first click rate threshold may be greater than or equal to the second click rate threshold. The determination of the first click rate threshold and the second click rate threshold may be set according to an actual application scenario, which is not limited herein. As an example, candidate images corresponding to an estimated click rate greater than the first click rate threshold may account for 30% of the total number of candidate images; the candidate images corresponding to the estimated click rate less than the second click rate threshold may account for 30% of the total number of candidate images.
Here, the first type of tag may indicate a higher click rate. The second type of tag may indicate a lower click rate.
It should be noted that setting click rate thresholds and using only two types of labels keeps the second candidate images and their corresponding type labels simple, which reduces the difficulty of training the classification model with them while preserving the accuracy of the trained classification model.
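A minimal sketch of the second step under the assumption that the click model is a callable returning an estimated click rate in [0, 1]; the threshold values and label encoding are placeholders.

```python
# Hypothetical label encoding for the two click-rate levels.
HIGH_CLICK_LABEL, LOW_CLICK_LABEL = 1, 0

def label_candidates(candidates, click_model, first_threshold=0.7, second_threshold=0.3):
    """Keep candidates whose estimated click rate is above the first
    threshold (first type label) or below the second threshold (second
    type label); candidates in between are discarded."""
    assert first_threshold >= second_threshold
    second_samples = []
    for image in candidates:
        estimated_ctr = click_model(image)      # estimated click rate in [0, 1]
        if estimated_ctr > first_threshold:
            second_samples.append((image, HIGH_CLICK_LABEL))
        elif estimated_ctr < second_threshold:
            second_samples.append((image, LOW_CLICK_LABEL))
    return second_samples
```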
In some embodiments, the above method may further comprise: importing an image to be processed into a pre-trained saliency detection model to obtain saliency region indication information; based on the saliency region indication information, cutting the image to be processed to obtain a cut image corresponding to the image to be processed.
Here, the saliency detection model may be trained using the first image sample and the saliency region labels.
Here, the above-described saliency region instruction information may indicate a saliency region in an image to be processed.
Here, the image to be processed may be cropped based on the saliency region instruction information, to obtain a cropped image.
It should be noted that, since the saliency detection model is trained on the first image samples and the saliency region labels, using the salient region determined by the saliency detection model improves the accuracy of the determined salient region.
In some embodiments, the image to be processed may be cropped based on the saliency region instruction information in various ways.
In some embodiments, the clipping the image to be processed based on the saliency region indication information to obtain a clipped image corresponding to the image to be processed may include: determining a clipping reserved area according to the size of the salient area indicated by the salient area information and the target size; and cutting the image to be processed according to the determined cutting reserved area.
Here, the target size may be a size of a desired processed image.
Here, clipping a reserved area may refer to an area that needs to be reserved when clipping an image to be processed.
Optionally, if the salient region is larger than the target size, cropping may start from the center of the salient region and use the target size as the boundary to obtain the cropped image.
Optionally, if the salient region is not larger than the target size, the salient region may be padded at its edges up to the target size to obtain the cropped image.
It should be noted that determining the clipping reserved area from the size of the salient region and the target size allows an appropriate reserved area to be set for both, so that the target size requirement is met and the clipping accuracy for the image to be processed is improved.
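A minimal cropping sketch, assuming the image is an (H, W, C) array and the salient region is a (top, left, bottom, right) box; centring the clipping reserved area on the salient region and clamping it to the image border are policy assumptions of this sketch, and padding for images smaller than the target size is not shown.

```python
import numpy as np

def crop_to_target(image, region, target_h, target_w):
    """Crop `image` so that the salient `region` is retained in a
    target-size clipping reserved area centred on the region."""
    img_h, img_w = image.shape[:2]
    top, left, bottom, right = region
    center_y, center_x = (top + bottom) // 2, (left + right) // 2
    # Place a target-size window around the region centre, then clamp it
    # to the image border (padding for undersized images not shown).
    y0 = int(np.clip(center_y - target_h // 2, 0, max(img_h - target_h, 0)))
    x0 = int(np.clip(center_x - target_w // 2, 0, max(img_w - target_w, 0)))
    return image[y0:y0 + target_h, x0:x0 + target_w]
```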
In some embodiments, the significance detection model may be trained by a third step. The third step may include: leading the first image sample into an initial significance detection model to obtain a detection result; and adjusting an initial saliency detection model according to the saliency region label corresponding to the first image sample and the detection result.
Here, the initial saliency detection model may be a neural network that has not been trained or whose training has not been completed. The specific structure of the initial saliency detection model may be set according to the actual application scenario and is not limited herein.
Here, the detection result may indicate a prediction significance region.
Here, a loss between the saliency region labels and the detection results may be calculated, and parameters of the initial saliency detection model may be adjusted according to the loss.
It should be noted that the saliency detection model is trained with the first image samples; because the first image samples have high label accuracy and low sample cost, a large number of them can be obtained for training, so model accuracy is improved both by the high accuracy of each individual sample and by the larger number of samples.
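A minimal sketch of the third step, assuming the saliency region label is represented as a pixel-wise mask and the initial saliency detection model outputs a map of the same size; the binary cross-entropy loss and other hyperparameters are assumptions, not requirements of this embodiment.

```python
import torch
import torch.nn as nn

def train_saliency_model(model, loader, epochs=5, lr=1e-3):
    """Adjust the initial saliency detection model from the detection
    result and the saliency region label (here a binary mask)."""
    criterion = nn.BCEWithLogitsLoss()            # loss between label and detection result
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for first_image_sample, saliency_mask in loader:
            prediction = model(first_image_sample)    # detection result (predicted salient region)
            loss = criterion(prediction, saliency_mask)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```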
In some application scenarios, please refer to fig. 3, which illustrates an application scenario according to an embodiment of the present application.
The click model may process the candidate image to obtain a second image sample and a type tag for the second image sample.
The second image sample and the corresponding type label may be used to train an initial classification model to obtain a classification model.
The classification model may be used to process the first image sample to obtain a salient region label.
The first image sample and the saliency region label can be used for training an initial saliency detection model to obtain a saliency detection model.
The saliency detection model can be used for processing the image to be processed to obtain saliency region indication information of the image to be processed.
Based on the saliency region indication information, the image to be processed can be cut, and a cut image is obtained.
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an image sample generating apparatus, which corresponds to the method embodiment shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the image sample generation apparatus of this embodiment includes an importing unit 401, a determining unit 402 and an establishing unit 403. The importing unit is configured to import a first image sample into a pre-trained classification model, where the classification model comprises at least one feature extraction layer and is used to characterize the correspondence between an image and a predefined type; the determining unit is configured to determine a salient region in the first image sample according to the feature map output by the feature extraction layer; the establishing unit is configured to establish a correspondence between a saliency region label indicating the salient region and the first image sample, where the saliency region label and the first image sample are used for training to obtain a saliency detection model.
In this embodiment, the specific processes of the importing unit 401, the determining unit 402, and the establishing unit 403 of the image sample generating device and the technical effects thereof may refer to the descriptions related to the steps 101, 102, and 103 in the corresponding embodiment of fig. 1, and are not repeated here.
In some embodiments, the determining the salient region in the first image sample according to the feature map output by the feature extraction layer includes: determining a region with a response score rate greater than a preset response score rate threshold from the feature map, wherein the response score rate is the ratio between the response score of the region and the total response score in the feature map; from the determined regions, a salient region is determined.
In some embodiments, the classification model is trained by a first step comprising: leading the second image sample into an initial classification model to obtain a classification result; and adjusting the initial classification model based on the classification result and a type label of the second image sample, wherein the type label is used for indicating the click rate of the image.
In some embodiments, the second image sample and the corresponding type tag are generated by a second step, wherein the second step comprises: importing the candidate images into a click model to obtain a predicted click rate, wherein the click model is used for representing the corresponding relation between the images and the predicted click rate; and determining the second image sample from the candidate images according to the estimated click rate, and generating a type label corresponding to the second image sample.
In some embodiments, the determining the second image sample from the candidate image according to the estimated click rate, and generating the type tag corresponding to the second image sample, includes: determining candidate images with estimated click rate larger than a first click rate threshold as second candidate images corresponding to the first type labels; and determining the candidate images with estimated click rate smaller than a second click rate threshold as second candidate images corresponding to the second type of labels, wherein the first click rate threshold is not smaller than the second click rate threshold.
In some embodiments, the apparatus is further to: importing an image to be processed into a pre-trained saliency detection model to obtain saliency region indication information, wherein the saliency detection model is obtained by training a first image sample and a saliency region label; based on the saliency region indication information, cutting the image to be processed to obtain a cut image corresponding to the image to be processed.
In some embodiments, based on the saliency region indication information, cropping the image to be processed to obtain a cropped image corresponding to the image to be processed, including: determining a clipping reserved area according to the size of the salient area indicated by the salient area indication information and the target size; and cutting the image to be processed according to the determined cutting reserved area.
In some embodiments, the significance detection model is trained by a third step, wherein the third step comprises: leading the first image sample into an initial significance detection model to obtain a detection result; and adjusting an initial saliency detection model according to the saliency region label corresponding to the first image sample and the detection result.
Referring to fig. 5, fig. 5 illustrates an exemplary system architecture in which an image sample generation method of an embodiment of the present disclosure may be applied.
As shown in fig. 5, the system architecture may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 501, 502, 503 may interact with the server 505 via the network 504 to receive or send messages or the like. Various client applications, such as a web browser application, a search class application, a news information class application, may be installed on the terminal devices 501, 502, 503. The client application in the terminal device 501, 502, 503 may receive the instruction of the user and perform the corresponding function according to the instruction of the user, for example, adding the corresponding information in the information according to the instruction of the user.
The terminal devices 501, 502, 503 may be hardware or software. When the terminal devices 501, 502, 503 are hardware, they may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 501, 502, 503 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 505 may be a server that provides various services, for example one that receives information acquisition requests sent by the terminal devices 501, 502, 503, obtains, in various ways, the presentation information corresponding to those requests, and sends data related to the presentation information to the terminal devices 501, 502, 503.
It should be noted that the image sample generation method provided by the embodiment of the present disclosure may be performed by a terminal device, and accordingly, the image sample generation apparatus may be provided in the terminal devices 501, 502, 503. In addition, the image sample generation method provided by the embodiment of the present disclosure may also be performed by the server 505, and accordingly, the image sample generation apparatus may be provided in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to fig. 6, a schematic diagram of a configuration of an electronic device (e.g., a terminal device or server in fig. 5) suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 6, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: importing a first image sample into a pre-trained classification model, wherein the classification model comprises at least one feature extraction layer, and the classification model is used for representing the corresponding relation between the image and a predefined type; determining a salient region in the first image sample according to the feature map output by the feature extraction layer; and establishing a corresponding relation between the saliency region label indicating the saliency region and the first image sample, wherein the saliency region label and the first image sample are used for training to obtain a saliency detection model.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not in some cases constitute a limitation of the unit itself; for example, the importing unit may also be described as "a unit that imports the first image sample into the pre-trained classification model".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Products (ASSP), Systems on a Chip (SOC), Complex Programmable Logic Devices (CPLD), and so forth.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved herein is not limited to technical solutions formed by the specific combinations of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) this disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (8)

1. An image sample generation method, comprising:
importing a first image sample into a pre-trained classification model, wherein the classification model comprises at least one feature extraction layer, and the classification model is used for representing the corresponding relation between the image and a predefined type;
determining a salient region in the first image sample according to the feature map output by the feature extraction layer;
establishing a corresponding relation between a salient region label indicating the salient region and the first image sample, wherein the salient region label and the first image sample are used for training to obtain a salient detection model;
the classification model is obtained through training in a first step, wherein the first step comprises the following steps:
leading the second image sample into an initial classification model to obtain a classification result;
adjusting the initial classification model based on the classification result and a type label of the second image sample, wherein the type label is used for indicating the click rate of the image;
the second image sample and the corresponding type tag are generated by a second step, wherein the second step comprises:
importing the candidate images into a click model to obtain a predicted click rate, wherein the click model is used for representing the corresponding relation between the images and the predicted click rate;
determining candidate images with estimated click rate larger than a first click rate threshold as second candidate images corresponding to the first type labels;
and determining the candidate images with estimated click rate smaller than a second click rate threshold as second candidate images corresponding to the second type of labels, wherein the first click rate threshold is not smaller than the second click rate threshold.
2. The method of claim 1, wherein determining the salient region in the first image sample from the feature map output by the feature extraction layer comprises:
determining a region with a response score rate greater than a preset response score rate threshold from the feature map, wherein the response score rate is the ratio between the response score of the region and the total response score in the feature map;
from the determined regions, a salient region is determined.
3. The method according to claim 1, wherein the method further comprises:
importing an image to be processed into a pre-trained saliency detection model to obtain saliency region indication information, wherein the saliency detection model is obtained by training a first image sample and a saliency region label;
based on the saliency region indication information, cutting the image to be processed to obtain a cut image corresponding to the image to be processed.
4. The method according to claim 3, wherein cropping the image to be processed based on the saliency region instruction information to obtain a cropped image corresponding to the image to be processed, includes:
determining a clipping reserved area according to the size of the salient area indicated by the salient area indication information and the target size;
and cutting the image to be processed according to the determined cutting reserved area.
5. A method according to claim 3, wherein the significance detection model is trained by a third step, wherein the third step comprises:
leading the first image sample into an initial significance detection model to obtain a detection result;
and adjusting an initial saliency detection model according to the saliency region label corresponding to the first image sample and the detection result.
6. An image sample generation apparatus, comprising:
an importing unit, configured to import a first image sample into a pre-trained classification model, where the classification model includes at least one feature extraction layer, and the classification model is used to characterize a correspondence between an image and a predefined type;
the determining unit is used for determining a salient region in the first image sample according to the feature map output by the feature extraction layer;
the establishing unit is used for establishing a corresponding relation between the saliency area label indicating the saliency area and the first image sample, wherein the saliency area label and the first image sample are used for training to obtain a saliency detection model;
the classification model is obtained through training in a first step, wherein the first step comprises the following steps:
leading the second image sample into an initial classification model to obtain a classification result;
adjusting the initial classification model based on the classification result and a type label of the second image sample, wherein the type label is used for indicating the click rate of the image;
the second image sample and the corresponding type tag are generated by a second step, wherein the second step comprises:
importing the candidate images into a click model to obtain a predicted click rate, wherein the click model is used for representing the corresponding relation between the images and the predicted click rate;
determining candidate images with estimated click rate larger than a first click rate threshold as second candidate images corresponding to the first type labels;
and determining the candidate images with estimated click rate smaller than a second click rate threshold as second candidate images corresponding to the second type of labels, wherein the first click rate threshold is not smaller than the second click rate threshold.
7. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
when executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-5.
8. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
CN202110107446.4A 2021-01-26 2021-01-26 Image sample generation method and device and electronic equipment Active CN112766285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110107446.4A CN112766285B (en) 2021-01-26 2021-01-26 Image sample generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110107446.4A CN112766285B (en) 2021-01-26 2021-01-26 Image sample generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112766285A CN112766285A (en) 2021-05-07
CN112766285B true CN112766285B (en) 2024-03-19

Family

ID=75705938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110107446.4A Active CN112766285B (en) 2021-01-26 2021-01-26 Image sample generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112766285B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10437878B2 (en) * 2016-12-28 2019-10-08 Shutterstock, Inc. Identification of a salient portion of an image

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834933A (en) * 2014-02-10 2015-08-12 华为技术有限公司 Method and device for detecting salient region of image
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN108765035A (en) * 2018-06-19 2018-11-06 北京奇艺世纪科技有限公司 A kind of advertising image feature extracting method, device and electronic equipment
WO2020093866A1 (en) * 2018-11-05 2020-05-14 北京达佳互联信息技术有限公司 Photography guiding method and apparatus, mobile terminal and storage medium
CN109214374A (en) * 2018-11-06 2019-01-15 北京达佳互联信息技术有限公司 Video classification methods, device, server and computer readable storage medium
CN111259695A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Method and device for acquiring information
CN110188766A (en) * 2019-04-17 2019-08-30 平安科技(深圳)有限公司 Image major heading detection method and device based on convolutional neural networks
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Extraction method of key frame, device and electronic equipment
CN110598609A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Weak supervision target detection method based on significance guidance
CN110598610A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Target significance detection method based on neural selection attention
CN110765882A (en) * 2019-09-25 2020-02-07 腾讯科技(深圳)有限公司 Video tag determination method, device, server and storage medium
CN111125422A (en) * 2019-12-13 2020-05-08 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN111881944A (en) * 2020-07-08 2020-11-03 贵州无忧天空科技有限公司 Method, electronic device and computer readable medium for image authentication
CN112132156A (en) * 2020-08-18 2020-12-25 山东大学 Multi-depth feature fusion image saliency target detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Saliency detection based on fully convolutional neural networks and multiple kernel learning; 何可; 吴谨; 朱磊; Application Research of Computers (Issue 05); full text *

Also Published As

Publication number Publication date
CN112766285A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN111399956B (en) Content display method and device applied to display equipment and electronic equipment
CN111277892B (en) Method, apparatus, server and medium for selecting video clip
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN110084317B (en) Method and device for recognizing images
CN110991373A (en) Image processing method, image processing apparatus, electronic device, and medium
CN112766284B (en) Image recognition method and device, storage medium and electronic equipment
CN111897950A (en) Method and apparatus for generating information
CN111461967B (en) Picture processing method, device, equipment and computer readable medium
CN110852946A (en) Picture display method and device and electronic equipment
CN111459364A (en) Icon updating method and device and electronic equipment
CN111258736A (en) Information processing method and device and electronic equipment
CN112053286B (en) Image processing method, device, electronic equipment and readable medium
CN112766285B (en) Image sample generation method and device and electronic equipment
CN113220922B (en) Image searching method and device and electronic equipment
CN113255812B (en) Video frame detection method and device and electronic equipment
CN113905177B (en) Video generation method, device, equipment and storage medium
CN111737575B (en) Content distribution method, content distribution device, readable medium and electronic equipment
CN111353536B (en) Image labeling method and device, readable medium and electronic equipment
CN114417782A (en) Display method and device and electronic equipment
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN114004229A (en) Text recognition method and device, readable medium and electronic equipment
CN115209215A (en) Video processing method, device and equipment
CN113256659B (en) Picture processing method and device and electronic equipment
CN113283436B (en) Picture processing method and device and electronic equipment
CN113283115B (en) Image model generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant