CN113538368A

CN113538368A - Image selection method, image selection device, storage medium, and electronic apparatus

Info

Publication number: CN113538368A
Application number: CN202110795019.XA
Authority: CN
Inventors: 黄海东
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2021-07-14
Filing date: 2021-07-14
Publication date: 2021-10-22

Abstract

The disclosure provides an image selection method, an image selection device, a computer readable storage medium and an electronic device, and relates to the technical field of image and video processing. The image selection method comprises the following steps: acquiring a plurality of images to be evaluated; detecting human body attachments in the image to be evaluated; determining the evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachments in each image to be evaluated; and selecting a target image from the plurality of images to be evaluated according to the evaluation value of the images to be evaluated. The method and the device can realize objective and accurate evaluation on the image containing the human body attachments so as to accurately select the target image.

Description

Image selection method, image selection device, storage medium, and electronic apparatus

Technical Field

The present disclosure relates to the field of image and video processing technologies, and in particular, to an image selection method, an image selection apparatus, a computer-readable storage medium, and an electronic device.

Background

In a scene of dynamic shooting of a person, a dynamic sense of human attachments related to a shot person is required to be reflected sometimes, for example, a graduation photo of throwing a bachelor hat is shot, a picture of swinging a silk scarf is shot, a picture of stretching a skirt during dancing is shot, and the like.

Most of the existing devices such as smart phones and digital cameras have a continuous shooting function, or users can perform continuous shooting through manual operation, but the related technology lacks a scheme for accurately evaluating shot images, so that the users need to manually select target images, and the device is very inconvenient.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those skilled in the art.

Disclosure of Invention

The present disclosure provides an image selection method, an image selection apparatus, a computer-readable storage medium, and an electronic device, thereby solving, at least to some extent, a problem in the related art that an image cannot be accurately evaluated and selected.

According to a first aspect of the present disclosure, there is provided an image selection method comprising: acquiring a plurality of images to be evaluated; detecting a human body attachment in the image to be evaluated; determining the evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachments in each image to be evaluated; and selecting a target image from the plurality of images to be evaluated according to the evaluation value of the images to be evaluated.

According to a second aspect of the present disclosure, there is provided an image selection apparatus comprising: the image acquisition module is configured to acquire a plurality of images to be evaluated; an image detection module configured to detect a human body attachment in the image to be evaluated; the image evaluation module is configured to determine an evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachment in each image to be evaluated; an image selection module configured to select a target image from the plurality of images to be evaluated according to an evaluation value of the image to be evaluated.

According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image selection method of the first aspect described above and possible implementations thereof.

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image selection method of the first aspect described above and possible implementations thereof via execution of the executable instructions.

The technical scheme of the disclosure has the following beneficial effects:

on one hand, a scheme for evaluating the image through the position and the morphological characteristics of the human body attachments is established, the image containing the human body attachments can be objectively and accurately evaluated without depending on an artificial scoring label, particularly, when a dynamic scene is continuously shot, the detailed distinction among multiple shot frames of images can be recognized, and the target image can be accurately selected based on the evaluation value of the image. On the other hand, compared with the method for evaluating and selecting images by adopting a neural network in the related technology, the method has the advantages that the algorithm process is simple, the required calculation force resources are less, and the image scoring labels do not need to be manually marked, so that the realization cost is lower, and the method is favorable for being applied to light-weight scenes such as mobile terminals.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 shows a schematic diagram of a system architecture in the present exemplary embodiment;

fig. 2 shows a schematic configuration diagram of an electronic apparatus in the present exemplary embodiment;

FIG. 3 shows a flow chart of an image selection method in the present exemplary embodiment;

FIG. 4 shows a flowchart for detecting human body and human body attachments in the present exemplary embodiment;

fig. 5 shows a schematic diagram of the preamble detection in the present exemplary embodiment;

FIG. 6 shows a flowchart for segmenting a human body from human attachments in the present exemplary embodiment;

fig. 7 is a schematic diagram illustrating a local image is cut out from an image to be evaluated and semantic segmentation is performed in the present exemplary embodiment;

FIG. 8 illustrates a flow chart for determining an evaluation value for a flexible human body attachment in accordance with the present exemplary embodiment;

fig. 9 shows a flowchart for determining an evaluation value for a rigid human body attachment in the present exemplary embodiment;

fig. 10 shows a schematic flowchart of an image selection method in the present exemplary embodiment;

fig. 11 shows a schematic configuration diagram of an image selection apparatus in the present exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

In one scheme of the related art, a deep learning technique is used to train a neural network for image scoring, so as to score images, and further select a target image. However, the accuracy of scoring through the neural network strongly depends on manually marking scoring labels on the images, the influence of subjective factors is large, particularly, the scoring labels of different people are difficult to achieve standard unification, and for multi-frame images with only detail differences, scoring results may be very close, and the optimal target image is difficult to pick out.

In view of the above, exemplary embodiments of the present disclosure provide an image selection method. The system architecture and application scenario of the operating environment of the exemplary embodiment are described below with reference to fig. 1.

Fig. 1 shows a schematic diagram of a system architecture, and the system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be a terminal device such as a smart phone, a tablet computer, a desktop computer, or a notebook computer, and the server 120 generally refers to a background system providing the image selection related service in the exemplary embodiment, and may be a server or a cluster formed by multiple servers. The terminal 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.

In one embodiment, the image selection method described above may be performed by terminal 110. For example, after a user captures a plurality of images using the terminal 110 or the user selects a plurality of images in an album of the terminal 110, the terminal 110 evaluates the plurality of images and selects a target image therefrom.

In one embodiment, the image selection method described above may be performed by the server 120. For example, after the user takes a plurality of images using the terminal 110 or the user selects a plurality of images in an album of the terminal 110, the terminal 110 uploads the plurality of images to the server 120, the server 120 evaluates the plurality of images, selects a target image from the plurality of images, and returns a selection result to the terminal 110.

As can be seen from the above, the execution subject of the image selection method in the present exemplary embodiment may be the terminal 110 or the server 120, which is not limited by the present disclosure.

Exemplary embodiments of the present disclosure also provide an electronic device for performing the above image selection method, which may be the above terminal 110 or the server 120. The structure of the electronic device is exemplarily described below by taking the mobile terminal 200 in fig. 2 as an example. It will be appreciated by those skilled in the art that the configuration of figure 2 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes.

As shown in fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a USB (Universal Serial Bus) interface 230, a charging management Module 240, a power management Module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication Module 250, a wireless communication Module 260, an audio Module 270, a speaker 271, a microphone 272, a microphone 273, an earphone interface 274, a sensor Module 280, a display 290, a camera Module 291, a pointer 292, a motor 293, a button 294, and a SIM (Subscriber identity Module) card interface 295.

Processor 210 may include one or more processing units, such as: the Processor 210 may include an AP (Application Processor), a modem Processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband Processor, and/or an NPU (Neural-Network Processing Unit), etc.

The encoder can encode (i.e. compress) the image or video data to form corresponding code stream data so as to reduce the bandwidth occupied by data transmission; the decoder may decode (i.e., decompress) the code stream data of the image or the video to restore the image or the video data, for example, decode the code stream data of the image to be evaluated to obtain the image data, and then execute the image selection method. The mobile terminal 200 may process images or video in a variety of encoding formats, such as: image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), BMP (Bitmap), and Video formats such as MPEG (Moving Picture Experts Group) 1, MPEG2, h.263, h.264, and HEVC (High Efficiency Video Coding).

In one embodiment, processor 210 may include one or more interfaces through which connections are made to other components of mobile terminal 200.

Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include volatile memory and nonvolatile memory. The processor 210 executes various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221.

The external memory interface 222 may be used to connect an external memory, such as a Micro SD card, for expanding the storage capability of the mobile terminal 200. The external memory communicates with the processor 210 through the external memory interface 222 to implement data storage functions, such as storing images, videos, and other files.

The USB interface 230 is an interface conforming to the USB standard specification, and may be used to connect a charger to charge the mobile terminal 200, or connect an earphone or other electronic devices.

The charge management module 240 is configured to receive a charging input from a charger. While the charging management module 240 charges the battery 242, the power management module 241 may also supply power to the device; the power management module 241 may also monitor the status of the battery.

The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 250 may provide a mobile communication solution of 2G, 3G, 4G, 5G, etc. applied to the mobile terminal 200. The Wireless Communication module 260 may provide Wireless Communication solutions such as WLAN (Wireless Local Area Networks) (e.g., Wi-Fi (Wireless Fidelity), BT (Bluetooth), GNSS (Global Navigation Satellite System), FM (Frequency Modulation), NFC (Near Field Communication), IR (Infrared technology), and the like, which are applied to the mobile terminal 200.

The mobile terminal 200 may implement a display function through the GPU, the display screen 290, the AP, and the like, and display a user interface. For example, when the user performs camera detection, the mobile terminal 200 may display an interface of a camera detection App (Application) in the display screen 290.

The mobile terminal 200 may implement a photographing function through the ISP, the camera module 291, the encoder, the decoder, the GPU, the display 290, the AP, and the like. For example, a user can start an image or video shooting function in the hidden camera detection App, and at this time, an image of a space to be detected can be acquired through the camera module 291.

The mobile terminal 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the AP, and the like.

The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, a barometric pressure sensor 2804, etc. to implement a corresponding inductive detection function.

Indicator 292 may be an indicator light that may be used to indicate a state of charge, a change in charge, or may be used to indicate a message, missed call, notification, etc. The motor 293 may generate a vibration cue, may also be used for touch vibration feedback, and the like. The keys 294 include a power-on key, a volume key, and the like.

The mobile terminal 200 may support one or more SIM card interfaces 295 for connecting SIM cards to implement functions such as call and mobile communication.

The following describes an image selection method in the present exemplary embodiment with reference to fig. 3, where fig. 3 shows an exemplary flow of the image selection method, and may include:

step S310, acquiring a plurality of images to be evaluated;

step S320, detecting human body attachments in the image to be evaluated;

step S330, determining the evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachments in each image to be evaluated;

in step S340, a target image is selected from the plurality of images to be evaluated according to the evaluation value of the image to be evaluated.

Based on the method, on one hand, a scheme for evaluating the image through the position and the morphological characteristics of the human body attachments is established, the image containing the human body attachments can be objectively and accurately evaluated without depending on a manual scoring label, particularly, when a dynamic scene is continuously shot, the detailed distinction among multiple frames of shot images can be recognized, and the target image can be accurately selected based on the evaluation value of the image. On the other hand, compared with the method for evaluating and selecting images by adopting a neural network in the related technology, the method has the advantages that the algorithm process is simple, the required calculation force resources are less, and the image scoring labels do not need to be manually marked, so that the realization cost is lower, and the method is favorable for being applied to light-weight scenes such as mobile terminals.

Each step in fig. 3 is explained in detail below.

Referring to fig. 3, in step S310, a plurality of images to be evaluated are acquired.

The source of the image to be evaluated is not limited in the present disclosure, for example, the image to be evaluated may be a currently shot image or an arbitrary image selected by a user.

In an embodiment, the acquiring the plurality of images to be evaluated may include:

and acquiring a plurality of images to be evaluated from continuously acquired multi-frame images.

Wherein, the main shooting objects or shooting scenes of the multi-frame images are the same. For example, the multi-frame image may be a continuously acquired multi-frame preview image, or a multi-frame image captured in a continuous shooting mode, or a continuous multi-frame image in a video, and the like, which is not limited in this disclosure.

After the continuously collected multiple frames of images are obtained, each frame of image may be used as an image to be evaluated, or a part of the image may be selected as the image to be evaluated, for example, a frame-spacing strategy may be set, and one frame of image is selected as the image to be evaluated at certain frame intervals.

In an implementation manner, the definition of the multi-frame image may be detected, and the image with the definition reaching the definition threshold may be used as the image to be evaluated, and the definition threshold may be set according to experience and actual requirements. Therefore, fuzzy images are filtered out from the multi-frame images, only clear images are evaluated, and the efficiency of image selection can be improved.

With continued reference to fig. 3, in step S320, a human attachment in the image to be evaluated is detected.

The human body attachment refers to an object which is located outside the limbs of the human body and is related to the human body, and includes but is not limited to: hat, hair, scarf, skirt, flag, etc. Due to the need for subsequent processing, the position of the human body attachment in the image to be evaluated needs to be determined in step S320. For example, the bounding box of the human body attachment can be determined in the image to be evaluated by a Non Maximum Suppression (NMS) algorithm; or the mask of the human body attachment can be determined in the image to be evaluated through a semantic segmentation algorithm.

Determining the position of the human attachment requires an associated algorithm to process. In order to save the calculation power, before determining the position, the image to be evaluated may be subjected to pre-detection, and whether the image to be evaluated is suitable for the subsequent processing in the exemplary embodiment, so as to filter out unsuitable images or potentially low-quality images. The following is an exemplary description of the embodiment of preamble detection.

In one embodiment, referring to fig. 4, the detecting the human body attachment in the image to be evaluated may include the following steps:

step S410, detecting human body key points in an image to be evaluated;

step S420, determining whether the preset part of the human body is positioned in the image to be evaluated according to the key point of the human body;

and step S430, if the preset part of the human body is positioned in the image to be evaluated, the human body and the human body attachments are segmented from the image to be evaluated.

The method comprises the steps of detecting key points of a human body to determine whether a preset part of the human body is located within an image to be evaluated or not, wherein the detection belongs to prepositive detection. The human key point detection algorithm is not limited, and different algorithms can be adopted to detect human key points with different numbers or different positions, for example, a COCO data set training related human key point detection network is adopted to detect 17 joint points of a human body, including a nose, left and right eyes, left and right shoulders, left and right elbows and the like. The preset part of the human body is a part related to the action of the human body and is also a part related to human attachments, for example, when a hat throwing action is performed, the preset part of the human body can be an arm. In this exemplary embodiment, the preset part of the human body may be determined according to an actual action scene, or a union set of related parts in a general action scene may be taken to obtain the preset part of the human body. For example, the limbs of the human body may be used as the human body preset portions, or the entire human body may be used as the human body preset portions.

By detecting key points of the human body, whether each part of the human body is positioned in the image to be evaluated can be judged. Generally, if each human body key point of the preset part of the human body is located within the image to be evaluated, it is determined that the preset part of the human body is located within the image to be evaluated.

When the human body key points are exemplified, the mentioned nose, the left eye and the right eye belong to the human face key points, namely, the human face key points can be detected by some human body key point detection algorithms, so that the position of the head can be conveniently determined, and four limbs can be conveniently distinguished. If the adopted human key point detection algorithm does not detect the human face key points and only detects the human key points of the limbs, the human face key points can be detected independently. In one embodiment, the image selection method may further include the steps of:

and detecting the key points of the human face in the image to be evaluated.

Further, in step S420, it may be determined whether the preset portion of the human body is located within the image to be evaluated according to the key points of the human body and the key points of the human face.

The detection of the human body key points and the detection of the human face key points can be respectively executed through two algorithms, for example, a human body key point detection network and a human face key point detection network are respectively established, and images to be processed are respectively input into the two networks to obtain the human body key points and the human face key points. By combining the key points of the human face, the specific positions and directions of the key points of the human body can be determined, and whether the preset positions of the human body are positioned in the image to be evaluated or not can be judged more accurately.

If the preset part of the human body is not within the image to be evaluated, the image to be evaluated is difficult to reflect the real motion of the human body, the image to be evaluated is not suitable for subsequent processing, and preset results such as abnormal evaluation results, low evaluation values and the like can be returned or evaluation can be performed in other modes. And if the preset part of the human body is positioned in the image to be evaluated, the human body and the human body attachments are separated from the image to be evaluated so as to determine the positions of the human body and the human body attachments, and the subsequent processing is carried out. Therefore, the pre-detection in fig. 4 can filter out the images to be evaluated, which are not shot at the preset part of the human body, and reduce the number of the images to be evaluated, and the pre-detection process can be realized by detecting key points of the human body.

In one embodiment, it may also be detected whether there is a human body attachment in the image to be evaluated in the pre-detection, and the step S430 may include:

and if the preset part of the human body is positioned in the image to be evaluated and the human body attachments in the image to be evaluated are detected, the human body and the human body attachments are separated from the image to be evaluated.

Referring to fig. 5, this step corresponds to two-dimensional pre-detection of the image to be evaluated: the first aspect is to detect key points of a human body to determine whether a preset part of the human body is located within an image to be evaluated, i.e., the pre-detection shown in fig. 4. The second aspect is to detect whether there is a human body attachment in the image to be evaluated, and this detection can be realized by, for example, a lightweight image classification network that outputs only whether there is a human body attachment in the image and does not output the position of the human body attachment. Therefore, an image classification network with a simple structure can be constructed, and sample images containing and not containing human attachments are obtained and labeled to train the image classification network. In an embodiment, the image classification network may be further trained to output classification results of multiple dimensions at the same time, where a numerical value of each dimension represents a classification result of a human body attachment, for example, a 1 st dimensional value represents whether a hat exists in an image, a 2 nd dimensional value represents whether a scarf exists, a 3 rd dimensional value represents whether a skirt exists, and the like, so as to obtain a more detailed human body attachment detection result.

And if the preset part of the human body is determined to be positioned in the image to be evaluated and the human body attachments are detected to be present in the image to be evaluated, namely the two aspects of pre-detection are passed, the human body and the human body attachments are separated from the image to be evaluated so as to determine the positions of the human body and the human body attachments, and the subsequent processing is carried out. Compared with fig. 4, the front detection in fig. 5 adds detection on whether human attachments exist, and can filter images to be evaluated, which are not shot at preset parts of the human body or are not shot at human attachments, so that more sufficient image filtering is realized, the number of images to be evaluated is further reduced, and the calculation power required for detecting whether human attachments exist in the images to be evaluated is lower, namely the overall calculation power of the front detection in the two aspects is lower, and therefore, the efficiency of image selection can be improved.

The embodiment of the preamble detection is described above. In the case that the image to be evaluated passes the pre-detection, the human body and the human body attachments need to be further segmented from the image to be evaluated so as to obtain the accurate positions of the human body and the human body attachments. In one embodiment, as shown in fig. 6, the above-mentioned separating the human body and the human body attachments from the image to be evaluated may include the following steps:

step S610, taking an image to be evaluated as an image to be segmented, or taking a local image containing a human body and human body attachments in the image to be evaluated as an image to be segmented;

step S620, performing semantic segmentation on the image to be segmented to obtain semantic segmentation information of the image to be segmented, wherein the semantic segmentation information comprises a classification result of each pixel point in the image to be segmented;

and S630, optimizing the semantic segmentation information of the image to be segmented, and segmenting the human body and the human body attachments by utilizing the optimized semantic segmentation information.

The whole image to be evaluated can be used as the image to be segmented, and a local image containing the human body and the human body attachments can be intercepted and used as the image to be segmented. The process of capturing the partial image and semantic segmentation can be referred to as shown in fig. 7. After the local image is intercepted from the image to be evaluated, the processing range of semantic segmentation is reduced, and the calculation power is saved. Furthermore, the semantic segmentation network can be used for processing the image to be segmented to obtain semantic segmentation information.

The semantic segmentation information includes a classification result of each pixel point in the image to be segmented, i.e., which semantic category each pixel point belongs to. Fig. 7 shows a visualized image of semantic segmentation information, and the classification result of each pixel point is represented by different colors, for example, different colors between a human body and a human body attachment. In addition, the human body part can be divided into different parts and expressed by different semantic categories; different body attachments can be represented as different semantic categories, such as hats, scarves, etc. can be represented in different colors.

Based on the semantic segmentation information, the human body and the human body attachments can be segmented from the image to be segmented. In order to further improve the accuracy of semantic segmentation, semantic segmentation information may be optimized. Two exemplary optimization approaches are provided below:

in the first mode, the semantic segmentation information of the image to be segmented may further include confidence levels of each pixel point in the image to be segmented corresponding to different semantic categories, for example, for any pixel point in the image to be segmented, the semantic segmentation information may include that the confidence level of the pixel point corresponding to a human body is 75%, the confidence level corresponding to a hat is 4%, the confidence level corresponding to a scarf is 12%, and the like. Based on this, the optimizing the semantic segmentation information of the image to be segmented may include the following steps:

counting the confidence degrees of the pixel points corresponding to different semantic categories, and determining a human body confidence degree threshold value and at least one human body attachment confidence degree threshold value according to the counting result;

and filtering the pixel points of the area where the human body is located by using the human body confidence threshold value, and filtering the pixel points of the area where the human body attachment is located by using the human body attachment confidence threshold value.

When the confidence levels are counted, the confidence levels of the pixel points under each semantic category can be counted respectively to obtain the confidence level distribution condition of each semantic category, and then a proper confidence level threshold value is determined for each semantic category. In one embodiment, an adaptive threshold algorithm may be used to determine local confidence thresholds by counting local confidence distributions.

After the confidence threshold value of each semantic category is determined, the pixel points of each semantic category are filtered to filter out the pixel points with lower confidence, so that the accuracy of the segmented human body and the human body attachments is improved.

And in the second mode, in the scene of carrying out image evaluation on continuously acquired multi-frame images, the optimization can be carried out by utilizing the information of adjacent images. Specifically, the optimizing semantic segmentation information of the image to be segmented may include the following steps:

obtaining semantic segmentation information of at least one frame of adjacent image of an image to be evaluated or semantic segmentation information of a local image in the adjacent image to be used as reference semantic segmentation information;

and performing time domain filtering on the semantic segmentation information of the image to be segmented by utilizing the reference semantic segmentation information so as to optimize the semantic segmentation information of the image to be segmented.

The adjacent image may be one or more frames of images located before the image to be evaluated. The semantic segmentation is performed on the adjacent images, or a local image including the human body and the human body attachments is cut from the adjacent images and the semantic segmentation is performed, which may specifically refer to the processing content of the image to be evaluated in fig. 6. Generally, if the whole image to be evaluated is taken as the image to be segmented, the whole adjacent image is also subjected to semantic segmentation, and if the local image in the image to be evaluated is taken as the image to be segmented, the local image in the adjacent image is also subjected to semantic segmentation after being segmented. The semantic segmentation information of the adjacent images or the local images in the adjacent images is used as reference semantic segmentation information, and the semantic segmentation information of the image to be segmented is subjected to time domain filtering, for example, pixel points with unstable semantic categories in the time domain can be filtered out from the semantic segmentation information of the image to be segmented, so that the pixel points of the region where the human body is located and the pixel points of the region where the human body attachment is located in the reference semantic segmentation information and the semantic segmentation information of the image to be segmented have stable and continuous semantics, and the accuracy of the segmented human body and the human body attachment is improved.

It should be understood that the present exemplary embodiment may combine the above-described modes one and two. For example, a confidence threshold is determined according to a confidence statistic result of pixel points in an image to be segmented, and the pixel points of different semantic categories are filtered through the confidence threshold so as to perform first-round optimization on a semantic segmentation result of the image to be segmented; and performing time domain filtering on the semantic segmentation result of the image to be segmented after the first round of optimization by using the reference semantic segmentation result so as to perform the second round of optimization. After two rounds of optimization, a semantic segmentation result of the image to be segmented with higher accuracy is obtained, and then the human body and attachments are segmented from the semantic segmentation result.

With continued reference to fig. 3, in step S330, an evaluation value of each image to be evaluated is determined according to the position and morphological characteristics of the human body attachment in each image to be evaluated.

The position and morphological characteristics of the human body attachments are two factors reflecting the content quality of the human body attachments in the image to be evaluated, and the quality of the image to be evaluated is influenced to a great extent.

In one embodiment, step S330 may include the steps of:

determining a comprehensive evaluation value of the human body attachments according to the positions and morphological characteristics of the human body attachments;

and determining the evaluation value of the image to be evaluated based on the comprehensive evaluation value of the human body attachments.

The comprehensive evaluation value of the human body attachments is an evaluation value obtained by comprehensively considering various factors of the human body attachments, and on the basis, the comprehensive evaluation value of the human body attachments and other evaluation values in the image to be evaluated can be combined to determine the evaluation value of the image to be evaluated. Other aspects of the evaluation value may include a human body evaluation value, such as a human body posture evaluation value, an expression evaluation value, and the like. The present disclosure is not limited thereto.

The location and morphological characteristics of the human attachment may each include one or more finely divided evaluation factors. For example, the position of the human body attachment may include a distance between the human body attachment and the human body, a distance between the human body attachment and the center of the image, and the like, and the morphological characteristics of the human body attachment may include a size, a shape, and the like of the human body attachment. A

In one embodiment, the comprehensive evaluation value of the human body attachment can be determined by specifically selecting two factors, namely the size of the human body attachment and the distance between the human body attachment and the human body. Generally, the size of the human body attachment and the distance between the human body attachment and the human body are positively correlated with the overall evaluation value of the human body attachment, that is, the greater the size of the human body attachment, the greater the distance from the human body, the higher the overall evaluation value of the human body attachment.

In one embodiment, the human body attachment may be divided into a flexible object and a rigid object, and the comprehensive evaluation values thereof may be determined by different methods. Wherein, it can be predetermined whether each kind of human body attachments belongs to flexible objects or rigid objects, the flexible objects are human body attachments which can generate obvious shape change, such as scarves, skirts, flags, hairs and the like, and the rigid objects are human body attachments which can not generate obvious shape change, including human body attachments which can generate small shape change, such as hats, water bottles and the like. Therefore, when the image to be segmented is subjected to semantic segmentation, whether the image to be segmented belongs to a flexible object or a rigid object can be determined according to the semantic category of each segmented human body attachment.

The following describes the evaluation methods of the flexible object and the rigid object, respectively.

For the flexible object, referring to fig. 8, the above-mentioned determining the comprehensive evaluation value of the human body attachment according to the position and the morphological characteristics of the human body attachment may include the following steps:

step S810, when the human body attachment is a flexible object, determining a spreading degree evaluation value of the human body attachment according to the size of the human body attachment and the dispersion degree of pixel point coordinates in the human body attachment, and determining a relative position evaluation value of the human body attachment according to the distance between the human body attachment and the human body;

in step S820, a comprehensive evaluation value of the human body attachment is determined based on the extension degree evaluation value and the relative position evaluation value of the human body attachment.

The larger the size of the human body attachment is, the higher the dispersion degree of the pixel point coordinates is, which indicates that the higher the extension degree of the human body attachment is, and the higher the extension degree evaluation value thereof is. For example, the calculation of the extension degree evaluation value may be as follows:

wherein, Score_stretchRepresenting a stretching degree evaluation value; s represents the area of the human body attachments and can be equal to the total number n of pixel points of the human body attachments; p_iTo representCoordinate of ith pixel point on human body attachment, P_centerThe coordinates of the center point of the human body attachment can be represented as the average or median of the coordinates of all pixel points in the human body attachment, so that the coordinates of the center point of the human body attachment can be represented as the average or median of the coordinates of all pixel points in the human body attachment,

representing the discrete degree of the pixel point coordinates in the human body attachments; a is₁And a₂And the weight is used for weighting the discrete degree of the area and the pixel point coordinate to obtain a stretching degree evaluation value.

The farther the distance between the human body attachment and the human body is, the higher the relative position evaluation value is. For example, the calculation of the relative position evaluation value may be as follows:

Score_position＝a₃·|Point_man-Point_target| (2)

wherein, Score_positionIndicating a relative position evaluation value; point (Point)_manRepresenting the coordinates of the center point of the human body; point (Point)_targetCoordinates of the center point of the human body attachment, which is in contact with the P_centerMay be the same or different; a is₃Is a weight for weighting the relative position evaluation value and the extension degree evaluation value.

It should be understood that the present disclosure is directed to the above weight a₁、a₂、a₃The value of (b) is not limited, and may be set according to experience and actual requirements. In one embodiment, a may be set in consideration of the fact that the discrete degree of the coordinates of the pixel points in the human body attachment has a large influence on the visual perception₂Greater than a₁And a₃。

It should be added that, when there are a plurality of human bodies in the image to be evaluated, each human body attachment may be respectively corresponding to one of the human bodies, which indicates that the human body attachment is an attachment corresponding to the human body, for example, the human body closest to the human body attachment may be used as the human body corresponding to the human body attachment, or a synchronous motion relationship between the human body and the human body attachment may be determined according to motion recognition, and the human body attachment having the synchronous motion relationship may be corresponding to the human body. Multiple human attachments may correspond to the same human body. In calculating the relative position evaluation value, the distance between each human body attachment and the corresponding human body is calculated.

The extension degree evaluation value and the relative position evaluation value are summed, and since the extension degree evaluation value and the relative position evaluation value carry weights, the summation is equivalent to a weighted operation, and a comprehensive evaluation value of the human body attachment is obtained. When a plurality of human attachments exist in the image to be evaluated, the comprehensive evaluation value of each human attachment can be accumulated as follows:

wherein m is the total number of flexible human attachments, Score_softRepresents the overall evaluation value of the flexible human body attachment.

Referring to fig. 9, the above-mentioned determination of the overall evaluation value of the human body attachment according to the position and morphological characteristics of the human body attachment for the rigid object may include the following steps:

step S910, when the human body attachment is a rigid object, determining a posture evaluation value of the human body attachment according to the size of the human body attachment, and determining a motion degree evaluation value of the human body attachment according to the distance between the human body attachment and the human body;

in step S920, a comprehensive evaluation value of the human body attachment is determined based on the posture evaluation value and the motion degree evaluation value of the human body attachment.

The shape of the human body attachment belonging to the rigid object does not change significantly, so that the factor of the dispersion degree of the pixel point coordinates can be not considered, the size of the human body attachment greatly influences the visual perception in the movement process of the human body attachment, for example, when the human body attachment moves to a certain angle, the projection area in an image is large, and abundant image content is presented. Thus, the attitude evaluation value of the human body attachment is determined according to the size of the human body attachment, and the attitude evaluation value is higher as the size is larger. For example, the pose evaluation value may be calculated as follows:

Score_pose＝b₁·S (4)

wherein, Score_poseRepresenting a posture evaluation value; s represents the area of the human body attachments and can be equal to the total number n of pixel points of the human body attachments; b₁Is a weight for weighting the pose evaluation value and the motion degree evaluation value.

The farther the distance between the human body attachment and the human body is, the higher the degree of movement of the human body attachment is, for example, when a hat is thrown, it is desirable to take an image of the highest point of the hat, which means that the degree of movement of the hat is the highest. For example, the motion degree evaluation value may be calculated as follows:

Score_mo+e＝b₂·|Point_man-Point_target| (5)

wherein, Score_moveRepresenting a motion degree evaluation value; point (Point)_manRepresenting the coordinates of the center point of the human body; point (Point)_targetCoordinates of the center Point of the human body attachment and Point in the above formula (2)_targetThe calculation of (c) may be the same; b₂Is a weight for weighting the motion degree evaluation value and the pose evaluation value.

It should be understood that the present disclosure is directed to the above weight b₁、b₂The value of (b) is not limited, and may be set according to experience and actual requirements. In one embodiment, b may be set in consideration of the fact that the degree of movement of the human body attachment has a large influence on the visual perception₂Greater than b₁。

The attitude evaluation value and the motion degree evaluation value are summed, and since the attitude evaluation value and the motion degree evaluation value carry weights, the summation is equivalent to a weighting operation, and a comprehensive evaluation value of the human body attachment is obtained. When a plurality of human attachments exist in the image to be evaluated, the comprehensive evaluation value of each human attachment can be accumulated as follows:

wherein r is the total number of rigid human attachments, Score_rididThe total evaluation value of the rigid human body attachments is shown.

The total evaluation value is calculated for all flexible and rigid human attachments in the image to be evaluated as follows:

Score_addition＝Score_soft+Score_rigid (7)

Score_additionrepresents the total evaluation value of the human body attachments in the image to be evaluated.

In the case where only the human attachment factor is considered, the total evaluation value of the human attachment may be used as the evaluation value of the image to be evaluated. In addition, the evaluation values of other factors in the image to be evaluated can be considered, and the evaluation values and the evaluation value of the human body attachment can be comprehensively calculated.

In one embodiment, the evaluation value of the image to be evaluated may be obtained according to the total evaluation value of the human body attachments and the human body evaluation value. The human body evaluation value includes at least one of a human body posture evaluation value and an expression evaluation value. For example, an evaluation value may be calculated by using a correlation algorithm for human body posture evaluation on a to-be-evaluated image, an evaluation value may be calculated by using a correlation algorithm for human face expression evaluation on a to-be-evaluated image, and then the total evaluation value of human body attachments, the human body posture evaluation value, and the expression evaluation value are integrated to obtain a final evaluation value of the to-be-evaluated image, as follows:

Score_final＝Score_addition+Score_{man pose}+Score_face (8)

wherein, Score_finalIndicates the final evaluation value of the image to be evaluated, Score_man _poseRepresents the evaluation value of human body posture, Score_faceIndicating an expression evaluation value. Therefore, the image to be evaluated can be comprehensively evaluated by integrating various factors such as human attachments, human postures and expressions in the image to be evaluated.

With continued reference to fig. 3, in step S340, a target image is selected from the plurality of images to be evaluated according to the evaluation value of the image to be evaluated.

The target image may be selected according to actual requirements, and may be one or more images with a higher evaluation value, that is, an image to be evaluated with higher quality, for example, one image to be evaluated with the highest evaluation value, that is, the optimal image to be evaluated.

In one embodiment, if a plurality of images to be evaluated are acquired from continuously acquired multi-frame images in step S310, after the evaluation value of each image to be evaluated is acquired, the image to be evaluated, of which the evaluation value is the highest, may be determined as the target image in the multi-frame images. For example, after the user takes a continuous shot, the evaluation value of each image is determined by executing the image evaluation method in the present exemplary embodiment, and the image with the highest evaluation value is determined as the target image, which may be presented or recommended to the user, or only the target image may be saved and the other images may be deleted.

Fig. 10 shows a schematic flow of an image selection method, comprising:

in step S1001, a plurality of images to be evaluated, such as a plurality of frames of images which can be continuously collected, are input, and steps S1002 to S1011 are performed for each image to be evaluated.

Step S1002, detecting whether the human body in the image to be evaluated is completely within the image to be evaluated, if so, executing step S1003, and if not, executing step S1010.

Step S1003, detecting whether the image to be evaluated contains the attachments of the human body, if yes, executing step S1004, and if not, executing step S1010.

Step S1004, a partial image including the human body and the attached object is extracted from the image to be evaluated, and the human image is analyzed to obtain a segmentation result about the human body and the attached object.

Step S1005, performing adaptive threshold segmentation to optimize the segmentation result.

Step S1006, the segmentation result is further optimized through time-domain filtering to segment the human body and the attachments.

Step S1007, for a flexible adherent, calculates a degree of extension evaluation value and a relative position evaluation value thereof.

In step S1008, the posture evaluation value and the motion degree evaluation value of the rigid attached object are calculated.

Step S1009 weights the evaluation values of all flexible and rigid attachments in the image to be evaluated according to the area sizes thereof, to obtain a total evaluation value of the attachments in the image to be evaluated.

In step S1010, the total evaluation value of the attached matter is weighted with the human posture evaluation value and the expression evaluation value, and if one or more of them is missing, 0 is substituted, and for example, if no is determined in step S1002 or step S1003, the total evaluation value of the attached matter is 0 if no evaluation of the attached matter is performed.

In step S1011, the evaluation value of the image to be evaluated is determined.

In step S1012, after the evaluation value of each image to be evaluated is determined, one image to be evaluated having the highest evaluation value is selected as the target image, thereby completing the image selection process.

Exemplary embodiments of the present disclosure also provide an image selection apparatus. Referring to fig. 11, the image selection apparatus 1100 may include:

an image acquisition module 1110 configured to acquire a plurality of images to be evaluated;

an image detection module 1120 configured to detect a human attachment in an image to be evaluated;

an image evaluation module 1130 configured to determine an evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachment in each image to be evaluated;

and an image selection module 1140 configured to select a target image from the plurality of images to be evaluated according to the evaluation value of the image to be evaluated.

In one embodiment, the image acquisition module 1110 is configured to:

acquiring a plurality of images to be evaluated from continuously acquired multi-frame images;

an image selection module 1140 configured to:

and determining the image to be evaluated with the highest evaluation value as the target image in the multi-frame images.

In one embodiment, the image detection module 1120 is configured to:

detecting human body key points in an image to be evaluated;

determining whether the preset part of the human body is positioned in the image to be evaluated or not according to the key points of the human body;

and if the preset part of the human body is positioned in the image to be evaluated, the human body and the human body attachments are separated from the image to be evaluated.

In an embodiment, if the predetermined portion of the human body is located within the image to be evaluated, the method for segmenting the human body and the human body attachment from the image to be evaluated includes:

In one embodiment, the image detection module 1120 is further configured to:

detecting key points of a human face in an image to be evaluated;

the determining whether the preset part of the human body is located within the image to be evaluated according to the key point of the human body includes:

and determining whether the preset part of the human body is positioned in the image to be evaluated or not according to the human body key point and the human face key point.

In one embodiment, the image detection module 1120 is configured to:

taking an image to be evaluated as an image to be segmented, or taking a local image containing a human body and human body attachments in the image to be evaluated as an image to be segmented;

performing semantic segmentation on an image to be segmented to obtain semantic segmentation information of the image to be segmented, wherein the semantic segmentation information comprises a classification result of each pixel point in the image to be segmented;

and optimizing the semantic segmentation information of the image to be segmented, and segmenting the human body and the human body attachments by utilizing the optimized semantic segmentation information.

In one embodiment, the semantic segmentation information of the image to be segmented further includes confidence levels of each pixel point in the image to be segmented corresponding to different semantic categories; the above optimizing the semantic segmentation information of the image to be segmented includes:

In an embodiment, the optimizing semantic segmentation information of the image to be segmented includes:

obtaining semantic segmentation information of at least one frame of adjacent images of an image to be evaluated or semantic segmentation information of local images in the adjacent images as reference semantic segmentation information;

In one embodiment, the image evaluation module 1130 is configured to:

determining a comprehensive evaluation value of the human body attachments according to the position and morphological characteristics of the human body attachments in the image to be evaluated;

In one embodiment, the image evaluation module 1130 is configured to:

when the human body attachment is a flexible object, determining a stretching degree evaluation value of the human body attachment according to the size of the human body attachment and the dispersion degree of pixel point coordinates in the human body attachment, and determining a relative position evaluation value of the human body attachment according to the distance between the human body attachment and the human body;

and determining a comprehensive evaluation value of the human body attachments based on the extension degree evaluation value and the relative position evaluation value of the human body attachments.

In one embodiment, the image evaluation module 1130 is configured to:

when the attachment is a rigid object, determining a posture evaluation value of the human body attachment according to the size of the human body attachment, and determining a motion degree evaluation value of the human body attachment according to the distance between the human body attachment and the human body;

and determining a comprehensive evaluation value of the human body attachments based on the posture evaluation value and the motion degree evaluation value of the human body attachments.

The details of the above-mentioned parts of the apparatus 1100 are described in detail in the method part embodiments, and the details that are not disclosed can be referred to the method part embodiments, and thus are not described again.

Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product, including program code for causing an electronic device to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification, when the program product is run on the electronic device. In an alternative embodiment, the program product may be embodied as a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims

1. An image selection method, comprising:

acquiring a plurality of images to be evaluated;

detecting human body attachments in the image to be evaluated;

determining the evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachments in each image to be evaluated;

and selecting a target image from the plurality of images to be evaluated according to the evaluation value of the images to be evaluated.

2. The method of claim 1, wherein said obtaining a plurality of images to be evaluated comprises:

the selecting a target image from the plurality of images to be evaluated according to the evaluation value of the image to be evaluated includes:

and determining the image to be evaluated with the highest evaluation value as a target image in the multi-frame images.

3. The method according to claim 1, wherein the detecting human attachment in the image to be evaluated comprises:

detecting human body key points in the image to be evaluated;

determining whether a preset part of the human body is positioned in the image to be evaluated or not according to the key points of the human body;

and if the preset part of the human body is positioned in the image to be evaluated, segmenting the human body and the human body attachments from the image to be evaluated.

4. The method of claim 3, further comprising:

detecting key points of the human face in the image to be evaluated;

determining whether the preset part of the human body is located within the image to be evaluated according to the key point of the human body comprises the following steps:

and determining whether the preset part of the human body is positioned in the image to be evaluated according to the human body key point and the human face key point.

5. The method according to claim 3, wherein the segmenting the human body and the human body attachments from the image to be evaluated comprises:

taking the image to be evaluated as an image to be segmented, or taking a local image containing the human body and the human body attachments in the image to be evaluated as an image to be segmented;

performing semantic segmentation on the image to be segmented to obtain semantic segmentation information of the image to be segmented, wherein the semantic segmentation information comprises a classification result of each pixel point in the image to be segmented;

6. The method according to claim 5, wherein the semantic segmentation information of the image to be segmented further comprises confidence levels of each pixel point in the image to be segmented corresponding to different semantic categories; the optimizing the semantic segmentation information of the image to be segmented comprises the following steps:

and filtering the pixel points of the region where the human body is located by using the human body confidence coefficient threshold value, and filtering the pixel points of the region where the human body attachments are located by using the at least one human body attachment confidence coefficient threshold value.

7. The method according to claim 5, wherein the optimizing semantic segmentation information of the image to be segmented comprises:

obtaining semantic segmentation information of at least one frame of adjacent image of the image to be evaluated or semantic segmentation information of local images in the adjacent image to be used as reference semantic segmentation information;

and performing time domain filtering on the semantic segmentation information of the image to be segmented by using the reference semantic segmentation information so as to optimize the semantic segmentation information of the image to be segmented.

8. The method according to claim 1, wherein the determining the evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachment in each image to be evaluated comprises:

9. The method according to claim 8, wherein the determining a comprehensive evaluation value of the human body attachment according to the position and morphological characteristics of the human body attachment in the image to be evaluated comprises:

when the human body attachment is a flexible object, determining a stretching degree evaluation value of the human body attachment according to the size of the human body attachment and the dispersion degree of pixel point coordinates in the human body attachment, and determining a relative position evaluation value of the human body attachment according to the distance between the human body attachment and a human body;

and determining a comprehensive evaluation value of the human body attachment based on the extension degree evaluation value and the relative position evaluation value of the human body attachment.

10. The method according to claim 8, wherein the determining a comprehensive evaluation value of the human body attachment according to the position and morphological characteristics of the human body attachment in the image to be evaluated comprises:

when the human body attachment is a rigid object, determining a posture evaluation value of the human body attachment according to the size of the human body attachment, and determining a motion degree evaluation value of the human body attachment according to the distance between the human body attachment and a human body;

and determining a comprehensive evaluation value of the human body attachment based on the attitude evaluation value and the motion degree evaluation value of the human body attachment.

11. An image selection apparatus, comprising:

the image acquisition module is configured to acquire a plurality of images to be evaluated;

an image detection module configured to detect a human body attachment in the image to be evaluated;

the image evaluation module is configured to determine an evaluation value of each image to be evaluated according to the position and morphological characteristics of the human body attachment in each image to be evaluated;

an image selection module configured to select a target image from the plurality of images to be evaluated according to an evaluation value of the image to be evaluated.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 10.

13. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the method of any of claims 1 to 10 via execution of the executable instructions.