CN112115740B

CN112115740B - Method and apparatus for processing image

Info

Publication number: CN112115740B
Application number: CN201910533007.2A
Authority: CN
Inventors: 施皓; 陈孟飞
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2019-06-19
Filing date: 2019-06-19
Publication date: 2024-04-09
Anticipated expiration: 2039-06-19
Also published as: CN112115740A

Abstract

Embodiments of the present disclosure disclose methods and apparatus for processing images. One embodiment of the method comprises: acquiring a target human body image; determining, from a predetermined category set, the category to which the target human body image belongs as a target category, wherein each category in the category set corresponds to a preset human body image set, the category of each preset human body image in a preset human body image set is the category corresponding to that set, each preset human body image set comprises at least one preset human body image subset, and the preset human body images in a preset human body image subset correspond to the same person; acquiring the preset human body image set corresponding to the target category as a candidate human body image set; determining whether the similarity between the person corresponding to a preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold; and, in response to determining that it is, determining that preset human body image subset as the result human body image subset corresponding to the target human body image. The embodiment can improve the accuracy and efficiency of image matching and reduce the resources consumed by image processing.

Description

Method and apparatus for processing image

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and more particularly, to a method and apparatus for processing images.

Background

Pedestrian re-identification (person re-identification), also called pedestrian re-recognition, is a technology that uses computer vision to judge whether a specific pedestrian is present in a captured image.

At present, because camera devices differ from one another, and because pedestrians are both rigid and flexible and their appearance is easily affected by clothing, scale, occlusion, posture, viewing angle and the like, pedestrian re-identification has become a challenging research topic of considerable value in the field of computer vision.

Disclosure of Invention

Embodiments of the present disclosure propose methods and apparatus for processing images.

In a first aspect, embodiments of the present disclosure provide a method for processing an image, the method comprising: acquiring a target human body image; determining a category to which the target human body image belongs from a predetermined category set as a target category, wherein each category in the category set corresponds to a preset human body image set, the category of the preset human body images in a preset human body image set is the category corresponding to that preset human body image set, the preset human body image set comprises at least one preset human body image subset, and the preset human body images in a preset human body image subset correspond to the same person; acquiring the preset human body image set corresponding to the target category as a candidate human body image set; determining whether the similarity between the person corresponding to a preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold; and in response to determining that it is, determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image.

In some embodiments, the category set includes a face-containing class and a non-face-containing class, divided based on whether the human body image contains a face object; and determining, from the predetermined category set, a category to which the target human body image belongs as a target category includes: inputting the target human body image into a pre-trained face detection model to obtain a detection result, wherein the detection result indicates whether the target human body image contains a face object; and determining, based on the detection result, a category to which the target human body image belongs from the category set as the target category.

In some embodiments, the set of categories includes a front class, a side class, and a back class that are partitioned based on the orientation of the human object in the human image; and determining, from a predetermined set of categories, a category to which the target human body image belongs as a target category includes: inputting the target human body image into a pre-trained human face detection model to obtain a detection result, wherein the detection result is used for indicating whether the target human body image contains a human face object or not; inputting a target human body image into a pre-trained human body skeleton key point recognition model to obtain a recognition result, wherein the recognition result is used for indicating the position of the human body skeleton key point in the target human body image; based on the obtained detection result and recognition result, a category to which the target human body image belongs is determined from the category set as a target category.

In some embodiments, the method further comprises: and adding the target human body image into the result human body image sub-set to obtain an updated result human body image sub-set.

In some embodiments, the target human body image is a human body image extracted from a target human body video; and after determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image, the method further comprises: selecting a target number of human body images from the target human body video; and, for each human body image in the target number of human body images, performing the following processing steps: determining the category to which the human body image belongs from the category set; and, in response to the category to which the human body image belongs being the same as the target category, adding the human body image to the result human body image subset to obtain an updated result human body image subset.

In some embodiments, the processing step further comprises: determining a preset human body image subset corresponding to the same person as the result human body image subset from the preset human body image set corresponding to the category to which the human body image belongs in response to the fact that the category to which the human body image belongs is different from the target category; and adding the human body image into the determined preset human body image subset to obtain an updated preset human body image subset.
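The two branches of the processing step (same category as the target, or a different category) can be illustrated with a minimal sketch. It is not the disclosed implementation: the nested-dictionary layout, the function name `route_video_frames`, the `classify` callback, and the choice to create a missing subset on demand are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical layout: category -> person id -> list of image ids.
PresetSets = Dict[str, Dict[str, List[str]]]


def route_video_frames(
    frames: Iterable[str],
    classify: Callable[[str], str],
    target_category: str,
    person_id: str,
    preset_sets: PresetSets,
) -> None:
    """Add each selected video frame to the matched person's subsets.

    `person_id` identifies the person of the result subset that was
    already determined for the target human body image.
    """
    result_subset = preset_sets[target_category][person_id]
    for frame in frames:
        category = classify(frame)
        if category == target_category:
            # Same category: update the result human body image subset.
            result_subset.append(frame)
        else:
            # Different category: update the same person's subset in the
            # preset set of that category. Creating a missing subset here
            # is an assumption of this sketch, not something the text states.
            other = preset_sets.setdefault(category, {}).setdefault(person_id, [])
            other.append(frame)
```

With this layout, one matched frame is enough to enrich the same person's subsets in every category that appears in the video.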

In some embodiments, determining whether the similarity of the person corresponding to the subset of the preset human body images in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold comprises: for a preset human body image subset in the candidate human body image set, performing the following steps: determining the similarity between the preset human body image included in the preset human body image subset and the target human body image as candidate similarity; based on the obtained candidate similarity, generating the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image; determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold.
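As a hedged illustration of these three steps, the sketch below computes one candidate similarity per preset image, aggregates the candidates into a person-level similarity, and compares it with the threshold. The text does not specify the image similarity measure or the aggregation rule; cosine similarity over feature vectors and the arithmetic mean are assumptions, as are all function names.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def person_similarity(
    target: Sequence[float], subset: List[Sequence[float]]
) -> float:
    """Aggregate per-image candidate similarities into one person-level value.

    Using the mean is an assumption; a maximum or a weighted sum would fit
    the description equally well.
    """
    if not subset:
        raise ValueError("a preset subset must contain at least one image")
    candidates = [cosine_similarity(target, features) for features in subset]
    return sum(candidates) / len(candidates)


def meets_threshold(
    target: Sequence[float], subset: List[Sequence[float]], threshold: float
) -> bool:
    """True when the person-level similarity reaches the target threshold."""
    return person_similarity(target, subset) >= threshold
```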

In some embodiments, the preset subset of human images corresponds to a target similarity threshold; and determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold value comprises: determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold corresponding to the preset human body image subset.

In some embodiments, the target similarity threshold corresponding to the preset human body image subset is obtained by: determining the number of preset human body images included in the preset human body image subset; and based on the corresponding relation between the number and the similarity threshold, acquiring the similarity threshold corresponding to the determined number as a target similarity threshold corresponding to the preset human body image subset, wherein the number and the similarity threshold are positively correlated.
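A size-dependent threshold of this kind could be realised as a simple lookup table. The following sketch is illustrative only: the table `SIZE_THRESHOLDS` and its numeric values are hypothetical, chosen solely so that the threshold increases with the number of images, as the positive correlation requires.

```python
from bisect import bisect_right
from typing import List, Tuple

# (minimum subset size, similarity threshold). Hypothetical values; the
# threshold rises with subset size to reflect the positive correlation.
SIZE_THRESHOLDS: List[Tuple[int, float]] = [(1, 0.60), (5, 0.70), (20, 0.80)]


def threshold_for_subset_size(
    count: int, table: List[Tuple[int, float]] = SIZE_THRESHOLDS
) -> float:
    """Return the threshold of the largest size bucket not exceeding `count`."""
    if count < table[0][0]:
        raise ValueError("subset must contain at least one preset image")
    sizes = [size for size, _ in table]
    index = bisect_right(sizes, count) - 1
    return table[index][1]
```

A subset with more images gives a more reliable person-level similarity, so a stricter threshold can be demanded of it without losing true matches.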

In some embodiments, determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image includes: determining whether the candidate human body image set comprises at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold; and in response to determining that it does, selecting, from the at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold, the preset human body image subset containing the largest number of preset human body images as the result human body image subset corresponding to the target human body image.

In some embodiments, after determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image, the method further comprises: outputting prompt information indicating that a result human body image subset corresponding to the target human body image has been determined.
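The tie-breaking rule for several qualifying subsets can be sketched as follows. This is a minimal illustration under assumptions: subsets are keyed by a hypothetical person identifier, similarities and thresholds are supplied as precomputed dictionaries, and when two qualifying subsets have the same size the first one encountered is kept, a case the text does not address.

```python
from typing import Dict, List, Optional


def select_result_subset(
    subsets: Dict[str, List[str]],
    similarities: Dict[str, float],
    thresholds: Dict[str, float],
) -> Optional[str]:
    """Pick the qualifying subset that contains the most preset images.

    Returns the key of the chosen subset, or None when no subset reaches
    its own target similarity threshold.
    """
    passing = [
        person_id
        for person_id in subsets
        if similarities[person_id] >= thresholds[person_id]
    ]
    if not passing:
        return None
    # max() keeps the first of several equally large subsets (an assumption).
    return max(passing, key=lambda person_id: len(subsets[person_id]))
```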

In a second aspect, embodiments of the present disclosure provide an apparatus for processing an image, the apparatus comprising: a first acquisition unit configured to acquire a target human body image; a first determining unit configured to determine, as a target category, a category to which the target human body image belongs from a predetermined category set, wherein the category in the category set corresponds to a preset human body image set, the category of the preset human body image in the preset human body image set is the category corresponding to the preset human body image set, the preset human body image set includes at least one preset human body image sub-set, and the preset human body images in the preset human body image sub-set correspond to the same person; the second acquisition unit is configured to acquire a preset human body image set corresponding to the target category as a candidate human body image set; a second determining unit configured to determine whether a similarity between a person corresponding to a preset human body image subset in the candidate human body image set and a person corresponding to the target human body image is greater than or equal to a target similarity threshold; and a third determining unit configured to determine, as a result human body image subset corresponding to the target human body image, a preset human body image subset that is greater than or equal to the target similarity threshold in response to determining that the similarity of the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold.

In some embodiments, the category set includes a face-containing class and a non-face-containing class, divided based on whether the human body image contains a face object; and the first determining unit includes: a first input module configured to input the target human body image into a pre-trained face detection model to obtain a detection result, wherein the detection result indicates whether the target human body image contains a face object; and a first determining module configured to determine, based on the detection result, a category to which the target human body image belongs from the category set as the target category.

In some embodiments, the set of categories includes a front class, a side class, and a back class that are partitioned based on the orientation of the human object in the human image; the first determination unit includes: the second input module is configured to input the target human body image into a pre-trained human face detection model to obtain a detection result, wherein the detection result is used for indicating whether the target human body image contains a human face object or not; the third input module is configured to input a target human body image into a pre-trained human body skeleton key point recognition model to obtain a recognition result, wherein the recognition result is used for indicating the position of the human body skeleton key point in the target human body image; and a second determination module configured to determine, as a target category, a category to which the target human body image belongs from the category set based on the obtained detection result and the identification result.

In some embodiments, the apparatus further comprises: and the adding unit is configured to add the target human body image to the result human body image subset to obtain an updated result human body image subset.

In some embodiments, the target human body image is a human body image extracted from a target human body video; and the apparatus further comprises: a selecting unit configured to select a target number of human body images from the target human body video; and an execution unit configured to perform, for each human body image in the target number of human body images, the following processing steps: determining the category to which the human body image belongs from the category set; and, in response to the category to which the human body image belongs being the same as the target category, adding the human body image to the result human body image subset to obtain an updated result human body image subset.

In some embodiments, the second determining unit is further configured to: for a preset human body image subset in the candidate human body image set, performing the following steps: determining the similarity between the preset human body image included in the preset human body image subset and the target human body image as candidate similarity; based on the obtained candidate similarity, generating the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image; determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold.

In some embodiments, the preset subset of human images corresponds to a target similarity threshold; and the second determination unit is further configured to: determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold corresponding to the preset human body image subset.

In some embodiments, the third determining unit comprises: a third determining module configured to determine whether the candidate human body image set includes at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold; and a selecting module configured to, in response to determining that it does, select, from the at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold, the preset human body image subset containing the largest number of preset human body images as the result human body image subset corresponding to the target human body image.

In some embodiments, the apparatus further comprises: an output unit configured to output prompt information indicating that a result human body image subset corresponding to the target human body image has been determined.

In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for processing an image described above.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements a method of any of the embodiments of the methods for processing images described above.

According to the method and apparatus for processing images provided by embodiments of the present disclosure, a target human body image is acquired, and the category to which the target human body image belongs is determined from a predetermined category set as a target category. Each category in the category set corresponds to a preset human body image set, the category of the preset human body images in a preset human body image set is the category corresponding to that set, each preset human body image set comprises at least one preset human body image subset, and the preset human body images in a preset human body image subset correspond to the same person. The preset human body image set corresponding to the target category is then acquired as a candidate human body image set, and it is determined whether the similarity between the person corresponding to a preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold. In response to determining that it is, the preset human body image subset whose similarity is greater than or equal to the target similarity threshold is determined as the result human body image subset corresponding to the target human body image. Because images of the same category generally share similar image features, matching the target human body image against the preset human body image set of its own category can yield a more accurate matching result. In addition, only the preset human body image subsets in the preset human body image set corresponding to the target category need to be matched; compared with matching against all preset human body image subsets without classification, this reduces the complexity of matching, improves the efficiency of image processing, and reduces the resources consumed in the image processing process.
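The efficiency argument can be made concrete with a short sketch of the overall flow. It is an illustration under assumptions rather than the disclosed implementation: the function `match_target`, the nested-dictionary layout, the caller-supplied `classify` and `similarity` callbacks, a single shared threshold, and returning the first qualifying subset are all hypothetical simplifications.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical layout: category -> person id -> list of image ids.
PresetSets = Dict[str, Dict[str, List[str]]]


def match_target(
    target: str,
    classify: Callable[[str], str],
    preset_sets: PresetSets,
    similarity: Callable[[str, List[str]], float],
    threshold: float,
) -> Optional[str]:
    """Match a target image only against subsets of its own category.

    Returns the person id of the first subset whose person-level similarity
    reaches the threshold, or None when nothing qualifies.
    """
    category = classify(target)
    # Only the candidate set of the target category is searched; subsets
    # filed under other categories are never compared.
    candidates = preset_sets.get(category, {})
    for person_id, subset in candidates.items():
        if similarity(target, subset) >= threshold:
            return person_id
    return None
```

The number of similarity computations is bounded by the number of subsets in one category rather than by the number of subsets across all categories.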

Drawings

Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:

FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;

FIG. 2 is a flow chart of one embodiment of a method for processing an image according to the present disclosure;

FIG. 3 is a schematic illustration of one application scenario of a method for processing images according to an embodiment of the present disclosure;

FIG. 4 is a flow chart of yet another embodiment of a method for processing an image according to the present disclosure;

FIG. 5 is a schematic structural view of one embodiment of an apparatus for processing images according to the present disclosure;

FIG. 6 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the methods of the present disclosure for processing images or apparatuses for processing images may be applied.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as an image processing class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with cameras, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server that provides various services, such as an image processing server that processes a target human body image obtained by photographing by the terminal devices 101, 102, 103. The image processing server may perform analysis or the like on the received data of the target human body image or the like, and obtain a processing result (e.g., a result human body image subset). In some application scenarios, the image processing server may also feed back the processing result to the terminal device.

It should be noted that, the method for processing an image provided by the embodiment of the present disclosure may be performed by the terminal devices 101, 102, 103, or may be performed by the server 105, and accordingly, the apparatus for processing an image may be provided in the terminal devices 101, 102, 103, or may be provided in the server 105.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., for providing distributed services), or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In particular, in the case where the data used in the process of generating the resulting sub-set of body images need not be acquired from a remote location, the system architecture described above may not include a network, but may include only a terminal device or server.

With continued reference to fig. 2, a flow 200 of one embodiment of a method for processing an image according to the present disclosure is shown. The method for processing an image comprises the steps of:

step 201, a target human body image is acquired.

In the present embodiment, an execution subject of the method for processing an image (e.g., the server shown in FIG. 1) may acquire a target human body image from a remote or local location through a wired or wireless connection. The target human body image may be a human body image for which it is to be determined whether the corresponding person is a predetermined person.

In practice, the target human body image may be an image obtained by photographing a human body (e.g., a pedestrian). Specifically, the executing body or other electronic devices may control the camera to perform continuous shooting, and in response to shooting a human body object, acquire an image of the shot human body object as a target human body image.

Step 202, determining a category to which the target human body image belongs from a predetermined category set as a target category.

In this embodiment, based on the target human body image obtained in step 201, the execution subject may determine, as the target category, a category to which the target human body image belongs from a predetermined category set. The categories in the category set are defined in advance by technicians according to a specific way of classifying human body images. For example, if human body images are classified according to whether they contain a face object, they can be divided into a face-containing class and a non-face-containing class, and the category set may then include those two classes; if they are classified according to the sex of the person corresponding to the human body image, they can be divided into a male class and a female class, and the category set may then include the male class and the female class.

In this embodiment, the categories in the category set correspond to a preset human body image set. Specifically, for each category in the category set, a preset human body image set corresponding to the category is predetermined, and the categories of the preset human body images in the preset human body image set corresponding to the category are all the category (for example, the sexes of the people corresponding to the preset human body images in the preset human body image set corresponding to the male category are all men).

In this embodiment, the preset human body image set may be a human body image set formed by previously using an existing human body image. Each preset human body image set comprises at least one preset human body image subset. The preset human body images in the preset human body image subset correspond to the same person.

As an example, the preset human body image set a may include a preset human body image sub-set a and a preset human body image sub-set b. The preset human body images in the preset human body image subset a can be images obtained by shooting the person a in advance, and then the preset human body images in the preset human body image subset a correspond to the person a; the preset human body images in the preset human body image sub-set b may be images obtained by photographing the person b in advance, and then the preset human body images in the preset human body image sub-set b correspond to the person b.

Specifically, the preset human body image subset may be used for matching with the target human body image to determine whether the person corresponding to the preset human body image subset is the same as the person corresponding to the target human body image. Here, the preset human body images corresponding to a certain person are combined into the preset human body image subset in advance, so that the number of the preset human body images for matching the person can be increased, and further matching accuracy when the person corresponding to the target human body image is matched with the person can be improved.
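The relationship between categories, preset human body image sets, and preset human body image subsets can be pictured as a nested mapping. The following is a minimal sketch, not the disclosed implementation; the type alias `PresetSets`, the function `build_preset_sets`, the category names, and the use of string identifiers for persons and images are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# category -> person id -> list of image ids (one list = one preset subset)
PresetSets = Dict[str, Dict[str, List[str]]]


def build_preset_sets(records: Iterable[Tuple[str, str, str]]) -> PresetSets:
    """Group (category, person_id, image_id) records into per-category sets.

    Each inner list plays the role of a preset human body image subset:
    all of its images correspond to the same person.
    """
    sets = defaultdict(lambda: defaultdict(list))
    for category, person_id, image_id in records:
        sets[category][person_id].append(image_id)
    # Convert to plain dicts so a missing key raises instead of being created.
    return {category: dict(subsets) for category, subsets in sets.items()}


# Hypothetical records: the same person may appear under several categories.
records = [
    ("with_face", "person_a", "img_001"),
    ("with_face", "person_a", "img_002"),
    ("with_face", "person_b", "img_003"),
    ("without_face", "person_a", "img_004"),
]
preset_sets = build_preset_sets(records)
```

Here `preset_sets["with_face"]` is one preset human body image set, and `preset_sets["with_face"]["person_a"]` is the subset holding the two images of that person.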

In this embodiment, the execution subject may determine, from the category set, the target category to which the target human body image belongs by using various methods based on the classification manner of the categories in the category set.

In some optional implementations of the present embodiment, the category set may include a face-containing class and a non-face-containing class, divided based on whether the human body image contains a face object; and the execution subject may determine the target category to which the target human body image belongs from the category set as follows: first, the execution subject may input the target human body image into a pre-trained face detection model to obtain a detection result. Then, the execution subject may determine, based on the detection result, the target category to which the target human body image belongs from the category set.

In this implementation, the face detection model is used to detect whether the input image contains a face object. Specifically, as an example, the face detection model may be a model obtained by training an initial model (for example, a convolutional neural network) on training samples using a machine learning method. The detection result indicates whether the target human body image contains a face object, and may include, but is not limited to, at least one of the following: text, numbers, symbols, images. For example, the detection result may include the number "0" or the number "1", where "0" indicates that the target human body image does not contain a face object and "1" indicates that it does.

Specifically, in response to determining that the detection result indicates that the target human body image contains a face object, the execution subject may determine that the target category to which the target human body image belongs is the face-containing class; and in response to determining that the detection result indicates that the target human body image does not contain a face object, the execution subject may determine that the target category is the non-face-containing class.
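The mapping from detection result to category can be sketched in a few lines. This is illustrative only: the category labels, the function `classify_by_face`, and the stand-in `fake_detector` are hypothetical, and a real face detection model would replace the stand-in.

```python
from typing import Callable

FACE_CLASS = "contains_face"
NO_FACE_CLASS = "no_face"


def classify_by_face(image: object, detect_face: Callable[[object], int]) -> str:
    """Map a detector output of 1 (face present) or 0 (absent) to a category."""
    result = detect_face(image)
    if result not in (0, 1):
        raise ValueError(f"unexpected detection result: {result!r}")
    return FACE_CLASS if result == 1 else NO_FACE_CLASS


# Stand-in detector for illustration only: reads a flag from a dict
# instead of running a trained model.
def fake_detector(image: object) -> int:
    return 1 if isinstance(image, dict) and image.get("has_face") else 0
```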

In some optional implementations of the present embodiment, the set of categories may include a front class, a side class, and a back class that are partitioned based on the orientation of the human object in the human image; and the execution subject may determine a target category to which the target human body image belongs from a predetermined category set by:

firstly, inputting a target human body image into a pre-trained face detection model to obtain a detection result, wherein the detection result is used for indicating whether the target human body image contains a face object.

Then, inputting the target human body image into a pre-trained human body skeleton key point recognition model to obtain a recognition result, wherein the recognition result is used for indicating the position of the human body skeleton key point in the target human body image.

Here, the human skeleton key point recognition model may be used to recognize the human body image to determine the position of the human skeleton key point in the human body image. Specifically, as an example, the human skeleton key point recognition model may be a model obtained after training an initial model (for example, a convolutional neural network) by using a machine learning method based on a training sample.

In practice, the key points of human bones refer to key skeletal points on a human subject, and may generally include points corresponding to the head, points corresponding to the shoulders, points corresponding to the crotch, points corresponding to the knees, and the like.

Finally, based on the obtained detection result and recognition result, a category to which the target human body image belongs is determined from the category set as a target category.

Specifically, the execution subject may determine, from the category set, the target category to which the target human body image belongs, using various methods, based on the obtained detection result and the obtained identification result.

As an example, in response to determining that the detection result indicates that the target human body image does not contain a face object, the execution subject may directly determine that the target human body image belongs to the back class in the category set. It can be understood that when the orientation of the human body object in the target human body image is the back, no face object can be detected in the target human body image, whereas when the orientation of the human body object is the front or the side, a face object can be detected; the execution subject can therefore determine whether the target human body image belongs to the back class based on whether the target human body image contains a face object.

In this example, if the detection result indicates that the target human body image contains a face object, it may be determined that the target human body image belongs to the front class or the side class. Which of the two classes it belongs to can be determined by the execution subject based on the recognition result output by the human skeleton key point recognition model.

Specifically, the execution subject may use various methods to determine, from the front class and the side class, the target category to which the target human body image belongs. For example, when the recognition result includes the positions of the key points corresponding to the shoulders, the execution subject may determine whether the distance between the key points corresponding to the two shoulders is greater than or equal to a preset threshold; in response to the distance being greater than or equal to the preset threshold, determine that the target human body image belongs to the front class; and in response to the distance being smaller than the preset threshold, determine that the target human body image belongs to the side class. In order to improve the accuracy of the target category, the execution subject may determine the target category based on a plurality of human skeleton key points at the same time, for example, based on both the key points corresponding to the shoulders and the key points corresponding to the crotch (here, for the front class, the distance between the key points corresponding to the two sides of the crotch is large, and for the side class, that distance is small).

In particular, for the back class, the execution subject may determine whether the target human body image belongs to the back class based on both whether the target human body image contains a face object and whether the human skeleton key points in the target human body image meet a preset requirement (for example, the distance between the key points corresponding to the two shoulders is greater than or equal to a preset threshold). In this way, the accuracy of category identification can be further improved.
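The simpler rule from the example above (no face means the back class; otherwise the shoulder distance decides between front and side) can be sketched as follows. This is only an illustration under stated assumptions: the key point names `left_shoulder` and `right_shoulder`, the function name `classify_orientation`, and the pixel-coordinate representation are hypothetical, and the preset threshold value would have to be chosen for the image scale in use.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) position of a key point, in pixels


def classify_orientation(
    has_face: bool,
    keypoints: Dict[str, Point],
    shoulder_threshold: float,
) -> str:
    """Return "back", "front", or "side" for a human body image."""
    # No detectable face: treated directly as the back class.
    if not has_face:
        return "back"
    # Face present: a wide shoulder span suggests front, a narrow one suggests side.
    distance = math.dist(keypoints["left_shoulder"], keypoints["right_shoulder"])
    return "front" if distance >= shoulder_threshold else "side"
```

The stricter variant for the back class, which also checks the key points, would add a second condition to the first branch.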

Step 203, obtaining a preset human body image set corresponding to the target category as a candidate human body image set.

In this embodiment, based on the target category determined in step 202, the execution subject may acquire, as the candidate human body image set, a preset human body image set corresponding to the target category based on a correspondence between the category and the preset human body image set.

Step 204, determining whether the similarity between the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold.

In this embodiment, for the preset human body image subset in the candidate human body image set obtained in step 203, the execution subject may first determine whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to the target similarity threshold. Here, the similarity is a numerical value used to characterize the degree of similarity between two objects (e.g., two people, two images, etc.); specifically, the greater the similarity, the more similar the two objects are. The target similarity threshold may be a predetermined similarity threshold, or may be a similarity threshold determined during the current execution of step 204. The similarity threshold is the minimum similarity required for a match.

Specifically, for a preset human body image subset in the candidate human body image set, the executing body may determine whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to the target similarity threshold by using various methods.

In some optional implementations of this embodiment, for a preset human body image subset in the candidate human body image set, the execution subject may execute the following steps: determining the similarity between each preset human body image included in the preset human body image subset and the target human body image as a candidate similarity; generating, based on the obtained candidate similarities, the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image; and determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to the target similarity threshold.

In this implementation, the execution subject may use various methods to generate, based on the obtained candidate similarities, the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image. For example, the obtained candidate similarities may be averaged, and the result used as the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image; alternatively, the smallest of the obtained candidate similarities may be determined as that similarity.
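The two aggregation options just mentioned (average or minimum of the candidate similarities) can be sketched as follows. The function name `person_similarity` and the `mode` parameter are hypothetical; the disclosure does not prescribe an interface.

```python
from statistics import mean
from typing import Sequence


def person_similarity(candidate_similarities: Sequence[float], mode: str = "mean") -> float:
    """Combine per-image candidate similarities into one person-level similarity."""
    if not candidate_similarities:
        raise ValueError("a preset human body image subset must contain at least one image")
    if mode == "mean":
        # Average over all preset images of the same person.
        return mean(candidate_similarities)
    if mode == "min":
        # Most conservative choice: the least similar preset image decides.
        return min(candidate_similarities)
    raise ValueError(f"unknown mode: {mode!r}")
```

Taking the minimum makes a match harder to reach than taking the mean, since a single dissimilar preset image lowers the person-level similarity.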

It should be noted that the execution subject may determine the candidate similarity by using various image similarity calculation methods; for example, a similarity calculation method based on feature points, on a perceptual hash algorithm, or on the peak signal-to-noise ratio may be used.
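As one concrete instance of the methods listed above, the peak signal-to-noise ratio (PSNR) between two equally sized images can be computed from their mean squared error. The sketch below assumes each image is given as a flat sequence of 8-bit grayscale pixel values; the function name `psnr` and this representation are illustrative choices, not part of the disclosure.

```python
import math
from typing import Sequence


def psnr(image_a: Sequence[int], image_b: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in decibels; a larger value means more similar images."""
    if not image_a or len(image_a) != len(image_b):
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over corresponding pixels.
    mse = sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Because PSNR is unbounded above, a similarity threshold used with it would have to be expressed on the same decibel scale, or the value would need to be normalized first.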

In some optional implementations of this embodiment, the preset subset of human images corresponds to a target similarity threshold (i.e., different preset subsets of human images correspond to different target similarity thresholds); and for a preset human body image subset in the candidate human body image set, the execution body can determine whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold corresponding to the preset human body image subset.

It can be understood that the human body characteristics of different people are different, and the matching difficulty is different, so that the implementation mode can set different target similarity thresholds for different preset human body image subsets, thereby improving the matching adaptability and realizing more flexible image matching.

In some optional implementations of this embodiment, the target similarity threshold corresponding to the preset human body image subset may be obtained by the execution body or other electronic devices through the following steps: determining the number of preset human body images included in the preset human body image subset; and based on the corresponding relation between the number and the similarity threshold, acquiring the similarity threshold corresponding to the determined number as a target similarity threshold corresponding to the preset human body image subset, wherein the number and the similarity threshold are positively correlated (namely, the larger the number is, the larger the similarity threshold is).

It can be understood that the more preset human body images a preset human body image subset includes, the more reference data is available for matching the person, and the more accurate the obtained matching result is; in this case, a stricter judgment standard (corresponding to a larger similarity threshold) can be set to determine whether the person corresponding to the target human body image matches the person corresponding to the preset human body image subset. Conversely, the fewer preset human body images the preset human body image subset includes, the less reference data is available for matching, and the less accurate the obtained matching result is; in this case, a looser judgment standard (corresponding to a smaller similarity threshold) may be set to determine whether the person corresponding to the target human body image matches the person corresponding to the preset human body image subset. Therefore, on the premise of ensuring matching accuracy, the pass rate of matching can be improved and the application range of image matching can be broadened.
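A correspondence between the number of images and the similarity threshold can be sketched as a step table in which larger counts map to larger thresholds. The breakpoints and threshold values below are invented purely for illustration, and the name `threshold_for_count` is hypothetical; the disclosure only requires that the two quantities be positively correlated.

```python
from bisect import bisect_right

# Hypothetical correspondence: a subset with at least COUNT_BREAKPOINTS[i] images
# (and fewer than the next breakpoint) uses THRESHOLDS[i].
COUNT_BREAKPOINTS = [1, 5, 20]
THRESHOLDS = [0.60, 0.70, 0.80]


def threshold_for_count(image_count: int) -> float:
    """Look up the target similarity threshold for a subset of the given size."""
    if image_count < 1:
        raise ValueError("a preset human body image subset must contain at least one image")
    # Index of the last breakpoint that does not exceed image_count.
    index = bisect_right(COUNT_BREAKPOINTS, image_count) - 1
    return THRESHOLDS[index]
```

A step table is only one possible design; any non-decreasing function of the count would satisfy the stated positive correlation.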

Step 205, in response to determining that the similarity between the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold, determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image.

In this embodiment, in response to determining that the similarity between the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold, the execution subject may determine the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image. The result human body image subset is the matched preset human body image subset that is most likely to correspond to the same person as the target human body image.

Specifically, the execution subject may first determine the similarity between the person corresponding to each preset human body image subset in the candidate human body image set and the person corresponding to the target human body image. Then, the execution subject may acquire each preset human body image subset whose similarity is greater than or equal to the target similarity threshold as a result human body image subset corresponding to the target human body image.
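The selection just described can be sketched as a filter over person-level similarities. The sketch assumes each preset human body image subset is identified by a person label and that its similarity to the target has already been computed; the name `matching_subsets` is hypothetical.

```python
from typing import Dict, List


def matching_subsets(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Return the labels of subsets whose similarity meets the target threshold."""
    return [
        person_label
        for person_label, similarity in similarities.items()
        if similarity >= threshold  # "greater than or equal to", as in step 205
    ]
```

For example, `matching_subsets({"p1": 0.82, "p2": 0.40}, 0.75)` keeps only `"p1"`.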

In some optional implementations of this embodiment, after obtaining the result human body image subset corresponding to the target human body image, the execution subject may add the target human body image to the result human body image subset, to obtain the updated result human body image subset. Therefore, the number of the human body images included in the result human body image sub-set can be increased, and the matching precision can be improved when the updated result human body image sub-set is used for image matching later.

It should be noted that, if each preset sub-set of human body images corresponds to a target similarity threshold, and the target similarity threshold is determined based on the number of human body images included in the corresponding preset sub-set of human body images, in this implementation manner, as the number of human body images in the updated sub-set of result human body images increases, the target similarity threshold corresponding to the updated sub-set of result human body images increases.

In some optional implementations of this embodiment, the execution subject may determine the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image through the following steps: first, the execution subject may determine whether the candidate human body image set includes at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold. Then, in response to determining that the candidate human body image set includes at least two such preset human body image subsets, the execution subject may select, from these preset human body image subsets, the one that includes the largest number of preset human body images as the result human body image subset corresponding to the target human body image.

It can be understood that the more preset human body images a preset human body image subset includes, the larger the corresponding target similarity threshold can be, and the higher the matching accuracy when that subset is used for image matching. Therefore, when at least two preset human body image subsets have a similarity greater than or equal to the corresponding target similarity threshold, selecting the one that includes the largest number of preset human body images as the result human body image subset can further improve the accuracy of the matched result human body image subset.
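The selection among several passing subsets can be sketched as follows, assuming each subset is keyed by a person label and has its own threshold. The name `pick_result_subset` and the dictionary layout are hypothetical; how ties in size are broken is not specified in the text, so the sketch simply keeps the first largest subset encountered.

```python
from typing import Dict, List, Optional


def pick_result_subset(
    subsets: Dict[str, List[str]],
    similarities: Dict[str, float],
    thresholds: Dict[str, float],
) -> Optional[str]:
    """Return the label of the largest subset that meets its own threshold, if any."""
    passing = [
        label for label in subsets
        if similarities[label] >= thresholds[label]
    ]
    if not passing:
        return None  # no subset matched the target human body image
    # Among the passing subsets, prefer the one holding the most preset images.
    return max(passing, key=lambda label: len(subsets[label]))
```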

In some optional implementations of this embodiment, after obtaining the result human body image subset, the execution subject may further output prompt information characterizing that the result human body image subset corresponding to the target human body image has been determined. Specifically, the execution subject may output and present the prompt information itself; or the execution subject may output the prompt information to another communicatively connected electronic device (such as the terminal device shown in fig. 1), so that the electronic device presents the prompt information. Here, the prompt information may take various forms (e.g., text, image, audio, video, etc.). Specifically, the prompt information may be predetermined information (e.g., "1"), or may be information generated based on the result human body image subset (e.g., the result human body image subset is directly used as the prompt information).

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing an image according to the present embodiment.

In the application scenario of fig. 3, the server 31 may first acquire the target human body image 33 transmitted by the mobile phone 32. Then, the server 31 may determine, as the target class 35, a class to which the target human body image 33 belongs from a predetermined class set 34, where the class set 34 includes a first class 341 (for example, including a human face class) and a second class 342 (for example, including no human face class), the first class 341 and the second class 342 correspond to a preset human body image set 361 and a preset human body image set 362, respectively, the class of the preset human body image in the preset human body image set 361 is the first class 341, the class of the preset human body image in the preset human body image set 362 is the second class 342, and each of the preset human body image sets 361 and 362 includes at least one preset human body image subset, and the preset human body images in the preset human body image subset correspond to the same person.

Then, the server 31 may acquire a preset human body image set corresponding to the target category 35 as the candidate human body image set 37.

Finally, the server 31 may determine whether the similarity between the person corresponding to a preset human body image subset in the candidate human body image set 37 and the person corresponding to the target human body image 33 is greater than or equal to the target similarity threshold, and in response to determining that the similarity is greater than or equal to the target similarity threshold, determine that preset human body image subset as the result human body image subset 38 corresponding to the target human body image 33.
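The scenario of fig. 3 can be sketched end to end as follows. This is a simplified illustration, not the disclosed implementation: images are stand-in strings, the gallery is a nested dictionary from category to person label to preset images, the person-level similarity is the mean of per-image similarities, and a single threshold is used for all subsets. The names `Gallery` and `match_person` and the `image_similarity` callable are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

# category -> person label -> preset human body images of that person
Gallery = Dict[str, Dict[str, List[str]]]


def match_person(
    target_image: str,
    target_category: str,
    gallery: Gallery,
    image_similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return the person labels whose subsets match the target image."""
    # Only the preset set of the target category is searched (the candidate set).
    candidates = gallery.get(target_category, {})
    results = []
    for person_label, images in candidates.items():
        if not images:
            continue  # assumption: empty subsets are skipped
        score = mean(image_similarity(target_image, image) for image in images)
        if score >= threshold:
            results.append(person_label)
    return results
```

Restricting the search to `gallery[target_category]` is what keeps the number of comparisons below that of matching against every preset subset.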

The method provided by the embodiment of the disclosure can match the human body image sub-set from the preset human body image set corresponding to the target category based on the target category to which the target human body image belongs, so that the matching precision can be improved, and a more accurate result human body image sub-set is obtained; in addition, in the process of matching the result human body image subsets, the method only needs to match the preset human body image subsets in the preset human body image sets corresponding to the target categories, and compared with the method which does not classify the human body image subsets and matches all the preset human body image subsets, the method can reduce the complexity of matching, improve the efficiency of image processing and reduce the resources consumed in the image processing process; in addition, the human body images with the similarity meeting the preset requirement can be determined in batches based on the matching of the preset human body image subset and the target human body image, so that the diversity of the determined human body images can be improved, and meanwhile, the accuracy of image matching can be further improved.

With further reference to fig. 4, a flow 400 of yet another embodiment of a method for processing an image is shown. The flow 400 of the method for processing an image comprises the steps of:

step 401, acquiring a target human body image.

In the present embodiment, an execution subject of the method for processing an image (e.g., the server shown in fig. 1) may acquire a target human body image from a remote location or a local location through a wired connection or a wireless connection. The target human body image is a human body image extracted from a target human body video. The target human body video may be a human body video for which it is to be determined whether the corresponding person is a predetermined person. Specifically, the target human body video may be a video obtained by photographing a human body (e.g., a pedestrian).

It can be understood that, since the target human body video is obtained by photographing a human body, the video frames in the target human body video may include human body objects; the video frames in the target human body video are therefore essentially human body images, and extracting a human body image from the target human body video is equivalent to extracting a video frame from it. Specifically, the execution subject or another electronic device may extract a human body image from the target human body video as the target human body image by using various methods. For example, random extraction may be employed, or a human body image with higher definition may be extracted as the target human body image.

Step 402, determining a category to which the target human body image belongs from a predetermined category set as a target category.

In this embodiment, based on the target human body image obtained in step 401, the execution subject may determine, as the target category, a category to which the target human body image belongs from a predetermined category set. Wherein, the category in the category set corresponds to the preset human body image set. The preset human body image set may be a human body image set composed by previously using an existing human body image. Each preset human body image set comprises at least one preset human body image subset. The preset human body images in the preset human body image subset correspond to the same person.

Step 403, obtaining a preset human body image set corresponding to the target category as a candidate human body image set.

In this embodiment, based on the target category determined in step 402, the execution subject may acquire, as the candidate human body image set, a preset human body image set corresponding to the target category based on a correspondence between the category and the preset human body image set.

Step 404, determining whether the similarity between the person corresponding to the preset subset of the candidate human body images and the person corresponding to the target human body image is greater than or equal to the target similarity threshold.

In this embodiment, for the preset human body image subset in the candidate human body image set obtained in step 403, the execution subject may first determine whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to the target similarity threshold. The similarity is a numerical value for representing the similarity degree between two objects, and specifically, the greater the similarity degree, the higher the similarity degree between two objects can be represented. The target similarity threshold may be a predetermined similarity threshold, or may be a similarity threshold determined during the current execution of step 404.

Step 405, in response to determining that the similarity between the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold, determining the preset human body image subset whose similarity is greater than or equal to the target similarity threshold as the result human body image subset corresponding to the target human body image.

The steps 401, 402, 403, 404 and 405 may be performed in a similar manner to the steps 201, 202, 203, 204 and 205 in the foregoing embodiments, and the descriptions of the steps 201, 202, 203, 204 and 205 are also applicable to the steps 401, 402, 403, 404 and 405, which are not repeated herein.

Step 406, selecting a target number of human body images from the target human body video.

In this embodiment, the execution subject may acquire a target human body video to which the target human body image belongs, and select a target number of human body images from the target human body video. The target number may be a preset number (e.g., "3"), or may be a number determined based on the number of video frames included in the target human body video (e.g., the target number is one half of the number of video frames included in the target human body video).

Specifically, the execution subject may select the target number of human body images from the target human body video by using various methods; for example, the execution subject may select them at random, or may select the first target number of human body images in the human body image sequence (i.e., the video frame sequence) corresponding to the target human body video.
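The two selection strategies mentioned above (the leading frames of the sequence, or a random sample) can be sketched as follows. Frames are treated as opaque items in a list; the name `select_frames`, the `strategy` parameter, and the choice to cap the count at the video length are assumptions made for the sketch.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_frames(
    frames: Sequence[Frame],
    target_number: int,
    strategy: str = "first",
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Select up to target_number human body images from a video frame sequence."""
    # Assumption: never ask for more frames than the video contains.
    count = min(target_number, len(frames))
    if strategy == "first":
        return list(frames[:count])  # frames ranked first in the sequence
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(list(frames), count)  # sampling without replacement
    raise ValueError(f"unknown strategy: {strategy!r}")
```

A target number derived from the video length, such as one half of the frame count, could be passed as `len(frames) // 2`.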

Step 407, for a human body image in the target number of human body images, performing the following processing steps: determining the category to which the human body image belongs from the category set; and adding the human body image into the result human body image sub-set to obtain an updated result human body image sub-set in response to the human body image belonging to the same category as the target category.

In the present embodiment, the above-described execution subject may execute the following processing steps for the human body images among the target number of human body images obtained in step 406:

first, determining the category to which the human body image belongs from a category set.

Here, the category to which the human body image belongs may be determined by referring to the determination manner of the target category in step 402, which is not described herein.

And a second step of adding the human body image to the result human body image sub-set to obtain an updated result human body image sub-set in response to the human body image belonging to the same category as the target category.

It can be understood that the fact that the result human body image subset has been matched indicates that the person corresponding to the target human body image matches the person corresponding to the result human body image subset, and further indicates that the person corresponding to the target human body video matches the person corresponding to the result human body image subset. However, the result human body image subset also corresponds to the target category (e.g., the front class), so if a human body image selected from the target human body video is to be added to the result human body image subset, it is also necessary to ensure that the category of the selected human body image is the target category.

In some optional implementations of this embodiment, the foregoing processing steps may further include:

thirdly, determining a preset human body image subset corresponding to the same person as the result human body image subset from the preset human body image set corresponding to the category to which the human body image belongs in response to the fact that the category to which the human body image belongs is different from the target category; and adding the human body image into the determined preset human body image subset to obtain an updated preset human body image subset.

In practice, the preset human body image subsets corresponding to the same person may be pre-established with an association relationship (for example, the same person label is stored in an associated manner), and further, the executing body may firstly obtain the preset human body image set corresponding to the category (for example, a side category) to which the human body image belongs in response to the category to which the human body image belongs being different from the target category, and then determine the preset human body image subset having the association relationship with the result human body image subset from the preset human body image set, that is, determine the preset human body image subset corresponding to the category to which the human body image belongs and corresponding to the same person as the human body image.

Here, through the above processing steps, the number of human body images included in the human body image subsets (including the result human body image subset and the preset human body image subsets) can be increased, which helps improve matching accuracy when the updated human body image subsets are subsequently used for matching persons.

It should be noted that, if each preset sub-set of human body images corresponds to a target similarity threshold, and the target similarity threshold is determined based on the number of human body images included in the corresponding preset sub-set of human body images, after the above processing step is performed, as the number of human body images in the updated sub-set of human body images increases, the target similarity threshold corresponding to the updated sub-set of human body images increases.
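The processing steps of step 407, including the optional third step, can be sketched as follows. The sketch assumes the gallery is a nested dictionary from category to person label to images, so that the shared person label plays the role of the pre-established association between subsets of the same person. The name `route_images` is hypothetical, and skipping an image when no associated subset exists is an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Sequence, Tuple

# category -> person label -> human body images of that person
Gallery = Dict[str, Dict[str, List[str]]]


def route_images(
    gallery: Gallery,
    person_label: str,
    target_category: str,
    classified_images: Sequence[Tuple[str, str]],
) -> None:
    """Add each (image, category) pair to the matching subset of the same person."""
    result_subset = gallery[target_category][person_label]
    for image, category in classified_images:
        if category == target_category:
            # Same category as the target: extend the result subset.
            result_subset.append(image)
            continue
        # Different category: find the same person's subset in that category's set.
        other_subset = gallery.get(category, {}).get(person_label)
        if other_subset is not None:
            other_subset.append(image)
        # Assumption: images with no associated subset are left out.
```

If thresholds are derived from subset sizes, they would be looked up again after this update, since the sizes have changed.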

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing an image in this embodiment highlights the following steps: after the result human body image subset corresponding to the target human body image is determined, a target number of human body images are selected from the target human body video to which the target human body image belongs; for each human body image in the target number of human body images, the category to which the human body image belongs is determined from the category set; and in response to the category to which the human body image belongs being the same as the target category, the human body image is added to the result human body image subset to obtain an updated result human body image subset. Therefore, after the result human body image subset is determined, the number of human body images in it can be increased based on the target human body video to which the target human body image belongs, so that the diversity of the result human body image subset can be improved, more data is available when the result human body image subset is later used for matching a new target human body image, and the accuracy of image matching can be improved.

With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for processing an image, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.

As shown in fig. 5, the apparatus 500 for processing an image of the present embodiment includes: a first acquisition unit 501, a first determination unit 502, a second acquisition unit 503, a second determination unit 504, and a third determination unit 505. Wherein the first acquisition unit 501 is configured to acquire a target human body image; the first determining unit 502 is configured to determine, as a target category, a category to which the target human body image belongs from a predetermined category set, where the category in the category set corresponds to a preset human body image set, the category of the preset human body image in the preset human body image set is the category corresponding to the preset human body image set, the preset human body image set includes at least one preset human body image sub-set, and the preset human body images in the preset human body image sub-set correspond to the same person; the second obtaining unit 503 is configured to obtain a preset human body image set corresponding to the target category as a candidate human body image set; the second determining unit 504 is configured to determine whether a similarity of a person corresponding to a preset human body image subset in the candidate human body image set and a person corresponding to the target human body image is greater than or equal to a target similarity threshold; the third determining unit 505 is configured to determine, as the resultant human body image subset corresponding to the target human body image, a subset of preset human body images that is greater than or equal to the target similarity threshold in response to determining that the similarity of the person corresponding to the subset of preset human body images in the candidate human body image set to the person corresponding to the target human body image is greater than or equal to the target similarity threshold.

In the present embodiment, the first acquisition unit 501 of the apparatus 500 for processing an image may acquire a target human body image from a remote location or a local location through a wired connection or a wireless connection. The target human body image may be a human body image for which it is to be determined whether the corresponding person is a predetermined person.

In the present embodiment, based on the target human body image obtained by the first acquisition unit 501, the first determining unit 502 may determine, from a predetermined category set, the category to which the target human body image belongs as the target category. Each category in the category set corresponds to a preset human body image set. A preset human body image set may be a set of human body images assembled in advance from existing human body images. Each preset human body image set comprises at least one preset human body image subset, and the preset human body images in a preset human body image subset correspond to the same person.

In this embodiment, based on the target category determined by the first determining unit 502, the second acquisition unit 503 may acquire the preset human body image set corresponding to the target category as the candidate human body image set.

In this embodiment, for each preset human body image subset in the candidate human body image set obtained by the second acquisition unit 503, the second determining unit 504 may determine whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to the target similarity threshold. The similarity is a numerical value representing the degree of similarity between two objects, and the target similarity threshold may be a predetermined similarity threshold or a similarity threshold determined at the time of matching.

In this embodiment, in response to determining that the similarity between the person corresponding to a preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to the target similarity threshold, the third determining unit 505 may determine that preset human body image subset as the result human body image subset corresponding to the target human body image. The result human body image subset is the matched subset that is most likely to correspond to the same person as the target human body image.
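As a non-limiting illustration, the cooperation of units 502 to 505 can be sketched in Python as follows. The names `PresetSubset`, `match_target`, `classify`, and `person_similarity` are hypothetical and do not appear in the disclosure; representing an image as a feature vector is likewise an assumption made only to keep the sketch self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]  # assumption: an image is represented by a feature vector


@dataclass
class PresetSubset:
    """A preset human body image subset: images that all show the same person."""
    person_id: str
    images: List[Image] = field(default_factory=list)


def match_target(
    target: Image,
    classify: Callable[[Image], str],
    preset_sets: Dict[str, List[PresetSubset]],
    person_similarity: Callable[[PresetSubset, Image], float],
    threshold: float,
) -> List[PresetSubset]:
    """Return the preset subsets of the target's category that reach the threshold."""
    # Role of unit 502: determine the target category of the target image.
    target_category = classify(target)
    # Role of unit 503: only the preset set of that category becomes the candidate set.
    candidates = preset_sets.get(target_category, [])
    # Roles of units 504 and 505: compare each subset's similarity with the threshold
    # and keep the subsets that are greater than or equal to it.
    return [s for s in candidates if person_similarity(s, target) >= threshold]
```

Because only the candidate set of one category is scanned, subsets stored under other categories are never compared with the target image, which is the source of the reduced matching cost described for the apparatus.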

It will be appreciated that the units described in the apparatus 500 correspond to the respective steps of the method described with reference to fig. 2. Thus, the operations, features, and resulting benefits described above with respect to the method are equally applicable to the apparatus 500 and the units contained therein, and are not described again here.

The apparatus 500 provided in the above embodiment of the present disclosure matches a human body image subset from the preset human body image set corresponding to the target category to which the target human body image belongs, which may improve matching accuracy and yield a more accurate result human body image subset. In addition, when matching the result human body image subset, only the preset human body image subsets in the preset human body image set corresponding to the target category need to be matched; compared with matching all preset human body image subsets without classification, this may reduce the complexity of matching, improve the efficiency of image processing, and reduce the resources consumed during image processing. Furthermore, because matching is performed between preset human body image subsets and the target human body image, human body images whose similarity meets the preset requirement can be determined in batches, which may improve the diversity of the determined human body images while further improving the accuracy of image matching.

Referring now to fig. 6, a schematic diagram of an electronic device (e.g., a terminal device or server in fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.

As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

In general, the following means may be connected to the I/O interface 605: input means 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; output means 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage means 608 including, for example, magnetic tape, hard disk, and the like; and communication means 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer means may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency) and the like, or any suitable combination of the above.

The computer readable medium may be contained in the electronic device, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target human body image; determine, from a predetermined category set, the category to which the target human body image belongs as a target category, wherein each category in the category set corresponds to a preset human body image set, the category of each preset human body image in a preset human body image set is the category corresponding to that preset human body image set, each preset human body image set comprises at least one preset human body image subset, and the preset human body images in a preset human body image subset correspond to the same person; acquire the preset human body image set corresponding to the target category as a candidate human body image set; determine whether the similarity between the person corresponding to a preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold; and, in response to determining that the similarity is greater than or equal to the target similarity threshold, determine that preset human body image subset as the result human body image subset corresponding to the target human body image.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit that acquires a target human body image".

The foregoing description is only of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by mutually substituting the features described above with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims

1. A method for processing an image, comprising:

acquiring a target human body image, wherein the target human body image is a human body image extracted from a target human body video;

determining a category to which the target human body image belongs from a predetermined category set as a target category, wherein the category in the category set corresponds to a preset human body image set, the category of the preset human body image in the preset human body image set corresponds to the category corresponding to the preset human body image set, the preset human body image set comprises at least one preset human body image sub-set, and the preset human body images in the preset human body image sub-sets correspond to the same person;

acquiring a preset human body image set corresponding to the target category as a candidate human body image set;

determining whether the similarity between the person corresponding to the preset human body image subset in the candidate human body image set and the person corresponding to the target human body image is greater than or equal to a target similarity threshold;

in response to determining that the similarity is greater than or equal to the target similarity threshold, determining the preset human body image subset as a result human body image subset corresponding to the target human body image;

selecting a target number of human body images from the target human body video;

for human body images in the target number of human body images, performing the following processing steps: determining the category to which the human body image belongs from the category set; and adding the human body image into the result human body image sub-set to obtain an updated result human body image sub-set in response to the category to which the human body image belongs being the same as the target category.
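As a non-limiting illustration, the processing step recited above can be sketched in Python as follows. The function name `add_matching_frames` and the `classify` callable are hypothetical; how frames are selected from the video and how the category is determined are outside the scope of this sketch.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def add_matching_frames(
    frames: List[Frame],
    classify: Callable[[Frame], str],
    target_category: str,
    result_subset: List[Frame],
) -> List[Frame]:
    """Return an updated result subset extended by frames of the target category."""
    updated = list(result_subset)  # copy so the caller's list is not mutated
    for frame in frames:
        # Determine the category to which this human body image belongs.
        if classify(frame) == target_category:
            # Same category as the target image: add it to the result subset.
            updated.append(frame)
    return updated
```

Frames whose category differs from the target category are simply left out of the returned subset in this sketch.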

2. The method of claim 1, wherein the set of categories includes a face-containing class and a non-face-containing class divided based on whether the human body image contains a face object; and

the determining, from a predetermined set of categories, a category to which the target human body image belongs as a target category includes:

inputting the target human body image into a pre-trained human face detection model to obtain a detection result, wherein the detection result is used for indicating whether the target human body image contains a human face object or not;

and determining the category to which the target human body image belongs from the category set as a target category based on the detection result.

3. The method of claim 1, wherein the set of categories includes a front class, a side class, and a back class that are partitioned based on an orientation of a human object in a human image; and

inputting the target human body image into a pre-trained human body skeleton key point recognition model to obtain a recognition result, wherein the recognition result is used for indicating the position of the human body skeleton key point in the target human body image;

and determining the category to which the target human body image belongs from the category set as a target category based on the obtained detection result and the identification result.

4. The method of claim 1, wherein the method further comprises:

and adding the target human body image into the result human body image sub-set to obtain an updated result human body image sub-set.

5. The method of claim 1, wherein the processing step further comprises:

determining a preset human body image subset corresponding to the same person as the result human body image subset from the preset human body image set corresponding to the category to which the human body image belongs in response to the fact that the category to which the human body image belongs is different from the target category; and adding the human body image into the determined preset human body image subset to obtain an updated preset human body image subset.
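As a non-limiting illustration, the handling of a human body image whose category differs from the target category can be sketched in Python as follows. Keying each subset by a person identifier, and creating the subset when none exists yet for that person, are assumptions of this sketch rather than features recited above; the name `add_to_other_category` is hypothetical.

```python
from typing import Dict, List, TypeVar

Frame = TypeVar("Frame")

# Assumed layout: category -> person identifier -> preset human body image subset.
PresetSets = Dict[str, Dict[str, List[Frame]]]


def add_to_other_category(
    frame: Frame,
    frame_category: str,
    person_id: str,
    preset_sets: PresetSets,
) -> List[Frame]:
    """Add a frame to the same person's subset in the frame's own category."""
    # Look up the preset set for the category to which the frame belongs.
    category_set = preset_sets.setdefault(frame_category, {})
    # Find the subset for the same person as the result subset
    # (assumption: create an empty subset if the person has none in this category).
    subset = category_set.setdefault(person_id, [])
    subset.append(frame)
    return subset
```

In this way a single video can enrich the stored subsets of one person across several categories, not only the category of the target human body image.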

6. The method of claim 1, wherein the determining whether the similarity of the person corresponding to the subset of the preset human body images in the set of candidate human body images to the person corresponding to the target human body image is greater than or equal to a target similarity threshold comprises:

for a preset human body image subset in the candidate human body image set, executing the following steps: determining the similarity between the preset human body image included in the preset human body image subset and the target human body image as candidate similarity; based on the obtained candidate similarity, generating the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image; determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold.
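As a non-limiting illustration, the steps recited above can be sketched in Python as follows. The claim does not fix how image-to-image similarity is computed or how the candidate similarities are combined; cosine similarity over feature vectors and the arithmetic mean are assumptions chosen for this sketch, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (assumed image representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def person_similarity(subset: List[Sequence[float]], target: Sequence[float]) -> float:
    """Combine per-image candidate similarities into one per-person similarity."""
    # Candidate similarity: each preset image in the subset versus the target image.
    candidates = [cosine_similarity(image, target) for image in subset]
    if not candidates:
        return 0.0
    # Assumption: the person-level similarity is the mean of the candidates.
    return sum(candidates) / len(candidates)


def reaches_threshold(subset: List[Sequence[float]], target: Sequence[float],
                      threshold: float) -> bool:
    """True if the person-level similarity is greater than or equal to the threshold."""
    return person_similarity(subset, target) >= threshold
```

Other aggregations, such as the maximum or the median of the candidate similarities, fit the same structure by replacing the last line of `person_similarity`.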

7. The method of claim 6, wherein the preset subset of human images corresponds to a target similarity threshold; and

the determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold value comprises:

determining whether the similarity between the person corresponding to the preset human body image subset and the person corresponding to the target human body image is greater than or equal to a target similarity threshold corresponding to the preset human body image subset.

8. The method of claim 7, wherein the target similarity threshold corresponding to the preset subset of human images is obtained by:

determining the number of preset human body images included in the preset human body image subset;

and based on the corresponding relation between the number and the similarity threshold, acquiring the similarity threshold corresponding to the determined number as a target similarity threshold corresponding to the preset human body image subset, wherein the number and the similarity threshold are positively correlated.
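As a non-limiting illustration, a correspondence between the number of preset human body images and the similarity threshold can be sketched in Python as a step table. The boundary and threshold values below are invented for illustration only; the claim requires only that the number and the threshold are positively correlated.

```python
from bisect import bisect_right

# Hypothetical correspondence: subsets with at least COUNT_BOUNDS[i] images
# use THRESHOLDS[i]. Both lists are ascending, so a larger subset never
# receives a lower threshold.
COUNT_BOUNDS = [1, 5, 20]
THRESHOLDS = [0.60, 0.70, 0.80]


def threshold_for_count(count: int) -> float:
    """Return the target similarity threshold for a subset with `count` images."""
    if count < 1:
        raise ValueError("a preset human body image subset must contain at least one image")
    # Index of the last boundary that is less than or equal to count.
    index = bisect_right(COUNT_BOUNDS, count) - 1
    return THRESHOLDS[index]
```

With these illustrative values, `threshold_for_count(4)` yields 0.60 and `threshold_for_count(5)` yields 0.70, so a subset backed by more images must be matched more strictly.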

9. The method of claim 8, wherein the determining the subset of preset human body images that is greater than or equal to a target similarity threshold as the subset of resultant human body images that corresponds to the target human body image comprises:

determining whether the candidate human body image set comprises at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold;

in response to determining that the candidate human body image set comprises the at least two preset human body image subsets, selecting, from the at least two preset human body image subsets whose similarity is greater than or equal to the corresponding target similarity threshold, the preset human body image subset that includes the largest number of preset human body images as the result human body image subset corresponding to the target human body image.
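As a non-limiting illustration, the selection among several qualifying subsets can be sketched in Python as follows. The `ScoredSubset` record and the function name are hypothetical; returning the single qualifying subset when only one exists, and the first-listed subset on a tie in image count, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredSubset:
    """A preset subset with its already-computed similarity and own threshold."""
    person_id: str
    image_count: int
    similarity: float
    threshold: float


def select_result_subset(candidates: List[ScoredSubset]) -> Optional[ScoredSubset]:
    """Pick the qualifying subset that contains the most preset images."""
    # Keep only subsets whose similarity reaches their corresponding threshold.
    qualifying = [c for c in candidates if c.similarity >= c.threshold]
    if not qualifying:
        return None
    # Among the qualifying subsets, prefer the one with the largest image count.
    return max(qualifying, key=lambda c: c.image_count)
```

Note that the subset with the highest similarity is not necessarily selected: a subset with more images and a stricter threshold takes precedence once it qualifies.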

10. The method according to one of claims 1-9, wherein after said determining a preset subset of human body images greater than or equal to a target similarity threshold as a resulting subset of human body images corresponding to said target human body images, the method further comprises:

outputting prompt information indicating that the result human body image subset corresponding to the target human body image has been determined.

11. An apparatus for processing an image, comprising:

a first acquisition unit configured to acquire a target human body image, wherein the target human body image is a human body image extracted from a target human body video;

a first determining unit configured to determine, as a target category, a category to which the target human body image belongs from a predetermined category set, wherein the category in the category set corresponds to a preset human body image set, the category of the preset human body image in the preset human body image set is the category corresponding to the preset human body image set, the preset human body image set includes at least one preset human body image sub-set, and the preset human body images in the preset human body image sub-set correspond to the same person;

a second acquisition unit configured to acquire a preset human body image set corresponding to the target category as a candidate human body image set;

a second determining unit configured to determine whether a similarity between a person corresponding to a preset human body image subset in the candidate human body image set and a person corresponding to the target human body image is greater than or equal to a target similarity threshold;

a third determining unit configured to determine, as a result human body image subset corresponding to the target human body image, a preset human body image subset that is greater than or equal to a target similarity threshold in response to determining that a similarity of a person corresponding to the preset human body image subset in the candidate human body image set and a person corresponding to the target human body image is greater than or equal to the target similarity threshold;

a third acquisition unit configured to select a target number of human body images from the target human body video;

a fourth determination unit configured to perform the following processing steps for human body images among the target number of human body images: determining the category to which the human body image belongs from the category set; and adding the human body image into the result human body image sub-set to obtain an updated result human body image sub-set in response to the category to which the human body image belongs being the same as the target category.

12. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-10.

13. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-10.