CN116012619A - Clothing matching method and device, electronic equipment and storage medium - Google Patents

Clothing matching method and device, electronic equipment and storage medium

Info

Publication number
CN116012619A
Authority
CN
China
Prior art keywords
clothing
image
frame
detection
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211554743.4A
Other languages
Chinese (zh)
Inventor
屈杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing IQIYI Science and Technology Co Ltd
Original Assignee
Beijing IQIYI Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing IQIYI Science and Technology Co Ltd filed Critical Beijing IQIYI Science and Technology Co Ltd
Priority to CN202211554743.4A priority Critical patent/CN116012619A/en
Publication of CN116012619A publication Critical patent/CN116012619A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The clothing matching method, apparatus, electronic device and storage medium provided by the embodiments of the invention comprise: performing object detection and clothing detection on a first image, marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image, and searching for a first clothing detection frame matched with the target object detection frame corresponding to a first object; when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, determining first human body key points of the first object based on a second image cropped from the image area where the target object detection frame is located; and determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object. According to the invention, the clothing truly matched with an object can be accurately determined from a plurality of matched clothing detection frames based on the human body key points corresponding to the object, which improves the accuracy of matching clothing to the corresponding human body in complex scenes.

Description

Clothing matching method and device, electronic equipment and storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a clothing matching method and apparatus, an electronic device and a storage medium.
Background
In video scenes, the type and style of the clothes worn by a person can be analyzed to further understand the person's image and character, which makes it convenient to recommend suitable products to the person, or to analyze the person's positioning in the whole drama and the direction of the storyline. The premise of analyzing a person through clothes is that the person must first be matched with the corresponding clothes.
Currently, in order to obtain the matching relationship between clothes and persons, an AI (Artificial Intelligence) model can be used to detect clothing frames and human body frames in a video. On this basis, the IOU (Intersection over Union) or another proportional score between each clothing frame and each human body frame is calculated, a threshold is set on the score, and all matching pairs above the threshold are retained, or other post-processing strategies are used, to finally obtain the clothes matched with each human body.
However, due to the complexity of video scenes, when two people are close to each other, mismatching easily occurs with the IOU method and the like; for example, the clothes worn by several different people may all be matched to the same person, reducing the accuracy of clothes-human body matching.
Disclosure of Invention
In view of the above, in order to solve all or part of the above technical problems, embodiments of the present invention provide a clothing matching method and apparatus, an electronic device and a storage medium, which can improve the accuracy of clothing-human body matching in complex scenes.
In a first aspect, an embodiment of the present invention provides a method for matching apparel, where the method includes:
performing object detection and clothing detection on a first image, and marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image; the first image is a single-frame image of a video sequence that comprises objects and clothing, and the clothing comprises an upper garment and/or a lower garment;
taking the object corresponding to each object detection frame as a first object, and executing the following operations on each first object:
searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, cropping a second image of the image area where the target object detection frame is located, and determining first human body key points of the first object based on the second image;
and determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points so as to match clothing corresponding to the target clothing detection frame with the first object.
In one possible embodiment, performing object detection and clothing detection on the first image and marking the object detection frame of each object and the clothing detection frame of each piece of clothing on the first image includes:
inputting the first image into a first neural network model, and extracting object position features of the first image by using the first neural network model to obtain a first feature map of the first image;
generating a plurality of first anchor boxes for each point on the first feature map, wherein each first anchor box is different in size;
for each first anchor frame, calculating a first confidence that an object exists in the first anchor frame;
reserving a first target anchor frame with the first confidence coefficient being greater than or equal to a first threshold value, and determining a region surrounded by the first target anchor frame as an object detection frame of an object;
inputting the first image into a second neural network model to extract clothing position features of the first image by using the second neural network model to obtain a second feature map of the first image;
generating a plurality of second anchor boxes for each point on the second feature map, wherein each second anchor box is of a different size;
calculating a second confidence level of the existence of the clothes in the second anchor frame aiming at each second anchor frame;
and reserving a second target anchor frame with the second confidence coefficient being greater than or equal to the first threshold value, and determining the area surrounded by the second target anchor frame as the clothing detection frame of the clothing.
In one possible implementation, searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object includes:
calculating IOU values of the target object detection frames and the clothing detection frames;
and taking the clothing detection frame corresponding to the maximum IOU value as a first clothing detection frame.
In one possible embodiment, determining the first human keypoints of the first object based on the second image comprises:
inputting the second image into a third neural network model, and extracting human body characteristics of the second image by using the third neural network model to obtain a third characteristic diagram of the second image;
converting the third feature map into a plurality of thermodynamic diagrams, wherein one thermodynamic diagram corresponds to one preset human body key point;
determining a thermodynamic value for each point on the thermodynamic diagram in each thermodynamic diagram, and ordering each point on the thermodynamic diagram according to the thermodynamic values;
carrying out a weighted average on the coordinate positions of the points ranked before the target ordering position to obtain, for each preset human body key point, the coordinate information of the corresponding predicted human body key point, wherein the first human body key points comprise the coordinate information of the predicted human body key points;
and drawing the predicted human body key points in the second image according to the coordinate information.
In one possible implementation, determining a target apparel detection box from a plurality of first apparel detection boxes based on the first human keypoints includes:
counting the number of first human body key points contained in each first clothing detection frame;
and determining the first clothing detection frame containing the largest number of key points as the target clothing detection frame.
In one possible implementation, the first human keypoint carries a first object identification of the first object;
counting the number of first human body key points contained in each first clothing detection frame includes:
for each first clothing detection frame, searching for first human body key points carrying the first object identifier within the first clothing detection frame;
and counting the number of such first human body key points.
In a second aspect, an embodiment of the present invention provides a key frame extraction method, where the method includes:
acquiring a target video sequence;
extracting a single frame image from a target video sequence;
performing clothing matching on the single-frame image, and determining matching clothing of a target object in the single-frame image;
determining a single-frame image which is matched with the clothes and meets the target condition as a key frame;
The step of performing clothing matching on the single-frame image adopts the clothing matching method described above.
In a third aspect, an embodiment of the present invention provides a garment matching apparatus, where the apparatus includes:
the detection module is used for performing object detection and clothing detection on a first image, and marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image; the first image is a single-frame image of a video sequence that comprises objects and clothing, and the clothing comprises an upper garment and/or a lower garment;
the execution module is used for taking the object corresponding to each object detection frame as a first object respectively, and executing the following operations for each first object:
the searching module is used for searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
the first determining module is used for cropping a second image of the image area where the target object detection frame is located when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, and determining first human body key points of the first object based on the second image;
and the second determining module is used for determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, wherein the processor is configured to execute the clothing matching and key frame extraction programs stored in the memory, so as to implement the steps of the methods described above.
In a fifth aspect, embodiments of the present invention provide a storage medium storing one or more programs executable by one or more processors to implement the steps of the above-described method.
The clothing matching method, apparatus, electronic device and storage medium provided by the embodiments of the invention comprise: performing object detection and clothing detection on a first image, marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image, taking the object corresponding to each object detection frame as a first object, and searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object; when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, cropping a second image of the image area where the target object detection frame is located, and determining first human body key points of the first object based on the second image; and determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object. Research shows that the clothing detection frame corresponding to the clothing actually worn by an object should contain the most human body key points of that object. Therefore, when an object is mismatched with a plurality of upper garments or lower garments, the clothing truly matched with the object can be accurately determined from the plurality of matched clothing detection frames based on the human body key points corresponding to the object, which effectively solves the problem of human body-clothing mismatching in dense crowds in complex scenes, and improves the accuracy of matching clothing to the corresponding human body in complex scenes.
Drawings
FIG. 1 is a schematic diagram of a hardware environment of a garment matching method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a clothing matching method according to an embodiment of the present invention;
fig. 3 is a schematic distribution diagram of preset human body key points according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a key frame extraction method according to an embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of a garment matching device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
For the purpose of facilitating an understanding of the embodiments of the present invention, reference will now be made to the following description of specific embodiments, taken in conjunction with the accompanying drawings, which are not intended to limit the embodiments of the invention.
In this embodiment, the above clothing matching method may be applied to a hardware environment constituted by the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 via a network, which may be used to provide services (e.g., clothing matching services) to the terminal or to clients installed on the terminal. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 103. The network includes, but is not limited to: a wide area network, a metropolitan area network, or a local area network. The terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The clothing matching method in this embodiment may be performed by the server 103, or jointly by the server 103 and the terminal 101. As shown in fig. 2, the method may include the following steps:
step 201, object detection and clothes detection are carried out on a first image, and an object detection frame of an object and a clothes detection frame of clothes are marked on the first image;
the first image is a single-frame image comprising an object and clothes in the video sequence; specifically, the first image is a single frame image in a video sequence, and the video sequence includes, but is not limited to, videos in various existing service fields, such as monitoring videos in security fields, recorded videos of sports fitness, videos of cultural movie works, and the like.
The clothing comprises an upper garment and/or a lower garment; therefore, the clothing detection frames marked on the first image are clothing detection frames of upper garments and/or clothing detection frames of lower garments. That is, in this embodiment, clothing matching of the upper garment and/or the lower garment of an object can be achieved.
Step 202, taking the object corresponding to each object detection frame as a first object, and executing the operations of steps 203 to 205 for each first object:
in this embodiment, each object is used as a first object, and a garment matched with the first object is determined.
Step 203, searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
Because there are a plurality of objects and a plurality of pieces of clothing in the first image, a plurality of clothing detection frames can be framed in step 201. In order to perform clothing matching for the same first object, the first clothing detection frames matched with the target object detection frame of the first object must first be found among the plurality of clothing detection frames; the clothing framed in a first clothing detection frame may be an upper garment or a lower garment.
If it is determined through step 203 that the target object detection frame of the first object matches only one first clothing detection frame of an upper garment and/or one first clothing detection frame of a lower garment, there is clearly no problem with the clothing matched to the first object, and therefore steps 204-205 need not be performed.
Step 204, when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, cropping a second image of the image area where the target object detection frame is located, and determining first human body key points of the first object based on the second image;
Only in the case where the target object detection frame matches a plurality of first clothing detection frames corresponding to upper garments or lower garments has the first object been matched with clothing worn by another object; therefore, steps 204-205 need to be performed to determine the clothing uniquely matching the first object from the plurality of matched first clothing detection frames.
Step 205, determining a target clothing detection frame from a plurality of first clothing detection frames based on the first human body key points, so as to match clothing corresponding to the target clothing detection frame with the first object.
Research shows that the clothing detection frame corresponding to the clothing actually worn by an object should contain the most human body key points of that object. Therefore, in step 205, the target clothing detection frame is determined from the plurality of first clothing detection frames based on the first human body key points, so that the clothing corresponding to the target clothing detection frame is matched with the first object.
In an embodiment, a specific implementation of determining the target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points may comprise: counting the number of first human body key points contained in each first clothing detection frame; and determining the first clothing detection frame containing the largest number of key points as the target clothing detection frame.
To indicate which object a detected human body key point belongs to, the first human body key points of the first object obtained in step 204 carry the first object identifier of the first object, and the first object identifier uniquely identifies the first object. Therefore, the specific process of counting the number of first human body key points in each first clothing detection frame is as follows: searching, within the first clothing detection frame, for the first human body key points carrying the first object identifier; and counting their number.
In actual use, if no first human body key point carrying the first object identifier is found in a first clothing detection frame, the key point count of that frame is 0, which means that the target object detection frame of the first object was mismatched with that first clothing detection frame: the clothing in that clothing detection frame is worn by another object, not by the first object.
Assume that there are 3 first clothing detection frames matched with the target object detection frame, where the number of first human body key points counted in first clothing detection frame 1 is 0, the number counted in first clothing detection frame 2 is 3, and the number counted in first clothing detection frame 3 is 10. Then the clothing corresponding to first clothing detection frame 3 can be matched with the first object, that is, the clothing corresponding to first clothing detection frame 3 is determined to be the clothing worn by the first object.
By counting the number of first human body key points in each first clothing detection frame and taking the clothing corresponding to the first clothing detection frame with the largest key point count as the clothing of the first object, the clothing matched with the first object can be determined quantitatively based on human body key points, which facilitates analysis and parameter adjustment for actual business scenarios, for example, selecting the clothing image corresponding to the clothing detection frame that best matches the first object so as to accurately associate it with a shopping link. A minimal sketch of this counting strategy is given below.
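As an illustration only, the following sketch assumes detection frames are (x1, y1, x2, y2) tuples and each detected key point carries the identifier of the object it belongs to; the function names and data layout are illustrative and not prescribed by this embodiment.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Keypoint = Tuple[float, float, str]       # (x, y, object_id)

def count_object_keypoints(box: Box, keypoints: List[Keypoint], object_id: str) -> int:
    """Count the key points carrying the given object identifier that fall inside the box."""
    x1, y1, x2, y2 = box
    return sum(1 for kx, ky, kid in keypoints
               if kid == object_id and x1 <= kx <= x2 and y1 <= ky <= y2)

def select_target_clothing_box(candidates: List[Box],
                               keypoints: List[Keypoint],
                               object_id: str) -> Box:
    """Pick the candidate clothing detection frame containing the most key points of the object."""
    return max(candidates, key=lambda b: count_object_keypoints(b, keypoints, object_id))
```

With counts of 0, 3 and 10 as in the example above, the third frame would be selected.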
The clothing matching method provided by the embodiment of the invention comprises: performing object detection and clothing detection on a first image, marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image, taking the object corresponding to each object detection frame as a first object, and searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object; when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, cropping a second image of the image area where the target object detection frame is located, and determining first human body key points of the first object based on the second image; and determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object. Research shows that the clothing detection frame corresponding to the clothing actually worn by an object should contain the most human body key points of that object. Therefore, when an object is mismatched with a plurality of upper garments or lower garments, the clothing truly matched with the object can be accurately determined from the plurality of matched clothing detection frames based on the human body key points corresponding to the object, which effectively solves the problem of human body-clothing mismatching in dense crowds in complex scenes, and improves the accuracy of matching clothing to the corresponding human body in complex scenes.
In some embodiments, the step 201 may be specifically implemented by the following steps:
a1, inputting a first image into a first neural network model, and extracting object position features of the first image by using the first neural network model to obtain a first feature map of the first image;
In order to meet the input requirement of the first neural network model, the first image may first be scaled to a fixed size and then input into the first neural network model. The first neural network model can be obtained by training a deep learning network such as a convolutional neural network or a recurrent neural network, and is used to extract the position features of objects in the first image to obtain the first feature map.
A2, generating a plurality of first anchor frames for each point on the first feature map, wherein the sizes of the first anchor frames are different;
The generated first anchor frames have different sizes and can be used to match human body targets of different sizes. The number of first anchor frames can be set according to actual requirements, which is not limited herein.
Step A3, calculating a first confidence coefficient of the existence of the object in the first anchor frame aiming at each first anchor frame;
step A4, reserving a first target anchor frame with the first confidence coefficient being greater than or equal to a first threshold value, and determining a region surrounded by the first target anchor frame as an object detection frame of an object;
Specifically, a score (the first confidence) of the presence of a person in each first anchor frame on the first feature map may be calculated, so that whether a person is present in each first anchor frame can be determined from the score. A first anchor frame containing a human body is a first target anchor frame whose first confidence is greater than or equal to the first threshold, and the area surrounded by the first target anchor frame is the object detection frame marked in the first image; the first threshold can be set according to actual requirements, which is not limited herein.
In order to avoid repeated detection when a plurality of first anchor frames hit the same object, a non-maximum suppression (NMS) algorithm may be adopted to select the best one from the plurality of first anchor frames. Specifically: in the case where a plurality of first target anchor frames exist, determining the intersection-over-union of the plurality of first target anchor frames; and in the case where the intersection-over-union is greater than or equal to a second threshold, retaining the first target anchor frame with the highest first confidence, and determining the area surrounded by that first target anchor frame as the object detection frame.
First, the intersection-over-union of the plurality of first target anchor frames is calculated, and the first target anchor frames hitting the same human body are identified through it: first target anchor frames whose intersection-over-union is greater than or equal to the second threshold are anchor frames hitting the same human body, and the second threshold can be set according to actual requirements. Then, among the several first target anchor frames of the same human body, the one with the highest first confidence is selected as the anchor frame of the final human body region detection result, and the area it surrounds is determined as the object detection frame. A minimal sketch of this suppression step is given below.
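For illustration, the following sketch assumes anchor frames as (x1, y1, x2, y2) tuples with one confidence score each; the helper names are illustrative, not part of this embodiment.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, second_threshold=0.5):
    """Among anchor frames overlapping above the second threshold, keep only the highest-confidence one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Discard this anchor frame if it overlaps an already-kept frame too strongly.
        if all(iou(boxes[i], boxes[j]) < second_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```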
In addition to marking the object detection frames of objects based on the anchor frame method in steps A1 to A4 above, other object detection methods may also be used, which is not limited herein.
Step A6, inputting the first image into a second neural network model, so as to extract clothing position features of the first image by using the second neural network model, and obtaining a second feature map of the first image;
To meet the input requirements of the second neural network model, the first image may be scaled to a fixed size and then input into the second neural network model. The second neural network model can be obtained by training a deep learning network such as a convolutional neural network or a recurrent neural network, and is used to extract the position features of the clothing in the first image to obtain the second feature map.
Step A7, generating a plurality of second anchor frames for each point on the second feature map, wherein the sizes of the second anchor frames are different;
step A8, calculating a second confidence coefficient of the existence of the clothes in each second anchor frame aiming at each second anchor frame;
step A9, retaining a second target anchor frame whose second confidence is greater than or equal to the first threshold, and determining the area surrounded by the second target anchor frame as the clothing detection frame of the clothing.
Steps A6 to A9 are likewise a process of marking the clothing detection frames of clothing based on the anchor frame method; the manner of detecting the clothing detection frames is consistent with the manner of detecting the object detection frames, and is not repeated here. A minimal sketch of this anchor-frame detection procedure follows.
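For illustration, the following sketch shows anchor generation and confidence thresholding; the anchor sizes, stride and threshold values are assumptions for the example and are not prescribed by this embodiment.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, sizes=((32, 64), (64, 128), (128, 256))):
    """Generate anchor frames of several sizes centered at each feature-map point."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride   # center in image coordinates
            for w, h in sizes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

def keep_confident_anchors(anchors, confidences, first_threshold=0.5):
    """Retain target anchor frames whose confidence meets the first threshold."""
    mask = confidences >= first_threshold
    return anchors[mask], confidences[mask]
```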
In some embodiments, the step 203 may be specifically implemented by the following steps:
step B1, calculating IOU values of a target object detection frame and each clothing detection frame;
In this embodiment, an IOU (Intersection over Union) matching and tracking algorithm may be used to match the target object detection frame with each clothing detection frame, so as to determine the clothing corresponding to the same first object. For example, the clothing detection frames detected in the first image include clothing detection frame 1, clothing detection frame 2 and clothing detection frame 3. When matching the detection frames, the IOU values of the target object detection frame with the 3 clothing detection frames can be calculated respectively: say the IOU value with clothing detection frame 1 is 0.4, with clothing detection frame 2 is 0.6, and with clothing detection frame 3 is 0.9.
Step B2, taking the clothing detection frame corresponding to the maximum IOU value as the first clothing detection frame.
Continuing the example above, the IOU value of the target object detection frame with clothing detection frame 3 is the highest, so clothing detection frame 3 can be determined to be the first clothing detection frame corresponding to the target object detection frame.
In this embodiment, the first clothing detection frame matched with the target object detection frame can be accurately obtained by using the IOU matching and tracking algorithm, clothing matching detection for the same first object is achieved based on the target object detection frame and the first clothing detection frame, and the accuracy of clothing matching is improved. A minimal sketch of this matching step is given below.
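For illustration, the following sketch reuses the iou() helper from the suppression sketch above; the function name is illustrative.

```python
def match_first_clothing_box(object_box, clothing_boxes):
    """Return the clothing detection frame with the highest IOU against the object frame."""
    # iou() is the helper defined in the NMS sketch earlier in this description.
    scores = [iou(object_box, box) for box in clothing_boxes]
    best = max(range(len(clothing_boxes)), key=lambda i: scores[i])
    return clothing_boxes[best], scores[best]

# With IOU values of 0.4, 0.6 and 0.9 as in the example, the third frame is returned.
```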
In some embodiments, the step 204 may be specifically implemented by the following steps:
step C1, inputting the second image into a third neural network model, so as to extract human body characteristics of the second image by using the third neural network model, and obtaining a third characteristic diagram of the second image;
In order to meet the input requirement of the third neural network model, the second image may be scaled to a fixed size and then input into the third neural network model. The third neural network model can be obtained by training a deep learning network such as a convolutional neural network or a recurrent neural network, and is used to extract the human body features of the first object in the second image to obtain the third feature map.
Step C2, converting the third characteristic diagram into a plurality of thermodynamic diagrams, wherein one thermodynamic diagram corresponds to one preset human body key point;
step C3, determining the thermodynamic value of each point on the thermodynamic diagram in each thermodynamic diagram, and sorting each point on the thermodynamic diagram according to the thermodynamic values;
As shown in fig. 3, actual position prediction may be performed for a plurality of preset human body key points, including key points of human body parts such as the top of the head, left ear, right ear, left eye, right eye, nose, left shoulder, right shoulder, left elbow, left wrist, right elbow, right wrist, left hip, right hip, left knee, left ankle, right knee, right ankle, left chest, right chest, and navel.
The number of thermodynamic diagrams is the same as the number of preset human body key points, in one-to-one correspondence. The thermal value of each point on a thermodynamic diagram is the probability that the point belongs to the corresponding preset human body key point, and is calculated by the third neural network model.
Step C4, carrying out a weighted average on the coordinate positions of the points ranked before the target ordering position to obtain, for each preset human body key point, the coordinate information of the corresponding predicted human body key point, wherein the first human body key points comprise the coordinate information of the predicted human body key points;
The target ordering position may be, for example, the top 2 or top 3 of the ranking, and can be set according to actual requirements. The weighted average of the coordinate positions of the points ranked before the target ordering position yields the actual position of the predicted human body key point corresponding to each preset human body key point.
Step C5, drawing the predicted human body key points in the second image according to the coordinate information.
The predicted human body key points are the first human body key points of the first object determined based on the second image, and the determined first human body key points comprise one or more of the following: top of head, left ear, right ear, left eye, right eye, nose, left shoulder, right shoulder, left elbow, left wrist, right elbow, right wrist, left hip, right hip, left knee, left ankle, right knee, right ankle, left chest, right chest, navel.
In this embodiment, besides the thermodynamic-diagram-based human body key point detection method of step 204 above, other human body key point detection methods may also be used, which is not limited herein. A minimal sketch of this heatmap decoding step is given below.
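For illustration, the following sketch decodes one key point per heatmap by a weighted average over the top-ranked thermal values, assuming the heatmaps are given as a NumPy array; the names and the default rank are illustrative assumptions.

```python
import numpy as np

def decode_keypoints(heatmaps, target_rank=3):
    """Decode one (x, y) position per heatmap via a weighted average of the top-ranked points.

    heatmaps: array of shape (num_keypoints, H, W), one map per preset human body key point.
    target_rank: the "target ordering position"; only points ranked before it are averaged.
    """
    coords = []
    for hm in heatmaps:
        flat = hm.ravel()
        top = np.argsort(flat)[::-1][:target_rank]   # indices of the highest thermal values
        ys, xs = np.unravel_index(top, hm.shape)
        w = flat[top] / flat[top].sum()               # normalized weights for the average
        coords.append((float((xs * w).sum()), float((ys * w).sum())))
    return coords
```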
This embodiment also provides a key frame extraction method, which may be performed by the server 103, or jointly by the server 103 and the terminal 101. As shown in fig. 4, the method may include the following steps:
Step 401, obtaining a target video sequence;
step 402, extracting a single frame image from a target video sequence;
step 403, performing clothing matching on the single-frame image, and determining matching clothing of a target object in the single-frame image;
step 404, determining a single frame image which is matched with the clothing and meets the target condition as a key frame;
the step of performing clothing matching on the single-frame image adopts the clothing matching method described above.
In this embodiment, a video may be input, a frame extraction operation performed on the video, and clothing matching performed on the extracted pictures to obtain the matching clothing of the target object in the video. Finally, the video frames in which the matching clothing meets a condition (for example, more than 50% of the matching clothing is exposed in the picture) can be selected according to actual business requirements. For example, if an image needs to be selected from a short video as its cover, the most complete picture in which the matching clothing of the target object is visible without occlusion can be used as the cover; in this way, the key frames required by the actual business can be obtained. A minimal sketch of this selection step is given below.
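For illustration, the following sketch assumes two hypothetical callables, match_clothing and visible_fraction, standing in for the clothing matching method above and for an exposure measure; the 50% threshold mirrors the example in the text.

```python
def extract_key_frames(frames, match_clothing, visible_fraction, threshold=0.5):
    """Select the single-frame images whose matched clothing meets the target condition.

    frames: single-frame images extracted from the target video sequence.
    match_clothing: returns the matched clothing frame of the target object, or None.
    visible_fraction: returns the exposed fraction (0..1) of the matched clothing.
    """
    key_frames = []
    for frame in frames:
        clothing_box = match_clothing(frame)   # clothing matching per the method above
        if clothing_box is not None and visible_fraction(frame, clothing_box) >= threshold:
            key_frames.append(frame)
    return key_frames
```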
Referring to fig. 5, a block diagram of an embodiment of a garment matching device according to an embodiment of the present invention is provided; as shown in fig. 5, the apparatus may include:
The detection module 51 is configured to perform object detection and clothing detection on a first image, and mark an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image; the first image is a single-frame image of a video sequence that comprises objects and clothing, and the clothing comprises an upper garment and/or a lower garment;
the execution module 52 is configured to take the object corresponding to each object detection frame as a first object, and for each first object, perform the following operations:
the searching module 53 is configured to search for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
the first determining module 54 is configured to crop a second image of the image area where the target object detection frame is located when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, and determine first human body key points of the first object based on the second image;
the second determining module 55 is configured to determine a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object.
The clothing matching apparatus provided by the embodiment of the invention performs object detection and clothing detection on a first image, marks an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image, takes the object corresponding to each object detection frame as a first object, and searches for a first clothing detection frame matched with the target object detection frame corresponding to the first object; when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, it crops a second image of the image area where the target object detection frame is located, and determines first human body key points of the first object based on the second image; and it determines a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object. Research shows that the clothing detection frame corresponding to the clothing actually worn by an object should contain the most human body key points of that object. Therefore, when an object is mismatched with a plurality of upper garments or lower garments, the clothing truly matched with the object can be accurately determined from the plurality of matched clothing detection frames based on the human body key points corresponding to the object, which effectively solves the problem of human body-clothing mismatching in dense crowds in complex scenes, and improves the accuracy of matching clothing to the corresponding human body in complex scenes.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 500 shown in fig. 6 includes: at least one processor 501, a memory 502, at least one network interface 504, and other user interfaces 503. The various components in the electronic device 500 are coupled together by a bus system 505. It is understood that the bus system 505 is used to enable connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus; but for clarity of illustration, the various buses are labeled as the bus system 505 in fig. 6.
The user interface 503 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen), etc.
It will be appreciated that the memory 502 in embodiments of the invention can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory can be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory can be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 502 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory 502 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 5021 and application programs 5022.
The operating system 5021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 5022 includes various application programs, such as a Media Player, a Browser, and the like, for realizing various application services. A program implementing the method of the embodiment of the present invention may be included in the application 5022.
In the embodiment of the present invention, the processor 501 is configured to execute the method steps provided in the method embodiments by calling a program or an instruction stored in the memory 502, specifically, a program or an instruction stored in the application 5022.
The method disclosed in the above embodiment of the present invention may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or instructions in software in the processor 501. The processor 501 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), an off-the-shelf programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software elements in a decoding processor. The software elements may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 502, and the processor 501 reads information in the memory 502 and, in combination with its hardware, performs the steps of the method described above.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (Digital Signal Processing Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in fig. 6, and can perform all the steps of the methods shown in fig. 2 and fig. 4, thereby achieving the technical effects of those methods; for details, please refer to the descriptions of fig. 2 and fig. 4, which are not repeated here for brevity.
The embodiment of the invention also provides a storage medium (computer readable storage medium). The storage medium here stores one or more programs. Wherein the storage medium may comprise volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk, or solid state disk; the memory may also comprise a combination of the above types of memories.
When the one or more programs are executed by the one or more processors in the storage medium, the methods described above are implemented.
The processor is configured to execute the garment matching and key frame extraction program stored in the memory to implement the steps of the method shown in fig. 2 or fig. 4.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description is merely of specific embodiments of the invention and is not intended to limit the scope of the invention; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method of apparel matching, the method comprising:
performing object detection and clothes detection on a first image, and marking an object detection frame of an object and a clothes detection frame of clothes on the first image; the first image is a single-frame image comprising an object and clothes in a video sequence, and the clothes comprise upper clothes and/or lower clothes;
Taking the object corresponding to each object detection frame as a first object, and executing the following operations on each first object:
searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, cropping a second image of the image area where the target object detection frame is located, and determining first human body key points of the first object based on the second image;
and determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object.
2. The method of claim 1, wherein the object detection and garment detection are performed on a first image, respectively, and the object detection frame of the object and the garment detection frame of the garment are marked on the first image, comprising:
inputting the first image into a first neural network model, and extracting object position features of the first image by using the first neural network model to obtain a first feature map of the first image;
Generating a plurality of first anchor boxes for each point on the first feature map, wherein each first anchor box is different in size;
calculating a first confidence that the object exists in the first anchor frame for each first anchor frame;
reserving a first target anchor frame with the first confidence coefficient being greater than or equal to a first threshold value, and determining an area surrounded by the first target anchor frame as an object detection frame of the object;
inputting the first image into a second neural network model, and extracting clothing position features of the first image by using the second neural network model to obtain a second feature map of the first image;
generating a plurality of second anchor boxes for each point on the second feature map, wherein each second anchor box is different in size;
calculating a second confidence level that the apparel exists in the second anchor frame for each second anchor frame;
and reserving a second target anchor frame with the second confidence coefficient being greater than or equal to a first threshold value, and determining an area surrounded by the second target anchor frame as a clothing detection frame of the clothing.
3. The method of claim 1, wherein the searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object comprises:
Calculating IOU values of the target object detection frame and each clothing detection frame;
and taking the clothing detection frame corresponding to the maximum IOU value as a first clothing detection frame.
4. The method of claim 1, wherein the determining a first human keypoint of the first object based on the second image comprises:
inputting the second image into a third neural network model, and extracting human body characteristics of the second image by using the third neural network model to obtain a third characteristic diagram of the second image;
converting the third feature map into a plurality of thermodynamic diagrams, wherein one thermodynamic diagram corresponds to one preset human body key point;
determining a thermodynamic value for each point on the thermodynamic diagram in each of the thermodynamic diagrams, and ordering each point on the thermodynamic diagram according to the thermodynamic values;
carrying out weighted average on the coordinate positions of points before the target ordering position to obtain the coordinate information of the predicted human body key point corresponding to each preset human body key point, wherein the first human body key point comprises the coordinate information of the predicted human body key point;
and drawing the predicted human body key points in the second image according to the coordinate information.
5. The method of claim 1, wherein the determining a target clothing detection frame from a plurality of the first clothing detection frames based on the first human body key points comprises:
counting the number of the first human body key points contained in each first clothing detection frame;
and determining the first clothing detection frame containing the largest number of key points as the target clothing detection frame.
6. The method of claim 5, wherein the first human keypoint carries a first object identification of the first object;
the counting the number of the first human body key points contained in each first clothing detection frame comprises:
for each first clothing detection frame, searching for first human body key points carrying the first object identifier within the first clothing detection frame;
and counting the number of such first human body key points.
7. A key frame extraction method, the method comprising:
acquiring a target video sequence;
extracting a single frame image from the target video sequence;
performing clothing matching on the single-frame image, and determining matching clothing of a target object in the single-frame image;
Determining the single-frame image of which the matched clothes meet a target condition as a key frame;
the step of performing clothing matching on the single frame image includes a clothing matching method according to any one of claims 1 to 6.
8. A garment matching device, the device comprising:
the detection module is used for performing object detection and clothing detection on a first image, and marking an object detection frame of each object and a clothing detection frame of each piece of clothing on the first image; the first image is a single-frame image of a video sequence that comprises objects and clothing, and the clothing comprises an upper garment and/or a lower garment;
the execution module is used for taking the object corresponding to each object detection frame as a first object respectively, and executing the following operations for each first object:
the searching module is used for searching for a first clothing detection frame matched with the target object detection frame corresponding to the first object;
the first determining module is used for cropping a second image of the image area where the target object detection frame is located when there are a plurality of first clothing detection frames corresponding to upper garments or lower garments, and determining first human body key points of the first object based on the second image;
and the second determining module is used for determining a target clothing detection frame from the plurality of first clothing detection frames based on the first human body key points, so as to match the clothing corresponding to the target clothing detection frame with the first object.
9. An electronic device, comprising: a processor and a memory, the processor being configured to execute the clothing matching and key frame extraction program stored in the memory, so as to implement the steps of the method of any one of claims 1 to 6, or claim 7.
10. A storage medium storing one or more programs executable by one or more processors to implement the steps of the method of any of claims 1-6 or 7.
CN202211554743.4A 2022-12-05 2022-12-05 Clothing matching method and device, electronic equipment and storage medium Pending CN116012619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211554743.4A CN116012619A (en) 2022-12-05 2022-12-05 Clothing matching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211554743.4A CN116012619A (en) 2022-12-05 2022-12-05 Clothing matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116012619A (en) 2023-04-25

Family

ID=86032544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211554743.4A Pending CN116012619A (en) 2022-12-05 2022-12-05 Clothing matching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116012619A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination