CN112907583B - Target object posture selection method, image scoring method and model training method - Google Patents


Info

Publication number
CN112907583B
Authority
CN
China
Prior art keywords
target object
image
posture
pose
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110333334.0A
Other languages
Chinese (zh)
Other versions
CN112907583A (en)
Inventor
袁小青
肖潇
卢琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202110333334.0A (patent CN112907583B)
Publication of CN112907583A
Application granted
Publication of CN112907583B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target object posture selection method, an image scoring method and a model training method. The target object posture selection model comprises a key point detection module, a posture estimation module and a posture selection module, and the model training method comprises the following steps: acquiring a training sample, wherein the training sample comprises a sample image, the key points of the target object in the sample image, the three-dimensional posture of the target object in the sample image and the optimal posture of the target object; inputting the sample image into the target object posture selection model; and training the model based on the differences between the key points detected by the key point detection module, the three-dimensional posture estimated by the posture estimation module and the optimal posture selected by the posture selection module on the one hand, and the annotated key points, three-dimensional posture and optimal posture in the training sample on the other hand. The present application thereby enables evaluation of the posture of a target object in an image.

Description

Target object posture selection method, image scoring method and model training method
Technical Field
The application relates to the field of image processing, in particular to a target object posture selection method, an image scoring method and a model training method.
Background
Currently, with the development of the field of image processing, merely identifying a target object in an image is no longer enough to meet the demands of various scenes. For example, being able to effectively identify motor vehicles (cars, trucks, etc.), which are closely tied to everyday life, is important. In handling traffic violations, searching the roads one by one for a vehicle fleeing a hit-and-run accident purely by traffic police is time-consuming, labor-intensive and often fruitless. Nowadays surveillance cameras are everywhere, and communities, parking lots, highways and the like provide good coverage for locating a motor vehicle. By analyzing video images, detecting the targets in the images and performing attribute analysis to obtain information such as a target's color, license plate number, vehicle type and style, the target can be locked onto and tracked, achieving twice the result with half the effort.
Motor vehicle identification aims to quickly determine, through an intelligent algorithm, the brand and model of the vehicle to be analyzed. If arbitrary images are fed into the algorithm, identification errors may occur because an image is too blurry or is affected by acquisition conditions, such as color distortion.
Therefore, how to evaluate the posture of the target object in an image, so as to determine which images to use for subsequent recognition analysis and thereby improve the accuracy of image recognition, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to overcome the defects in the prior art, the present application provides a target object posture selection method, an image scoring method and a model training method, so as to enable evaluation of the posture of the target object in an image, thereby facilitating the determination of images for subsequent recognition analysis and improving the accuracy of image recognition.
According to an aspect of the present application, there is provided a target object pose selection model training method, where the target object pose selection model includes a key point detection module, a pose estimation module, and a pose selection module, the key point detection module detects key points of a target object based on an input image, the pose estimation module estimates a three-dimensional pose of the target object at least according to the key points of the target object, the pose selection module selects an optimal pose according to the three-dimensional pose estimated by the pose estimation module, the target object pose selection model training method includes:
acquiring a training sample, wherein the training sample comprises a sample image, key points of a target object in the sample image, a three-dimensional posture of the target object in the sample image and an optimal posture of the target object;
inputting the sample image into the target object pose selection model;
and training the target object posture selection model based on the differences between the key points detected by the key point detection module, the three-dimensional posture estimated by the posture estimation module and the optimal posture selected by the posture selection module, and, respectively, the key points of the target object in the sample image, the three-dimensional posture of the target object in the sample image and the optimal posture of the target object.
In some embodiments of the present application, the keypoint detection module detects the keypoints of the target object in an input image by:
predicting, for each pixel of the input image, a unit vector pointing to a keypoint;
and voting for the positions of the keypoints using a random sample consensus (RANSAC) algorithm to obtain the keypoints of the target object.
In some embodiments of the present application, the pose estimation module estimates the three-dimensional pose of the target object from the key points of the target object by:
acquiring two-dimensional coordinates of key points of the target object;
converting the two-dimensional coordinates of the key points of the target object into three-dimensional coordinates;
and estimating the three-dimensional posture of the target object according to the three-dimensional coordinates of the key points of the target object.
In some embodiments of the present application, the pose selection module is a sequence-based prediction model, and the keypoint detection module and the pose estimation module provide the estimated sequence of three-dimensional poses of the target object to the pose selection module based on a sequence of sample images formed by a plurality of sample images containing the same target object.
In some embodiments of the present application, the keypoint detection module is connected in parallel with a target frame recognition module, and the pose estimation module estimates the three-dimensional pose of the target object according to the keypoints of the target object and the target frame of the target object recognized by the target frame recognition module.
According to yet another aspect of the present application, there is also provided an image scoring model training method, the image scoring model including a target object pose selection model that estimates a pose score of a target object based on an input image and at least one image attribute scoring model that estimates an attribute score of an image based on an input image, the image scoring model estimating an image total score based on the pose score and the attribute score, the image scoring model training method including:
acquiring a training sample, wherein the training sample comprises a sample image, a posture score of a target object in the sample image, at least one attribute score of the sample image and a total score of the sample image;
inputting the sample image into the image scoring model;
training the image scoring model based on the differences between the posture score of the target object estimated by the target object posture selection model, the attribute score of the image estimated by the image attribute scoring model and the total image score estimated by the image scoring model, and, respectively, the posture score of the target object in the sample image, the at least one attribute score of the sample image and the total score of the sample image.
In some embodiments of the present application, the image attribute scoring model comprises one or more of an image sharpness scoring model, an image distortion scoring model, and a target object integrity scoring model.
According to another aspect of the present application, there is also provided a target object posture selection method, including:
inputting a plurality of images to be processed into a target object posture selection model, wherein the target object posture selection model is trained by the target object posture selection model training method;
and selecting the posture of the target object of the image to be processed according to the output of the target object posture selection model.
According to still another aspect of the present application, there is also provided an image scoring method including:
inputting an image to be processed into an image scoring model, wherein the image scoring model is trained through the image scoring model training method;
and acquiring the total image score of the image to be processed according to the output of the image scoring model.
According to another aspect of the present application, there is also provided a target object posture selection model training apparatus, where the target object posture selection model includes a key point detection module, a posture estimation module and a posture selection module, the key point detection module detects key points of a target object based on an input image, the posture estimation module estimates a three-dimensional posture of the target object at least according to the key points of the target object, and the posture selection module selects an optimal posture according to the three-dimensional posture estimated by the posture estimation module. The target object posture selection model training apparatus includes:
an obtaining module configured to obtain a training sample, the training sample including a sample image, key points of a target object in the sample image, a three-dimensional pose of the target object in the sample image, and an optimal pose of the target object;
an input module configured to input the sample image into the target object pose selection model;
and a training module configured to train the target object posture selection model based on the differences between the key points detected by the key point detection module, the three-dimensional posture estimated by the posture estimation module and the optimal posture selected by the posture selection module, and, respectively, the key points of the target object in the sample image, the three-dimensional posture of the target object in the sample image and the optimal posture of the target object.
According to yet another aspect of the present application, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to yet another aspect of the present application, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Therefore, compared with the prior art, the scheme provided by the application has the following advantages:
according to the method and the device, the target object posture selection model comprising the key point detection module, the posture estimation module and the posture selection module is used for realizing the three-dimensional posture recognition and evaluation of the target object based on key point detection, so that the evaluation of the posture of the target object in the image is realized based on the target object posture selection model, the image for executing subsequent recognition analysis is convenient to determine, and the accuracy of image recognition is improved.
According to the method and the device, the target object posture selection model and the image attribute scoring model are combined to achieve scoring of the image from multiple angles such as the posture of the target object and the image attribute, so that the image which is easier to identify and higher in identification accuracy can be selected according to the multi-dimensional scoring of the image, and image analysis and image processing can be conveniently performed.
Drawings
The above and other features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flowchart of a target object pose selection model training method according to an embodiment of the present application.
FIG. 2 shows a schematic diagram of a target object pose selection model according to an embodiment of the application.
Fig. 3 shows a flowchart of an image scoring model training method according to an embodiment of the present application.
Fig. 4 shows a schematic diagram of an image scoring model according to an embodiment of the present application.
FIG. 5 shows a flow diagram of a target object pose selection method according to an embodiment of the application.
Fig. 6 shows a flow chart of an image scoring method according to an embodiment of the application.
Fig. 7 is a block diagram illustrating a target object recognition apparatus according to an embodiment of the present application.
Fig. 8 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure.
Fig. 9 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In order to overcome the defects in the prior art, the application provides a target object posture selection method, an image scoring method, a model training method and a model training apparatus, so that the accuracy of subsequent target object identification is improved through the screening and scoring of pictures. Specifically, the methods and apparatus provided by the present application may be applied to various image recognition scenarios, such as vehicle recognition and pedestrian recognition, and the present application is not limited thereto.
Referring initially to fig. 1 and 2, fig. 1 illustrates a flow diagram of a target object pose selection model training method according to an embodiment of the present application. FIG. 2 shows a schematic diagram of a target object pose selection model according to an embodiment of the application.
The target object pose selection model 100 includes a keypoint detection module 101, a pose estimation module 103, and a pose selection module 104.
The keypoint detection module 101 detects keypoints of a target object based on an input image. The pose estimation module 103 estimates the three-dimensional pose of the target object at least according to the key points of the target object. The pose selection module 104 selects an optimal pose according to the three-dimensional pose estimated by the pose estimation module.
In some embodiments of the present application, the number and definition of the key points of the target object may be determined depending on the shape of the target object. When the target object is a vehicle, it is considered that the vehicle has a substantially quadrangular frustum shape, and therefore, eight vertexes of the quadrangular frustum of the vehicle can be defined as key points of the target object. When the target object is a human body, a head, a crotch, hands, feet, and the like may be defined as key points of the human body in consideration of the skeletal structure of the human body. The above is merely an example for schematically describing the key points of the present application, and the present invention is not limited thereto.
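Where the target object is a vehicle modeled as a quadrangular frustum (simplified here to a cuboid), the eight keypoints can be written down directly from the body dimensions. The sketch below is purely illustrative: the function name and the car dimensions are hypothetical, and a real frustum would taper toward the roof rather than being a perfect box.

```python
import numpy as np

def cuboid_keypoints(length, width, height):
    """Return the 8 vertices of an axis-aligned cuboid centred at the
    origin, as an (8, 3) array -- one possible keypoint definition for
    a roughly box-shaped object such as a vehicle."""
    l, w, h = length / 2.0, width / 2.0, height / 2.0
    return np.array([[sx * l, sy * w, sz * h]
                     for sx in (-1, 1)
                     for sy in (-1, 1)
                     for sz in (-1, 1)], dtype=float)

kps = cuboid_keypoints(4.5, 1.8, 1.5)  # hypothetical car dimensions in metres
```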
Specifically, the keypoint detection module 101 may detect the keypoints of the target object in an input image by: predicting, for each pixel of the input image, a unit vector pointing to a keypoint; and voting for the keypoint positions using a random sample consensus (RANSAC) algorithm to obtain the keypoints of the target object. For example, the keypoint detection module 101 may use a Pixel-wise Voting Network (PVNet) to regress, for each pixel, unit vectors pointing to the keypoints, and then use these vectors to vote for the keypoint positions with RANSAC. The pixel-wise voting network can mitigate the impact on keypoint identification when the target object is occluded or truncated. The invention is not limited thereto; other keypoint detection methods are also within its scope.
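The voting scheme described above can be sketched as follows: each pixel carries a unit vector toward the keypoint, random pixel pairs generate keypoint hypotheses by ray intersection, and the hypothesis that most pixels "point at" wins. This is a minimal illustrative sketch, not PVNet itself; all names are hypothetical, and a real implementation regresses the per-pixel vectors with a CNN.

```python
import numpy as np

def intersect(p1, d1, p2, d2):
    # Solve p1 + t1*d1 = p2 + t2*d2 for the 2-D intersection point.
    A = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]])
    if abs(np.linalg.det(A)) < 1e-9:
        return None  # near-parallel rays: degenerate hypothesis
    t = np.linalg.solve(A, p2 - p1)
    return p1 + t[0] * d1

def ransac_vote(pixels, dirs, iters=100, cos_thresh=0.99, seed=0):
    """RANSAC-style voting: hypothesise the keypoint from random pixel
    pairs, score each hypothesis by how many pixels point at it."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        hyp = intersect(pixels[i], dirs[i], pixels[j], dirs[j])
        if hyp is None:
            continue
        v = hyp - pixels
        norm = np.linalg.norm(v, axis=1)
        ok = norm > 1e-6
        cos = np.einsum('ij,ij->i', v[ok] / norm[ok, None], dirs[ok])
        inliers = int((cos > cos_thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = hyp, inliers
    return best

# Synthetic check: pixels whose vectors all point exactly at one keypoint.
rng = np.random.default_rng(1)
kp = np.array([50.0, 60.0])
pixels = rng.uniform(0, 100, size=(30, 2))
dirs = kp - pixels
dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
est = ransac_vote(pixels, dirs)
```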
Specifically, the pose estimation module 103 may estimate the three-dimensional pose of the target object from the keypoints of the target object by: acquiring the two-dimensional coordinates of the keypoints of the target object; converting the two-dimensional coordinates of the keypoints into three-dimensional coordinates; and estimating the three-dimensional pose of the target object from the three-dimensional coordinates of the keypoints. The step of converting the two-dimensional coordinates of the keypoints into three-dimensional coordinates may be implemented by a PnP (Perspective-n-Point) algorithm. In combination with the above embodiments, while generating the keypoints, the PVNet can also output the probability distribution of the object keypoints, that is, the mean and covariance of each keypoint's spatial distribution. The PnP algorithm can then exploit this keypoint uncertainty, improving the robustness of the pose estimation module 103. Further, compared with directly regressing the three-dimensional pose and rotation angle, estimating the three-dimensional pose by mapping the two-dimensional keypoint coordinates into three dimensions is relatively simple and efficient, since acquiring the two-dimensional coordinates of keypoints is straightforward.
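A full implementation of the 2-D-to-3-D conversion would use a PnP solver (in practice, for example, OpenCV's cv2.solvePnP). As a minimal, library-free illustration of the mapping step, the sketch below back-projects pixel coordinates into 3-D camera space assuming the camera intrinsic matrix and a per-point depth are known; this is a simplification of full PnP, and all values are hypothetical.

```python
import numpy as np

def backproject(uv, depth, K):
    """Back-project 2-D pixel coordinates to 3-D camera coordinates,
    given a per-point depth and the camera intrinsic matrix K."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])  # homogeneous pixel coords
    rays = uv1 @ np.linalg.inv(K).T               # normalised viewing rays
    return rays * depth[:, None]                  # scale rays by depth

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = np.array([[320.0, 240.0], [400.0, 240.0]])
pts3d = backproject(uv, np.array([2.0, 2.0]), K)
```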
In one particular implementation, the pose selection module 104 may be a sequence-based prediction model. The keypoint detection module 101 and the pose estimation module 103 provide the sequence of estimated three-dimensional poses of the target object to the pose selection module 104, based on a sequence of sample images formed by a plurality of sample images containing the same target object. The pose selection module 104 may be, for example, a Long Short-Term Memory (LSTM) model, in which case it includes the corresponding encoder and decoder structures within the LSTM. The present application is not limited in this respect; other sequence-based (e.g. time-series) prediction models are within its scope. The pose selection module 104 may output only the optimal pose. In some variations, the pose selection module 104 may instead output the pose of the target object in each image together with the probability that that pose is the optimal one, so as to provide more data for a more comprehensive analysis.
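The LSTM itself is not specified in detail, but the variant output described above (one pose per frame plus the probability that it is optimal) leads to a simple selection rule. The sketch below is purely illustrative and assumes the sequence model emits one raw score per frame; the names and score values are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def select_optimal_pose(poses, scores):
    """Given one estimated pose per frame and the sequence model's raw
    scores, return the most likely optimal pose plus the per-frame
    probabilities (the variant output described in the text)."""
    probs = softmax(np.asarray(scores, dtype=float))
    best = int(np.argmax(probs))
    return poses[best], probs

poses = ['frame0_pose', 'frame1_pose', 'frame2_pose']
pose, probs = select_optimal_pose(poses, [0.1, 2.3, 0.4])
```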
In some embodiments of the present application, the keypoint detection module 101 is connected in parallel with a target frame identification module 102. The input image is fed to both the keypoint detection module 101 and the target frame identification module 102. The pose estimation module 103 then estimates the three-dimensional pose of the target object from the keypoints of the target object and the target frame of the target object identified by the target frame identification module 102. In this way, more complete data is provided to the pose estimation module 103, and keypoints that fall outside the target frame because of errors in the keypoint detection module 101 are prevented from still being referenced by the pose estimation module 103, which would reduce the pose estimation accuracy.
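One plausible way to use the target frame here is to discard keypoints that fall outside the detected box before pose estimation. The patent does not spell out this filtering logic, so the sketch below, with hypothetical names, is only an illustration of the idea.

```python
import numpy as np

def filter_keypoints(kps, box):
    """Keep only keypoints that fall inside the detected target box
    (x1, y1, x2, y2); stray detections outside the box are discarded
    before pose estimation."""
    x1, y1, x2, y2 = box
    kps = np.asarray(kps, dtype=float)
    mask = ((kps[:, 0] >= x1) & (kps[:, 0] <= x2) &
            (kps[:, 1] >= y1) & (kps[:, 1] <= y2))
    return kps[mask], mask

kept, mask = filter_keypoints([[10, 10], [200, 50], [30, 40]],
                              (0, 0, 100, 100))
```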
The target object posture selection model training method provided by the application comprises the following steps:
step S111: obtaining a training sample, wherein the training sample comprises a sample image, key points of a target object in the sample image, a three-dimensional posture of the target object in the sample image and an optimal posture of the target object.
Specifically, step S111 may further include preprocessing of the sample image. Preprocessing mainly includes operations such as scaling, cropping, mirror transformation and normalization of the input image, so as to expand the data set and thereby enhance the robustness of the model.
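A toy version of such a preprocessing step might look as follows; the nearest-neighbour resize, output size and normalisation scheme are all illustrative assumptions, not the patent's actual pipeline.

```python
import numpy as np

def preprocess(img, size=(64, 64), mirror=False):
    """Toy preprocessing sketch: nearest-neighbour resize, optional
    horizontal mirror, and normalisation to zero mean / unit variance."""
    h, w = img.shape[:2]
    ys = np.arange(size[0]) * h // size[0]   # source rows to sample
    xs = np.arange(size[1]) * w // size[1]   # source columns to sample
    out = img[ys][:, xs].astype(float)
    if mirror:
        out = out[:, ::-1]                   # horizontal flip (augmentation)
    out = out / 255.0                        # scale 8-bit values to [0, 1]
    return (out - out.mean()) / (out.std() + 1e-8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128))  # fake 8-bit grayscale image
out = preprocess(img)
```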
In particular, the sample images may be obtained by capturing consecutive image frames from a video over a period of time, which facilitates forming the sequence of sample images processed by the target object pose selection model. For example, in a traffic monitoring embodiment, step S111 may take as input the consecutive images captured over 15 minutes, at a frequency of 10 Hz, from a video of real road conditions. The set of training samples may cover light, medium and heavy traffic conditions. On the one hand, when consecutive images of medium and heavy traffic are collected, samples far away from the acquisition device (such as a camera) can be gathered; the target object is then relatively small in the image, so the model can be trained on small targets, improving the subsequent selection and pose estimation for them. On the other hand, acquiring consecutive images under different traffic conditions increases the sample size available for training. It also increases the richness of the samples, giving the network greater robustness.
Step S112: inputting the sample image into the target object pose selection model.
Step S113: training the target object posture selection model based on the differences between the key points detected by the key point detection module of the target object posture selection model, the three-dimensional posture estimated by the posture estimation module and the optimal posture selected by the posture selection module, and, respectively, the key points of the target object in the sample image, the three-dimensional posture of the target object in the sample image and the optimal posture of the target object.
Specifically, the present application may train the key point detection module, the posture estimation module and the posture selection module in sequence. In other embodiments, the three modules may also be trained simultaneously as a whole. Further, the present application may use a back-propagation algorithm to train the target object posture selection model, but the present application is not limited thereto; other training algorithms are within the scope of the present invention.
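When the three modules are trained simultaneously as a whole, a natural formulation is a weighted sum of the three per-module losses. The sketch below is a hypothetical illustration (the patent does not specify the loss functions or weights); MSE stands in for whatever per-module losses are actually used.

```python
import numpy as np

def mse(pred, target):
    # Mean squared error between prediction and ground truth.
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(((pred - target) ** 2).mean())

def joint_loss(preds, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-module losses used when the modules
    are trained together as a whole (loss choice and weights are
    hypothetical)."""
    l_kp = mse(preds['keypoints'], targets['keypoints'])
    l_pose = mse(preds['pose'], targets['pose'])
    l_sel = mse(preds['selection'], targets['selection'])
    w1, w2, w3 = weights
    return w1 * l_kp + w2 * l_pose + w3 * l_sel

preds = {'keypoints': [[1.0, 1.0]], 'pose': [0.0], 'selection': [1.0]}
targets = {'keypoints': [[1.0, 1.0]], 'pose': [0.0], 'selection': [0.0]}
loss = joint_loss(preds, targets, weights=(1.0, 1.0, 0.5))
```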
Referring now to fig. 5, fig. 5 illustrates a flow chart of a target object pose selection method according to an embodiment of the present application. Fig. 5 shows the following steps:
step S131: a plurality of images to be processed are input into a target object posture selection model, and the target object posture selection model is trained through the target object posture selection model training method.
Specifically, step S131 may also include preprocessing of the image to be processed. The preprocessing method may be the same as the preprocessing method of the sample image, and is not described herein.
Step S132: and selecting the posture of the target object of the image to be processed according to the output of the target object posture selection model.
Specifically, step S132 may, for example, take the optimal posture output by the target object posture selection model as the posture of the target object in the image to be processed.
Therefore, in the target object posture selection model training method and the target object posture selection method provided by the application, the target object posture selection model comprising the key point detection module, the posture estimation module and the posture selection module realizes the three-dimensional posture recognition and evaluation of the target object based on key point detection, so that the evaluation of the posture of the target object in the image is realized based on the target object posture selection model, the image for executing subsequent recognition analysis is determined conveniently, and the accuracy of image recognition is improved.
The image scoring model training method provided by the present application is described below with reference to fig. 3 and 4. Fig. 3 shows a flowchart of an image scoring model training method according to an embodiment of the present application. Fig. 4 shows a schematic diagram of an image scoring model according to an embodiment of the application.
The image scoring model 200 may include a target object pose selection model 100 and at least one image attribute scoring model 201.

The target object pose selection model 100 estimates a pose score for the target object based on the input image. Specifically, the target object pose selection model 100 may be, for example, the target object pose selection model shown in fig. 2. In the present embodiment, the target object pose selection model outputs the pose of each input image and the probability that the pose is the optimal pose; this probability can therefore be used as the pose score of the target object.

The image attribute scoring model 201 estimates an attribute score of an image based on the input image. Specifically, the image attribute scoring model may include one or more of an image sharpness scoring model, an image distortion scoring model, and a target object integrity scoring model. In embodiments such as traffic monitoring, the image attributes may include, for example, sharpness, integrity of the vehicle, color distortion, whether a license plate is included, integrity of the license plate, and inclination of the license plate; these examples are merely illustrative and are not intended to be limiting.

The image scoring model 200 estimates a total image score based on the pose score and the attribute score. Specifically, the image scoring model 200 may compute the total image score by fusing the pose score and the attribute score in a score fusion manner such as weighted summation or averaging.
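The weighted-summation fusion mentioned above can be sketched as follows. The attribute names and weights are illustrative assumptions, not values from the application:

```python
def fuse_scores(pose_score, attr_scores, pose_weight=0.5, attr_weights=None):
    """Fuse a pose score with attribute scores by weighted summation.

    attr_scores: e.g. {"sharpness": 0.9, "integrity": 0.8}
    """
    if attr_weights is None:
        # default: split the remaining weight evenly across the attributes
        share = (1.0 - pose_weight) / len(attr_scores)
        attr_weights = {name: share for name in attr_scores}
    total = pose_weight * pose_score
    total += sum(attr_weights[name] * score for name, score in attr_scores.items())
    return total
```

With all sub-scores in [0, 1] and weights summing to 1, the fused total also stays in [0, 1], which keeps totals comparable across images.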
The image scoring model training method comprises the following steps:
step S121: obtaining a training sample, wherein the training sample comprises a sample image, a posture score of a target object in the sample image, at least one attribute score of the sample image and a total score of the sample image.
Specifically, step S121 may further include preprocessing of the sample image. The preprocessing mainly performs operations such as scaling, cropping, mirror transformation and normalization on the input image, so as to expand the data set and thereby enhance the robustness of the model.
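The scaling, cropping, mirroring and normalization steps described above might be sketched as below. The output size, crop strategy and normalization range are illustrative assumptions:

```python
import numpy as np

def preprocess(image, out_size=224, mirror=False):
    """Center-crop to a square, nearest-neighbour resize, optional mirror,
    then normalize pixel values to [0, 1]."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = image[top:top + side, left:left + side]      # center crop
    idx = np.arange(out_size) * side // out_size       # nearest-neighbour indices
    img = img[idx][:, idx]                             # resize to out_size x out_size
    if mirror:
        img = img[:, ::-1]                             # horizontal mirror
    return img.astype(np.float32) / 255.0              # normalize to [0, 1]
```

Applying the same image with and without mirroring (and with varied crops) yields multiple training samples from one source image, which is the data-set expansion the preprocessing aims at.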
Step S122: inputting the sample image into the image scoring model.
Step S123: training the target object posture selection model based on the posture score of the target object estimated by the target object posture selection model, the attribute score of the image estimated by the image attribute score model and the total image score estimated by the image score model, and the difference of the posture score of the target object in the sample image, the at least one attribute score of the sample image and the total score of the sample image.
Specifically, the target object posture selection model and the image attribute scoring model can be trained sequentially. In other embodiments, the target object pose selection model and the image attribute scoring model may also be trained simultaneously as a whole.
Referring now to fig. 6, fig. 6 shows a flow chart of an image scoring method according to an embodiment of the present application. Fig. 6 shows the following steps in total:
step S141: the image to be processed is input into an image scoring model, and the image scoring model is trained through the image scoring model training method.
Specifically, step S141 may also include preprocessing of the image to be processed. The preprocessing method may be the same as the preprocessing method of the sample image, and is not described herein.
Step S142: and acquiring the total image score of the image to be processed according to the output of the image score model.
Therefore, in the image scoring model training method and the image scoring method provided by the application, the image scoring is realized from multiple angles such as the posture of the target object, the image attribute and the like by combining the target object posture selection model and the image attribute scoring model, so that the image which is easier to identify and higher in identification accuracy can be selected according to the image multidimensional scoring, and the image analysis and the image processing are convenient to execute.
The above describes exemplary embodiments of the present application, but the present application is not limited thereto; in each embodiment, additions, omissions and reordering of steps all fall within the protection scope of the present application, and the embodiments may be implemented individually or in combination.
The target object pose selection model training apparatus 300 provided by the present application is described below with reference to fig. 7. The target object posture selection model comprises a key point detection module, a posture estimation module and a posture selection module, wherein the key point detection module detects key points of a target object based on an input image, the posture estimation module estimates the three-dimensional posture of the target object at least according to the key points of the target object, and the posture selection module selects an optimal posture according to the three-dimensional posture estimated by the posture estimation module. The target object pose selection model training apparatus 300 includes an acquisition module 310, an input module 320, and a training module 330.
The obtaining module 310 is configured to obtain a training sample, the training sample including a sample image, key points of a target object in the sample image, a three-dimensional pose of the target object in the sample image, and an optimal pose of the target object;
the input module 320 is configured to input the sample image into the target object pose selection model;
the training module 330 is configured to train the target object pose selection model based on the differences between the key points of the target object detected by the key point detection module of the target object pose selection model, the three-dimensional pose estimated by the pose estimation module and the optimal pose selected by the pose selection module, and the key points of the target object in the sample image, the three-dimensional pose of the target object in the sample image and the optimal pose of the target object.
In the training device 300 for the target object posture selection model provided by the application, the target object posture selection model comprising the key point detection module, the posture estimation module and the posture selection module realizes the three-dimensional posture recognition and evaluation of the target object based on key point detection, so that the evaluation of the posture of the target object in the image is realized based on the target object posture selection model, the image for executing subsequent recognition analysis is convenient to determine, and the accuracy of image recognition is improved.
The invention may also provide an image scoring model training device configured to perform the steps performed by the image scoring model training method shown in fig. 3. The present invention may also provide a target object pose selection apparatus configured to perform the steps performed by the target object pose selection method shown in fig. 5. The present invention may also provide an image scoring apparatus configured to perform the steps performed by the image scoring method shown in fig. 6.
The target object posture selection model training device 300, the image scoring model training device, the target object posture selection device and the image scoring device can be realized through software, hardware, firmware and any combination thereof. Fig. 7 is a schematic diagram of the target object posture selection model training apparatus 300 provided in the present application, and the splitting, merging, and adding of modules are all within the scope of the present application without departing from the concept of the present application.
In an exemplary embodiment of the disclosure, a computer readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program may implement the steps of one or more of the target object posture selection model training method, the image scoring model training method, the target object posture selection method, and the image scoring method described in any of the above embodiments. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code; when the program product is run on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present application described above in the target object posture selection model training method, the image scoring model training method, the target object posture selection method and the image scoring method.
Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not so limited, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform, via execution of the executable instructions, the steps of one or more of the target object pose selection model training method, the image scoring model training method, the target object pose selection method, and the image scoring method of any of the above embodiments.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," module "or" system.
An electronic device 600 according to this embodiment of the present application is described below with reference to fig. 9. The electronic device 600 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 9, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform the steps according to various exemplary embodiments of the present application described in one or more of the methods section of the target object pose selection model training method, the image scoring model training method, the target object pose selection method, and the image scoring method described above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 2 or 3.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a mobile hard disk, or the like) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, or a network device, or the like) to execute one or more of the target object posture selection model training method, the image scoring model training method, the target object posture selection method, and the image scoring method according to the embodiments of the present disclosure.
Therefore, compared with the prior art, the scheme provided by the application has the following advantages:
according to the method and the device, the target object posture selection model comprising the key point detection module, the posture estimation module and the posture selection module is used for realizing the three-dimensional posture recognition and evaluation of the target object based on key point detection, so that the evaluation of the posture of the target object in the image is realized based on the target object posture selection model, the image for executing subsequent recognition analysis is convenient to determine, and the accuracy of image recognition is improved.
According to the method and the device, the target object posture selection model and the image attribute scoring model are combined to achieve scoring of the image from multiple angles such as the posture of the target object and the image attribute, so that the image which is easier to identify and higher in identification accuracy can be selected according to the multi-dimensional scoring of the image, and image analysis and image processing can be conveniently performed.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A training method for a target object posture selection model is characterized in that the target object posture selection model comprises a key point detection module, a posture estimation module and a posture selection module, the key point detection module detects key points of a target object based on an input image sequence, the posture estimation module estimates three-dimensional postures of the target object according to the key points of the target object of each image in the input image sequence to form a three-dimensional posture sequence, the posture selection module outputs the probability that each three-dimensional posture in the three-dimensional posture sequence is the optimal posture according to the three-dimensional posture sequence estimated by the posture estimation module based on the input image sequence to select the optimal posture, and the training method for the target object posture selection model comprises the following steps:
acquiring a training sample, wherein the training sample comprises a sample image, key points of a target object in the sample image, a three-dimensional posture of the target object in the sample image and an optimal posture of the target object;
inputting the sample image into the target object pose selection model;
training the target object pose selection model based on the differences between the key points of the target object in the sample image, the three-dimensional pose of the target object in the sample image and the optimal pose of the target object, the differences being detected by the key point detection module of the target object pose selection model, the three-dimensional pose estimated by the pose estimation module and the optimal pose selected by the pose selection module,
and the probability that the three-dimensional posture of the target object is the optimal posture is used as the posture score of the target object, and the target object posture selection model is trained on continuous traffic images with different crowding degrees.
2. The method of claim 1, wherein the keypoint detection module detects keypoints of the target object for the input image by:
predicting a unit vector pointing to a key point for each pixel of an input image;
and voting the positions of the key points by adopting a random sampling consensus algorithm to obtain the key points of the target object.
3. The method of claim 1, wherein the pose estimation module estimates the three-dimensional pose of the target object from the key points of the target object by:
acquiring two-dimensional coordinates of key points of the target object;
converting the two-dimensional coordinates of the key points of the target object into three-dimensional coordinates;
and estimating the three-dimensional posture of the target object according to the three-dimensional coordinates of the key points of the target object.
4. The method as claimed in claim 1, wherein the pose selection module is a sequence-based prediction model, and the keypoint detection module and the pose estimation module provide the sequence of estimated three-dimensional poses of the target object to the pose selection module based on a sequence of sample images formed by a plurality of sample images containing the same target object.
5. The method as claimed in claim 1, wherein the keypoint detection module is coupled to a target frame recognition module, and the pose estimation module estimates the three-dimensional pose of the target object according to the keypoints of the target object and the target frame of the target object recognized by the target frame recognition module.
6. An image scoring model training method, wherein the image scoring model includes a target object posture selection model and at least one image attribute scoring model, the target object posture selection model estimates a probability that a three-dimensional posture of a target object is a best posture based on an input image as a posture score of the target object, the image attribute scoring model estimates an attribute score of an image based on the input image, and the image scoring model estimates an image total score based on the posture score and the attribute score, the image scoring model training method comprising:
acquiring a training sample, wherein the training sample comprises a sample image, a posture score of a target object in the sample image, at least one attribute score of the sample image and a total score of the sample image, and the sample image comprises continuous traffic images with different crowding degrees;
inputting the sample image into the image scoring model;
and training the image scoring model based on the difference between the attitude score of the target object estimated by the target object attitude selection model, the attribute score of the image estimated by the image attribute scoring model and the total image score estimated by the image scoring model and the attitude score of the target object in the sample image, the at least one attribute score of the sample image and the total score of the sample image.
7. The image scoring model training method of claim 6, wherein the image attribute scoring model comprises one or more of an image sharpness scoring model, an image distortion scoring model, and a target object integrity scoring model.
8. A target object pose selection method, comprising:
inputting a plurality of images to be processed into a target object pose selection model, the target object pose selection model being trained via a target object pose selection model training method according to any one of claims 1 to 5;
and selecting the posture of the target object of the image to be processed according to the output of the target object posture selection model.
9. An image scoring method, comprising:
inputting an image to be processed into an image scoring model, wherein the image scoring model is trained through the image scoring model training method according to claim 6 or 7;
and acquiring the total image score of the image to be processed according to the output of the image score model.
10. A training device of a target object posture selection model is characterized in that the target object posture selection model comprises a key point detection module, a posture estimation module and a posture selection module, the key point detection module detects key points of a target object based on an input image sequence, the posture estimation module estimates three-dimensional postures of the target object according to the key points of the target object of each image in the input image sequence to form a three-dimensional posture sequence, the posture selection module outputs the probability that each three-dimensional posture in the three-dimensional posture sequence is the optimal posture according to the three-dimensional posture sequence estimated by the posture estimation module based on the input image sequence so as to select the optimal posture, and the training device of the target object posture selection model comprises:
an obtaining module configured to obtain a training sample, the training sample including a sample image, key points of a target object in the sample image, a three-dimensional pose of the target object in the sample image, and an optimal pose of the target object;
an input module configured to input the sample image into the target object pose selection model;
a training module configured to train the target object pose selection model based on a difference between the keypoints of the target object in the sample image, the three-dimensional pose of the target object in the sample image, and the optimal pose of the target object, detected by the keypoint detection module of the target object pose selection model, the three-dimensional pose estimated by the pose estimation module, and the optimal pose selected by the pose selection module,
and the probability that the three-dimensional posture of the target object is the optimal posture is used as the posture score of the target object, and the target object posture selection model is trained on continuous traffic images with different crowding degrees.
CN202110333334.0A 2021-03-29 2021-03-29 Target object posture selection method, image scoring method and model training method Active CN112907583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110333334.0A CN112907583B (en) 2021-03-29 2021-03-29 Target object posture selection method, image scoring method and model training method

Publications (2)

Publication Number Publication Date
CN112907583A CN112907583A (en) 2021-06-04
CN112907583B true CN112907583B (en) 2023-04-07

Family

ID=76109184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110333334.0A Active CN112907583B (en) 2021-03-29 2021-03-29 Target object posture selection method, image scoring method and model training method

Country Status (1)

Country Link
CN (1) CN112907583B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023175727A1 (en) * 2022-03-15 2023-09-21 株式会社ソニー・インタラクティブエンタテインメント Information processing system, information processing method, and program
CN116453222B (en) * 2023-04-19 2024-06-11 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480720B (en) * 2017-08-18 2021-02-26 成都通甲优博科技有限责任公司 Human body posture model training method and device
JP6937995B2 (en) * 2018-04-05 2021-09-22 オムロン株式会社 Object recognition processing device and method, and object picking device and method
CN109389640A (en) * 2018-09-29 2019-02-26 北京字节跳动网络技术有限公司 Image processing method and device
US11270461B2 (en) * 2019-01-07 2022-03-08 Genieland Company Limited System and method for posture sequence on video from mobile terminals
CN111986163A (en) * 2020-07-29 2020-11-24 深思考人工智能科技(上海)有限公司 Face image selection method and device

Also Published As

Publication number Publication date
CN112907583A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant