WO2022142783A1 - Image processing method and related device - Google Patents
- Publication number: WO2022142783A1 (application PCT/CN2021/130651)
- Authority: WIPO (PCT)
Classifications
- G06F18/214 — Pattern recognition; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
- G06V10/774 — Image or video recognition or understanding; processing image or video features in feature spaces; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
Description
- the embodiments of the present application relate to the field of data processing, and in particular, to an image processing method and related equipment.
- Artificial intelligence (AI) technology studies the design principles and implementation methods of intelligent machines, giving machines the abilities of perception, reasoning and decision-making. Through AI technology, the angle of an object to be measured can be determined.
- the angle of the object to be labeled in the sample image is manually labeled, and the angle measurement model is trained according to the manually labeled angle and the sample image. Then, the angle of the object to be measured is determined according to the trained angle measurement model.
- Embodiments of the present application provide an image processing method and related devices, which are used to accurately determine the rotation angle of an object in an image.
- a first aspect of the embodiments of the present application provides an image processing method, the method comprising:
- the sample image includes the object to be marked
- the reference template image includes the reference object corresponding to the object to be marked
- the reference template image is marked with the reference key points of the reference object, and the number of reference key points is greater than or equal to 2.
- the information of the sample key points is received, and the information of the sample key points is obtained by the user marking the object to be marked in the sample image based on the reference template image.
- the sample rotation angle is determined, and the sample rotation angle is the rotation angle of the object to be marked relative to the reference object.
- the reference object in the reference template image is used as the benchmark for measuring the rotation angle of the object, that is, the rotation angle is relative to the reference object, so the rotation angle of the reference object relative to the reference template image is 0°.
- the sample rotation angle is determined based on the sample key points, and the sample key points are marked based on the reference object in the reference template image; that is, the reference object is the standard for determining the sample rotation angle. The sample rotation angle therefore has a unified standard, and training the angle measurement model according to the sample rotation angle reduces the difficulty of training. Moreover, the target angle measurement model obtained by training can accurately determine the rotation angle of the object to be measured in the image to be measured.
- when the numbers of reference key points and sample key points are both 2, determining the sample rotation angle may specifically include: determining the sample rotation angle according to the included angle between the reference key line and the sample key line.
- the reference key line is the line connecting the two reference key points, and the sample key line is the line connecting the two sample key points.
- the sample rotation angle is determined by the angle between the sample key line and the reference key line. The sample key line is derived from the sample key points, and the labeling of the sample key points is derived from the reference object marked with the reference key points, so the determination of the sample rotation angle is based on the reference key points, that is, on a unified benchmark.
- the angle measurement model is trained according to the data obtained based on the unified benchmark, which reduces the difficulty of training. Through the target angle measurement model obtained by training, the rotation angle of the object to be measured in the image to be measured can be accurately determined.
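- The patent text gives no code; below is a minimal sketch (not part of the original disclosure) of this included-angle calculation, assuming two key points per line and a y-up coordinate frame. The function name is illustrative.

```python
import math

def signed_angle_deg(ref_line, sample_line):
    """Rotation of sample_line relative to ref_line, in degrees, normalized
    to [-180, 180); positive means counter-clockwise in a y-up frame."""
    (x1, y1), (x2, y2) = ref_line
    (u1, v1), (u2, v2) = sample_line
    ref_dir = math.atan2(y2 - y1, x2 - x1)
    sample_dir = math.atan2(v2 - v1, u2 - u1)
    angle = math.degrees(sample_dir - ref_dir)
    return (angle + 180.0) % 360.0 - 180.0

# Reference key line along the x-axis; sample key line rotated 30 degrees CCW.
ref = ((0.0, 0.0), (1.0, 0.0))
sample = ((2.0, 2.0),
          (2.0 + math.cos(math.radians(30.0)), 2.0 + math.sin(math.radians(30.0))))
print(signed_angle_deg(ref, sample))  # ~30.0
```

- Note that in image coordinates (y pointing down), the sign convention flips, so the clockwise/counterclockwise interpretation should be adjusted accordingly.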
- the method may further include: acquiring a reference template image.
- the reference annotation box represents the position of the reference object in the reference template image.
- the sample annotation frame is determined; the sample annotation frame represents the position of the object to be labeled in the sample image.
- operations can then be performed on the information of the sample annotation frame, that is, on the position information of the labeled object.
- relevant information can thus be obtained from the position of the object, which improves the flexibility of the scheme.
- determining the sample annotation frame according to the reference annotation frame may specifically include: determining the sample annotation frame according to the reference annotation frame, the information of the reference key points and the information of the sample key points. The reference key points and the reference annotation frame have a reference positional relationship, and the sample key points and the sample annotation frame have the same reference positional relationship.
- the positional relationship between the sample key points and the sample annotation frame is the reference positional relationship, that is, the sample annotation frame is determined based on the unified benchmark of the reference positional relationship. Therefore, when the model is trained with the sample annotation frame, the target model obtained by training can infer the image to be tested based on the unified standard of the reference positional relationship, and the result is likewise obtained based on that unified standard, which improves the accuracy of inference.
- the ability of the rotating object detection model to determine the position frame of the object in the image can also be trained, specifically:
- the information of the sample image and the sample annotation frame is input into the initial rotating object detection model, so that the position of the object to be marked is regressed by the initial rotating object detection model, and the information of the sample regression position frame is obtained.
- the initial rotating object detection model is trained to obtain the target rotating object detection model, and the target rotating object detection model is used to determine the position of the object to be measured in the image to be measured.
- the training process of the initial rotating object detection model may include: performing iterative training on the initial rotating object detection model according to the information of the sample regression position frame, the information of the sample annotation frame and the position regression loss function, until the preset conditions are met.
- the sample annotation frame is determined based on the unified benchmark of the reference positional relationship, so the position regression can determine an accurate regression position frame by finding features related to the reference positional relationship, which makes the training process simpler.
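- As an illustration of this training iteration, the following sketch assumes a 5-parameter rotated box (cx, cy, w, h, theta) and uses a smooth L1 loss as the position regression loss function; the toy linear backbone, input size and names are stand-ins, since the patent does not specify the network.

```python
import torch
import torch.nn as nn

IN_DIM, FEAT_DIM = 3 * 64 * 64, 128      # assumed 64x64 RGB inputs

class RotatedBoxRegressor(nn.Module):
    """Toy stand-in for the detector; a real model would use a conv backbone."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(IN_DIM, FEAT_DIM), nn.ReLU())
        self.box_head = nn.Linear(FEAT_DIM, 5)   # (cx, cy, w, h, theta)

    def forward(self, images):
        return self.box_head(self.backbone(images))

model = RotatedBoxRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
position_regression_loss = nn.SmoothL1Loss()

def train_step(images, sample_annotation_boxes):
    """One iteration: regress the "sample regression position frame" and
    compare it with the labeled sample annotation frame."""
    predicted = model(images)                 # (B, 5) regressed boxes
    loss = position_regression_loss(predicted, sample_annotation_boxes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One illustrative iteration on random data.
loss = train_step(torch.randn(8, 3, 64, 64), torch.randn(8, 5))
```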
- the ability of the rotating object detection model to classify objects in the image can also be trained. Specifically: a sample category is received, where the sample category is the category annotated by the user for the object to be labeled in the sample image. The sample category is input into the initial rotating object detection model, so that the object to be labeled is classified by the initial rotating object detection model to obtain the predicted sample category. According to the predicted sample category, the sample category and the classification loss function, the initial rotating object detection model is trained to obtain the target rotating object detection model, which is used to determine the category of the object to be measured in the image to be measured.
- the category of the object to be measured is used to determine the predicted rotation angle of the object to be measured relative to the reference object.
- the training process of the initial rotating object detection model may include: performing iterative training on the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until the preset conditions are met.
- the ability of the rotating object detection model to classify objects in the image is trained.
- the category of the object to be detected can be obtained according to the target rotating object detection model, which simplifies the process of determining the predicted rotation angle.
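- A corresponding sketch of the classification branch, assuming a cross-entropy loss as the classification loss function over detector features; the feature dimension and class count are illustrative.

```python
import torch
import torch.nn as nn

FEAT_DIM, NUM_CLASSES = 128, 10          # assumed sizes
class_head = nn.Linear(FEAT_DIM, NUM_CLASSES)
classification_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(class_head.parameters(), lr=1e-3)

def classification_step(features, sample_categories):
    """features: (B, FEAT_DIM) detector features for the labeled objects;
    sample_categories: (B,) integer labels supplied by the user."""
    logits = class_head(features)            # predicted sample category
    loss = classification_loss(logits, sample_categories)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = classification_step(torch.randn(8, FEAT_DIM),
                           torch.randint(0, NUM_CLASSES, (8,)))
```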
- the predicted category includes at least one of positive information and negative information of the object to be measured.
- the object to be tested on the front side and the object to be tested on the reverse side can be distinguished, and different operations can be performed for the object to be tested on the front side or the object to be tested on the reverse side, which improves the flexibility of the solution.
- the angle measurement model may also be trained in the ability to determine homogeneous image pairs and heterogeneous image pairs, specifically:
- the sample image can be cropped according to the sample annotation frame to obtain a cropped sample image.
- the cropped sample image is rotated according to the n first rotation angles to obtain n rotated sample images.
- the n first rotation angles are obtained according to the sample rotation angle and correspond one-to-one to the n rotated sample images, where n is an integer greater than or equal to 2.
- the n rotated sample images are fed into the angle training gallery, and homogeneous sample image pairs and heterogeneous sample image pairs are determined in the angle training gallery.
- objects in a homogeneous sample image pair have the same angle and category, and objects in a heterogeneous sample image pair differ in angle or category.
- the initial angle measurement model is trained according to the homogeneous sample image pairs and the heterogeneous sample image pairs, and the target angle measurement model is obtained.
- the ability of the angle measurement model to determine homogeneous image pairs and heterogeneous image pairs is trained. The objects in a homogeneous image pair have the same category and angle, that is, the images in a homogeneous image pair have the same or similar shape. In other words, the embodiment of the present application trains the angle measurement model in the ability to recognize the same or similar shapes.
- the training process of the angle measurement model in the embodiment of the present application is therefore more targeted, and the training process is simpler and more precise.
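- The patent does not specify the pairwise training objective; one plausible realization is a siamese-style contrastive loss over the angle training gallery, sketched below with illustrative names.

```python
import itertools
import torch
import torch.nn.functional as F

def build_sample_pairs(gallery):
    """gallery: list of (image_tensor, category, angle_deg) entries, one per
    rotated sample crop.  A pair is homogeneous (label 1) when both category
    and angle match, heterogeneous (label 0) otherwise."""
    pairs = []
    for (im_a, c_a, a_a), (im_b, c_b, a_b) in itertools.combinations(gallery, 2):
        label = 1.0 if (c_a == c_b and a_a == a_b) else 0.0
        pairs.append((im_a, im_b, label))
    return pairs

def contrastive_loss(emb_a, emb_b, labels, margin=1.0):
    """Pull homogeneous pairs together in embedding space and push
    heterogeneous pairs at least `margin` apart."""
    dist = F.pairwise_distance(emb_a, emb_b)
    return (labels * dist.pow(2)
            + (1.0 - labels) * F.relu(margin - dist).pow(2)).mean()

# Example: embeddings for 4 pairs; labels mark homogeneous (1) vs heterogeneous (0).
emb_a, emb_b = torch.randn(4, 32), torch.randn(4, 32)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(contrastive_loss(emb_a, emb_b, labels))
```

- In this sketch, emb_a and emb_b would come from the angle measurement model applied to the two images of each pair.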
- the angle measurement model can be trained according to the sample category.
- the step of determining the homogeneous sample image pairs and the heterogeneous sample image pairs in the angle training gallery may specifically include: determining the homogeneous sample image pairs and the heterogeneous sample image pairs in the angle training gallery according to the sample category.
- the target rotating object detection model can be used to infer the regression position frame of the object to be measured in the image to be measured. Specifically, the image to be tested can be input into the target rotating object detection model to perform position regression on the object to be measured, obtaining the regression position frame of the object to be measured.
- the regression position frame indicates the position of the object to be measured in the image to be measured, and is used to determine the predicted rotation angle, which is the predicted value of the rotation angle of the object to be measured relative to the reference object.
- the position of the object to be measured in the image to be measured is regressed using the target rotating object detection model trained as described above. Because the target rotating object detection model is trained on the unified benchmark of the reference object, the regression position frame obtained by position regression in this embodiment is also based on that unified benchmark, so the determined regression position frame is more accurate.
- the rotation angle of the object to be measured in the image to be measured can be inferred by the target angle measurement model. Specifically, the image to be tested can be input into the target rotating object detection model to perform position regression on the object to be measured and obtain its regression position frame, which represents the position of the object to be measured in the image to be measured. The image to be tested is then cropped according to the regression position frame to obtain a cropped image.
- the m second rotation angles may also be determined according to the regression position frame, where m is an integer greater than or equal to 2.
- the cropped image is rotated according to the m second rotation angles to obtain m rotated images, and the m second rotation angles correspond one-to-one to the m rotated images.
- through the target angle measurement model, the target image among the m rotated images is determined.
- the object in the target image has the same category and angle as the reference object in the reference template image.
- the predicted rotation angle corresponding to the target image can be determined among the m second rotation angles.
- the m second rotation angles may be determined according to the regression position frame, which may specifically include: determining a frame rotation angle according to the regression position frame.
- the frame rotation angle is the rotation angle of the regression position frame relative to a horizontal frame, where the horizontal frame has a horizontal edge, and the frame rotation angle is greater than or equal to 0° and less than or equal to 90°. Then, the m second rotation angles are determined according to the frame rotation angle.
- the target image among the m rotated images is determined by the target angle measurement model. Specifically, this may include constructing an image pair from each of the m rotated images and an image in the template image library. Then, through the target angle measurement model, the homogeneous pairs among these image pairs are determined, where objects in a homogeneous image pair have the same angle and category. The target image in the homogeneous image pair can then be determined; the target image is one of the m rotated images.
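- A minimal end-to-end sketch of this inference stage, assuming OpenCV-style rotated-box conventions and a hypothetical score_pair interface standing in for the target angle measurement model. Here m = 4 quarter-turn candidates are tried, on the assumption that the frame rotation angle fixes the orientation only modulo 90°.

```python
import numpy as np
import cv2

def predict_rotation(image, box, template, score_pair, m=4):
    """Infer the predicted rotation angle of a detected object.

    box: ((cx, cy), (w, h), theta) from the rotated-object detector, with
    theta the frame rotation angle in [0, 90).  score_pair(a, b) -> float is
    an assumed interface for the trained angle measurement model: it scores
    how likely a and b form a homogeneous pair (same category and angle).
    """
    (cx, cy), (w, h), theta = box
    # Rotate the whole image so the regression position frame becomes
    # axis-aligned, then crop it out.
    rot = cv2.getRotationMatrix2D((cx, cy), theta, 1.0)
    upright = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))
    x0, y0 = int(round(cx - w / 2)), int(round(cy - h / 2))
    crop = upright[max(y0, 0):y0 + int(h), max(x0, 0):x0 + int(w)]

    # m "second rotation angles", one quarter turn apart.
    candidates = [(theta + 90.0 * k) % 360.0 for k in range(m)]
    scores = [score_pair(np.rot90(crop, k), template) for k in range(m)]
    return candidates[int(np.argmax(scores))]
```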
- the rotation angle of the object may also be determined according to the type of the object.
- the sample category is input into the initial rotating object detection model, so that the object to be labeled is classified by the initial rotating object detection model, and the predicted sample category is obtained.
- the training process of the initial rotating object detection model may include: performing iterative training on the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until the preset conditions are met.
- the category prediction of the object to be measured can be performed through the target rotating object detection model to obtain the predicted category. Then, a reference template image is determined according to the predicted category, and the reference object in the reference template image has the predicted category.
- the step of constructing image pairs may specifically include: constructing an image pair from each of the rotated images and the reference template image determined according to the predicted category, where the reference template image is one of the images in the reference template library.
- determining the reference template image in the template image library according to the predicted category greatly reduces the number of constructed image pairs, and thus the workload of determining homogeneous image pairs, saving the computing and storage resources of the device.
- the efficiency of determining the target image is improved, and the efficiency of determining the predicted rotation angle is also improved.
- the ability of a key point detection model to determine the key points of objects in an image can be trained, specifically:
- the information of the sample image and the sample key points can be input into the initial key point detection model, so as to perform position regression on the points in the sample image through the initial key point detection model to obtain the information of the regression sample key points.
- the initial key point detection model is trained to obtain the target key point detection model, which is used to determine the predicted key points of the object to be measured.
- the training process of the initial key point detection model may specifically include: performing iterative training on the initial key point detection model according to the information of the key points of the regression samples, the information of the key points of the samples, and the regression loss function of the position of the key points, until it satisfies the preset conditions.
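- As a sketch of the key point regression training step, the coordinates are regressed directly with an MSE loss here; the patent only names a key point position regression loss function, so the loss choice, sizes and names are assumptions.

```python
import torch
import torch.nn as nn

IN_DIM, FEAT_DIM = 3 * 64 * 64, 128   # assumed 64x64 RGB sample images
# Toy regressor predicting 2 key points as (x1, y1, x2, y2).
keypoint_model = nn.Sequential(
    nn.Flatten(), nn.Linear(IN_DIM, FEAT_DIM), nn.ReLU(), nn.Linear(FEAT_DIM, 4))
keypoint_position_loss = nn.MSELoss()
optimizer = torch.optim.Adam(keypoint_model.parameters(), lr=1e-3)

def keypoint_step(images, sample_keypoints):
    """images: (B, 3, 64, 64); sample_keypoints: (B, 4) user-labeled coords."""
    regressed = keypoint_model(images)        # "regression sample key points"
    loss = keypoint_position_loss(regressed, sample_keypoints)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = keypoint_step(torch.randn(8, 3, 64, 64), torch.rand(8, 4))
```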
- the key point detection model may also be trained in the ability to classify the key points of objects in an image.
- specifically: a sample key point category can be received, where the sample key point category is the category of the object to be labeled in the sample image. Then, the sample image and the sample key point category are input into the initial key point detection model, and the sample key points in the sample image are classified through the initial key point detection model to obtain the predicted sample key point category. Then, according to the predicted sample key point category, the sample key point category and the key point classification loss function, the initial key point detection model is trained to obtain the target key point detection model.
- the target key point detection model is used to determine the category of the object to be tested in the image to be tested.
- the training process of the initial key point detection model may specifically include: performing iterative training on the initial key point detection model according to the predicted sample key point category, the sample key point category and the key point classification loss function until the preset conditions are met.
- the sample key point category may include at least one of positive information and negative information of the sample object.
- the predicted key points of the object to be measured in the image to be measured can be determined through the target key point detection model.
- the image to be measured can be input into the target key point detection model to locate the points in the image to be measured, obtaining the predicted key points of the object to be measured; the predicted key points are used to determine the predicted rotation angle.
- the target key point detection model may also be used to determine the category of the object to be measured in the image to be measured, that is, the predicted key point category.
- specifically: the predicted key points can be classified by the target key point detection model to obtain the predicted key point category, and the predicted key point category is used to determine the predicted rotation angle.
- when the numbers of predicted key points and reference key points are both 2, the predicted rotation angle can be determined according to the predicted key points and the predicted key point category.
- specifically: the reference template image can be determined from the predicted key point category, where the category of the reference object in the reference template image is the same as the predicted key point category. Then, the rotation angle of the predicted key line relative to the reference key line can be determined; this rotation angle is the predicted rotation angle.
- the predicted key line connects the 2 predicted key points
- the reference key line connects the 2 reference key points
- the 2 predicted key points correspond one-to-one to the 2 reference key points.
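- Reusing the signed_angle_deg helper sketched earlier, the predicted rotation angle follows directly; the coordinates below are illustrative stand-ins for model outputs and template annotations.

```python
# 2 reference key points from the template chosen by the predicted key point
# category, and 2 predicted key points from the key point detection model.
reference_line = ((0.0, 0.0), (30.0, 0.0))
predicted_line = ((10.0, 20.0), (40.0, 50.0))
predicted_rotation = signed_angle_deg(reference_line, predicted_line)
print(predicted_rotation)  # 45.0
```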
- the posture of the three-dimensional object in the image can be predicted by using the image on the two-dimensional plane.
- the predicted rotation angle of the three-dimensional object corresponding to a feature shape may be determined according to the feature shape on the two-dimensional image. Specifically, the feature shape can be determined through the k predicted key points.
- the reference shape is then determined by the k reference key points in the reference template image corresponding to the k predicted key points.
- the three-dimensional rotation angle between the feature shape and the reference shape then reflects the posture of the object to be measured.
- k is an integer greater than or equal to 2.
- through the target model, the three-dimensional posture of the object in a two-dimensional image is predicted based on the two-dimensional image alone.
- the method does not need to construct a three-dimensional model, simplifies the process of determining the three-dimensional attitude, and saves resources such as computation and storage consumed by the device for determining the three-dimensional attitude of the object.
- a second aspect of the embodiments of the present application provides an image processing apparatus, the apparatus includes: an interaction unit and a processing unit.
- the interaction unit is used for providing the user with the sample image and the reference template image.
- the sample image includes the object to be labeled
- the reference template image includes the reference object corresponding to the object to be labeled
- the reference template image is labeled with reference key points of the reference object, and the number of reference key points is greater than or equal to 2.
- the reference object in the reference template image is used as the benchmark for measuring the rotation angle of the object, that is, the rotation angle is relative to the reference object, so the rotation angle of the reference object relative to the reference template image is 0°.
- the interaction unit is further configured to receive the information of the sample key points, and the information of the sample key points is obtained by the user marking the objects to be marked in the sample image based on the reference template image.
- the processing unit is configured to determine the sample rotation angle according to the information of the sample key points of the object to be marked in the sample image and the information of the reference key points of the reference object in the reference template image, and the sample rotation angle is the rotation of the object to be marked relative to the reference object angle.
- the image processing apparatus is used to perform the method of the aforementioned first aspect.
- a third aspect of the embodiments of the present application provides a computer program product, which, when running on a computer, enables the computer to execute the image processing method described in the first aspect.
- a fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when it runs on a computer, causes the computer to execute the image processing method described in the first aspect above.
- a fifth aspect of the embodiments of the present application provides an image processing apparatus, including a processor and a memory, and the processor and the memory are coupled.
- Memory is used to store programs.
- the processor is configured to execute the program in the memory, so that the processor executes the image processing method described in the first aspect.
- a sixth aspect of the embodiments of the present application provides a chip system, the chip system includes at least one processor and a communication interface, the communication interface and the at least one processor are interconnected through a line, and the at least one processor is used to run a computer program or instruction to perform The image processing method described in any one of the possible implementation manners of the first aspect.
- the communication interface in the chip may be an input/output interface, a pin, a circuit, or the like.
- the chip system described above in this application further includes at least one memory, where instructions are stored in the at least one memory.
- the memory may be a storage unit inside the chip, such as a register or a cache, or a storage unit outside the chip (e.g., a read-only memory, a random access memory, etc.).
- FIG. 1 is a schematic flowchart of model training.
- FIG. 2a is a schematic diagram of a system architecture provided by an embodiment of the present application.
- FIG. 2b is a schematic structural diagram of a rotating object pose detection system provided by an embodiment of the application.
- FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
- FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
- FIG. 5a is a schematic diagram of a labeling method provided by an embodiment of the present application.
- FIG. 5b is a schematic diagram of another labeling method provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of a training method for a rotating object detection model provided by an embodiment of the present application.
- FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of a training method for an angle measurement model provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
- FIG. 12 is a schematic diagram of a method for a model inference process provided by an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- FIG. 15 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- FIG. 16 is a schematic flowchart of another image processing method provided by an embodiment of the present application.
- FIG. 17 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- FIG. 18 is a schematic structural diagram of another image processing apparatus provided by an embodiment of the present application.
- the embodiment of the present application provides an image processing method for accurately labeling a sample image, so as to train a model according to the sample image, and realize accurate prediction of the pose of a rotating object through the trained model.
- the pose represents the position and pose of the object
- the pose prediction includes the prediction of the position and the prediction of the pose.
- inference refers to the prediction of a certain element of the object to be measured in the image to be measured, for example, the inference of the position of the object to be measured refers to the prediction of the position of the object to be measured.
- FIG. 1 is a schematic diagram of a model training process.
- the angle of the object to be labelled is obtained.
- the angle represents the orientation of the object to be annotated in the sample image.
- the initial model is trained to obtain the target model.
- the target model has the ability to predict the posture of the object to be tested in the image to be tested.
- the embodiments of the present application provide an image processing method and an image processing apparatus, which annotate a sample image based on a unified standard, so as to ensure the consistency and accuracy of the annotation results.
- Training the model according to the labeling result can improve the accuracy of the model's prediction of the angle of the object to be measured in the image to be measured, and can also reduce the difficulty of model training.
- the labeling of key points ensures the consistency and accuracy of labeling results.
- the embodiment of the present application provides a method for inferring the pose of an object in an image, and specifically describes the key point labeling and the application process of the labeling result.
- the method will be described in detail by taking two embodiments as examples. It is worth noting that the two implementations are only examples of the process of labeling key points and applying the labeling results. Any method for predicting the pose of an object in an image based on key point annotations falls within the scope described in the embodiments of the present application, and is not limited here.
- the rotation angle of the object is determined by determining the position frame of the object in the image.
- the embodiment of the present application provides a system architecture.
- the system architecture includes an execution device 210 , a training device 220 , a database 230 , a terminal device 240 , a data storage system 250 and a data acquisition device 260 , wherein the execution device 210 includes a computing module 211 .
- the data acquisition device 260 is used to obtain sample data and loss values generated by training, and store them in the database 230.
- the training device 220 generates the target model/rule 213 based on the sample data maintained in the database 230 and the loss values generated by training.
- the target model/rule 213 can adaptively adjust the weight parameters corresponding to the loss values, and at the same time use the advantages of parallel computing to explore the effectiveness of the weights and inherit the good network parameters and weights during training, so as to obtain the optimal model within a given training time.
- a reference template image may be stored in the database 230 .
- the training device 220 is used for generating a model, and performing iterative training on the model using the reference template images in the database 230 to obtain a target model.
- the execution device 210 determines the rotation angle of the object in the image according to the target model; the rotation angle can be sent to different devices, for example to the terminal device 240 or to the data storage system 250, which is not specifically limited here.
- the terminal device 240 and the execution device 210 may be independent devices, respectively, or may be a whole, which is not specifically limited here.
- the execution device 210 is configured with a communication interface 212 for data interaction with the terminal device 240.
- the user can obtain the reference template image and related information through the terminal device 240, and the user can input samples to the communication interface 212 through the terminal device 240.
- the execution device 210 can train the initial model according to the sample key points and categories to obtain the target model. In the model prediction stage, the user can input the image to be tested to the communication interface 212 through the terminal device 240; the execution device 210 can determine the predicted rotation angle of the object to be measured according to the image to be measured and the target model, and can send the predicted rotation angle to the terminal device 240 through the communication interface 212 to provide it to the user.
- FIG. 2a is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices and components shown in the figure does not constitute any limitation.
- the user may also be another subject other than a human, for example, an industrial robot, an intelligent system, etc., as long as it is an entity that can use the system, which is not limited here.
- FIG. 2b is a schematic structural diagram of a rotating object pose detection system according to an embodiment of the present application.
- the rotating object pose detection system provided by the embodiment of the present application includes a rotating object pose labeling module 201 and a cascaded rotating object pose detection module 202 .
- the cascade-type rotating object pose detection module 202 includes two sub-modules, namely, a cascade-type rotating object pose detection training sub-module 2021 and a cascade-type rotating object pose detection sub-module 2022 .
- the application of the rotating object pose detection system includes two stages, namely, a model training stage and a model inference stage.
- the model training stage is realized by the rotating object pose labeling module 201 and the cascade rotating object pose detection training sub-module 2021 .
- the rotated object pose labeling module 201 mainly includes but is not limited to the following functions:
- the category of the object to be labeled is also called a sample category.
- the rotating object pose labeling module 201 is also referred to as a key point-based adaptive rotating object pose labeling module.
- Keypoint-based means that the module's annotation of sample keypoints and sample annotation boxes is based on reference keypoints.
- Adaptive means that the process of determining the sample labeling frame and the sample rotation angle is automatically realized by this module, and there is no need to manually label the sample labeling frame and the sample rotation angle.
- the cascaded rotating object pose detection training sub-module 2021 is used to train the initial model to obtain the target model.
- cascaded means that two models are required to cooperate to realize pose detection. Therefore, in the model training stage, it is necessary to train these two models, namely the rotating object detection model and the angle measurement model.
- the training process of the rotating object detection model is as follows: the information of the sample annotation frame from the rotating object pose labeling module 201 and the sample image are input into the initial rotating object detection model, so as to train the model's ability to perform position regression on objects in the image and obtain a target rotating object detection model with this ability.
- the sample category from the rotating object pose labeling module 201 can also be input into the rotating object detection model to train the model's ability to determine the category of the object in the image.
- the training process of the angle measurement model is as follows: the information of the sample annotation frame, the sample rotation angle, the sample category and the sample image from the rotating object pose labeling module 201 are input into the angle measurement model, so as to train the model's ability to determine homogeneous image pairs and obtain a target angle measurement model with this ability. Objects in a homogeneous image pair have the same angle and category.
- the model inference stage is implemented by the cascade rotation object pose detection sub-module 2022 .
- “Cascade type” means that in the inference stage, it is necessary to cooperate with two-level models to realize the prediction of the rotation angle of the object to be measured.
- the two levels represent two stages, namely stage 1: the rotating object detection stage, and stage 2: the angle measurement stage.
- in the rotating object detection stage, the position of the object to be measured in the image to be measured is regressed by the target rotating object detection model to obtain a regression position frame, which represents the position of the object to be measured in the image to be measured.
- in the angle measurement stage, the rotation angle of the object to be measured is determined through the target angle measurement model and the regression position frame obtained in the rotating object detection stage.
- in the rotating object detection stage, the category of the object to be measured can also be determined, and the category is used in the angle measurement stage to determine the rotation angle of the object to be measured.
- FIG. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
- the image processing device is used to realize the labeling of the sample labeling frame and the sample rotation angle, the training of the rotating object detection model, and the training of the angle measurement model, that is, the model training stage in the embodiment shown in FIG. 2b.
- the sample image processing apparatus 300 provided in this embodiment of the present application includes a labeling module 301 , a first training module 302 and a second training module 303 .
- the labeling module 301 corresponds to the rotating object pose labeling module 201 in the embodiment shown in FIG. 2b; the first training module 302 and the second training module 303 correspond to the cascaded rotating object pose detection training sub-module 2021 in the embodiment shown in FIG. 2b.
- the labeling module 301 is used to provide the user with the reference template image and the sample image.
- the reference template image includes the image of the reference object
- the sample image includes the image of the object to be marked.
- Reference key points of the reference object are marked in the reference template image, and the number of reference key points is greater than or equal to 2.
- the labeling module 301 is further configured to receive information of sample key points, where the information of sample key points is information obtained by the user marking objects to be labelled in the sample image based on the reference template image and the reference key points. And according to the information of the sample key points and the information of the reference key points, the rotation angle of the sample is determined.
- the sample rotation angle is the rotation angle of the object to be marked in the sample image relative to the reference object.
- the labeling module 301 is further configured to label the sample image according to the sample key points marked by the user, as well as the reference key points and the reference label frame, to obtain the sample label frame.
- the sample labeling frame represents the position of the object to be labelled in the sample image
- the reference labeling frame represents the position of the reference object in the reference template image.
- the image of the reference object in the reference template image has a reference positional relationship with the reference annotation frame
- the image of the object to be annotated in the sample image has the same reference positional relationship with the sample annotation frame. Therefore, the labeling of the sample annotation frame follows a unified standard, namely the reference positional relationship.
- for the reference positional relationship, refer to the description of the embodiment shown in FIG. 5.
- the labeling module 301 is further configured to transmit the above-mentioned sample annotation frame and sample image to the first training module 302 to train the initial rotating object detection model and obtain the target rotating object detection model, so that the target rotating object detection model has the ability to perform position frame regression on objects.
- the labeling module 301 can also be used to receive the sample category that the user marks for the object to be labeled in the sample image, and transmit the sample category to the first training module 302 to train the initial rotating object detection model, obtaining a target rotating object detection model with the ability to classify objects in the image.
- the first training module 302 is used to train a rotating object detection model, so it can also be called a rotating object detection training module.
- the specific purposes of this module are as follows:
- the first training module 302 is used to perform position regression on the sample annotation frame in the sample image through the initial rotating object detection model, and to train the initial rotating object detection model according to the sample regression position frame obtained by the regression and the sample annotation frame, obtaining the target rotating object detection model.
- the target rotating object detection model is used to determine the regression position frame
- the regression position frame is used to represent the position of the object to be measured in the image to be measured.
- the first training module 302 can also be used to classify the objects to be labeled in the sample images by using the initial rotating object detection model, and to perform iterative training on the initial rotating object detection model according to the classification results and the sample categories to obtain the target rotating object detection model.
- the target rotating object detection model here is used to classify the objects in the image to determine the predicted rotation angle according to the classification result.
- the labeling module 301 can also be used to transmit the above-mentioned sample image, sample rotation angle, sample labeling frame information and sample category to the second training module 303 to train the initial angle measurement model to obtain the target angle measurement model.
- the target angle measurement model is used to determine the predicted rotation angle of the object to be measured relative to the reference object in the image to be measured.
- the second training module 303 is used to train the angle measurement model, so it can also be called an angle measurement training module.
- the specific uses of this module are as follows:
- the second training module 303 is configured to crop the sample image according to the sample annotation frame to obtain a cropped sample image, and rotate the cropped sample image according to the n first rotation angles to obtain n rotated sample images.
- the n first rotation angles are obtained according to the sample rotation angles.
- the n rotated sample images are input into the angle training gallery, and the homogeneous sample image pairs and the heterogeneous sample image pairs are determined in the angle training gallery.
- the sample images in a homogeneous sample image pair have the same category and angle
- the sample images in a heterogeneous sample image pair differ in category or angle.
- the initial angle measurement model is trained according to the homogeneous sample image pairs and the heterogeneous sample image pairs to obtain the target angle measurement model.
- the target angle measurement model is used to determine the predicted rotation angle, which is the predicted value of the rotation angle of the object to be measured relative to the reference object.
- the processing flow of the sample image processing apparatus 300 is described in detail, which is mainly divided into three stages: labeling the sample image, training the rotating object detection model, and training the angle measurement model.
- FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of the present application, and the process includes:
- the labeling module 301 acquires the sample image and the reference template image.
- the labeling module 301 may acquire a reference template image, and the reference template image includes a reference object.
- the reference template image serves as a measurement benchmark for the subsequent model training and prediction processes.
- an image includes an object, which specifically means that the image includes an image of the object.
- a reference template image includes a reference object, which means that the reference template image includes an image of the reference object.
- the object in the image means the image of the object included in the image.
- the reference template image may be acquired from a reference image library, and the reference template image may also be acquired in other ways, such as photographing and acquiring a reference object, which is not limited here.
- the reference image library includes multiple reference template images.
- the reference template images in the embodiments of the present application represent the reference template images obtained in step 401 .
- the reference object in the reference template image is used as the benchmark for measuring the rotation angle of the object, that is, the rotation angles are all angles relative to the reference object, so the rotation angle of the reference object is 0°.
- the annotation module 301 can also obtain sample images.
- the sample image includes objects to be labeled, and the objects to be labeled correspond to reference objects.
- Correspondence here means that the object to be marked and the reference object have the same category, and the orientation of the object to be marked and the orientation of the reference object may be the same or different.
- the sample image may be obtained from a sample image library, and the sample image may also be obtained in other ways, such as obtaining by photographing the object to be marked, which is not limited here.
- the labeling module 301 obtains information of reference key points.
- the labeling module 301 can also obtain information of reference key points of the reference object in the reference template image, and the information of the reference key points can be used as the identification of the reference template image and the reference object as a basis for comparison between the reference template image and the sample image.
- the information of the reference key points can also be used for other purposes, such as determining the relationship between the reference object and the object to be labeled. For example, the relationship can be the angle between the two etc., which are not limited here.
- the labeling module 301 may provide the user with the reference template image, and receive information about the reference key points marked by the user on the reference object in the reference template image.
- the annotation module 301 can also obtain the information of the reference key points in other ways: for example, the sample image processing apparatus 300 annotates the reference template image itself; or, when the reference template image is acquired from the reference image library in step 401, the reference key points may already be marked in the reference template image; this is not limited here.
- the reference key points may be two points that are farthest from each other on the image of the reference object in the reference image.
- the reference key points can also be other points, for example, two points defined by the user; or two points with obviously different characteristics from other points in the image of the reference object, etc., which are not limited here.
- the number of reference key points may be any integer greater than 2, such as 3 or 4, in addition to 2, which is not limited here.
- the embodiments of the present application only take two reference key points as an example, and do not limit the number of reference key points.
- the labeling module 301 provides the user with a sample image and a reference template image marked with reference key points.
- the annotation module 301 provides sample images and reference template images to the user. Among them, two reference key points of the reference object are marked in the reference template image, and the reference template image is used as a reference and basis for the user to mark the key points.
- the labeling module 301 receives the information of the sample key points labelled by the user.
- the user can mark the sample key points of the object to be marked in the sample image according to the position of the image of the reference object in the reference template image and the two reference key points.
- the user's labeling process for the sample key points is as follows: since the reference object and the object to be labelled have the same category, the two have similar or the same shape. That is, the image of the reference object in the reference template image and the image of the object to be annotated in the sample image have similar or the same shape.
- the reference key point is a point on the shape in the reference template image, and the user can match the position of the sample key point in the sample image according to the positional relationship between the reference key point and the shape in the reference template image. That is, the user can match the corresponding sample key points in the sample image according to the positions of the reference key points in the reference template image, so as to realize the labeling of the sample key points.
- FIG. 5a is a schematic diagram of a labeling method provided by an embodiment of the present application.
- the user can mark the corresponding sample key points K1' and K2' in the sample image according to the two reference key points K1 and K2 in the reference template image for the image of the object to be marked.
- the sample key point K1' corresponds to the reference key point K1
- the sample key point K2' corresponds to the reference key point K2.
- the object to be labeled is also called the object to be measured, and the process of the user determining the key points of the sample is also called the labeling of the training sample.
- the labeling module 301 determines the sample rotation angle according to the information of the sample key points and the information of the reference key points.
- the labeling module 301 may determine the rotation angle of the object to be labelled relative to the reference object according to the information of the reference key point and the information of the sample key point.
- the rotation angle is also referred to as the sample rotation angle.
- the sample rotation angle can be determined by key lines.
- the key lines include reference key lines, sample key lines, and prediction key lines.
- the prediction key lines will appear during the inference process and will not be explained in detail here.
- the line connecting the two reference key points is called the reference key line
- the line connecting the two sample key points is called the sample key line.
- the rotation direction of the object to be marked relative to the reference object, that is, the rotation direction of the sample key line relative to the reference key line, can also be determined from the angle between the two key lines: clockwise or counterclockwise.
- the reference key point K1 in the reference template image can be made to coincide with the sample key point K1' in the sample image to obtain the included angle a between the two key lines; the size of the included angle is a degrees and its direction is counterclockwise. Therefore, it can be determined that the object to be marked is rotated counterclockwise by a degrees relative to the reference object. The sample rotation angle is thus a degrees, and the direction is counterclockwise.
- the labeling module 301 obtains the reference labeling frame of the reference object in the reference template image, and the reference labeling frame is used to indicate the position of the reference object in the reference template image, that is, the position of the image of the reference object in the reference template image.
- the labeling module 301 provides the reference template image to the user, and receives the reference labeling frame that the user marks on the reference object in the reference template image.
- the annotation module 301 can also obtain the reference annotation frame in other ways: for example, the sample image processing apparatus 300 may annotate the reference template image itself; or, when the reference template image is obtained from the reference image library in step 401, the reference annotation frame may already be marked in it; this is not limited here.
- the reference frame may be a rectangular frame, and the reference frame has a horizontal edge.
- besides a rectangular frame, the reference frame can also have other shapes, such as a triangular frame or a trapezoidal frame, which is not limited here; and besides having a horizontal side, a side of the reference frame can also form a certain included angle with the horizontal direction, for example an included angle of 90° or 10°, which is not limited here.
- the process of the sample image processing apparatus 300 acquiring the information of the reference key points in step 402 and the process of acquiring the reference annotation frame in step 406 are also referred to as reference object annotation.
- the labeling module 301 determines the sample labeling frame according to the reference labeling frame.
- the labeling module 301 can determine the sample labeling frame in the sample image according to the reference labeling frame.
- the labeling module can determine the sample labeling frame according to the reference labeling frame, reference key point information and sample key point information, and the process is as follows:
- the reference positional relationship represents the positional relationship between the reference key point and the reference frame.
- according to the reference positional relationship and the sample key points, the labeling module 301 can place the sample labeling frame at the position that has the reference positional relationship with the sample key points, so that the same reference positional relationship exists between the sample key points and the sample annotation frame.
- for example, the reference annotation frame has vertices such as point A and point C: point A and reference key point K1 have relative positional relationship 1, and point A and reference key point K2 have relative positional relationship 2. The corresponding point A' of the sample annotation frame can then be determined according to sample key point K1', sample key point K2', relative positional relationship 1 and relative positional relationship 2.
- other points B', C', and D' of the sample annotation frame can also be determined according to other sample key points and the relative positional relationship corresponding to the sample key points, which will not be repeated here.
- key points A, B, A', B', etc. appearing in the embodiments of the present application are all examples of reference key points or sample key points, and do not limit the aforementioned key points.
- the relative positional relationship 1 and the relative positional relationship 2 belong to the relative positional relationship.
- the relative positional relationship represents the positional relationship between the reference key point and the point on the reference frame, and also reflects the positional relationship between the sample key point corresponding to the reference key point and the point on the sample frame.
- the relative positional relationship can be a vector; besides a vector, it can also be other kinds of relationships, such as a distance, an angle or coordinates in a coordinate system, which is not limited here.
- besides being determined from the positions corresponding to each sample key point, the sample annotation frame can also be determined by other methods, for example by matching the sample annotation frame according to the reference positional relationship, the sample rotation angle and the positions of the sample key points, which is not limited here.
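- as a minimal sketch of the vector form of the relative positional relationship (all coordinates and the rotation angle below are made up), a vertex of the sample annotation frame can be derived as follows:

```python
import numpy as np

K1 = np.array([40.0, 60.0])   # reference key point K1 (made-up)
A  = np.array([10.0, 20.0])   # point A on the reference annotation frame (made-up)
v1 = A - K1                   # relative positional relationship 1, as a vector

K1_prime = np.array([120.0, 85.0])  # sample key point K1' labeled by the user

# If the sample rotation angle is a degrees counterclockwise, the vector is
# rotated by a before being applied (a = 0 means no rotation).
a = np.deg2rad(30.0)                      # assumed sample rotation angle
R = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])   # counterclockwise rotation matrix
A_prime = K1_prime + R @ v1               # point A' on the sample annotation frame
print(A_prime)
```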
- steps 406 and 407 may also be performed before step 405, as long as they are performed after step 404, which is not limited here.
- the process of determining the sample rotation angle in step 405 is also referred to as adaptive attitude annotation
- the process of determining the sample annotation frame in step 407 is also referred to as adaptive position annotation; therefore, steps 405 and 407 can be collectively referred to as Annotate the adaptive pose.
- the labeling module 301 acquires the sample category.
- the labeling module 301 may also obtain the category of the object to be labeled in the sample image.
- the category of the object to be labeled is also referred to as a sample category.
- the labeling module 301 may receive the sample category labelled by the user for the object to be labelled in the sample image.
- the annotation module 301 can also obtain the sample category in other ways; for example, when the sample image is obtained from the sample image library in step 401, the sample image may already be marked with the sample category, which is not limited here.
- the labeling module 301 sends the information of the sample image and the sample labeling frame to the first training module 302.
- the labeling module 301 may send the sample image and the information of the sample labeling frame to the first training module 302 for training the rotating object detection model.
- step 409 may also be performed before step 408, as long as it is performed after step 407, which is not limited here.
- the labeling module 301 may also send the sample category to the first training module 302 for training the rotating object detection model.
- the labeling module 301 sends the information of the sample image, the sample category, the sample rotation angle and the sample labeling frame to the second training module 303.
- the annotation module 301 can send the information of the sample image, the sample category, the sample rotation angle and the sample annotation frame to the second training module 303 for training the angle measurement model.
- FIG. 5b is a schematic diagram of a labeling method provided by an embodiment of the present application.
- the user annotates the object category, key points and rectangular frame for the reference object in the reference template image.
- the object category may include positive information and negative information of the object.
- the object category may also include other information, such as material, purpose, destination and other information, which is not limited here.
- the labeling of the rectangular frame may include labeling of the vertices of the rectangular frame.
- the rectangular frame here is also referred to as a reference frame.
- the shape and position features of the reference position frame are as described in step 406 of the embodiment shown in FIG. 4 , and details are not repeated here.
- the labeling of the reference object in the reference template image can also be performed by subjects other than the user, for example by the labeling module 301, as long as the labeling module 301 can obtain the labeling result, which is not limited here.
- Stage 2: annotation of sample key points for a large number of sample images.
- the labeling result of the reference template image is used for sample labeling
- the labeling module 301 provides the reference template image and the sample image to the user, and the user selects the corresponding reference template image according to the sample image.
- This process is called template selection.
- the user determines and annotates the object category and key points of the sample object in the sample image according to the reference template image. In this embodiment, this process is also referred to as sample labeling.
- Stage 3: the sample position frame is automatically acquired.
- the labeling module 301 may determine the sample location frame according to the reference label frame and reference key points of the reference object in the reference template image, and the labelled sample key points.
- the sample location frame is also referred to as a sample labeling frame.
- for this process, refer to step 407 in the embodiment shown in FIG. 4, and details are not repeated here. In this embodiment, this process is also referred to as position box generation.
- labeling, training, and reasoning of pictures of different scales can also be implemented.
- the size ratio of the sample image relative to the reference template image can be determined according to the information of the sample key points and the information of the reference key points.
- the sample image is scaled according to this ratio to obtain a sample image at the same scale as the reference template image, and the scaled sample image is then used to determine the annotation frame, train the model, and so on (a small sketch of this scaling follows below).
- the reference template image may be scaled, and labeling or training may be performed according to the scaled reference template image, which is not limited here.
- the image to be tested can be zoomed, and the inference can be performed on the zoomed image to be tested.
- a multi-scale model can also be used to infer the image to be tested, etc., which is not limited here.
- the multi-scale model is a model trained according to the scaled sample image or the scaled reference template image.
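- a minimal sketch of this scale handling (Pillow is assumed to be available, and the key point coordinates and file path are made up):

```python
import math
from PIL import Image

# Length of the reference key line (K1 to K2) and of the sample key line
# (K1' to K2'), from made-up key point coordinates.
ref_len    = math.hypot(100 - 0, 0 - 0)
sample_len = math.hypot(60 - 10, 10 - 10)
scale = ref_len / sample_len   # size ratio bringing the sample to template scale

img = Image.open("sample.jpg")  # hypothetical sample image path
resized = img.resize((round(img.width * scale), round(img.height * scale)))
```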
- the first training module 302 can train the rotating object detection model according to the received sample image and the information of the sample annotation frame. Next, the training phase of the rotating object detection model is described.
- FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of the present application. Based on the image processing apparatus shown in FIG. 3, an image processing method provided by an embodiment of the present application includes:
- the first training module 302 receives the information of the sample image and the sample labeling frame from the labeling module 301.
- the first training module 302 performs position regression on the object to be marked in the sample image to obtain a sample regression position frame.
- FIG. 7 is a schematic diagram of a training method of a rotating object detection model provided by an embodiment of the present application.
- the first training module 302 trains the rotating object detection model according to the information of the sample regression position frame, the information of the sample label frame and the position regression loss function.
- the first training module 302 can iteratively train the initial rotating object detection model according to the information of the sample regression position frame, the information of the sample label frame and the position regression loss function, until the preset conditions are met, and the target rotating object detection model is obtained.
- the error value between the sample regression position frame and the sample label frame can be determined by the position regression loss function, and the preset condition can be that the error value is smaller than a certain threshold.
- the error value may be the position regression loss Lreg. It is worth noting that Lreg is the symbol of position regression loss, which is only an example of position regression loss, and does not limit the position regression loss.
- other information can also be determined through the position regression loss function, and the conditions corresponding to the information can be used as preset conditions, for example, the number of times of iterative training of the rotating object detection model reaches a certain threshold, etc., which is not limited here.
- in this way, the target rotating object detection model is trained with the ability to output a regression position frame for objects in an image.
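- a minimal sketch of this iterative training (the stand-in model, box encoding, loss choice and preset conditions below are assumptions, not the embodiment's network):

```python
import torch
import torch.nn as nn

# Toy stand-in for the rotating object detection model: maps an image to five
# box parameters (cx, cy, w, h, angle); the real model is not specified here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
reg_loss_fn = nn.SmoothL1Loss()      # one possible position regression loss Lreg

images   = torch.randn(8, 3, 64, 64)  # stand-in sample images
gt_boxes = torch.randn(8, 5)          # stand-in sample labeling frames

threshold, max_iters = 0.05, 1000     # assumed preset conditions
for step in range(max_iters):
    pred_boxes = model(images)        # sample regression position frames
    l_reg = reg_loss_fn(pred_boxes, gt_boxes)
    optimizer.zero_grad()
    l_reg.backward()
    optimizer.step()
    if l_reg.item() < threshold:      # error value smaller than the threshold
        break
```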
- the ability of the target rotating object detection model to classify objects in the image can also be trained.
- the specific training process is as follows:
- the first training module 302 receives the sample category from the labeling module 301.
- the labeling module 301 may transmit the sample category to the first training module 302.
- step 604 may also be performed before steps 601, 602 or 603, which is not limited here.
- the first training module 302 classifies the object to be labeled to obtain a predicted sample category.
- the first training module 302 can input the sample image into the initial rotating object detection model, classify the objects to be marked in the sample image through the initial rotating object detection model, and obtain and output the predicted sample category.
- the first training module 302 trains the rotating object detection model according to the predicted sample category, the sample category and the classification loss function.
- the first training module 302 can iteratively train the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function, until the preset conditions are met, and the target rotating object detection model is obtained.
- the error value between the predicted sample category and the sample category can be determined through the classification loss function, and the preset condition can be that the error value is smaller than a certain threshold. As shown in FIG. 7, this error value can be the classification loss Lcls. It is worth noting that Lcls is merely the symbol for the classification loss, given as an example, and does not limit the classification loss.
- other information can also be determined through the classification loss function, and the conditions corresponding to that information can be used as the preset conditions, for example, that the number of iterative training rounds of the rotating object detection model reaches a certain threshold, which is not limited here.
- the ability of the target rotating object detection model to classify objects in the image can be trained.
- steps 605 and 606 can also be performed before steps 601 , 602 or 603 , as long as they are performed after step 604 .
- steps 604-606 are optional.
- the ability of the rotating object detection model to classify objects in the image may not be trained. That is, steps 604 , 605 and 606 may not be included in the method shown in FIG. 6 .
- the target rotating object detection model trained by the embodiment shown in FIG. 6 can be used to determine the regression position frame of the object in the image.
- the regression position frame is determined by matching the reference position relationship according to the reference template image. That is, for objects to be labeled of the same category, the regression position frame determined by the target rotating object detection model has a unified standard, which is the reference position relationship. Therefore, subsequent operations based on the regression position frame are based on the unified standard of the reference position relationship. Since the reference position relationship is determined according to the reference object in the reference template image, in the final analysis, subsequent operations based on the regression position frame are all based on the unified standard of the reference object. Therefore, the rotation angle predicted according to the regression position frame is also determined based on the unified benchmark of the reference object, and the predicted result is more accurate.
- the second training module 303 can train the angle measurement model according to the received sample images and other information, and the training phase of the angle measurement model is described next.
- FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application. Based on the image processing apparatus shown in FIG. 3, an image processing method provided by an embodiment of the present application includes:
- the second training module 303 receives the information of the sample image, the sample category, the sample rotation angle and the sample annotation frame from the labeling module 301.
- the second training module 303 intercepts the sample image according to the sample labeling frame.
- the second training module 303 can intercept the image inside the sample labeling frame in the sample image according to the position of the sample labeling frame in the sample image to obtain the intercepted sample image.
- the process of intercepting a sample image is also referred to as matting.
- the second training module 303 determines four first rotation angles according to the rotation angles of the samples.
- the second training module 303 can determine four first rotation angles according to the sample rotation angles.
- FIG. 9 is a schematic diagram of a training method of an angle measurement model provided by an embodiment of the present application. As shown in FIG. 9 , if the rotation angle of the sample is a degree, the first rotation angle can be determined to be -a degree, -a-90 degree, -a-180 degree and -a-270 degree.
- the number of first rotation angles is n; n may also be an integer other than 4, such as 5 or 8, as long as n is greater than or equal to 2, which is not limited here.
- the first rotation angle may differ from the sample rotation angle by a certain amount; for example, the difference is the product of x and 90°, where x is any integer from 0 to n-1.
- the magnitude of the difference may also follow other rules, such as the product of y and a certain angle, where y is any integer, or the difference is any angle, which is not limited here.
- step 803 may also be performed before step 802, as long as it is performed after step 801, which is not limited here.
- the second training module 303 rotates the intercepted sample image to obtain 4 rotated sample images.
- the second training module 303 rotates the intercepted sample image according to the four first rotation angles to obtain four rotated sample images. For example, as shown in FIG. 9, when the first rotation angles are -a degrees, -a-90 degrees, -a-180 degrees and -a-270 degrees, the four rotated sample images obtained by rotation all have horizontal borders. In the embodiment of the present application, corresponding to the aforementioned first rotation angles, the number of rotated sample images is also n, which is not repeated here.
- step 803 can also be implemented by modules other than the second training module 303, for example by the labeling module 301 or other modules, as long as the second training module 303 can acquire the n first rotation angles, which is not limited here.
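- a minimal sketch of steps 802 to 804 (Pillow is assumed; the crop box is simplified to an axis-aligned rectangle, and the angle and path are made up):

```python
from PIL import Image

a = 30.0                                          # assumed sample rotation angle
n = 4
first_angles = [-a - 90.0 * x for x in range(n)]  # -a, -a-90, -a-180, -a-270

img  = Image.open("sample.jpg")       # hypothetical sample image path
crop = img.crop((50, 40, 210, 160))   # stand-in for the sample labeling frame
# Image.rotate is counterclockwise for positive angles; expand=True keeps the
# whole rotated crop inside the output image.
rotated_samples = [crop.rotate(angle, expand=True) for angle in first_angles]
```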
- the second training module 303 inputs the four rotated sample images into the angle training gallery.
- the second training module 303 may input the four rotated sample images into the angle training gallery.
- the angle training gallery may also include rotated sample images obtained by rotating other sample images.
- the objects to be labeled in the other sample images and the object to be labeled in the sample image of this embodiment may have the same category or different categories, and may have the same orientation or different orientations, which is not limited here.
- the illustrated angle training gallery is also called the four-angle training gallery.
- when n takes other values, the name of the gallery can also change accordingly, which is not limited here.
- the second training module 303 determines the same sample image pair and the heterogeneous sample image pair in the angle training gallery.
- the second training module 303 may determine the same sample image pair and the heterogeneous sample image pair in the angle training gallery according to the category of the sample.
- the objects in the same sample image pair have the same category and the same angle; the objects in the heterogeneous sample image pair have different categories or different angles.
- the second training module 303 trains the initial angle measurement model according to the same sample image pair and the heterogeneous sample image pair to obtain the target angle measurement model.
- the second training module 303 inputs the same sample image pairs and the heterogeneous sample image pairs into the initial angle measurement model, encodes them through the initial angle measurement model, and obtains the distance D_same between the encoding results of the images in a same sample image pair and the distance D_diff between the encoding results of the images in a heterogeneous sample image pair; the training goal is to make D_same smaller than D_diff.
- the dotted box shown in FIG. 9 includes a plurality of graphs, and the plurality of graphs represent the encoding results of the images in the plurality of image pairs.
- the shape of the graph represents the category, and the line thickness of the graph represents the angle.
- the distance between the encoding results of images of the same category and the same angle is smaller than the distance between the encoding results of images of different categories or different angles.
- the second training module 303 performs iterative training on the initial angle measurement model according to D_same, D_diff and the distance loss function until the preset conditions are met, and the target angle measurement model is obtained.
- the second training module 303 can use the distance loss function to determine the error value of the encoding result of the same sample image pair and the heterogeneous sample image pair, and the preset condition can be that the error value is smaller than a certain threshold.
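- as a minimal sketch of such a distance loss (the margin value and encoding shapes are assumptions), a hinge-style loss that pushes D_same below D_diff can be written as:

```python
import torch

def distance_loss(same_a, same_b, diff_a, diff_b, margin=1.0):
    """Zero once D_same is smaller than D_diff by at least `margin`."""
    d_same = torch.norm(same_a - same_b, dim=-1)   # D_same
    d_diff = torch.norm(diff_a - diff_b, dim=-1)   # D_diff
    return torch.clamp(d_same - d_diff + margin, min=0.0).mean()

# Stand-in encodings produced by the angle measurement model (made-up shapes).
e1, e2 = torch.randn(16, 128), torch.randn(16, 128)  # a same sample image pair
e3, e4 = torch.randn(16, 128), torch.randn(16, 128)  # a heterogeneous pair
loss = distance_loss(e1, e2, e3, e4)
```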
- the encoding result is also referred to as a feature obtained by encoding, or referred to as a feature.
- other information can also be determined through the distance loss function, and the condition corresponding to that information can be used as a preset condition; for example, the preset condition can also be that the number of iterative training rounds of the angle measurement model reaches a certain threshold, which is not limited here.
- after training, the features obtained by encoding images of the same category and the same angle are close to each other, while the features obtained by encoding images of different angles or different categories are far apart; therefore, whether two images belong to the same category and the same angle, that is, whether they form a same image pair, can be determined by comparing the distance between their two features.
- the ability of the target angle metric model to determine the same image pair can be trained. Objects in homogeneous image pairs have the same class and angle.
- the processing of the image by the target angle metric model does not depend on the category, and its generalization ability is stronger than that of the traditional classification model.
- the target rotating object detection model and the target angle measurement model trained in the embodiments shown in FIGS. 4, 6 and 8 can be used to infer the position and posture of objects in an image.
- next, the image processing device that implements this inference function is described.
- FIG. 10 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
- a target rotating object detection model and a target angle measurement model are used in the device.
- the image processing apparatus 1000 provided in this embodiment of the present application includes a rotating object detection module 1001 and an angle measurement module 1002 .
- the rotating object detection module 1001 is used to determine the regression position frame of the object to be measured in the image to be measured by using the target rotating object detection model, and transmit the regression position frame to the angle measurement module 1002 .
- the angle measurement module 1002 is used to intercept the image to be measured according to the regression position frame from the rotating object detection module 1001 to obtain the intercepted image, determine m second rotation angles according to the regression position frame, and rotate the intercepted image according to the m second rotation angles to obtain m rotated images.
- the target image among the m rotated images is then determined through the target angle measurement model.
- the target image and the reference template image form a same-type image pair.
- from the target image, the predicted rotation angle of the object to be measured relative to the reference object is determined.
- m is an integer greater than or equal to 2.
- the rotating object detection module 1001 is further configured to determine the type of the object to be measured in the image to be measured by using the target rotating object detection model, and transmit the type of the object to be measured to the angle measurement module 1002 .
- the angle measurement module 1002 is further configured to determine a reference template image of the same category as the object to be detected according to the category of the object to be detected from the rotating object detection module 1001, and to determine a pair of images of the same type according to the reference template image.
- the processing flow of the image to be measured by the image processing apparatus 1000 is described in detail, that is, the inference process of the pose of the object to be measured in the image to be measured.
- FIG. 11 is a schematic flowchart of an image processing method provided by an embodiment of the present application. Based on the image processing apparatus shown in FIG. 10, an image processing method provided by an embodiment of the present application includes:
- the rotating object detection module 1001 acquires an image to be measured.
- the rotating object detection module 1001 performs position regression on the object to be measured in the image to be measured to obtain a regression position frame.
- the rotating object detection module 1001 can perform position regression on the image to be measured through the target rotating object detection model to obtain a regression position frame, where the regression position frame represents the position of the object to be measured in the image to be measured.
- the rotating object detection module 1001 transmits the information of the image to be measured and the regression position frame to the angle measurement module 1002 .
- the angle measurement module 1002 determines the frame rotation angle according to the information of the regression position frame.
- the angle measurement module 1002 can determine the angle between an edge of the regression position frame and the horizontal edge.
- this angle is called the frame rotation angle.
- the frame rotation angle is the angle between any side of the regression position frame and the horizontal side or the vertical side.
- the edge of the regression position frame used to determine the rotation angle of the frame may be randomly selected or determined by other methods, such as selecting the longer side of the regression position frame, etc., which is not limited here.
- the horizontal side refers to the side in the horizontal direction
- the vertical side refers to the side in the vertical direction.
- Both the horizontal edge and the vertical edge belong to the reference edge, and the reference edge is used to measure the frame rotation angle.
- the reference edge may also have other orientation features, such as forming a certain angle with the horizontal edge, which is not limited here.
- the angle measurement module 1002 determines four second rotation angles according to the frame rotation angle.
- the angle measurement module 1002 may determine four second rotation angles according to the frame rotation angle.
- FIG. 12 is a schematic diagram of a method of a model reasoning process provided by an embodiment of the present application.
- the frame rotation angle is a degree
- it can be determined that the second rotation angle is -a degree, -a-90 degree, -a-180 degree and -a-270 degree.
- the number of second rotation angles is m; m may also be an integer other than 4, such as 5 or 8, as long as m is greater than or equal to 2, which is not limited here.
- the second rotation angle may differ from the frame rotation angle by a certain amount; for example, the difference is the product of x and 90°, where x is any integer from 0 to m-1.
- the magnitude of the difference may also follow other rules, such as the product of y and a certain angle, where y is any integer, or the difference is any angle, which is not limited here.
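- a minimal sketch of steps 1104 and 1105 (the corner coordinates of the regression position frame are made up):

```python
import math

# Two adjacent corners of the regression position frame (made-up coordinates).
p, q = (100.0, 80.0), (180.0, 120.0)
frame_angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 90.0
# frame_angle lies in [0, 90), measured against the horizontal reference edge.

m = 4
second_angles = [-frame_angle - 90.0 * x for x in range(m)]
print(frame_angle, second_angles)
```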
- the action of determining the frame rotation angle in step 1104, or the action of determining the m second rotation angles in step 1105, can also be realized by modules other than the angle measurement module 1002, for example by the rotating object detection module 1001 or other modules, as long as the angle measurement module 1002 can obtain the m second rotation angles, which is not limited here.
- the angle measurement module 1002 intercepts the image to be measured according to the regression position frame to obtain the intercepted image.
- the angle measurement module 1002 can intercept the image inside the regression position frame in the image to be measured according to the position of the regression position frame in the image to be measured to obtain the intercepted image.
- step 1106 may also be performed before step 1104 or step 1105, as long as it is performed after step 1103, which is not limited here.
- the angle measurement module 1002 rotates the intercepted images according to the four second rotation angles to obtain four rotated images.
- the angle measurement module 1002 rotates the intercepted image according to the four second rotation angles to obtain four rotated images.
- the second rotation angle is -a degree, -a-90 degree, -a-180 degree and -a-270 degree
- the four rotated images obtained by rotation all have horizontal borders.
- the number of rotated images is also m, which is not repeated here.
- the angle measurement module 1002 determines the target image in the four rotated images through the target angle measurement model.
- the angle measurement module 1002 can construct an image pair by using each of the 4 rotated images and the images in the template image library. Specifically, one image in the image pair is any one of the four rotated images; the other image in the image pair is an image in the template image library.
- the template image library includes multiple reference template images, and objects in the multiple reference templates may have different categories.
- the target angle measurement model has been trained to determine the ability of image pairs of the same type. Therefore, similar image pairs can be determined from the constructed image pairs through the target angle measurement model. Since the two objects included in the two images in the same image pair have the same category and angle, the images of the same category and the same angle as the reference template image can be determined in the four rotated images. In the embodiment of the present application, the two images are of the same category and the same angle, indicating that the two objects included in the two images have the same category and angle. The image with the same class and angle as the reference template image is called the target image.
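- a minimal sketch of this selection (the encoder below is a stand-in for the target angle measurement model, and all shapes are made up):

```python
import torch

def encode(images):             # stand-in for the target angle measurement model
    return images.flatten(1)    # a trained model would output learned features

rotated  = torch.randn(4, 3, 64, 64)  # the m = 4 rotated images
template = torch.randn(1, 3, 64, 64)  # the selected reference template image

dists = torch.norm(encode(rotated) - encode(template), dim=-1)
target_idx = int(torch.argmin(dists))  # rotated image closest to the template
# The second rotation angle at target_idx is then the predicted rotation angle.
```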
- the angle measurement module 1002 determines the predicted rotation angle according to the target image.
- after the angle measurement module 1002 determines the target image among the four rotated images, it can determine the angle by which the intercepted image was rotated to obtain the target image, that is, the second rotation angle corresponding to the target image; this rotation angle is the predicted rotation angle.
- when the embodiment shown in FIG. 4 includes step 408 and the embodiment shown in FIG. 6 includes steps 604 to 606, that is, when the rotating object detection model has been trained with the ability to classify objects in an image, this embodiment may also determine the predicted rotation angle according to the category of the object to be measured, which specifically includes the following steps:
- the rotating object detection module 1001 performs category prediction on the object to be tested in the image to be tested to obtain the predicted category.
- the target rotating object detection model has the ability to classify objects in the image.
- the rotating object detection module 1001 can use the target rotating object detection model to classify the object to be measured in the image to be measured and determine the category of the object to be measured.
- the category of the object to be measured is also referred to as a predicted category.
- the rotating object detection module 1001 transmits the predicted category to the angle measurement module 1002.
- the angle measurement module 1002 determines a reference template image according to the predicted category.
- the angle measurement module 1002 may determine a reference template image including a reference object of the same category as the object to be measured from the template image library according to the predicted category.
- Steps 1110 to 1112 may be performed before any of steps 1102 to 1107, as long as they are performed after step 1101, which is not limited here.
- the determination of the target image in step 1108 can be performed in the following ways:
- each of the four rotated images is paired with the reference template image determined in step 1112 to construct an image pair.
- the image pairs of the same type are then determined among the constructed image pairs.
- since the reference template image is determined according to the category of the object to be measured, the one to be used can be selected from the multiple reference template images in the template image library; this greatly reduces the number of image pairs constructed by the angle measurement module 1002, thereby reducing the consumption of computing resources and storage resources. With fewer image pairs, the time and computing power required by the target angle measurement model to determine same-type image pairs are also greatly reduced, thereby improving efficiency.
- step 1108 can also realize the determination of the target image.
- the embodiments of the present application further provide an image processing method and an image processing apparatus, which are used to label sample key points of objects to be labeled in a sample image; the sample key points are used to train a key point detection model, so that the target key point detection model obtained through training determines key points in an image, and the rotation angle of an object in the image is determined according to the determined key points.
- FIG. 13 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
- the image processing apparatus 1300 provided in this embodiment of the present application includes a key point labeling module 1301 and a third training module 1302 .
- the key point labeling module 1301 is configured to provide a user with a reference template image and a sample image, where the reference template image includes the reference object, and the sample image includes the object to be labelled. Reference key points of the reference object are marked in the reference template image, and the number of reference key points is greater than or equal to 2.
- an image includes an object, which means that the image includes an image of the object.
- the reference template image includes a reference object, which means that the reference template image includes an image of the reference object.
- the object in the image represents the image of the object contained in the image.
- the key point labeling module 1301 is further configured to receive the information of the sample key points that the user labels on the object to be labeled in the sample image based on the reference template image and the reference key points, and to determine the sample rotation angle according to the information of the sample key points and the information of the reference key points.
- the sample rotation angle is the rotation angle of the object to be marked in the sample image relative to the reference object.
- the key point labeling module 1301 is further configured to receive the sample key point category that the user marks for the object to be labeled in the sample image, and to transmit the above-mentioned sample image, the information of the sample key points and the sample key point category to the third training module 1302 to train the initial key point detection model and obtain the target key point detection model.
- the category of the sample key point represents the category of the sample object corresponding to the sample key point in the sample image including the key point.
- the third training module 1302 is configured to train the initial key point detection model according to the information of the sample key points and the sample image, and obtain the target key point detection model. Therefore, in this embodiment of the present application, the third training module 1302 may also be referred to as a keypoint detection training module.
- FIG. 14 is a schematic flowchart of an image processing method provided by an embodiment of the present application. Based on the image processing apparatus shown in FIG. 13, an image processing method provided by an embodiment of the present application includes:
- the key point labeling module 1301 acquires a sample image and a reference template image.
- the key point labeling module 1301 obtains information of reference key points.
- the key point labeling module 1301 provides the user with a sample image and a reference template image marked with reference key points.
- the key point labeling module 1301 receives the information of the sample key points labelled by the user.
- Steps 1401 to 1404 refer to steps 401 to 404 in the embodiment in FIG. 4 , the difference is that the execution body of the action is changed from the annotation module 301 in the example of FIG. 4 to the key point annotation module 1301 in the embodiment in FIG. 14 . It will not be repeated here.
- the key point labeling module 1301 transmits the information of the sample image and the sample key points to the third training module 1302 .
- the third training module 1302 performs key point detection on the object to be marked in the sample image to obtain the predicted sample key point.
- after the third training module 1302 obtains the sample image and the information of the sample key points, it can perform position regression on the points of the sample object in the sample image through the initial key point detection model to obtain the regression sample key points.
- the keypoint labeling module 1301 acquires the sample keypoint category.
- the keypoint labeling module 1301 can also obtain the category of sample keypoints in the sample image.
- the category of the sample key point is also called the sample key point category, which represents the category of the sample object corresponding to the sample key point in the sample image containing that key point.
- the sample key point category can be obtained from the user marking the sample key points in the sample image, or in other ways, for example from the sample image library when the sample image is obtained in step 1401, which is not limited here.
- the keypoint labeling module 1301 transmits the sample keypoint category to the third training module 1302 .
- steps 1407 and 1408 may be performed before any of steps 1402 to 1406, as long as they are performed after step 1401, which is not limited here.
- the third training module 1302 classifies the sample key points to obtain the predicted sample key point category.
- the third training module 1302 inputs the sample image and the sample keypoint category into the initial keypoint detection model, performs category prediction on the sample keypoints in the sample image through the initial keypoint detection model, and obtains and outputs the predicted sample keypoint category.
- step 1409 may be performed before any of steps 1405 to 1408, as long as it is performed after step 1404, which is not limited here.
- the third training module 1302 trains the initial key point detection model to obtain the target key point detection model.
- the third training module 1302 can iteratively train the initial keypoint detection model according to the information of the regression sample keypoints, the information of the sample keypoints, and the keypoint position regression loss function.
- the third training module 1302 can also perform iterative training on the initial keypoint detection model according to the predicted sample keypoint category, the sample keypoint category and the keypoint classification loss function.
- the target key point detection model can accurately determine the key points of objects in the image, and determine the category of the object corresponding to the key points.
- the target key point detection model trained by the embodiment shown in FIG. 14 is specifically used to infer the posture of the objects in the image. Next, the image processing apparatus that realizes the inference function will be described.
- FIG. 15 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
- the target key point detection model obtained by the training process shown in FIG. 14 is applied in this device.
- the image processing apparatus 1500 provided by this embodiment of the present application includes a key point detection module 1501 and an angle calculation module 1502.
- the key point detection module 1501 is used to determine the predicted key points of the object to be measured in the image to be measured through the target key point detection model, and transmit the information of the predicted key points to the angle calculation module 1502 .
- the angle calculation module 1502 is configured to determine the rotation angle of the object to be measured relative to the reference object according to the predicted key point information from the key point detection module 1501 . In this embodiment of the present application, this angle is also referred to as a predicted rotation angle.
- the keypoint detection module 1501 is further configured to determine the predicted keypoint category through the target keypoint detection model, and transmit the predicted keypoint category to the angle calculation module 1502 .
- the angle calculation module 1502 is further configured to, according to the predicted keypoint category from the keypoint detection module 1501, determine a reference template image of the same category as the object to be detected, and determine the predicted rotation angle according to the reference template image.
- the predicted key point category represents the category of the object to be tested corresponding to the predicted key point in the image to be tested including the predicted key point.
- FIG. 16 is a schematic flowchart of an image processing method provided by an embodiment of the present application. Based on the image processing apparatus shown in FIG. 15, an image processing method provided by an embodiment of the present application includes:
- the key point detection module 1501 acquires the image to be tested.
- the key point detection module 1501 determines the predicted key points in the image to be tested.
- the key point detection module 1501 can perform position regression on the point of the object to be measured in the image to be measured through the target key point detection model to obtain the predicted key point.
- the key point detection module 1501 transmits the information of the predicted key point to the angle calculation module 1502.
- the keypoint detection module 1501 determines the predicted keypoint category.
- the target key point detection model has the ability to classify key points.
- after the predicted key point is determined, the key point detection module 1501 can determine the predicted key point category through the target key point detection model.
- the predicted keypoint category represents the category of the object to be detected in the image to be detected.
- the keypoint detection module 1501 transmits the predicted keypoint category to the angle calculation module 1502.
- Step 1604 and step 1605 may be performed simultaneously with step 1602, or may be performed before or after step 1603, which is not limited here.
- the angle calculation module 1502 determines the predicted rotation angle.
- if the predicted key point category is obtained in step 1604, the angle calculation module 1502 can determine, according to the predicted key point category, the reference key points having this category, or determine the reference object having this category or the reference template image containing this reference object.
- the angle calculation module 1502 determines the angle between the predicted key line and the reference key line.
- the reference key line here is the line connecting the reference key points determined according to the predicted key points.
- the predicted keypoint and the reference keypoint have the same category, indicating that the object to be tested has the same category as the reference object.
- the direction of the predicted key line is compared with the direction of the reference key line, and the angle between the predicted key line and the reference key line is obtained.
- the angle reflects the rotation angle of the object to be measured in the image to be measured relative to the reference object. In this embodiment of the present application, this angle is also referred to as the predicted rotation angle.
- since the predicted rotation angle of the object to be measured relative to the reference object in the image to be measured is determined from the predicted key points, without determining a position frame for the object to be measured, the process of determining the predicted rotation angle is more concise and the structure of the device implementing it is simpler; at the same time, the computing and storage resources required for determining and operating on a position frame are saved.
- the posture prediction of the three-dimensional object can also be realized by using a two-dimensional image.
- a two-dimensional image reflects the projection of a three-dimensional object on a two-dimensional plane, so it can truly reflect the posture of a three-dimensional object.
- the feature shape can be determined by the information of the key points in the two-dimensional image.
- the reference shape is determined according to the information of the reference key points corresponding to those key points; according to the shape difference between the feature shape and the reference shape, the rotation angle of the three-dimensional object presented by the two-dimensional image relative to the reference object is determined, so that the posture of the three-dimensional object presented by the two-dimensional image can be determined.
- the reference key point is the point corresponding to the key point in the projection of the reference object on the two-dimensional plane.
- the model can be trained with the ability to determine, from the shape difference between corresponding two-dimensional images, the rotation angle between the three-dimensional objects that those images present, so that the target model obtained by training has the function of determining a three-dimensional rotation angle from two-dimensional images.
- the feature shape corresponds to the reference shape.
- the feature shape consists of n sample key points or n prediction key points, and the reference shape also consists of n reference key points, where n is an integer greater than or equal to 2.
- FIG. 17 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application.
- the sample image processing apparatus includes an interaction unit 1701 and a processing unit 1702.
- the interaction unit 1701 is used to provide the user with a sample image and a reference template image, where the sample image includes the object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, the rotation angle of the reference object relative to the reference template image is zero, reference key points of the reference object are marked in the reference template image, and the number of reference key points is greater than or equal to 2.
- the interaction unit 1701 is further configured to receive the information of the sample key points, where the information of the sample key points is obtained by the user marking the object to be marked in the sample image based on the reference template image.
- the processing unit 1702 is configured to determine the sample rotation angle according to the information of the sample key points of the object to be labeled in the sample image and the information of the reference key points of the reference object in the reference template image, where the sample rotation angle is the rotation angle of the object to be labeled relative to the reference object.
- the number of reference key points and sample key points are both 2
- the processing unit 1702 is specifically configured to: determine the sample rotation angle according to the angle between the reference key line and the sample key line,
- the reference key line is the line connecting two reference key points
- the sample key line is the line connecting two sample key points.
- the image processing apparatus further includes an acquisition unit 1703, and the acquisition unit 1703 is configured to: acquire a reference frame of the reference template image, where the reference frame represents the position of the reference object in the reference template image.
- the processing unit 1702 is further configured to: determine a sample labeling frame according to the reference labeling frame, where the sample labeling frame represents the position of the object to be labelled in the sample image.
- the processing unit 1702 is specifically configured to: determine the sample annotation frame according to the reference annotation frame, the information of the reference key points and the information of the sample key points, where there is a reference positional relationship between the reference key points and the reference annotation frame, and the same reference positional relationship exists between the sample key points and the sample annotation frame.
- the processing unit 1702 is further configured to: input the sample image and the information of the sample annotation frame into the initial rotating object detection model, so as to perform position regression on the object to be labeled through the initial rotating object detection model to obtain the information of the sample regression position frame; and iteratively train the initial rotating object detection model until the preset conditions are met to obtain the target rotating object detection model, where the target rotating object detection model is used to determine the position of the object to be measured in the image to be measured.
- the interaction unit 1701 is further configured to: receive a sample category, where the sample category is a category marked by the user on the object to be marked in the sample image.
- the processing unit 1702 is further configured to: input the sample category into the initial rotating object detection model, so as to classify the object to be marked by the initial rotating object detection model to obtain the predicted sample category. According to the predicted sample category, the sample category and the classification loss function, the initial rotating object detection model is iteratively trained until the preset conditions are met, and the target rotating object detection model is obtained.
- the predicted category includes at least one item of positive information and negative information of the object to be detected.
- the processing unit 1702 is further configured to: intercept the sample image according to the sample annotation frame to obtain the intercepted sample image; rotate the intercepted sample image according to n first rotation angles to obtain n rotated sample images, where the n first rotation angles are obtained according to the sample rotation angle, the n first rotation angles correspond one-to-one to the n rotated sample images, and n is an integer greater than or equal to 2; feed the n rotated sample images into the angle training gallery; determine same sample image pairs and heterogeneous sample image pairs in the angle training gallery, where the objects in a same sample image pair have the same angle and category, and the objects in a heterogeneous sample image pair have different angles or categories; and train the initial angle measurement model according to the same sample image pairs and the heterogeneous sample image pairs to obtain the target angle measurement model.
- the interaction unit 1701 is further configured to: receive a sample category, where the sample category is a category marked by the user on the object to be marked in the sample image.
- the processing unit 1702 is specifically configured to: determine the same sample image pair and the heterogeneous sample image pair in the angle training gallery according to the sample category.
- the processing unit 1702 is further configured to: input the image to be measured into the target rotating object detection model, so as to perform position regression on the object to be measured in the image to be measured through the target rotating object detection model to obtain the regression position frame of the object to be measured, where the regression position frame represents the position of the object to be measured in the image to be measured, and the regression position frame is used to determine the predicted rotation angle.
- the processing unit 1702 is further configured to: input the image to be measured into the target rotating object detection model, so as to perform position regression on the object to be measured in the image to be measured through the target rotating object detection model to obtain the regression position frame of the object to be measured, where the regression position frame represents the position of the object to be measured in the image to be measured.
- the image to be tested is intercepted according to the regression position frame, and the intercepted image is obtained. Determine m second rotation angles according to the regression position frame, where m is an integer greater than or equal to 2.
- the intercepted images are rotated according to the m second rotation angles to obtain m rotated images, and the m second rotation angles are in one-to-one correspondence with the m rotated images.
- through the target angle measurement model, the target image among the m rotated images is determined, where the object in the target image has the same category and angle as the reference object in the reference template image.
- the predicted rotation angle corresponding to the target image is determined among the m second rotation angles.
- the processing unit 1702 is specifically configured to: determine a frame rotation angle according to the regression position frame, where the frame rotation angle is the rotation angle of the regression position frame relative to a horizontal frame having horizontal edges and is greater than or equal to 0° and less than or equal to 90°; and determine the m second rotation angles according to the frame rotation angle.
- the processing unit 1702 is specifically configured to: construct image pairs from each of the m rotated images and the images in the template image library; determine, through the target angle measurement model, the same-class image pairs among these image pairs, where the objects in a same-class image pair have the same angle and category; and determine the target image from the same-class image pairs, the target image being one of the m rotated images.
- the interaction unit 1701 is further configured to: receive a sample category, where the sample category is the category labeled by the user for the object to be labeled in the sample image;
- the processing unit 1702 is further configured to: input the sample category into the initial rotating object detection model, so that the model classifies the object to be labeled and outputs a predicted sample category; iteratively train the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until a preset condition is met, obtaining the target rotating object detection model; predict the category of the object to be measured through the target rotating object detection model to obtain a predicted category; determine the reference template image according to the predicted category, the reference object in the reference template image having that predicted category; and construct an image pair from each of the m rotated images and the reference template image, the reference template image being contained in the reference template library.
- the sample image processing apparatus shown in FIG. 17 is used to execute the methods in the foregoing embodiments shown in FIGS. 4 to 12. The processing unit 1702 executes those actions of the labeling module 301, the first training module 302, the second training module 303, the rotating object detection module 1001 and the angle measurement module 1002 that do not require interaction with the user;
- the interaction unit 1701 executes those actions of the labeling module 301 in the foregoing methods that require interaction with the user;
- the obtaining unit 1703 executes the actions of obtaining information related to the reference template image, such as the reference annotation frame or the reference keypoints, in the methods of the foregoing embodiments.
- the obtaining unit 1703 may exist independently of the interaction unit 1701, or may be a part of the interaction unit 1701, which is not limited here.
- FIG. 18 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application.
- the image processing apparatus 1800 may include one or more central processing units (CPUs) 1801 and a memory 1805.
- the memory 1805 stores one or more application programs or data.
- the memory 1805 may be volatile memory or persistent storage.
- the program stored in the memory 1805 may include one or more modules, and each module may include a series of instructions to operate on the image processing apparatus.
- the central processing unit 1801 may be configured to communicate with the memory 1805 to execute a series of instruction operations in the memory 1805 on the image processing apparatus 1800 .
- the image processing apparatus 1800 may also include one or more communication interfaces 1803, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
- the image processing apparatus 1800 may further include one or more power sources 1802 .
- the image processing apparatus 1800 can perform the operations performed by the image processing apparatus in the embodiments shown in the foregoing FIG. 4 to FIG. 12 , and details are not repeated here.
- Embodiments of the present application also provide a computer program product, which, when running on a computer, causes the computer to perform the steps performed by the image processing apparatus in the methods described in the foregoing embodiments shown in FIG. 4 to FIG. 12 .
- Embodiments of the present application further provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, the computer executes the steps performed by the image processing apparatus in the methods described in the foregoing embodiments shown in FIG. 4 to FIG. 12.
- the image processing apparatus may specifically be a chip.
- the chip includes: a processing unit and a communication unit.
- the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
- the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the training device executes the steps performed by the image processing apparatus in the methods described in the embodiments shown in FIG. 4 to FIG. 12.
- the storage unit can be a storage unit in the chip, such as a register or a cache, or a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
- the disclosed system, apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of the units is only a logical function division; in actual implementation there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
- the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Abstract
An image processing method for accurately labeling sample images, so that a target model is trained from the labeling results and the rotation angle of an object in an image can be accurately inferred through the target model. The method includes: receiving information on sample keypoints, obtained by a user labeling an object to be labeled in a sample image based on a reference template image; and determining a sample rotation angle according to the information on the sample keypoints and the information on the reference keypoints, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object.
Description
This application claims priority to Chinese Patent Application No. 202011598644.7, filed with the China National Intellectual Property Administration on December 29, 2020 and entitled "IMAGE PROCESSING METHOD AND RELATED DEVICE", which is incorporated herein by reference in its entirety.
The embodiments of this application relate to the field of data processing, and in particular to an image processing method and a related device.
Artificial intelligence (AI) technology studies the design principles and implementation methods of intelligent machines, giving machines the capabilities of perception, reasoning and decision-making. AI technology can be used to determine the angle of an object under test.

In one scheme, the angle of the object to be labeled in a sample image is labeled manually, an angle measurement model is trained on the manually labeled angles and the sample images, and the angle of the object under test is then determined by the trained model.

Because the angle of the object to be labeled is annotated according to each person's subjective judgment, the labeling results are inconsistent. Training an angle measurement model on such results is difficult, and the angle of the object under test determined by the resulting model is also inaccurate.
SUMMARY

Embodiments of this application provide an image processing method and a related device for accurately determining the rotation angle of an object in an image.
A first aspect of the embodiments of this application provides an image processing method, including:

providing a user with a sample image and a reference template image, where the sample image includes an object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2; receiving information on sample keypoints, obtained by the user labeling the object to be labeled in the sample image based on the reference template image; and determining a sample rotation angle according to the information on the sample keypoints and the information on the reference keypoints, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object. The reference object in the reference template image serves as the benchmark for measuring object rotation angles, i.e. all rotation angles are relative to the reference object, so the rotation angle of the reference object relative to the reference template image is 0°.

In the embodiments of this application, the sample rotation angle is determined from sample keypoints, and the sample keypoints are labeled against the reference object in the reference image; the reference object is thus the standard for determining the sample rotation angle. Because the sample rotation angles share a unified standard, training an angle measurement model on them is easier, and the trained target angle measurement model can accurately determine the rotation angle of the object under test in the image to be measured.

For the user, only the sample keypoints corresponding to the reference keypoints need to be labeled; neither the sample angle nor the sample annotation frame has to be annotated, which reduces the user's workload and improves labeling efficiency.
With reference to the first aspect, in a first implementation of the first aspect, the numbers of reference keypoints and sample keypoints are both 2, and determining the sample rotation angle may include: determining the sample rotation angle according to the included angle between a reference keyline and a sample keyline, where the reference keyline is the line connecting the 2 reference keypoints and the sample keyline is the line connecting the 2 sample keypoints.

In this embodiment, the sample rotation angle is determined through the angle between the sample keyline and the reference keyline. Since the sample keyline derives from the sample keypoints, and the sample keypoints are labeled against the reference object marked with reference keypoints, the sample rotation angle is always determined against the reference keypoints as a unified benchmark. Training the angle measurement model on data obtained from this unified benchmark is easier, and the trained target angle measurement model can accurately determine the rotation angle of the object under test.
With reference to the first aspect or its first implementation, in a second implementation of the first aspect, after receiving the information on the user-labeled sample keypoints, the method may further include: obtaining a reference annotation frame of the reference template image, the reference annotation frame representing the position of the reference object in the reference template image; and determining a sample annotation frame according to the reference annotation frame, the sample annotation frame representing the position of the object to be labeled in the sample image.

In this embodiment, the sample annotation frame represents the position of the object to be labeled in the sample image; computing on the sample annotation frame is computing on the object's position information, so related information can be obtained from the position of the object to be labeled, which improves the flexibility of the solution.
With reference to the second implementation of the first aspect, in a third implementation, determining the sample annotation frame according to the reference annotation frame may include: determining the sample annotation frame according to the reference annotation frame, the information on the reference keypoints and the information on the sample keypoints, where a reference positional relationship holds between the reference keypoints and the reference annotation frame, and the same reference positional relationship holds between the sample keypoints and the sample annotation frame.

In this embodiment, because the sample annotation frame is determined against the reference positional relationship as a unified benchmark, a target model trained with this sample annotation frame infers on images to be measured under that same unified standard, and the results, being obtained under a unified standard, are more accurate.
With reference to the second or third implementation of the first aspect, in a fourth implementation, the rotating object detection model's ability to determine the position frame of an object in an image may also be trained. Specifically: the sample image and the information on the sample annotation frame are input into an initial rotating object detection model, so that the model performs position regression on the object to be labeled and outputs information on a sample regression position frame; the initial rotating object detection model is then trained according to the information on the sample regression position frame, the information on the sample annotation frame and a position regression loss function, obtaining a target rotating object detection model used to determine the position of the object under test in the image to be measured. Specifically, the training process may include iteratively training the initial rotating object detection model according to these inputs until a preset condition is met.

In this embodiment, the sample annotation frame is determined against the reference positional relationship as a unified benchmark, and the position regression over the sample annotation frame can find an accurate regression position frame by looking for features related to that reference positional relationship, so the training process is simpler.
With reference to the fourth implementation of the first aspect, in a fifth implementation, the rotating object detection model's ability to classify objects in images may also be trained. Specifically: a sample category is received, the sample category being the category labeled by the user for the object to be labeled in the sample image; the sample category is input into the initial rotating object detection model, so that the model classifies the object to be labeled and outputs a predicted sample category; and the initial rotating object detection model is trained according to the predicted sample category, the sample category and a classification loss function, obtaining the target rotating object detection model. The target rotating object detection model is used to determine the category of the object under test in the image to be measured, and that category is used to determine the predicted rotation angle of the object under test relative to the reference object. Specifically, the training process may include iteratively training the initial rotating object detection model according to these inputs until a preset condition is met.

In this embodiment, the rotating object detection model is trained to classify objects in images, so during inference the category of the object under test can be obtained from the target rotating object detection model, which simplifies the determination of the predicted rotation angle.
With reference to the fifth implementation of the first aspect, in a sixth implementation, the predicted category includes at least one of front-side information and back-side information of the object under test.

In this embodiment, front-facing and back-facing objects under test can be distinguished and handled with different operations, which improves the flexibility of the solution.
With reference to the second or third implementation of the first aspect, in a seventh implementation, the angle measurement model's ability to distinguish same-class image pairs from different-class image pairs may also be trained. Specifically: the sample image may be cropped according to the sample annotation frame to obtain a cropped sample image; the cropped sample image is rotated by n first rotation angles to obtain n rotated sample images, where the n first rotation angles are derived from the sample rotation angle, correspond one-to-one to the n rotated sample images, and n is an integer greater than or equal to 2; the n rotated sample images are fed into an angle training gallery; same-class sample image pairs and different-class sample image pairs are determined in the angle training gallery, where the objects in a same-class sample image pair have the same angle and category and the objects in a different-class sample image pair differ in angle or category; and an initial angle measurement model is trained on the same-class and different-class sample image pairs to obtain a target angle measurement model.

In this embodiment, the angle measurement model is trained to identify same-class image pairs; the objects in a same-class image pair have the same category and angle, i.e. the images have identical or similar shapes. In other words, the model is trained to recognize identical or similar shapes. Compared with the prior art, which classifies angles against inconsistent labeling results, the training of the angle measurement model here is more targeted, simpler and more precise.
With reference to the seventh implementation of the first aspect, in an eighth implementation, the angle measurement model may be trained using the sample category. Specifically: a sample category may be received, the sample category being the category labeled by the user for the object to be labeled in the sample image; the step of determining same-class and different-class sample image pairs in the angle training gallery may then include: determining the same-class sample image pairs and the different-class sample image pairs in the angle training gallery according to the sample category.
With reference to the fourth implementation of the first aspect, in a ninth implementation, the regression position frame of the object under test may be inferred through the target rotating object detection model. Specifically: the image to be measured may be input into the target rotating object detection model, so that the model performs position regression on the object under test and outputs its regression position frame; the regression position frame represents the position of the object under test in the image to be measured and is used to determine the predicted rotation angle, the predicted rotation angle being the predicted value of the rotation angle of the object under test relative to the reference object.

In this embodiment, position regression is performed by the target rotating object detection model trained in the fourth implementation. Since that model was trained against the reference object as a unified benchmark, the regression position frame obtained here is also based on that benchmark and is therefore more accurate.
With reference to the seventh or eighth implementation of the first aspect, in a tenth implementation, the rotation angle of the object under test may be inferred through the target angle measurement model. Specifically: the image to be measured may be input into the target rotating object detection model, so that the model performs position regression on the object under test and outputs its regression position frame, which represents the position of the object under test in the image to be measured; the image to be measured is cropped according to the regression position frame to obtain a cropped image; m second rotation angles are determined according to the regression position frame, where m is an integer greater than or equal to 2; the cropped image is rotated by the m second rotation angles to obtain m rotated images, the m second rotation angles corresponding one-to-one to the m rotated images; a target image is determined among the m rotated images through the target angle measurement model, the object in the target image having the same category and angle as the reference object in the reference template image; and the predicted rotation angle corresponding to the target image is then determined among the m second rotation angles.
With reference to the tenth implementation of the first aspect, in an eleventh implementation, determining the m second rotation angles according to the regression position frame may include: determining a frame rotation angle according to the regression position frame, where the frame rotation angle is the rotation angle of the regression position frame relative to a horizontal frame having horizontal edges, and is greater than or equal to 0° and less than or equal to 90°; and determining the m second rotation angles according to the frame rotation angle.
With reference to the tenth or eleventh implementation of the first aspect, in a twelfth implementation, determining the target image among the m rotated images through the target angle measurement model may include: constructing image pairs from each of the m rotated images and the images in a template image library; determining same-class image pairs among them through the target angle measurement model, the objects in a same-class image pair having the same angle and category; and determining the target image from the same-class image pairs, the target image being one of the m rotated images.
With reference to the twelfth implementation of the first aspect, in a thirteenth implementation, the rotation angle of the object may also be determined according to its category. Specifically: a sample category may be received, the sample category being the category labeled by the user for the object to be labeled in the sample image; the sample category is input into the initial rotating object detection model, so that the model classifies the object to be labeled and outputs a predicted sample category; the initial rotating object detection model is then trained according to the predicted sample category, the sample category and the classification loss function, obtaining the target rotating object detection model. Specifically, the training process may include iteratively training the initial rotating object detection model according to these inputs until a preset condition is met.

After the target rotating object detection model is obtained, the category of the object under test can be predicted through it, yielding a predicted category. The reference template image is then determined according to the predicted category, the reference object in the reference template image having that predicted category.

The step of constructing image pairs may then include: constructing an image pair from each of the m rotated images and the reference template image determined from the predicted category, the reference template image being contained in the reference template library.

In this embodiment, determining the reference template image in the template image library according to the predicted category greatly reduces the number of image pairs to be constructed, and thus the apparatus's workload in finding same-class image pairs, saving computing and storage resources. It also makes determining the target image, and hence the predicted rotation angle, more efficient.
With reference to the first aspect or its first implementation, in a fourteenth implementation, a keypoint detection model's ability to determine the keypoints of an object in an image may be trained. Specifically: the sample image and the information on the sample keypoints may be input into an initial keypoint detection model, so that the model performs position regression on points in the sample image and outputs information on regression sample keypoints; the initial keypoint detection model is then trained according to the information on the regression sample keypoints, the information on the sample keypoints and a keypoint position regression loss function, obtaining a target keypoint detection model used to determine the predicted keypoints of the object under test in the image to be measured. The training process may include iteratively training the initial keypoint detection model on these inputs until a preset condition is met.
With reference to the first aspect, or its first or second implementation, in a fifteenth implementation, the keypoint detection model's ability to classify object keypoints in images may also be trained. Specifically: a sample keypoint category may be received, the keypoint category being the category labeled by the user for the object to be labeled in the sample image; the sample image and the sample keypoint category are input into the initial keypoint detection model, which classifies the sample keypoints in the sample image and outputs a predicted sample keypoint category; the initial keypoint detection model is then trained according to the predicted sample keypoint category, the sample keypoint category and a keypoint classification loss function, obtaining a target keypoint detection model used to determine the category of the object under test in the image to be measured. The training process may include iteratively training the initial keypoint detection model on these inputs until a preset condition is met.
With reference to the fifteenth implementation of the first aspect, in a sixteenth implementation, the sample keypoint category may include at least one of front-side information and back-side information of the sample object.
With reference to the first aspect, its first implementation, or any one of the fourteenth to sixteenth implementations, in a seventeenth implementation, the predicted keypoints of the object under test may be determined through the target keypoint detection model. Specifically: the image to be measured may be input into the target keypoint detection model, so that the model performs position regression on points in the image to be measured and outputs the predicted keypoints of the object under test, the predicted keypoints being used to determine the predicted rotation angle.
With reference to the seventeenth implementation of the first aspect, in an eighteenth implementation, the category of the object under test, i.e. the predicted keypoint category, may be determined through the target keypoint detection model. Specifically: the predicted keypoints may be classified through the target keypoint detection model to obtain the predicted keypoint category, which is used to determine the predicted rotation angle.
With reference to the eighteenth implementation of the first aspect, in a nineteenth implementation, the numbers of predicted keypoints and reference keypoints are both 2, and the predicted rotation angle may be determined from the predicted keypoints and the predicted keypoint category. Specifically: the reference template image may be determined from the predicted keypoint category, the category of the reference object in the reference template image being the same as the predicted keypoint category; the rotation angle of the predicted keyline relative to the reference keyline is then determined, and this rotation angle is the predicted rotation angle. The predicted keyline is formed by the 2 predicted keypoints, the reference keyline by the 2 reference keypoints, and the 2 predicted keypoints correspond one-to-one to the 2 reference keypoints.
With reference to the eighteenth implementation of the first aspect, in a twentieth implementation, the pose of a three-dimensional object may be predicted from an image on a two-dimensional plane. Specifically, the predicted rotation angle of the three-dimensional object corresponding to a feature shape in the two-dimensional image may be determined from that feature shape. This can be expressed as follows: a feature shape may be determined from k predicted keypoints; the reference template image is determined from the predicted keypoint category, the category of the reference object in the reference template image being the same as the predicted keypoint category; a reference shape is determined from the k reference keypoints in the reference template image corresponding to the k predicted keypoints; the feature shape is input into a target model, which determines the shape difference of the feature shape relative to the reference shape and, from that shape difference, the three-dimensional rotation angle of the object under test corresponding to the feature shape relative to the reference object corresponding to the reference shape. The pose of the object under test can then be derived from the pose of the reference object and this three-dimensional rotation angle, where k is an integer greater than or equal to 2.

In this embodiment, the target model predicts the three-dimensional pose of an object in a two-dimensional image from the image alone. The method requires no three-dimensional model to be constructed, which simplifies pose determination and saves the computing, storage and other resources the apparatus would otherwise consume.
A second aspect of the embodiments of this application provides an image processing apparatus including an interaction unit and a processing unit.

The interaction unit is configured to provide a user with a sample image and a reference template image, where the sample image includes an object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2. The reference object in the reference template image serves as the benchmark for measuring object rotation angles, i.e. all rotation angles are relative to the reference object, so the rotation angle of the reference object relative to the reference template image is 0°.

The interaction unit is further configured to receive information on sample keypoints, obtained by the user labeling the object to be labeled in the sample image based on the reference template image.

The processing unit is configured to determine a sample rotation angle according to the information on the sample keypoints of the object to be labeled in the sample image and the information on the reference keypoints of the reference object in the reference template image, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object.

The image processing apparatus is used to execute the method of the first aspect; since it executes that method, its beneficial effects are those of the first aspect and are not repeated here.
A third aspect of the embodiments of this application provides a computer program product which, when run on a computer, causes the computer to execute the image processing method of the first aspect.

A fourth aspect provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the image processing method of the first aspect.

A fifth aspect provides an image processing apparatus including a processor and a memory, the processor being coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory so that the processor performs the image processing method of the first aspect.

A sixth aspect provides a chip system including at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by wires; the at least one processor is configured to run a computer program or instructions to perform the image processing method of any possible implementation of the first aspect. The communication interface in the chip may be an input/output interface, a pin, a circuit, or the like.

In one possible implementation, the chip system described above further includes at least one memory storing instructions; the memory may be a storage unit inside the chip, such as a register or a cache, or a storage unit of the chip (e.g. a read-only memory, a random access memory, etc.).
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of model training;
FIG. 2a is a schematic diagram of a system architecture provided by an embodiment of this application;
FIG. 2b is a schematic structural diagram of a rotating object pose detection system provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of an image processing method provided by an embodiment of this application;
FIG. 5a is a schematic diagram of a labeling method provided by an embodiment of this application;
FIG. 5b is a schematic diagram of another labeling method provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of another image processing method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of a training method for a rotating object detection model provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of another image processing method provided by an embodiment of this application;
FIG. 9 is a schematic diagram of a training method for an angle measurement model provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application;
FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of this application;
FIG. 12 is a schematic diagram of a model inference process provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application;
FIG. 14 is a schematic flowchart of another image processing method provided by an embodiment of this application;
FIG. 15 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application;
FIG. 16 is a schematic flowchart of another image processing method provided by an embodiment of this application;
FIG. 17 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of another image processing apparatus provided by an embodiment of this application.
DETAILED DESCRIPTION OF THE EMBODIMENTS

Embodiments of this application provide an image processing method for accurately labeling sample images, so that a model trained from those sample images can accurately predict the pose of a rotating object.

Pose denotes an object's position and attitude; pose prediction therefore comprises predicting the position and predicting the attitude.

In the embodiments of this application, inference denotes the prediction of some element of the object under test in the image to be measured; for example, inferring the position of the object under test means predicting its position.
To predict the pose of an object in an image, a model must be trained, and the trained target model then predicts the pose of the object. As shown in FIG. 1, a schematic flowchart of model training: a sample image is labeled manually to obtain the angle of the object to be labeled, the angle representing the orientation of the object in the sample image; an initial model is then trained with the sample images and the labeled angles to obtain a target model capable of predicting the attitude of the object under test in the image to be measured.

Because manual angle labeling is done according to subjective judgment, which varies from person to person — and even a single person cannot guarantee consistency over time — manual labeling produces inconsistent results, which hampers model training and in turn the accuracy of the target model's attitude predictions for the object under test.
To address this defect, the embodiments of this application provide an image processing method and an image processing apparatus that label sample images against a unified standard, ensuring consistent and accurate labeling results. Training a model on these results improves the accuracy of the model's angle predictions for the object under test and also reduces the difficulty of training.

In the embodiments of this application, keypoint labeling guarantees the consistency and accuracy of the labeling results. How keypoints are labeled and how the labeling results are applied is explained next.

The embodiments of this application provide a method for inferring the attitude of an object in an image, detailing keypoint labeling and the application of the labeling results. Two implementations are described below as examples. Note that these two implementations are merely examples of keypoint labeling and of applying the labeling results; any method that predicts the attitude of an object in an image based on keypoint labeling falls within the scope described by the embodiments of this application, without limitation here.

A method for inferring the attitude of an object in an image provided by an embodiment of this application is described next.
I. Inferring the pose of an object in an image from a position frame.

In an image processing method provided by an embodiment of this application, the rotation angle of an object is determined by determining the object's position frame in the image. The system and specific procedures implementing this method are described in detail below.

1. The image processing system.

1.1. System architecture.
Taking the image processing scenario of the embodiments of this application as an example, FIG. 2a shows a system architecture provided by an embodiment of this application. In the embodiment shown in FIG. 2a, the architecture includes an execution device 210, a training device 220, a database 230, a terminal device 240, a data storage system 250 and a data collection device 260, where the execution device 210 includes a computation module 211. The data collection device 260 obtains sample data and the loss values produced during training and stores them in the database 230; the training device 220 generates a target model/rule 213 based on the sample data and loss values maintained in the database 230. How the training device 220 obtains the target model/rule 213 from the sample data and training loss values is described in more detail below. The target model/rule 213 can adaptively adjust the weight parameters corresponding to the loss values while exploiting parallel computation during training to probe the effectiveness of weights and to inherit good network parameters and weights, so that an optimal trained model is obtained within one training period.

Optionally, the database 230 may store reference template images. The training device 220 generates a model and iteratively trains it with the reference template images in the database 230 to obtain the target model. After the execution device 210 determines the rotation angle of an object in an image according to the target model, it may send the rotation angle to different devices — to the terminal device 240 or to the data storage system 250, without limitation here.

Optionally, the terminal device 240 and the execution device 210 may be independent devices or a single whole, without limitation here. The execution device 210 is configured with a communication interface 212 for data interaction with the terminal device 240. In the model training phase, the user may obtain reference template images and related information through the terminal device 240 and input the keypoints or categories of objects in sample images to the communication interface 212 through the terminal device 240; the execution device 210 may then train an initial model with the sample keypoints and categories to obtain the target model. In the model prediction phase, the user may input an image to be measured to the communication interface 212 through the terminal device 240; the execution device 210 determines the predicted rotation angle of the object under test from the image and the target model and may send it back to the terminal device 240 through the communication interface 212 for the user.

Note that FIG. 2a is merely a schematic diagram of the system architecture provided by an embodiment of this application; the positional relationships between the devices and components shown impose no limitation. In the model prediction phase of the embodiments of this application, the user may also be an entity other than a person, such as an industrial robot or an intelligent system — any entity that can use the system, without limitation here.
1.2. System structure.

Referring to FIG. 2b, FIG. 2b is a schematic structural diagram of a rotating object pose detection system provided by an embodiment of this application. The system includes a rotating object pose labeling module 201 and a cascaded rotating object pose detection module 202.

The cascaded rotating object pose detection module 202 includes two submodules: a cascaded rotating object pose detection training submodule 2021 and a cascaded rotating object pose detection submodule 2022.

In the embodiments of this application, applying the rotating object pose detection system involves two phases: a model training phase and a model inference phase.

1.2.1. Model training phase.

The model training phase is implemented by the rotating object pose labeling module 201 and the cascaded rotating object pose detection training submodule 2021.
The rotating object pose labeling module 201 mainly includes, but is not limited to, the following functions:

1. Obtaining information on the sample keypoints of the object to be labeled in the sample image, based on the reference keypoints of the reference object in the reference template image.
2. Determining, from the sample keypoint information, the sample rotation angle of the object to be labeled in the sample image relative to the reference object.
3. Determining the sample annotation frame of the object to be labeled in the sample image, based on the keypoints and reference annotation frame of the reference object in the reference template image.
4. Obtaining the category of the object to be labeled in the sample image.

In the embodiments of this application, the category of the object to be labeled is also called the sample category.

In the embodiments of this application, the rotating object pose labeling module 201 is also called the keypoint-based adaptive rotating object pose labeling module. "Keypoint-based" means that the module's labeling of sample keypoints and sample annotation frames is based on the reference keypoints. "Adaptive" means that the module determines the sample annotation frame and sample rotation angle automatically, without manual labeling of either.
The cascaded rotating object pose detection training submodule 2021 trains the initial models to obtain the target models. In the embodiments of this application, "cascaded" means that two cooperating models are needed for pose detection; the training phase therefore trains both, namely the rotating object detection model and the angle measurement model.

The rotating object detection model is trained as follows: the information on the sample annotation frames from the rotating object pose labeling module 201, together with the sample images, is input into the initial rotating object detection model to train its ability to perform position regression on objects in images, yielding a target rotating object detection model with that ability.

Optionally, the sample categories from the rotating object pose labeling module 201 may also be input into the rotating object detection model to train its ability to determine the categories of objects in images.

The angle measurement model is trained as follows: the information on the sample annotation frames, the sample rotation angles, the sample categories and the sample images from the rotating object pose labeling module 201 are input into the angle measurement model to train its ability to identify same-class image pairs, yielding a target angle measurement model with that ability. The objects in a same-class image pair have the same angle and category.
1.2.2. Model inference phase.

The model inference phase is implemented by the cascaded rotating object pose detection submodule 2022.

"Cascaded" means that, at inference, two cooperating model stages are needed to predict the rotation angle of the object under test. In the embodiments of this application, the two stages are stage 1, rotating object detection, and stage 2, angle measurement. In the rotating object detection stage, the target rotating object detection model performs position regression on the object under test in the image to be measured, producing a regression position frame that represents the object's position in the image. In the angle measurement stage, the rotation angle of the object under test is determined from the target angle measurement model and the regression position frame produced in the detection stage.

Optionally, the rotating object detection stage may also determine the category of the object under test, for use by the angle measurement stage when determining the object's rotation angle.

The model training process is described in detail next.
2. Model training phase.

Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. The apparatus implements the labeling of sample annotation frames and sample rotation angles and the training of the rotating object detection model and the angle measurement model, i.e. the model training phase of the embodiment shown in FIG. 2b. The sample image processing apparatus 300 includes a labeling module 301, a first training module 302 and a second training module 303.

The labeling module 301 corresponds to the rotating object pose labeling module 201 in the embodiment shown in FIG. 2b; the first training module 302 and the second training module 303 correspond to the cascaded rotating object pose detection training submodule 2021.

The labeling module 301 is configured to provide the user with a reference template image and a sample image, where the reference template image contains a picture of the reference object, the sample image contains a picture of the object to be labeled, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2.

The labeling module 301 is further configured to receive information on sample keypoints — the information obtained by the user labeling the object to be labeled in the sample image based on the reference template image and the reference keypoints — and to determine the sample rotation angle from the sample keypoint information and the reference keypoint information, the sample rotation angle being the rotation angle of the object to be labeled in the sample image relative to the reference object.

The labeling module 301 is further configured to label the sample image according to the user-labeled sample keypoints, the reference keypoints and the reference annotation frame, obtaining a sample annotation frame. The sample annotation frame represents the position of the object to be labeled in the sample image, and the reference annotation frame represents the position of the reference object in the reference template image. A reference positional relationship holds between the picture of the reference object and the reference annotation frame in the reference template image, and the same reference positional relationship holds between the picture of the object to be labeled and the sample annotation frame in the sample image. The labeling of sample annotation frames therefore follows a unified standard, namely the reference positional relationship; for a detailed description see the embodiment shown in FIG. 5.

The labeling module 301 is further configured to transmit the sample annotation frame and the sample image to the first training module 302, to train the initial rotating object detection model into a target rotating object detection model capable of regressing position frames for objects in images.

Optionally, the labeling module 301 may further be configured to receive the sample category that the user labels for the object to be labeled in the sample image, and to transmit it to the first training module 302, so that the target rotating object detection model is also capable of classifying objects in images.

The first training module 302 trains the rotating object detection model and may therefore also be called the rotating object detection training module. Its specific uses are as follows:

The first training module 302 is configured to perform, through the initial rotating object detection model, position regression on the sample annotation frame in the sample image, and to train the initial rotating object detection model with the regressed sample regression position frame and the sample annotation frame, obtaining the target rotating object detection model. The target rotating object detection model determines regression position frames, which represent the position of the object under test in the image to be measured.

Optionally, the first training module 302 may further be configured to classify, through the initial rotating object detection model, the object to be labeled in the sample image, and to iteratively train the initial model with the classification results and the sample categories, obtaining a target rotating object detection model that classifies objects in images so that the predicted rotation angle can be determined from the classification results.

The labeling module 301 may further be configured to transmit the sample image, the sample rotation angle, the information on the sample annotation frame and the sample category to the second training module 303, to train the initial angle measurement model into a target angle measurement model used to determine the predicted rotation angle of the object under test relative to the reference object.

The second training module 303 trains the angle measurement model and may therefore also be called the angle measurement training module. Its specific uses are as follows:

The second training module 303 is configured to crop the sample image according to the sample annotation frame to obtain a cropped sample image, and to rotate it by n first rotation angles — derived from the sample rotation angle — to obtain n rotated sample images. The n rotated sample images are fed into the angle training gallery, in which same-class sample image pairs and different-class sample image pairs are determined: the sample images in a same-class pair have the same category and angle, while those in a different-class pair differ in category or angle. The initial angle measurement model is then trained on the same-class and different-class pairs, obtaining the target angle measurement model, which is used to determine the predicted rotation angle — the predicted value of the rotation angle of the object under test relative to the reference object.
The processing flow of the sample image processing apparatus 300 over sample images falls into three main stages: labeling the sample images, training the rotating object detection model, and training the angle measurement model. The labeling stage is described first.

2.1. Labeling the sample images.

Based on the image processing apparatus shown in FIG. 3, an embodiment of this application provides an image processing method. Referring to FIG. 4, FIG. 4 is a schematic flowchart of this image processing method, whose procedure is as follows:
401. The labeling module 301 obtains a sample image and a reference template image.

The labeling module 301 may obtain a reference template image containing a reference object; the reference template image serves as the standard against which angles are measured in subsequent model training and prediction.

In the embodiments of this application, saying that an image includes an object means that the image includes a picture of the object; for example, "the reference template image includes the reference object" means it includes a picture of the reference object. Likewise, "an object in an image" means the picture of the object contained in the image.

Specifically, the reference template image may be obtained from a reference image library, or in other ways, e.g. by photographing the reference object, without limitation here. The reference image library contains multiple reference template images; unless otherwise stated, "the reference template image" in the embodiments of this application means the one obtained in step 401.

In the embodiments of this application, the reference object in the reference template image is the benchmark for measuring object rotation angles, i.e. all rotation angles are relative to the reference object, so the rotation angle of the reference object is 0°.

The labeling module 301 may also obtain a sample image. The sample image includes an object to be labeled that corresponds to the reference object: the two have the same category, and the orientation of the object to be labeled may or may not coincide with that of the reference object.

Specifically, the sample image may be obtained from a sample image library, or in other ways, e.g. by photographing the object to be labeled, without limitation here.
402. The labeling module 301 obtains information on the reference keypoints.

The labeling module 301 may also obtain information on the reference keypoints of the reference object in the reference template image. The reference keypoint information can identify the reference template image and the reference object, serving as the basis for comparison between the reference template image and sample images. Beyond comparison, it can have other uses, e.g. determining the relationship between the reference object and the object to be labeled, such as the angle between the two, without limitation here.

Specifically, the labeling module 301 may provide the user with the reference template image and receive the reference keypoint information that the user labels for the reference object in it.

Besides user labeling, the labeling module 301 may obtain the reference keypoint information in other ways, e.g. the sample image processing apparatus 300 labels the reference template image itself; or the information is obtained together with the reference template image in step 401, the reference template image already carrying the reference keypoints — without limitation here.

Specifically, the reference keypoints may be the two farthest-apart points on the picture of the reference object in the reference image. They may also be other points, e.g. two user-defined points, or two points on the object's picture whose features are clearly distinct from those of other points, without limitation here.

In the embodiments of this application, the number of reference keypoints may also be any integer greater than 2, e.g. 3 or 4, without limitation here; 2 reference keypoints are used merely as an example and impose no limitation on the number of reference keypoints.
403. The labeling module 301 provides the user with the sample image and the reference template image marked with the reference keypoints.

The labeling module 301 provides the user with the sample image and the reference template image, in which the 2 reference keypoints of the reference object are marked; the reference template image is the user's reference and basis for labeling keypoints.

404. The labeling module 301 receives the information on the user-labeled sample keypoints.

The user can label the sample keypoints of the object to be labeled in the sample image according to the position of the reference object's picture in the reference template image and the 2 reference keypoints.

Specifically, the user labels sample keypoints as follows. Because the reference object and the object to be labeled have the same category, they have similar or identical shapes; that is, the picture of the reference object in the reference template image and the picture of the object to be labeled in the sample image have similar or identical shapes. The reference keypoints are points on that shape in the reference template image, so the user can use the positional relationship between the reference keypoints and that shape to match where the sample keypoints fall on the shape in the sample image. In other words, from the positions of the reference keypoints in the reference template image, the user can match the corresponding sample keypoints in the sample image and thereby label them.

To describe the labeling process more intuitively, FIG. 5a gives an example; FIG. 5a is a schematic diagram of a labeling method provided by an embodiment of this application. As shown in FIG. 5a, according to the two reference keypoints K1 and K2 in the reference template image, the user labels the corresponding sample keypoints K1' and K2' on the picture of the object to be labeled in the sample image, where K1' corresponds to K1 and K2' corresponds to K2.

In the embodiments of this application, the object to be labeled is also called the object under test, and the user's determination of sample keypoints is also called training sample labeling.
405. The labeling module 301 determines the sample rotation angle from the sample keypoint information and the reference keypoint information.

The labeling module 301 can determine the rotation angle of the object to be labeled relative to the reference object from the reference keypoint information and the sample keypoint information. In the embodiments of this application, this rotation angle is also called the sample rotation angle.

Optionally, the sample rotation angle may be determined through keylines. Keylines include the reference keyline, the sample keyline and the predicted keyline; the predicted keyline appears in the inference process and is not detailed here. The line connecting the 2 reference keypoints is called the reference keyline, and the line connecting the 2 sample keypoints is called the sample keyline. By making a corresponding pair of points on the reference keyline and the sample keyline coincide, the included angle between the two keylines is obtained, from which the angle between them — and thus the angle between the object to be labeled and the reference object — is determined.

Optionally, besides the magnitude, the included angle between the two keylines also gives the rotation direction of the object to be labeled relative to the reference object, i.e. the rotation direction of the sample keyline relative to the reference keyline, which may be clockwise or counterclockwise.

For example, as shown in FIG. 5a, the reference keypoint K1 in the reference template image can be made to coincide with the sample keypoint K1' in the sample image, giving the included angle a between the two keylines; the magnitude of angle a is a degrees and its direction is counterclockwise. It can thus be determined that the object to be labeled is rotated a degrees counterclockwise relative to the reference object: the sample rotation angle is a degrees, counterclockwise.

In the embodiments of this application, the sample rotation angle may also be determined by methods other than making the keypoints coincide, e.g. from the vector information of the keylines, without limitation here.
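As an illustration of the keyline computation above, the following Python sketch derives the signed sample rotation angle from the two keypoint pairs using the vector-based variant mentioned as an alternative. It is a minimal sketch under stated assumptions, not the patent's implementation; the function name and the degree convention are illustrative.

```python
import math

def sample_rotation_angle(ref_k1, ref_k2, smp_k1, smp_k2):
    """Signed angle (degrees) rotating the reference keyline onto the sample keyline.

    Each argument is an (x, y) tuple; K1'/K2' correspond to K1/K2.
    Positive result = counterclockwise in a y-up coordinate system.
    """
    # Keylines as direction vectors K1->K2 and K1'->K2'.
    rvx, rvy = ref_k2[0] - ref_k1[0], ref_k2[1] - ref_k1[1]
    svx, svy = smp_k2[0] - smp_k1[0], smp_k2[1] - smp_k1[1]
    # Difference of the two line directions, wrapped into [-180, 180).
    ang = math.degrees(math.atan2(svy, svx) - math.atan2(rvy, rvx))
    return (ang + 180.0) % 360.0 - 180.0

# Example: reference keyline along +x, sample keyline along +y -> 90° counterclockwise.
print(sample_rotation_angle((0, 0), (1, 0), (5, 5), (5, 6)))  # 90.0
```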
406. Obtain the reference annotation frame of the reference template image.

The labeling module 301 obtains the reference annotation frame of the reference object in the reference template image; the reference annotation frame represents the position of the reference object — of its picture — in the reference template image.

Specifically, the labeling module 301 may provide the user with the reference template image and receive the reference annotation frame that the user labels for the reference object in it.

Besides user labeling, the labeling module 301 may obtain the reference annotation frame in other ways, e.g. the sample image processing apparatus 300 labels the reference template image itself; or the frame is obtained together with the reference template image in step 401, the reference template image already carrying the reference annotation frame — without limitation here.

In the embodiments of this application, the reference annotation frame may be a rectangular frame with horizontal edges. Note that besides a rectangle the reference annotation frame may take other shapes, e.g. a triangular or trapezoidal frame, without limitation here; and besides horizontal edges, its edges may form a given angle with the horizontal, e.g. 90° or 10°, without limitation here.

In the embodiments of this application, the process of obtaining the reference keypoint information in step 402 and the reference annotation frame in step 406 is also called reference object labeling.
407. The labeling module 301 determines the sample annotation frame from the reference annotation frame.

The labeling module 301 can determine the sample annotation frame in the sample image from the reference annotation frame — specifically from the reference annotation frame, the reference keypoint information and the sample keypoint information — as follows:

The sample annotation frame is matched according to the reference positional relationship. In the embodiments of this application, the reference positional relationship is the positional relationship between the reference keypoints and the reference annotation frame. Using this relationship and the sample keypoints, the labeling module 301 places the sample annotation frame where it bears the reference positional relationship to the sample keypoints, so that the same reference positional relationship holds between the sample keypoints and the sample annotation frame.

For example, as shown in FIG. 5a, the reference annotation frame has points A and C; point A bears relative positional relationship one to reference keypoint K1 and relative positional relationship two to reference keypoint K2, so point A' can be determined from sample keypoints K1' and K2' together with relationships one and two. Likewise, the other points B', C' and D' of the sample annotation frame can be determined from the other sample keypoints and the relative positional relationships corresponding to them, which is not repeated here.

Note that the keypoints A, B, A', B' and so on appearing in the embodiments of this application are merely examples of reference or sample keypoints and impose no limitation on them.

In the embodiments of this application, relative positional relationships one and two are relative positional relationships. A relative positional relationship is the positional relationship between a reference keypoint and a point on the reference annotation frame, and equally between the corresponding sample keypoint and the corresponding point on the sample annotation frame. It may be a vector; besides a vector it may also be another positional relationship, e.g. a distance, an angle or a coordinate system, without limitation here.

In the embodiments of this application, the sample annotation frame may also be determined by methods other than locating each of its points, e.g. matching the sample annotation frame from the reference positional relationship, the sample rotation angle and the positions of the sample keypoints, without limitation here.
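The vector form of the relative positional relationship lends itself to a similarity transform: each corner offset, expressed relative to the reference keyline, is re-expressed relative to the sample keyline. The sketch below illustrates this; the helper names and the similarity-transform formulation are illustrative assumptions, not the patent's prescribed implementation.

```python
import math

def similarity_from_keypoints(ref_k1, ref_k2, smp_k1, smp_k2):
    """Rotation+scale+translation mapping the reference keyline onto the sample keyline."""
    rx, ry = ref_k2[0] - ref_k1[0], ref_k2[1] - ref_k1[1]
    sx, sy = smp_k2[0] - smp_k1[0], smp_k2[1] - smp_k1[1]
    scale = math.hypot(sx, sy) / math.hypot(rx, ry)   # handles differing image scales
    theta = math.atan2(sy, sx) - math.atan2(ry, rx)   # sample rotation angle, radians
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def apply(p):
        # Offset of p from reference keypoint K1, rotated and scaled, re-anchored at K1'.
        dx, dy = p[0] - ref_k1[0], p[1] - ref_k1[1]
        return (smp_k1[0] + scale * (cos_t * dx - sin_t * dy),
                smp_k1[1] + scale * (sin_t * dx + cos_t * dy))
    return apply

# Reference annotation frame corners A, B, C, D map to sample corners A', B', C', D'.
ref_corners = [(0, 0), (4, 0), (4, 2), (0, 2)]
to_sample = similarity_from_keypoints((1, 1), (3, 1), (10, 10), (10, 14))
sample_corners = [to_sample(p) for p in ref_corners]
```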
Note that steps 406 and 407 may also be executed before step 405, as long as they follow step 404, without limitation here.

In this embodiment, determining the sample rotation angle in step 405 is also called adaptive attitude labeling, and determining the sample annotation frame in step 407 is also called adaptive position labeling; steps 405 and 407 may therefore jointly be called adaptive pose labeling.
408. The labeling module 301 obtains the sample category.

Optionally, the labeling module 301 may also obtain the category of the object to be labeled in the sample image; in the embodiments of this application this category is also called the sample category.

Specifically, the labeling module 301 may receive the sample category that the user labels for the object to be labeled in the sample image.

Besides user labeling, the labeling module 301 may obtain the sample category in other ways, e.g. together with the sample image from the sample image library in step 401, the sample image already carrying the sample category — without limitation here.

409. The labeling module 301 sends the sample image and the information on the sample annotation frame to the first training module 302.

After the sample annotation frame is determined in step 407, the labeling module 301 may send the sample image and the sample annotation frame information to the first training module 302 for training the rotating object detection model.

In the embodiments of this application, step 409 may also be executed before step 408, as long as it follows step 407, without limitation here.

Optionally, after step 408, the labeling module 301 may also send the sample category to the first training module 302 for training the rotating object detection model.

410. The labeling module 301 sends the sample image, the sample category, the sample rotation angle and the information on the sample annotation frame to the second training module 303.

After the sample rotation angle is determined in step 405, the sample annotation frame in step 407 and the sample category in step 408, the labeling module 301 may send the sample image, the sample category, the sample rotation angle and the sample annotation frame information to the second training module 303 for training the angle measurement model.
To explain the labeling of sample images more clearly, the labeling process is described below with reference to FIG. 5b, a schematic diagram of a labeling method provided by an embodiment of this application.

Labeling a sample image goes through three stages: 1. reference template image labeling; 2. sample keypoint labeling over a large number of sample images; 3. automatic acquisition of the sample position frame.

Stage 1. Reference template image labeling.

The user labels the object category, the keypoints and a rectangular frame for the reference object in the reference template image. Optionally, the object category may include the object's front-side and back-side information; besides these, it may include other information, such as material, use or destination, without limitation here.

Optionally, labeling the rectangular frame may include labeling its vertices. In the embodiments of this application, this rectangular frame is also called the reference annotation frame; its shape and position characteristics are as described in step 406 of the embodiment shown in FIG. 4 and are not repeated here.

In the embodiments of this application, the labeling of the reference object in the reference template image may be done by an entity other than the user, e.g. by the labeling module 301, as long as the labeling module 301 can obtain the labeling results, without limitation here.

Stage 2. Sample keypoint labeling over a large number of sample images.

The labeling results for the reference template images are used for sample labeling: the labeling module 301 provides the user with the reference template images and the sample image, and the user selects the reference template image corresponding to the sample image — a process called template selection. The user then determines and labels the object category and keypoints of the sample object in the sample image according to the reference template image; in this embodiment this process is also called sample labeling.

Stage 3. Automatic acquisition of the sample position frame.

After template selection and sample labeling, the labeling module 301 can determine the sample position frame from the reference annotation frame and reference keypoints of the reference object in the reference template image together with the labeled sample keypoints. In the embodiments of this application, the sample position frame is also called the sample annotation frame; for this process see step 407 of the embodiment shown in FIG. 4, not repeated here. In this embodiment this process is also called position frame generation.
The embodiments of this application also support labeling, training and inference over images of different scales.

For example, after receiving the information on the user-labeled sample keypoints, the size ratio of the sample image to the reference template image can be determined from the sample keypoint information and the reference keypoint information. The sample image is scaled by this ratio to the same scale as the reference template image, and the scaled sample image is then used to determine annotation frames, train models and so on. Alternatively, the reference template image may be scaled and labeling or training performed against the scaled reference template image, without limitation here.

For example, during model inference the image to be measured may be scaled and inference performed on the scaled image; alternatively, a multi-scale model may be used to infer on the image to be measured, without limitation here, where a multi-scale model is one trained on scaled sample images or scaled reference template images.
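As a sketch of the scale normalization just described — hedged, since the patent does not fix the formula — the ratio can be taken as the reference keyline length over the sample keyline length; OpenCV's resize is used here only for illustration.

```python
import math
import cv2  # assumed available; any resampling routine works

def normalize_scale(sample_img, ref_k1, ref_k2, smp_k1, smp_k2):
    """Rescale the sample image so its keyline matches the reference keyline length."""
    ref_len = math.dist(ref_k1, ref_k2)
    smp_len = math.dist(smp_k1, smp_k2)
    ratio = ref_len / smp_len  # >1 enlarges the sample, <1 shrinks it
    h, w = sample_img.shape[:2]
    return cv2.resize(sample_img, (round(w * ratio), round(h * ratio)))
```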
After step 409 of the embodiment shown in FIG. 4, the first training module 302 can train the rotating object detection model with the received sample images and sample annotation frame information. The training stage of the rotating object detection model is described next.

2.2. Training the rotating object detection model.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of an image processing method provided by an embodiment of this application, based on the image processing apparatus shown in FIG. 3. Its procedure is as follows:

601. The first training module 302 receives the sample image and the information on the sample annotation frame from the labeling module 301.

602. The first training module 302 performs position regression on the object to be labeled in the sample image, obtaining a sample regression position frame.

To describe the training of the rotating object detection model more clearly, refer to FIG. 7, a schematic diagram of a training method for a rotating object detection model provided by an embodiment of this application.

The sample image and the sample annotation frame information are input into the initial rotating object detection model, which performs position regression of the sample annotation frame over the sample image and outputs information on the sample regression position frame.

603. The first training module 302 trains the rotating object detection model according to the information on the sample regression position frame, the information on the sample annotation frame and the position regression loss function.

The first training module 302 may iteratively train the initial rotating object detection model according to the sample regression position frame information, the sample annotation frame information and the position regression loss function until a preset condition is met, obtaining the target rotating object detection model.

Specifically, the position regression loss function gives the error between the sample regression position frame and the sample annotation frame, and the preset condition may be that this error falls below some threshold. As shown in FIG. 7, the error may be the position regression loss Lreg. Note that Lreg is merely the symbol used as an example for the position regression loss and imposes no limitation on it.

Besides the above condition, other information may also be derived through the position regression loss function and the condition corresponding to that information used as the preset condition, e.g. the number of training iterations of the rotating object detection model reaching some threshold, without limitation here.
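The patent does not fix a particular form for the position regression loss. As one common, assumed choice — shown here for illustration only, not as the patent's Lreg — a smooth-L1 loss over rotated-frame parameters can play this role:

```python
import torch
import torch.nn.functional as F

def lreg(pred_frames: torch.Tensor, gt_frames: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 position regression loss over rotated-frame parameters.

    Both tensors have shape (batch, 5): (cx, cy, w, h, angle), with the angle
    normalized to a comparable numeric range beforehand.
    """
    return F.smooth_l1_loss(pred_frames, gt_frames)

# Training-loop fragment: stop once the loss meets the preset condition, e.g.
#   loss = lreg(model(sample_images), sample_annotation_frames)
#   if loss.item() < threshold: stop_training()
```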
Through the above training process, the target rotating object detection model learns to annotate regression position frames for objects in images. Besides this ability, when the method of the embodiment shown in FIG. 4 includes step 408, the target rotating object detection model can also be trained to classify objects in images, as follows:

604. The first training module 302 receives the sample category from the labeling module 301.

Optionally, after obtaining the category of the object to be labeled, the labeling module 301 may transmit that category to the first training module 302. Step 604 may also be executed before step 601, 602 or 603, without limitation here.

605. The first training module 302 classifies the object to be labeled, obtaining a predicted sample category.

The first training module 302 may input the sample image into the initial rotating object detection model, which classifies the object to be labeled in the sample image and outputs a predicted sample category.

606. The first training module 302 trains the rotating object detection model according to the predicted sample category, the sample category and the classification loss function.

The first training module 302 may iteratively train the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until a preset condition is met, obtaining the target rotating object detection model.

Specifically, the classification loss function gives the error between the predicted sample category and the sample category, and the preset condition may be that this error falls below some threshold. As shown in FIG. 7, the error may be the classification loss Lcls. Note that Lcls is merely the symbol used as an example for the classification loss and imposes no limitation on it.

Besides the above condition, other information may also be derived through the classification loss function and the condition corresponding to that information used as the preset condition, e.g. the number of training iterations reaching some threshold, without limitation here.

Through the above training process, the target rotating object detection model learns to classify objects in images.

Note that steps 605 and 606 may also be executed before step 601, 602 or 603, as long as they follow step 604. Note also that steps 604–606 are optional: when the embodiment shown in FIG. 4 does not obtain the sample category, the rotating object detection model's classification ability need not be trained, i.e. the method shown in FIG. 6 may omit steps 604, 605 and 606.
The target rotating object detection model trained in the embodiment shown in FIG. 6 can determine the regression position frame of an object in an image. The regression position frame is determined by matching the reference positional relationship derived from the reference template image; for objects of the same category, all regression position frames determined by the model therefore follow a unified standard, namely the reference positional relationship. Subsequent operations on the regression position frame are thus all based on this unified standard — and since the reference positional relationship is itself determined by the reference object in the reference template image, ultimately on the reference object as a unified benchmark. Rotation angles predicted from regression position frames are consequently also determined against this unified benchmark, and the predictions are more accurate.
After step 410 of the embodiment shown in FIG. 4, the second training module 303 can train the angle measurement model with the received sample images and related information. The training stage of the angle measurement model is described next.

2.3. Training the angle measurement model.

Referring to FIG. 8, FIG. 8 is a schematic flowchart of an image processing method provided by an embodiment of this application, based on the image processing apparatus shown in FIG. 3. Its procedure is as follows:
801. The second training module 303 receives the sample image, the sample category, the sample rotation angle and the information on the sample annotation frame from the labeling module 301.

802. The second training module 303 crops the sample image according to the sample annotation frame.

According to the position of the sample annotation frame in the sample image, the second training module 303 may cut out the part of the sample image inside the sample annotation frame, obtaining a cropped sample image. In the embodiments of this application, this cropping is also called matting.

803. The second training module 303 determines 4 first rotation angles according to the sample rotation angle.

With the sample rotation angle obtained in step 801, the second training module 303 can determine 4 first rotation angles. To describe the training of the angle measurement model more clearly, refer to FIG. 9, a schematic diagram of a training method for an angle measurement model provided by an embodiment of this application. As shown in FIG. 9, if the sample rotation angle is a degrees, the first rotation angles may be determined as -a, -a-90, -a-180 and -a-270 degrees.

In the embodiments of this application, the number of first rotation angles is n, and n may also be an integer other than 4, e.g. 5 or 8, as long as n is greater than or equal to 2, without limitation here. The relationship between a first rotation angle and the sample rotation angle may be a certain difference; in this embodiment the difference is x times 90°, where x is any integer from 0 to n-1. The difference may also follow other rules, e.g. y times some angle for any integer y, or any angle at all, without limitation here.

Note that in the embodiments of this application step 803 may also be executed before step 802, as long as it follows step 801, without limitation here.
804. The second training module 303 rotates the cropped sample image, obtaining 4 rotated sample images.

The second training module 303 rotates the cropped sample image by the 4 first rotation angles, obtaining 4 rotated sample images. For example, as shown in FIG. 9, with first rotation angles of -a, -a-90, -a-180 and -a-270 degrees, all 4 rotated sample images have horizontal borders. In the embodiments of this application, matching the first rotation angles, the number of rotated sample images is likewise n, which is not repeated here.

Note that determining the n first rotation angles in step 803 may also be implemented by modules other than the second training module 303, e.g. by the labeling module 301 or another module, as long as the second training module 303 can obtain the n first rotation angles, without limitation here.
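A minimal sketch of steps 803–804, under the 90°-step scheme of FIG. 9: generate the n first rotation angles from the sample rotation angle, then rotate the crop by each. The OpenCV-based rotation helper, and the variables `cropped_sample` and `a_deg` in the commented usage, are illustrative assumptions.

```python
import cv2

def first_rotation_angles(sample_angle_deg: float, n: int = 4):
    """-a, -a-90, ..., -a-(n-1)*90: each angle brings the crop to an axis-aligned pose."""
    return [-sample_angle_deg - 90.0 * x for x in range(n)]

def rotate_crop(crop, angle_deg: float):
    """Rotate a cropped sample image about its center by angle_deg (counterclockwise)."""
    h, w = crop.shape[:2]
    mat = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(crop, mat, (w, h))

# Usage sketch (cropped_sample from step 802, a_deg from step 405):
# rotated_samples = [rotate_crop(cropped_sample, a)
#                    for a in first_rotation_angles(a_deg)]
```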
805. The second training module 303 feeds the 4 rotated sample images into the angle training gallery.

After obtaining the 4 rotated sample images, the second training module 303 may feed them into the angle training gallery. At this point, besides the 4 images input in step 805, the gallery may contain rotated sample images derived from other sample images; the objects in those other sample images may share or differ from the category of the object in this embodiment's sample image, and may share or differ from its orientation, without limitation here.

In the example of FIG. 9, since n is 4, the gallery shown is also called the four-angle training gallery; for other values of n the gallery's name changes accordingly, without limitation here.

806. The second training module 303 determines same-class sample image pairs and different-class sample image pairs in the angle training gallery.

The second training module 303 may determine same-class and different-class sample image pairs in the angle training gallery according to the sample categories. The objects in a same-class sample image pair have the same category and the same angle; the objects in a different-class sample image pair differ in category or angle.
807. The second training module 303 trains the initial angle measurement model on the same-class and different-class sample image pairs, obtaining the target angle measurement model.

The second training module 303 inputs the same-class and different-class sample image pairs into the initial angle measurement model, which encodes them to obtain the distance D_same between the encodings of the images in a same-class pair and the distance D_diff between the encodings of the images in a different-class pair; the training objective is to make D_same smaller than D_diff.

For example, the dashed box in FIG. 9 contains multiple glyphs representing the encodings of the images in multiple image pairs, where a glyph's shape denotes the category and its line weight denotes the angle. The distance between the encodings of images of the same category and angle is smaller than the distance between encodings that differ in category or angle; in the figure, the distance between the two bold circles is smaller than the distances between encodings that do not share both category and angle.

The second training module 303 iteratively trains the initial angle measurement model according to D_same, D_diff and a distance loss function until a preset condition is met, obtaining the target angle measurement model.

Specifically, the second training module 303 may use the distance loss function to determine the error of the encodings of the same-class and different-class sample image pairs, and the preset condition may be that this error falls below some threshold. For example, the distance loss function may be L = max(0, D_same - D_diff + margin), where D_same is the distance between the encodings of images of the same category and angle, D_diff is the distance between the encodings of images of different categories or angles, margin is the distance margin, and L is the error. In the embodiments of this application, an encoding is also called an encoded feature, or simply a feature.

Besides the above condition, other information may also be derived through the distance loss function and the condition corresponding to that information used as the preset condition; e.g. the preset condition may be the number of training iterations of the angle measurement model reaching some threshold, without limitation here.
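The following PyTorch sketch instantiates the stated loss L = max(0, D_same − D_diff + margin) over encoded features; the Euclidean distance and the batching convention are assumptions, since the patent fixes only the loss form.

```python
import torch

def angle_metric_loss(emb_a, emb_p, emb_n, margin: float = 0.2) -> torch.Tensor:
    """L = max(0, D_same - D_diff + margin).

    emb_a/emb_p: encodings of a same-class pair (same category and angle);
    emb_a/emb_n: encodings of a different-class pair (category or angle differs).
    All inputs have shape (batch, feature_dim).
    """
    d_same = torch.norm(emb_a - emb_p, dim=-1)
    d_diff = torch.norm(emb_a - emb_n, dim=-1)
    return torch.clamp(d_same - d_diff + margin, min=0).mean()

# After training, two images are judged a same-class pair when the distance
# between their encoded features is small, e.g. below a validation-chosen threshold.
```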
Once the angle measurement model is trained, the features encoded from images of the same category and angle lie close together, while those encoded from images differing in angle or category lie far apart. Whether two images belong to the same category and angle — i.e. form a same-class image pair — can therefore be determined by comparing the distance between their two features.

Through the above training process, the target angle measurement model learns to identify same-class image pairs, whose objects share category and angle. Its processing of images does not depend on the category, so its generalization is stronger than that of a conventional classification model.

The target rotating object detection model and target angle measurement model trained through the embodiments shown in FIGS. 4, 6 and 8 are further used to infer the position and attitude of objects in images; the image processing apparatus implementing this inference function is described next.
3. Model inference phase.

The apparatus used in the model inference process is described next. Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application, in which the target rotating object detection model and target angle measurement model trained by the procedures shown in FIGS. 4, 6 and 8 are deployed. The image processing apparatus 1000 includes a rotating object detection module 1001 and an angle measurement module 1002.

The rotating object detection module 1001 is configured to determine, through the target rotating object detection model, the regression position frame of the object under test in the image to be measured, and to transmit the regression position frame to the angle measurement module 1002.

The angle measurement module 1002 is configured to crop the image to be measured according to the regression position frame from the rotating object detection module 1001, obtaining a cropped image; to determine m second rotation angles according to the regression position frame and rotate the cropped image by them, obtaining m rotated images; to determine the target image among the m rotated images, the target image forming a same-class image pair with the reference template image; and to determine, from the second rotation angle corresponding to the target image, the predicted rotation angle of the object under test relative to the reference object, where m is an integer greater than or equal to 2.

Optionally, the rotating object detection module 1001 is further configured to determine, through the target rotating object detection model, the category of the object under test and to transmit it to the angle measurement module 1002; the angle measurement module 1002 is further configured to determine, from that category, a reference template image of the same category as the object under test, and to determine the same-class image pair against that reference template image.

The processing flow of the image processing apparatus 1000 over the image to be measured — i.e. the inference of the pose of the object under test — is described in detail next.
Referring to FIG. 11, FIG. 11 is a schematic flowchart of an image processing method provided by an embodiment of this application, based on the image processing apparatus shown in FIG. 10. Its procedure is as follows:

1101. The rotating object detection module 1001 obtains the image to be measured.

1102. The rotating object detection module 1001 performs position regression on the object under test in the image to be measured, obtaining a regression position frame.

The rotating object detection module 1001 may perform position regression on the image to be measured through the target rotating object detection model, obtaining a regression position frame that represents the position of the object under test in the image to be measured.

1103. The rotating object detection module 1001 transmits the image to be measured and the information on the regression position frame to the angle measurement module 1002.

1104. The angle measurement module 1002 determines the frame rotation angle from the information on the regression position frame.

The angle measurement module 1002 may determine the angle between an edge of the regression position frame and the horizontal. In the embodiments of this application, this angle is called the frame rotation angle: the angle between any edge of the regression position frame and a horizontal or vertical edge. The edge of the regression position frame used to determine the frame rotation angle may be chosen at random or determined in other ways, e.g. by taking the longer edge of the regression position frame, without limitation here.

In the embodiments of this application, a horizontal edge is an edge in the horizontal direction and a vertical edge one in the vertical direction; both are datum edges, against which the frame rotation angle is measured. Besides horizontal or vertical, a datum edge may have other orientation characteristics, e.g. forming a given angle with the horizontal, without limitation here.
1105. The angle measurement module 1002 determines 4 second rotation angles from the frame rotation angle.

The angle measurement module 1002 may determine 4 second rotation angles from the frame rotation angle. To explain model inference more clearly, refer to FIG. 12, a schematic diagram of a model inference process provided by an embodiment of this application. As shown in FIG. 12, if the frame rotation angle is a degrees, the second rotation angles may be determined as -a, -a-90, -a-180 and -a-270 degrees.

In the embodiments of this application, the number of second rotation angles is m, and m may also be an integer greater than or equal to 2 other than 4, e.g. 5 or 8, without limitation here. The relationship between a second rotation angle and the frame rotation angle may be a certain difference; in this embodiment the difference is x times 90°, where x is any integer from 0 to 3. The difference may also follow other rules, e.g. y times some angle for any integer y, or any angle at all, without limitation here.

Note that determining the frame rotation angle in step 1104, or the m second rotation angles in step 1105, may also be implemented by modules other than the angle measurement module 1002, e.g. by the rotating object detection module 1001 or another module, as long as the angle measurement module 1002 can obtain the m second rotation angles, without limitation here.
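A sketch of steps 1104–1105 under the stated convention that the frame rotation angle lies between 0° and 90°; the edge choice and helper names are illustrative assumptions.

```python
import math

def frame_rotation_angle(corners):
    """Angle in [0, 90) degrees between one edge of the regression frame and the horizontal.

    corners: four (x, y) vertices of the rotated rectangular frame, in order.
    """
    (x0, y0), (x1, y1) = corners[0], corners[1]  # any edge; the longer edge also works
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 90.0

def second_rotation_angles(frame_angle_deg: float, m: int = 4):
    """-a, -a-90, ..., -a-(m-1)*90: candidate angles that square up the crop."""
    return [-frame_angle_deg - 90.0 * x for x in range(m)]
```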
1106. The angle measurement module 1002 crops the image to be measured according to the regression position frame, obtaining a cropped image.

Having obtained the image to be measured and the regression position frame information, the angle measurement module 1002 may cut out, according to the frame's position in the image, the part of the image to be measured inside the regression position frame, obtaining the cropped image.

Note that in the embodiments of this application step 1106 may also be executed before step 1104 or 1105, as long as it follows step 1103, without limitation here.

1107. The angle measurement module 1002 rotates the cropped image by the 4 second rotation angles, obtaining 4 rotated images.

As shown in FIG. 12, with second rotation angles of -a, -a-90, -a-180 and -a-270 degrees, all 4 rotated images have horizontal borders. In the embodiments of this application, matching the second rotation angles, the number of rotated images is likewise m, which is not repeated here.
1108. The angle measurement module 1002 determines, through the target angle measurement model, the target image among the 4 rotated images.

After obtaining the 4 rotated images, the angle measurement module 1002 may construct image pairs from each of the 4 rotated images and the images in the template image library. Specifically, one image of a pair is any of the 4 rotated images; the other image of the pair is an image from the template image library, which contains multiple reference template images whose objects may have different categories.

Since the embodiment shown in FIG. 8 trained the target angle measurement model to identify same-class image pairs, the model can pick the same-class pairs out of the constructed image pairs. Because the two objects in a same-class pair share category and angle, the image among the 4 rotated images that has the same category and angle as a reference template image can thus be determined. In the embodiments of this application, two images being of the same category and angle means the two objects they contain share category and angle; the image of the same category and angle as the reference template image is called the target image.

1109. The angle measurement module 1002 determines the predicted rotation angle from the target image.

Once the angle measurement module 1002 has determined the target image among the 4 rotated images, it can determine by what angle the cropped image was rotated to produce it — i.e. the second rotation angle corresponding to the target image — and that rotation angle is the predicted rotation angle.
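Steps 1108–1109 amount to a nearest-template search in feature space. The sketch below assumes the trained model is exposed as an `embed` function and uses a simple distance argmin; both are illustrative assumptions.

```python
import torch

def predict_rotation(rotated_images, second_angles, templates, embed):
    """Pick the rotated image closest (in feature space) to any reference template.

    rotated_images: list of m image tensors; second_angles: the m second rotation
    angles; templates: reference template image tensors; embed: trained encoder.
    """
    with torch.no_grad():
        img_feats = torch.stack([embed(im) for im in rotated_images])  # (m, d)
        tpl_feats = torch.stack([embed(tp) for tp in templates])       # (t, d)
    dists = torch.cdist(img_feats, tpl_feats)               # (m, t) pairwise distances
    best_row = divmod(int(dists.argmin()), dists.shape[1])[0]  # row of global minimum
    return second_angles[best_row]                          # predicted rotation angle
```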
Optionally, when the embodiment shown in FIG. 4 includes step 408 and the embodiment shown in FIG. 6 includes steps 604 to 606 — i.e. when the rotating object detection model has been trained to classify objects in images — the embodiment shown in FIG. 11 may also determine the predicted rotation angle with the help of the category of the object under test, through the following steps:

1110. The rotating object detection module 1001 predicts the category of the object under test in the image to be measured, obtaining a predicted category.

The target rotating object detection model is now capable of classifying objects in images. The rotating object detection module 1001 may classify the object under test in the image to be measured through the target rotating object detection model, determining its category; in the embodiments of this application, the category of the object under test is also called the predicted category.

1111. The rotating object detection module 1001 transmits the predicted category to the angle measurement module 1002.

1112. The angle measurement module 1002 determines the reference template image according to the predicted category.

The angle measurement module 1002 may determine, from the template image library and according to the predicted category, the reference template image containing a reference object of the same category as the object under test.

Steps 1110 to 1112 may be executed before any of steps 1102 to 1107, as long as they follow step 1101, without limitation here.
When steps 1110 to 1112 are present, the target image in step 1108 may be determined as follows: image pairs are constructed from the 4 rotated images and the reference template image determined in step 1112; the same-class image pairs are determined among the constructed pairs through the target angle measurement model; and the target image is determined from the same-class image pairs.

Determining the reference template image from the category of the object under test singles out the one needed image from the multiple reference template images in the template image library. This greatly reduces the number of image pairs the angle measurement module 1002 must construct, cutting the consumption of computing and storage resources; with fewer image pairs, the time and compute the target angle measurement model needs to identify same-class pairs also drop sharply, improving efficiency.

Note that steps 1110 to 1112 are not mandatory in the embodiments of this application; without them, step 1108 can still determine the target image.
Another method for inferring the attitude of an object in an image provided by the embodiments of this application is described next.

II. Inferring the attitude of an object in an image from keypoints.

An embodiment of this application provides an image processing method and an image processing apparatus for labeling the sample keypoints of the object to be labeled in sample images. The sample keypoints are used to train a keypoint detection model, so that the trained target keypoint detection model can determine keypoints in images, and the rotation angle of the object in the image is determined from the determined keypoints.
1. Model training phase.

The apparatus used in the model training process is described next. Referring to FIG. 13, FIG. 13 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. The image processing apparatus 1300 includes a keypoint labeling module 1301 and a third training module 1302.

The keypoint labeling module 1301 is configured to provide the user with a reference template image and a sample image, the reference template image containing a reference object and the sample image containing an object to be labeled; reference keypoints of the reference object are marked in the reference template image, their number being greater than or equal to 2.

In the embodiments of this application, an image including an object means the image includes a picture of the object; e.g. "the reference template image includes the reference object" means it includes a picture of the reference object, and likewise "an object in an image" means the picture of the object the image contains.

The keypoint labeling module 1301 is further configured to receive the information on the sample keypoints that the user labels for the object to be labeled in the sample image based on the reference template image and the reference keypoints, and to determine the sample rotation angle from the sample keypoint information and the reference keypoint information, the sample rotation angle being the rotation angle of the object to be labeled in the sample image relative to the reference object.

The keypoint labeling module 1301 is further configured to receive the sample keypoint category the user labels for the object to be labeled in the sample image, and to transmit the sample image, the sample keypoint information and the sample keypoint category to the third training module 1302, to train the initial keypoint detection model into the target keypoint detection model.

In the embodiments of this application, the sample keypoint category denotes the category of the sample object corresponding to the sample keypoints in the sample image containing those keypoints.

The third training module 1302 is configured to train the initial keypoint detection model with the sample keypoint information and the sample images, obtaining the target keypoint detection model; in the embodiments of this application it may therefore also be called the keypoint detection training module.

The processing flow of the image processing apparatus 1300 over sample images — i.e. the training of the keypoint detection model — is described in detail next.
Referring to FIG. 14, FIG. 14 is a schematic flowchart of an image processing method provided by an embodiment of this application, based on the image processing apparatus shown in FIG. 13. Its procedure is as follows:

1401. The keypoint labeling module 1301 obtains a sample image and a reference template image.

1402. The keypoint labeling module 1301 obtains information on the reference keypoints.

1403. The keypoint labeling module 1301 provides the user with the sample image and the reference template image marked with the reference keypoints.

1404. The keypoint labeling module 1301 receives the information on the user-labeled sample keypoints.

For steps 1401 to 1404, see steps 401 to 404 of the embodiment shown in FIG. 4; the difference is that the executing entity changes from the labeling module 301 of the embodiment of FIG. 4 to the keypoint labeling module 1301 of the embodiment of FIG. 14, not repeated here.

1405. The keypoint labeling module 1301 transmits the sample image and the sample keypoint information to the third training module 1302.
1406. The third training module 1302 performs keypoint detection on the object to be labeled in the sample image, obtaining regression sample keypoints.

After obtaining the sample image and the sample keypoint information, the third training module 1302 may perform, through the initial keypoint detection model, position regression on the points of the sample object in the sample image, obtaining regression sample keypoints.

1407. The keypoint labeling module 1301 obtains the sample keypoint category.

The keypoint labeling module 1301 may also obtain the category of the sample keypoints in the sample image. In the embodiments of this application, the category of a sample keypoint, also called the sample keypoint category, denotes the category of the sample object corresponding to the sample keypoint in the sample image containing it.

Specifically, the sample keypoint category may be labeled by the user for the sample keypoints in the sample image, or obtained in other ways, e.g. from the sample image library when the sample image is obtained in step 1401, without limitation here.

1408. The keypoint labeling module 1301 transmits the sample keypoint category to the third training module 1302.

Note that steps 1407 and 1408 may be executed before any of steps 1402 to 1406, as long as they follow step 1401, without limitation here.

1409. The third training module 1302 classifies the sample keypoints, obtaining a predicted sample keypoint category.

The third training module 1302 inputs the sample image and the sample keypoint category into the initial keypoint detection model, which predicts the category of the sample keypoints in the sample image and outputs the predicted sample keypoint category.

Note that step 1409 may be executed before any of steps 1405 to 1408, as long as it follows step 1404, without limitation here.
1410. The third training module 1302 trains the initial keypoint detection model, obtaining the target keypoint detection model.

The third training module 1302 may iteratively train the initial keypoint detection model according to the regression sample keypoint information, the sample keypoint information and the keypoint position regression loss function.

The third training module 1302 may also iteratively train the initial keypoint detection model according to the predicted sample keypoint category, the sample keypoint category and the keypoint classification loss function.

When the results of the above iterative training meet the preset conditions, the target keypoint detection model is obtained. The target keypoint detection model can accurately determine the keypoints of an object in an image, and determine the category of the object those keypoints correspond to.
The target keypoint detection model trained through the embodiment shown in FIG. 14 is specifically used to infer the attitude of objects in images; the image processing apparatus implementing this inference function is described next.

2. Model inference phase.

The apparatus used in the model inference process is described next. Referring to FIG. 15, FIG. 15 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application, in which the target keypoint detection model trained by the procedure shown in FIG. 14 is deployed. The image processing apparatus 1500 includes a keypoint detection module 1501 and an angle calculation module 1502.

The keypoint detection module 1501 is configured to determine, through the target keypoint detection model, the predicted keypoints of the object under test in the image to be measured, and to transmit the predicted keypoint information to the angle calculation module 1502.

The angle calculation module 1502 is configured to determine, from the predicted keypoint information from the keypoint detection module 1501, the rotation angle of the object under test relative to the reference object; in the embodiments of this application this angle is also called the predicted rotation angle.

The keypoint detection module 1501 is further configured to determine, through the target keypoint detection model, the predicted keypoint category and transmit it to the angle calculation module 1502.

The angle calculation module 1502 is further configured to determine, from the predicted keypoint category from the keypoint detection module 1501, a reference template image of the same category as the object under test, and to determine the predicted rotation angle against that reference template image. In the embodiments of this application, the predicted keypoint category denotes the category of the object under test corresponding to the predicted keypoints in the image to be measured containing them.
The model inference method is described next. Referring to FIG. 16, FIG. 16 is a schematic flowchart of an image processing method provided by an embodiment of this application, based on the image processing apparatus shown in FIG. 15. Its procedure is as follows:

1601. The keypoint detection module 1501 obtains the image to be measured.

1602. The keypoint detection module 1501 determines the predicted keypoints in the image to be measured.

The keypoint detection module 1501 may perform, through the target keypoint detection model, position regression on the points of the object under test in the image to be measured, obtaining the predicted keypoints.

1603. The keypoint detection module 1501 transmits the predicted keypoint information to the angle calculation module 1502.

1604. The keypoint detection module 1501 determines the predicted keypoint category.

The target keypoint detection model is now capable of classifying keypoints. With the predicted keypoints determined in step 1602, the keypoint detection module 1501 can determine the predicted keypoint category through the target keypoint detection model; the predicted keypoint category denotes the category of the object under test in the image to be measured.

1605. The keypoint detection module 1501 transmits the predicted keypoint category to the angle calculation module 1502.

Steps 1604 and 1605 may be performed simultaneously with step 1602, or before or after step 1603, without limitation here.
1606. The angle calculation module 1502 determines the predicted rotation angle.

With the predicted keypoint category obtained in step 1604, the angle calculation module 1502 can determine the reference keypoints of that category — in other words, determine from the predicted keypoint category the reference object of that category, or the reference template image containing that reference object.

The angle calculation module 1502 determines the included angle between the predicted keyline and the reference keyline. The reference keyline here is the line connecting the reference keypoints determined from the predicted keypoints; the predicted keypoints and reference keypoints having the same category means the object under test and the reference object have the same category.

Comparing the direction of the predicted keyline with that of the reference keyline gives the included angle of the predicted keyline relative to the reference keyline, which embodies the rotation angle of the object under test in the image relative to the reference object; in the embodiments of this application this angle is also called the predicted rotation angle.

In the embodiments of this application, determining the predicted rotation angle of the object under test relative to the reference object from the predicted keypoints requires no position frame for the object under test: the process of determining the predicted rotation angle is more concise, the structure of the apparatus implementing it is simpler, and the computing, storage and other resources otherwise needed to determine and operate on position frames are saved.
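A minimal end-to-end sketch of step 1606, reusing the keyline-angle helper from the earlier labeling-stage sketch; the per-category template lookup structure is an illustrative assumption.

```python
# reference_keylines: assumed mapping from category to that category's
# reference keypoints (K1, K2), taken from the reference template images.
def predicted_rotation(pred_k1, pred_k2, pred_category, reference_keylines):
    ref_k1, ref_k2 = reference_keylines[pred_category]  # template chosen by category
    # Same signed keyline-angle computation as used for the sample rotation angle
    # (sample_rotation_angle defined in the earlier sketch).
    return sample_rotation_angle(ref_k1, ref_k2, pred_k1, pred_k2)
```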
The embodiments of this application can also predict the attitude of a three-dimensional object from a two-dimensional image.

A two-dimensional image reflects the projection of a three-dimensional object onto a two-dimensional plane, and can therefore truthfully reflect the three-dimensional object's attitude. In the embodiments of this application, a feature shape can be determined from the keypoint information in the two-dimensional image, and a reference shape from the information on the reference keypoints corresponding to those keypoints. From the shape difference between the feature shape and the reference shape, the rotation angle of the three-dimensional object shown in the two-dimensional image relative to the reference object is determined, and with it the attitude of the three-dimensional object shown in the image. Here the reference keypoints are the points, in the projection of the reference object onto the two-dimensional plane, that correspond to the keypoints.

Optionally, a model may be trained to determine, from the shape difference between corresponding two-dimensional images, the rotation angle between the three-dimensional objects the images correspond to, so that the trained target model realizes the function of determining three-dimensional rotation angles from two-dimensional images.

The feature shape and the reference shape correspond to each other: if the feature shape is formed by n sample keypoints or n predicted keypoints, the reference shape is likewise formed by n reference keypoints, where n is an integer greater than or equal to 2.
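The patent leaves the shape-difference model itself unspecified. As one heavily hedged illustration — not the patent's method — when (approximate) 3-D coordinates of the corresponding keypoints are available, the rotation between the two keypoint sets can be recovered with the Kabsch algorithm:

```python
import numpy as np

def kabsch_rotation(feature_pts: np.ndarray, reference_pts: np.ndarray) -> np.ndarray:
    """3x3 rotation aligning the reference keypoints onto the feature keypoints.

    feature_pts, reference_pts: (k, 3) arrays of corresponding keypoints, k >= 2;
    an assumed stand-in for the patent's learned shape-difference model.
    """
    a = feature_pts - feature_pts.mean(axis=0)    # center both point sets
    b = reference_pts - reference_pts.mean(axis=0)
    h = b.T @ a                                   # covariance of centered sets
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # rotates reference onto feature
```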
The sample image processing apparatus in the embodiments of this application is described next. Referring to FIG. 17, FIG. 17 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. One structure of the sample image processing apparatus includes an interaction unit 1701 and a processing unit 1702.

The interaction unit 1701 is configured to provide a user with a sample image and a reference template image, where the sample image includes an object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, the rotation angle of the reference object relative to the reference template image is zero, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2.

The interaction unit 1701 is further configured to receive information on sample keypoints, obtained by the user labeling the object to be labeled in the sample image based on the reference template image.

The processing unit 1702 is configured to determine a sample rotation angle according to the information on the sample keypoints of the object to be labeled in the sample image and the information on the reference keypoints of the reference object in the reference template image, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object.

Optionally, in one implementation, the numbers of reference keypoints and sample keypoints are both 2, and the processing unit 1702 is specifically configured to determine the sample rotation angle according to the included angle between the reference keyline — the line connecting the 2 reference keypoints — and the sample keyline — the line connecting the 2 sample keypoints.

Optionally, in one implementation, the image processing apparatus further includes an obtaining unit 1703 configured to obtain the reference annotation frame of the reference template image, the reference annotation frame representing the position of the reference object in the reference template image; the processing unit 1702 is further configured to determine, according to the reference annotation frame, the sample annotation frame representing the position of the object to be labeled in the sample image.

Optionally, in one implementation, the processing unit 1702 is specifically configured to determine the sample annotation frame according to the reference annotation frame, the reference keypoint information and the sample keypoint information, where a reference positional relationship holds between the reference keypoints and the reference annotation frame, and the same reference positional relationship holds between the sample keypoints and the sample annotation frame.

Optionally, in one implementation, the processing unit 1702 is further configured to: input the sample image and the sample annotation frame information into the initial rotating object detection model, so that the model performs position regression on the object to be labeled and outputs sample regression position frame information; and iteratively train the initial rotating object detection model according to the sample regression position frame information, the sample annotation frame information and the position regression loss function until a preset condition is met, obtaining the target rotating object detection model, which is used to determine the position of the object under test in the image to be measured.

Optionally, in one implementation, the interaction unit 1701 is further configured to receive a sample category, the sample category being the category labeled by the user for the object to be labeled in the sample image; the processing unit 1702 is further configured to: input the sample category into the initial rotating object detection model, so that the model classifies the object to be labeled and outputs a predicted sample category; and iteratively train the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until a preset condition is met, obtaining the target rotating object detection model.

Optionally, in one implementation, the predicted category includes at least one of front-side information and back-side information of the object under test.

Optionally, in one implementation, the processing unit 1702 is further configured to: crop the sample image according to the sample annotation frame to obtain a cropped sample image; rotate the cropped sample image by n first rotation angles — derived from the sample rotation angle and corresponding one-to-one to the n rotated sample images, n being an integer greater than or equal to 2 — to obtain n rotated sample images; feed the n rotated sample images into the angle training gallery; determine same-class and different-class sample image pairs in the gallery, the objects in a same-class pair sharing angle and category and those in a different-class pair differing in angle or category; and train the initial angle measurement model on these pairs to obtain the target angle measurement model.

Optionally, in one implementation, the interaction unit 1701 is further configured to receive a sample category labeled by the user for the object to be labeled in the sample image, and the processing unit 1702 is specifically configured to determine the same-class and different-class sample image pairs in the angle training gallery according to the sample category.

Optionally, in one implementation, the processing unit 1702 is further configured to input the image to be measured into the target rotating object detection model, so that the model performs position regression on the object under test and outputs its regression position frame, which represents the position of the object under test in the image to be measured and is used to determine the predicted rotation angle.

Optionally, in one implementation, the processing unit 1702 is further configured to: input the image to be measured into the target rotating object detection model to obtain the regression position frame of the object under test, which represents the position of the object under test in the image to be measured; crop the image to be measured according to the regression position frame to obtain a cropped image; determine m second rotation angles according to the regression position frame, m being an integer greater than or equal to 2; rotate the cropped image by the m second rotation angles to obtain m rotated images in one-to-one correspondence with them; determine, through the target angle measurement model, the target image among the m rotated images, the object in the target image sharing category and angle with the reference object in the reference template image; and determine, among the m second rotation angles, the predicted rotation angle corresponding to the target image.

Optionally, in one implementation, the processing unit 1702 is specifically configured to: determine the frame rotation angle according to the regression position frame — the rotation angle of the regression position frame relative to a horizontal frame having horizontal edges, greater than or equal to 0° and less than or equal to 90° — and determine the m second rotation angles from it.

Optionally, in one implementation, the processing unit 1702 is specifically configured to: construct image pairs from each of the m rotated images and the images in the template image library; determine the same-class image pairs among them through the target angle measurement model, the objects in a same-class pair sharing angle and category; and determine the target image, one of the m rotated images, from the same-class image pairs.

Optionally, in one implementation, the interaction unit 1701 is further configured to receive a sample category labeled by the user for the object to be labeled in the sample image; the processing unit 1702 is further configured to: input the sample category into the initial rotating object detection model for classification, obtaining a predicted sample category; iteratively train the initial rotating object detection model according to the predicted sample category, the sample category and the classification loss function until a preset condition is met, obtaining the target rotating object detection model; predict the category of the object under test through it, obtaining a predicted category; determine the reference template image according to the predicted category, the reference object in it having that predicted category; and construct an image pair from each of the m rotated images and that reference template image, the reference template image being contained in the reference template library.

The sample image processing apparatus shown in FIG. 17 executes the methods of the foregoing embodiments shown in FIGS. 4 to 12: the processing unit 1702 executes those actions of the labeling module 301, the first training module 302, the second training module 303, the rotating object detection module 1001 and the angle measurement module 1002 that need no user interaction; the interaction unit 1701 executes those actions of the labeling module 301 that require user interaction; and the obtaining unit 1703 executes the actions of obtaining information related to the reference template image, such as the reference annotation frame or the reference keypoints. The obtaining unit 1703 may exist independently of the interaction unit 1701, or may be a part of it, without limitation here.
FIG. 18 is a schematic structural diagram of an image processing apparatus provided by an embodiment of this application. The image processing apparatus 1800 may include one or more central processing units (CPUs) 1801 and a memory 1805 storing one or more application programs or data.

The memory 1805 may be volatile memory or persistent storage. The programs stored in the memory 1805 may include one or more modules, each comprising a series of instruction operations on the image processing apparatus. Further, the central processing unit 1801 may be configured to communicate with the memory 1805 and execute on the image processing apparatus 1800 the series of instruction operations in the memory 1805.

The image processing apparatus 1800 may also include one or more communication interfaces 1803, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc. Optionally, the image processing apparatus 1800 may further include one or more power supplies 1802.

The image processing apparatus 1800 can perform the operations performed by the image processing apparatus in the embodiments shown in FIGS. 4 to 12, which are not repeated here.
Embodiments of this application also provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the image processing apparatus in the methods of the embodiments shown in FIGS. 4 to 12.

Embodiments of this application further provide a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the image processing apparatus in the methods of the embodiments shown in FIGS. 4 to 12.

The image processing apparatus provided by the embodiments of this application may specifically be a chip, comprising a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit, for example, an input/output interface, a pin or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the training device performs the steps performed by the image processing apparatus in the methods of the embodiments shown in FIGS. 4 to 12. Optionally, the storage unit may be a storage unit in the chip, such as a register or a cache, or a storage unit outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation — multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Further, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or of other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If implemented as a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. On this understanding, the technical solutions of this application — in essence, or the part contributing to the prior art, or all or part of the solutions — may be embodied as a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Claims (23)
- 1. An image processing method, characterized in that the method comprises: providing a user with a sample image and a reference template image, wherein the sample image includes an object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, the rotation angle of the reference object relative to the reference template image is zero, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2; receiving information on sample keypoints, obtained by the user labeling the object to be labeled in the sample image based on the reference template image; and determining a sample rotation angle according to the information on the sample keypoints and the information on the reference keypoints, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object.
- 2. The method according to claim 1, wherein the numbers of reference keypoints and sample keypoints are both 2, and determining the sample rotation angle comprises: determining the sample rotation angle according to the included angle between a reference keyline and a sample keyline, the reference keyline being the line connecting the 2 reference keypoints and the sample keyline being the line connecting the 2 sample keypoints.
- 3. The method according to claim 1 or 2, wherein after receiving the information on the sample keypoints the method further comprises: obtaining a reference annotation frame of the reference template image, the reference annotation frame representing the position of the reference object in the reference template image; and determining a sample annotation frame according to the reference annotation frame, the sample annotation frame representing the position of the object to be labeled in the sample image.
- 4. The method according to claim 3, wherein determining the sample annotation frame according to the reference annotation frame comprises: determining the sample annotation frame according to the reference annotation frame, the information on the reference keypoints and the information on the sample keypoints, wherein a reference positional relationship holds between the reference keypoints and the reference annotation frame, and the same reference positional relationship holds between the sample keypoints and the sample annotation frame.
- 5. The method according to claim 3 or 4, further comprising: inputting the sample image and the information on the sample annotation frame into an initial rotating object detection model, so that the initial rotating object detection model performs position regression on the object to be labeled, obtaining information on a sample regression position frame; and training the initial rotating object detection model according to the information on the sample regression position frame, the information on the sample annotation frame and a position regression loss function, obtaining a target rotating object detection model used to determine the position of an object under test in an image to be measured.
- 6. The method according to claim 3 or 4, further comprising: cropping the sample image according to the sample annotation frame, obtaining a cropped sample image; rotating the cropped sample image by n first rotation angles, obtaining n rotated sample images, the n first rotation angles being derived from the sample rotation angle and corresponding one-to-one to the n rotated sample images, n being an integer greater than or equal to 2; feeding the n rotated sample images into an angle training gallery; determining same-class sample image pairs and different-class sample image pairs in the angle training gallery, the objects in a same-class sample image pair having the same angle and category and the objects in a different-class sample image pair differing in angle or category; and training an initial angle measurement model on the same-class and different-class sample image pairs, obtaining a target angle measurement model.
- 7. The method according to claim 6, further comprising: receiving a sample category, the sample category being the category labeled by the user for the object to be labeled in the sample image; wherein determining the same-class and different-class sample image pairs in the angle training gallery comprises: determining the same-class sample image pairs and the different-class sample image pairs in the angle training gallery according to the sample category.
- 8. The method according to claim 5, further comprising: inputting the image to be measured into the target rotating object detection model, so that the target rotating object detection model performs position regression on the object under test in the image to be measured, obtaining a regression position frame of the object under test, the regression position frame representing the position of the object under test in the image to be measured and being used to determine a predicted rotation angle.
- 9. The method according to claim 6 or 7, further comprising: inputting the image to be measured into the target rotating object detection model, so that the target rotating object detection model performs position regression on the object under test in the image to be measured, obtaining the regression position frame of the object under test, the regression position frame representing the position of the object under test in the image to be measured; cropping the image to be measured according to the regression position frame, obtaining a cropped image; determining m second rotation angles according to the regression position frame, m being an integer greater than or equal to 2; rotating the cropped image by the m second rotation angles, obtaining m rotated images, the m second rotation angles corresponding one-to-one to the m rotated images; determining, through the target angle measurement model, a target image among the m rotated images, the object in the target image having the same category and angle as the reference object in the reference template image; and determining, among the m second rotation angles, the predicted rotation angle corresponding to the target image.
- 10. The method according to claim 9, wherein determining the m second rotation angles according to the regression position frame comprises: determining a frame rotation angle according to the regression position frame, the frame rotation angle being the rotation angle of the regression position frame relative to a horizontal frame having horizontal edges, and being greater than or equal to 0° and less than or equal to 90°; and determining the m second rotation angles according to the frame rotation angle.
- 11. An image processing apparatus, characterized in that the apparatus comprises: an interaction unit configured to provide a user with a sample image and a reference template image, wherein the sample image includes an object to be labeled, the reference template image includes a reference object corresponding to the object to be labeled, the rotation angle of the reference object relative to the reference template image is zero, reference keypoints of the reference object are marked in the reference template image, and the number of reference keypoints is greater than or equal to 2, and to receive information on sample keypoints, obtained by the user labeling the object to be labeled in the sample image based on the reference template image; and a processing unit configured to determine a sample rotation angle according to the information on the sample keypoints and the information on the reference keypoints, the sample rotation angle being the rotation angle of the object to be labeled relative to the reference object.
- 12. The apparatus according to claim 11, wherein the numbers of reference keypoints and sample keypoints are both 2, and the processing unit is specifically configured to determine the sample rotation angle according to the included angle between a reference keyline and a sample keyline, the reference keyline being the line connecting the 2 reference keypoints and the sample keyline being the line connecting the 2 sample keypoints.
- 13. The apparatus according to claim 11 or 12, further comprising an obtaining unit configured to obtain a reference annotation frame of the reference template image, the reference annotation frame representing the position of the reference object in the reference template image; the processing unit being further configured to determine, according to the reference annotation frame, a sample annotation frame representing the position of the object to be labeled in the sample image.
- 14. The apparatus according to claim 13, wherein the processing unit is specifically configured to determine the sample annotation frame according to the reference annotation frame, the information on the reference keypoints and the information on the sample keypoints, wherein a reference positional relationship holds between the reference keypoints and the reference annotation frame, and the same reference positional relationship holds between the sample keypoints and the sample annotation frame.
- 15. The apparatus according to claim 13 or 14, wherein the processing unit is further configured to: input the sample image and the information on the sample annotation frame into an initial rotating object detection model, so that the model performs position regression on the object to be labeled, obtaining information on a sample regression position frame; and train the initial rotating object detection model according to the information on the sample regression position frame, the information on the sample annotation frame and a position regression loss function, obtaining a target rotating object detection model used to determine the position of an object under test in an image to be measured.
- 16. The apparatus according to claim 13 or 14, wherein the processing unit is further configured to: crop the sample image according to the sample annotation frame, obtaining a cropped sample image; rotate the cropped sample image by n first rotation angles, obtaining n rotated sample images, the n first rotation angles being derived from the sample rotation angle and corresponding one-to-one to the n rotated sample images, n being an integer greater than or equal to 2; feed the n rotated sample images into an angle training gallery; determine same-class sample image pairs and different-class sample image pairs in the angle training gallery, the objects in a same-class pair having the same angle and category and the objects in a different-class pair differing in angle or category; and train an initial angle measurement model on the same-class and different-class sample image pairs, obtaining a target angle measurement model.
- 17. The apparatus according to claim 16, wherein the interaction unit is further configured to receive a sample category, the sample category being the category labeled by the user for the object to be labeled in the sample image, and the processing unit is specifically configured to determine the same-class sample image pairs and the different-class sample image pairs in the angle training gallery according to the sample category.
- 18. The apparatus according to claim 15, wherein the processing unit is further configured to input the image to be measured into the target rotating object detection model, so that the model performs position regression on the object under test in the image to be measured, obtaining a regression position frame of the object under test, the regression position frame representing the position of the object under test in the image to be measured and being used to determine a predicted rotation angle.
- 19. The apparatus according to claim 16 or 17, wherein the processing unit is further configured to: input the image to be measured into the target rotating object detection model, so that the model performs position regression on the object under test, obtaining its regression position frame, which represents the position of the object under test in the image to be measured; crop the image to be measured according to the regression position frame, obtaining a cropped image; determine m second rotation angles according to the regression position frame, m being an integer greater than or equal to 2; rotate the cropped image by the m second rotation angles, obtaining m rotated images in one-to-one correspondence with them; determine, through the target angle measurement model, a target image among the m rotated images, the object in the target image having the same category and angle as the reference object in the reference template image; and determine, among the m second rotation angles, the predicted rotation angle corresponding to the target image.
- 20. The apparatus according to claim 19, wherein the processing unit is specifically configured to: determine a frame rotation angle according to the regression position frame, the frame rotation angle being the rotation angle of the regression position frame relative to a horizontal frame having horizontal edges, and being greater than or equal to 0° and less than or equal to 90°; and determine the m second rotation angles according to the frame rotation angle.
- 21. A computer program product which, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 10.
- 22. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 10.
- 23. An image processing apparatus comprising a processor and a memory, the processor being coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory, so that the processor performs the method according to any one of claims 1 to 10.