WO2022021029A1 - Detection model training method and device, detection model using method and storage medium - Google Patents


Info

Publication number
WO2022021029A1
WO2022021029A1 (PCT/CN2020/104973, CN2020104973W)
Authority
WO
WIPO (PCT)
Prior art keywords
detection model
feature
target
salient
image
Prior art date
Application number
PCT/CN2020/104973
Other languages
French (fr)
Chinese (zh)
Inventor
张雪
席迎来
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080015995.2A priority Critical patent/CN113490947A/en
Priority to PCT/CN2020/104973 priority patent/WO2022021029A1/en
Publication of WO2022021029A1 publication Critical patent/WO2022021029A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present application relates to the technical field of computer vision, and in particular, to a detection model training method, an apparatus, a detection model use method, and a storage medium.
  • identifying objects in images has become one of the most important techniques in computer vision, and deep learning has achieved major breakthroughs in image object detection. For example, the region where a face is located can be identified from a given image.
  • existing detection models focus on the accuracy of the detection result, so they tend to be large in scale, which makes them run slowly and prevents deployment on mobile terminals with limited resources. If the scale of the model is simply reduced for use on a mobile terminal, the performance of the detection model cannot be guaranteed, and the scope of use of the model is limited.
  • Embodiments of the present application provide a detection model training method, device, detection model use method, and storage medium, which can reduce the scale of the first detection model and improve the reliability and accuracy of the first detection model training.
  • an embodiment of the present application provides a detection model training method, including:
  • performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a second detection model to obtain second feature information;
  • determining a salient region corresponding to a target object based on position information of the target object in the sample image;
  • acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region;
  • adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • an embodiment of the present application further provides a detection model training apparatus, including a processor and a memory, where a computer program is stored in the memory, and the processor, when calling the computer program in the memory, executes any of the detection model training methods provided in the embodiments of the present application.
  • an embodiment of the present application also provides a method for using a detection model, applied to computer equipment, where the detection model is a trained first detection model, the trained first detection model is a model obtained by any of the detection model training methods provided in the embodiments of the present application, and the model is deployed in the computer equipment; the detection model using method includes:
  • the target object in the image is detected by the trained first detection model, and the target position information of the target object in the image is obtained.
  • an embodiment of the present application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is loaded by a processor to execute:
  • performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a second detection model to obtain second feature information; determining a salient region corresponding to a target object based on position information of the target object in the sample image; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • In this embodiment of the present application, feature extraction may be performed on the sample image by the first detection model to obtain the first feature information, and by the second detection model to obtain the second feature information. The salient region corresponding to the target can then be determined based on the position information of the target in the sample image; the first salient region feature is obtained according to the first feature information and the salient region, and the second salient region feature is obtained according to the second feature information and the salient region. Finally, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • This solution uses the trained second detection model to accurately train the first detection model, so that the trained first detection model can be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model. Moreover, training the first detection model based on the salient region determined for the target and its salient region features improves the reliability and accuracy of the training, giving the first detection model a wide scope of application.
  • FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of extracting an image of the area where a target is located, provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a process for preprocessing an initial image and key points of a human face provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of generating multiple candidate regions provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for using a detection model provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of training a first detection model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a detection model training apparatus provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of the application.
  • the detection model training method can be applied to a detection model training device, and is used to accurately train the smaller-scale first detection model by using the second detection model.
  • the detection model training device may include a mobile phone, a computer, a server, or an unmanned aerial vehicle.
  • the UAV can be a rotary-wing UAV, such as a quad-rotor, hexa-rotor, or octa-rotor UAV, a fixed-wing UAV, or a combination of rotary-wing and fixed-wing UAVs, which is not limited here.
  • the detection model training method may include steps S101 to S104 and so on.
  • the first detection model and the second detection model can be flexibly set according to actual needs, and the specific types are not limited here.
  • the first detection model and the second detection model can be neural networks.
  • the detection model training method can be applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
  • the distillation algorithm may use one or more trained teacher models (also called Teacher models, which may be larger-scale models) to guide the training of a student model (also called a Student model, which may be a smaller-scale model).
  • the process of the distillation algorithm can be: Teacher model training, Student model training, and joint training of the Teacher model and the Student model to improve the performance of the Student model.
  • the teacher model and the student model can first be trained separately on sample images. After they are trained separately, the parameters of the teacher model are fixed, that is, the teacher model only performs feature extraction and no parameter update is performed, while the student model continues with distillation training.
  • there are a small number of distillation techniques that can be applied to detection models, but they are based on two-stage target detection technology and are not applicable to one-stage target detection.
  • the embodiment of the present application can obtain the salient region corresponding to the target object, and thereby obtain the salient region features of the first detection model (student model) and the second detection model (teacher model), so as to train the first detection model based on the salient region features of the two. This applies not only to distillation for two-stage target detection but also to distillation for one-stage target detection, which has wider practicability and improves training efficiency.
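  • As a minimal numerical sketch of this frozen-teacher, trainable-student arrangement (a one-parameter toy model, not the actual detection networks; all names and values are illustrative):

```python
import numpy as np

# Illustrative only: a one-parameter "student" distilled toward a frozen "teacher".
teacher_w = 2.0                 # teacher parameters are fixed (feature extraction only)
student_w = 0.0                 # student parameters are updated during distillation
x = np.array([1.0, 2.0, 3.0])   # stand-in for "sample images"

lr = 0.05
for _ in range(200):
    t_out = teacher_w * x       # teacher only performs a forward pass; no update
    s_out = student_w * x
    # gradient of the L2 distillation loss w.r.t. the student parameter
    grad = 2 * ((s_out - t_out) * x).mean()
    student_w -= lr * grad

print(round(student_w, 3))  # 2.0: the student converges toward the teacher
```

The teacher's parameter never changes during the loop, mirroring the "teacher only performs feature extraction" step above.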
  • the scale of the first detection model is smaller than that of the second detection model, and the second detection model is an already-trained model.
  • the pre-trained second detection model may be used to guide the training of the first detection model.
  • the sample image may be acquired by a collection device such as a camera or video camera, or the sample image may be acquired from a preset local database or server, or the sample image may be generated by preprocessing, such as rotation or scaling, an initial image obtained from acquisition.
  • the sample image may contain a target, and the type of the target may be flexibly set according to actual needs.
  • the target may include objects such as a human face, a vehicle, a ball, or a dog.
  • a sample image can include multiple images, and the size of each sample image can be the same or different.
  • a sample image can contain one or more targets of the same type, or targets of multiple types, which are not specifically limited here.
  • the detection model training method may further include: acquiring an initial image; extracting an image of the region where the target is located from the initial image; extracting key points of the target from the region image; preprocessing the initial image and the key points to obtain the sample image and preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
  • the obtained initial images can be preprocessed to obtain abundant sample images, so as to use the abundant sample images to train the first detection model, so as to solve the problem of limited existing data resources.
  • the initial image may be collected by a collection device such as a camera or video camera, or the initial image may be obtained from a preset local database or server, or the like.
  • the initial image may contain objects.
  • the types of objects may include objects such as faces, vehicles, balls, or dogs.
  • the image of the area where the target is located can be extracted from the initial image.
  • for example, the image of the area where the user's face is located can be extracted from an initial image containing the user; for another example, the image of the area where a vehicle is located can be extracted from an initial image containing the vehicle.
  • the key points of the target can be extracted from the area image, and the number, shape, position or size of the key points can be flexibly set according to actual needs, and the specific content is not limited here.
  • key points such as eyes, nose, mouth, and contours of the face can be extracted from the image of the area where the face is located.
  • key points such as the wheels, lights, windows, and body of the vehicle can be extracted from the image of the area where the vehicle is located.
  • the initial image can be preprocessed to obtain a sample image
  • the keypoints can be preprocessed to obtain preprocessed keypoints.
  • preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points may include: rotating, translating, zooming and/or adjusting the brightness of the initial image and the key points according to a preset angle, to obtain the sample image and the preprocessed key points.
  • the preprocessing may be flexibly set according to actual needs, for example, the preprocessing may include processing such as rotation, cropping, flipping, translation, scaling, brightness reduction and/or brightness enhancement.
  • the preset angle can be flexibly set according to actual needs.
  • the way of preprocessing the initial image and the way of preprocessing keypoints can be consistent or inconsistent. For example, both the initial image and the key points can be rotated 90 degrees clockwise to obtain the sample image and the preprocessed key points; for another example, the initial image can be rotated 90 degrees clockwise to obtain the sample image, and the The key points are rotated 45 degrees clockwise to obtain the pre-processed key points.
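  • The consistent-preprocessing case above (rotating the image and its key points together) can be sketched as follows; the helper name is ours, and (x, y) pixel-coordinate key points with a 90-degree clockwise rotation are assumed:

```python
import numpy as np

def rotate90_cw(image, keypoints):
    """Rotate an image and its keypoints 90 degrees clockwise together,
    so the keypoints stay aligned with the rotated image.
    keypoints are (x, y) pixel coordinates; image is (H, W)."""
    h, w = image.shape[:2]
    rotated = np.rot90(image, k=-1)               # k=-1 rotates clockwise
    # under a 90-degree clockwise rotation, (x, y) -> (h - 1 - y, x)
    new_pts = [(h - 1 - y, x) for (x, y) in keypoints]
    return rotated, new_pts

img = np.arange(12).reshape(3, 4)                 # 3x4 toy "image"
pts = [(0, 0), (3, 2)]                            # top-left and bottom-right pixels
rot_img, rot_pts = rotate90_cw(img, pts)
print(rot_img.shape)  # (4, 3)
print(rot_pts)        # [(2, 0), (0, 3)]
```

Rotating the key points by a different angle than the image (the "inconsistent" case in the text) would simply use a different coordinate mapping for `new_pts`.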
  • the position information of the object in the sample image can be determined according to the preprocessed key points.
  • the position of each preprocessed key point in the sample image can be determined, and according to these positions, the region of the target object in the sample image can be generated; the region can be a rectangle or a square, and the position information of the target object in the sample image is determined based on the region of the target object in the sample image.
  • the position information may be the pixel coordinates of the target object, or the pixel coordinates of the vertex of the region of the target object in the sample image, or the like.
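  • Generating a rectangular region from preprocessed key points can be sketched as the smallest axis-aligned box covering all key points (the helper and its optional `margin` parameter are illustrative assumptions, not part of the application):

```python
import numpy as np

def bbox_from_keypoints(keypoints, margin=0):
    """Smallest axis-aligned rectangle covering all keypoints, optionally
    expanded by a margin. Returns (x_min, y_min, x_max, y_max)."""
    pts = np.asarray(keypoints, dtype=float)
    x_min, y_min = pts.min(axis=0) - margin
    x_max, y_max = pts.max(axis=0) + margin
    return (float(x_min), float(y_min), float(x_max), float(y_max))

# e.g. face keypoints: eyes, nose, mouth corners
face_pts = [(30, 40), (70, 40), (50, 60), (38, 75), (62, 75)]
print(bbox_from_keypoints(face_pts))            # (30.0, 40.0, 70.0, 75.0)
print(bbox_from_keypoints(face_pts, margin=5))  # (25.0, 35.0, 75.0, 80.0)
```

The returned vertex coordinates correspond to the "pixel coordinates of the vertex of the region" form of position information mentioned above.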
  • as shown in FIG. 3, the process of preprocessing the initial image and the key points of the face realizes automatic preprocessing of the initial image and key points (also known as data enhancement processing), saving time and effort. It should be noted that the initial image and key points can also be preprocessed manually.
  • feature extraction can be performed on the sample image through the first detection model to obtain first feature information
  • feature extraction can be performed on the sample image through the second detection model to obtain second feature information
  • the salient area pos-anchors corresponding to the target can be determined based on the position information of the target in the sample image through the first detection model.
  • the salient region can be a region that is convenient for model learning, and may include only positive sample regions, or both positive sample regions and negative sample regions, etc.
  • determining the salient region corresponding to the target object based on the position information of the target object in the sample image may include: acquiring a plurality of candidate regions; determining the target region of the target object based on the position information of the target object; selecting, from the plurality of candidate regions, regions whose degree of coincidence with the target region is greater than a first preset threshold, to obtain positive sample regions; selecting, from the plurality of candidate regions, regions whose degree of coincidence with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold, to obtain negative sample regions, where the preset range can be an interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target.
  • salient regions containing positive sample regions and negative sample regions can be obtained to train the model.
  • multiple candidate regions may be acquired.
  • acquiring multiple candidate regions may include: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.
  • the shape or size of the candidate region can be flexibly set according to actual needs.
  • the sample image can be detected by the second detection model to generate multiple candidate regions;
  • alternatively, multiple pre-labeled candidate regions can be acquired, and these candidate regions can be labeled manually or automatically.
  • the target region of the target object is determined based on the position information; for example, it may be determined based on the pixel coordinates of the four vertices of the quadrilateral where the target object is located. Then, the degree of coincidence between each candidate region and the target region can be calculated separately.
  • the intersection over union (IoU) algorithm can be used to calculate the degree of coincidence between each candidate region and the target region: obtain the intersection area between the candidate region and the target region and the union area between the candidate region and the target region, and calculate the degree of coincidence according to the intersection area and the union area.
  • the degree of coincidence between the candidate region and the target region can be calculated as formula (1):
  • IOU(A, B) = area(A ∩ B) / area(A ∪ B)   (1)
  • where IOU(A, B) represents the degree of coincidence between candidate region A and target region B, area(A ∩ B) represents the intersection area between candidate region A and target region B, and area(A ∪ B) represents the union area between candidate region A and target region B.
  • the degree of coincidence with the target region can be calculated by formula (1).
  • the degree of coincidence of each object can be calculated separately.
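  • Formula (1) can be sketched in code as follows, assuming boxes given as (x_min, y_min, x_max, y_max) tuples:

```python
def iou(box_a, box_b):
    """Degree of coincidence per formula (1): intersection area / union area.
    Boxes are (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 when boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

candidate = (0, 0, 2, 2)   # candidate region A
target = (1, 1, 3, 3)      # target region B
print(iou(candidate, target))  # 1/7 ≈ 0.1429: intersection 1, union 4 + 4 - 1
```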
  • a region whose coincidence degree with the target region is greater than the first preset threshold can be selected from the multiple candidate regions to obtain a positive sample region.
  • the specific value of the first preset threshold can be flexibly set according to actual needs. If the coincidence degree between the candidate area and the target area is greater than the first preset threshold, it indicates that the similarity between the candidate area and the target area is high.
  • the classification probability value of each candidate region is calculated, and the value range of the classification probability value may be 0 to 1, for example, the classification probability value of the candidate region being a face region is 0.6 or 0.9.
  • a region whose degree of coincidence with the target region is within the preset range and whose classification probability value is greater than the preset probability threshold can be selected from the multiple candidate regions to obtain a negative sample region, where the preset range is an interval smaller than the first preset threshold and greater than the second preset threshold; the specific value of the second preset threshold can be flexibly set according to actual needs.
  • the positive sample region and the negative sample region can be set as the salient regions corresponding to the target.
  • not only the positive sample regions but also the information of the negative sample regions is used to train the first detection model, so that the training is more sufficient, the obtained first detection model is more accurate and reliable, and the problem of limited existing training resources is alleviated.
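  • The positive/negative selection described above might be sketched as follows; the threshold values 0.7, 0.3, and 0.8 are illustrative assumptions (the application leaves them to actual needs), and the helper names are ours:

```python
def iou(box_a, box_b):
    # degree of coincidence: intersection area over union area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_salient_regions(candidates, target, probs, t1=0.7, t2=0.3, p_thresh=0.8):
    """Split candidates into positive samples (coincidence above the first
    threshold t1) and negative samples (coincidence between the second
    threshold t2 and t1, but confidently classified)."""
    positives, negatives = [], []
    for box, p in zip(candidates, probs):
        overlap = iou(box, target)
        if overlap > t1:
            positives.append(box)          # high coincidence with the target region
        elif t2 < overlap < t1 and p > p_thresh:
            negatives.append(box)          # hard negative: confident but wrong
    return positives, negatives

target = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (0, 0, 10, 5), (0, 0, 4, 4), (20, 20, 30, 30)]
probs = [0.9, 0.9, 0.1, 0.95]
pos, neg = select_salient_regions(candidates, target, probs)
print(pos)  # [(0, 0, 10, 10)]
print(neg)  # [(0, 0, 10, 5)]
```

Together, `pos` and `neg` form the salient regions used for training.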
  • the first salient region feature and the second salient region feature may be flexibly set according to actual needs, and the specific content is not limited here.
  • the first salient region feature may be a feature related to the first feature information in the salient region
  • the second salient region feature may be a feature related to the second feature information in the salient region.
  • acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, may include: obtaining the first feature information in the positive sample regions and the negative sample regions, respectively, to obtain the first salient region feature; and obtaining the second feature information in the positive sample regions and the negative sample regions, respectively, to obtain the second salient region feature.
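  • A minimal sketch of collecting feature information inside the salient regions (assuming a single-channel feature map and region coordinates already in feature-map space; both are simplifying assumptions):

```python
import numpy as np

def region_features(feature_map, regions):
    """Collect feature values inside each salient region (positive and
    negative sample regions). feature_map is (H, W); regions are
    (x_min, y_min, x_max, y_max) in feature-map coordinates."""
    return [feature_map[y1:y2, x1:x2] for (x1, y1, x2, y2) in regions]

fmap = np.arange(16).reshape(4, 4)        # stand-in for first/second feature information
salient = [(0, 0, 2, 2), (2, 2, 4, 4)]    # e.g. a positive and a negative sample region
feats = region_features(fmap, salient)
print([int(f.sum()) for f in feats])  # [10, 50]
```

Applying the same crop to the student's and the teacher's feature maps yields the first and second salient region features, respectively.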
  • the first detection model is used to detect the type and position of the target.
  • adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model may include: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model when detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • the similarity between the first salient region feature and the second salient region feature can be obtained, and the similarity can be characterized by the Euclidean distance.
  • the similarity includes Euclidean distance
  • obtaining the similarity between the first salient region feature and the second salient region feature may include: determining the similarity between the first salient region feature and the second salient region feature Euclidean distance to get the similarity between the first salient region feature and the second salient region feature.
  • the Euclidean distance L2-loss (distill-loss) between the first salient region feature and the second salient region feature can be calculated; this L2-loss characterizes the similarity between the two features. The loss value loss obtained by the first detection model when detecting the sample image is then acquired, and the parameters of the first detection model can be adjusted according to the similarity L2-loss and the loss value loss to obtain the trained first detection model.
  • adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model may include: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  • the parameters of the first detection model may be adjusted according to the target loss value, so that the parameters are adjusted to appropriate values and the trained first detection model is obtained. Therefore, under the constraint of limited computing resources, a high-precision trained first detection model that meets the requirements can be obtained while achieving the same effect, saving a large amount of data collection, time, and resources.
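  • The weighted-average combination of the distillation similarity (L2-loss) and the detection loss might be sketched as below; the weight `alpha` and the use of a plain scalar detection loss are assumptions, not values from the application:

```python
import numpy as np

def target_loss(student_feat, teacher_feat, det_loss, alpha=0.5):
    """Target loss as a weighted average of the distillation similarity
    (Euclidean / L2 distance between salient-region features) and the
    detection loss of the first model."""
    l2 = float(np.sqrt(((student_feat - teacher_feat) ** 2).sum()))  # distill-loss
    return alpha * l2 + (1 - alpha) * det_loss

s = np.array([1.0, 2.0, 2.0])   # first salient region feature (student)
t = np.array([1.0, 0.0, 2.0])   # second salient region feature (teacher)
print(target_loss(s, t, det_loss=4.0))  # 0.5 * 2.0 + 0.5 * 4.0 = 3.0
```

The student's parameters would then be updated by backpropagating this single scalar, so both constraints act on the model at once.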
  • after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature and the trained first detection model is obtained, the detection model training method may further include: acquiring an image to be detected; and detecting the target object in the image by the trained first detection model to obtain the target position information of the target object in the image.
  • the trained first detection model can be used to accurately detect the target in the image.
  • the image to be detected may be collected by a collection device such as a camera or video camera, or the image to be detected may be obtained from a preset local database or server, or the like.
  • the target object in the image can be detected by the trained first detection model, and the target position information of the target object in the image can be obtained.
  • the trained first detection model can be used to detect the face in the image to obtain the target position information of the face in the image, and the target position information can be the vertex position of the polygon (for example, quadrilateral) face frame.
  • This embodiment of the present application may perform feature extraction on the sample image by using the first detection model to obtain the first feature information, and perform feature extraction on the sample image by using the second detection model to obtain the second feature information. Then, the salient region corresponding to the target can be determined based on the position information of the target in the sample image, the first salient region feature can be obtained according to the first feature information and the salient region, and the second salient region can be obtained according to the second feature information and the salient region. salient regional features. At this time, the parameters of the first detection model may be adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  • the second detection model can be used to accurately train the smaller-scale first detection model, so that the trained first detection model can be applied on a mobile terminal to detect targets, which reduces the scale of the first detection model. Training the first detection model based on the salient region determined for the target and its salient region features can improve the reliability and accuracy of the training, giving the first detection model a wide scope of application.
  • FIG. 5 is a schematic flowchart of a method for using a detection model provided by an embodiment of the application.
  • the method for using the detection model can be applied to computer equipment for accurately detecting the target in the image based on the trained first detection model.
  • the computer equipment may include mobile terminals, drones, servers, cameras, etc., and the mobile terminals may include mobile phones and tablet computers.
  • the detection model is a trained first detection model, and the trained first detection model is a model obtained by using the above-mentioned detection model training method, and is deployed in a computer device.
  • the process of training the first detection model may include:
  • the method for using the detection model may include steps S201 to S202 and so on.
  • S202 Detect the target in the image by using the trained first detection model to obtain target position information of the target in the image.
  • the image to be detected may be collected by a collection device such as a camera or video camera, or the image to be detected may be obtained from a preset local database or server, or the like.
  • the target object in the image can be detected by the trained first detection model, and the target position information of the target object in the image can be obtained.
  • the trained first detection model can be used to detect the face in the image to obtain the target position information of the face in the image, and the target position information can be the vertex position of the polygon (for example, quadrilateral) face frame.
  • the first detection model after training is used to accurately detect the target in the image.
  • FIG. 7 is a schematic block diagram of a detection model training apparatus provided by an embodiment of the present application.
  • the detection model training apparatus 11 may include a processor 111 and a memory 112, and the processor 111 and the memory 112 are connected through a bus, such as an I2C (Inter-integrated Circuit) bus.
  • the processor 111 may be a micro-controller unit (Micro-controller Unit, MCU), a central processing unit (Central Processing Unit, CPU), or a digital signal processor (Digital Signal Processor, DSP) or the like.
  • the memory 112 may be a Flash chip, a read-only memory (ROM), a magnetic disk, an optical disk, a USB flash drive, or a removable hard disk, etc., and may be used to store computer programs.
  • the processor 111 is configured to call the computer program stored in the memory 112, and implement the detection model training method provided by the embodiment of the present application when executing the computer program, for example, the following steps may be performed:
  • performing feature extraction on the sample image through the first detection model to obtain first feature information, and performing feature extraction on the sample image through the second detection model to obtain second feature information; determining the salient region corresponding to the target based on the position information of the target in the sample image; acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.
  • when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor 111 is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained by the first detection model when detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the similarity and the loss value to obtain the trained first detection model, the processor 111 is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
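The weighted-average step described in the bullet above can be sketched as follows; the weight values `w_sim` and `w_det` are illustrative assumptions and are not specified by the embodiment:

```python
def target_loss(similarity, detection_loss, w_sim=0.5, w_det=0.5):
    # Weighted average of the feature-similarity term (between the
    # student's and teacher's salient region features) and the
    # detection loss of the first detection model on the sample image.
    # The weights are illustrative, not values given by the embodiment.
    return (w_sim * similarity + w_det * detection_loss) / (w_sim + w_det)

loss = target_loss(similarity=0.8, detection_loss=0.4)  # 0.6 with equal weights
```

The resulting target loss value would then drive an ordinary gradient-based parameter update of the first detection model.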
  • the similarity includes Euclidean distance.
  • when acquiring the similarity between the first salient region feature and the second salient region feature, the processor 111 is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
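The Euclidean-distance similarity mentioned above can be sketched as follows for two salient region features; flattening the feature maps into plain vectors first is an assumption about how the features are laid out:

```python
import math

def euclidean_distance(feat_a, feat_b):
    # L2 distance between two flattened salient-region feature vectors;
    # a smaller distance means the student's features are closer to
    # the teacher's.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))

d = euclidean_distance([1.0, 2.0, 3.0], [1.0, 0.0, 3.0])  # 2.0
```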
  • when determining the salient region corresponding to the target object based on the position information of the target object in the sample image, the processor 111 is configured to execute: acquiring a plurality of candidate regions; determining the target region of the target object based on the position information; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is greater than a first preset threshold to obtain positive sample regions; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, the preset range being the interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient region corresponding to the target object.
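The screening of positive and negative sample regions can be sketched as follows. Intersection-over-union (IoU) is used here as one plausible measure of the "coincidence degree", and the threshold values `hi`, `lo`, and `p_thr` are illustrative assumptions, not values fixed by the embodiment:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union,
    # one common way to measure the coincidence degree of two regions.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def screen_regions(candidates, probs, target, hi=0.7, lo=0.3, p_thr=0.5):
    # hi = first preset threshold, lo = second preset threshold,
    # p_thr = preset probability threshold (all illustrative values).
    positives, negatives = [], []
    for box, p in zip(candidates, probs):
        overlap = iou(box, target)
        if overlap > hi:
            positives.append(box)            # high overlap: positive sample
        elif lo < overlap < hi and p > p_thr:
            negatives.append(box)            # partial overlap but confidently
    return positives, negatives              # (mis)classified: negative sample
```

Together, the positive and negative sample regions form the salient region on which the student and teacher features are compared.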
  • when acquiring the plurality of candidate regions, the processor 111 is configured to execute: generating the plurality of candidate regions based on the second detection model, or acquiring a plurality of pre-labeled candidate regions.
  • when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor 111 is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions, respectively, to obtain the first salient region feature; and acquiring the second feature information in the positive sample regions and the negative sample regions, respectively, to obtain the second salient region feature.
  • the processor 111 is further configured to execute: acquiring an image to be detected; and detecting the target object in the image through the trained first detection model to obtain the target position information of the target object in the image.
  • before the feature extraction is performed on the sample image by the first detection model, the processor 111 is configured to execute: acquiring an initial image; extracting an image of the region where the target object is located from the initial image; extracting key points of the target object from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
  • when the initial image and the key points are preprocessed to obtain the sample image and the preprocessed key points, the processor 111 is configured to execute: performing rotation by a preset angle, translation, scaling and/or brightness adjustment on the initial image and the key points to obtain the sample image and the preprocessed key points.
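The rotation part of this preprocessing can be sketched for the key points as follows; in practice, the same rotation would also be applied to the image itself (for example with an affine warp) so that the key-point labels stay aligned with the pixels:

```python
import math

def rotate_keypoints(keypoints, angle_deg, center):
    # Rotates key points by a preset angle about `center`.
    # The identical rotation must be applied to the image so that
    # the preprocessed key points still mark the same image content.
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [(c * (x - cx) - s * (y - cy) + cx,
             s * (x - cx) + c * (y - cy) + cy)
            for x, y in keypoints]

pts = rotate_keypoints([(1.0, 0.0)], 90, (0.0, 0.0))  # ~(0.0, 1.0)
```

Translation, scaling, and brightness adjustment follow the same pattern: geometric transforms touch both the image and the key points, while brightness adjustment touches only the pixel values.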
  • the target object includes a human face.
  • the scale of the first detection model is smaller than that of the second detection model, and the second detection model is a trained model.
  • the detection model training apparatus is applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.
  • Embodiments of the present application further provide a storage medium. The storage medium is a computer-readable storage medium storing a computer program, the computer program includes program instructions, and a processor executes the program instructions to implement the detection model training method provided by the embodiments of the present application. For example, the processor can execute:
  • performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information; determining a salient region corresponding to a target object based on position information of the target object in the sample image; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to execute: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained when the first detection model detects the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model.
  • when the parameters of the first detection model are adjusted according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value to obtain the trained first detection model.
  • the similarity includes Euclidean distance.
  • when acquiring the similarity between the first salient region feature and the second salient region feature, the processor is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature to obtain the similarity between the first salient region feature and the second salient region feature.
  • when determining the salient region corresponding to the target object based on the position information of the target object in the sample image, the processor is configured to execute: acquiring a plurality of candidate regions; determining the target region of the target object based on the position information; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is greater than a first preset threshold to obtain positive sample regions; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold to obtain negative sample regions, the preset range being the interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient region corresponding to the target object.
  • when acquiring the plurality of candidate regions, the processor is configured to execute: generating the plurality of candidate regions based on the second detection model, or acquiring a plurality of pre-labeled candidate regions.
  • when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor is configured to execute: acquiring the first feature information in the positive sample regions and the negative sample regions, respectively, to obtain the first salient region feature; and acquiring the second feature information in the positive sample regions and the negative sample regions, respectively, to obtain the second salient region feature.
  • the processor is further configured to execute: acquiring an image to be detected; and detecting the target object in the image through the trained first detection model to obtain the target position information of the target object in the image.
  • before the feature extraction is performed on the sample image by the first detection model, the processor is configured to execute: acquiring an initial image; extracting an image of the region where the target object is located from the initial image; extracting key points of the target object from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
  • when the initial image and the key points are preprocessed to obtain the sample image and the preprocessed key points, the processor is configured to execute: performing rotation by a preset angle, translation, scaling and/or brightness adjustment on the initial image and the key points to obtain the sample image and the preprocessed key points.
  • the target object includes a human face.
  • the scale of the first detection model is smaller than that of the second detection model, and the second detection model is a trained model.
  • the storage medium is applied to a distillation algorithm, where the first detection model is a student model and the second detection model is a teacher model.
  • the storage medium may be an internal storage unit of the detection model training apparatus described in any of the foregoing embodiments, such as a hard disk or a memory of the detection model training apparatus.
  • the storage medium can also be an external storage device of the detection model training apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the detection model training apparatus.
  • since the computer program stored in the storage medium can execute any detection model training method provided by the embodiments of the present application, the beneficial effects that can be achieved by any detection model training method provided by the embodiments of the present application can also be realized.
  • for details of the beneficial effects, refer to the foregoing embodiments, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A detection model training method and device, a detection model using method and a storage medium. The method comprises: performing feature extraction on a sample image by means of a first detection model to obtain first feature information, and performing feature extraction on the sample image by means of a trained second detection model to obtain second feature information (S101); on the basis of position information of a target object in the sample image, determining a salient region corresponding to the target object (S102); acquiring first salient region features according to the first feature information and the salient region, and acquiring second salient region features according to the second feature information and the salient region (S103); and adjusting parameters of the first detection model according to the first salient region features and the second salient region features to obtain a trained first detection model (S104). The training reliability and accuracy of the first detection model are improved.

Description

Detection model training method and apparatus, detection model using method, and storage medium

Technical Field
The present application relates to the technical field of computer vision, and in particular, to a detection model training method and apparatus, a detection model using method, and a storage medium.
Background
With the development of science and technology and the rise of deep learning, recognizing target objects in images has become one of the most important technologies in computer vision, and the application of deep learning to image target detection has achieved great breakthroughs. For example, the region where a human face is located can be identified from a given image.
At present, the technical focus of target detection algorithms in existing detection models is on the accuracy of the detection results. Existing detection models are therefore large in scale, which makes them run slowly and prevents them from being deployed on mobile terminals with limited resources. If the model scale is reduced for deployment on a mobile terminal, the performance of the detection model cannot be guaranteed, which limits the scope of use of the model.
SUMMARY OF THE INVENTION
Embodiments of the present application provide a detection model training method and apparatus, a detection model using method, and a storage medium, which can reduce the scale of the first detection model and improve the reliability and accuracy of training the first detection model.
In a first aspect, an embodiment of the present application provides a detection model training method, including:
performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;
determining a salient region corresponding to a target object based on position information of the target object in the sample image;
acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and
adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
In a second aspect, an embodiment of the present application further provides a detection model training apparatus, including a processor and a memory, where a computer program is stored in the memory, and the processor, when calling the computer program in the memory, executes any detection model training method provided by the embodiments of the present application.
In a third aspect, an embodiment of the present application further provides a detection model using method, applied to a computer device. The detection model is a trained first detection model, the trained first detection model is a model obtained by training with any detection model training method provided by the embodiments of the present application, and it is deployed in the computer device. The detection model using method includes:
acquiring an image to be detected; and

detecting the target object in the image through the trained first detection model to obtain target position information of the target object in the image.
In a fourth aspect, an embodiment of the present application further provides a storage medium for storing a computer program, where the computer program is loaded by a processor to execute:
performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a trained second detection model to obtain second feature information;

determining a salient region corresponding to a target object based on position information of the target object in the sample image;

acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and

adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
In the embodiments of the present application, feature extraction may be performed on a sample image through the first detection model to obtain first feature information, and through the second detection model to obtain second feature information. Then, the salient region corresponding to the target object may be determined based on the position information of the target object in the sample image; the first salient region feature may be acquired according to the first feature information and the salient region, and the second salient region feature may be acquired according to the second feature information and the salient region. The parameters of the first detection model may then be adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model. This solution can use the trained second detection model to accurately train the first detection model, so that the trained first detection model can subsequently be deployed on a mobile terminal to detect target objects, which reduces the scale of the first detection model. In addition, training the first detection model based on the determined salient region corresponding to the target object and its salient region features can improve the reliability and accuracy of training, giving the first detection model a wide range of applications.
Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a detection model training method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of extracting an image of the region where a target object is located, provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a process for preprocessing an initial image and key points of a human face, provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of generating multiple candidate regions, provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a detection model using method provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of training a first detection model, provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a detection model training apparatus provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The flowcharts shown in the accompanying drawings are only illustrative; they need not include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual execution order may change according to the actual situation.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and the features in the embodiments may be combined with each other without conflict.
Please refer to FIG. 1, which is a schematic flowchart of a detection model training method provided by an embodiment of the present application. The detection model training method can be applied to a detection model training apparatus and is used to accurately train a smaller-scale first detection model through a second detection model. The detection model training apparatus may include a mobile phone, a computer, a server, an unmanned aerial vehicle, or the like.
The unmanned aerial vehicle may be a rotary-wing UAV, such as a quad-rotor, hexa-rotor, or octo-rotor UAV, a fixed-wing UAV, or a combination of a rotary-wing and a fixed-wing UAV, which is not limited here.
Specifically, as shown in FIG. 1, the detection model training method may include steps S101 to S104.
S101: Perform feature extraction on a sample image through a first detection model to obtain first feature information, and perform feature extraction on the sample image through a second detection model to obtain second feature information.
The first detection model and the second detection model can be set flexibly according to actual needs, and their specific types are not limited here; for example, both may be neural networks.
In some embodiments, the detection model training method can be applied to a distillation algorithm, in which the first detection model is a student model and the second detection model is a teacher model.
In the distillation algorithm, one or more trained teacher models (which may be larger-scale models) guide the training of a student model (which may be a smaller-scale model). The process of the distillation algorithm may be: training the teacher model, training the student model, and then jointly training with the teacher model and the student model to improve the performance of the student model. For example, the teacher model and the student model may each be trained on sample images; after both are trained, the parameters of the teacher model are fixed, that is, the teacher model only performs feature extraction and no longer updates its parameters, while the student model continues distillation training.
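The "teacher fixed, student updated" stage of this flow can be sketched schematically as follows; `TinyModel` is a hypothetical one-parameter stand-in for a real detection backbone, used only to show that gradients flow through the student while the teacher stays frozen:

```python
import numpy as np

class TinyModel:
    # Hypothetical stand-in for a detection backbone: one scalar weight.
    def __init__(self, w):
        self.w = float(w)

    def features(self, x):
        return self.w * x

def distill_step(student, teacher, x, lr=0.1):
    # One schematic distillation step: the teacher runs forward only
    # (its parameters stay fixed), and the student's weight is nudged
    # toward the teacher's features by gradient descent on an L2 loss.
    t = teacher.features(x)                    # frozen teacher
    s = student.features(x)
    loss = float(np.mean((s - t) ** 2))
    grad = float(2 * np.mean((s - t) * x))     # d(loss)/d(student.w)
    student.w -= lr * grad                     # only the student updates
    return loss

teacher = TinyModel(2.0)
student = TinyModel(0.0)
for _ in range(200):
    distill_step(student, teacher, np.array([1.0]))
# student.w converges toward teacher.w while teacher.w never changes
```

In the embodiments of the present application, the L2 term above would be computed over salient region features rather than whole feature maps, and combined with the student's own detection loss.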
In the prior art, a small number of distillation techniques can be applied to detection models, but they are based on two-stage target detection and are not applicable to one-stage target detection. In contrast, the embodiments of the present application acquire the salient region corresponding to the target object and, from it, the salient region features of the first detection model (student model) and the second detection model (teacher model), and train the first detection model based on both sets of salient region features. This applies not only to distillation with two-stage target detection but also to distillation with one-stage target detection, making the approach more widely practical and improving training efficiency.
In some embodiments, the scale of the first detection model is smaller than that of the second detection model, and the second detection model is a trained model. To improve the accuracy of training the first detection model, the pre-trained second detection model can be used to guide its training.
The sample image may be acquired by a capture device such as a camera, obtained from a preset local database or server, or generated by preprocessing (such as rotating or scaling) an acquired initial image. The sample image may contain a target object, whose type can be set flexibly according to actual needs; for example, the target object may include a human face, a vehicle, a ball, a dog, or other objects. It should be noted that there may be multiple sample images, their sizes may be the same or different, one sample image may contain one or more target objects of the same type, and one sample image may contain multiple target objects of different types, which is not specifically limited here.
In some embodiments, before the feature extraction is performed on the sample image by the first detection model, the detection model training method may further include: acquiring an initial image; extracting an image of the region where the target object is located from the initial image; extracting key points of the target object from the region image; preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
To enrich the sample images and expand the model's learning range, the acquired initial image can be preprocessed to obtain abundant sample images, so that the first detection model can be trained on them, solving the problem that limited existing data resources prevent sufficient training. Specifically, the initial image may be collected by a capture device such as a camera, or obtained from a preset local database or server. The initial image may contain a target object; for example, the target object type may include a human face, a vehicle, a ball, a dog, or other objects.
Then, the image of the region where the target object is located can be extracted from the initial image. For example, as shown in FIG. 2, the image of the region where a user's face is located can be extracted from an initial image containing the user; for another example, the image of the region where a vehicle is located can be extracted from an initial image containing the vehicle. Key points of the target object can then be extracted from the region image; the number, shape, position, or size of the key points can be set flexibly according to actual needs and is not limited here. For example, key points such as the eyes, nose, mouth, and contour of a face can be extracted from the image of the region where the face is located; for another example, key points such as the wheels, lights, windows, and body of a vehicle can be extracted from the image of the region where the vehicle is located.
At this point, the initial image can be preprocessed to obtain the sample image, and the key points can be preprocessed to obtain the preprocessed key points. In some embodiments, preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points may include: performing rotation by a preset angle, translation, scaling, and/or brightness adjustment on the initial image and the key points to obtain the sample image and the preprocessed key points.
其中,预处理可以根据实际需要进行灵活设置,例如,预处理可以包括旋转、剪裁、翻转、平移、缩放、亮度减弱和/或亮度增强等处理。该预设角度可以根据实际需要进行灵活设置。要说明的是,对初始图像进行预处理的方式和对关键点进行预处理的方式可以一致,或不一致。例如,可以对初始图像和关键点均进行顺时针的90度旋转,得到样本图像和预处理后的关键点;又例如,可以对初始图像进行顺时针的90度旋转,得到样本图像,以及对关键点进行顺时针的45度旋转,得到预处理后的关键点。The preprocessing may be flexibly set according to actual needs, for example, the preprocessing may include processing such as rotation, cropping, flipping, translation, scaling, brightness reduction and/or brightness enhancement. The preset angle can be flexibly set according to actual needs. It should be noted that the way of preprocessing the initial image and the way of preprocessing keypoints can be consistent or inconsistent. For example, both the initial image and the key points can be rotated 90 degrees clockwise to obtain the sample image and the preprocessed key points; for another example, the initial image can be rotated 90 degrees clockwise to obtain the sample image, and the The key points are rotated 45 degrees clockwise to obtain the pre-processed key points.
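As a non-authoritative sketch, the keypoint half of this rotation step can be written with NumPy alone. The angle convention (counter-clockwise in standard x-y coordinates) and the choice of the image center as the rotation pivot are assumptions; the image itself would be rotated by a matching image-warping routine:

```python
import numpy as np

def rotate_keypoints(keypoints, angle_deg, center):
    """Rotate (x, y) keypoints by angle_deg (counter-clockwise in standard
    x-y coordinates) around a pivot point."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (np.asarray(keypoints, dtype=float) - center) @ rot.T + center

# Two keypoints in a 100x100 image, rotated by -90 degrees about the center.
# The image would be rotated with the same angle (e.g. by an affine warp).
center = np.array([50.0, 50.0])
landmarks = np.array([[30.0, 40.0], [70.0, 40.0]])
rotated = rotate_keypoints(landmarks, -90.0, center)
```

Applying the same transform to both image and keypoints keeps the annotations consistent with the augmented sample.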
Finally, the position information of the target object in the sample image may be determined from the preprocessed key points. For example, the positions of the preprocessed key points in the sample image are determined, a region of the target object in the sample image (which may be a rectangle or a square) is generated from those positions, and the position information of the target object is determined based on that region. The position information may be the pixel coordinates of the target object, or the pixel coordinates of the corner points of the target object's region in the sample image.
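A minimal sketch of deriving such a region from the preprocessed keypoints; the axis-aligned min/max rule and the optional `margin` parameter are illustrative assumptions, not mandated by the embodiment:

```python
import numpy as np

def keypoints_to_box(keypoints, margin=0.0):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the keypoints,
    optionally expanded by a margin on each side."""
    pts = np.asarray(keypoints, dtype=float)
    x_min, y_min = pts.min(axis=0) - margin
    x_max, y_max = pts.max(axis=0) + margin
    return (float(x_min), float(y_min), float(x_max), float(y_max))

# Box around three preprocessed keypoints, padded by 5 pixels on each side.
box = keypoints_to_box([[40.0, 70.0], [40.0, 30.0], [60.0, 50.0]], margin=5.0)
```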
As shown in FIG. 3, taking a face as the target object, the process of preprocessing the initial image and the face key points may include:
S11. Acquire an initial image image.
S12. Extract the face region image face_image from the initial image according to a known face frame.
S13. Extract the face key points face_landmarks from the face region image face_image.
S14. Rotate the initial image image and the face key points face_landmarks by an arbitrary random angle, to obtain the rotated image rotate_image and the rotated face key points rotate_landmarks.
S15. Calculate the face frame rotate_box, i.e., the position information of the face, from the rotated face key points rotate_landmarks.
S16. Save the rotated image rotate_image and the face frame rotate_box.
This automates the preprocessing (also called data augmentation) of the initial image and key points, saving time and effort. It should be noted that the initial image and key points may also be preprocessed manually.
After the sample image and the position information of the target object are obtained, feature extraction may be performed on the sample image by the first detection model to obtain first feature information, and by the second detection model to obtain second feature information.
S102. Determine the salient region corresponding to the target object based on the position information of the target object in the sample image.
The salient region pos-anchors corresponding to the target object may be determined by the first detection model based on the position information of the target object in the sample image. The salient region may be a region that facilitates model learning; it may include only positive sample regions, or both positive and negative sample regions.
In some embodiments, determining the salient region corresponding to the target object based on its position information in the sample image may include: acquiring a plurality of candidate regions; determining the target region of the target object based on the position information; selecting from the candidate regions those whose degree of overlap with the target region is greater than a first preset threshold, to obtain positive sample regions; selecting from the candidate regions those whose degree of overlap with the target region falls within a preset range and whose classification probability value is greater than a preset probability threshold, to obtain negative sample regions, where the preset range may be the interval below the first preset threshold and above a second preset threshold; and setting the positive sample regions and negative sample regions as the salient region corresponding to the target object.
In order to improve the reliability of the salient region, and thereby the accuracy of model training and the performance of the model, a salient region containing both positive and negative sample regions may be used for training. Specifically, a plurality of candidate regions may first be acquired. In some embodiments, acquiring the candidate regions may include: generating them with the second detection model, or acquiring pre-annotated candidate regions.
The shape or size of the candidate regions may be flexibly set according to actual needs. For example, as shown in FIG. 4, the sample image may be processed by the second detection model to generate multiple candidate regions; alternatively, multiple pre-annotated candidate regions may be obtained directly, where the annotation may be manual or automatic.
Also, the target region of the target object is determined based on the position information; for example, it may be determined from the pixel coordinates of the four corner points of the quadrilateral in which the target object lies. Then, the degree of overlap between each candidate region and the target region may be calculated. For example, the Intersection over Union (IOU) algorithm may be used: obtain the intersection area between the candidate region and the target region, obtain the union area between them, and calculate the degree of overlap from the intersection area and the union area.
The degree of overlap between a candidate region and the target region may be calculated according to formula (1):
IOU(A, B) = |A ∩ B| / |A ∪ B|    (1)
In formula (1), IOU(A, B) denotes the degree of overlap between candidate region A and target region B, |A ∩ B| denotes the intersection area of candidate region A and target region B, and |A ∪ B| denotes their union area.
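Formula (1) translates directly into code; the sketch below assumes boxes given in (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Degree of overlap per formula (1), for boxes in (x1, y1, x2, y2) form."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)           # |A ∩ B|
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                          # |A ∪ B|
    return inter / union if union > 0 else 0.0
```

For two 10x10 boxes offset by half their width, the intersection is 50 and the union 150, giving an IOU of 1/3.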
For each of the multiple candidate regions, its degree of overlap with the target region can be calculated by formula (1). When the sample image contains multiple target objects, the degree of overlap can be calculated separately for each target object.
Then, regions whose overlap with the target region exceeds the first preset threshold may be selected from the candidate regions, to obtain positive sample regions. The specific value of the first preset threshold may be set flexibly according to actual needs; if a candidate region's overlap with the target region exceeds the first preset threshold, the candidate region is highly similar to the target region.
Also, a classification probability value is calculated for each candidate region, with a value range of 0 to 1; for example, the classification probability that a candidate region is a face region may be 0.6 or 0.9. Regions whose overlap with the target region falls within the preset range and whose classification probability value exceeds the preset probability threshold may then be selected from the candidate regions, to obtain negative sample regions, where the preset range is the interval below the first preset threshold and above a second preset threshold, the specific value of which may also be set flexibly according to actual needs. Finally, the positive sample regions and negative sample regions may be set as the salient region corresponding to the target object.
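The selection of positive and negative sample regions described above can be sketched as follows. The values of `iou_hi`, `iou_lo`, and `prob_thresh` are placeholders for the first preset threshold, the second preset threshold, and the preset probability threshold, which the embodiment leaves configurable:

```python
def _iou(a, b):
    """Overlap of two (x1, y1, x2, y2) boxes per formula (1)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_salient_anchors(candidates, target_box, cls_probs,
                           iou_hi=0.5, iou_lo=0.3, prob_thresh=0.7):
    """Split candidate boxes into positive and negative sample regions.

    Positive: overlap with the target region above iou_hi.
    Negative: overlap in (iou_lo, iou_hi) AND classification probability
    above prob_thresh (a confident prediction on a poorly overlapping box).
    """
    positives, negatives = [], []
    for box, prob in zip(candidates, cls_probs):
        overlap = _iou(box, target_box)
        if overlap > iou_hi:
            positives.append(box)
        elif iou_lo < overlap < iou_hi and prob > prob_thresh:
            negatives.append(box)
    return positives + negatives  # the salient region (pos-anchors)
```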
In this embodiment of the present invention, not only positive sample regions but also negative sample regions are used to train the first detection model, making the training more thorough and the resulting first detection model more accurate and reliable, thereby addressing the shortage of existing training resources.
S103. Acquire a first salient region feature according to the first feature information and the salient region, and acquire a second salient region feature according to the second feature information and the salient region.
The first salient region feature and the second salient region feature may be flexibly defined according to actual needs and are not limited here. For example, the first salient region feature may be the features in the salient region related to the first feature information, and the second salient region feature may be the features in the salient region related to the second feature information.
To improve the accuracy of acquiring the first and second salient region features, in some embodiments this step may include: acquiring the first feature information in the positive sample regions and the negative sample regions respectively, to obtain the first salient region feature; and acquiring the second feature information in the positive sample regions and the negative sample regions respectively, to obtain the second salient region feature.
S104. Adjust the parameters of the first detection model according to the first salient region feature and the second salient region feature, to obtain a trained first detection model.
In this embodiment, the first detection model is used to detect the type and position of the target object.
In some embodiments, adjusting the parameters of the first detection model according to the first and second salient region features to obtain the trained first detection model may include: acquiring the similarity between the first salient region feature and the second salient region feature; acquiring the loss value obtained when the first detection model detects the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value, to obtain the trained first detection model.
To improve the reliability and accuracy of training the first detection model, the similarity between the first salient region feature and the second salient region feature may be acquired; this similarity may be characterized by the Euclidean distance.
In some embodiments, the similarity includes a Euclidean distance, and acquiring the similarity between the first and second salient region features may include: determining the Euclidean distance between the first salient region feature and the second salient region feature, to obtain their similarity. For example, the Euclidean distance L2-loss (distill-loss) between the first salient region feature and the second salient region feature may be calculated; this Euclidean distance L2-loss is the similarity between the two salient region features.
Also, the loss value loss obtained when the first detection model detects the sample image is acquired, and the parameters of the first detection model may then be adjusted according to the similarity L2-loss and the loss value loss, to obtain the trained first detection model.
In some embodiments, adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model may include: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value, to obtain the trained first detection model.
For example, the similarity L2-loss and the loss value loss may be added and averaged, giving a target loss value = (L2-loss + loss) / 2. Alternatively, the similarity L2-loss may be assigned a weight A and the loss value loss a weight B, where A + B = 1; the similarity L2-loss and the loss value loss are then multiplied by their respective weights and summed, giving a target loss value = L2-loss * A + loss * B.
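Both variants of the target loss reduce to one weighted sum; a minimal sketch, with the weight names A and B following the text:

```python
def target_loss(l2_loss, det_loss, weight_a=0.5, weight_b=0.5):
    """Weighted average of the distillation similarity L2-loss (weight A) and
    the detection loss value loss (weight B), with A + B expected to be 1."""
    return weight_a * l2_loss + weight_b * det_loss

equal = target_loss(0.8, 0.4)                                 # (L2-loss + loss) / 2 = 0.6
weighted = target_loss(0.8, 0.4, weight_a=0.3, weight_b=0.7)  # 0.8*0.3 + 0.4*0.7 = 0.52
```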
Then, the parameters of the first detection model may be adjusted according to the target loss value so that they reach suitable values, yielding the trained first detection model. In this way, a high-precision trained first detection model meeting requirements can be obtained under the constraint of limited computing resources; for the same effect, a much smaller amount of collected data suffices, saving time and resources.
In some embodiments, after the parameters of the first detection model are adjusted according to the first and second salient region features to obtain the trained first detection model, the detection model training method may further include: acquiring an image to be detected; and detecting the target object in the image with the trained first detection model, to obtain the target position information of the target object in the image.
After the trained first detection model is obtained, it can be used to accurately detect target objects in images. For example, the image to be detected may be collected by a capture device such as a camera, or obtained from a preset local database or server. The target object in the image can then be detected by the trained first detection model to obtain its target position information. For example, a face in the image may be detected by the trained first detection model to obtain the face's target position information, which may be the corner positions of a polygonal (e.g., quadrilateral) face frame.
In this embodiment of the present application, feature extraction may be performed on the sample image by the first detection model to obtain first feature information, and by the second detection model to obtain second feature information. Then, the salient region corresponding to the target object may be determined based on the position information of the target object in the sample image; the first salient region feature is acquired according to the first feature information and the salient region, and the second salient region feature according to the second feature information and the salient region. The parameters of the first detection model are then adjusted according to the first and second salient region features, to obtain the trained first detection model. This scheme can use the second detection model to accurately train the smaller first detection model, so that the trained first detection model can subsequently be deployed on a mobile terminal to detect target objects, reducing the scale of the first detection model. Moreover, training the first detection model based on the determined salient region of the target object and its salient region features can improve the reliability and accuracy of the training, giving the first detection model a wide range of applications.
Please refer to FIG. 5, which is a schematic flowchart of a method for using a detection model provided by an embodiment of this application. The method may be applied to a computer device to accurately detect a target object in an image based on the trained first detection model. The computer device may include a mobile terminal, an unmanned aerial vehicle, a server, a camera, and the like; mobile terminals may include mobile phones and tablet computers. The detection model is the trained first detection model, obtained by training with the above detection model training method and deployed on the computer device.
For example, as shown in FIG. 6, the process of training the first detection model may include:
S21. Acquire sample images.
S22. Train the Teacher model (T-model) on the sample images.
S23. Train the Student model (S-model) on the sample images.
S24. Fix the parameters of the Teacher model, and extract the feature feature-T of a sample image with the Teacher model.
S25. Extract the feature feature-S of the sample image with the Student model, and extract the salient regions pos_anchors.
S26. Calculate the salient region feature pos_feat_T from the salient regions pos_anchors and the feature feature-T, and the salient region feature pos_feat_S from the salient regions pos_anchors and the feature feature-S.
S27. Calculate the Euclidean distance L2-loss between the salient region features pos_feat_T and pos_feat_S.
S28. Calculate the original loss value loss of the Student model.
S29. Calculate the weighted average of the Euclidean distance L2-loss and the original loss value loss, fine-tune (finetune) the Student model again, and obtain and save the distill-S-model (trained Student model).
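Steps S26 and S27 can be sketched with NumPy. The average pooling used below to turn an anchor box into a region feature vector is an assumption standing in for whatever region-feature extraction the models actually perform:

```python
import numpy as np

def region_feature(feat_map, box):
    """Average-pool a (C, H, W) feature map over an anchor box (x1, y1, x2, y2).
    Average pooling is an illustrative stand-in, not the patented operation."""
    x1, y1, x2, y2 = (int(v) for v in box)
    return feat_map[:, y1:y2, x1:x2].mean(axis=(1, 2))

def distill_l2(feat_t, feat_s, pos_anchors):
    """L2-loss of step S27: mean squared distance between the teacher's and
    student's salient-region features (pos_feat_T vs pos_feat_S)."""
    dists = [np.sum((region_feature(feat_t, b) - region_feature(feat_s, b)) ** 2)
             for b in pos_anchors]
    return float(np.mean(dists))
```

In step S29 this distance would be averaged with the Student model's own loss value before fine-tuning.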
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in a given embodiment, refer to the detailed description of the detection model training method above, which is not repeated here.
Specifically, as shown in FIG. 5, the method for using the detection model may include steps S201 to S202.
S201. Acquire an image to be detected.
S202. Detect the target object in the image with the trained first detection model, to obtain the target position information of the target object in the image.
For example, the image to be detected may be collected by a capture device such as a camera, or obtained from a preset local database or server. The target object in the image can then be detected by the trained first detection model to obtain its target position information. For example, a face in the image may be detected by the trained first detection model to obtain the face's target position information, which may be the corner positions of a polygonal (e.g., quadrilateral) face frame. This achieves accurate detection of target objects in images using the trained first detection model.
Please refer to FIG. 7, which is a schematic block diagram of a detection model training apparatus provided by an embodiment of this application. The detection model training apparatus 11 may include a processor 111 and a memory 112, connected via a bus such as an I2C (Inter-Integrated Circuit) bus.
Specifically, the processor 111 may be a micro-controller unit (MCU), a central processing unit (CPU), a digital signal processor (DSP), or the like.
Specifically, the memory 112 may be a Flash chip, a read-only memory (ROM) disk, an optical disc, a USB flash drive, a removable hard disk, or the like, and may be used to store a computer program.
The processor 111 is configured to call the computer program stored in the memory 112, and when executing the computer program, implement the detection model training method provided by the embodiments of this application, for example performing the following steps:
Perform feature extraction on the sample image by the first detection model to obtain first feature information, and by the second detection model to obtain second feature information; determine the salient region corresponding to the target object based on the position information of the target object in the sample image; acquire the first salient region feature according to the first feature information and the salient region, and the second salient region feature according to the second feature information and the salient region; and adjust the parameters of the first detection model according to the first and second salient region features, to obtain the trained first detection model.
In some embodiments, when adjusting the parameters of the first detection model according to the first and second salient region features to obtain the trained first detection model, the processor 111 is configured to: acquire the similarity between the first salient region feature and the second salient region feature; acquire the loss value obtained when the first detection model detects the sample image; and adjust the parameters of the first detection model according to the similarity and the loss value, to obtain the trained first detection model.
In some embodiments, when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor 111 is configured to: perform a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjust the parameters of the first detection model according to the target loss value, to obtain the trained first detection model.
In some embodiments, the similarity includes a Euclidean distance, and when acquiring the similarity between the first and second salient region features, the processor 111 is configured to: determine the Euclidean distance between the first salient region feature and the second salient region feature, to obtain their similarity.
In some embodiments, when determining the salient region corresponding to the target object based on its position information in the sample image, the processor 111 is configured to: acquire a plurality of candidate regions; determine the target region of the target object based on the position information; select from the candidate regions those whose overlap with the target region exceeds a first preset threshold, to obtain positive sample regions; select from the candidate regions those whose overlap with the target region falls within a preset range and whose classification probability value exceeds a preset probability threshold, to obtain negative sample regions, where the preset range is the interval below the first preset threshold and above a second preset threshold; and set the positive and negative sample regions as the salient region corresponding to the target object.
In some embodiments, when acquiring the plurality of candidate regions, the processor 111 is configured to: generate the candidate regions based on the second detection model, or acquire pre-annotated candidate regions.
In some embodiments, when acquiring the first salient region feature according to the first feature information and the salient region, and the second salient region feature according to the second feature information and the salient region, the processor 111 is configured to: acquire the first feature information in the positive and negative sample regions respectively, to obtain the first salient region feature; and acquire the second feature information of the positive and negative sample regions respectively, to obtain the second salient region feature.
在一些实施方式中,在根据第一显著区域特征和第二显著区域特征对第一检测模型的参数进行调整,得到训练后的第一检测模型之后,处理器111还用于执行:获取待检测的图像;通过训练后的第一检测模型对图像中的目标物进行检测,得到目标物在图像中的目标位置信息。In some embodiments, after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor 111 is further configured to execute: acquiring the to-be-detected Detect the target object in the image through the trained first detection model, and obtain the target position information of the target object in the image.
在一些实施方式中，在通过第一检测模型对样本图像进行特征提取之前，处理器111用于执行：获取初始图像；从初始图像中提取目标物所在的区域图像；从区域图像中提取目标物的关键点；对初始图像和关键点进行预处理，得到样本图像和预处理后的关键点；根据预处理后的关键点确定样本图像中目标物的位置信息。In some embodiments, before performing feature extraction on the sample image through the first detection model, the processor 111 is configured to execute: acquiring an initial image; extracting, from the initial image, an image of the region where the target object is located; extracting key points of the target object from the region image; preprocessing the initial image and the key points to obtain the sample image and preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
在一些实施方式中，在对初始图像和关键点进行预处理，得到样本图像和预处理后的关键点时，处理器111用于执行：对初始图像和关键点按照预设角度进行旋转、平移、缩放和/或亮度调节，得到样本图像和预处理后的关键点。In some embodiments, when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor 111 is configured to execute: rotating, translating, scaling and/or adjusting the brightness of the initial image and the key points according to a preset angle, to obtain the sample image and the preprocessed key points.
在一些实施方式中,目标物包括人脸。In some embodiments, the target includes a human face.
在一些实施方式中,第一检测模型的规模小于第二检测模型的规模,第二检测模型为训练后的模型。In some embodiments, the scale of the first detection model is smaller than the scale of the second detection model, which is a trained model.
在一些实施方式中,存储介质应用于蒸馏算法,第一检测模型为学生模型,第二检测模型为教师模型。In some embodiments, the storage medium is applied to the distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见上文针对检测模型训练方法的详细描述,此处不再赘述。In the above embodiments, the description of each embodiment has its own emphasis. For the part that is not described in detail in a certain embodiment, please refer to the above detailed description of the detection model training method, and will not be repeated here.
本申请的实施例中还提供一种存储介质，该存储介质为计算机可读存储介质，该存储介质存储有计算机程序，计算机程序中包括程序指令，处理器执行程序指令，实现本申请实施例提供的检测模型训练方法。例如，处理器可以执行：Embodiments of the present application further provide a storage medium. The storage medium is a computer-readable storage medium storing a computer program, and the computer program includes program instructions; a processor executes the program instructions to implement the detection model training method provided by the embodiments of the present application. For example, the processor may execute:
通过第一检测模型对样本图像进行特征提取，得到第一特征信息，以及通过第二检测模型对样本图像进行特征提取，得到第二特征信息；基于样本图像中目标物的位置信息，确定目标物对应的显著区域；根据第一特征信息和显著区域，获取第一显著区域特征，以及根据第二特征信息和显著区域，获取第二显著区域特征；根据第一显著区域特征和第二显著区域特征对第一检测模型的参数进行调整，得到训练后的第一检测模型。Performing feature extraction on a sample image through a first detection model to obtain first feature information, and performing feature extraction on the sample image through a second detection model to obtain second feature information; determining, based on position information of a target object in the sample image, a salient region corresponding to the target object; acquiring a first salient region feature according to the first feature information and the salient region, and acquiring a second salient region feature according to the second feature information and the salient region; and adjusting parameters of the first detection model according to the first salient region feature and the second salient region feature, to obtain a trained first detection model.
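The training flow just described can be sketched schematically in pure Python. Every callable below is an illustrative placeholder standing in for a real model component (feature extractors, region gathering, distance and loss functions); the weight `w` is an assumed hyperparameter, not specified by the application:

```python
def distillation_step(sample, student_extract, teacher_extract,
                      gather_salient, distance, detect_loss, w=0.5):
    """One schematic training iteration: extract features with the
    student (first) and teacher (second) model, gather both feature
    sets over the salient regions, then combine the feature distance
    with the student's own detection loss into one target loss."""
    f1 = student_extract(sample)       # first feature information
    f2 = teacher_extract(sample)       # second feature information
    r1 = gather_salient(f1)            # first salient-region feature
    r2 = gather_salient(f2)            # second salient-region feature
    sim = distance(r1, r2)             # similarity term to minimise
    loss = detect_loss(sample)         # student's detection loss
    return w * sim + (1.0 - w) * loss  # target loss for the update
```

In a real system the return value would drive a gradient update of the student's parameters; here it is simply the combined scalar.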
在一些实施方式中，在根据第一显著区域特征和第二显著区域特征对第一检测模型的参数进行调整，得到训练后的第一检测模型时，处理器用于执行：获取第一显著区域特征和第二显著区域特征之间的相似度；获取第一检测模型对样本图像进行检测得到的损失值；根据相似度和损失值对第一检测模型的参数进行调整，得到训练后的第一检测模型。In some embodiments, when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to execute: acquiring a similarity between the first salient region feature and the second salient region feature; acquiring a loss value obtained by the first detection model detecting the sample image; and adjusting the parameters of the first detection model according to the similarity and the loss value, to obtain the trained first detection model.
在一些实施方式中，在根据相似度和损失值对第一检测模型的参数进行调整，得到训练后的第一检测模型时，处理器用于执行：对相似度和损失值进行加权平均运算，得到目标损失值；根据目标损失值对第一检测模型的参数进行调整，得到训练后的第一检测模型。In some embodiments, when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to execute: performing a weighted average operation on the similarity and the loss value to obtain a target loss value; and adjusting the parameters of the first detection model according to the target loss value, to obtain the trained first detection model.
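The weighted-average combination of the similarity term and the detection loss can be written as a one-line helper; the weight value is an illustrative assumption, as the application does not fix it:

```python
def target_loss(similarity, loss_value, w_sim=0.5):
    """Weighted average of the feature-similarity term (a distance to
    be minimised) and the detection loss. w_sim is an assumed,
    tunable weight between 0 and 1."""
    return w_sim * similarity + (1.0 - w_sim) * loss_value
```

Setting `w_sim` closer to 1 emphasises imitating the teacher's salient-region features; closer to 0 emphasises the student's own detection objective.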
在一些实施方式中，相似度包括欧几里得距离，在获取第一显著区域特征和第二显著区域特征之间的相似度时，处理器用于执行：确定第一显著区域特征和第二显著区域特征之间的欧几里得距离，得到第一显著区域特征和第二显著区域特征之间的相似度。In some embodiments, the similarity includes a Euclidean distance, and when acquiring the similarity between the first salient region feature and the second salient region feature, the processor is configured to execute: determining the Euclidean distance between the first salient region feature and the second salient region feature, to obtain the similarity between the first salient region feature and the second salient region feature.
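A minimal sketch of the Euclidean-distance similarity over two flattened feature vectors (pure Python, standard library only):

```python
import math

def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between two flattened feature vectors of
    equal length. A smaller distance means the student's salient-region
    feature is closer to the teacher's, i.e. higher similarity."""
    assert len(feat_a) == len(feat_b), "feature vectors must align"
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
```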
在一些实施方式中，在基于样本图像中目标物的位置信息，确定目标物对应的显著区域时，处理器用于执行：获取多个候选区域；基于位置信息确定目标物的目标区域；从多个候选区域中筛选出与目标区域的重合度大于第一预设阈值的区域，得到正样本区域；从多个候选区域中筛选出与目标区域的重合度在预设范围内，且分类概率值大于预设概率阈值的区域，得到负样本区域，预设范围为小于第一预设阈值且大于第二预设阈值的区间；将正样本区域和负样本区域设置为目标物对应的显著区域。In some embodiments, when determining the salient region corresponding to the target object based on the position information of the target object in the sample image, the processor is configured to execute: acquiring a plurality of candidate regions; determining a target region of the target object based on the position information; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is greater than a first preset threshold, to obtain positive sample regions; screening out, from the plurality of candidate regions, regions whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold, to obtain negative sample regions, the preset range being an interval smaller than the first preset threshold and greater than a second preset threshold; and setting the positive sample regions and the negative sample regions as the salient regions corresponding to the target object.
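The screening step can be sketched with intersection-over-union as the coincidence degree. The threshold values below are illustrative assumptions; the application only requires that the negative-sample interval lie between the two thresholds:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def select_salient_regions(candidates, target, probs,
                           t_high=0.7, t_low=0.3, p_thresh=0.5):
    """Split candidate boxes into positive regions (IoU > t_high) and
    hard-negative regions (t_low < IoU < t_high with a classification
    probability above p_thresh). Thresholds are assumed values."""
    pos, neg = [], []
    for box, p in zip(candidates, probs):
        overlap = iou(box, target)
        if overlap > t_high:
            pos.append(box)
        elif t_low < overlap < t_high and p > p_thresh:
            neg.append(box)  # confidently misplaced: a hard negative
    return pos, neg
```

Keeping high-probability, partially overlapping boxes as negatives focuses the distillation on regions the student is likely to confuse with the target.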
在一些实施方式中,在获取多个候选区域时,处理器用于执行:基于第二检测模型生成多个候选区域,或者获取预先标注的多个候选区域。In some embodiments, when acquiring multiple candidate regions, the processor is configured to perform: generating multiple candidate regions based on the second detection model, or acquiring multiple pre-labeled candidate regions.
在一些实施方式中，在根据第一特征信息和显著区域，获取第一显著区域特征，以及根据第二特征信息和显著区域，获取第二显著区域特征时，处理器用于执行：分别获取正样本区域和负样本区域中的第一特征信息，得到第一显著区域特征；以及，分别获取正样本区域和负样本区域的第二特征信息，得到第二显著区域特征。In some embodiments, when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor is configured to execute: acquiring the first feature information in the positive sample region and the negative sample region respectively, to obtain the first salient region feature; and acquiring the second feature information of the positive sample region and the negative sample region respectively, to obtain the second salient region feature.
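Gathering feature information inside a region can be illustrated as a plain slice of a 2-D feature map, a simplified stand-in for the ROI pooling a real detector would use; applying it to both models' feature maps over the same positive and negative regions yields the two salient-region features to compare:

```python
def crop_region_features(feature_map, region):
    """Gather the entries of a 2-D feature map (a list of rows) that
    fall inside an integer (x1, y1, x2, y2) region. A simplified
    stand-in for ROI pooling over a convolutional feature map."""
    x1, y1, x2, y2 = region
    return [row[x1:x2] for row in feature_map[y1:y2]]
```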
在一些实施方式中，在根据第一显著区域特征和第二显著区域特征对第一检测模型的参数进行调整，得到训练后的第一检测模型之后，处理器还用于执行：获取待检测的图像；通过训练后的第一检测模型对图像中的目标物进行检测，得到目标物在图像中的目标位置信息。In some embodiments, after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is further configured to execute: acquiring an image to be detected; and detecting the target object in the image through the trained first detection model, to obtain target position information of the target object in the image.
在一些实施方式中，在通过第一检测模型对样本图像进行特征提取之前，处理器用于执行：获取初始图像；从初始图像中提取目标物所在的区域图像；从区域图像中提取目标物的关键点；对初始图像和关键点进行预处理，得到样本图像和预处理后的关键点；根据预处理后的关键点确定样本图像中目标物的位置信息。In some embodiments, before performing feature extraction on the sample image through the first detection model, the processor is configured to execute: acquiring an initial image; extracting, from the initial image, an image of the region where the target object is located; extracting key points of the target object from the region image; preprocessing the initial image and the key points to obtain the sample image and preprocessed key points; and determining the position information of the target object in the sample image according to the preprocessed key points.
在一些实施方式中，在对初始图像和关键点进行预处理，得到样本图像和预处理后的关键点时，处理器用于执行：对初始图像和关键点按照预设角度进行旋转、平移、缩放和/或亮度调节，得到样本图像和预处理后的关键点。In some embodiments, when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor is configured to execute: rotating, translating, scaling and/or adjusting the brightness of the initial image and the key points according to a preset angle, to obtain the sample image and the preprocessed key points.
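Rotating the key points together with the image keeps the annotation aligned. A minimal sketch of the key-point half of that transform, assuming rotation about an arbitrary centre point:

```python
import math

def rotate_point(pt, angle_deg, center):
    """Rotate a key point (x, y) about `center` by a preset angle in
    degrees, so the annotation stays aligned with the rotated image."""
    ang = math.radians(angle_deg)
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    return (center[0] + dx * math.cos(ang) - dy * math.sin(ang),
            center[1] + dx * math.sin(ang) + dy * math.cos(ang))
```

Translation and scaling of key points are analogous affine updates; brightness adjustment affects only the image pixels and leaves the key-point coordinates unchanged.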
在一些实施方式中,目标物包括人脸。In some embodiments, the target includes a human face.
在一些实施方式中,第一检测模型的规模小于第二检测模型的规模,第二检测模型为训练后的模型。In some embodiments, the scale of the first detection model is smaller than the scale of the second detection model, which is a trained model.
在一些实施方式中,存储介质应用于蒸馏算法,第一检测模型为学生模型,第二检测模型为教师模型。In some embodiments, the storage medium is applied to the distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见上文针对检测模型训练方法的详细描述,此处不再赘述。In the above embodiments, the description of each embodiment has its own emphasis. For the part that is not described in detail in a certain embodiment, please refer to the above detailed description of the detection model training method, and will not be repeated here.
其中，存储介质可以是前述任一实施例所述的检测模型训练装置的内部存储单元，例如检测模型训练装置的硬盘或内存。存储介质也可以是检测模型训练装置的外部存储设备，例如检测模型训练装置上配备的插接式硬盘，智能存储卡（Smart Media Card，SMC），安全数字（Secure Digital，SD）卡，闪存卡（Flash Card）等。The storage medium may be an internal storage unit of the detection model training apparatus described in any of the foregoing embodiments, such as a hard disk or a memory of the detection model training apparatus. The storage medium may also be an external storage device of the detection model training apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the detection model training apparatus.
由于该存储介质中所存储的计算机程序，可以执行本申请实施例所提供的任一种检测模型训练方法，因此，可以实现本申请实施例所提供的任一种检测模型训练方法所能实现的有益效果，详见前面的实施例，在此不再赘述。Since the computer program stored in the storage medium can execute any of the detection model training methods provided by the embodiments of the present application, it can achieve the beneficial effects achievable by any of those methods; for details, refer to the foregoing embodiments, which are not repeated here.
应当理解,在此本申请说明书中所使用的术语仅仅是出于描述特定实施例的目的而并不意在限制本申请。如在本申请说明书和所附权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。It should be understood that the terms used in the specification of the present application herein are for the purpose of describing particular embodiments only and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural unless the context clearly dictates otherwise.
还应当理解，在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合，并且包括这些组合。需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should also be understood that the term "and/or" used in the specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations. It should be noted that, herein, the terms "include", "comprise" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element qualified by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article or system that includes the element.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (29)

  1. 一种检测模型训练方法,其特征在于,包括:A detection model training method, comprising:
    通过第一检测模型对样本图像进行特征提取,得到第一特征信息,以及通过训练好的第二检测模型对所述样本图像进行特征提取,得到第二特征信息;Perform feature extraction on the sample image through the first detection model to obtain first feature information, and perform feature extraction on the sample image through the trained second detection model to obtain second feature information;
    基于所述样本图像中目标物的位置信息,确定所述目标物对应的显著区域;determining the salient area corresponding to the target based on the position information of the target in the sample image;
    根据所述第一特征信息和所述显著区域,获取第一显著区域特征,以及根据所述第二特征信息和所述显著区域,获取第二显著区域特征;obtaining a first salient area feature according to the first feature information and the salient area, and obtaining a second salient area feature according to the second feature information and the salient area;
    根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  2. 根据权利要求1所述的检测模型训练方法，其特征在于，所述根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整，得到训练后的第一检测模型包括：The detection model training method according to claim 1, wherein adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model comprises:
    获取所述第一显著区域特征和所述第二显著区域特征之间的相似度;obtaining the similarity between the first salient region feature and the second salient region feature;
    获取所述第一检测模型对所述样本图像进行检测得到的损失值;obtaining a loss value obtained by detecting the sample image by the first detection model;
    根据所述相似度和所述损失值对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the similarity and the loss value to obtain a trained first detection model.
  3. 根据权利要求2所述的检测模型训练方法，其特征在于，所述根据所述相似度和所述损失值对所述第一检测模型的参数进行调整，得到训练后的第一检测模型包括：The detection model training method according to claim 2, wherein adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model comprises:
    对所述相似度和所述损失值进行加权平均运算,得到目标损失值;Perform a weighted average operation on the similarity and the loss value to obtain a target loss value;
    根据所述目标损失值对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the target loss value to obtain a trained first detection model.
  4. 根据权利要求2所述的检测模型训练方法，其特征在于，所述相似度包括欧几里得距离，所述获取所述第一显著区域特征和所述第二显著区域特征之间的相似度包括：The detection model training method according to claim 2, wherein the similarity comprises a Euclidean distance, and acquiring the similarity between the first salient region feature and the second salient region feature comprises:
    确定所述第一显著区域特征和所述第二显著区域特征之间的欧几里得距离,得到所述第一显著区域特征和所述第二显著区域特征之间的相似度。The Euclidean distance between the first salient region feature and the second salient region feature is determined, and the similarity between the first salient region feature and the second salient region feature is obtained.
  5. 根据权利要求1所述的检测模型训练方法,其特征在于,所述基于所述样本图像中目标物的位置信息,确定所述目标物对应的显著区域包括:The detection model training method according to claim 1, wherein the determining the salient region corresponding to the target object based on the position information of the target object in the sample image comprises:
    获取多个候选区域;Get multiple candidate regions;
    基于所述位置信息确定所述目标物的目标区域;determining a target area of the target based on the location information;
    从所述多个候选区域中筛选出与所述目标区域的重合度大于第一预设阈值的区域,得到正样本区域;Screening out the regions whose coincidence degree with the target region is greater than the first preset threshold from the plurality of candidate regions to obtain a positive sample region;
    从所述多个候选区域中筛选出与所述目标区域的重合度在预设范围内，且分类概率值大于预设概率阈值的区域，得到负样本区域，所述预设范围为小于所述第一预设阈值且大于第二预设阈值的区间；screening out, from the plurality of candidate regions, a region whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold, to obtain a negative sample region, the preset range being an interval smaller than the first preset threshold and greater than a second preset threshold;
    将所述正样本区域和所述负样本区域设置为所述目标物对应的显著区域。The positive sample area and the negative sample area are set as salient areas corresponding to the target.
  6. 根据权利要求5所述的检测模型训练方法,其特征在于,所述获取多个候选区域包括:The detection model training method according to claim 5, wherein the acquiring multiple candidate regions comprises:
    基于所述第二检测模型生成多个候选区域,或者获取预先标注的多个候选区域。Generate multiple candidate regions based on the second detection model, or obtain multiple candidate regions marked in advance.
  7. 根据权利要求5所述的检测模型训练方法，其特征在于，所述根据所述第一特征信息和所述显著区域，获取第一显著区域特征，以及根据所述第二特征信息和所述显著区域，获取第二显著区域特征包括：The detection model training method according to claim 5, wherein acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region comprises:
    分别获取所述正样本区域和所述负样本区域中的所述第一特征信息,得到第一显著区域特征;以及,respectively acquiring the first feature information in the positive sample region and the negative sample region to obtain a first salient region feature; and,
    分别获取所述正样本区域和所述负样本区域的所述第二特征信息,得到第二显著区域特征。The second feature information of the positive sample area and the negative sample area are acquired respectively, to obtain a second salient area feature.
  8. 根据权利要求1所述的检测模型训练方法，其特征在于，所述根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整，得到训练后的第一检测模型之后，所述检测模型训练方法还包括：The detection model training method according to claim 1, wherein after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the detection model training method further comprises:
    获取待检测的图像;Get the image to be detected;
    通过所述训练后的第一检测模型对所述图像中的所述目标物进行检测,得到所述目标物在所述图像中的目标位置信息。Detect the target in the image by using the trained first detection model to obtain target position information of the target in the image.
  9. 根据权利要求1所述的检测模型训练方法,其特征在于,所述通过第一检测模型对样本图像进行特征提取之前,所述检测模型训练方法还包括:The detection model training method according to claim 1, wherein before the feature extraction is performed on the sample image by the first detection model, the detection model training method further comprises:
    获取初始图像;get the initial image;
    从所述初始图像中提取所述目标物所在的区域图像;extracting an image of the region where the target is located from the initial image;
    从所述区域图像中提取所述目标物的关键点;extracting key points of the target from the area image;
    对所述初始图像和所述关键点进行预处理，得到所述样本图像和预处理后的关键点；The initial image and the key points are preprocessed to obtain the sample image and the preprocessed key points;
    根据所述预处理后的关键点确定所述样本图像中所述目标物的位置信息。The position information of the target object in the sample image is determined according to the preprocessed key points.
  10. 根据权利要求9所述的检测模型训练方法,其特征在于,所述对所述初始图像和所述关键点进行预处理,得到所述样本图像和预处理后的关键点包括:The detection model training method according to claim 9, wherein the preprocessing of the initial image and the key points to obtain the sample image and the preprocessed key points comprises:
    对所述初始图像和所述关键点按照预设角度进行旋转、平移、缩放和/或亮度调节,得到所述样本图像和预处理后的关键点。Rotate, translate, zoom and/or adjust the brightness of the initial image and the key points according to a preset angle to obtain the sample image and the pre-processed key points.
  11. 根据权利要求1所述的检测模型训练方法,其特征在于,所述目标物包括人脸。The detection model training method according to claim 1, wherein the target object comprises a human face.
  12. 根据权利要求1至11任一项所述的检测模型训练方法,其特征在于,所述第一检测模型的规模小于所述第二检测模型的规模,所述第二检测模型为训练后的模型。The detection model training method according to any one of claims 1 to 11, wherein the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model .
  13. 根据权利要求1至11任一项所述的检测模型训练方法,其特征在于,所述检测模型训练方法应用于蒸馏算法,所述第一检测模型为学生模型,所述第二检测模型为教师模型。The detection model training method according to any one of claims 1 to 11, wherein the detection model training method is applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher Model.
  14. 一种检测模型训练装置，其特征在于，包括处理器和存储器，所述存储器中存储有计算机程序，所述处理器调用所述存储器中的计算机程序时执行如权利要求1至13任一项所述的检测模型训练方法。A detection model training apparatus, characterized by comprising a processor and a memory, wherein the memory stores a computer program, and the processor, when calling the computer program in the memory, executes the detection model training method according to any one of claims 1 to 13.
  15. 一种检测模型使用方法，其特征在于，应用于计算机设备，所述检测模型为训练后的第一检测模型，所述训练后的第一检测模型为采用权利要求1至13任一项所述的检测模型训练方法进行训练得到的模型，并部署在所述计算机设备中；所述检测模型使用方法包括：A method for using a detection model, characterized in that the method is applied to a computer device, the detection model is a trained first detection model, and the trained first detection model is a model obtained by training with the detection model training method according to any one of claims 1 to 13 and is deployed in the computer device; the method for using the detection model comprises:
    获取待检测的图像;Get the image to be detected;
    通过所述训练后的第一检测模型对所述图像中的目标物进行检测,得到所述目标物在所述图像中的目标位置信息。The target object in the image is detected by the trained first detection model, and the target position information of the target object in the image is obtained.
  16. 根据权利要求15所述的检测模型使用方法,其特征在于,所述计算机设备包括移动终端、无人机和相机。The method for using a detection model according to claim 15, wherein the computer equipment comprises a mobile terminal, a drone and a camera.
  17. 一种存储介质,其特征在于,所述存储介质用于存储计算机程序,所述计算机程序被处理器加载以执行:A storage medium, characterized in that the storage medium is used to store a computer program, and the computer program is loaded by a processor to execute:
    通过第一检测模型对样本图像进行特征提取,得到第一特征信息,以及通过训练好的第二检测模型对所述样本图像进行特征提取,得到第二特征信息;Perform feature extraction on the sample image through the first detection model to obtain first feature information, and perform feature extraction on the sample image through the trained second detection model to obtain second feature information;
    基于所述样本图像中目标物的位置信息,确定所述目标物对应的显著区域;determining the salient area corresponding to the target based on the position information of the target in the sample image;
    根据所述第一特征信息和所述显著区域,获取第一显著区域特征,以及根据所述第二特征信息和所述显著区域,获取第二显著区域特征;obtaining a first salient area feature according to the first feature information and the salient area, and obtaining a second salient area feature according to the second feature information and the salient area;
    根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain a trained first detection model.
  18. 根据权利要求17所述的存储介质，其特征在于，在根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整，得到训练后的第一检测模型时，所述处理器用于执行：The storage medium according to claim 17, wherein when adjusting the parameters of the first detection model according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is configured to execute:
    获取所述第一显著区域特征和所述第二显著区域特征之间的相似度;obtaining the similarity between the first salient region feature and the second salient region feature;
    获取所述第一检测模型对所述样本图像进行检测得到的损失值;obtaining a loss value obtained by detecting the sample image by the first detection model;
    根据所述相似度和所述损失值对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the similarity and the loss value to obtain a trained first detection model.
  19. 根据权利要求18所述的存储介质，其特征在于，在根据所述相似度和所述损失值对所述第一检测模型的参数进行调整，得到训练后的第一检测模型时，所述处理器用于执行：The storage medium according to claim 18, wherein when adjusting the parameters of the first detection model according to the similarity and the loss value to obtain the trained first detection model, the processor is configured to execute:
    对所述相似度和所述损失值进行加权平均运算,得到目标损失值;Perform a weighted average operation on the similarity and the loss value to obtain a target loss value;
    根据所述目标损失值对所述第一检测模型的参数进行调整,得到训练后的第一检测模型。The parameters of the first detection model are adjusted according to the target loss value to obtain a trained first detection model.
  20. 根据权利要求18所述的存储介质，其特征在于，所述相似度包括欧几里得距离，在获取所述第一显著区域特征和所述第二显著区域特征之间的相似度时，所述处理器用于执行：The storage medium according to claim 18, wherein the similarity comprises a Euclidean distance, and when acquiring the similarity between the first salient region feature and the second salient region feature, the processor is configured to execute:
    确定所述第一显著区域特征和所述第二显著区域特征之间的欧几里得距离,得到所述第一显著区域特征和所述第二显著区域特征之间的相似度。The Euclidean distance between the first salient region feature and the second salient region feature is determined, and the similarity between the first salient region feature and the second salient region feature is obtained.
  21. 根据权利要求17所述的存储介质,其特征在于,在基于所述样本图像中目标物的位置信息,确定所述目标物对应的显著区域时,所述处理器用于执行:The storage medium according to claim 17, wherein, when determining the salient region corresponding to the target object based on the position information of the target object in the sample image, the processor is configured to execute:
    获取多个候选区域;Get multiple candidate regions;
    基于所述位置信息确定所述目标物的目标区域;determining a target area of the target based on the location information;
    从所述多个候选区域中筛选出与所述目标区域的重合度大于第一预设阈值的区域,得到正样本区域;Screening out the regions whose coincidence degree with the target region is greater than the first preset threshold from the plurality of candidate regions to obtain a positive sample region;
    从所述多个候选区域中筛选出与所述目标区域的重合度在预设范围内，且分类概率值大于预设概率阈值的区域，得到负样本区域，所述预设范围为小于所述第一预设阈值且大于第二预设阈值的区间；screening out, from the plurality of candidate regions, a region whose coincidence degree with the target region is within a preset range and whose classification probability value is greater than a preset probability threshold, to obtain a negative sample region, the preset range being an interval smaller than the first preset threshold and greater than a second preset threshold;
    将所述正样本区域和所述负样本区域设置为所述目标物对应的显著区域。The positive sample area and the negative sample area are set as salient areas corresponding to the target.
  22. 根据权利要求21所述的存储介质,其特征在于,在获取多个候选区域时,所述处理器用于执行:The storage medium according to claim 21, wherein when acquiring multiple candidate regions, the processor is configured to execute:
    基于所述第二检测模型生成多个候选区域,或者获取预先标注的多个候选区域。Generate multiple candidate regions based on the second detection model, or obtain multiple candidate regions marked in advance.
  23. 根据权利要求21所述的存储介质，其特征在于，在根据所述第一特征信息和所述显著区域，获取第一显著区域特征，以及根据所述第二特征信息和所述显著区域，获取第二显著区域特征时，所述处理器用于执行：The storage medium according to claim 21, wherein when acquiring the first salient region feature according to the first feature information and the salient region, and acquiring the second salient region feature according to the second feature information and the salient region, the processor is configured to execute:
    分别获取所述正样本区域和所述负样本区域中的所述第一特征信息,得到第一显著区域特征;以及,respectively acquiring the first feature information in the positive sample region and the negative sample region to obtain a first salient region feature; and,
    分别获取所述正样本区域和所述负样本区域的所述第二特征信息,得到第二显著区域特征。The second feature information of the positive sample area and the negative sample area are acquired respectively, to obtain a second salient area feature.
  24. 根据权利要求17所述的存储介质，其特征在于，在根据所述第一显著区域特征和所述第二显著区域特征对所述第一检测模型的参数进行调整，得到训练后的第一检测模型之后，所述处理器还用于执行：The storage medium according to claim 17, wherein after the parameters of the first detection model are adjusted according to the first salient region feature and the second salient region feature to obtain the trained first detection model, the processor is further configured to execute:
    获取待检测的图像;Get the image to be detected;
    通过所述训练后的第一检测模型对所述图像中的所述目标物进行检测,得到所述目标物在所述图像中的目标位置信息。Detect the target in the image by using the trained first detection model to obtain target position information of the target in the image.
  25. 根据权利要求17所述的存储介质,其特征在于,在通过第一检测模型对样本图像进行特征提取之前,所述处理器用于执行:The storage medium according to claim 17, wherein before the feature extraction is performed on the sample image by the first detection model, the processor is configured to execute:
    获取初始图像;get the initial image;
    从所述初始图像中提取所述目标物所在的区域图像;extracting an image of the region where the target is located from the initial image;
    从所述区域图像中提取所述目标物的关键点;extracting key points of the target from the area image;
    对所述初始图像和所述关键点进行预处理,得到所述样本图像和预处理后的关键点;Preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points;
    根据所述预处理后的关键点确定所述样本图像中所述目标物的位置信息。The position information of the target object in the sample image is determined according to the preprocessed key points.
  26. 根据权利要求25所述的存储介质，其特征在于，在对所述初始图像和所述关键点进行预处理，得到所述样本图像和预处理后的关键点时，所述处理器用于执行：The storage medium according to claim 25, wherein when preprocessing the initial image and the key points to obtain the sample image and the preprocessed key points, the processor is configured to execute:
    对所述初始图像和所述关键点按照预设角度进行旋转、平移、缩放和/或亮度调节，得到所述样本图像和预处理后的关键点。rotating, translating, scaling and/or adjusting the brightness of the initial image and the key points according to a preset angle, to obtain the sample image and the preprocessed key points.
  27. 根据权利要求17所述的存储介质,其特征在于,所述目标物包括人脸。The storage medium of claim 17, wherein the target comprises a human face.
  28. 根据权利要求17至27任一项所述的存储介质,其特征在于,所述第一检测模型的规模小于所述第二检测模型的规模,所述第二检测模型为训练后的模型。The storage medium according to any one of claims 17 to 27, wherein the scale of the first detection model is smaller than the scale of the second detection model, and the second detection model is a trained model.
  29. 根据权利要求17至27任一项所述的存储介质,其特征在于,所述存储介质应用于蒸馏算法,所述第一检测模型为学生模型,所述第二检测模型为教师模型。The storage medium according to any one of claims 17 to 27, wherein the storage medium is applied to a distillation algorithm, the first detection model is a student model, and the second detection model is a teacher model.
PCT/CN2020/104973 2020-07-27 2020-07-27 Detection model training method and device, detection model using method and storage medium WO2022021029A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080015995.2A CN113490947A (en) 2020-07-27 2020-07-27 Detection model training method and device, detection model using method and storage medium
PCT/CN2020/104973 WO2022021029A1 (en) 2020-07-27 2020-07-27 Detection model training method and device, detection model using method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/104973 WO2022021029A1 (en) 2020-07-27 2020-07-27 Detection model training method and device, detection model using method and storage medium

Publications (1)

Publication Number Publication Date
WO2022021029A1 true WO2022021029A1 (en) 2022-02-03

Family

ID=77933700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104973 WO2022021029A1 (en) 2020-07-27 2020-07-27 Detection model training method and device, detection model using method and storage medium

Country Status (2)

Country Link
CN (1) CN113490947A (en)
WO (1) WO2022021029A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580533A (en) * 2022-03-04 2022-06-03 腾讯科技(深圳)有限公司 Method, apparatus, device, medium, and program product for training feature extraction model
WO2024113242A1 (en) * 2022-11-30 2024-06-06 京东方科技集团股份有限公司 Dress code discrimination method, person re-identification model training method, and apparatus
CN116071608B (en) * 2023-03-16 2023-06-06 浙江啄云智能科技有限公司 Target detection method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019143946A1 (en) * 2018-01-19 2019-07-25 Visa International Service Association System, method, and computer program product for compressing neural network models
CN108898168A (en) * 2018-06-19 2018-11-27 清华大学 The compression method and system of convolutional neural networks model for target detection
CN110245662A (en) * 2019-06-18 2019-09-17 腾讯科技(深圳)有限公司 Detection model training method, device, computer equipment and storage medium
CN110599503A (en) * 2019-06-18 2019-12-20 腾讯科技(深圳)有限公司 Detection model training method and device, computer equipment and storage medium
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111382870A (en) * 2020-03-06 2020-07-07 商汤集团有限公司 Method and device for training neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897069A (en) * 2022-05-09 2022-08-12 大庆立能电力机械设备有限公司 Intelligent control energy-saving protection device for oil pumping unit
CN114897069B (en) * 2022-05-09 2023-04-07 大庆立能电力机械设备有限公司 Intelligent control energy-saving protection device for oil pumping unit
CN115687914A (en) * 2022-09-07 2023-02-03 中国电信股份有限公司 Model distillation method, device, electronic equipment and computer readable medium
CN115687914B (en) * 2022-09-07 2024-01-30 中国电信股份有限公司 Model distillation method, apparatus, electronic device, and computer-readable medium
CN115908982A (en) * 2022-12-01 2023-04-04 北京百度网讯科技有限公司 Image processing method, model training method, device, equipment and storage medium
CN115761529A (en) * 2023-01-09 2023-03-07 阿里巴巴(中国)有限公司 Image processing method and electronic device
CN115761529B (en) * 2023-01-09 2023-05-30 阿里巴巴(中国)有限公司 Image processing method and electronic device

Also Published As

Publication number Publication date
CN113490947A (en) 2021-10-08

Similar Documents

Publication Publication Date Title
WO2022021029A1 (en) Detection model training method and device, detection model using method and storage medium
US20200160040A1 (en) Three-dimensional living-body face detection method, face authentication recognition method, and apparatuses
WO2019128646A1 (en) Face detection method, method and device for training parameters of convolutional neural network, and medium
WO2021190171A1 (en) Image recognition method and apparatus, terminal, and storage medium
CN108229509B (en) Method and device for identifying object class and electronic equipment
WO2019232866A1 (en) Human eye model training method, human eye recognition method, apparatus, device and medium
US8792722B2 (en) Hand gesture detection
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
US8750573B2 (en) Hand gesture detection
WO2019232862A1 (en) Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
WO2021136528A1 (en) Instance segmentation method and apparatus
WO2019033572A1 (en) Method for detecting whether face is blocked, device and storage medium
US9626553B2 (en) Object identification apparatus and object identification method
US7869632B2 (en) Automatic trimming method, apparatus and program
EP2697775A1 (en) Method of detecting facial attributes
WO2020151299A1 (en) Yellow no-parking line identification method and apparatus, computer device and storage medium
JP2022133378A (en) Face biological detection method, device, electronic apparatus, and storage medium
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
WO2020155790A1 (en) Method and apparatus for extracting claim settlement information, and electronic device
WO2019033570A1 (en) Lip movement analysis method, apparatus and storage medium
WO2019033567A1 (en) Method for capturing eyeball movement, device and storage medium
WO2024077935A1 (en) Visual-slam-based vehicle positioning method and apparatus
CN115170893B (en) Training method of common-view gear classification network, image sorting method and related equipment
CN112766065A (en) Mobile terminal examinee identity authentication method, device, terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20947718

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20947718

Country of ref document: EP

Kind code of ref document: A1