WO2022193731A1 - Training method and apparatus for an object recognition model, and storage medium

Training method and apparatus for an object recognition model, and storage medium

Info

Publication number
WO2022193731A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
recognition model
monitoring
result
Application number
PCT/CN2021/134345
Other languages
English (en)
French (fr)
Inventor
黄国雄
唐槐
余子君
Original Assignee
杭州海康威视系统技术有限公司
Application filed by 杭州海康威视系统技术有限公司
Publication of WO2022193731A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the technical field of video surveillance, and in particular, to a training method, device and storage medium for an object recognition model.
  • object recognition is an important research direction.
  • current object recognition technology usually pre-trains an object recognition model on sample images, and then uses the model to recognize the target object in images that have the target scene as the background.
  • however, the sample images used to train the object recognition model may not have the target scene as the background. As a result, when an object recognition model pre-trained on these sample images is used to recognize the target object in an image with the target scene as the background, the recognition result may be inaccurate.
  • to address this, an object recognition model applicable to the target scene can be retrained for that scene.
  • Embodiments of the present application provide a training method, device, and storage medium for an object recognition model, which help to improve the adaptability of the trained object recognition model more efficiently.
  • an embodiment of the present application provides a training method for an object recognition model. The method includes: acquiring first monitoring images of the same monitoring point at different times, a reference image including a target object, and annotation information, where the annotation information is used to represent the recognition result of the target object in the reference image, and the accuracy of the annotation information is greater than an accuracy threshold; and generating, according to the acquired first monitoring images and the reference image, fused images of the same monitoring point at different times, where each fused image includes the target object.
  • the annotation information is determined as the annotation result of the fused images; according to the fused images and the annotation results, the current object recognition model is iteratively trained until the model converges, and a first target object recognition model is obtained. The first target object recognition model is used for identifying the target object in the monitoring images of the same monitoring point.
  • in this way, fused images of the monitoring point at different times are generated according to the first monitoring images and the reference image obtained at the monitoring point, where each fused image includes the target object and the background of the first monitoring image.
  • the first monitoring image may be an image collected at the monitoring point that includes only the background, or an image collected at the monitoring point that includes both background and foreground.
  • the annotation information of the reference image is determined as the annotation result of the fused image, where the annotation information represents the recognition result of the target object in the reference image.
  • the current object recognition model is then trained, and the obtained first target object recognition model can be used to recognize the target object in images obtained at the monitoring point, without manually acquiring sample images that include the target object at the monitoring point. This saves labor costs and thereby improves the adaptability of the trained object recognition model more efficiently.
  • the above-mentioned “generating fused images of the same monitoring point at different times according to the obtained first monitoring image and reference image” includes: acquiring the image of the target object in the reference image; and fusing, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times and the image of the target object to obtain the fused images of the same monitoring point at different times.
  • in this way, the image of the target object can be fused into different positions in the background of the first monitoring images, and multiple different fused images of the same monitoring point at different times can be obtained. This further enriches the sample images participating in the training, thereby improving the adaptability of the trained first target object recognition model.
  • the above “fusing, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times and the image of the target object to obtain the fused images of the same monitoring point at different times” includes: fusing, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times and the image of the target object to obtain intermediate images of the same monitoring point at different times; and performing data enhancement processing on the intermediate images of the same monitoring point at different times to obtain the fused images of the same monitoring point at different times.
  • in this way, data enhancement is performed on the intermediate images fused by the preset image fusion algorithm, for example, adding noise to the image, adjusting its contrast, adjusting its saturation, or cropping or scaling it, to obtain more fused images. This further enriches the sample images participating in the training, thereby further improving the adaptability of the trained first target object recognition model.
  • the above-mentioned "obtaining a reference image including the target object and labeling information" includes: inputting the test image including the target object into the current object recognition model, and obtaining the recognition result of each test image The test image corresponding to the target recognition result and the recognition result whose accuracy is greater than the accuracy threshold is used as the reference image;
  • the target recognition result is the adjustment operation in response to the recognition result whose accuracy is less than or equal to the described accuracy threshold, and the obtained adjustment
  • the target recognition result and the recognition result whose accuracy is greater than the accuracy threshold are used as the label information.
  • the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold are used as the annotation information, which saves the cost of manual annotation, and improves the adaptability of the trained object recognition model more efficiently.
  • the method further includes: acquiring a second monitoring image of the same monitoring point that includes an object to be identified; inputting the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the second monitoring image; in response to an adjustment operation on the intermediate recognition result, obtaining an adjusted intermediate recognition result, where the adjusted intermediate recognition result is used to represent whether the object to be identified is the target object; determining the adjusted intermediate recognition result as the annotation result of the second monitoring image; and iteratively training the first target object recognition model according to the second monitoring image and its annotation result to obtain a second target object recognition model.
  • the method further includes: acquiring a third monitoring image of the same monitoring point that includes an object to be identified, where the annotation result of the third monitoring image is used to indicate whether the object to be identified is a candidate object; inputting the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the third monitoring image; in response to the intermediate recognition result representing that the object to be identified is the target object, and an adjustment operation on the first target object recognition model, obtaining an adjusted first target object recognition model, where the adjusted first target object recognition model is used to output a recognition result of whether the object to be identified is the candidate object; and iteratively training the adjusted first target object recognition model according to the third monitoring image and its annotation result to obtain a third target object recognition model.
  • an embodiment of the present application provides an apparatus for training an object recognition model.
  • the training device includes: an acquisition module, a generation module, a determination module and a training module.
  • the above acquisition module is used to acquire first monitoring images of the same monitoring point at different times, a reference image including the target object, and annotation information; the annotation information is used to represent the recognition result of the target object in the reference image, and the accuracy of the annotation information is greater than the accuracy threshold.
  • the above-mentioned generating module is configured to generate fused images of the same monitoring point at different times according to the first monitoring image and the reference image acquired by the acquiring module, and the fused images include the target object and the background of the first monitoring image.
  • the above determining module is configured to determine the labeling information obtained by the obtaining module as the labeling result of the fusion image generated by the generating module.
  • the above training module is used to iteratively train the current object recognition model according to the fused images generated by the generation module and the annotation results determined by the determination module, until the model converges, to obtain the first target object recognition model; the first target object recognition model is used to identify the target object in the first monitoring images of the same monitoring point.
  • the above-mentioned generation module is specifically used to: obtain the image of the target object in the reference image; and fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times and the image of the target object to obtain the fused images of the same monitoring point at different times.
  • the above-mentioned generating module is specifically used for: according to a preset image fusion algorithm, fusing the first monitoring image of the same monitoring point at different times and the image of the target object to obtain intermediate images of the same monitoring point at different times; Data enhancement is performed on the intermediate images of the same monitoring point at different times, and the fusion images of the same monitoring point at different times are obtained.
  • the above acquisition module is specifically used to: input test images including the target object into the current object recognition model to obtain a recognition result of each test image; use, as reference images, the test images corresponding to the target recognition results and to the recognition results whose accuracy is greater than the accuracy threshold, where a target recognition result is an adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold; and use the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the annotation information.
  • the above-mentioned acquisition module is further configured to: acquire a second monitoring image of the same monitoring point that includes an object to be identified; input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the second monitoring image; and, in response to an adjustment operation on the intermediate recognition result, obtain an adjusted intermediate recognition result, where the adjusted intermediate recognition result is used to represent whether the object to be identified is the target object. The determination module is also used to: determine the adjusted intermediate recognition result as the annotation result of the second monitoring image. The training module is also used to: perform iterative training on the first target object recognition model according to the second monitoring image and its annotation result to obtain a second target object recognition model.
  • the obtaining module is further configured to: obtain a third monitoring image of the same monitoring point that includes an object to be identified, where the annotation result of the third monitoring image is used to indicate whether the object to be identified is a candidate object; input the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the third monitoring image; and, in response to the intermediate recognition result representing that the object to be identified is the target object, and an adjustment operation on the first target object recognition model, obtain an adjusted first target object recognition model, where the adjusted first target object recognition model is used to output a recognition result of whether the object to be identified is the candidate object;
  • the training module is further configured to perform iterative training on the adjusted first target object recognition model according to the third monitoring image and the labeling result of the third monitoring image to obtain a third target object recognition model.
  • the present application provides a training device for an object recognition model, comprising: a memory and a processor, where the memory and the processor are coupled; the memory is used to store computer program code, and the computer program code includes computer instructions. When the processor executes the computer instructions, the training device for the object recognition model executes the training method for an object recognition model provided by the first aspect and any possible implementation thereof.
  • the present application provides a computer-readable storage medium storing instructions. When the instructions are run on a training device for an object recognition model, the training device is caused to execute the training method for an object recognition model provided by the above-mentioned first aspect and any possible implementation thereof.
  • the present application provides a computer program product that, when run on a training device for an object recognition model, enables the training device to perform the training method for an object recognition model provided by the first aspect and any possible implementation thereof.
  • the above computer instructions may be stored in whole or in part on the first computer-readable storage medium.
  • the first computer-readable storage medium may be packaged together with the processor of the training device of the object recognition model, or packaged separately from the processor of the training device of the object recognition model, which is not limited in this application.
  • the name of the training device of the object recognition model does not constitute a limitation on the device or the functional modules themselves; in actual implementation, these devices or functional modules may appear under other names. As long as the functions of each device or functional module are similar to those in the present application, they fall within the scope of the claims of the present application and their equivalents.
  • FIG. 1 is a schematic diagram of a background and a target object provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a training system provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a training method for an object recognition model provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of a test image provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of acquiring an image of a target object according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of fusing a first monitoring image and an image of a target object to obtain a fused image according to a preset image fusion algorithm according to an embodiment of the present application;
  • FIG. 7 is a schematic flowchart of a method for optimizing a first target object recognition model to obtain a second target object recognition model according to an embodiment of the present application
  • FIG. 8 is a schematic diagram of a second monitoring image provided by an embodiment of the present application.
  • FIG. 9 is a first structural schematic diagram of a training device 30 for an object recognition model provided by an embodiment of the present application.
  • FIG. 10 is a second schematic structural diagram of a training device 30 for an object recognition model provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computer program product of the object recognition model training method provided by the embodiment of the present application.
  • words such as “first” and “second” are used to distinguish identical or similar items having substantially the same functions and effects. A skilled person can understand that words such as “first” and “second” do not limit the quantity or the execution order.
  • Background: refers to the area that does not change, over a relatively long time scale, in the images acquired by an image acquisition device. As shown in a in FIG. 1, the background may be a room without any foreground objects.
  • Target object: refers to the area that changes, over a relatively long time scale, in the images acquired by the image acquisition device; it may also be referred to as a foreground object. As shown in b in FIG. 1, the target object may be a human body that has fallen to the ground.
  • FIG. 2 is a schematic structural diagram of a training system provided by an embodiment of the present application.
  • the training system may include: at least one image acquisition device 10 (in FIG. 2, cameras 10-1 to 10-3 are taken as examples of image acquisition devices, which is not limiting) and a server 20.
  • the image acquisition device 10 may be used to acquire images of a designated area (such as a monitored room) and send the acquired images to the server 20.
  • for example, when the image acquisition device 10 is a camera, the images of the designated area captured by it are as shown in a and b in FIG. 1.
  • since the installation position of the camera (also known as the monitoring point) is fixed, its acquisition area is also fixed.
  • when the camera is of a fixed type, such as a bullet camera, a small dome camera, or a large dome camera, the angle at which it captures images is also fixed.
  • the server 20 stores a trained object recognition model, and the object recognition model is obtained by training based on a plurality of labeled sample images.
  • the annotated sample image is a sample image with an annotation result, and the object included in the sample image may be the target object or other objects.
  • the labeling result of the sample image may be used to indicate whether the object included in the sample image is the target object, or may be used to indicate which object the object included in the sample image is.
  • the server 20 may input the reference image including the target object into the object recognition model to obtain a prediction result, and obtain the recognition result of the reference image including the target object according to the prediction result.
  • the server 20 may be configured to receive an image of a designated area collected by at least one image collecting device 10, and generate a fusion image according to the image of the designated area and a reference image including the target object.
  • the server 20 can train the current object recognition model according to the fused images and the recognition results of the reference images, following the training method for an object recognition model provided by the embodiments of the present application, so as to obtain an object recognition model that can recognize the target object in images with the designated area as the background.
  • the server 20 provided in this embodiment of the present application may be a computer device such as a personal computer, a notebook computer, a smart phone, a tablet computer, a server, or a server cluster.
  • the image capturing apparatus 10 may be a device for capturing images, such as a camera, a capture camera, or a video camera.
  • the training device of the object recognition model in the embodiment of the present application may be the server 20 shown in FIG. 2 , or may be a part of the device in the server 20 .
  • for example, a system-on-a-chip in the server 20.
  • the chip system includes chips, and may also include other discrete devices or circuit structures.
  • the following describes the training method of the object recognition model provided by the embodiment of the present application by taking the training device of the object recognition model as the server 20 as an example with reference to the training system architecture shown in FIG. 2 .
  • the server 20 may acquire a plurality of sample images with annotation results, and obtain the current object recognition model by training according to the acquired sample images with annotation results.
  • Current object recognition models can be used to identify target objects in images.
  • for example, the server 20 obtains the current object recognition model by training on multiple images of the target object against a general background.
  • This embodiment of the present application does not limit the general background, and an exemplary general background may be a scene such as a green screen, a street scene, or an indoor scene.
  • as shown in FIG. 3, which is a schematic flowchart of a training method for an object recognition model provided by an embodiment of the present application, the training method includes the following steps.
  • S11. The server 20 acquires first monitoring images of the same monitoring point at different times, a reference image including the target object, and annotation information.
  • the annotation information is used to represent the recognition result of the target object in the reference image, and the accuracy of the annotation information is greater than the accuracy threshold.
  • the images of the monitoring point at different times can be understood as: the images collected by the image acquisition device 10 of the monitoring point at different times.
  • the acquisition by the server 20 of the first monitoring images of the same monitoring point at different times may include the following implementations:
  • the server 20 receives the first monitoring image sent by the image acquisition device 10 .
  • the acquisition area of the image acquisition device 10 is a fixed area, which may be referred to as the first scene.
  • the image acquisition device 10 collects images of the first scene and, from the collected images of the first scene, determines the images that were collected at different times, take the first scene as the background area, and contain no foreground objects as the first monitoring images.
  • the first monitoring image refers to an image of a region that does not change in the image of the first scene acquired by the image acquisition device 10 within a preset time period.
  • the image acquisition device 10 determines, from the acquired images of the first scene, first monitoring images collected at different times, and sends the first monitoring images to the server 20 .
  • the first monitoring image takes the first scene as a background area, and there is no foreground object in the first monitoring image. It can be understood that the first monitoring image is the background image, that is, the first monitoring image refers to the background image in the image of the first scene collected by the image collecting device 10 within a preset time period.
  • the preset time period may be set according to actual needs, for example, the preset time period may be one day, two days, or one week.
  • for example, the image acquisition device 10 screens the images of the first scene acquired within a preset time period (for example, within a day) to obtain the first monitoring images of the first scene at different time periods or under different lighting conditions.
  • in other words, for the image acquisition device 10, obtaining the first monitoring images of the first scene at different times amounts to obtaining the first monitoring images of the first scene under different lighting conditions.
  • the server 20 determines the first monitoring image of the first scene from the images of the first scene sent by the image acquisition device 10 erected at the monitoring point on the site to be monitored.
  • the image acquisition device 10 sends the acquired image of the first scene to the server 20 .
  • the server 20 determines, from the received images of the first scene, first monitoring images collected at different times.
  • the first monitoring image takes the first scene as a background area, and there is no foreground object in the first monitoring image.
  • the image acquisition device 10 sends the image of the first scene acquired within a preset time period to the server 20 .
  • the server 20 may perform screening processing on the images of the first scene to obtain a first monitoring image of the first scene.
  • the server 20 may also acquire the background image in the image of the first scene according to the matting algorithm, and use the acquired background image as the first monitoring image.
  • the first monitoring image may also be any image collected by the image acquisition device 10 , that is, the first monitoring image may include foreground objects or may not include foreground objects.
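  • as an aside, one plausible way to implement the screening mentioned above is a per-pixel temporal median over the frames collected during the preset time period. This is only an illustrative sketch: the application does not fix a screening algorithm, and the function name below is hypothetical.

```python
import numpy as np

def estimate_background(frames):
    """Per-pixel temporal median over frames from one monitoring point.

    frames: list of equally sized uint8 images sampled over the preset
    time period (e.g. one day). The median suppresses transient
    foreground objects, leaving the unchanging background as a candidate
    first monitoring image.
    """
    stack = np.stack(frames).astype(np.float32)  # shape: (T, H, W, C)
    return np.median(stack, axis=0).astype(np.uint8)
```

  • running this separately on frames sampled in the morning, at noon, and at night would yield first monitoring images of the first scene under different lighting conditions, as described above.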
  • the server 20 obtains the reference image including the target object and the annotation information through the following steps:
  • Step 1 The server 20 acquires a test image including the target object.
  • the number of test images can be one or more.
  • the server 20 receives a test image including the target object sent by other devices (for example, the image acquisition device 10 ).
  • the server 20 reads the test image including the target object stored locally by the server 20 .
  • Step 2 The server 20 inputs the test image including the target object into the current object recognition model, and obtains the recognition result of each test image.
  • Step 3 The server 20 uses the test images corresponding to the target recognition results, and to the recognition results whose accuracy is greater than the accuracy threshold, as the reference images.
  • the target recognition result is an adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold.
  • the accuracy threshold can be set according to actual needs.
  • the accuracy threshold may be 85%, 90%, 95% or 97%, etc.
  • in one implementation, the server 20 takes the test images corresponding to recognition results whose accuracy is greater than the accuracy threshold as reference images; for a recognition result whose accuracy is less than or equal to the accuracy threshold, in response to an adjustment operation on the recognition result, the server 20 obtains the adjusted recognition result as a target recognition result and takes the test image corresponding to the target recognition result as a reference image.
  • take as an example a case where a recognition result whose accuracy is greater than the accuracy threshold is a first recognition result, and a recognition result whose accuracy is less than or equal to the accuracy threshold is a second recognition result.
  • the server 20 takes the test image corresponding to the first recognition result as a reference image; in addition, in response to an adjustment operation on the second recognition result, the server 20 obtains the adjusted second recognition result as a target recognition result, and also takes the test image corresponding to the target recognition result as a reference image.
  • the first recognition result and the second recognition result are two opposite recognition results. For example, if the first recognition result indicates that the target object is A, the second recognition result indicates that the target object is not A. On this basis, the training data can be enriched with limited resources, and the accuracy of the trained object recognition model can be improved.
  • in another implementation, the server 20 uses all the test images as reference images. If the accuracy of the recognition result of a test image is greater than the accuracy threshold, that recognition result is used as the recognition result of the corresponding reference image; if the accuracy of the recognition result of a test image is less than or equal to the accuracy threshold, the recognition result is adjusted to obtain an adjusted recognition result, that is, a target recognition result, and the target recognition result is used as the recognition result of the corresponding reference image.
  • for example, the test image shown in FIG. 4 includes a target object that is a human body doing yoga.
  • the server 20 inputs the test image into the current object recognition model, and obtains a recognition result of the test image: the target object is a fallen human body, and the accuracy of the recognition result is 75%. This accuracy of 75% is below the accuracy threshold of 85%.
  • in response to the input adjustment operation on the recognition result, the server 20 obtains the adjusted recognition result: the target object is a human body that has not fallen to the ground. Therefore, the target recognition result obtained by the server 20 is: the target object is a human body that has not fallen to the ground.
  • the test image shown in FIG. 4 is the test image corresponding to this target recognition result.
  • the server 20 uses the test image shown in FIG. 4 as a reference image, and the recognition result of the reference image is that the target object is a human body that has not fallen to the ground.
  • Step 4 The server 20 uses the target recognition result and the recognition result whose accuracy is greater than the accuracy threshold as the annotation information of the reference image corresponding to each recognition result.
  • for each target recognition result, the server 20 uses the target recognition result as the annotation information of the reference image corresponding to that target recognition result; for each recognition result whose accuracy is greater than the accuracy threshold, the server 20 uses the recognition result as the annotation information of the corresponding reference image.
  • for example, the server 20 uses "the target object is a human body that has not fallen to the ground" as the annotation information of the reference image shown in FIG. 4.
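  • the selection logic of Steps 2 to 4 can be summarized in a short sketch. The following Python fragment is illustrative only: Recognition, build_reference_set, and the adjust callback are hypothetical names, and the 85% threshold matches the example above.

```python
from dataclasses import dataclass

@dataclass
class Recognition:
    image_id: str
    label: str       # e.g. "fallen human body"
    accuracy: float  # accuracy of this recognition result

def build_reference_set(results, accuracy_threshold=0.85, adjust=None):
    """Split recognition results by the accuracy threshold (Steps 3-4).

    Results above the threshold are kept as-is; results at or below it
    are passed to `adjust` (a manual-correction callback) and the
    corrected label becomes the target recognition result. Returns the
    reference image ids and the annotation information per reference.
    """
    references, annotations = [], {}
    for r in results:
        if r.accuracy > accuracy_threshold:
            label = r.label          # recognition result kept as annotation info
        else:
            label = adjust(r)        # adjusted result = target recognition result
        references.append(r.image_id)
        annotations[r.image_id] = label
    return references, annotations

# usage with the FIG. 4 example (75% accuracy, below the 85% threshold):
results = [Recognition("fig4", "fallen human body", 0.75)]
refs, notes = build_reference_set(
    results, 0.85, adjust=lambda r: "human body that has not fallen to the ground")
```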
  • S12. The server 20 generates fused images of the same monitoring point at different times according to the acquired first monitoring images and the reference image.
  • the fusion image includes the target object and the background of the first monitoring image.
  • the first monitoring image may be understood as a background image.
  • the server 20 fuses the reference image with the first monitoring images of the monitoring point at different times, respectively, to obtain fused images of the monitoring point at different times.
  • specifically, the server 20 acquires the image of the target object in the reference image and, according to a preset image fusion algorithm, fuses the first monitoring images of the same monitoring point at different times with the image of the target object to obtain the fused images of the same monitoring point at different times.
  • the server 20 uses a matting algorithm on the image including the target object shown in a in FIG. 5 , to obtain the image of the target object shown in b in FIG. 5 .
  • as shown in FIG. 6, the server 20 fuses the first monitoring image a and the image b of the target object according to the preset image fusion algorithm to obtain the fused image c.
  • the server 20 fuses the first monitoring image of the same monitoring point at different times and the image of the target object, and obtains intermediate images of the same monitoring point at different times. Data enhancement is performed on intermediate images at different times to obtain more fused images.
  • data enhancement processing may include, but is not limited to, adding noise, adjusting contrast, adjusting saturation, cropping and scaling, and the like.
  • the server 20 can add noise to the obtained intermediate image, adjust the contrast of the fused image, adjust the saturation of the fused image, crop the fused image, or scale the fused image, etc., to obtain more fused images.
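  • the fusion and enhancement steps can be sketched as follows. This is a minimal illustration assuming an alpha-blending paste as the "preset image fusion algorithm" (the application does not fix a specific algorithm); the file names and parameter values are hypothetical.

```python
import cv2
import numpy as np

def fuse(background, object_rgba, x, y):
    """Paste a matted target object (RGBA) onto a first monitoring image
    at position (x, y) using alpha blending; the object must fit within
    the background at that position."""
    h, w = object_rgba.shape[:2]
    roi = background[y:y + h, x:x + w].astype(np.float32)
    rgb = object_rgba[:, :, :3].astype(np.float32)
    alpha = object_rgba[:, :, 3:4].astype(np.float32) / 255.0
    background[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
    return background

def enhance(image, rng):
    """Data enhancement operations named in the text: add noise, adjust
    contrast and saturation, and scale."""
    noisy = np.clip(image + rng.normal(0, 8, image.shape), 0, 255).astype(np.uint8)
    contrast = cv2.convertScaleAbs(image, alpha=1.2, beta=0)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 1] = np.clip(hsv[:, :, 1] * 1.3, 0, 255)
    saturated = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    scaled = cv2.resize(image, None, fx=0.9, fy=0.9)
    return [noisy, contrast, saturated, scaled]

rng = np.random.default_rng(0)
bg = cv2.imread("monitoring_point_morning.jpg")               # a first monitoring image
obj = cv2.imread("target_object.png", cv2.IMREAD_UNCHANGED)   # matted RGBA target object
intermediate = fuse(bg.copy(), obj, x=120, y=200)             # intermediate image
fused_images = enhance(intermediate, rng)                     # fused images after enhancement
```

  • fusing the same object image at several different (x, y) positions, as the text suggests, multiplies the number of sample images obtained from a single reference image.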
  • S13. The server 20 determines the annotation information as the annotation result of the fused images.
  • for each reference image, the server 20 fuses the reference image with the first monitoring images and, after obtaining the fused images, uses the annotation information of the reference image as the annotation result of each fused image.
  • for example, the annotation information of the reference image shown in a in FIG. 5 is: the target object is a fallen human body. Therefore, the annotation result of the fused image shown in FIG. 6 is: the target object is a fallen human body.
  • the server 20 can automatically generate multiple fused images and their annotation results by executing the above S11 to S13. These fused images can be used as sample images, and the annotation results of the fused images can be used as the annotation results of the sample images; the current object recognition model is then trained based on the multiple sample images and their annotation results, which reduces the labor cost of obtaining the sample images and their annotation results.
  • S14. The server 20 performs iterative training on the current object recognition model according to the fused images and the annotation results, until the model converges, to obtain a first target object recognition model.
  • the first target object recognition model is used for identifying target objects in the monitoring images of the same monitoring point.
  • the monitoring image refers to any image collected by the image collecting device 10 located at the same monitoring point.
  • the server 20 can also use the first target object recognition model as the new current object recognition model, and re-execute the above S11 to S14 to iteratively train the current object recognition model, thereby obtaining a new first target object recognition model.
  • the new first target object recognition model is more accurate when performing target object recognition on the image including the target object acquired at the monitoring point.
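  • S14 can be sketched as a standard supervised loop over the (fused image, annotation result) pairs. The sketch below assumes a PyTorch classifier and a simple loss-plateau convergence test; none of this is prescribed by the application.

```python
import torch
from torch.utils.data import DataLoader

def train_until_convergence(model, dataset, max_epochs=50, tol=1e-3, lr=1e-4):
    """Iteratively train the current object recognition model on fused
    images and their annotation results (S14). `dataset` yields
    (image tensor, class index) pairs; training stops once the epoch
    loss improves by less than `tol`, taken here as 'model converges'."""
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for fused_image, annotation in loader:
            optimizer.zero_grad()
            loss = criterion(model(fused_image), annotation)
            loss.backward()
            optimizer.step()
            total += loss.item()
        epoch_loss = total / len(loader)
        if prev_loss - epoch_loss < tol:   # convergence criterion
            break
        prev_loss = epoch_loss
    return model                           # the first target object recognition model
```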
  • in this way, fused images of the specific monitoring point at different times are generated according to the first monitoring images and the reference image obtained at that monitoring point.
  • the fused image includes the target object and the background of the first monitoring image.
  • the first monitoring image may be an image collected at the specific monitoring point that includes only the background, or an image collected at that monitoring point that includes both background and foreground.
  • the annotation information of the reference image is determined as the annotation result of the fused image, where the annotation information represents the recognition result of the target object in the reference image.
  • the current object recognition model is then trained, and the obtained first target object recognition model can be used to recognize the target object in images obtained at the specific monitoring point, without manually acquiring sample images that include the target object at that monitoring point. This saves labor costs and thereby improves the adaptability of the trained object recognition model more efficiently.
  • further, the server 20 may also obtain a second monitoring image collected at the same monitoring point that includes an object to be identified, and input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the second monitoring image; in response to an adjustment operation on the intermediate recognition result, the server 20 obtains an adjusted intermediate recognition result, and optimizes the first target object recognition model according to the adjusted intermediate recognition result and the second monitoring image to obtain a second target object recognition model.
  • FIG. 7 is a schematic flowchart of a method for optimizing a first target object recognition model to obtain a second target object recognition model, and the method may include:
  • S21. The server 20 acquires a second monitoring image of the same monitoring point that includes an object to be identified.
  • the server 20 receives the second monitoring image sent by the image acquisition device 10 with the first scene as the background area and including the object to be identified.
  • S21 may be: the image acquisition device 10 sends a second monitoring image to the server 20, where the second monitoring image takes the first scene as a background area, and the second monitoring image includes the object to be identified.
  • the second monitoring images acquired by the server 20 include monitoring images as shown in a and b in FIG. 8 .
  • the object 80 shown in a in FIG. 8 is an image of the object to be recognized
  • the object 81 shown in b in FIG. 8 is an image of the object to be recognized.
  • the object to be identified and the above-mentioned target object may be the same object or different objects, which are not limited in this embodiment of the present application.
  • S22. The server 20 inputs the second monitoring image into the first target object recognition model, and obtains an intermediate recognition result of the object to be identified in the second monitoring image.
  • for example, the intermediate recognition result of the object to be identified shown in a in FIG. 8 is: the object to be identified is a human body that has not fallen to the ground; the intermediate recognition result of the object to be identified shown in b in FIG. 8 is: the object to be identified is a human body that has fallen to the ground.
  • S23. The server 20 acquires the adjusted intermediate recognition result in response to the adjustment operation on the intermediate recognition result.
  • the adjusted intermediate recognition result is used to characterize whether the object to be recognized is the target object.
  • the adjusted intermediate recognition result is the correct recognition result of the object to be recognized.
  • for example, the adjusted intermediate recognition result of the object to be identified shown in a in FIG. 8 is: the object to be identified is a human body that has fallen to the ground; the adjusted intermediate recognition result of the object to be identified shown in b in FIG. 8 is: the object to be identified is a human body that has not fallen to the ground.
  • the above-mentioned second monitoring images may be a plurality of second monitoring images, and each second monitoring image respectively includes an object to be identified.
  • the above-mentioned S22 to S23 may be performed for each second monitoring image, thereby obtaining an intermediate recognition result of each second monitoring image. In the case that the intermediate recognition result is wrong, the adjusted intermediate recognition result is obtained.
  • the first target object recognition model can be used to recognize object A, object B and object C.
  • the obtained intermediate recognition results of the objects to be recognized in the second monitoring images, and the obtained adjusted intermediate recognition results are shown in Table 1 below.
  • in Table 1, A is used to represent object A and A' to represent non-object A; B is used to represent object B and B' to represent non-object B; and C is used to represent object C and C' to represent non-object C.
  • S24. The server 20 determines the adjusted intermediate recognition result as the annotation result of the second monitoring image.
  • the server 20 can obtain a plurality of second monitoring images and the labeling result of each second monitoring image by executing the above S21 to S24.
  • S25. The server 20 performs iterative training on the first target object recognition model according to the second monitoring image and the annotation result of the second monitoring image, to obtain a second target object recognition model.
  • in a possible implementation, after obtaining a plurality of second monitoring images and the annotation result of each second monitoring image, the server 20 may iteratively train the first target object recognition model according to the plurality of second monitoring images and their annotation results to obtain the second target object recognition model.
  • the server 20 may also perform iterative training on the first target object recognition model according to the second monitoring image and the labeling result of the second monitoring image after obtaining a second monitoring image and the labeling result of the second monitoring image, A second target object recognition model is obtained. This embodiment of the present application does not limit this.
  • in this way, the first target object recognition model is iteratively trained, and the obtained second target object recognition model is more stable.
  • that is, erroneous recognition results are manually adjusted (in response to the above adjustment operation on the intermediate recognition result, the adjusted intermediate recognition result is obtained), and the adjusted recognition results are used as the annotation results of the corresponding second monitoring images to iteratively train the first target object recognition model and obtain the second target object recognition model.
  • the accuracy of the second target object recognition model in recognizing the target object in the first scene is higher than the accuracy of the first target object recognition model in recognizing the target object in the first scene; that is, the second accuracy is higher than the first accuracy, where the first accuracy is the accuracy with which the first target object recognition model recognizes the target object in the first scene, and the second accuracy is the accuracy with which the second target object recognition model recognizes the target object in the first scene.
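  • the loop of S21 to S25 amounts to human-in-the-loop fine-tuning, sketched below. The callbacks are hypothetical placeholders: predict_fn runs the model (S22), request_adjustment lets an operator correct a wrong intermediate result (S23), and finetune_fn can reuse a training loop such as the train_until_convergence sketch above (S25).

```python
def optimize_model(model, second_monitoring_images,
                   predict_fn, request_adjustment, finetune_fn):
    """S21-S25 sketch: predict on new monitoring images from the same
    monitoring point, correct erroneous intermediate recognition results,
    and fine-tune on the corrected annotation results."""
    labeled = []
    for image in second_monitoring_images:
        intermediate = predict_fn(model, image)             # S22: intermediate result
        adjusted = request_adjustment(image, intermediate)  # S23: adjustment operation
        labeled.append((image, adjusted))                   # S24: annotation result
    return finetune_fn(model, labeled)                      # S25: second target model
```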
  • the embodiment of the present application also provides a training method for an object recognition model.
  • in this method, the server obtains a third monitoring image of the same monitoring point that includes an object to be identified, where the annotation result of the third monitoring image is used to indicate whether the object to be identified is a candidate object; inputs the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified; in response to the intermediate recognition result representing that the object to be identified is the target object, and an adjustment operation on the first target object recognition model, obtains an adjusted first target object recognition model, where the adjusted first target object recognition model is used to output a recognition result of whether the object to be identified is the candidate object; and, according to the third monitoring image and the annotation result of the third monitoring image, iteratively trains the adjusted first target object recognition model to obtain a third target object recognition model.
  • for example, the object to be identified in a monitoring image a obtained by the server is a pear; the first target object recognition model recognizes monitoring image a, and the intermediate recognition result is that the object to be identified in monitoring image a is an apple. The first target object recognition model is then adjusted to obtain an adjusted first target object recognition model, which is used to output a recognition result of whether the object to be identified is a pear.
  • the server performs iterative training on the adjusted first target object recognition model according to the monitoring image a and the labeling result of the monitoring image a to obtain a third target object recognition model.
  • the target object recognition model is adjusted and trained according to the monitoring images obtained in real time, which further improves the recognition accuracy of the target object recognition model.
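  • one plausible form of the "adjustment operation on the first target object recognition model" (the application leaves it unspecified) is to replace the recognition head so the adjusted model outputs whether the object to be identified is the candidate object, e.g. pear versus not pear. The attribute name `head` and the feature dimension below are assumptions.

```python
import torch.nn as nn

def adjust_for_candidate(model, feature_dim=512):
    """Replace the final classification layer so the adjusted first
    target object recognition model emits a binary recognition result:
    candidate object (e.g. pear) or not. Assumes the model exposes its
    final layer as `model.head` with `feature_dim` input features."""
    model.head = nn.Linear(feature_dim, 2)
    return model
```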
  • the training device of the object recognition model can be divided into functional modules according to the above method examples.
  • each functional module can correspond to one function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that, the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • the object recognition model training device 30 includes an acquisition module 301 , a generation module 302 , a determination module 303 and a training module 304 .
  • the acquisition module 301 is used to acquire the first monitoring image of the same monitoring point at different times, the reference image including the target object, and the annotation information; the annotation information is used to represent the recognition result of the target object in the reference image , and the accuracy of the annotation information is greater than the accuracy threshold.
  • the acquisition module 301 may be used to execute S11 ; in combination with the process shown in FIG. 7 , the acquisition module 301 may also be used to perform S21 to S23 .
  • the generating module 302 is configured to generate fused images of the same monitoring point at different times according to the first monitoring image and the reference image acquired by the acquiring module 301 , and the fused images include the target object and the background of the first monitoring image.
  • the generating module 302 may be used to execute S12.
  • the determining module 303 is configured to determine the labeling information obtained by the obtaining module 301 as the labeling result of the fusion image generated by the generating module 302 .
  • the determination module 303 may be used to execute S13 ; in combination with the process shown in FIG. 7 , the determination module 303 may also be used to execute S24 .
  • the training module 304 is used to iteratively train the current object recognition model according to the fusion image generated by the generation module 302 and the labeling result determined by the determination module 303 until the model converges to obtain the first target object recognition model; the first target object recognition model The model is used to identify the target object in the monitoring images of the same monitoring point.
  • the training module 304 can be used to execute S14 , and the training module 304 can also be used to execute S25 in conjunction with FIG. 7 .
  • the generating module 302 can be specifically used for: acquiring the image of the target object in the reference image; according to the preset image fusion algorithm, fusing the first monitoring image of the same monitoring point and the image of the target object at different times to obtain the image of the target object. Fusion images of the same monitoring point at different times.
  • the generating module 302 can be specifically configured to: according to a preset image fusion algorithm, fuse the first monitoring image of the same monitoring point at different times and the image of the target object, and obtain the same monitoring point in the middle of different times. Image; perform data enhancement processing on the intermediate images of the same monitoring point at different times to obtain the fusion images of the same monitoring point at different times.
  • the acquisition module 301 can be specifically used to: input test images including the target object into the current object recognition model to obtain a recognition result of each test image; use, as reference images, the test images corresponding to the target recognition results and to the recognition results whose accuracy is greater than the accuracy threshold, where a target recognition result is an adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold; and use the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the annotation information.
  • the acquisition module 301 can also be used to: acquire a second monitoring image of the same monitoring point that includes an object to be identified; input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the second monitoring image; and, in response to an adjustment operation on the intermediate recognition result, obtain an adjusted intermediate recognition result, where the adjusted intermediate recognition result is used to represent whether the object to be identified is the target object.
  • the determining module 303 may also be configured to determine the adjusted intermediate recognition result as the labeling result of the second monitoring image.
  • the training module 304 may also be configured to: perform iterative training on the first target object recognition model according to the second monitoring image and the labeling result of the second monitoring image to obtain a second target object recognition model.
  • the acquisition module 301 can also be used to: acquire a third monitoring image of the same monitoring point that includes an object to be identified, where the annotation result of the third monitoring image is used to indicate whether the object to be identified is a candidate object; input the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be identified in the third monitoring image; and, in response to the intermediate recognition result representing that the object to be identified is the target object, and an adjustment operation on the first target object recognition model, obtain an adjusted first target object recognition model, where the adjusted first target object recognition model is used to output a recognition result of whether the object to be identified is the candidate object;
  • the training module 304 may also be configured to perform iterative training on the adjusted first target object recognition model according to the third monitoring image and the labeling result of the third monitoring image to obtain a third target object recognition model.
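  • as an illustrative skeleton only (the application does not prescribe an implementation language or API), the four modules could map onto code as follows; all names are hypothetical, and the fusion and training sketches above would supply the bodies of generate and train.

```python
class ObjectRecognitionModelTrainer:
    """Sketch of training device 30: acquisition module 301, generation
    module 302, determination module 303, and training module 304."""

    def acquire(self):
        """Module 301: return (first monitoring images, reference images,
        annotation info) for one monitoring point."""
        raise NotImplementedError

    def generate(self, monitoring_images, reference_images):
        """Module 302: fuse each reference image's target object into the
        monitoring images, returning (fused image, source reference id)
        pairs; see the fusion sketch above."""
        raise NotImplementedError

    def determine(self, fused_images, annotation_info):
        """Module 303: copy each source reference image's annotation info
        onto its fused images as their annotation results."""
        return [(img, annotation_info[src]) for img, src in fused_images]

    def train(self, model, labeled_fused_images):
        """Module 304: iterate until convergence (see the training sketch
        above) to obtain the first target object recognition model."""
        raise NotImplementedError
```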
  • FIG. 10 is a second structural schematic diagram of an object recognition model training apparatus 30 provided by an embodiment of the present application.
  • the object recognition model training apparatus 30 may include: at least one processor 51 and a memory 52 , a communication interface 53 and a communication bus 54 .
  • the following specifically introduces each constituent component of the training device of the object recognition model:
  • the processor 51 is the control center of the training device of the object recognition model; it may be one processor or a collective term for multiple processing elements.
  • for example, the processor 51 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs).
  • As an embodiment, the processor 51 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 10.
  • As an embodiment, the object recognition model training apparatus may include multiple processors, such as the two processors 51 shown in FIG. 10.
  • Each of these processors can be a single-core processor (Single-CPU) or a multi-core processor (Multi-CPU).
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • The memory 52 may be a read-only memory (Read-Only Memory, ROM) or another type of static storage device that can store static information and instructions, a random access memory (Random Access Memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 52 may exist independently and be connected to the processor 51 through the communication bus 54 .
  • the memory 52 may also be integrated with the processor 51 .
  • the memory 52 is used for storing data in the embodiments of the present application and software programs for executing the embodiments of the present application.
  • the processor 51 can perform various functions of the training apparatus of the object recognition model by running or executing software programs stored in the memory 52 and calling data stored in the memory 52 .
  • The communication interface 53, which uses any transceiver-like device, is configured to communicate with other devices or communication networks, such as a radio access network (Radio Access Network, RAN), a wireless local area network (Wireless Local Area Networks, WLAN), a terminal, or a cloud.
  • the communication interface 53 may include a receiving unit that implements a receiving function, and a transmitting unit that implements a transmitting function.
  • The communication bus 54 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
  • As an example, with reference to FIG. 9, the receiving function of the acquisition module 301 in the object recognition model training apparatus 30 can be implemented by the communication interface 53 in FIG. 10, and the processing function of the acquisition module 301 and the functions of the generation module 302, the determination module 303, and the training module 304 can all be implemented by the processor 51 calling a software program in the memory 52.
  • Another embodiment of the present application further provides a training apparatus for an object recognition model, including: a memory and a processor; the memory and the processor are coupled; the memory is configured to store computer program code, and the computer program code includes computer instructions; when the processor executes the computer instructions, the training apparatus for the object recognition model performs any one of the training methods for an object recognition model.
  • Another embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions run on a training apparatus for an object recognition model, the training apparatus for the object recognition model is caused to perform the methods shown in the foregoing method embodiments.
  • Another embodiment of the present application further provides a computer program. When a processor executes the computer program, the object recognition model training apparatus performs any one of the object recognition model training methods.
  • In some embodiments, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture.
  • FIG. 11 schematically shows a conceptual partial view of a computer program product provided by an embodiment of the present application, the computer program product including a computer program for executing a computer process on a training device for an object recognition model.
  • In one embodiment, the computer program product is implemented using a signal bearing medium 410.
  • Signal bearing medium 410 may include one or more program instructions, which, when executed by one or more processors, may provide the functions, or portions thereof, described above with respect to FIGS. 3 and 7 .
  • one or more features of S11 - S14 may be undertaken by one or more program instructions associated with signal bearing medium 410 .
  • one or more of the features of S21 - S24 may be undertaken by one or more program instructions associated with the signal bearing medium 410 .
  • In addition, the program instructions in FIG. 11 also describe example instructions.
  • In some examples, the signal bearing medium 410 may include a computer-readable medium 411, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM), or a random access memory (RAM), and the like.
  • In some implementations, the signal bearing medium 410 may also include a computer-recordable medium 412, such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and the like.
  • In some implementations, the signal bearing medium 410 may also include a communication medium 413, such as, but not limited to, a digital and/or analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communication link, or a wireless communication link).
  • The signal bearing medium 410 may be conveyed by a communication medium 413 in wireless form (e.g., a wireless communication medium that conforms to the IEEE 802.41 standard or another transmission protocol).
  • the one or more program instructions may be computer-executable instructions or logic-implementing instructions, or the like.
  • In some examples, a server 20 such as the one described with respect to FIG. 3 may be configured to provide various operations, functions, or actions in response to one or more program instructions in the computer-readable medium 411, the computer-recordable medium 412, and/or the communication medium 413.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed.
  • The couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • Modules described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a readable storage medium.
  • Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Abstract

The present application provides a training method, apparatus, and storage medium for an object recognition model, relating to the technical field of intelligent video surveillance, and helps to improve the adaptability of the trained object recognition model more efficiently. The method includes: acquiring first monitoring images of the same monitoring point at different times, a reference image including a target object, and labeling information; generating, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times, where a fused image includes the target object and the background of a first monitoring image; determining the labeling information as the labeling result of the fused images; and iteratively training the current object recognition model according to the fused images and the labeling result until the model converges, to obtain a first target object recognition model, where the first target object recognition model is used to recognize the target object in the first monitoring images of the same monitoring point.

Description

Training method, apparatus, and storage medium for an object recognition model
This application claims priority to Chinese Patent Application No. 202110290714.0, filed with the China National Intellectual Property Administration on March 18, 2021 and entitled "Training method, apparatus, and storage medium for an object recognition model", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of video surveillance, and in particular to a training method, apparatus, and storage medium for an object recognition model.
Background
In the field of video surveillance, object recognition is an important research direction. Current object recognition technology usually pre-trains an object recognition model with sample images and then uses that model to recognize a target object in images that have a target scene as their background. In practical applications, however, the sample images used to train the object recognition model may not have the target scene as their background. Consequently, when such a pre-trained model recognizes the target object in images with the target scene as background, the recognition result may be inaccurate.
To solve this problem, an object recognition model suited to the target scene can be retrained for that scene. However, retraining requires extracting sample images with the target scene as background, for example by manually cropping sample images frame by frame, which costs considerable labor.
Summary
Embodiments of the present application provide a training method, apparatus, and storage medium for an object recognition model, which help improve the adaptability of the trained object recognition model more efficiently.
To achieve the above objective, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides an image processing method. The method includes: acquiring first monitoring images of the same monitoring point at different times, a reference image including a target object, and labeling information, where the labeling information represents a recognition result of the target object in the reference image and the accuracy of the labeling information is greater than an accuracy threshold; generating, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times, where each fused image includes the target object and the background of a first monitoring image; determining the labeling information as the labeling result of the fused images; and iteratively training the current object recognition model according to the fused images and the labeling result until the model converges, to obtain a first target object recognition model, where the first target object recognition model is used to recognize the target object in monitoring images of the same monitoring point.
To make the current object recognition model suitable for recognizing the target object in images acquired at the above monitoring point, in the embodiments of the present application, fused images of the monitoring point at different times are generated from the acquired first monitoring images and reference image of that monitoring point. A fused image includes the target object and the background of a first monitoring image; a first monitoring image may be an image collected at the specific monitoring point that includes only the background, or an image collected there that includes both background and foreground. The labeling information of the reference image, which represents the recognition result of the target object in the reference image, is determined as the labeling result of the fused images. In this way, training the current object recognition model with the fused images and their labeling results yields a first target object recognition model that can recognize the target object in images acquired at that monitoring point, without manually obtaining sample images including the target object from that monitoring point, which saves labor cost and thus improves the adaptability of the trained object recognition model more efficiently.
In a possible implementation, the above "generating, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times" includes: acquiring an image of the target object from the reference image; and fusing, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times.
In this way, according to the image fusion algorithm, the image of the target object can be fused into different positions in the background of the first monitoring images, yielding multiple different fused images of the same monitoring point at different times, which further enriches the sample images participating in training and thus improves the adaptability of the trained first target object recognition model.
In another possible implementation, the above fusing step includes: fusing, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain intermediate images of the same monitoring point at different times; and performing data enhancement processing on these intermediate images to obtain the fused images of the same monitoring point at different times.
In this way, data enhancement is performed on the intermediate images obtained through the preset image fusion algorithm, for example adding noise to the fused image, adjusting its contrast or saturation, or cropping or scaling it, to obtain more fused images, which further enriches the sample images participating in training and thus further improves the adaptability of the trained first target object recognition model.
In another possible implementation, the above "acquiring a reference image including a target object and labeling information" includes: inputting test images including the target object into the current object recognition model to obtain a recognition result for each test image; taking the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images, where a target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold; and taking the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information.
In this way, taking the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as labeling information saves the cost of manual labeling and improves the adaptability of the trained object recognition model more efficiently.
In another possible implementation, the method further includes: acquiring a second monitoring image of the same monitoring point that includes an object to be recognized; inputting the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the second monitoring image; in response to an adjustment operation on the intermediate recognition result, obtaining the adjusted intermediate recognition result, which indicates whether the object to be recognized is the target object; determining the adjusted intermediate recognition result as the labeling result of the second monitoring image; and iteratively training the first target object recognition model according to the second monitoring image and its labeling result to obtain a second target object recognition model.
In this way, while the first target object recognition model is applied to recognizing second monitoring images of the specific monitoring point, the obtained intermediate recognition results are adjusted, which amounts to delimiting the object to be recognized in the second monitoring images, so that the trained second target object recognition model recognizes the object to be recognized more accurately.
In another possible implementation, the method further includes: acquiring a third monitoring image of the same monitoring point that includes an object to be recognized, where the labeling result of the third monitoring image indicates whether the object to be recognized is a candidate object; inputting the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the third monitoring image; in response to the intermediate recognition result indicating that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtaining an adjusted first target object recognition model, which is used to output a recognition result of whether the object to be recognized is the candidate object; and iteratively training the adjusted first target object recognition model according to the third monitoring image and its labeling result to obtain a third target object recognition model.
In a second aspect, an embodiment of the present application provides a training apparatus for an object recognition model. The training apparatus includes an acquisition module, a generation module, a determination module, and a training module. The acquisition module is configured to acquire first monitoring images of the same monitoring point at different times, a reference image including a target object, and labeling information; the labeling information represents the recognition result of the target object in the reference image, and its accuracy is greater than an accuracy threshold. The generation module is configured to generate, according to the first monitoring images and reference image acquired by the acquisition module, fused images of the same monitoring point at different times, where each fused image includes the target object and the background of a first monitoring image. The determination module is configured to determine the labeling information acquired by the acquisition module as the labeling result of the fused images generated by the generation module. The training module is configured to iteratively train the current object recognition model according to the fused images generated by the generation module and the labeling result determined by the determination module until the model converges, to obtain a first target object recognition model; the first target object recognition model is used to recognize the target object in the first monitoring images of the same monitoring point.
Optionally, the generation module is specifically configured to: acquire an image of the target object from the reference image; and fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times.
Optionally, the generation module is specifically configured to: fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain intermediate images of the same monitoring point at different times; and perform data enhancement processing on the intermediate images to obtain the fused images of the same monitoring point at different times.
Optionally, the acquisition module is specifically configured to: input test images including the target object into the current object recognition model to obtain a recognition result for each test image; take the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images, where a target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold; and take the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information.
Optionally, the acquisition module is further configured to: acquire a second monitoring image of the same monitoring point at different times that includes an object to be recognized; input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the second monitoring image; and, in response to an adjustment operation on the intermediate recognition result, obtain the adjusted intermediate recognition result, which indicates whether the object to be recognized is the target object. The determination module is further configured to determine the adjusted intermediate recognition result as the labeling result of the second monitoring image. The training module is further configured to iteratively train the first target object recognition model according to the second monitoring image and its labeling result to obtain a second target object recognition model.
Optionally, the acquisition module is further configured to: acquire a third monitoring image of the same monitoring point that includes an object to be recognized, where the labeling result of the third monitoring image indicates whether the object to be recognized is a candidate object; input the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the third monitoring image; and, in response to the intermediate recognition result indicating that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtain an adjusted first target object recognition model, which is used to output a recognition result of whether the object to be recognized is the candidate object.
The training module is further configured to iteratively train the adjusted first target object recognition model according to the third monitoring image and its labeling result to obtain a third target object recognition model.
In a third aspect, the present application provides a training apparatus for an object recognition model, including a memory and a processor that are coupled. The memory is configured to store computer program code, the computer program code including computer instructions; when the processor executes the computer instructions, the training apparatus for the object recognition model performs the training method for an object recognition model provided in the first aspect or any of its possible implementations.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions. When the instructions run on a training apparatus for an object recognition model, the training apparatus is caused to perform the training method for an object recognition model provided in the first aspect or any of its possible implementations.
In a fifth aspect, the present application provides a computer program product. When the computer program product runs on a training apparatus for an object recognition model, the training apparatus is caused to perform the training method for an object recognition model provided in the first aspect or any of its possible implementations.
It should be noted that the above computer instructions may be stored in whole or in part on a first computer-readable storage medium. The first computer-readable storage medium may be packaged together with the processor of the training apparatus for the object recognition model, or packaged separately from it; this is not limited in the present application.
For descriptions of the second, third, fourth, and fifth aspects of the present application, reference may be made to the detailed description of the first aspect; for their beneficial effects, reference may be made to the analysis of the beneficial effects of the first aspect, which is not repeated here.
In the present application, the name of the above training apparatus for an object recognition model does not limit the devices or functional modules themselves; in actual implementation, these devices or functional modules may appear under other names. As long as the functions of each device or functional module are similar to those of the present application, they fall within the scope of the claims of the present application and their equivalents.
These and other aspects of the present application will be more concise and comprehensible in the following description.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Obviously, the accompanying drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a background and a target object according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a training system according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a training method for an object recognition model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a test image according to an embodiment of the present application;
FIG. 5 is a schematic diagram of acquiring an image of a target object according to an embodiment of the present application;
FIG. 6 is a schematic diagram of fusing a first monitoring image with an image of a target object according to a preset image fusion algorithm to obtain a fused image, according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a method for optimizing a first target object recognition model to obtain a second target object recognition model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a second monitoring image according to an embodiment of the present application;
FIG. 9 is a first schematic structural diagram of an object recognition model training apparatus 30 according to an embodiment of the present application;
FIG. 10 is a second schematic structural diagram of an object recognition model training apparatus 30 according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a computer program product for the training method for an object recognition model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
To describe the technical solutions of the embodiments of the present application clearly, words such as "first" and "second" are used in the embodiments of the present application to distinguish identical or similar items whose functions and effects are basically the same. A person skilled in the art may understand that the words "first" and "second" do not limit the quantity or execution order.
To facilitate understanding of the present application, the related terms involved in the embodiments of the present application are explained first.
Background: the region that does not change in images collected by an image acquisition apparatus over a relatively long time scale. As shown in a of FIG. 1, the background may be a room without any foreground object.
Target object: the region that changes in images collected by an image acquisition apparatus over a relatively long time scale, which may also be called a foreground object. As shown in b of FIG. 1, the target object may be a human body fallen on the ground.
The following describes the structure of the training system to which the training method for an object recognition model provided in the embodiments of the present application applies.
FIG. 2 is a schematic structural diagram of a training system according to an embodiment of the present application. As shown in FIG. 2, the training system may include at least one image acquisition apparatus 10 (FIG. 2 illustrates, without limitation, cameras 10-1 to 10-3 as the image acquisition apparatuses) and a server 20.
The image acquisition apparatus 10 may be configured to collect images of a designated area (such as a monitored room) and send the collected images to the server 20. For example, when the designated area is a specific area inside a room of a senior apartment and the image acquisition apparatus 10 is a camera, the images of the designated area collected by the image acquisition apparatus 10 are as shown in a and b of FIG. 1.
In practical applications, when the installation position of a camera (also called a monitoring point) is fixed, its collection area is also fixed. For example, when the camera is a bullet camera, a small dome camera, or a large dome camera, the angle at which it collects images is also unique.
The server 20 stores a trained object recognition model obtained by training on multiple labeled sample images. A labeled sample image is a sample image with a labeling result; the object included in a sample image may be the target object or another object. The labeling result of a sample image may indicate whether the object included in the sample image is the target object, or may indicate which object the sample image includes.
The server 20 may input a reference image including the target object into the object recognition model to obtain a prediction result, and obtain the recognition result of the reference image including the target object according to the prediction result.
The server 20 may be configured to receive the images of the designated area collected by the at least one image acquisition apparatus 10, and generate fused images according to the images of the designated area and the reference image including the target object. The server 20 may train the current object recognition model according to the fused images and the recognition result of the reference image, following the training method for an object recognition model provided in the embodiments of the present application, thereby obtaining an object recognition model that can recognize the target object in images with the designated area as background.
The server 20 provided in the embodiments of the present application may be a computer device such as a personal computer, a laptop, a smartphone, a tablet, a server, or a server cluster. The image acquisition apparatus 10 may be a device for collecting images, for example a camera, a capture machine, or a video camera.
The training apparatus for an object recognition model in the embodiments of the present application may be the server 20 shown in FIG. 2, or a part of the server 20, for example a chip system in the server 20. The chip system includes a chip and may also include other discrete devices or circuit structures.
The following introduces the training method for an object recognition model provided in the embodiments of the present application with reference to the training system architecture shown in FIG. 2, taking the server 20 as the training apparatus for the object recognition model.
Before training an object recognition model suited to the target scene, the server 20 may acquire multiple labeled sample images and train the current object recognition model according to the acquired labeled sample images. The current object recognition model can be used to recognize the target object in images. For example, the server 20 trains the current object recognition model according to multiple images of the target object against a generic background. The generic background is not limited in the embodiments of the present application; exemplary generic backgrounds may be scenes such as a green screen, a street scene, or an indoor scene.
As shown in FIG. 3, which is a schematic flowchart of the training method for an object recognition model provided in an embodiment of the present application, the training method includes:
S11: The server 20 acquires first monitoring images of the same monitoring point at different times, a reference image including a target object, and labeling information.
The labeling information represents the recognition result of the target object in the reference image, and its accuracy is greater than an accuracy threshold. Images of a monitoring point at different times may be understood as images collected at different times by the image acquisition apparatus 10 at that monitoring point.
The server 20 acquiring the first monitoring images of the same monitoring point at different times may include the following implementations:
In a possible implementation, the server 20 receives the first monitoring images sent by the image acquisition apparatus 10. For example, after the monitoring point of the image acquisition apparatus 10 at the site to be monitored is determined, the collection area of the image acquisition apparatus 10 is a fixed area, which may be called a first scene. The image acquisition apparatus 10 collects images of the first scene and, from these images, determines images that have the first scene as background, are collected at different times, and contain no foreground object, as the first monitoring images. Here, a first monitoring image is an image of the region that does not change among the images of the first scene collected by the image acquisition apparatus 10 within a preset time period.
In this embodiment, the image acquisition apparatus 10 determines, from the collected images of the first scene, the first monitoring images collected at different times, and sends the first monitoring images to the server 20. A first monitoring image has the first scene as background and contains no foreground object. It can be understood that the first monitoring image is the background image; that is, a first monitoring image refers to the background image among the images of the first scene collected by the image acquisition apparatus 10 within the preset time period.
The preset time period may be set according to actual requirements; for example, it may be one day, two days, or one week.
For example, after the image acquisition apparatus 10 is installed at the monitoring point of the site to be monitored, it filters the images of the first scene collected within the preset time period (for example, within one day) to obtain first monitoring images of the first scene at different times or under different lighting conditions.
Within the preset time period, different times correspond to different lighting conditions. Obtaining first monitoring images of the first scene at different times therefore means obtaining first monitoring images of the first scene under different lighting conditions.
In another possible implementation, the server 20 determines the first monitoring images of the first scene from the images of the first scene sent by the image acquisition apparatus 10 installed at the monitoring point of the site to be monitored.
In this embodiment, the image acquisition apparatus 10 sends the collected images of the first scene to the server 20. The server 20 determines, from the received images of the first scene, the first monitoring images collected at different times. A first monitoring image has the first scene as background and contains no foreground object.
For example, after the image acquisition apparatus 10 is installed at the monitoring point of the site to be monitored, it sends the images of the first scene collected within the preset time period to the server 20. The server 20 may filter the images of the first scene to obtain the first monitoring images of the first scene. Alternatively, the server 20 may use a matting algorithm to extract the background images from the images of the first scene and use the extracted background images as the first monitoring images.
In yet another possible implementation, a first monitoring image may also be any image collected by the image acquisition apparatus 10; that is, a first monitoring image may or may not include a foreground object.
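The embodiment leaves the filtering or matting algorithm open. Purely as an illustration (not prescribed by the present application), a per-pixel temporal median over frames sampled within the preset time period is one common way to obtain a foreground-free background image; the names `frames` and `estimate_background` below are introduced only for this sketch.

```python
import numpy as np

def estimate_background(frames):
    """Estimate a static background image for one monitoring point.

    frames: list of HxWx3 uint8 arrays collected at different times within
    the preset time period (all frames must share the same shape).
    The per-pixel temporal median suppresses transient foreground objects,
    so the result approximates a first monitoring image with no foreground.
    """
    stack = np.stack(frames, axis=0).astype(np.float32)  # shape (N, H, W, 3)
    background = np.median(stack, axis=0)
    return background.astype(np.uint8)
```

Sampling the frames across the day would, in the same way, yield one background image per lighting condition.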
In a possible implementation, the server 20 acquires the reference image including the target object and the labeling information through the following steps:
Step 1: The server 20 acquires test images including the target object.
The number of test images may be one or more.
In a possible implementation, the server 20 receives test images including the target object sent by another apparatus (for example, the image acquisition apparatus 10).
In another possible implementation, the server 20 reads test images including the target object stored locally on the server 20.
Step 2: The server 20 inputs the test images including the target object into the current object recognition model to obtain a recognition result for each test image.
Step 3: The server 20 takes the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images. A target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold.
The accuracy threshold may be set according to actual requirements; for example, it may be 85%, 90%, 95%, or 97%.
In this embodiment, the server 20 takes the test images whose recognition results have accuracy greater than the accuracy threshold as reference images; for a recognition result whose accuracy is less than or equal to the accuracy threshold, the server 20 obtains, in response to an adjustment operation on that recognition result, the adjusted recognition result as the target recognition result, and also takes the test image corresponding to the target recognition result as a reference image.
In this embodiment, take as an example a first recognition result whose accuracy is greater than the accuracy threshold and a second recognition result whose accuracy is less than or equal to the accuracy threshold. The server 20 takes the test image corresponding to the first recognition result as a reference image; in addition, in response to an adjustment operation on the second recognition result, the server 20 obtains the adjusted second recognition result as the target recognition result and also takes the test image corresponding to the target recognition result as a reference image.
It can be understood that, in this embodiment, the first recognition result and the second recognition result are two opposite recognition results; for example, if the first recognition result indicates that the target object is A, the second recognition result indicates that the target object is not A. On this basis, the training data can be enriched with limited resources, improving the accuracy of the trained object recognition model.
In this embodiment, the server 20 takes all the test images as reference images. If the accuracy of the recognition result of a test image is greater than the accuracy threshold, that recognition result is taken as the recognition result of the corresponding reference image; if the accuracy of the recognition result of a test image is less than or equal to the accuracy threshold, that recognition result is adjusted to obtain the adjusted recognition result, namely the target recognition result, which is taken as the recognition result of the corresponding reference image.
In one example, the test image shown in FIG. 4 is a test image whose target object is a human body doing yoga. The server 20 inputs this test image into the current object recognition model and obtains the recognition result that the target object is a human body fallen on the ground, with an accuracy of 75%. This accuracy of 75% is below the accuracy threshold of 85%. During manual review of this recognition result, the server 20, in response to an input adjustment operation on the recognition result, obtains the adjusted recognition result that the target object is a human body not fallen on the ground. Therefore, the target recognition result acquired by the server 20 is: the target object is a human body not fallen on the ground. The test image shown in FIG. 4 is the test image corresponding to this target recognition result. The server 20 takes the test image shown in FIG. 4 as a reference image, and the recognition result of this reference image is: the target object is a human body not fallen on the ground.
Step 4: The server 20 takes the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information of the reference image corresponding to each recognition result.
For each target recognition result, the server 20 takes that target recognition result as the labeling information of the corresponding reference image; for each recognition result whose accuracy is greater than the accuracy threshold, the server 20 takes that recognition result as the labeling information of the corresponding reference image.
Based on the example in Step 3, the server 20 takes "the target object is a human body not fallen on the ground" as the labeling information of the reference image shown in FIG. 4.
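To make Steps 2 to 4 concrete, the following minimal Python sketch shows the threshold-based split; it assumes a hypothetical `model` callable that returns a recognition result together with its accuracy, and a hypothetical `request_adjustment` callback standing in for the manual adjustment operation, neither of which is specified by the present application.

```python
ACCURACY_THRESHOLD = 0.85  # e.g. 85%; set according to actual requirements

def build_reference_set(test_images, model, request_adjustment):
    """Collect (reference image, labeling information) pairs.

    model(image) -> (recognition_result, accuracy) is an assumed interface;
    request_adjustment(image, result) returns the adjusted recognition
    result (the target recognition result) for a low-accuracy output.
    """
    references = []
    for image in test_images:
        result, accuracy = model(image)
        if accuracy > ACCURACY_THRESHOLD:
            references.append((image, result))       # keep the model's result
        else:
            corrected = request_adjustment(image, result)
            references.append((image, corrected))    # target recognition result
    return references
```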
S12: The server 20 generates, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times. A fused image includes the target object and the background of a first monitoring image.
In this embodiment, for a monitoring point, a first monitoring image may be understood as a background image. The server 20 fuses the reference image with the first monitoring images of that monitoring point at different times, obtaining fused images of that monitoring point at different times.
In a possible implementation, the server 20 acquires an image of the target object from the reference image and, according to a preset image fusion algorithm, fuses the first monitoring images of the same monitoring point at different times with the image of the target object, obtaining the fused images of the same monitoring point at different times.
In one example, the server 20 applies a matting algorithm to the image including the target object shown in a of FIG. 5, obtaining the image of the target object shown in b of FIG. 5. As shown in FIG. 6, the server 20 fuses the first monitoring image a with the image b of the target object according to the preset image fusion algorithm to obtain the fused image c.
Optionally, the server 20 fuses, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object to obtain intermediate images of the same monitoring point at different times, and performs data enhancement processing on the intermediate images to obtain more fused images.
For example, the data enhancement processing may include, but is not limited to, adding noise, adjusting contrast, adjusting saturation, cropping, and scaling. On this basis, the server 20 may perform data enhancement on the obtained intermediate images, such as adding noise, adjusting the contrast or saturation of the fused image, or cropping or scaling the fused image, to obtain more fused images.
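The preset image fusion algorithm and the data enhancement operations are not fixed by the embodiment. The sketch below, offered only as an illustration, alpha-composites a matted target-object cutout onto a first monitoring image and applies two of the enhancement operations mentioned above (noise and contrast); all function names are introduced for this sketch.

```python
import random
import numpy as np

def fuse(background, object_rgba):
    """Paste a matted target-object cutout (HxWx4, with alpha channel) onto
    a copy of the background at a random position, producing one
    intermediate image; the cutout must fit inside the background."""
    bg = background.copy()
    h, w = object_rgba.shape[:2]
    y = random.randint(0, bg.shape[0] - h)
    x = random.randint(0, bg.shape[1] - w)
    alpha = object_rgba[..., 3:4].astype(np.float32) / 255.0
    region = bg[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * object_rgba[..., :3] + (1.0 - alpha) * region
    bg[y:y + h, x:x + w] = blended.astype(np.uint8)
    return bg

def enhance(image):
    """Example data enhancement: add Gaussian noise and jitter the contrast;
    saturation changes, cropping, and scaling can be added analogously."""
    noisy = image.astype(np.float32) + np.random.normal(0.0, 5.0, image.shape)
    contrast = random.uniform(0.8, 1.2)
    out = (noisy - 128.0) * contrast + 128.0
    return np.clip(out, 0, 255).astype(np.uint8)
```

Fusing one cutout into different positions of first monitoring images taken at different times, and then enhancing each intermediate image, multiplies the number of fused sample images without any manual cropping.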
S13: The server 20 determines the labeling information as the labeling result of the fused images.
For each reference image, after fusing that reference image with a first monitoring image to obtain a fused image, the server 20 takes the labeling information of that reference image as the labeling result of the fused image.
Since the labeling information of the reference image shown in a of FIG. 5 is "the target object is a human body fallen on the ground", the labeling result of the fused image shown in FIG. 6 is: the target object is a human body fallen on the ground.
It can be understood that, by performing S11 to S13, the server 20 can automatically generate multiple fused images and their labeling results; the fused images can serve as sample images and their labeling results as the labeling results of the sample images, and the current object recognition model can then be trained on these sample images and labeling results, reducing the labor cost of acquiring sample images and their labeling results.
S14: The server 20 iteratively trains the current object recognition model according to the fused images and the labeling result until the model converges, obtaining the first target object recognition model. The first target object recognition model is used to recognize the target object in monitoring images of the same monitoring point. A monitoring image refers to any image collected by the image acquisition apparatus 10 at the same monitoring point.
It can be understood that the server 20 may also take the first target object recognition model as the new current object recognition model and perform S11 to S14 again to iteratively train the current object recognition model, thereby obtaining a new first target object recognition model, which is more accurate when recognizing the target object in images including the target object acquired at that monitoring point.
To make the current object recognition model suitable for recognizing the target object in images acquired at a specific monitoring point, in this embodiment, fused images of the specific monitoring point at different times are generated from the acquired first monitoring images and reference image of that point. A fused image includes the target object and the background of a first monitoring image; a first monitoring image may be an image collected at the specific monitoring point that includes only the background, or one that includes both background and foreground. The labeling information of the reference image, which represents the recognition result of the target object in the reference image, is determined as the labeling result of the fused images. In this way, training the current object recognition model with the fused images and their labeling results yields a first target object recognition model that can recognize the target object in images acquired at the specific monitoring point, without manually obtaining sample images including the target object from that point, which saves labor cost and thus improves the adaptability of the trained object recognition model more efficiently.
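The embodiment does not prescribe a particular loss function or convergence criterion for S14. Assuming a classification-style model and treating a plateau of the training loss as convergence, the iterative training could look like the following PyTorch sketch, where `loader` yields batches of fused images with their labeling results; all of these choices are assumptions for illustration only.

```python
import torch

def train_until_converged(model, loader, lr=1e-4, tol=1e-4, max_epochs=50):
    """Iteratively train the current object recognition model on the fused
    images and labeling results until the epoch loss stops changing."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    previous_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in loader:      # fused images + labeling results
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(previous_loss - epoch_loss) < tol:  # model has converged
            break
        previous_loss = epoch_loss
    return model  # the first target object recognition model
```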
In a possible implementation, after obtaining the first target object recognition model, the server 20 may further acquire a second monitoring image collected at the same monitoring point that includes an object to be recognized, input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the second monitoring image, obtain the adjusted intermediate recognition result in response to an adjustment operation on the intermediate recognition result, and optimize the first target object recognition model according to the adjusted intermediate recognition result and the second monitoring image to obtain a second target object recognition model.
As shown in FIG. 7, which is a schematic flowchart of a method for optimizing the first target object recognition model to obtain the second target object recognition model, the method may include:
S21: The server 20 acquires a second monitoring image of the same monitoring point that includes an object to be recognized.
In this embodiment, the server 20 receives a second monitoring image sent by the image acquisition apparatus 10 that has the first scene as background and includes the object to be recognized.
S21 may be: the image acquisition apparatus 10 sends the second monitoring image to the server 20, where the second monitoring image has the first scene as background and includes the object to be recognized.
For example, the second monitoring images acquired by the server 20 include the monitoring images shown in a and b of FIG. 8. Object 80 in a of FIG. 8 is an image of an object to be recognized, and object 81 in b of FIG. 8 is an image of an object to be recognized.
The object to be recognized and the above target object may be the same object or different objects; this is not limited in the embodiments of the present application.
S22: The server 20 inputs the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the second monitoring image.
Based on the example in S21, the intermediate recognition result of the object to be recognized shown in a of FIG. 8 is: the object to be recognized is a human body not fallen on the ground; the intermediate recognition result of the object to be recognized shown in b of FIG. 8 is: the object to be recognized is a human body fallen on the ground.
S23: The server 20 obtains the adjusted intermediate recognition result in response to an adjustment operation on the intermediate recognition result. The adjusted intermediate recognition result indicates whether the object to be recognized is the target object, and is the correct recognition result of the object to be recognized.
Take the target object being a human body not fallen on the ground as an example. Based on the example in S22, the adjusted intermediate recognition result of the object to be recognized shown in a of FIG. 8 is: the object to be recognized is a human body fallen on the ground; the adjusted intermediate recognition result of the object to be recognized shown in b of FIG. 8 is: the object to be recognized is a human body not fallen on the ground.
It can be understood that the above second monitoring image may be multiple second monitoring images, each including an object to be recognized. S22 and S23 can be performed for each second monitoring image to obtain its intermediate recognition result; when the intermediate recognition result is wrong, the adjusted intermediate recognition result is obtained.
For example, suppose the first target object recognition model can be used to recognize object A, object B, and object C. Multiple second monitoring images are input into the first target object recognition model; the intermediate recognition results of the objects to be recognized in the second monitoring images, and the acquired adjusted intermediate recognition results, are shown in Table 1 below.
Table 1
Intermediate recognition result    Adjusted intermediate recognition result
A                                  Aˊ
B                                  Bˊ
C                                  Cˊ
In Table 1, A represents object A and Aˊ represents non-object A; B represents object B and Bˊ represents non-object B; C represents object C and Cˊ represents non-object C.
S24: The server 20 determines the adjusted intermediate recognition result as the labeling result of the second monitoring image.
It can be understood that, by performing S21 to S24, the server 20 can obtain multiple second monitoring images and the labeling result of each second monitoring image.
S25: The server 20 iteratively trains the first target object recognition model according to the second monitoring images and their labeling results, obtaining the second target object recognition model.
In this embodiment, the server 20 may, after obtaining multiple second monitoring images and the labeling result of each of them, iteratively train the first target object recognition model according to the multiple second monitoring images and their labeling results to obtain the second target object recognition model. Alternatively, each time the server 20 obtains one second monitoring image and its labeling result, it may iteratively train the first target object recognition model according to that second monitoring image and its labeling result to obtain the second target object recognition model. This is not limited in the embodiments of the present application.
It can be understood that iteratively training the first target object recognition model after obtaining multiple second monitoring images and the labeling result of each of them yields a more stable second target object recognition model.
In this embodiment, while the first target object recognition model is applied to recognizing the target object in the first scene, wrong recognition results are manually adjusted (that is, the adjusted intermediate recognition result is obtained in response to an adjustment operation on the intermediate recognition result), and the adjusted recognition results are used as the labeling results of the corresponding second monitoring images to iteratively train the first target object recognition model, obtaining the second target object recognition model. Since the second target object recognition model has learned the adjusted intermediate recognition results, its accuracy in recognizing the target object in the first scene is higher than that of the first target object recognition model; that is, the second accuracy is higher than the first accuracy, where the first accuracy is the accuracy with which the first target object recognition model recognizes the target object in the first scene, and the second accuracy is the accuracy with which the second target object recognition model recognizes the target object in the first scene.
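Viewed as code, S21 to S25 amount to collecting human-corrected labels and fine-tuning. The sketch below reuses the hypothetical `train_until_converged` helper from the S14 sketch; `predict`, `request_adjustment`, and `make_loader` are likewise names assumed purely for illustration.

```python
def obtain_second_model(first_model, second_images, request_adjustment, make_loader):
    """Fine-tune the first target object recognition model with adjusted
    intermediate recognition results (S21-S25)."""
    labeled = []
    for image in second_images:
        intermediate = first_model.predict(image)            # S22
        corrected = request_adjustment(image, intermediate)  # S23
        labeled.append((image, corrected))                   # S24: labeling result
    return train_until_converged(first_model, make_loader(labeled))  # S25
```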
An embodiment of the present application further provides a training method for an object recognition model, in which the server acquires a third monitoring image of the same monitoring point that includes an object to be recognized, where the labeling result of the third monitoring image indicates whether the object to be recognized is a candidate object; inputs the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the third monitoring image; in response to the intermediate recognition result indicating that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtains an adjusted first target object recognition model, which is used to output a recognition result of whether the object to be recognized is the candidate object; and iteratively trains the adjusted first target object recognition model according to the third monitoring image and its labeling result, obtaining a third target object recognition model.
For example, monitoring image a acquired by the server includes an object to be recognized that is a pear. The first target object recognition model recognizes monitoring image a and obtains the intermediate recognition result that the object to be recognized in monitoring image a is an apple; the first target object recognition model is then adjusted to obtain the adjusted first target object recognition model, which is used to output a recognition result of whether the object to be recognized is a pear. The server iteratively trains the adjusted first target object recognition model according to monitoring image a and its labeling result, obtaining the third target object recognition model.
In this embodiment, the target object recognition model is adjusted and trained according to monitoring images acquired in real time, further improving the recognition accuracy of the target object recognition model.
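The embodiment does not state how the first target object recognition model is adjusted so that it outputs whether the object is the candidate object. One plausible realization, shown purely as a sketch, is to swap the model's classification head for a new binary head; `classifier` and `feature_dim` are assumed attributes of the model, not part of the present application.

```python
import torch.nn as nn

def adjust_for_candidate(first_model, feature_dim=512):
    """Replace the classification head so the adjusted model outputs whether
    the object to be recognized is the candidate object (e.g. 'pear' rather
    than the original target object 'apple'); the new head is then trained
    on the third monitoring images and their labeling results."""
    first_model.classifier = nn.Linear(feature_dim, 2)  # candidate / not candidate
    return first_model
```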
The above mainly introduces the solutions provided in the embodiments of the present application from the perspective of the method. To implement the above functions, corresponding hardware structures and/or software modules for performing each function are included. A person skilled in the art should easily realize that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
The embodiments of the present application may divide the training apparatus for an object recognition model into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is illustrative and is only a logical function division; there may be other division manners in actual implementation.
As shown in FIG. 9, which is a first schematic structural diagram of an object recognition model training apparatus 30 according to an embodiment of the present application, the object recognition model training apparatus 30 includes an acquisition module 301, a generation module 302, a determination module 303, and a training module 304.
The acquisition module 301 is configured to acquire first monitoring images of the same monitoring point at different times, a reference image including a target object, and labeling information; the labeling information represents the recognition result of the target object in the reference image, and its accuracy is greater than an accuracy threshold.
For example, with reference to the flow shown in FIG. 3, the acquisition module 301 may be configured to perform S11; with reference to the flow shown in FIG. 7, the acquisition module 301 may also be configured to perform S21 to S23.
The generation module 302 is configured to generate, according to the first monitoring images and reference image acquired by the acquisition module 301, fused images of the same monitoring point at different times; a fused image includes the target object and the background of a first monitoring image. For example, with reference to the flow shown in FIG. 3, the generation module 302 may be configured to perform S12.
The determination module 303 is configured to determine the labeling information acquired by the acquisition module 301 as the labeling result of the fused images generated by the generation module 302. For example, with reference to the flow shown in FIG. 3, the determination module 303 may be configured to perform S13; with reference to the flow shown in FIG. 7, the determination module 303 may also be configured to perform S24.
The training module 304 is configured to iteratively train the current object recognition model according to the fused images generated by the generation module 302 and the labeling result determined by the determination module 303 until the model converges, to obtain a first target object recognition model; the first target object recognition model is used to recognize the target object in monitoring images of the same monitoring point.
For example, with reference to the flow shown in FIG. 3, the training module 304 may be configured to perform S14; with reference to FIG. 7, the training module 304 may also be configured to perform S25.
Optionally, the generation module 302 may be specifically configured to: acquire an image of the target object from the reference image; and fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times.
Optionally, the generation module 302 may be specifically configured to: fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain intermediate images of the same monitoring point at different times; and perform data enhancement processing on the intermediate images to obtain the fused images of the same monitoring point at different times.
Optionally, the acquisition module 301 may be specifically configured to: input test images including the target object into the current object recognition model to obtain a recognition result for each test image; take the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images, where a target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold; and take the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information.
Optionally, the acquisition module 301 may also be configured to: acquire a second monitoring image of the same monitoring point that includes an object to be recognized; input the second monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the second monitoring image; and, in response to an adjustment operation on the intermediate recognition result, obtain the adjusted intermediate recognition result, which indicates whether the object to be recognized is the target object.
The determination module 303 may also be configured to determine the adjusted intermediate recognition result as the labeling result of the second monitoring image.
The training module 304 may also be configured to iteratively train the first target object recognition model according to the second monitoring image and its labeling result to obtain a second target object recognition model.
All relevant content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, and their effects are not repeated here.
Optionally, the acquisition module 301 may also be configured to: acquire a third monitoring image of the same monitoring point that includes an object to be recognized, where the labeling result of the third monitoring image indicates whether the object to be recognized is a candidate object; input the third monitoring image into the first target object recognition model to obtain an intermediate recognition result of the object to be recognized in the third monitoring image; and, in response to the intermediate recognition result indicating that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtain an adjusted first target object recognition model, which is used to output a recognition result of whether the object to be recognized is the candidate object.
The training module 304 may also be configured to iteratively train the adjusted first target object recognition model according to the third monitoring image and its labeling result to obtain a third target object recognition model.
FIG. 10 is a second schematic structural diagram of an object recognition model training apparatus 30 according to an embodiment of the present application. As shown in FIG. 10, the object recognition model training apparatus 30 may include: at least one processor 51, a memory 52, a communication interface 53, and a communication bus 54.
Each constituent component of the object recognition model training apparatus is introduced below with reference to FIG. 10:
The processor 51 is the control center of the object recognition model training apparatus and may be one processor or a collective term for multiple processing elements. For example, the processor 51 is a central processing unit (Central Processing Unit, CPU), or may be an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example: one or more DSPs, or one or more field programmable gate arrays (Field Programmable Gate Array, FPGA).
As an embodiment, the processor 51 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 10. Moreover, as an embodiment, the object recognition model training apparatus may include multiple processors, such as the two processors 51 shown in FIG. 10. Each of these processors may be a single-core processor (Single-CPU) or a multi-core processor (Multi-CPU). A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
The memory 52 may be a read-only memory (Read-Only Memory, ROM) or another type of static storage device that can store static information and instructions, a random access memory (Random Access Memory, RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 52 may exist independently and be connected to the processor 51 through the communication bus 54, or may be integrated with the processor 51.
The memory 52 is configured to store the data in the embodiments of the present application and the software programs for executing the embodiments of the present application. The processor 51 can perform various functions of the object recognition model training apparatus by running or executing the software programs stored in the memory 52 and calling the data stored in the memory 52.
The communication interface 53, which uses any transceiver-like device, is configured to communicate with other devices or communication networks, such as a radio access network (Radio Access Network, RAN), a wireless local area network (Wireless Local Area Networks, WLAN), a terminal, or a cloud. The communication interface 53 may include a receiving unit that implements a receiving function and a sending unit that implements a sending function.
The communication bus 54 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
As an example, with reference to FIG. 9, the receiving function of the acquisition module 301 in the object recognition model training apparatus 30 can be implemented by the communication interface 53 in FIG. 10, and the processing function of the acquisition module 301 and the functions of the generation module 302, the determination module 303, and the training module 304 can all be implemented by the processor 51 calling the software programs in the memory 52.
Another embodiment of the present application further provides a training apparatus for an object recognition model, including a memory and a processor that are coupled. The memory is configured to store computer program code, the computer program code including computer instructions; when the processor executes the computer instructions, the training apparatus for the object recognition model performs any one of the training methods for an object recognition model.
Another embodiment of the present application further provides a computer-readable storage medium storing instructions; when the instructions run on a training apparatus for an object recognition model, the training apparatus for the object recognition model is caused to perform the methods shown in the above method embodiments.
Another embodiment of the present application further provides a computer program; when a processor executes the computer program, the training apparatus for the object recognition model performs any one of the training methods for an object recognition model.
In some embodiments, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or encoded on other non-transitory media or articles of manufacture.
FIG. 11 schematically shows a conceptual partial view of a computer program product provided by an embodiment of the present application; the computer program product includes a computer program for executing a computer process on a training apparatus for an object recognition model.
In one embodiment, the computer program product is implemented using a signal bearing medium 410. The signal bearing medium 410 may include one or more program instructions that, when run by one or more processors, can provide the functions, or part of the functions, described above with respect to FIG. 3 and FIG. 7.
For example, referring to the embodiment shown in FIG. 3, one or more features of S11 to S14 may be undertaken by one or more program instructions associated with the signal bearing medium 410. As another example, referring to the embodiment shown in FIG. 7, one or more features of S21 to S24 may be undertaken by one or more program instructions associated with the signal bearing medium 410. In addition, the program instructions in FIG. 11 also describe example instructions.
In some examples, the signal bearing medium 410 may include a computer-readable medium 411, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (read-only memory, ROM), or a random access memory (random access memory, RAM), and the like.
In some implementations, the signal bearing medium 410 may also include a computer-recordable medium 412, such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and the like.
In some implementations, the signal bearing medium 410 may also include a communication medium 413, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link, or a wireless communication link).
The signal bearing medium 410 may be conveyed by a communication medium 413 in wireless form (for example, a wireless communication medium conforming to the IEEE 802.41 standard or another transmission protocol). The one or more program instructions may be computer-executable instructions or logic-implementing instructions, or the like.
In some examples, a server 20 such as the one described with respect to FIG. 3 may be configured to provide various operations, functions, or actions in response to one or more program instructions in the computer-readable medium 411, the computer-recordable medium 412, and/or the communication medium 413.
From the description of the above implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional modules is used only as an example for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; for example, the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation, for example multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A training method for an object recognition model, the training method comprising:
    acquiring first monitoring images of the same monitoring point at different times, a reference image comprising a target object, and labeling information, wherein the labeling information is used to represent a recognition result of the target object in the reference image, and the accuracy of the labeling information is greater than an accuracy threshold;
    generating, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times, wherein the fused images comprise the target object and the background of the first monitoring images;
    determining the labeling information as the labeling result of the fused images;
    iteratively training the current object recognition model according to the fused images and the labeling result until the model converges, to obtain a first target object recognition model, wherein the first target object recognition model is used to recognize the target object in monitoring images of the same monitoring point.
  2. The training method according to claim 1, wherein the generating, according to the acquired first monitoring images and reference image, fused images of the same monitoring point at different times comprises:
    acquiring an image of the target object from the reference image;
    fusing, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times.
  3. The training method according to claim 2, wherein the fusing, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times comprises:
    fusing, according to the preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain intermediate images of the same monitoring point at different times;
    performing data enhancement processing on the intermediate images of the same monitoring point at different times, to obtain the fused images of the same monitoring point at different times.
  4. The training method according to any one of claims 1 to 3, wherein acquiring a reference image comprising a target object and labeling information comprises:
    inputting test images comprising the target object into the current object recognition model, to obtain a recognition result of each test image;
    taking the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images, wherein a target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold;
    taking the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information.
  5. The training method according to any one of claims 1 to 3, wherein the method further comprises:
    acquiring a second monitoring image of the same monitoring point comprising an object to be recognized;
    inputting the second monitoring image into the first target object recognition model, to obtain an intermediate recognition result of the object to be recognized in the second monitoring image;
    in response to an adjustment operation on the intermediate recognition result, obtaining an adjusted intermediate recognition result, wherein the adjusted intermediate recognition result is used to represent whether the object to be recognized is the target object;
    determining the adjusted intermediate recognition result as the labeling result of the second monitoring image;
    iteratively training the first target object recognition model according to the second monitoring image and the labeling result of the second monitoring image, to obtain a second target object recognition model.
  6. The training method according to any one of claims 1 to 3, wherein the method further comprises:
    acquiring a third monitoring image of the same monitoring point comprising an object to be recognized, wherein the labeling result of the third monitoring image is used to represent whether the object to be recognized is a candidate object;
    inputting the third monitoring image into the first target object recognition model, to obtain an intermediate recognition result of the object to be recognized in the third monitoring image;
    in response to the intermediate recognition result representing that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtaining an adjusted first target object recognition model, wherein the adjusted first target object recognition model is used to output a recognition result of whether the object to be recognized is the candidate object;
    iteratively training the adjusted first target object recognition model according to the third monitoring image and the labeling result of the third monitoring image, to obtain a third target object recognition model.
  7. A training apparatus for an object recognition model, comprising:
    an acquisition module, configured to acquire first monitoring images of the same monitoring point at different times, a reference image comprising a target object, and labeling information, wherein the labeling information is used to represent a recognition result of the target object in the reference image, and the accuracy of the labeling information is greater than an accuracy threshold;
    a generation module, configured to generate, according to the first monitoring images and the reference image acquired by the acquisition module, fused images of the same monitoring point at different times, wherein the fused images comprise the target object and the background of the first monitoring images;
    a determination module, configured to determine the labeling information acquired by the acquisition module as the labeling result of the fused images generated by the generation module;
    a training module, configured to iteratively train the current object recognition model according to the fused images generated by the generation module and the labeling result determined by the determination module until the model converges, to obtain a first target object recognition model, wherein the first target object recognition model is used to recognize the target object in the first monitoring images of the same monitoring point.
  8. The training apparatus according to claim 7, wherein the generation module is specifically configured to:
    acquire an image of the target object from the reference image;
    fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain the fused images of the same monitoring point at different times.
  9. The training apparatus according to claim 7, wherein the generation module is specifically configured to:
    fuse, according to a preset image fusion algorithm, the first monitoring images of the same monitoring point at different times with the image of the target object, to obtain intermediate images of the same monitoring point at different times;
    perform data enhancement processing on the intermediate images of the same monitoring point at different times, to obtain the fused images of the same monitoring point at different times.
  10. The training apparatus according to any one of claims 7 to 9, wherein the acquisition module is specifically configured to:
    input test images comprising the target object into the current object recognition model, to obtain a recognition result of each test image;
    take the test images corresponding to target recognition results and to recognition results whose accuracy is greater than the accuracy threshold as reference images, wherein a target recognition result is the adjusted recognition result obtained in response to an adjustment operation on a recognition result whose accuracy is less than or equal to the accuracy threshold;
    take the target recognition results and the recognition results whose accuracy is greater than the accuracy threshold as the labeling information.
  11. The training apparatus according to any one of claims 7 to 9, wherein:
    the acquisition module is further configured to: acquire a second monitoring image of the same monitoring point comprising an object to be recognized; input the second monitoring image into the first target object recognition model, to obtain an intermediate recognition result of the object to be recognized in the second monitoring image; and, in response to an adjustment operation on the intermediate recognition result, obtain an adjusted intermediate recognition result, wherein the adjusted intermediate recognition result is used to represent whether the object to be recognized is the target object;
    the determination module is further configured to determine the adjusted intermediate recognition result as the labeling result of the second monitoring image;
    the training module is further configured to iteratively train the first target object recognition model according to the second monitoring image and the labeling result of the second monitoring image, to obtain a second target object recognition model.
  12. The training apparatus according to any one of claims 7 to 9, wherein:
    the acquisition module is further configured to: acquire a third monitoring image of the same monitoring point comprising an object to be recognized, wherein the labeling result of the third monitoring image is used to represent whether the object to be recognized is a candidate object; input the third monitoring image into the first target object recognition model, to obtain an intermediate recognition result of the object to be recognized in the third monitoring image; and, in response to the intermediate recognition result representing that the object to be recognized is the target object, and to an adjustment operation on the first target object recognition model, obtain an adjusted first target object recognition model, wherein the adjusted first target object recognition model is used to output a recognition result of whether the object to be recognized is the candidate object;
    the training module is further configured to iteratively train the adjusted first target object recognition model according to the third monitoring image and the labeling result of the third monitoring image, to obtain a third target object recognition model.
  13. A computer-readable storage medium, wherein instructions are stored therein; when the instructions run on a training apparatus for an object recognition model, the training apparatus for the object recognition model is caused to perform the training method according to any one of claims 1 to 6.
  14. A training apparatus for an object recognition model, comprising:
    a memory and a processor, the memory and the processor being coupled; the memory is configured to store computer program code, the computer program code comprising computer instructions; when the processor executes the computer instructions, the training apparatus for the object recognition model performs the training method for an object recognition model according to any one of claims 1 to 6.
  15. A computer program product, wherein, when the computer program product runs on a training apparatus for an object recognition model, the training apparatus for the object recognition model is caused to perform the training method for an object recognition model according to any one of claims 1 to 6.
PCT/CN2021/134345 2021-03-18 2021-11-30 Training method and apparatus for object recognition model, and storage medium WO2022193731A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110290714.0A CN115147671A (zh) 2021-03-18 2021-03-18 Training method and apparatus for object recognition model, and storage medium
CN202110290714.0 2021-03-18

Publications (1)

Publication Number Publication Date
WO2022193731A1 true WO2022193731A1 (zh) 2022-09-22

Family

ID=83321697

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134345 WO2022193731A1 (zh) 2021-03-18 2021-11-30 Training method and apparatus for object recognition model, and storage medium

Country Status (2)

Country Link
CN (1) CN115147671A (zh)
WO (1) WO2022193731A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165407B1 (en) * 2006-10-06 2012-04-24 Hrl Laboratories, Llc Visual attention and object recognition system
CN108898185A (zh) * 2018-07-03 2018-11-27 北京字节跳动网络技术有限公司 用于生成图像识别模型的方法和装置
CN110516514A (zh) * 2018-05-22 2019-11-29 杭州海康威视数字技术股份有限公司 一种目标检测模型的建模方法和装置
CN111145177A (zh) * 2020-04-08 2020-05-12 浙江啄云智能科技有限公司 图像样本生成方法、特定场景目标检测方法及其系统
CN112258504A (zh) * 2020-11-13 2021-01-22 腾讯科技(深圳)有限公司 一种图像检测方法、设备及计算机可读存储介质

Also Published As

Publication number Publication date
CN115147671A (zh) 2022-10-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21931305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21931305

Country of ref document: EP

Kind code of ref document: A1