CN111047625B - Semi-automatic dish video sample marking method - Google Patents

Semi-automatic dish video sample marking method

Info

Publication number
CN111047625B
CN111047625B (application CN201911406896.2A)
Authority
CN
China
Prior art keywords
video
frame
background
positioning
marking
Prior art date
Legal status
Active
Application number
CN201911406896.2A
Other languages
Chinese (zh)
Other versions
CN111047625A (en)
Inventor
王阔阔
许野平
方亮
瞿晨非
凌桂婷
王龙春
Current Assignee
Synthesis Electronic Technology Co Ltd
Original Assignee
Synthesis Electronic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Synthesis Electronic Technology Co Ltd filed Critical Synthesis Electronic Technology Co Ltd
Priority to CN201911406896.2A
Publication of CN111047625A
Application granted
Publication of CN111047625B
Legal status: Active

Classifications

    • G06T7/254: Physics; Computing; Image data processing; Image analysis; Analysis of motion involving subtraction of images
    • G06T7/223: Physics; Computing; Image data processing; Image analysis; Analysis of motion using block-matching
    • G06T2207/10016: Indexing scheme for image analysis; Image acquisition modality; Video; Image sequence

Abstract

The invention relates to a semi-automatic labeling method for dish video samples, comprising the following steps: S01) opening an original video; S02) obtaining the current frame image; S03) manually marking and establishing a background model; S04) setting a threshold according to the characteristics of the video; if the overlap ratio is greater than the threshold, the regions are considered the same, the label information under the background modeling algorithm is transferred to the region located by the positioning network, and the background model is updated with the positioning region of the positioning network as reference; if the ratio is smaller than the threshold, the update is abandoned and the original background model is kept, while the video display is updated; S05) acquiring the coordinates of the currently marked regions together with label data such as the label name; S06) saving. The beneficial effects of the invention are: by combining a positioning network with a background modeling algorithm, the speed of sample labeling is effectively increased, labor cost is greatly reduced, and a solid guarantee is provided for the subsequent deep learning process.

Description

Semi-automatic dish video sample marking method
Technical Field
The invention relates to a semi-automatic labeling method for dish video samples, which combines a positioning network with a background modeling algorithm to track targets in video and thereby obtain labeled samples.
Background
Dish identification is an object recognition technology that identifies a dish by the characteristic information on its surface; it can effectively reduce labor cost at the dish settlement stage and plays an important role in subsequent nutrition analysis. For the recognition network to achieve good results, however, a large number of dish samples must be provided to ensure learning accuracy, and sample quality directly determines whether the trained network attains a good recognition rate. Collecting dish video samples is a form of image tracking, a long-standing difficulty in computer vision: the target position in the first frame is known in advance, and the target must then be found in subsequent frames. Illumination changes, target scale changes, occlusion, target deformation, motion blur, fast motion, target rotation, the target leaving the field of view, background clutter, and low resolution all pose challenges to the target tracking problem.
The patent "Method for detecting smoke in video images" (application No. 201610004646.6, publication No. CN105654494A) proposes a color-feature-based method for tracking smoke in video that effectively reduces the influence of interference and noise. In dish videos, however, the tracked samples are complex and color features hardly distinguish dishes from one another, so the method is not applicable.
The patent "Hough forest based video target tracking method" (application No. 201210253267.2, publication No. CN102831618A) proposes a Hough-forest-based video target tracking method to improve robustness to target occlusion and non-rigid change as well as the real-time performance of tracking. The Hough transform combined with a random forest classifier serves as the detector, while a Lucas-Kanade tracker tracks the target; the combination makes the classifier more robust to occlusion and non-rigid deformation, and the Lucas-Kanade method adjusts the scale of the target region and refines the target position, adapting well to scale changes. However, the Lucas-Kanade tracker relies on three assumptions: constant brightness, small motion, and spatial coherence. Because training must not be limited to a fixed pose and shadow, the illumination angle of a dish video sample changes over time, so this method is also unsuitable.
Disclosure of Invention
The invention solves the technical problem of inaccurate multi-target tracking during dish video sample labeling by providing a semi-automatic dish video sample labeling method, so that samples can be labeled quickly with little human intervention and the labeling speed is effectively increased.
To solve this problem, the technical idea adopted by the invention is as follows: the video is split into frame images, which are submitted to a positioning network that locates the dish regions in each frame; the dish regions tracked by a background modeling algorithm assist this positioning, and combining the two completes the multi-target tracking in the video. Because dishes are served in containers, the positioning network only needs to be trained to distinguish dish regions; compared with a recognition network, this training is fast, requires few samples, and is in any case an indispensable step in the subsequent recognition process.
The invention discloses a semi-automatic dish video sample labeling method comprising the following steps:
S01) open the original video;
S02) obtain the current frame image;
S03) if the frame is the first frame, manually mark the initial position of each dish, input the corresponding label information, and then establish a background model;
S04) if the frame is not the first frame, a human operator judges whether automatic marking should be paused for re-marking; if so, repeat step S03; if not, compute the target region under the background modeling algorithm (hereinafter the background region) and the target region under the positioning network (hereinafter the positioning region), calculate the union and intersection of the two regions, and then calculate the ratio of the intersection to the union. A threshold is set according to the characteristics of the video: if the ratio is greater than the threshold, the two regions are considered the same, the label information from the background modeling algorithm is transferred to the region located by the positioning network, and the background model is updated with the positioning region as reference; if the ratio is smaller than the threshold, the update is abandoned and the original background model is kept. The video display is updated at the same time;
S05) acquire the coordinates of the currently marked regions together with label data such as the label name;
S06) save the video to disk frame by frame, save the label data to an XML file, and end the labeling work.
Further, step S01 comprises the following specific step: S11) open the video; if it is a high-definition video, the background modeling computation becomes heavy, so the video is processed to reduce its resolution, converting each frame into an image of moderate resolution by linear adjacent interpolation.
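As a concrete illustration of step S11, the following is a minimal sketch assuming OpenCV is available; the 640-pixel target width and the cv2.INTER_LINEAR flag are illustrative assumptions, since the text only specifies reducing the resolution by linear adjacent interpolation:

```python
import cv2

def read_downscaled_frames(video_path, max_width=640):
    """Yield frames, shrinking high-definition input so background modeling stays cheap."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        if w > max_width:  # treat anything wider than max_width as high-definition
            scale = max_width / w
            frame = cv2.resize(frame, (max_width, int(h * scale)),
                               interpolation=cv2.INTER_LINEAR)
        yield frame
    cap.release()
```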
Further, the background model is established according to steps S31), S32) and S33):
S31) judge whether the current frame is the first frame, and if so, execute the following steps:
S32) manually mark the region of interest and name the marked region, completing one marking;
S33) extract the marked region, convert it to the HSV color space, calculate its color histogram, normalize the histogram to the value range 0-255, and establish the background model.
Further, step S04 adopts the IOU (intersection over union) algorithm, with the threshold set to 0.8.
The beneficial effects of the invention are: by combining the positioning network with the background modeling algorithm according to the characteristics of the sample video, the speed of sample labeling is effectively increased, labor cost is greatly reduced, and a solid guarantee is provided for the subsequent deep learning process.
Drawings
FIG. 1 is a flow chart of semi-automatic tagging;
FIG. 2 is a flow chart of manual labeling;
Fig. 3 is an automatic labeling flow chart.
Detailed description of the embodiments:
the invention will be further described with reference to the accompanying figures 1-3 and specific examples.
The embodiment discloses a video-based semi-automatic sample labeling method: a region of interest is first marked manually, and the marked region is then updated automatically by fusing a background modeling algorithm with a positioning network, completing the labeling of the whole video sample.
As shown in fig. 1, the specific process is as follows:
S1) open a video sample;
S2) acquire the current frame image;
S31) judge whether the current frame is the first frame, and if so, execute the following steps (FIG. 2):
S32) manually mark the region of interest and name the marked region, completing one marking;
S33) extract the ROI (region of interest, i.e. the marked region), convert it to the HSV color space, calculate its color histogram, normalize the histogram to the value range 0-255, and establish the background model;
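A minimal sketch of step S33 follows, assuming OpenCV. The text specifies HSV conversion and normalization to 0-255 but not the channels, bin count, or any masking; the hue channel, 180 bins, and the saturation/value mask below are assumptions commonly used with histogram back-projection trackers:

```python
import cv2
import numpy as np

def build_background_model(frame, roi_rect, name):
    """roi_rect = (x, y, w, h): the manually marked, named region of interest."""
    x, y, w, h = roi_rect
    hsv = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Ignore pixels whose hue is unreliable (very dark or desaturated)
    mask = cv2.inRange(hsv, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
    hist = cv2.calcHist([hsv], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)  # value range 0-255
    return {"name": name, "rect": roi_rect, "hist": hist}
```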
S41) if the frame is not the first frame, a human operator judges whether automatic marking should be paused for re-marking; if so, repeat step S3; if not, perform the following steps (FIG. 3):
S42) taking the position and size stored in the previous frame's background model and its color histogram as initial values, compute the color probability distribution map by histogram back-projection, compute the centroid within the search window on the probability map, move the center of the search window to the centroid, and iterate several times; when the distance between the window center and the centroid falls below the convergence condition, the target is matched, and the converged region is centered and marked, yielding the dish region under the background modeling algorithm (hereinafter the background region). The search center in each frame is the centroid of the previous frame's target; because the time interval between adjacent frames is small and the target changes little, the target stays inside the search window, few iterations are needed, and real-time performance is high;
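The iteration in step S42 is the classic mean-shift loop. A sketch follows, assuming the model dict from the previous sketch; it delegates the move-window-to-centroid iteration to OpenCV's cv2.meanShift, which implements exactly that loop, with the iteration count and convergence epsilon as assumed parameters:

```python
import cv2

def track_background_region(frame, model, max_iter=10, eps=1.0):
    """Step S42: back-project the stored histogram, then mean-shift to the target."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], model["hist"], [0, 180], 1)
    # Stop when the window center moves less than eps or after max_iter iterations
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, max_iter, eps)
    _, new_rect = cv2.meanShift(backproj, tuple(model["rect"]), criteria)
    model["rect"] = new_rect  # the next frame starts searching from this window
    return new_rect  # the "background region" as (x, y, w, h)
```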
S43) submit the current image to the trained positioning network, which locates the specific positions of the dishes in the image, yielding the dish region under the positioning network (hereinafter the positioning region);
S44) first calculate the union and intersection of the background region and the positioning region;
S45) calculate the proportion of the intersection within the union; if the proportion is greater than the threshold, the match is considered successful and the information of the background region is transferred to the positioning region; if it is smaller than the threshold, the match is considered failed. These two steps constitute the IOU algorithm; since computing the area ratio has time complexity O(1), matching two regions with IOU remains real-time;
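A sketch of the IOU test of steps S44-S45 for axis-aligned rectangles; the 0.8 threshold is the value given elsewhere in the text, while the (x, y, w, h) box representation is an assumption:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) rectangles; O(1) per pair."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def transfer_label(background, positioning_rect, threshold=0.8):
    """Step S45: on a successful match, the positioning region inherits the label."""
    if iou(background["rect"], positioning_rect) > threshold:
        return {"name": background["name"], "rect": positioning_rect}  # matched
    return None  # match failed; caller keeps the original background model
```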
S46) loop over all background regions to find their corresponding positioning regions; if no positioning region matches a background region, abandon the background update and keep the original information; the video display is updated at the same time;
S51) obtain the region coordinates and label information marked in this step to form the labeled data, then continue with the next frame until the video ends;
S61) save each video frame image as a jpg file and save the label data as an xml file.
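A sketch of step S61 using OpenCV and the Python standard library; the Pascal-VOC-like XML layout below is an assumption, since the text only specifies jpg images and an xml label file:

```python
import os
import cv2
import xml.etree.ElementTree as ET

def save_frame_and_labels(frame, regions, frame_index, out_dir="samples"):
    """regions: list of (label, (x, y, w, h)) tuples produced by the marking loop."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.join(out_dir, f"frame_{frame_index:06d}")
    cv2.imwrite(stem + ".jpg", frame)

    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = stem + ".jpg"
    for label, (x, y, w, h) in regions:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        box = ET.SubElement(obj, "bndbox")
        ET.SubElement(box, "xmin").text = str(x)
        ET.SubElement(box, "ymin").text = str(y)
        ET.SubElement(box, "xmax").text = str(x + w)
        ET.SubElement(box, "ymax").text = str(y + h)
    ET.ElementTree(root).write(stem + ".xml")
```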
The above disclosure is only an exemplary embodiment of the present invention, which should not be taken as limiting the scope of the invention, and therefore, the appended claims are intended to cover all such modifications and changes as fall within the true spirit and scope of the invention.

Claims (3)

1. A semi-automatic dish video sample labeling method, characterized by comprising the following steps:
S01) opening an original video;
S02) obtaining the current frame image;
S03) if the frame is the first frame, manually marking the initial position of each dish and inputting the corresponding label information, and then establishing a background model;
the background model being established according to steps S31), S32) and S33):
S31) judging whether the current frame is the first frame, and if so, executing the following steps:
S32) manually marking the region of interest and naming the marked region, completing one marking;
S33) extracting the marked region, converting it to the HSV color space, calculating its color histogram, normalizing the histogram to the value range 0-255, and establishing the background model;
S04) if the frame is not the first frame, performing the following steps:
S41) judging manually whether automatic marking needs to be paused for re-marking; if so, repeating step S03; if not, performing the following steps S42), S43), S44), S45) and S46);
S42) taking the position and size stored in the previous frame's background model and its color histogram as initial values, computing the color probability distribution map by histogram back-projection, computing the centroid within the search window on the probability map, moving the center of the search window to the centroid, and iterating several times; when the distance between the window center and the centroid falls below the convergence condition, the target is matched, and the converged region is centered and marked, yielding the dish region under the background modeling algorithm, namely the background region;
S43) submitting the current image to a trained positioning network, which locates the specific positions of the dishes in the image, yielding the dish region under the positioning network, namely the positioning region;
S44) first calculating the union and intersection of the background region and the positioning region;
S45) calculating the proportion of the intersection within the union; if the proportion is greater than a threshold, the match is considered successful and the information of the background region is transferred to the positioning region; if it is smaller than the threshold, the match is considered failed;
S46) looping over all background regions to find their corresponding positioning regions; if no positioning region matches a background region, abandoning the background update and keeping the original information; updating the video display at the same time;
S05) obtaining the region coordinates and label information marked above to form the labeled data, and continuing with the next frame until the video ends;
S06) saving the video to disk frame by frame, saving the labeled data to an xml file, and ending the labeling work.
2. The semi-automatic dish video sample labeling method according to claim 1, characterized in that step S01 comprises the specific step: S11) opening the video; if it is a high-definition video, the background modeling computation becomes heavy, so the video is processed to reduce its resolution, converting each frame into an image of moderate resolution by linear adjacent interpolation.
3. The semi-automatic dish video sample labeling method according to claim 1, characterized in that step S04 adopts the IOU algorithm, and the threshold is 0.8.
CN201911406896.2A 2020-02-18 2020-02-18 Semi-automatic dish video sample marking method Active CN111047625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911406896.2A CN111047625B (en) 2020-02-18 2020-02-18 Semi-automatic dish video sample marking method


Publications (2)

Publication Number Publication Date
CN111047625A CN111047625A (en) 2020-04-21
CN111047625B (en) 2023-04-07

Family

ID=70242872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911406896.2A Active CN111047625B (en) 2020-02-18 2020-02-18 Semi-automatic dish video sample marking method

Country Status (1)

Country Link
CN (1) CN111047625B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381114A (en) * 2020-10-20 2021-02-19 广东电网有限责任公司中山供电局 Deep learning image annotation system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426937A (en) * 2015-11-06 2016-03-23 北京格分维科技有限公司 System and method used for automatic identification and broadcast of dish of intelligent plate
WO2018095082A1 (en) * 2016-11-28 2018-05-31 江苏东大金智信息系统有限公司 Rapid detection method for moving target in video monitoring
CN108986162A (en) * 2018-06-28 2018-12-11 四川斐讯信息技术有限公司 Vegetable and background segment method based on Inertial Measurement Unit and visual information
CN109508664A (en) * 2018-10-26 2019-03-22 浙江师范大学 A kind of vegetable identification pricing method based on deep learning
CN109684946A (en) * 2018-12-10 2019-04-26 成都睿码科技有限责任公司 A kind of kitchen mouse detection method based on the modeling of single Gaussian Background
CN110059654A (en) * 2019-04-25 2019-07-26 台州智必安科技有限责任公司 A kind of vegetable Automatic-settlement and healthy diet management method based on fine granularity identification


Also Published As

Publication number Publication date
CN111047625A (en) 2020-04-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant