CN114116094B - Method and system for automatically collecting samples - Google Patents

Method and system for automatically collecting samples

Info

Publication number
CN114116094B
CN114116094B (application CN202111327499.3A)
Authority
CN
China
Prior art keywords
mouse
image
button
video
slider
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111327499.3A
Other languages
Chinese (zh)
Other versions
CN114116094A (en)
Inventor
林德银 (Lin Deyin)
邓宏平 (Deng Hongping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yingjue Technology Co., Ltd.
Original Assignee
Shanghai Yingjue Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yingjue Technology Co., Ltd.
Priority to CN202111327499.3A
Publication of CN114116094A
Application granted
Publication of CN114116094B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842Selection of displayed objects or displayed text elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04847Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention provides a method and a system for automatically collecting samples, comprising: recording the positions of all buttons of the operated software platform; simulating mouse movement and clicking through the Windows API to reproduce the steps of manually operating the monitoring platform; and automatically judging video frames according to preset algorithms. By designing an automated acquisition flow and combining it with the corresponding algorithms, the invention realizes batch automatic acquisition of sample pictures with only a small amount of manual intervention, greatly reducing labor cost and offering considerable guidance for the construction of machine learning systems. Manpower is freed from the tedious and monotonous task of sample selection, reducing quality problems caused by operator fatigue. The automated acquisition method can work around the clock, can be replicated and deployed on multiple computers, acquires samples far faster than a human, and speeds up project progress.

Description

Method and system for automatically collecting samples
Technical Field
The invention relates to the technical field of deep learning, in particular to a method and a system for automatically collecting samples.
Background
Deep learning is currently the dominant method for intelligent target analysis based on video surveillance. It relies on the manual collection of a large number of samples, condensing human empirical knowledge during the labeling process. The number and quality of the samples have a great influence on the detection quality of the deep network. At present, the cost of sample collection in each deep learning project already accounts for the largest fraction of the cost of a machine learning system.
With respect to the related art described above, the inventors consider that the following problems exist:
picture frames must be read and captured to judge whether video exists for the current date; early warning is required when video playback freezes; video that remains static for a long time should be filtered out directly; pop-up windows can block the software and cause interference; and the manpower and time cost of sample collection should be reduced as far as possible so as to reduce the cost of the overall system. A technical solution is therefore needed to address these problems.
Disclosure of Invention
In view of the drawbacks of the prior art, an object of the present invention is to provide a method and system for automated sample collection.
According to the invention there is provided a method of automated sample collection, the method comprising the steps of:
step S1: recording the positions of the buttons of the operated software platform;
step S2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the steps of manual operation of the monitoring platform;
step S3: automatically judging the validity of video frames according to preset algorithms.
Preferably, a camera is selected manually, and the time starting point of the corresponding video is set; the automation software is then started, which clicks at the corresponding button positions to perform video selection, video playing, progress-bar dragging and video screenshot capture;
the buttons comprise a month selection in the date table, a date selection, a triangle symbol indicating whether the day contains video, a player start-play button and a progress-bar button;
the button positions are configured manually before the software runs, and a screenshot image is displayed; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button.
Preferably, three function interfaces in the Windows API, MoveMouse, MouseLBtnDown and MouseLBtnUp, are used to respectively perform the three functions of moving the mouse to a given coordinate position, pressing the left mouse button and releasing the left mouse button; all human control of the monitoring software is simulated by a combination of these three functions.
Preferably, the position of the slider on the play bar is obtained by an image detection method; the mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right; after the movement is completed, the mouse button is released;
a slider image and the slider's movable area are recorded in advance; the stored slider image is used to traverse the movable area from left to right, matching pixel by pixel, and when the matching position is found, the coordinates of the slider's center point at that moment give the slider's position;
the coordinates of the center point of each button are recorded in advance; when a button needs to be clicked, the mouse is moved to that center point, the button is pressed, and after waiting 500 ms the button is released.
Preferably, the video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol is located is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds a threshold the triangle symbol is considered absent, indicating that no video exists for that day, which is filtered out directly;
the position of the picture within the play area is recorded; after each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image.
The invention also provides a system for automatically collecting samples, which comprises the following modules:
module M1: recording the positions of the buttons of the operated software platform;
module M2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the manual operation of the monitoring platform;
module M3: automatically judging the validity of video frames according to preset algorithms.
Preferably, a camera is selected manually, and the time starting point of the corresponding video is set; the automation software is then started, which clicks at the corresponding button positions to perform video selection, video playing, progress-bar dragging and video screenshot capture;
the buttons comprise a month selection in the date table, a date selection, a triangle symbol indicating whether the day contains video, a player start-play button and a progress-bar button;
the button positions are configured manually before the software runs, and a screenshot image is displayed; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button.
Preferably, three function interfaces in the Windows API, MoveMouse, MouseLBtnDown and MouseLBtnUp, are used to respectively perform the three functions of moving the mouse to a given coordinate position, pressing the left mouse button and releasing the left mouse button; all human control of the monitoring software is simulated by a combination of these three functions.
Preferably, the position of the slider on the play bar is obtained through an image detection system; the mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right; after the movement is completed, the mouse button is released;
a slider image and the slider's movable area are recorded in advance; the stored slider image is used to traverse the movable area from left to right, matching pixel by pixel, and when the matching position is found, the coordinates of the slider's center point at that moment give the slider's position;
the coordinates of the center point of each button are recorded in advance; when a button needs to be clicked, the mouse is moved to that center point, the button is pressed, and after waiting 500 ms the button is released.
Preferably, the video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol is located is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds a threshold the triangle symbol is considered absent, indicating that no video exists for that day, which is filtered out directly;
the position of the picture within the play area is recorded; after each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image.
Compared with the prior art, the invention has the following beneficial effects:
1. by designing an automated acquisition flow and combining it with the corresponding algorithms, the invention realizes batch automatic acquisition of sample pictures with only a small amount of manual intervention, greatly reducing labor cost and offering considerable guidance for the construction of machine learning systems;
2. the invention frees manpower from the tedious and monotonous task of sample selection, reducing quality problems caused by operator fatigue;
3. the automated acquisition method can work around the clock, can be replicated and deployed on multiple computers, acquires samples far faster than a human, and speeds up project progress.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments, given with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a monitoring platform according to the present invention;
FIG. 3 is a diagram of an automated sample collection software parameter configuration interface in accordance with the present invention;
FIG. 4 is a diagram of a video player bar interface in accordance with the present invention;
FIG. 5 is a schematic view of a date field of the present invention;
FIG. 6 is a view of a sample image after cropping in accordance with the present invention;
FIG. 7 is a schematic representation of an invalid image according to the present invention;
FIG. 8 is a schematic diagram of pop-up window interference in the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by those skilled in the art without departing from the inventive concept; these all fall within the scope of the present invention.
The specific application scenario of the method is as follows: a network video monitoring platform stores a large amount of historical video, going back at most 9 months. The platform can have several hundred camera channels connected simultaneously. All video clips on the platform need to be browsed frame by frame to determine whether the pictures contain targets of interest to the task, so that those targets can be cropped and saved to a sample library. However, if every video frame is browsed manually, the long working hours make human negligence and missed important information likely; moreover, videos often contain long periods without targets, in which the picture is a completely static background, and identifying such meaningless frames is extremely tedious. It was therefore decided to develop automated software to perform the browsing and cropping of picture frames instead of a human.
The invention provides a method for automatically collecting samples, which comprises the following steps:
step S1: recording the positions of the buttons of the operated software platform in advance; step S2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the steps of manual operation of the monitoring platform to achieve automation; step S3: designing algorithms to automatically judge the validity of video frames.
Fig. 2 shows a screenshot of the monitoring platform software. A camera is selected manually in the software, and the time starting point of the corresponding video is set. The automation software is then started; it clicks at the corresponding button positions to perform actions such as video selection, video playing, progress-bar dragging and video screenshot capture, thereby automatically collecting the related images.
The buttons in the software that need to be clicked include: the month selection in the date table, the date selection, the triangle symbol indicating whether the day contains video, the player start-play button and the progress-bar button. These positions are all configured manually before the software runs. The specific method is as follows: take a screenshot and display it; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button, as shown in fig. 3.
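As an illustration of this configuration step, the following is a minimal sketch (an assumed helper, not the patent's own tool) that displays one full-screen screenshot and records one click per named button; the button names and the library choices (Pillow, OpenCV) are assumptions made here.

```python
# Minimal configuration sketch: show a screenshot, collect one click per button name.
import cv2
import numpy as np
from PIL import ImageGrab

BUTTON_NAMES = ["month_select", "date_select", "day_triangle",
                "start_play", "progress_bar"]        # assumed names

def configure_button_positions():
    screenshot = cv2.cvtColor(np.array(ImageGrab.grab()), cv2.COLOR_RGB2BGR)
    positions = {}

    def on_click(event, x, y, flags, param):
        # Each left click assigns the next button name to the clicked coordinates.
        if event == cv2.EVENT_LBUTTONDOWN and len(positions) < len(BUTTON_NAMES):
            positions[BUTTON_NAMES[len(positions)]] = (x, y)

    cv2.namedWindow("config")
    cv2.setMouseCallback("config", on_click)
    while len(positions) < len(BUTTON_NAMES):
        cv2.imshow("config", screenshot)
        if cv2.waitKey(50) & 0xFF == 27:             # Esc aborts configuration
            break
    cv2.destroyAllWindows()
    return positions     # e.g. {"start_play": (812, 940), ...}
```

The returned name-to-coordinate mapping can then be saved and loaded by the automation run.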
The three functions of moving the mouse to a given coordinate position, pressing the left mouse button and releasing the left mouse button can be implemented with the three function interfaces MoveMouse, MouseLBtnDown and MouseLBtnUp in the Windows API. By flexibly combining these three functions, all human control of the monitoring software can be fully simulated.
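Assuming MoveMouse, MouseLBtnDown and MouseLBtnUp are thin wrappers over the Win32 calls SetCursorPos and mouse_event, a sketch of equivalent helpers (Windows only, via Python ctypes) could look like this; the click_button helper reproduces the move, press, wait 500 ms, release sequence described further below.

```python
# Hedged sketch of the three mouse primitives plus a composed button click.
import ctypes
import time

user32 = ctypes.windll.user32
MOUSEEVENTF_LEFTDOWN = 0x0002
MOUSEEVENTF_LEFTUP = 0x0004

def move_mouse(x: int, y: int) -> None:
    """Move the cursor to screen coordinate (x, y)."""
    user32.SetCursorPos(int(x), int(y))

def mouse_lbtn_down() -> None:
    """Press the left mouse button at the current cursor position."""
    user32.mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0)

def mouse_lbtn_up() -> None:
    """Release the left mouse button."""
    user32.mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0)

def click_button(x: int, y: int, hold_ms: int = 500) -> None:
    """Move to a pre-recorded button center, press, wait, release."""
    move_mouse(x, y)
    mouse_lbtn_down()
    time.sleep(hold_ms / 1000.0)
    mouse_lbtn_up()
```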
The position of the slider on the play bar is obtained by an image detection method. A slider image and the slider's movable region are recorded in advance; the stored slider image is used to traverse the movable region from left to right, matching pixel by pixel, and the best matching position is found; the coordinates of the slider's center point at that position give the slider's location. Fig. 4 shows the video player bar interface. The mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right. The horizontal offset of the new position is calculated from the ratio of 5 seconds to the total time represented by the slider bar, combined with the length of the bar. After the movement is completed, the mouse button is released.
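A sketch of this step, assuming the slider template image and the bar's movable region were saved at configuration time; OpenCV template matching stands in for the literal left-to-right pixel scan described above, and the function and parameter names are illustrative.

```python
# Locate the slider by template matching, and compute the 5-second drag offset.
import cv2
import numpy as np
from PIL import ImageGrab

def find_slider_center(slider_template_path, bar_region):
    """bar_region = (x, y, w, h) of the slider's movable area on screen.
    Returns the screen coordinates of the best-matching slider center."""
    x, y, w, h = bar_region
    screen = cv2.cvtColor(np.array(ImageGrab.grab()), cv2.COLOR_RGB2BGR)
    strip = screen[y:y + h, x:x + w]
    template = cv2.imread(slider_template_path)
    result = cv2.matchTemplate(strip, template, cv2.TM_SQDIFF_NORMED)
    _, _, min_loc, _ = cv2.minMaxLoc(result)          # lowest difference = best match
    th, tw = template.shape[:2]
    return (x + min_loc[0] + tw // 2, y + min_loc[1] + th // 2)

def offset_for_5_seconds(bar_length_px, total_seconds):
    """Horizontal drag distance (pixels) corresponding to a 5-second jump."""
    return int(round(bar_length_px * 5.0 / total_seconds))
```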
The coordinates of the center point of each button have been recorded in advance. When a button needs to be clicked, the mouse is moved to that center point, the button is pressed, 500 ms elapses, and the button is then released.
On the interface of a certain type of monitoring software, a triangle symbol is added to each date icon in the date table to indicate that a video file exists for that day; fig. 5 shows the date area. If the triangle symbol is absent, no video file exists for that day, the date icon does not need to be clicked, and the day can be skipped directly. The specific method is as follows: a picture of the triangle symbol is stored; the position of the triangle symbol relative to each date icon is recorded; the video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol should appear is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds the threshold, the triangle symbol is considered absent, indicating that no video exists for that day, which is then filtered out directly.
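The has-video check could be sketched as follows, assuming the triangle template image and its offset relative to each date icon were stored in advance; the threshold value and the names are illustrative, not taken from the patent.

```python
# Compare the on-screen triangle area with the stored triangle template.
import cv2
import numpy as np
from PIL import ImageGrab

def day_has_video(icon_pos, tri_offset, triangle_template, diff_threshold=2000):
    """icon_pos / tri_offset are (x, y); triangle_template is a BGR ndarray."""
    tx, ty = icon_pos[0] + tri_offset[0], icon_pos[1] + tri_offset[1]
    th, tw = triangle_template.shape[:2]
    screen = cv2.cvtColor(np.array(ImageGrab.grab()), cv2.COLOR_RGB2BGR)
    sub = screen[ty:ty + th, tx:tx + tw]
    total_diff = np.abs(sub.astype(np.int32) - triangle_template.astype(np.int32)).sum()
    # A large total difference means the triangle is absent -> no video that day.
    return total_diff <= diff_threshold
```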
The position of the picture within the play area is recorded in advance. After each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image, as shown in fig. 6.
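A minimal sketch of this cropping step, assuming the play-area rectangle was recorded at configuration time and that Pillow's ImageGrab is used for the screenshot.

```python
# Grab the screen and keep only the play-area sub-image as the sample.
import numpy as np
from PIL import ImageGrab

def grab_sample_image(play_area):
    """play_area = (x, y, w, h) of the video picture inside the player window."""
    x, y, w, h = play_area
    frame = np.array(ImageGrab.grab())          # full-screen RGB screenshot
    return frame[y:y + h, x:x + w].copy()       # sub-image used as the sample
```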
Since the monitoring software works in a network environment and is affected by factors such as server performance and network performance, freezes are quite common, especially during operations such as video fast-forwarding and decoding, and blindly continuing to operate a frozen program is pointless. The following method is adopted for early warning and intervention: before the mouse clicks a button, a screenshot is captured and saved (a first image); after the mouse clicks the button, a screenshot is again captured and saved (a second image); the two moments before and after the click are 10 seconds apart. Each button click corresponds to a change in a specific area of the software: for example, dragging the video player's progress bar changes the video time; clicking a date pops up a video list; clicking a month changes the date table. The region of the screenshot that should change after each button's action is recorded in advance, and whether the content of that region changes is used to judge whether the picture is frozen. The method for detecting whether the relevant area has changed is as follows: compare, pixel by pixel, the RGB values of the region's sub-images from the before and after screenshots; if the difference exceeds a threshold, the region is considered to have changed, otherwise the picture is considered unchanged. If a freeze is detected, an early warning should be issued promptly so that the problem can be resolved by manual intervention.
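The region-change test can be sketched as a simple comparison of the button's pre-recorded reaction region in the before and after screenshots; the mean-difference formulation and the threshold value are assumptions made here for illustration.

```python
# Decide whether a pre-recorded screen region changed between two screenshots.
import numpy as np

def region_changed(img_before, img_after, region, pixel_diff_threshold=30):
    """img_* are full-screen RGB ndarrays; region = (x, y, w, h).
    Returns True if the region's content changed between the two screenshots."""
    x, y, w, h = region
    a = img_before[y:y + h, x:x + w].astype(np.int32)
    b = img_after[y:y + h, x:x + w].astype(np.int32)
    return np.abs(a - b).mean() > pixel_diff_threshold
```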
On some computers with weaker performance, a slower CPU, less memory, lower disk read/write speed or a slower network, the response cannot be seen immediately after a button is clicked. This situation is not the same as a complete freeze and is therefore treated differently: for detecting a freeze, the two screenshots before and after the mouse click are taken 10 seconds apart; for judging whether the reaction is merely too slow, the interval between the two screenshots before and after the click is set to 5 seconds. If the two images 5 seconds apart show no change, a further 5 seconds are waited, and whether there is any difference between the two images 10 seconds apart is analyzed. If there is, the computer is simply reacting too slowly; if there is still no difference, the software is indeed frozen.
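Putting the two intervals together, a sketch of the slow-versus-frozen decision might look like the following; it reuses a region_changed predicate such as the one sketched above (passed in as a parameter) and follows the 5-second and 10-second comparisons described in the text.

```python
# Click, then classify the machine's response as ok / slow / stuck.
import time
import numpy as np
from PIL import ImageGrab

def click_and_classify(click_fn, region, region_changed):
    """click_fn performs the button click; returns 'ok', 'slow' or 'stuck'."""
    before = np.array(ImageGrab.grab())
    click_fn()
    time.sleep(5)
    after_5s = np.array(ImageGrab.grab())
    if region_changed(before, after_5s, region):
        return "ok"
    time.sleep(5)                                # give a slow machine 5 more seconds
    after_10s = np.array(ImageGrab.grab())
    if region_changed(before, after_10s, region):
        return "slow"                            # reacted, just late
    return "stuck"                               # no change after 10 s -> raise an alert
```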
Since most of the time in surveillance video contains no target, such frames are meaningless for training the deep network and are invalid frames; fig. 7 shows one such invalid image. If these video frames were screened and filtered by a person, not only would the time required drive up costs, but the extremely tedious process would make the operator drowsy and prone to oversights. Such invalid background frames can be filtered out by an automated method, as follows: capture an image every 5 seconds and store it in memory; take the pixel-by-pixel difference between the current screenshot and the previous one, differencing the three RGB values of each pixel and calculating the sum of the differences; binarize the difference map with a threshold th = 20, so that pixels that differ strongly between the two images appear in the foreground; extract connected domains from the difference map and filter out small ones with an area threshold area_th = 20, i.e. discard connected domains of fewer than 20 pixels; count the connected domains in the foreground image, and if a large-area connected domain (more than 20 pixels) exists, a moving target is considered present in the picture, and the screenshot is valid and can be recorded in the sample set.
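A sketch of this static-frame filter with the thresholds given above (binarization threshold th = 20, minimum connected-domain area of 20 pixels), using OpenCV connected-component analysis in place of a hand-rolled labelling routine.

```python
# Keep a frame only if the difference to the previous frame contains a large blob.
import cv2
import numpy as np

def frame_has_motion(prev_rgb, curr_rgb, bin_th=20, area_th=20):
    """prev_rgb / curr_rgb: consecutive screenshots (RGB ndarrays of equal size)."""
    diff = np.abs(curr_rgb.astype(np.int32) - prev_rgb.astype(np.int32)).sum(axis=2)
    fg = (diff > bin_th).astype(np.uint8)                  # binarized difference map
    num, _, stats, _ = cv2.connectedComponentsWithStats(fg, connectivity=8)
    # Label 0 is the background; keep components whose area exceeds area_th pixels.
    big = [i for i in range(1, num) if stats[i, cv2.CC_STAT_AREA] > area_th]
    return len(big) > 0      # True -> a moving target is present, keep the frame
```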
During the operation of the automation software, a suddenly appearing pop-up window can seriously disturb its function. For example, pop-ups from software such as 360, 'Lucky Star', the Sogou input method or game plug-in advertisements occupy a large area in the center of the display for a long time, blocking many buttons, seizing control from the monitoring platform software and causing erroneous content to appear in the captured sample images; fig. 8 shows one such image containing a pop-up window. To solve this problem, the following method may be adopted: screenshots of the window areas of the pop-ups of various software are stored in advance; foreground connected domains are obtained from the difference between two consecutive screenshots; a sub-image is cropped from the original image for each foreground connected domain; each sub-image is compared pixel by pixel with the pre-stored pop-up screenshots, and if the sum of pixel differences is smaller than a set threshold, the sub-image is considered to be a pop-up of that type, at which point an early warning is issued and the user is notified so that the interference can be removed manually; if no pop-up screenshot matches, the change is not pop-up interference but a valid target, and it is processed according to the invalid-frame filtering step described above.
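The pop-up check could be sketched as follows: foreground connected domains from the frame difference are cropped and compared against previously saved pop-up screenshots. Requiring the cropped region and the stored screenshot to have the same size is a simplification made here, and the threshold values are illustrative.

```python
# Detect whether any changed region matches a previously saved pop-up screenshot.
import cv2
import numpy as np

def matches_known_popup(prev_rgb, curr_rgb, popup_templates,
                        bin_th=20, match_threshold=15):
    """popup_templates: list of RGB ndarrays saved in advance.
    Returns True if any changed region looks like a known pop-up window."""
    diff = np.abs(curr_rgb.astype(np.int32) - prev_rgb.astype(np.int32)).sum(axis=2)
    fg = (diff > bin_th).astype(np.uint8)
    num, _, stats, _ = cv2.connectedComponentsWithStats(fg, connectivity=8)
    for i in range(1, num):                       # label 0 is the background
        x = stats[i, cv2.CC_STAT_LEFT]
        y = stats[i, cv2.CC_STAT_TOP]
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        sub = curr_rgb[y:y + h, x:x + w]
        for tpl in popup_templates:
            if sub.shape != tpl.shape:            # simplification: require equal size
                continue
            mean_diff = np.abs(sub.astype(np.int32) - tpl.astype(np.int32)).mean()
            if mean_diff < match_threshold:       # close enough -> known pop-up, alert
                return True
    return False
```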
The invention also provides a system for automatically collecting samples, which comprises the following modules: module M1: recording the positions of the buttons of the operated software platform; module M2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the manual operation of the monitoring platform; module M3: automatically judging the validity of video frames according to preset algorithms.
A camera is selected manually, and the time starting point of the corresponding video is set; the automation software is then started, which clicks at the corresponding button positions to perform video selection, video playing, progress-bar dragging and video screenshot capture. The buttons include a month selection in the date table, a date selection, a triangle symbol indicating whether the day contains video, a player start-play button and a progress-bar button. The button positions are configured manually before the software runs: a screenshot is taken and displayed; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button.
The three function interfaces MoveMouse, MouseLBtnDown and MouseLBtnUp in the Windows API are used to respectively move the mouse to a given coordinate position, press the left mouse button and release the left mouse button; all human control of the monitoring software is simulated by combining these three functions. The position of the slider on the play bar is obtained through an image detection system; the mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right; after the movement is completed, the mouse button is released. A slider image and the slider's movable area are recorded in advance; the stored slider image is used to traverse the movable area from left to right, matching pixel by pixel, and when the matching position is found, the coordinates of the slider's center point at that moment give the slider's position. The coordinates of the center point of each button are recorded in advance; when a button needs to be clicked, the mouse is moved to that center point, the button is pressed, and after waiting 500 ms the button is released.
The video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol is located is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds a threshold the triangle symbol is considered absent, indicating that no video exists for that day, which is then filtered out directly. The position of the picture within the play area is recorded; after each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image.
By designing an automated acquisition flow and combining it with the corresponding algorithms, the invention realizes batch automatic acquisition of sample pictures with only a small amount of manual intervention, greatly reducing labor cost and offering considerable guidance for the construction of machine learning systems. Manpower is freed from the tedious and monotonous task of sample selection, reducing quality problems caused by operator fatigue. The automated acquisition method can work around the clock, can be replicated and deployed on multiple computers, acquires samples far faster than a human, and speeds up project progress.
Those skilled in the art will appreciate that, in addition to being implemented as pure computer-readable program code, the system and its individual devices, modules and units provided by the invention can be implemented entirely by logic-programming the method steps, in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices, modules and units can be regarded as a hardware component, and the devices, modules and units that realize its various functions can be regarded as structures within that hardware component; they may likewise be regarded as software modules implementing the method or as structures within the hardware component.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments of the present application and the features within the embodiments may be combined with one another arbitrarily, provided there is no conflict.

Claims (6)

1. A method of automated sample collection, the method comprising the steps of:
step S1: recording the positions of the buttons of the operated software platform;
step S2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the steps of manual operation of the monitoring platform;
step S3: automatically judging the validity of video frames according to preset algorithms;
manually selecting a camera, and setting the time starting point of the corresponding video; then starting the automation software, which clicks at the corresponding button positions to perform video selection, video playing, progress-bar dragging and video screenshot capture;
the buttons comprise a month selection in the date table, a date selection, a triangle symbol indicating whether the day contains video, a player start-play button and a progress-bar button;
the button positions are configured manually before the software runs, and a screenshot image is displayed; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button;
the video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol is located is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds a threshold the triangle symbol is considered absent, indicating that no video exists for that day, which is filtered out directly;
the position of the picture within the play area is recorded; after each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image;
specifically, an image is captured every 5 seconds and stored in memory; the pixel-by-pixel difference between the current screenshot and the previous one is taken, the three RGB values of each pixel are differenced and the sum of the differences is calculated; the difference map is binarized with a threshold th = 20 so that pixels that differ strongly between the two images appear in the foreground; connected domains are extracted from the difference map and small ones are filtered out with an area threshold area_th = 20, i.e. connected domains of fewer than 20 pixels are discarded; the connected domains in the foreground image are counted, and if a large-area connected domain exists, a moving target is considered present in the picture, in which case the screenshot is valid and can be recorded in the sample set.
2. The method of automated sample collection according to claim 1, wherein three function interfaces in the Windows API, MoveMouse, MouseLBtnDown and MouseLBtnUp, are used to respectively perform the three functions of moving the mouse to a given coordinate position, pressing the left mouse button and releasing the left mouse button; all human control of the monitoring software is simulated by a combination of these three functions.
3. The automated sample collection method of claim 1, wherein the position of the slider on the play bar is obtained by means of image detection; the mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right; after the movement is completed, the mouse button is released;
a slider image and the slider's movable area are recorded in advance; the stored slider image is used to traverse the movable area from left to right, matching pixel by pixel, and when the matching position is found, the coordinates of the slider's center point at that moment give the slider's position;
the coordinates of the center point of each button are recorded in advance; when a button needs to be clicked, the mouse is moved to that center point, the button is pressed, and after waiting 500 ms the button is released.
4. A system for automated collection of samples, the system comprising the following modules:
module M1: recording the positions of the buttons of the operated software platform;
module M2: simulating the movement and clicking of a mouse through the Windows API, and reproducing the manual operation of the monitoring platform;
module M3: automatically judging the validity of video frames according to preset algorithms;
manually selecting a camera, and setting the time starting point of the corresponding video; then starting the automation software, which clicks at the corresponding button positions to perform video selection, video playing, progress-bar dragging and video screenshot capture;
the buttons comprise a month selection in the date table, a date selection, a triangle symbol indicating whether the day contains video, a player start-play button and a progress-bar button;
the button positions are configured manually before the software runs, and a screenshot image is displayed; the user clicks each button in turn with the mouse, and the software automatically captures the clicked coordinates and associates each position with the name of its button;
the video files are traversed by month and date, with the current date recorded in memory and traversed in sequence; the icon position corresponding to the current date is found; a sub-image of the area where the triangle symbol is located is captured; the sub-image is compared pixel by pixel with the stored triangle image, and if the total difference exceeds a threshold the triangle symbol is considered absent, indicating that no video exists for that day, which is filtered out directly;
the position of the picture within the play area is recorded; after each screen capture, the screenshot is cropped using the coordinates of the play-picture region, and the resulting sub-image is taken as the final sample image;
specifically, an image is captured every 5 seconds and stored in memory; the pixel-by-pixel difference between the current screenshot and the previous one is taken, the three RGB values of each pixel are differenced and the sum of the differences is calculated; the difference map is binarized with a threshold th = 20 so that pixels that differ strongly between the two images appear in the foreground; connected domains are extracted from the difference map and small ones are filtered out with an area threshold area_th = 20, i.e. connected domains of fewer than 20 pixels are discarded; the connected domains in the foreground image are counted, and if a large-area connected domain exists, a moving target is considered present in the picture, in which case the screenshot is valid and can be recorded in the sample set.
5. The automated sample collection system of claim 4, wherein three function interfaces in the Windows API, MoveMouse, MouseLBtnDown and MouseLBtnUp, are used to respectively perform the three functions of moving the mouse to a given coordinate position, pressing the left mouse button and releasing the left mouse button; all human control of the monitoring software is simulated by a combination of these three functions.
6. The automated sample collection system of claim 4, wherein the position of the slider on the play bar is obtained through an image detection system; the mouse is moved so that the pointer tip lies exactly at the center of the slider; the mouse button is pressed and the mouse is then moved to the right; after the movement is completed, the mouse button is released;
a slider image and the slider's movable area are recorded in advance; the stored slider image is used to traverse the movable area from left to right, matching pixel by pixel, and when the matching position is found, the coordinates of the slider's center point at that moment give the slider's position;
the coordinates of the center point of each button are recorded in advance; when a button needs to be clicked, the mouse is moved to that center point, the button is pressed, and after waiting 500 ms the button is released.
CN202111327499.3A 2021-11-10 2021-11-10 Method and system for automatically collecting samples Active CN114116094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111327499.3A CN114116094B (en) 2021-11-10 2021-11-10 Method and system for automatically collecting samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111327499.3A CN114116094B (en) 2021-11-10 2021-11-10 Method and system for automatically collecting samples

Publications (2)

Publication Number Publication Date
CN114116094A (en) 2022-03-01
CN114116094B (en) 2024-02-27

Family

ID=80378140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111327499.3A Active CN114116094B (en) 2021-11-10 2021-11-10 Method and system for automatically collecting samples

Country Status (1)

Country Link
CN (1) CN114116094B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909260B (en) * 2023-09-12 2023-12-01 常州星宇车灯股份有限公司 (Changzhou Xingyu Automotive Lighting Systems Co., Ltd.) Intelligent driving domain controller test verification method for simulating a HIL (hardware-in-the-loop) rack

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569819A (en) * 2016-11-04 2017-04-19 国网浙江省电力公司丽水供电公司 Automatic screenshot method applied to video monitoring platform
CN107040741A (en) * 2017-04-28 2017-08-11 山东通广电子有限公司 A kind of vehicle intelligent camera
CN108536495A (en) * 2017-03-06 2018-09-14 深圳市北斗智研科技有限公司 A kind of Excel softwares operation information acquisition method
CN108769604A (en) * 2018-06-13 2018-11-06 深圳绿米联创科技有限公司 Processing method, device, terminal device and the storage medium of monitor video
CN111741265A (en) * 2020-06-23 2020-10-02 浙江大华技术股份有限公司 Video playing control method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN114116094A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN107666987B (en) Robot process automation
US20140189576A1 (en) System and method for visual matching of application screenshots
US20140218385A1 (en) System and method for visual segmentation of application screenshots
US8127247B2 (en) Human-machine-interface and method for manipulating data in a machine vision system
CN106504270B (en) Method and device for displaying target object in video
CN110837403B (en) Robot process automation
CN104123520A (en) Two-dimensional code scanning method and device
EP3392829B1 (en) Image processing apparatus, image processing system, image processing method, and program
GB2409124A (en) Modifying a data signal when an event of interest occurs
CN104268006A (en) Keyboard and mouse script playback method and device
CN104956339A (en) Generating software test script from video
CN101292262A (en) Fictitious structural content filtering in cognitive control framework
CN114116094B (en) Method and system for automatically collecting samples
WO2017001560A1 (en) Robotic process automation
WO2022002151A1 (en) Implementation method and apparatus for behavior analysis of moving target, and electronic device
CN112131121B (en) Fuzzy detection method and device for user interface, electronic equipment and storage medium
CN110750311A (en) Data classification method, device and equipment
RU2703152C1 (en) System and method of displaying objects movement scheme
JP3913797B2 (en) Pachinko ball movement trajectory analysis method and movement trajectory analysis apparatus
CN113010738B (en) Video processing method, device, electronic equipment and readable storage medium
CN115859278B (en) Method, system, equipment and storage medium for auditing software operation behaviors
JP7482759B2 (en) Display device, fish counting system equipped with same, and display control program
WO2024190552A1 (en) Information processing device, information processing method, and program
Sulaiman et al. Graphical user interface (GUI) development for object tracking system in video sequences
CN115658544A (en) Test method based on simulated click

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant