CN113033311A - Equipment control method based on video stream - Google Patents
Equipment control method based on video stream
- Publication number: CN113033311A
- Application number: CN202110210611.9A
- Authority
- CN
- China
- Prior art keywords
- switch
- video stream
- finger
- computer
- display screen
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The invention provides a video-stream-based device control method in the technical field of device control, comprising the following steps: step S10, building and training a switch detection model on a computer; step S20, the computer acquires a panoramic video stream of the exhibition hall, identifies the switch of each device in the panoramic video stream with the switch detection model, marks each identified switch on the panoramic video stream, and records the coordinate range of each switch on the stream; step S30, the computer projects the panoramic video stream onto a display screen; step S40, the computer measures, through a laser radar, the distance, angle and gesture of the user's finger touching the display screen, and converts the distance and angle into the touch coordinates through a radar coordinate conversion model; step S50, the computer determines which switch the user intends to operate by comparing the touch coordinates against the recorded coordinate ranges, and controls the device corresponding to that switch according to the gesture. The invention has the advantage of greatly improving the flexibility and convenience of device control.
Description
Technical Field
The present invention relates to the field of device control technologies, and in particular, to a device control method based on a video stream.
Background
The exhibition hall takes a business scene as its platform and, by means of internet technology, digital multimedia technology, intelligent hardware, somatosensory interaction technology and diversified intelligent display technology, fuses the enterprise showroom with innovative ideas, improving visitors' sense of experience and interaction.
In the exhibition hall, a computer and a central control system control multimedia devices such as projectors, display stands, screens and cameras to play the exhibition content, and control background devices such as lighting, air conditioning, sound and microphones to create the hall's atmosphere; the complexity of the equipment makes control by the central control system difficult to some extent.
When the devices in the exhibition hall need to be controlled, the central control system must monitor the specific conditions of the hall in real time; an operator then works the switch control panels of the different devices, or taps the button icons of the corresponding devices on a touch screen such as an iPad, and the central control system executes the corresponding operation after receiving the trigger signal from the panel or the screen. This approach has the following disadvantages: limited by the switch control panels and the touch screen, the central control system cannot adjust flexibly and promptly to the current situation of the hall, nor apply different regulation in different time periods; and the touch screen supports nothing more than tapping button icons, so it is single in function and inconvenient to operate.
Therefore, how to provide a video-stream-based device control method that improves the flexibility and convenience of device control has become a problem to be solved urgently.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a device control method based on video stream, so as to improve flexibility and convenience of device control.
The invention is realized by the following steps: a device control method based on video streaming, comprising the steps of:
step S10, a switch detection model is built on a computer, and a large number of switch images are obtained to train the switch detection model;
step S20, the computer acquires a panoramic video stream in the exhibition hall through the camera, the switch of each device in the panoramic video stream is identified by using the switch detection model, each identified switch is marked on the panoramic video stream, and the coordinate range of each switch on the panoramic video stream is recorded;
step S30, the computer projects the panoramic video stream marked with the switch on a display screen;
step S40, the computer recognizes the distance, the angle and the gesture of the user finger touching the display screen through the laser radar, and converts the distance and the angle into the coordinate of the user finger touching through the radar coordinate conversion model;
and step S50, the computer judges the switch to be operated by the user by comparing the coordinate range and the coordinate, and controls the equipment corresponding to the switch by combining the gesture.
Further, the step S10 specifically includes:
step S11, creating a switch detection model on the computer based on ResNet50, FPN, SPP and SubNet;
step S12, acquiring video streams containing all devices in the exhibition hall, and extracting switch images from each frame of data of the video streams;
step S13, performing data enhancement processing on each switch image to increase the sample size;
step S14, labeling each switch image after image enhancement processing, and adjusting the size and the file name of the image to obtain a data set;
and step S15, training a switch detection model by using the data set.
Further, the step S13 is specifically:
and performing data enhancement processing of rotation, translation, scaling or edge filling on each switch image to increase the sample size.
Further, the step S14 is specifically:
manually labeling each switch image after image enhancement processing by using labelImg, uniformly reducing the image size of each switch image to a preset resolution, and modifying the file name of each switch image based on the labeled content to obtain a data set.
Further, in the step S10, the loss function of the switch detection model adopts a smooth-L1 function and a Focal function.
Further, in step S10, before the switch detection model is trained, model parameters at least including a training batch, an iteration number, and a learning rate need to be set.
Further, the step S40 is specifically:
the computer measures, through the laser radar, the distance s from the lidar to the user's finger and the included angle α between the lidar-to-finger line and the horizontal; combining these with the known distance d from the lidar to the display screen, the screen width W and the screen height H, it calculates the coordinate at which the finger touches the screen, and it recognizes the gesture from the number of finger clicks and the direction of finger movement, also measured by the lidar.
Further, in step S40, the gestures at least include: a single-finger long press, a single-finger double-click, a single-finger triple-click, a single-finger press on the switch with upward movement, a single-finger press on the switch with downward movement, a double-finger single-click, and a double-finger double-click.
The invention has the advantages that:
1. The switch detection model identifies the switch of each device in the panoramic video stream; after the switches are marked and their coordinate ranges recorded, the panoramic video stream is projected onto the display screen. Once the computer has recognized, through the laser radar, the distance, angle and gesture of the user's finger touching the screen, it converts the distance and angle into the finger's coordinates, determines which coordinate range the point falls within to match the corresponding switch, and then operates the corresponding device in linkage. The whole process requires no additional switch control panel or touch screen; the situation in the exhibition hall can be viewed directly on the screen showing the panoramic video stream; and the computer can promptly adjust the operations associated with each gesture according to the hall's current situation, or assign different operations to the same gesture in different time periods. Flexibility and convenience of device control are thereby greatly improved.
2. A switch detection model is established through ResNet50, FPN, SPP and SubNet, and the negative influence of gradient disappearance is greatly reduced due to ResNet 50; the FPN integrates the characteristics of different dimensions, so that the richness of information is improved; SPP solves the defects caused by different sizes of input images, and performs feature extraction from different angles to increase the identification precision; the SubNet integrates two functions of classification and regression; finally, the accuracy of switch identification in the panoramic video stream is greatly improved.
3. The Focal function is adopted as a loss function of the switch detection model. Because the Focal function adds a weighting coefficient in front of the ordinary cross-entropy function, the negative influence of extreme class imbalance is weakened, further improving the precision of switch recognition in the panoramic video stream.
Drawings
The invention will be further described with reference to the following examples with reference to the accompanying drawings.
Fig. 1 is a flowchart of a device control method based on video streaming according to the present invention.
Fig. 2 is a hardware architecture diagram of a video stream-based device control method according to the present invention.
Fig. 3 is an architecture diagram of the switch detection model of the present invention.
Fig. 4 is a schematic diagram of lidar coordinate calculation of the present invention.
Detailed Description
The technical scheme in the embodiment of the application has the following general idea: switch of each equipment in the panorama video stream through switch detection model discernment exhibition room to after marking the switch and writing down the coordinate scope, with the panorama video stream projection on the display screen, after the computer discerned user finger touch display screen's distance, angle and gesture through laser radar, turn into the coordinate that the user pointed with distance and angle, through coordinate scope, the switch and the corresponding order of coordinate and gesture recognition user desire to operate, and then the equipment that direct linkage operation corresponds, with flexibility and the convenience of promotion equipment control.
Referring to fig. 1 to 4, the present invention needs to use a hardware architecture including a computer, a laser radar, at least one camera, and a display screen; the laser radar, the camera and the display screen are respectively connected with a computer; the laser radar and the display screen are positioned on the same horizontal plane; and the laser radar is connected with the computer through a USB interface.
The invention discloses a preferable embodiment of a device control method based on video stream, which comprises the following steps:
step S10, a switch detection model is built on a computer, and a large number of switch images are obtained to train the switch detection model;
step S20, the computer acquires a panoramic video stream in the exhibition hall through the camera, the switch of each device in the panoramic video stream is identified by using the switch detection model, each identified switch is marked on the panoramic video stream, and the coordinate range of each switch on the panoramic video stream is recorded;
step S30, the computer projects the panoramic video stream marked with the switch on a display screen;
step S40, the computer recognizes the distance, the angle and the gesture of the user finger touching the display screen through the laser radar, and converts the distance and the angle into the coordinate of the user finger touching through the radar coordinate conversion model;
and step S50, the computer judges the switch to be operated by the user by comparing the coordinate range and the coordinate, and controls the equipment corresponding to the switch by combining the gesture.
The step S10 specifically includes:
step S11, creating a switch detection model on the computer based on ResNet50, FPN, SPP and SubNet;
resnet 50: when the features are extracted, the shortcoming of the linear CNN can be overcome by using a depth residual error network Resnet50, quick connection is added in a convolution feedforward network, and the mapping result of the network is directly added into the output of an overlying layer. The data set of 7 x 64 is first input to the convolutional layers, then 3 x 3 maximum pooling layers, with four large groups of b l ocks, each 3,4,6,3 small b l ocks, with three convolutions inside each small b l ock, and this network has a single convolutional layer at the very beginning, thus (3+4+6+3) + 3+ 1-49, and a full connection layer at the very end, thus 50 layers in total. After convolution, converting the input value distribution of any neuron of each layer of neural network into standard normal distribution with the mean value of 0 and the variance of 1 through batch normalization BN; and activating the function through the ReLU to enhance the characteristic learning capability of the network.
FPN: and merging the feature graphs which are connected with different layers from bottom to top, from top to bottom and transversely by using the feature pyramid network, and then performing convolution on each merged feature layer by using a 3-by-3 convolution check to obtain feature layers P2, P3, P4 and P5. In the FPN construction process, any single-scale image is input, and then a proper multi-scale characteristic diagram is obtained in a full convolution mode to serve as output, so that each layer of the pyramid can be used for target detection of different sizes.
SPP: in the pyramid pooling process, when a picture is input, three scales (4 x 4, 2 x 2 and 1 x 1) with different sizes are divided into 21 blocks, the maximum value of each block is calculated respectively to obtain an output neuron, and finally any map is converted into 21-dimensional features with fixed sizes. The output of other dimensions can also be designed by increasing the pyramid layer number and changing the size of the division grid.
SubNet: after the feature extraction network ResNet50 and the feature fusion networks FPN, SPP, the network outputs are sent to the classification sub-network and the regression sub-network, respectively, to obtain the frame position and category information. The regression and classification sub-networks are composed of 4 convolution layers, the length and the width of the feature graph are unchanged after the feature graph passes through the network, the channel dimension of the regression network is changed into 4 anchors, and the classification channel dimension is changed into the category number.
Step S12, acquiring video streams containing all devices in the exhibition hall, and extracting switch images from each frame of data of the video streams; a video stream with a duration of 1 s comprises 25 to 30 frames of switch images;
step S13, performing data enhancement processing on each switch image to increase the sample size;
step S14, labeling each switch image after image enhancement processing, generating the annotations in XML format, and adjusting the image size and file name to obtain a data set;
and step S15, training the switch detection model with the data set and evaluating the output with the mAP (mean average precision) metric.
The step S13 specifically includes:
and performing data enhancement processing of rotation, translation, scaling or edge filling on each switch image to increase the sample size.
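A minimal sketch of such augmentation using only NumPy (rotation is shown as a 90-degree turn for brevity; arbitrary-angle rotation would need an interpolation library such as OpenCV or Pillow, and the function name is hypothetical):

```python
import numpy as np

def augment(image, shift=5, pad=4):
    """Generate extra training samples from one switch image using the
    augmentations named in step S13: rotation, translation, scaling and
    edge filling. Returns the original plus four transformed copies."""
    samples = [image]
    samples.append(np.rot90(image))                            # rotation
    samples.append(np.roll(image, shift, axis=1))              # translation
    samples.append(np.pad(image, pad, mode='edge'))            # edge filling
    samples.append(image.repeat(2, axis=0).repeat(2, axis=1))  # 2x scaling
    return samples

augmented = augment(np.zeros((32, 32)))
print(len(augmented))  # 5
```

In practice each transform would be applied with randomized parameters per epoch rather than once, so the effective sample size grows well beyond a factor of five.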
The step S14 specifically includes:
manually labeling each switch image after image enhancement processing by using labelImg, uniformly reducing the image size of each switch image to a preset resolution, and modifying the file name of each switch image based on the labeled content to obtain a data set.
In step S10, the loss function of the switch detection model is a smooth-L1 function and a Focal function.
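For reference, the two loss terms can be written out as follows (a NumPy sketch; the α and γ values are the common RetinaNet defaults, which the patent does not specify):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary classification: cross-entropy scaled by
    (1 - p_t)**gamma so easy, abundant background examples contribute
    little, weakening the class-imbalance effect described in the text."""
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

def smooth_l1(x, beta=1.0):
    """smooth-L1 used for box regression: quadratic near zero, linear
    beyond beta, so large errors do not produce exploding gradients."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

print(float(smooth_l1(np.array([2.0]))[0]))  # 1.5
```

A confidently correct positive (p = 0.9) incurs orders of magnitude less focal loss than an uncertain one (p = 0.5), which is the down-weighting of easy examples the patent relies on.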
In step S10, before the switch detection model is trained, model parameters at least including the training batch, the number of iterations and the learning rate need to be set; the training batch size is preferably 2, the number of iterations preferably 100 with the Adam optimizer, and the learning rate preferably 1e-5. The model parameters are adjusted continuously during training of the switch detection model to improve training performance.
The step S40 specifically includes:
the computer measures, through the laser radar, the distance s from the lidar to the user's finger and the included angle α between the lidar-to-finger line and the horizontal; combining these with the known distance d from the lidar to the display screen, the screen width W and the screen height H, it calculates the coordinate at which the finger touches the screen, and it recognizes the gesture from the number of finger clicks and the direction of finger movement, also measured by the lidar.
As shown in fig. 4, point C is the point where the lidar beam meets the user's finger. The coordinates of the finger on the display screen are computed as follows:
1. Calibration: resolves the mismatch between the finger's touch coordinates and the display-screen coordinates. With the screen resolution W×H and the calibration frame coinciding with the screen, the top-left corner of the frame is set to (0,0), so the bottom-left, top-right and bottom-right corners are (0,H), (W,0) and (W,H) respectively.
2. Construction: draw the horizontal line from the lidar O to the point B where it crosses the rightmost edge of the display screen, and drop a perpendicular from C onto OB, meeting it at point D.
3. Known and assumed quantities: the lidar reports the distance s from the lidar to the finger point C and the angle ∠α; the perpendicular distance OA from the lidar to the screen is a known value d; the distance from point A to the top-left corner of the screen is a known fixed value h.
4. Coordinate calculation: in the right triangle COD, the abscissa of C is the length AD and the ordinate is h − CD. Since OD = s·cos α, AD = OD − OA = s·cos α − d; since CD = s·sin α, the ordinate is h − s·sin α. The coordinates of point C are therefore (s·cos α − d, h − s·sin α).
5. Comparison: let the top-left corner of a switch's coordinate range be (x_min, y_min) and its bottom-right corner be (x_max, y_max); the user's finger lies within that range when x_min ≤ x ≤ x_max and y_min ≤ y ≤ y_max.
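The coordinate conversion and range comparison of steps 4 and 5 follow directly from the geometry; a sketch (function names are illustrative, not from the patent):

```python
import math

def finger_coordinates(s, alpha_deg, d, h):
    """Convert a lidar range reading into screen coordinates per Fig. 4:
    s is the measured distance to the finger, alpha the angle above the
    horizontal, d the lidar-to-screen distance OA, and h the fixed
    offset from point A to the screen's top-left corner."""
    alpha = math.radians(alpha_deg)
    x = s * math.cos(alpha) - d   # AD = OD - OA
    y = h - s * math.sin(alpha)   # h - CD
    return x, y

def hit_switch(x, y, switch_ranges):
    """Return the first switch whose recorded coordinate range contains
    the touch point, or None (the step S50 comparison)."""
    for name, (x_min, y_min, x_max, y_max) in switch_ranges.items():
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return name
    return None

print(finger_coordinates(10, 0, 2, 5))  # (8.0, 5.0)
```

For example, a reading of s = 10 at α = 0 with d = 2 and h = 5 lands the finger 8 units across and level with point A, and `hit_switch` then resolves that point against the ranges recorded in step S20.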
in step S40, the gesture at least includes a single long-press single-finger single-click, a single double-press single-finger double-click, a single triple-click single-finger triple-click, a single-finger pressing and moving up the switch, a single-finger pressing and moving down the switch, a double-finger single-click, and a double-finger double-click.
In a specific implementation, the command represented by each gesture can be configured as required: for example, a single-finger long press turns a device on, a single-finger double-click turns it off, a single-finger triple-click pauses it, a single-finger press on the switch with upward movement raises the temperature/humidity, volume, speed or brightness, the same press with downward movement lowers them, a double-finger single-click plays the previous video or song, and a double-finger double-click plays the next video or song.
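A configuration table in this spirit might look as follows (the gesture names and commands are hypothetical placeholders, since the patent leaves the mapping configurable):

```python
# Hypothetical gesture-to-command table matching the example mapping in
# the text; the actual commands are configurable per deployment.
GESTURE_COMMANDS = {
    "single_finger_long_press": "power_on",
    "single_finger_double_click": "power_off",
    "single_finger_triple_click": "pause",
    "press_and_move_up": "increase_level",    # temperature, volume, brightness...
    "press_and_move_down": "decrease_level",
    "double_finger_single_click": "previous_item",
    "double_finger_double_click": "next_item",
}

def dispatch(switch, gesture):
    """Combine the hit switch (step S50) and the recognised gesture
    into a single device command string."""
    command = GESTURE_COMMANDS.get(gesture)
    if command is None:
        raise ValueError(f"unrecognised gesture: {gesture}")
    return f"{switch}:{command}"

print(dispatch("air_conditioner", "press_and_move_up"))  # air_conditioner:increase_level
```

Keeping the mapping in a table rather than in code is what lets the computer reassign gesture meanings by time period or hall condition, as the advantages section claims.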
In summary, the invention has the advantages that:
1. The switch detection model identifies the switch of each device in the panoramic video stream; after the switches are marked and their coordinate ranges recorded, the panoramic video stream is projected onto the display screen. Once the computer has recognized, through the laser radar, the distance, angle and gesture of the user's finger touching the screen, it converts the distance and angle into the finger's coordinates, determines which coordinate range the point falls within to match the corresponding switch, and then operates the corresponding device in linkage. The whole process requires no additional switch control panel or touch screen; the situation in the exhibition hall can be viewed directly on the screen showing the panoramic video stream; and the computer can promptly adjust the operations associated with each gesture according to the hall's current situation, or assign different operations to the same gesture in different time periods. Flexibility and convenience of device control are thereby greatly improved.
2. A switch detection model is established through ResNet50, FPN, SPP and SubNet, and the negative influence of gradient disappearance is greatly reduced due to ResNet 50; the FPN integrates the characteristics of different dimensions, so that the richness of information is improved; SPP solves the defects caused by different sizes of input images, and performs feature extraction from different angles to increase the identification precision; the SubNet integrates two functions of classification and regression; finally, the accuracy of switch identification in the panoramic video stream is greatly improved.
3. The Focal function is adopted as a loss function of the switch detection model. Because the Focal function adds a weighting coefficient in front of the ordinary cross-entropy function, the negative influence of extreme class imbalance is weakened, further improving the precision of switch recognition in the panoramic video stream.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.
Claims (8)
1. A device control method based on video streaming, characterized by: the method comprises the following steps:
step S10, a switch detection model is built on a computer, and a large number of switch images are obtained to train the switch detection model;
step S20, the computer acquires a panoramic video stream in the exhibition hall through the camera, the switch of each device in the panoramic video stream is identified by using the switch detection model, each identified switch is marked on the panoramic video stream, and the coordinate range of each switch on the panoramic video stream is recorded;
step S30, the computer projects the panoramic video stream marked with the switch on a display screen;
step S40, the computer recognizes the distance, the angle and the gesture of the user finger touching the display screen through the laser radar, and converts the distance and the angle into the coordinate of the user finger touching through the radar coordinate conversion model;
and step S50, the computer judges the switch to be operated by the user by comparing the coordinate range and the coordinate, and controls the equipment corresponding to the switch by combining the gesture.
2. A video stream-based device control method according to claim 1, characterized in that: the step S10 specifically includes:
step S11, creating a switch detection model on the computer based on ResNet50, FPN, SPP and SubNet;
step S12, acquiring video streams containing all devices in the exhibition hall, and extracting switch images from each frame of data of the video streams;
step S13, performing data enhancement processing on each switch image to increase the sample size;
step S14, labeling each switch image after image enhancement processing, and adjusting the size and the file name of the image to obtain a data set;
and step S15, training a switch detection model by using the data set.
3. A video stream-based device control method according to claim 2, characterized in that: the step S13 specifically includes:
and performing data enhancement processing of rotation, translation, scaling or edge filling on each switch image to increase the sample size.
4. A video stream-based device control method according to claim 2, characterized in that: the step S14 specifically includes:
and manually labeling each switch image after image enhancement processing by using labelImg, uniformly reducing the image size of each switch image to a preset resolution, and modifying the file name of each switch image based on the labeled content to obtain a data set.
5. A video stream-based device control method according to claim 1, characterized in that: in step S10, the loss function of the switch detection model is a smooth-L1 function and a Focal function.
6. A video stream-based device control method according to claim 1, characterized in that: in step S10, before the switch detection model is trained, model parameters at least including a training batch, an iteration number, and a learning rate need to be set.
7. A video stream-based device control method according to claim 1, characterized in that: the step S40 specifically includes:
the computer measures, through the laser radar, the distance s from the lidar to the user's finger and the included angle α between the lidar-to-finger line and the horizontal; combining these with the known distance d from the lidar to the display screen, the screen width W and the screen height H, it calculates the coordinate at which the finger touches the screen, and it recognizes the gesture from the number of finger clicks and the direction of finger movement, also measured by the lidar.
8. A video stream-based device control method according to claim 1, characterized in that: in step S40, the gestures at least include a single-finger single click, a single-finger double click, a single-finger triple click, a single-finger press on the switch followed by an upward movement, a single-finger press on the switch followed by a downward movement, a double-finger single click, and a double-finger double click.
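The claim-8 gesture vocabulary can be sketched as a lookup table mapping a recognized (finger count, click pattern, movement) triple to a device action. The action names are illustrative assumptions; the patent lists the gestures but not the commands bound to them.

```python
# Hypothetical mapping from recognized gestures to switch-control actions.
GESTURE_ACTIONS = {
    ("one_finger", "click", None): "toggle_switch",
    ("one_finger", "double_click", None): "open_switch_menu",
    ("one_finger", "triple_click", None): "close_switch_menu",
    ("one_finger", "press", "up"): "increase_level",
    ("one_finger", "press", "down"): "decrease_level",
    ("two_finger", "click", None): "select_next_device",
    ("two_finger", "double_click", None): "select_previous_device",
}

def dispatch(fingers, clicks, move=None):
    """Resolve a recognized gesture to an action; unknown gestures are ignored."""
    return GESTURE_ACTIONS.get((fingers, clicks, move), "ignore")
```

A table like this keeps the recognition layer (lidar click/movement measurement) decoupled from the command layer, so new gestures only require a new entry.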
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110210611.9A CN113033311A (en) | 2021-02-25 | 2021-02-25 | Equipment control method based on video stream |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113033311A true CN113033311A (en) | 2021-06-25 |
Family
ID=76461547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110210611.9A Pending CN113033311A (en) | 2021-02-25 | 2021-02-25 | Equipment control method based on video stream |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033311A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105760106A (en) * | 2016-03-08 | 2016-07-13 | 网易(杭州)网络有限公司 | Interaction method and interaction device of intelligent household equipment |
CN110096133A (en) * | 2018-01-30 | 2019-08-06 | 鸿富锦精密工业(武汉)有限公司 | Infrared gesture identifying device and method |
CN111766943A (en) * | 2018-08-24 | 2020-10-13 | 谷歌有限责任公司 | Smart phone, system and method including radar system |
2021-02-25: patent application CN202110210611.9A filed in China (CN); legal status: active, Pending
Non-Patent Citations (1)
Title |
---|
DONG HONGYI: "Deep Learning PyTorch Object Detection in Practice" (《深度学习之PyTorch物体检测实战》), 31 March 2020, Beijing: China Machine Press (机械工业出版社), pages 222-225 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3419050B2 (en) | Input device | |
KR100886056B1 (en) | Method and apparatus for light input device | |
US20030004678A1 (en) | System and method for providing a mobile input device | |
JP2001125738A (en) | Presentation control system and method | |
US20140247216A1 (en) | Trigger and control method and system of human-computer interaction operation command and laser emission device | |
JP2004064784A (en) | Method for providing multi-resolution video to plural users, computer program product, and apparatus | |
WO2005114466A2 (en) | Animation review methods and apparatus | |
WO2021135945A1 (en) | Image processing method and apparatus, storage medium, and electronic device | |
CN114025219B (en) | Rendering method, device, medium and equipment for augmented reality special effects | |
JP2000352761A (en) | Video projection device and method therefor, and video projection controller | |
US20140333585A1 (en) | Electronic apparatus, information processing method, and storage medium | |
CN102402382A (en) | Information processing device and information processing method | |
CN104041063B (en) | The related information storehouse of video makes and method, platform and the system of video playback | |
CN102662498A (en) | Wireless control method and system for projection demonstration | |
US10401947B2 (en) | Method for simulating and controlling virtual sphere in a mobile device | |
US20200304713A1 (en) | Intelligent Video Presentation System | |
CN116114250A (en) | Display device, human body posture detection method and application | |
WO2019128086A1 (en) | Stage interactive projection method, apparatus and system | |
CN103543825A (en) | Camera cursor system | |
JP4110323B2 (en) | Information output method and apparatus, program, and computer-readable storage medium storing information output program | |
TW202131352A (en) | System of generating training data by questions and answers and method thereof | |
WO2011096571A1 (en) | Input device | |
KR20080041049A (en) | Apparatus and method for generating user-interface based on hand shape recognition in a exhibition system | |
CN113033311A (en) | Equipment control method based on video stream | |
Friedland et al. | Web based lectures produced by AI supported classroom teaching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: B1-1, 2nd floor, west of Zhiheng Science and Technology Park, No. 66, Bianzhou, Gaoqi Village, Nanyu Town, Minhou County, Fuzhou City, Fujian Province, 350100
Applicant after: Fujian zhihengqi Health Technology Co.,Ltd.
Address before: B1-1, 2nd floor, west of Zhiheng Science and Technology Park, No. 66, Bianzhou, Gaoqi Village, Nanyu Town, Minhou County, Fuzhou City, Fujian Province, 350100
Applicant before: Fujian hydrogen Qi Health Technology Co.,Ltd.