CN114040094B - Preset position adjusting method and device based on cradle head camera - Google Patents

Preset position adjusting method and device based on cradle head camera

Info

Publication number
CN114040094B
CN114040094B (application number CN202111240092.7A)
Authority
CN
China
Prior art keywords
target
detected
picture
traffic
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111240092.7A
Other languages
Chinese (zh)
Other versions
CN114040094A (en)
Inventor
王雯雯 (Wang Wenwen)
冯远宏 (Feng Yuanhong)
王江涛 (Wang Jiangtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense TransTech Co Ltd
Original Assignee
Hisense TransTech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense TransTech Co Ltd filed Critical Hisense TransTech Co Ltd
Priority: CN202111240092.7A
Publication of application: CN114040094A
Application granted
Publication of grant: CN114040094B
Legal status: Active (granted)

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 - Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 - Control of cameras or camera modules
    • H04N23/61 - Control of cameras or camera modules based on recognised objects
    • H04N23/695 - Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses a preset position adjusting method and device based on a pan-tilt camera. In the adjusting device, a processor is configured to identify the types of the objects to be detected in a target picture to be detected. If the target picture to be detected includes a non-traffic-type object, the scene of the target picture to be detected is determined to be a non-traffic scene; or, if the target picture to be detected includes traffic-type objects and no non-traffic-type object, the scene is determined to be a non-traffic scene when the first traffic flow direction obtained from the traffic-type objects is inconsistent with the second traffic flow direction determined from a reference picture. If the scene type of the target picture to be detected is determined to be a non-traffic scene, an adjusting instruction containing the physical position information of the target preset position is sent to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information. A pan-tilt camera that has drifted away from the target preset position is thus automatically and promptly adjusted back to it.

Description

Preset position adjusting method and device based on cradle head camera
Technical Field
The invention relates to the technical field of video monitoring, and in particular to a preset position adjusting method and device based on a pan-tilt camera.
Background
With the large-scale construction of urban video monitoring systems, video-based event detection and analysis is becoming increasingly widespread: front-end video monitoring devices are connected to a video analysis and detection facility to acquire real-time video streams, so that abnormal events in the video pictures can be detected and analysed.
In related applications, the video detection area is usually fixed in advance. Common electric-police and checkpoint (bayonet) cameras among front-end video monitoring devices support this basic mode of operation, but a video monitoring camera with a preset-position function is generally equipped with a pan-tilt unit and a zoom lens, so its video picture is not fixed. Once the pan-tilt unit drifts (for unknown reasons or through manual movement) and the video picture shifts, the video detection area changes, the original visual-perception analysis algorithm can no longer detect and analyse normally, and the preset position must be re-entered and adjusted manually to recalibrate the video detection area.
Disclosure of Invention
The preset position adjusting method and device based on a pan-tilt camera provided by the present application detect the offset of the pan-tilt camera in time, so that a camera that has drifted from the target preset position is automatically adjusted back to it and video detection is performed accurately on the video picture at the target preset position.
According to a first aspect of an exemplary embodiment, there is provided a preset position adjustment method based on a pan-tilt camera, the method including:
identifying the types of the objects to be detected in a target picture to be detected, wherein the target picture to be detected is obtained by processing a picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;
if the target picture to be detected includes a non-traffic-type object, determining that the scene of the target picture to be detected is a non-traffic scene; or
if the target picture to be detected includes traffic-type objects and no non-traffic-type object, determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic-type objects is inconsistent with the second traffic flow direction determined based on the reference picture;
if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
In the embodiment of the application, the reference picture is obtained by decoding the target video stream shot by the pan-tilt camera at the target preset position, and the scene corresponding to the reference picture is a traffic scene, so the reference picture can serve as the reference for comparison. When the picture to be detected is processed, it is processed according to the position of the target detection area in the reference picture relative to the reference picture, which guarantees, already during preprocessing, that the target picture to be detected and the reference picture agree on the detection area. The scene of the target picture to be detected is then determined by identifying the traffic-type and non-traffic-type objects it contains. Because the reference picture at the target preset position corresponds to a traffic scene, a non-traffic scene type for the target picture to be detected indicates that the preset position of the pan-tilt camera has shifted away from the target preset position; an adjusting instruction containing the physical position information of the target preset position is then sent to the pan-tilt camera, so that the camera adjusts its position back to the target preset position according to that information. The offset of the pan-tilt camera is thus detected in time, a camera that has drifted from the target preset position is automatically adjusted back to it, and, since the reference picture satisfying the video analysis and detection requirements was obtained at the target preset position, the video picture shot by the adjusted camera is guaranteed to satisfy those requirements as well.
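The three determination rules above can be sketched as a small decision function. This is only an illustrative reading of the method: the object label sets are assumptions (the patent does not enumerate object classes), and the flow-direction comparison is abstracted into a boolean flag.

```python
# Illustrative sketch of the scene-type decision described above.
# The label sets are assumptions, not taken from the patent text.
TRAFFIC_TYPES = {"car", "bus", "truck"}
NON_TRAFFIC_TYPES = {"tree", "building", "green_belt"}

def classify_scene(labels, flow_directions_consistent):
    """labels: set of object classes recognised in the target picture;
    flow_directions_consistent: result of comparing the first and second
    traffic flow directions against the preset angle threshold."""
    if labels & NON_TRAFFIC_TYPES:
        return "non-traffic"      # rule 1: any non-traffic object suffices
    if labels & TRAFFIC_TYPES:
        # rule 2: traffic objects only, so the flow-direction check decides
        return "traffic" if flow_directions_consistent else "non-traffic"
    return "non-traffic"          # nothing recognised: treat as an offset
```

A result of "non-traffic" would then trigger the adjusting instruction to the pan-tilt camera.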
In some exemplary embodiments, the method further comprises:
if the target picture to be detected includes traffic-type objects and no non-traffic-type object, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic-type objects is consistent with the traffic flow direction determined based on the reference picture;
and sending the target picture to be detected to a video analysis terminal, so that the video analysis terminal analyses it with a preset video analysis algorithm.
In the above embodiment, the target picture to be detected includes traffic-type objects and no non-traffic-type object, which still leaves the possibility that the pan-tilt camera has shifted to another preset position that also happens to capture traffic-type objects. It is therefore further verified that the traffic flow direction obtained from the traffic-type objects is consistent with the traffic flow direction determined based on the reference picture; only then can it be concluded with confidence that the preset position of the pan-tilt camera is still the target preset position and that no offset has occurred. In that case the target picture (or video) to be detected is suitable for video analysis and is sent to the video analysis terminal.
In some exemplary embodiments, the target to-be-detected picture is obtained by:
cropping the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture, so as to obtain the target picture to be detected.
In this embodiment, the picture to be detected is cropped according to the position of the target detection area in the reference picture relative to the reference picture, which guarantees, already during preprocessing, that the target picture to be detected and the reference picture agree on the detection area.
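As a concrete sketch of this preprocessing step, the target detection area can be expressed as a rectangle in reference-picture coordinates and the same relative rectangle cropped out of the picture to be detected. The coordinate convention and the function name are assumptions, not taken from the text.

```python
def crop_to_detection_area(frame, ref_size, area_in_ref):
    """Crop the region of `frame` corresponding to the target detection area.

    frame: image as a list of rows; ref_size: (ref_w, ref_h) of the
    reference picture; area_in_ref: (x, y, w, h) of the target detection
    area inside the reference picture."""
    ref_w, ref_h = ref_size
    x, y, w, h = area_in_ref
    frame_h, frame_w = len(frame), len(frame[0])
    # scale the rectangle from reference-picture coordinates to frame coordinates
    sx, sy = frame_w / ref_w, frame_h / ref_h
    x0, y0 = round(x * sx), round(y * sy)
    x1, y1 = round((x + w) * sx), round((y + h) * sy)
    return [row[x0:x1] for row in frame[y0:y1]]
```

Scaling by the frame-to-reference size ratio keeps the crop correct even when the stream to be detected is decoded at a different resolution than the reference picture.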
In some exemplary embodiments, the first traffic flow direction is determined by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the direction of the first traffic flow according to the first pixel coordinates of the objects of each traffic type;
the second traffic flow direction is determined by:
identifying second pixel coordinates of all traffic-type objects in the reference picture;
and determining the direction of the second traffic flow according to the second pixel coordinates of each traffic type object.
According to the embodiment, the direction of the traffic flow is determined by identifying the pixel coordinates of the traffic type object in the target to-be-detected picture, so that the accuracy of the determined traffic flow direction is ensured.
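One plausible way to turn the identified pixel coordinates into a flow direction is to take the principal axis of the object centres. The patent does not prescribe a formula, so this covariance-based fit is only an assumed realisation.

```python
import math

def flow_direction_deg(points):
    """points: list of (x, y) pixel centres of traffic-type objects;
    returns the principal-axis angle in degrees in [0, 180),
    treating the lane as an undirected line."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # angle of the leading eigenvector of the 2x2 covariance matrix
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return angle % 180.0
```

The same function would be applied once to the first pixel coordinates (target picture) and once to the second pixel coordinates (reference picture) to obtain the two angles that are then compared.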
In some exemplary embodiments, the first traffic flow direction and the second traffic flow direction are determined to be inconsistent by:
determining a first angle of the first traffic flow direction in the target picture to be detected and a second angle of the second traffic flow direction in the reference picture;
and if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction and the second traffic flow direction are inconsistent.
In this embodiment, whether the two traffic flow directions are consistent is decided from the angular difference between their angles in the corresponding pictures, which further ensures accurate scene-type determination when the target picture to be detected includes traffic-type objects and no non-traffic-type object.
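A minimal sketch of this comparison, assuming angles measured in degrees; the 20-degree default is an arbitrary placeholder for the "preset angle threshold", which the text leaves unspecified.

```python
def directions_consistent(angle1_deg, angle2_deg, threshold_deg=20.0):
    """True if the angle difference between the two flow directions
    does not exceed the preset angle threshold."""
    diff = abs(angle1_deg - angle2_deg) % 360.0
    diff = min(diff, 360.0 - diff)   # shortest angular distance, handles wrap-around
    return diff <= threshold_deg
```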
In some exemplary embodiments, the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNetV2. MobileNetV2 is built from depthwise separable convolutions, each consisting of a depthwise convolution followed by a 1×1 pointwise convolution; the convolution kernel size is 3, the basic building block is a residual bottleneck with depthwise separable convolutions, and the architecture comprises an initial full convolution with 32 kernels followed by 19 residual bottleneck layers. ReLU6 is used as the nonlinear activation function.
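The motivation for the depthwise-separable design can be made concrete by counting multiplications: a k×k standard convolution is replaced by a k×k depthwise convolution plus a 1×1 pointwise convolution, giving a cost ratio of roughly 1/c_out + 1/k², i.e. about 8 to 9 times fewer multiplications for k = 3 on wide layers. The function names below are illustrative.

```python
def conv_mults(h, w, k, c_in, c_out):
    """Multiplications of a standard k x k convolution (stride 1, same padding)."""
    return h * w * k * k * c_in * c_out

def dw_separable_mults(h, w, k, c_in, c_out):
    """Multiplications of the depthwise-separable replacement."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise
```

For a 32×32 feature map with 64 input and 128 output channels and k = 3, the separable form needs roughly 8.4 times fewer multiplications, which is why such a backbone suits a scene classifier that must run continuously on monitoring hardware.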
According to a second aspect in an exemplary embodiment, there is provided a pan-tilt camera based preset position adjustment device, the device comprising a processor and a video processing unit, wherein:
the video processing unit is configured to:
acquiring a video stream to be detected, which is shot by the pan-tilt camera at a position to be detected, and decoding the video stream to be detected to obtain a picture to be detected;
the processor is configured to:
identifying the types of the objects to be detected in a target picture to be detected, wherein the target picture to be detected is obtained by processing the picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;
if the target picture to be detected includes a non-traffic-type object, determining that the scene of the target picture to be detected is a non-traffic scene; or
if the target picture to be detected includes traffic-type objects and no non-traffic-type object, determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic-type objects is inconsistent with the second traffic flow direction determined based on the reference picture;
if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
In some exemplary embodiments, the processor is further configured to:
if the target picture to be detected includes traffic-type objects and no non-traffic-type object, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic-type objects is consistent with the traffic flow direction determined based on the reference picture;
and sending the target picture to be detected to a video analysis terminal, so that the video analysis terminal analyses it with a preset video analysis algorithm.
In some exemplary embodiments, the processor is further configured to obtain the target to-be-detected picture by:
cutting the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture to obtain the target picture to be detected.
In some exemplary embodiments, the processor is further configured to determine the first traffic flow direction by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the direction of the first traffic flow according to the first pixel coordinates of the objects of each traffic type;
the processor is further configured to determine a second traffic flow direction by:
identifying second pixel coordinates of all traffic-type objects in the reference picture;
and determining the direction of the second traffic flow according to the second pixel coordinates of each traffic type object.
In some exemplary embodiments, the processor is further configured to determine that the first traffic flow direction and the second traffic flow direction are inconsistent by:
determining a first angle of the first traffic flow direction in the target picture to be detected and a second angle of the second traffic flow direction in the reference picture;
and if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction and the second traffic flow direction are inconsistent.
In some exemplary embodiments, the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNetV2. MobileNetV2 is built from depthwise separable convolutions, each consisting of a depthwise convolution followed by a 1×1 pointwise convolution; the convolution kernel size is 3, the basic building block is a residual bottleneck with depthwise separable convolutions, and the architecture comprises an initial full convolution with 32 kernels followed by 19 residual bottleneck layers. ReLU6 is used as the nonlinear activation function.
According to a third aspect in an exemplary embodiment, there is provided a preset position adjusting device based on a pan-tilt camera, the device comprising:
the object identification module is used for identifying the type of the object to be detected in the target picture to be detected; the target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
the first scene determining module is used for determining that the scene of the target picture to be detected is a non-traffic scene when the target picture to be detected comprises a non-traffic type object;
the second scene determining module is used for determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, wherein the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object;
and the preset position adjusting module is used for sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera when the scene type of the target picture to be detected is determined to be a non-traffic scene, so that the pan-tilt camera can adjust its position to the target preset position according to the physical position information.
In some exemplary embodiments, the apparatus further comprises a third scenario determination module and a transmission module:
the third scene determination module is configured to: when the target picture to be detected includes traffic-type objects and no non-traffic-type object, and the traffic flow direction obtained based on the traffic-type objects is consistent with the traffic flow direction determined based on the reference picture, determine that the scene of the target picture to be detected is a traffic scene;
the sending module is specifically configured to: and sending the target picture to be detected to a video analysis terminal so that the video analysis terminal applies a preset video analysis algorithm to analyze the target picture to be detected.
In some exemplary embodiments, the device further includes an image processing module configured to obtain the target picture to be detected by:
cropping the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture, so as to obtain the target picture to be detected.
In some exemplary embodiments, the device further comprises a traffic flow direction determination module for determining the first traffic flow direction by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the direction of the first traffic flow according to the first pixel coordinates of the objects of each traffic type;
the traffic flow direction determination module is further configured to determine a second traffic flow direction by:
identifying second pixel coordinates of all traffic-type objects in the reference picture;
and determining the direction of the second traffic flow according to the second pixel coordinates of each traffic type object.
In some exemplary embodiments, the device further comprises a traffic flow direction comparison module for determining that the first traffic flow direction and the second traffic flow direction are inconsistent by:
determining a first angle of the first traffic flow direction in the target picture to be detected and a second angle of the second traffic flow direction in the reference picture;
and, if the angle difference between the first angle and the second angle is larger than a preset angle threshold, determining that the first traffic flow direction and the second traffic flow direction are inconsistent.
In some exemplary embodiments, the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNetV2. MobileNetV2 is built from depthwise separable convolutions, each consisting of a depthwise convolution followed by a 1×1 pointwise convolution; the convolution kernel size is 3, the basic building block is a residual bottleneck with depthwise separable convolutions, and the architecture comprises an initial full convolution with 32 kernels followed by 19 residual bottleneck layers. ReLU6 is used as the nonlinear activation function.
According to a fourth aspect of an exemplary embodiment, a computer storage medium is provided, in which computer program instructions are stored which, when run on a computer, cause the computer to perform the pan-tilt camera based preset position adjustment method according to the first aspect.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be derived from them by a person skilled in the art without inventive effort.
Fig. 1 illustrates an application scenario diagram of a preset position adjusting method based on a pan-tilt camera according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a preset position adjustment method based on a pan-tilt camera according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a preset position adjustment method based on a pan-tilt camera according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a video picture shot by a calibrated pan-tilt camera according to an embodiment of the present invention;
fig. 5 schematically illustrates a structural diagram of a preset position adjusting device based on a pan-tilt camera according to an embodiment of the present invention;
fig. 6 illustrates a schematic structural diagram of a preset position adjusting device based on a pan-tilt camera according to an embodiment of the present invention.
Detailed Description
The following describes in detail the technical solutions in the embodiments of the present application with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association between the associated objects and indicates that three relations may exist; for example, "A and/or B" may represent: A alone, both A and B, or B alone. Furthermore, in the description of the embodiments of the present application, "plural" means two or more.
The terms "first", "second" and the like are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features concerned. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features.
With the large-scale construction of urban video monitoring systems, video-based event detection and analysis is becoming increasingly widespread: front-end video monitoring devices are connected to a video analysis and detection facility to acquire real-time video streams, so that abnormal events in the video pictures can be detected and analysed.
In related applications, the video detection area is usually fixed in advance. Common electric-police and checkpoint (bayonet) cameras support this basic mode of operation, but a video monitoring camera with a preset-position function is generally equipped with a pan-tilt unit and a zoom lens, so its video picture is not fixed. Once the pan-tilt dome camera drifts or is manually adjusted and the video picture shifts, the video detection area changes and the original visual-perception analysis algorithm can no longer detect and analyse normally; the preset position must then be re-entered and adjusted manually to recalibrate the video detection area, which is time-consuming, labour-intensive, and slow.
Therefore, the application provides a preset position adjusting method based on a pan-tilt camera, which includes: identifying the types of the objects to be detected in a target picture to be detected, wherein the target picture to be detected is obtained by processing a picture to be detected according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene; if the target picture to be detected includes a non-traffic-type object, determining that its scene is a non-traffic scene; or, if it includes traffic-type objects and no non-traffic-type object, determining that its scene is a non-traffic scene when the first traffic flow direction obtained based on the traffic-type objects is inconsistent with the second traffic flow direction determined based on the reference picture; and, if the scene type of the target picture to be detected is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information. The offset of the pan-tilt camera is thus detected in time, a camera that has drifted from the target preset position is automatically adjusted back to it, and, since the reference picture satisfying the video analysis and detection requirements was obtained at the target preset position, the video picture shot by the adjusted camera is guaranteed to satisfy those requirements as well.
After the design idea of the embodiment of the present application is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present application, and it should be noted that the application scenarios described below are only used for illustrating the embodiment of the present application and are not limiting. In the specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Referring to fig. 1, an application scenario of the pan-tilt camera based preset position adjustment method is shown. As can be seen from fig. 1, the video detection area in which video analysis and detection is originally performed is a video picture of an intersection; after the pan-tilt camera drifts or is manually adjusted, however, the video picture changes and mostly areas such as green belts are captured. Video detection and analysis can then no longer be performed with the video detection algorithm configured for the original video detection area.
To further explain the technical solution provided by the embodiments of the present application, details are given below with reference to the accompanying drawings and the detailed description. Although the embodiments of the present application provide the operational steps of the method shown in the following embodiments or figures, the method may include more or fewer operational steps based on routine or non-inventive labour. For steps between which there is logically no necessary causal relationship, the execution order is not limited to that provided by the embodiments of the present application.
The technical scheme provided by the embodiment of the application is described below with reference to an application scenario shown in fig. 1 and a preset position adjusting method based on a pan-tilt camera shown in fig. 2.
S201, identifying the type of the object to be detected in the target image to be detected.
The target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in the reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
S202, if the target to-be-detected picture comprises a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene.
S203, if the target to-be-detected picture comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture.
S204, if the scene type of the target to-be-detected picture is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
In the embodiment of the application, the reference picture is obtained by decoding the target video stream shot by the pan-tilt camera at the target preset position, and the scene corresponding to the reference picture is a traffic scene, so the reference picture can be used as a reference for comparison. When the to-be-detected picture is processed, it is processed according to the position of the target detection area in the reference picture relative to the reference picture, so that the consistency of the target to-be-detected picture and the reference picture on the detection area is ensured during picture preprocessing; and the scene of the target to-be-detected picture is determined by identifying the traffic type objects and non-traffic type objects contained in it. Because the reference picture at the target preset position corresponds to a traffic scene, when the scene type of the target to-be-detected picture is determined to be a non-traffic scene, the preset position of the pan-tilt camera has shifted from the target preset position; at this time, an adjusting instruction comprising the physical position information of the target preset position is sent to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information. In this way, the offset of the pan-tilt camera is discovered in time, the pan-tilt camera that has drifted from the target preset position is automatically adjusted back to the target preset position, and, since the reference picture meeting the video analysis and detection requirements is obtained at the target preset position, the video picture shot by the adjusted pan-tilt camera is ensured to meet the video analysis and detection requirements.
Referring to S201, the target preset position is preset. For example, a pan-tilt camera has 100 preset positions when it leaves the factory; for the current intersection, the video picture obtained by shooting with the relevant parameter settings of the 58th preset position can be used for video analysis and detection, and the relevant parameters of the corresponding video detection algorithm are configured. Thus, the target preset position is the 58th preset position, and the physical position information of the target preset position can be obtained from the factory parameters of the pan-tilt camera.
Secondly, with the pan-tilt camera set at the target preset position, a video picture shot by the pan-tilt camera over a period of time, namely the target video stream, is acquired, and the target video stream is decoded to obtain a plurality of pictures. Since these pictures are all taken at the target preset position, the obtained pictures are annotated with areas: the areas where traffic flows, people flows and the like at the intersection are located are labeled as the target detection area, and the target detection area is taken as the video detection area. The labeled picture is called the reference picture, and because the reference picture is labeled following this principle, the scene corresponding to the reference picture is a traffic scene. Different from traffic scenes, there are also non-traffic scenes; one example of a non-traffic scene is a picture that contains too many green belts, buildings, or the like.
And the target picture to be detected can be obtained by processing the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture. For example, if the position of the target detection area in the reference picture relative to the reference picture is at the lower right corner of the reference picture, then the lower right corner of the picture to be detected is cut, and the target picture to be detected is obtained. The lower right hand corner is merely an example, and in actual practice, the position of the target detection area relative to the reference picture may be determined using accurate pixel coordinates.
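This cropping step can be sketched minimally as follows, assuming the target detection area is given as pixel coordinates (x1, y1, x2, y2) in the reference picture; the function name and array shapes are illustrative assumptions, not part of this application:

```python
import numpy as np

def crop_to_detection_area(picture, area_xyxy):
    """Crop the to-be-detected picture to the region occupied by the
    target detection area in the reference picture.

    picture   : H x W x C pixel array of the to-be-detected picture
    area_xyxy : (x1, y1, x2, y2) pixel coordinates of the target
                detection area relative to the reference picture
    """
    x1, y1, x2, y2 = area_xyxy
    return picture[y1:y2, x1:x2]
```

For a 100×200 picture, cropping with the lower-right area (150, 60, 200, 100) yields a 40×50 target to-be-detected picture, mirroring the "lower right corner" example above.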
Specifically, after the target to-be-detected picture is obtained, the type of the to-be-detected object in the target to-be-detected picture is identified, wherein the type of the to-be-detected object can comprise an object of a traffic type and an object of a non-traffic type. By way of example, traffic-type objects are, for example, vehicles, traffic lights, etc., and non-traffic-type objects are, for example, trees, greenbelts, buildings, etc.
Referring to S202, the scene of the target to-be-detected picture is determined to be a non-traffic scene by analyzing the type of the object included in the target to-be-detected picture.
In the first case, if the target to-be-detected picture includes an object of a non-traffic type, determining that the scene of the target to-be-detected picture is a non-traffic scene.
In this case, for example, if at least one of a tree, a green belt, or a building is included in the target to-be-detected picture, it may be determined that the scene of the target to-be-detected picture is a non-traffic scene. In this case, whether or not there is an object of a traffic type in the target to-be-detected picture, as long as it includes an object of a non-traffic type, it may be directly determined as a non-traffic scene.
In the second case, if the target to-be-detected picture includes the traffic type object and does not include the non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture.
In this case, for example, the target to-be-detected picture includes a car but does not include a tree; the scene of the target to-be-detected picture cannot be directly determined as a traffic scene, and further judgment is required. If the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, the scene of the target to-be-detected picture is determined to be a non-traffic scene.
For example, after the pan-tilt camera is shifted or manually adjusted, it may happen to still face an intersection, such as a crossroads: at the target preset position it monitored vehicles coming and going in the north-south direction, while at the shifted or adjusted preset position it monitors vehicles coming and going in the east-west direction. In this case there is no non-traffic type object in the target to-be-detected picture, but the pan-tilt camera has in fact already shifted. At this time, for example, the first traffic flow direction determined based on the vehicles in the target to-be-detected picture is north, the second traffic flow direction determined based on the vehicles in the reference picture is east, the two are determined to be inconsistent, and the scene of the target to-be-detected picture is determined to be a non-traffic scene.
In one specific example, the first traffic flow direction is determined by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture; the direction of the first traffic flow is determined from the first pixel coordinates of the objects of the respective traffic types.
Specifically, the traffic type object is a vehicle, taking 15 vehicles in the target to-be-detected picture as an example, determining the first pixel coordinates of the 15 vehicles corresponding to the target to-be-detected picture, and then determining the direction of the first traffic flow according to the change trend of the abscissa and the change trend of the ordinate in the 15 groups of pixel coordinates.
The second traffic flow direction is determined by:
identifying second pixel coordinates of all traffic type objects in the reference picture; the direction of the second traffic flow is determined from the second pixel coordinates of the objects of the respective traffic types.
Specifically, the traffic type object is a vehicle, taking 20 vehicles in the reference picture as an example, determining that the 20 vehicles correspond to the second pixel coordinates in the reference picture, and then determining the direction of the second traffic flow according to the change trend of the abscissa and the change trend of the ordinate in the 20 groups of pixel coordinates.
In the practical application process, the pixel coordinates of a vehicle can be refined into the pixel coordinates of its head position and the pixel coordinates of its tail position, so that the first traffic flow direction and the second traffic flow direction can be determined accurately.
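A sketch of estimating a traffic flow direction from per-vehicle tail/head pixel coordinates, as described above; the function name, the coordinate convention (image x to the right, y downward), and the summing of displacement vectors are illustrative assumptions:

```python
import math

def traffic_flow_angle(vehicle_coords):
    """Estimate a traffic flow direction, in degrees, from per-vehicle
    (tail, head) pixel coordinate pairs: sum the tail-to-head displacement
    vectors of all vehicles, then take the angle of the resulting vector
    relative to the horizontal axis of the picture."""
    dx = sum(head[0] - tail[0] for tail, head in vehicle_coords)
    dy = sum(head[1] - tail[1] for tail, head in vehicle_coords)
    return math.degrees(math.atan2(dy, dx))
```

With two vehicles both pointing along the horizontal long side the angle is 0°, while a vehicle pointing straight up in image coordinates gives -90°.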
After determining the first traffic flow direction and the second traffic flow direction, it may be determined whether the first traffic flow direction and the second traffic flow direction are consistent by:
Determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture; if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.
Wherein, if the target to-be-detected picture and the reference picture are rectangles, a first angle of the first traffic flow direction in the target to-be-detected picture is, for example, an angle relative to a long side of the target to-be-detected picture, and a second angle of the second traffic flow direction in the reference picture is, for example, an angle relative to a long side of the reference picture. And when the angle difference between the first angle and the second angle is larger than a preset angle threshold, determining that the first traffic flow direction and the second traffic flow direction are inconsistent, and indicating that the pan-tilt camera is shifted. In a specific example, the preset angle threshold is, for example, 5 °. If the angle difference between the first angle and the second angle is smaller than or equal to a preset angle threshold value, determining that the first traffic flow direction is consistent with the second traffic flow direction, and indicating that the pan-tilt camera is still at a target preset position and is not offset.
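The comparison against the preset angle threshold can be sketched as follows; the 5° default follows the example above, and the wrap-around handling for angles near 0°/360° is an added assumption:

```python
def flows_consistent(first_angle, second_angle, threshold_deg=5.0):
    """Return True when the angle difference between the first and second
    traffic flow directions does not exceed the preset angle threshold."""
    diff = abs(first_angle - second_angle) % 360.0
    diff = min(diff, 360.0 - diff)  # fold the difference into [0, 180]
    return diff <= threshold_deg
```

A True result indicates the pan-tilt camera is still at the target preset position; a False result indicates it has shifted.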
When the scene type of the target to-be-detected picture is determined to be a non-traffic scene in the above manner, the preset position of the pan-tilt camera is not the target preset position, and an adjusting instruction is sent to the pan-tilt camera; the adjusting instruction comprises the physical position information of the target preset position, so that the pan-tilt camera can adjust its position to the target preset position according to the physical position information. Thus, the pan-tilt camera monitors at the target preset position again, and the obtained video stream can be used for video analysis and detection.
It should be noted that the preset positions in the embodiments of the present application all refer to the pan-tilt; the rotation of the camera relative to the pan-tilt is irrelevant to the technical problem solved by the present application and is outside the range considered in the embodiments of the present application.
In the process of judging the scene type of the target to-be-detected picture, there is a situation in which the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, and the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture. In this case the scene of the target to-be-detected picture is determined to be a traffic scene, which indicates that the pan-tilt camera still monitors at the target preset position, that is, the obtained video stream can be used for video analysis, and the corresponding target to-be-detected picture can be used for video analysis. At this time, the target to-be-detected picture is sent to the video analysis terminal, and the video analysis terminal applies a preset video analysis algorithm to analyze it. The video analysis terminal is, for example, a video analysis device of a traffic section. The preset video analysis algorithm may be any video analysis algorithm in the prior art and will not be described here.
In the practical application process, the scene type of the target to-be-detected picture is usually obtained by inputting the target to-be-detected picture into a pre-trained neural network model. In the process, firstly, training a neural network model with a set structure until a neural network model with a good training effect is obtained, and then identifying a scene of a target picture to be detected by the trained model.
Specifically, the convolutional neural network is utilized to extract the image characteristics of the detection area, finally, the scene classification result is predicted through the softmax classifier, the scene classification result comprises two types of traffic scenes and non-traffic scenes, the scene comparison result is defined to be consistent if the detection area is the traffic scene, and the scene comparison result is defined to be inconsistent if the detection area is the non-traffic scene.
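A minimal sketch of the final softmax classification step described above, in pure Python for illustration; the label order and function name are assumptions, not from this application:

```python
import math

def classify_scene(logits, labels=("non-traffic", "traffic")):
    """Apply softmax to the two class logits produced by the classification
    head and return the predicted scene label with the class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return labels[probs.index(max(probs))], probs
```

A "traffic" prediction corresponds to a consistent scene comparison result and a "non-traffic" prediction to an inconsistent one.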
Regarding algorithm selection, the algorithm is designed based on deep-learning scene classification technology, and the lightweight network MobileNet v2 is selected as the backbone network. The design principle and network structure of the selected model are described below.
The selected neural network model has the following structure and structural parameters: the backbone network is the lightweight network MobileNet v2; MobileNet v2 uses depthwise separable convolutions consisting of a depthwise convolution and a 1×1 pointwise convolution; the convolution kernel size of MobileNet v2 is 3, the basic building block is a residual bottleneck depthwise separable convolution, and the architecture comprises an initial full convolution with 32 convolution kernels and 19 residual bottleneck layers; ReLU6 is used as the nonlinear activation function.
In detail, MobileNet v2 improves on the v1 basis. The core idea of MobileNet v1 is the depthwise separable convolution, which splits a standard convolution into two partial convolutions: the first layer, called the depthwise convolution, applies a single-channel lightweight convolution kernel to each input channel; the second layer is a 1×1 convolution, called the pointwise convolution, responsible for computing linear combinations of the input channels to construct new features. With the convolution kernel size k=3 used in MobileNet v2, the computation is reduced by a factor of 8 to 9 compared to a standard convolution, with only a slight loss in accuracy. The basic building block of MobileNet v2 is a residual bottleneck depthwise separable convolution; the architecture of MobileNet v2 contains an initial full convolution with 32 convolution kernels, followed by 19 residual bottleneck layers, where ReLU6 is used as the nonlinear activation function because it is more robust to low-precision computation. The network structure of MobileNet v2 is shown in Table 1, where t is the expansion factor of the input channels, c is the number of output channels, n is the number of repetitions of the module, and s is the stride of the first repetition of the module. For the 19-class classification requirement of the project, the final average pooling layer is modified based on the original model, and two fully connected layers (the bolded part in Table 1) are added.
The training process of the model is exemplified as follows: a large number of positive and negative samples are selected. The positive samples are a plurality of positive pictures obtained by decoding the video stream captured by the pan-tilt camera at the target preset position, and the negative samples are a plurality of negative pictures obtained by decoding video streams captured by the pan-tilt camera at preset positions other than the target preset position. The positive pictures and negative pictures form the training samples, and each training sample is labeled to obtain its category label: for example, positive samples are labeled 1 and negative samples are labeled 0, where label 1 represents a traffic scene and label 0 represents a non-traffic scene. The training samples and the pre-labeled category label corresponding to each training sample are input into an initial MobileNet v2 model, which fuses the image features of each training sample to obtain a prediction classification result representing the probability that the training sample belongs to each preset category; a classification loss value is determined according to the prediction classification result and the pre-labeled category label of each training sample, and the parameters of the initial MobileNet v2 model are adjusted according to the classification loss value until the determined classification loss value is within a preset range, yielding the trained MobileNet v2 model.
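The sample-labelling convention above can be sketched as follows; the function name is an illustrative assumption, and the label semantics (1 = traffic scene, 0 = non-traffic scene) follow the text:

```python
def build_training_samples(positive_pics, negative_pics):
    """Pair each decoded picture with its category label: pictures decoded
    from the stream at the target preset position get label 1 (traffic
    scene), pictures from other preset positions get label 0 (non-traffic
    scene)."""
    return ([(pic, 1) for pic in positive_pics]
            + [(pic, 0) for pic in negative_pics])
```

The resulting (picture, label) pairs are what the initial MobileNet v2 model consumes during training.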
In a specific example, Table 1 shows the parameters of the MobileNet v2 network. With this set of parameters, a well-trained MobileNet v2 model can be obtained.
Table 1 Parameter table of the MobileNet v2 network

| Input    | Operator    | t | c    | n | s |
|----------|-------------|---|------|---|---|
| 224²×3   | conv2d      | - | 32   | 1 | 2 |
| 112²×32  | bottleneck  | 1 | 16   | 1 | 1 |
| 112²×16  | bottleneck  | 6 | 24   | 2 | 2 |
| 56²×24   | bottleneck  | 6 | 32   | 3 | 2 |
| 28²×32   | bottleneck  | 6 | 64   | 4 | 2 |
| 14²×64   | bottleneck  | 6 | 96   | 3 | 1 |
| 14²×96   | bottleneck  | 6 | 160  | 3 | 2 |
| 7²×160   | bottleneck  | 6 | 320  | 1 | 1 |
| 7²×320   | conv2d 1×1  | - | 1280 | 1 | 1 |
| 7²×1280  | avgpool 7×7 | - | -    | 1 | - |
| 1280     | fc-1000     | - | 1000 | - | - |
| 1000     | fc-19       | - | 19   | - | - |
To present the technical solution of the application more completely, referring to fig. 3, the preset position adjusting method based on the pan-tilt camera in the application is described with a complete flowchart.
S301, cutting the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture, and obtaining the target picture to be detected.
S302, identifying the type of the object to be detected in the target image to be detected.
The target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in the reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and the scene corresponding to the reference picture is a traffic scene;
S303, if the target to-be-detected picture comprises a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene.
S304, if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, identifying the first pixel coordinates of all traffic type objects in the target to-be-detected picture; determining the direction of the first traffic flow according to the first pixel coordinates of the objects of each traffic type; identifying the second pixel coordinates of all traffic type objects in the reference picture; and determining the direction of the second traffic flow from the second pixel coordinates of the objects of the respective traffic types.
S305, determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture; if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.
S306, when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture, determining that the scene of the target to-be-detected picture is a non-traffic scene.
S307, if the scene type of the target to-be-detected picture is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
And S308, if the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target picture to be detected is a traffic scene when the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture.
S309, sending the target picture to be detected to the video analysis terminal, so that the video analysis terminal applies a preset video analysis algorithm to analyze the target picture to be detected.
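The decision branches S303–S309 above can be sketched as one function; the flag names, return values, and the handling of a picture with no recognisable object are illustrative assumptions, while the threshold comparison follows the 5° example given earlier:

```python
def preset_position_decision(has_traffic_obj, has_non_traffic_obj,
                             first_angle, second_angle, threshold_deg=5.0):
    """Return "adjust" when the target to-be-detected picture is judged a
    non-traffic scene (S303, S306-S307), and "analyze" when it is judged a
    traffic scene (S308-S309)."""
    if has_non_traffic_obj:
        return "adjust"  # S303: any non-traffic object means non-traffic scene
    if has_traffic_obj:
        diff = abs(first_angle - second_angle) % 360.0
        diff = min(diff, 360.0 - diff)
        if diff > threshold_deg:
            return "adjust"   # S305-S307: flow directions inconsistent
        return "analyze"      # S308-S309: traffic scene, send for analysis
    return "adjust"  # no recognisable object: treated as non-traffic (assumption)
```

"adjust" corresponds to sending the adjusting instruction to the pan-tilt camera, and "analyze" to forwarding the picture to the video analysis terminal.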
In summary, the embodiment of the application can determine whether the preset position of the pan-tilt camera changes only by analyzing the scene, thereby adjusting the preset position of the pan-tilt camera to the target preset position of the original set target detection area and ensuring the accuracy and the effectiveness of the subsequent video analysis and detection process.
It should be noted that the numbering of the steps in the above flow is not directly related to their execution order; the whole preset position adjustment process based on the pan-tilt camera is implemented in combination with fig. 3. In a specific example, fig. 4 shows a schematic diagram of a video picture shot by a calibrated pan-tilt camera; it can be seen that the calibrated picture is consistent with the reference picture and can be used for video analysis and detection of the intersection or road section.
As shown in fig. 5, based on the same inventive concept, an embodiment of the present application provides a preset position adjusting device based on a pan-tilt camera, including: an object recognition module 51, a first scene determination module 52, a second scene determination module 53, and a preset bit adjustment module 54.
The object identifying module 51 is configured to identify a type of an object to be detected in the target image to be detected; the target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in the reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
The first scene determining module 52 is configured to determine that a scene of the target to-be-detected picture is a non-traffic scene when the target to-be-detected picture includes a non-traffic type object;
the second scene determining module 53 is configured to determine that the scene of the target to-be-detected picture is a non-traffic scene when the target to-be-detected picture includes an object of a traffic type and does not include an object of a non-traffic type and the first traffic flow direction obtained based on the object of the traffic type is inconsistent with the second traffic flow direction determined based on the reference picture;
and the preset position adjusting module 54 is configured to send an adjusting instruction including physical position information of the target preset position to the pan-tilt camera when determining that the scene type of the target to-be-detected picture is a non-traffic scene, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
In some exemplary embodiments, the apparatus further comprises a third scenario determination module and a transmission module:
the third scene determination module is used for: when the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, and the traffic flow direction obtained based on the traffic type object is consistent with the traffic flow direction determined based on the reference picture, determining that the scene of the target to-be-detected picture is a traffic scene;
The sending module is specifically used for: and sending the target picture to be detected to a video analysis terminal so that the video analysis terminal applies a preset video analysis algorithm to analyze the target picture to be detected.
In some exemplary embodiments, the method further includes an image processing module, configured to obtain a target to-be-detected picture by:
and cutting the picture to be detected according to the position of the target detection area in the reference picture relative to the reference picture to obtain the target picture to be detected.
In some exemplary embodiments, the system further comprises a traffic flow direction determination module for determining the first traffic flow direction by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the direction of a first traffic flow according to the first pixel coordinates of the objects of each traffic type;
the traffic flow direction determination module is further configured to determine a second traffic flow direction by:
identifying second pixel coordinates of all traffic type objects in the reference picture;
the direction of the second traffic flow is determined from the second pixel coordinates of the objects of the respective traffic types.
In some exemplary embodiments, the method further comprises a traffic flow direction comparison module for determining that the first traffic flow direction and the second traffic flow direction are inconsistent by:
determining a first angle of a first traffic flow direction in a target picture to be detected and a second angle of a second traffic flow direction in a reference picture;
if the angle difference between the first angle and the second angle is larger than a preset angle threshold value, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.
In some exemplary embodiments, the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
In some exemplary embodiments, the backbone network of the neural network model is a lightweight network MobileNet v2; the lightweight network MobileNet v2 comprises a depth convolution product of depth separable convolution and 1*1 point-by-point convolution; the size of the convolution kernel of the lightweight network MobileNet v2 is 3, the basic building block is a residual bottleneck depth separable convolution, and the architecture comprises an initial full convolution with 32 convolution kernels and 19 residual bottleneck layers; reLU6 was used as a nonlinear activation function.
Since the device is the device in the method according to the embodiment of the present invention, and the principle of the device for solving the problem is similar to that of the method, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
As shown in fig. 6, based on the same inventive concept, an embodiment of the present invention provides a preset position adjusting apparatus based on a pan-tilt camera, the apparatus including: a processor 601 and a video processing unit 602.
The video processing unit 602 is configured to:
obtaining a video stream to be detected, which is obtained by shooting by the pan-tilt camera at a position to be detected, and decoding the video stream to be detected to obtain a picture to be detected;
the processor 601 is configured to:
identifying the type of an object to be detected in a target picture to be detected; the target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in the reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
if the target to-be-detected picture comprises a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene; or (b)
If the target picture to be detected comprises the traffic type object and does not comprise the non-traffic type object, determining that the scene of the target picture to be detected is a non-traffic scene when the first traffic flow direction obtained based on the traffic type object is inconsistent with the second traffic flow direction determined based on the reference picture;
If the scene type of the target to-be-detected picture is determined to be a non-traffic scene, sending an adjusting instruction comprising the physical position information of the target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information.
In some exemplary embodiments, the processor 601 is further configured to:
if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, determining that the scene of the target to-be-detected picture is a traffic scene when the first traffic flow direction obtained based on the traffic type object is consistent with the second traffic flow direction determined based on the reference picture;
and sending the target to-be-detected picture to a video analysis terminal, so that the video analysis terminal analyzes the target to-be-detected picture using a preset video analysis algorithm.
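The branching rules above can be sketched as follows; the object-type vocabulary and function names are illustrative assumptions for the example, not taken from the patent:

```python
# Illustrative sketch of the scene-classification rules described above.
# TRAFFIC_TYPES and the helper's signature are assumptions, not the
# patent's actual implementation.

TRAFFIC_TYPES = {"car", "bus", "truck", "motorcycle"}

def is_non_traffic_scene(object_types, flow_directions_consistent):
    """Return True when the target picture should be treated as a
    non-traffic scene (i.e. the camera has drifted off its preset)."""
    # Rule 1: any non-traffic object at all marks the scene as non-traffic.
    if any(t not in TRAFFIC_TYPES for t in object_types):
        return True
    # Rule 2: only traffic objects are present, but the traffic flow
    # direction disagrees with the reference picture.
    has_traffic = any(t in TRAFFIC_TYPES for t in object_types)
    return has_traffic and not flow_directions_consistent
```

Under these assumptions, detecting (say) a tree alone would trigger rule 1, while a frame containing only vehicles whose flow direction no longer matches the reference picture would trigger rule 2.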
In some exemplary embodiments, the processor 601 is further configured to obtain the target to-be-detected picture by:
and cropping the to-be-detected picture according to the position of the target detection area in the reference picture relative to the reference picture, to obtain the target to-be-detected picture.
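A minimal sketch of that cropping step, assuming the detection area's position is stored as coordinates relative to the reference picture (fractions of its width and height; this coordinate convention is an assumption):

```python
# Hypothetical crop helper: rel_box holds the detection area's position
# relative to the reference picture as (x0, y0, x1, y1) fractions in [0, 1].

def crop_to_detection_area(frame, rel_box):
    """frame: H x W image as a list of rows; returns the cropped region."""
    h, w = len(frame), len(frame[0])
    x0, y0, x1, y1 = rel_box
    c0, c1 = int(x0 * w), int(x1 * w)   # column range from relative x
    r0, r1 = int(y0 * h), int(y1 * h)   # row range from relative y
    return [row[c0:c1] for row in frame[r0:r1]]
```

Because the box is relative, the same detection area transfers to any decoded frame of the same aspect ratio, regardless of resolution.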
In some exemplary embodiments, the processor 601 is further configured to determine the first traffic flow direction by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the first traffic flow direction according to the first pixel coordinates of each traffic type object;
the processor is further configured to determine a second traffic flow direction by:
identifying second pixel coordinates of all traffic type objects in the reference picture;
determining the second traffic flow direction according to the second pixel coordinates of each traffic type object.
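One plausible way to turn the pixel coordinates of the detected objects into a flow direction is to fit the principal axis of the object centers; the patent does not fix a particular estimator, so this least-squares formulation is an illustrative assumption:

```python
import math

# Hypothetical flow-direction estimator: the dominant axis of the object
# centers, taken from the principal axis of their 2x2 covariance matrix.

def flow_angle(points):
    """points: list of (x, y) pixel coordinates; returns degrees."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the principal eigenvector of [[sxx, sxy], [sxy, syy]].
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
```

Applying the same estimator to the first pixel coordinates (target picture) and the second pixel coordinates (reference picture) yields the two angles that the comparison below operates on.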
In some exemplary embodiments, the processor 601 is further configured to determine that the first traffic flow direction and the second traffic flow direction are inconsistent by:
determining a first angle of the first traffic flow direction in the target to-be-detected picture and a second angle of the second traffic flow direction in the reference picture;
if the angle difference between the first angle and the second angle is larger than a preset angle threshold, determining that the first traffic flow direction is inconsistent with the second traffic flow direction.
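The threshold test can be written so that the angle difference wraps correctly around the circle; the default threshold value here is an arbitrary assumption:

```python
# Hypothetical consistency check for the two flow-direction angles.

def flow_directions_consistent(first_angle_deg, second_angle_deg,
                               threshold_deg=30.0):
    """True when the two directions differ by no more than the preset
    angle threshold, measured the shortest way around the circle."""
    diff = abs(first_angle_deg - second_angle_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff <= threshold_deg
```

The wraparound matters: 10 deg and 350 deg are only 20 deg apart and should compare as consistent, which a naive absolute difference would miss.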
In some exemplary embodiments, the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
In some exemplary embodiments, the backbone network of the neural network model is the lightweight network MobileNet v2. In MobileNet v2, each depthwise separable convolution factorizes a standard convolution into a depthwise convolution followed by a 1×1 pointwise convolution; the convolution kernel size is 3×3; the basic building block is a bottleneck depthwise separable convolution with residual connections; the architecture comprises an initial fully convolutional layer with 32 filters followed by 19 residual bottleneck layers; and ReLU6 is used as the nonlinear activation function.
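As a back-of-the-envelope illustration of why this factorization makes the network lightweight, compare the parameter counts of one standard k×k convolution with the depthwise-plus-pointwise pair (the channel sizes below are arbitrary examples, not values from the patent):

```python
# Parameter counts for one convolutional layer (ignoring biases).

def standard_conv_params(k, c_in, c_out):
    """k x k standard convolution mixing all input channels."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """k x k depthwise convolution plus 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 32, 64)   # 9 * 32 * 64 = 18432
sep = separable_conv_params(3, 32, 64)  # 288 + 2048  = 2336
print(std, sep)                          # roughly an 8x parameter reduction
```

The saving factor is approximately 1/c_out + 1/k², which is why a 3×3 kernel yields close to a ninefold reduction for wide layers.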
The embodiment of the application also provides a computer storage medium, in which computer program instructions are stored; when the instructions run on a computer, the computer is caused to execute the steps of the above preset position adjusting method based on a pan-tilt camera.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. A preset position adjusting device based on a pan-tilt camera, characterized by comprising a video processing unit and a processor:
the video processing unit is configured to:
acquiring a video stream to be detected, which is shot by the pan-tilt camera at a position to be detected, and decoding the video stream to be detected to obtain a picture to be detected;
the processor is configured to:
identifying the type of an object to be detected in a target picture to be detected; the target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
if the target to-be-detected picture comprises a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene; or
if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene when a first traffic flow direction obtained based on the traffic type object is inconsistent with a second traffic flow direction determined based on the reference picture;
if the scene type of the target to-be-detected picture is determined to be a non-traffic scene, sending an adjusting instruction comprising physical position information of a target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information;
the processor is further configured to:
if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, determining that the scene of the target to-be-detected picture is a traffic scene when the first traffic flow direction obtained based on the traffic type object is consistent with the second traffic flow direction determined based on the reference picture;
and sending the target to-be-detected picture to a video analysis terminal, so that the video analysis terminal analyzes the target to-be-detected picture using a preset video analysis algorithm.
2. The device of claim 1, wherein the processor is further configured to obtain the target picture to be detected by:
cropping the to-be-detected picture according to the position of the target detection area in the reference picture relative to the reference picture, to obtain the target to-be-detected picture.
3. The device of claim 1, wherein the processor is further configured to determine the first traffic flow direction by:
identifying first pixel coordinates of all traffic type objects in the target to-be-detected picture;
determining the first traffic flow direction according to the first pixel coordinates of each traffic type object;
the processor is further configured to determine a second traffic flow direction by:
identifying second pixel coordinates of all traffic type objects in the reference picture;
and determining the second traffic flow direction according to the second pixel coordinates of each traffic type object.
4. The device of claim 3, wherein the processor is further configured to determine that the first traffic flow direction and the second traffic flow direction are not coincident by:
determining a first angle of the first traffic flow direction in the target to-be-detected picture and a second angle of the second traffic flow direction in the reference picture;
and if the angle difference between the first angle and the second angle is larger than a preset angle threshold, determining that the first traffic flow direction and the second traffic flow direction are inconsistent.
5. The apparatus according to any one of claims 1 to 4, wherein the scene type of the target to-be-detected picture is obtained by inputting the target to-be-detected picture into a pre-trained neural network model.
6. The apparatus of claim 5, wherein the backbone network of the neural network model is the lightweight network MobileNet v2; in MobileNet v2, each depthwise separable convolution factorizes a standard convolution into a depthwise convolution followed by a 1×1 pointwise convolution; the convolution kernel size is 3×3; the basic building block is a bottleneck depthwise separable convolution with residual connections; the architecture comprises an initial fully convolutional layer with 32 filters followed by 19 residual bottleneck layers; and ReLU6 is used as the nonlinear activation function.
7. A preset position adjusting method based on a pan-tilt camera, characterized by comprising the following steps:
identifying the type of an object to be detected in a target to-be-detected picture; the target to-be-detected picture is obtained by processing the to-be-detected picture according to the position of a target detection area in a reference picture relative to the reference picture, the reference picture is obtained by decoding a target video stream shot by the pan-tilt camera at a target preset position, and a scene corresponding to the reference picture is a traffic scene;
if the target to-be-detected picture comprises a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene; or
if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, determining that the scene of the target to-be-detected picture is a non-traffic scene when a first traffic flow direction obtained based on the traffic type object is inconsistent with a second traffic flow direction determined based on the reference picture;
if the scene type of the target to-be-detected picture is determined to be a non-traffic scene, sending an adjusting instruction comprising physical position information of a target preset position to the pan-tilt camera, so that the pan-tilt camera adjusts its position to the target preset position according to the physical position information;
The method further comprises the steps of:
if the target to-be-detected picture comprises a traffic type object and does not comprise a non-traffic type object, determining that the scene of the target to-be-detected picture is a traffic scene when the first traffic flow direction obtained based on the traffic type object is consistent with the second traffic flow direction determined based on the reference picture;
and sending the target to-be-detected picture to a video analysis terminal, so that the video analysis terminal analyzes the target to-be-detected picture using a preset video analysis algorithm.
8. The method according to claim 7, wherein the target picture to be detected is obtained by:
cropping the to-be-detected picture according to the position of the target detection area in the reference picture relative to the reference picture, to obtain the target to-be-detected picture.
CN202111240092.7A 2021-10-25 2021-10-25 Preset position adjusting method and device based on cradle head camera Active CN114040094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240092.7A CN114040094B (en) 2021-10-25 2021-10-25 Preset position adjusting method and device based on cradle head camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240092.7A CN114040094B (en) 2021-10-25 2021-10-25 Preset position adjusting method and device based on cradle head camera

Publications (2)

Publication Number Publication Date
CN114040094A CN114040094A (en) 2022-02-11
CN114040094B true CN114040094B (en) 2023-10-31

Family

ID=80141906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240092.7A Active CN114040094B (en) 2021-10-25 2021-10-25 Preset position adjusting method and device based on cradle head camera

Country Status (1)

Country Link
CN (1) CN114040094B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666501B (en) * 2022-03-17 2023-04-07 深圳市百泰实业股份有限公司 Intelligent control method for camera of wearable device
CN114500858B (en) * 2022-03-28 2022-07-08 浙江大华技术股份有限公司 Parameter determination method, device, equipment and medium for preset bits

Citations (7)

Publication number Priority date Publication date Assignee Title
CN110769246A (en) * 2019-09-06 2020-02-07 华为技术有限公司 Method and device for detecting faults of monitoring equipment
CN111785030A (en) * 2020-08-06 2020-10-16 展讯通信(深圳)有限公司 Traffic monitoring terminal and traffic monitoring system
CN112561986A (en) * 2020-12-02 2021-03-26 南方电网电力科技股份有限公司 Secondary alignment method, device, equipment and storage medium for inspection robot holder
CN112652021A (en) * 2020-12-30 2021-04-13 深圳云天励飞技术股份有限公司 Camera offset detection method and device, electronic equipment and storage medium
CN113129387A (en) * 2021-04-22 2021-07-16 上海眼控科技股份有限公司 Camera position detection method, device, equipment and storage medium
CN113382171A (en) * 2021-06-21 2021-09-10 车路通科技(成都)有限公司 Traffic camera automatic correction method, device, equipment and medium
CN113469201A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Image acquisition equipment offset detection method, image matching method, system and equipment

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN103778786B (en) * 2013-12-17 2016-04-27 东莞中国科学院云计算产业技术创新与育成中心 A kind of break in traffic rules and regulations detection method based on remarkable vehicle part model
US10944900B1 (en) * 2019-02-13 2021-03-09 Intelligent Security Systems Corporation Systems, devices, and methods for enabling camera adjustments

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN110769246A (en) * 2019-09-06 2020-02-07 华为技术有限公司 Method and device for detecting faults of monitoring equipment
CN113469201A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Image acquisition equipment offset detection method, image matching method, system and equipment
CN111785030A (en) * 2020-08-06 2020-10-16 展讯通信(深圳)有限公司 Traffic monitoring terminal and traffic monitoring system
CN112561986A (en) * 2020-12-02 2021-03-26 南方电网电力科技股份有限公司 Secondary alignment method, device, equipment and storage medium for inspection robot holder
CN112652021A (en) * 2020-12-30 2021-04-13 深圳云天励飞技术股份有限公司 Camera offset detection method and device, electronic equipment and storage medium
CN113129387A (en) * 2021-04-22 2021-07-16 上海眼控科技股份有限公司 Camera position detection method, device, equipment and storage medium
CN113382171A (en) * 2021-06-21 2021-09-10 车路通科技(成都)有限公司 Traffic camera automatic correction method, device, equipment and medium

Non-Patent Citations (2)

Title
Automatic detection system for expressway abnormal events in pan-tilt camera scenes; Wang Junjian; Wang Ying; Wang Yihao; China Transportation Informatization (10); full text *
Camera calibration and cross-camera physical panorama reconstruction in traffic scenes; Wu Feifan; China Master's Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN114040094A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
KR102189262B1 (en) Apparatus and method for collecting traffic information using edge computing
CN114040094B (en) Preset position adjusting method and device based on cradle head camera
CN108337505B (en) Information acquisition method and device
JP6700373B2 (en) Apparatus and method for learning object image packaging for artificial intelligence of video animation
CN112183166A (en) Method and device for determining training sample and electronic equipment
EP3495989A1 (en) Best image crop selection
CN110119725B (en) Method and device for detecting signal lamp
CN112784724A (en) Vehicle lane change detection method, device, equipment and storage medium
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN110599453A (en) Panel defect detection method and device based on image fusion and equipment terminal
CN115100489A (en) Image processing method, device and equipment and readable storage medium
CN109800684B (en) Method and device for determining object in video
CN111339834B (en) Method for identifying vehicle driving direction, computer device and storage medium
CN112528944A (en) Image identification method and device, electronic equipment and storage medium
CN113591885A (en) Target detection model training method, device and computer storage medium
CN114550129B (en) Machine learning model processing method and system based on data set
CN109903308B (en) Method and device for acquiring information
CN113327219B (en) Image processing method and system based on multi-source data fusion
CN114913470A (en) Event detection method and device
CN114373081A (en) Image processing method and device, electronic device and storage medium
CN114241373A (en) End-to-end vehicle behavior detection method, system, equipment and storage medium
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium
CN113989774A (en) Traffic light detection method and device, vehicle and readable storage medium
CN113379683A (en) Object detection method, device, equipment and medium
CN113591543A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant