CN118537773A - Video detection method and device for operation site, electronic equipment and storage medium

Publication number: CN118537773A
Application number: CN202410617616.7A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Prior art keywords: video, field, video data, target, information
Inventors: 吴志加, 冯程, 谢丙辉, 李琳, 邓惠华, 陈铁森, 王媚
Applicant/Assignee: Guangdong Power Grid Co Ltd; Meizhou Power Supply Bureau of Guangdong Power Grid Co Ltd


Classifications

  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a video detection method and device for a job site, an electronic device, and a storage medium. Field video data from a plurality of video sources at the job site are acquired; the field video data comprise panoramic video data of the job site collected by at least one first photographing device and local video data of the job site collected by at least one second photographing device. A target video is generated based on the field video data of the plurality of video sources, field operation information of an operator is determined based on the target video, and safety detection is performed on the field operation information based on preset safety detection conditions. When an operation risk is detected in the field operation information, risk prompt information is generated and displayed. The technical scheme thus achieves video acquisition of the job site from a plurality of video sources, and safety detection and risk perception of the job site based on the field video data of those sources.

Description

Video detection method and device for operation site, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a video detection method and apparatus for a job site, an electronic device, and a storage medium.
Background
In power emergency scenarios, operators are required to carry out emergency work on the job site so that the site can quickly return to normal operation.
However, while operators carry out emergency work on site, the back end of the power terminal may be unable to detect their operating behavior in real time, so safety detection and risk perception for the job site cannot be performed in a timely manner.
Disclosure of Invention
The invention provides a video detection method and device for a job site, an electronic device, and a storage medium, which achieve video acquisition of the job site from a plurality of video sources, and safety detection and risk perception of the job site based on the field video data of those sources.
According to an aspect of the present invention, there is provided a video detection method of an operation site, the method comprising:
Acquiring field video data of a plurality of video sources of an operation field; the field video data comprise panoramic video data of an operation field acquired by at least one first shooting device and local video data of the operation field acquired by at least one second shooting device; the second shooting device comprises a zooming shooting device arranged on a safety helmet worn by an operator;
Generating a target video based on the field video data of a plurality of video sources, determining field operation information of the operator based on the target video, and performing safety detection on the field operation information based on preset safety detection conditions;
And under the condition that the operation risk of the field operation information is detected, generating risk prompt information and displaying the risk prompt information.
According to another aspect of the present invention, there is provided a video detection apparatus for a job site, the apparatus comprising:
The video data acquisition module is used for acquiring field video data of a plurality of video sources of the operation field; the field video data comprise panoramic video data of an operation field acquired by at least one first shooting device and local video data of the operation field acquired by at least one second shooting device; the second shooting device comprises a zooming shooting device arranged on a safety helmet worn by an operator;
the operation safety detection module is used for generating a target video based on the field video data of a plurality of video sources, determining field operation information of the operator based on the target video, and carrying out safety detection on the field operation information based on preset safety detection conditions;
the operation risk prompting module is used for generating risk prompting information and displaying the risk prompting information under the condition that the operation risk of the field operation information is detected.
According to another aspect of the present invention, there is provided an electronic apparatus including:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor, to enable the at least one processor to perform the video detection method for a job site according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to execute the video detection method for a job site according to any one of the embodiments of the present invention.
According to the technical scheme, field video data of a plurality of video sources at the job site are acquired, the field video data comprising panoramic video data of the job site collected by at least one first photographing device and local video data of the job site collected by at least one second photographing device. A target video is then generated based on the field video data of the plurality of video sources, field operation information of the operator is determined based on the target video, and safety detection is performed on the field operation information based on preset safety detection conditions. Further, when an operation risk is detected in the field operation information, risk prompt information is generated and displayed. This solves the problem in the related art that the operating behavior of operators on the job site cannot be detected in real time, so that safety detection and risk perception of the job site cannot be carried out in time, and achieves video acquisition of the job site from a plurality of video sources together with safety detection and risk perception based on their field video data. Moreover, by collecting the local video data, the work details of the job site can be inspected on video, which improves the efficiency and accuracy of safety detection and risk perception.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a video detection method for a job site according to a first embodiment of the present invention;
fig. 2 is a flowchart of a video detection method for a job site according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a video detection method for a job site according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a video detection device for an operation site according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device implementing a video detection method of a job site according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a video detection method for a job site according to an embodiment of the present invention. The method is applicable to performing safety detection on a job site based on field video data of the job site, and may be performed by a video detection device for a job site, which may be implemented in hardware and/or software and configured in a terminal and/or a server. As shown in fig. 1, the method includes:
s110, acquiring field video data of a plurality of video sources of a working field.
A job site is understood to be a site where work is performed on work equipment. In this embodiment, the job site may be an electric power job site. The electric power job site is an important link in the power system and mainly involves operations such as installation, commissioning, overhaul, and maintenance of high-voltage equipment. Illustratively, the job site may be a power pole job site.

A video source is a source from which a video signal is acquired. Optionally, the video source may be a camera, a video recorder, an optical disc, the internet, a television broadcast, or another type of device. It should be noted that each video source may capture video pictures of the job site at the same or different shooting angles. Moreover, different video sources may provide different video quality and resolution, which is not specifically limited in this embodiment. Field video data is the data obtained by video acquisition of the job site from the corresponding video source. The field video data may include panoramic video data of the job site acquired by at least one first photographing device and local video data of the job site acquired by at least one second photographing device.

In this embodiment, the first photographing device may be a device that supports panoramic photographing of an arbitrary region, and may include a panoramic photographing device arranged in a preset area; it may be used to capture a wide field of view and may include a pan-tilt (gimbal) device. The preset area may be any area of the job site from which a panoramic picture of the job site can be captured; optionally, the preset area may be on a tower pole, in a tunnel, or the like. For example, assuming the job site is a power pole job site, the panoramic photographing device included in the first photographing device may be arranged on a pole of the job site. The panoramic video data is the video data of the job site acquired in the shooting field of view of the first photographing device.

The second photographing device may be a device that supports local photographing of an arbitrary region, and may include a zoom photographing device arranged on a safety helmet worn by an operator; it may be used to capture details. The number of second photographing devices on the job site may be equal to or smaller than the number of operators on the job site. An operator is a person who performs work on the job site; there may be one or more operators, which this embodiment does not specifically limit. It will be appreciated that, for safety reasons, an operator must wear a safety helmet before entering the job site.
In practical applications, in power emergency scenarios, signal blind zones or weak-signal areas often exist because of adverse operating environments and safety conditions. Panoramic detection and risk perception of the job site must therefore generally be performed under weak-signal conditions, and dedicated emergency-communication private-network products are usually adopted for panoramic detection of the job site. However, these private-network products are expensive to deploy and operate and cannot communicate with public-network terminals, which makes the data communication process complex, and the collected video data may not reach the background terminal in real time. Alternatively, some products are single-band products provided by a single operator, whose data rate cannot meet the requirements of video streaming and engineering applications, and which therefore cannot satisfy the need for real-time transmission of panoramic detection data streams and real-time background visualization of the job site during work.
In view of the above, in this embodiment a zoom photographing device is arranged on the safety helmet worn by the operator. During work, the job site can then be photographed by this zoom photographing device to obtain local video data of the job site. A zoom photographing device is one that can vary its focal length within a certain range to obtain wider or narrower angles of view, images of different sizes, and different scene ranges; it changes the shooting range by varying the focal length without changing the shooting distance. The local video data is the video data of the job site acquired at the shooting angle of view of the corresponding second photographing device. It is understood that the local video data is obtained by photographing the job site at close range, so the work details of the job site can be inspected from it; the panoramic video data is obtained by omnidirectional shooting of the job site, so the job site can be monitored in all directions based on it.
As an optional implementation of this embodiment, the first photographing device may be set up at the job site in advance, and the second photographing device may be arranged on the safety helmet worn by the operator. Then, while the operator works on the job site, panoramic pictures of the site can be collected by the first photographing device to obtain panoramic video data of the job site, and local pictures of the site can be collected by the second photographing devices on the operators' helmets to obtain the local video data gathered by at least one second photographing device. The collected panoramic video data and the at least one set of local video data may then be transmitted to the background terminal device, so that it can perform omnidirectional and local safety monitoring of the job site based on the received video data.
It should be noted that, the manner of transmitting the collected panoramic video data and the local video data to the background terminal device may include various manners. Alternatively, the panoramic video data and the local video data may be directly transmitted to the background terminal device; or the panoramic video data and the local video data may be transmitted to the receiving apparatus. Further, the panoramic video data and the local video data are transmitted to the background terminal device based on the receiving device. Wherein the receiving means may be an emergency signal enhancing means.
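To make the acquisition step concrete, the following is a minimal Python sketch of reading frames from several video sources. OpenCV, the source names, and the RTSP addresses are illustrative assumptions, not part of the disclosed method.

```python
import cv2

# Placeholder addresses: one panoramic camera on the pole and two
# helmet-mounted zoom cameras (hypothetical, for illustration only).
SOURCES = {
    "panorama": "rtsp://192.0.2.10/stream",
    "helmet_1": "rtsp://192.0.2.21/stream",
    "helmet_2": "rtsp://192.0.2.22/stream",
}

def open_sources(sources):
    """Open one capture handle per video source, skipping any that fail."""
    captures = {}
    for name, url in sources.items():
        cap = cv2.VideoCapture(url)
        if cap.isOpened():
            captures[name] = cap
    return captures

def read_frames(captures):
    """Read the next frame from every source; returns a name -> frame dict."""
    frames = {}
    for name, cap in captures.items():
        ok, frame = cap.read()
        if ok:
            frames[name] = frame
    return frames
```

In a deployment, the frames read here would be forwarded, directly or through a receiving (signal-enhancing) device, to the background terminal device.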
S120, generating a target video based on field video data of a plurality of video sources, determining field operation information of operators based on the target video, and performing safety detection on the field operation information based on preset safety detection conditions.
The target video may be understood as a video generated based on field video data of a plurality of video sources, which satisfies the requirement of field video detection. The field operation information may be used to characterize a particular job situation of a worker in the job site. Alternatively, the field operation information may include information such as a field operation position and a field operation behavior. The safety detection conditions may be used to detect safety risks present in the job site. Illustratively, the safety detection condition may be a preset job site safety specification.
In this embodiment, the acquired field video data include panoramic video data of the job site collected by the first photographing device and local video data of the job site collected by the second photographing devices, each consisting of a plurality of video frames. In order to observe the local pictures of the job site at every shooting angle more clearly and intuitively within each video frame, after the field video data of the plurality of video sources are obtained, video fusion processing may be performed on them frame by frame, and the fused video may then be used as the target video.
As an optional implementation manner of this embodiment, the video frames to be fused in the live video data of each video source may be determined separately, and the key points and descriptors in each video frame to be fused may be determined. Further, target point pairs in a plurality of video frames to be fused can be determined according to the plurality of key points and the descriptors. Further, the target transformation matrix may be determined from the keypoints in the target point pair. Furthermore, the on-site video data of a plurality of video sources can be aligned based on the target transformation matrix, and the aligned on-site video data are fused to obtain the target video.
As another optional implementation of this embodiment, the panoramic video data may be regarded as video data stitched together from a plurality of local video data; that is, the panoramic video data may be divided into a plurality of field-of-view areas, each of whose video data corresponds to a local video data. The first position information and the first display size corresponding to the second photographing device may be determined; the panoramic video data may then be divided into a plurality of field-of-view areas according to the first position information; second position information and a second display size corresponding to each field-of-view area may then be determined; and the panoramic video data and the local video data may finally be fused according to the first position information, the first display size, the second position information, and the second display size to obtain the target video.
In this embodiment, in order to improve the video quality of the final target video, so that the target video can more clearly represent the detailed information of the working condition in the working site, before generating the target video based on the site video data of the multiple video sources, the method further includes: the live video data of at least one video source is preprocessed.
Wherein the preprocessing includes distortion removal processing and/or image correction processing. It will be appreciated that distortion is typically due to the geometry of the optical system such that the shape or position of the object in the image is distorted or deformed as compared to the object in the real world. The distortion removal process is to eliminate or mitigate such image distortion due to the optical system, thereby restoring the authenticity of the shape and position of the object in the image. The image correction processing refers to a restorative processing performed on a distorted image, which aims to eliminate or mitigate image distortion due to various causes (such as aberration, distortion, limited bandwidth of an imaging system, etc.) to restore the original appearance of the image.
As an optional implementation manner of this embodiment, after the live video data corresponding to the plurality of video sources is acquired, the live video data corresponding to at least one video source having distortion and/or distortion problem in the live video data corresponding to the plurality of video sources may be preprocessed. Further, the target video may be generated based on the preprocessed live video data.
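As one concrete form of this preprocessing, the following is a minimal sketch of de-distortion with OpenCV; the intrinsic matrix and distortion coefficients are placeholder values standing in for a prior camera calibration, which the embodiment does not specify.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients; real values would come
# from a calibration (e.g., cv2.calibrateCamera) of each photographing device.
K = np.array([[900.0,   0.0, 640.0],
              [  0.0, 900.0, 360.0],
              [  0.0,   0.0,   1.0]])
DIST = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def undistort_frame(frame):
    """Remove lens distortion so shapes and positions in the frame match
    the real scene; crops to the valid region afterwards."""
    h, w = frame.shape[:2]
    new_K, roi = cv2.getOptimalNewCameraMatrix(K, DIST, (w, h), alpha=0)
    corrected = cv2.undistort(frame, K, DIST, None, new_K)
    x, y, rw, rh = roi
    return corrected[y:y + rh, x:x + rw]
```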
In the present embodiment, in the process of panoramic photographing the work site based on the first photographing device and partial photographing the work site based on the second photographing device, there may be a case where there is an overlapping photographing region of the photographed panoramic video data and the partial video data. If the target video continues to be generated in this case, the video quality and the video sharpness of the target video may be affected. Based on this, before generating the target video based on live video data of the plurality of video sources, further comprising: and determining overlapping shooting areas of the plurality of live video data according to the position information and the shooting visual fields of each first shooting device and each second shooting device, and updating the live video data based on the overlapping shooting areas.
The position information is the placement of a photographing device. Optionally, the position information of the first photographing device may be the position at which it is set up on the job site, and the position information of a second photographing device may be the position of the corresponding safety helmet on the job site together with the position of the device on the helmet. The shooting field of view is the range of area observable by the photographing device. An overlapping shot region is a region in which, for corresponding video frames under the same timestamp, multiple different video data overlap each other.
As an alternative implementation of the present embodiment, the position information and the shooting field of view of the first shooting device and each second shooting device may be acquired. Further, overlapping shot areas of a plurality of live video data may be determined based on the positional information and the shot field of view. Then, the live video data can be smoothly transited based on the overlapped shooting areas, and artifacts and discontinuous data caused by the overlapped shooting areas in the live video data are removed so as to update the live video data.
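One simple way to realize this smooth transition over an overlapping shot region is linear feathering. The sketch below assumes the overlap has already been localized as a band of pixel columns shared by two horizontally adjacent frames, an illustrative simplification of the position and field-of-view computation described above.

```python
import numpy as np

def blend_overlap(left, right, overlap_px):
    """Feather two horizontally adjacent frames across an overlap of
    overlap_px columns so the seam shows no hard discontinuity."""
    h = min(left.shape[0], right.shape[0])
    left, right = left[:h], right[:h]
    # Blending weight ramps from 1 -> 0 for the left frame across the overlap.
    alpha = np.linspace(1.0, 0.0, overlap_px)[None, :, None]
    seam = (left[:, -overlap_px:] * alpha +
            right[:, :overlap_px] * (1.0 - alpha)).astype(left.dtype)
    return np.hstack([left[:, :-overlap_px], seam, right[:, overlap_px:]])
```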
In this embodiment, after the target video is obtained, the operation detection may be performed on the operator included in the target video according to the preset operation detection manner, and the field operation information of the operator may be obtained. The preset operation detection mode may be any mode capable of detecting the operation of a person in the video. Optionally, the preset operation detection mode may include determining on-site operation information of the operator based on the operation detection model.
Optionally, determining the field operation information of the operator based on the target video includes: determining a plurality of consecutive second video frames in the target video that contain the operator, and inputting each second video frame into an operation detection model to obtain the field operation information of the operator.
In this embodiment, the target video may include a plurality of video frames, and the video frames may include a video frame including an operator and a video frame not including an operator, and the video frame including an operator may be used as the second video frame. The operational detection model may be a pre-trained neural network model for detecting operational behavior of a target object in a video frame. The operation detection model may be trained based on a sample video frame containing the operator and an operation information tag corresponding to the sample video frame.
As an optional implementation manner of this embodiment, the target video may be processed based on a preset object detection algorithm, so as to determine a plurality of continuous second video frames including the operator from a plurality of video frames included in the target video. Further, each second video frame may be input into the operation detection model separately. Further, the field operation information of the operator can be obtained.
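The following sketch illustrates this step. OpenCV's HOG pedestrian detector stands in for the unspecified preset object detection algorithm, and `operation_net` is a hypothetical callable standing in for the trained operation detection model.

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def consecutive_worker_frames(frames):
    """Collect the first consecutive run of frames in which a person
    (an operator) is detected."""
    selected = []
    for frame in frames:
        rects, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
        if len(rects) > 0:
            selected.append(frame)
        elif selected:
            break  # the consecutive run containing the operator has ended
    return selected

def field_operation_info(frames, operation_net):
    """Feed each second video frame to the operation detection model;
    operation_net maps one frame to that frame's operation information."""
    return [operation_net(frame) for frame in frames]
```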
In this embodiment, after the field operation information of the operator is obtained, the field operation information may be subjected to safety detection based on a preset safety detection condition to determine whether the operation performed by the operator in the operation field meets the safety specification.
Optionally, performing safety detection on the field operation information based on preset safety detection conditions includes: inputting the field operation information into an operation recognition model to obtain an operation detection result corresponding to the field operation information.

The operation recognition model may be a pre-trained neural network model used to recognize operating behavior from operation information and to perform safety detection on that behavior. It may be trained on sample operation information and the operation labels corresponding to the sample operation information; the operation labels may include labels corresponding to compliant operations and labels corresponding to risky operations. The operation detection result indicates either that an operation risk exists or that the operation is compliant: an operation risk indicates the degree of risk of the corresponding operation under the applicable standard, while a compliant operation indicates that the corresponding operation satisfies the safety detection conditions.
As an alternative implementation of this embodiment, field operation information may be input into the operation recognition model. Further, it is possible to identify and safely detect an operation included in the field operation information based on the operation identification model, and output an operation detection result corresponding to the field operation information.
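A minimal sketch of this safety check follows; `risk_model` is a hypothetical callable standing in for the trained operation recognition model, and the decision threshold is an assumption, since the embodiment fixes neither.

```python
RISK_THRESHOLD = 0.5  # assumed decision boundary, not given in the embodiment

def detect_operation_risk(field_operation_info, risk_model):
    """Classify field operation information as an operation risk or a
    compliant operation; risk_model returns a risk score in [0, 1]."""
    score = risk_model(field_operation_info)
    if score >= RISK_THRESHOLD:
        return {"result": "operation risk", "risk_index": score}
    return {"result": "compliant operation", "risk_index": score}
```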
It should be noted that each picture displayed in the target video is composed of video frames from a plurality of local video data; that is, several local video data may be displayed simultaneously within one video frame of the target video. After generating the target video based on the field video data of the plurality of video sources, the method therefore further includes: dividing each video frame of the target video into a plurality of second field-of-view regions; and, when a view trigger operation on a second field-of-view region is received and the triggered region includes an operator, determining the safety helmet worn by the operator and displaying the field video data collected by that helmet in enlarged form.
The second field-of-view regions may be video display regions corresponding to different shooting angles within a video frame, each corresponding to a different shooting angle of view. A view trigger operation is an operation that, once triggered, causes the video data displayed in the corresponding field-of-view region to be viewed. It may be any operation acting on a second field-of-view region, for example a click operation on the region, where the click may be a single click or a multiple click (e.g., a double click). To further facilitate operation, a view trigger operation for a second field-of-view region may also be deemed detected when the user's pointer or touch point is detected to dwell on the region for a preset time.
As an optional implementation of this embodiment, the video frames in the target video may be divided into a plurality of second field-of-view regions according to shooting angle, and each video frame of the target video may be displayed on the interface as the plurality of second field-of-view regions. When a view trigger operation on any second field-of-view region is detected, the triggered region is determined in response to the operation, and the video data corresponding to it is examined with an object detection algorithm to determine whether the triggered region includes an operator. If it does, the predetermined helmet-to-personnel configuration information may be consulted to determine the safety helmet worn by that operator, and the field video data collected by that helmet may then be obtained and displayed in enlarged form. The advantage of this arrangement is that it lets the user inspect work details while browsing the target video, raising the degree of intelligence of job-site video detection.
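The sketch below shows one way to map a view trigger operation (here a double click) to the triggered second field-of-view region and enlarge the helmet feed bound to it; the grid layout, window name, and region-to-helmet binding are illustrative assumptions.

```python
import cv2

GRID_ROWS, GRID_COLS = 2, 3  # assumed layout of second field-of-view regions

def region_at(x, y, frame_w, frame_h):
    """Map a click at pixel (x, y) to the (row, col) of the region it hits."""
    col = min(x * GRID_COLS // frame_w, GRID_COLS - 1)
    row = min(y * GRID_ROWS // frame_h, GRID_ROWS - 1)
    return row, col

def on_mouse(event, x, y, flags, param):
    """Mouse callback: a double click acts as the view trigger operation."""
    if event == cv2.EVENT_LBUTTONDBLCLK:
        frame, helmet_feeds = param  # helmet_feeds: (row, col) -> latest frame
        row, col = region_at(x, y, frame.shape[1], frame.shape[0])
        feed = helmet_feeds.get((row, col))
        if feed is not None:  # region is bound to an operator's helmet camera
            enlarged = cv2.resize(feed, None, fx=2.0, fy=2.0)
            cv2.imshow("helmet detail", enlarged)
```

In use, the callback would be registered with cv2.setMouseCallback on the window that displays the target video.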
S130, under the condition that the operation risk of the field operation information is detected, risk prompt information is generated, and the risk prompt information is displayed.
The risk prompt information indicates that an operation risk exists at the job site and characterizes the specific risk situation. Optionally, it may include the location of the operator at risk, the specific risky operation, and/or a risk index for that operation.
In the present embodiment, in the case where the field operation information is obtained, the field operation information may be detected. Under the condition that the operation risk of the field operation information is detected, risk prompt information can be generated according to the field operation information with the operation risk, and the generated risk prompt information is displayed. The risk prompt information can be displayed on a display interface of any terminal equipment associated with the operation. Optionally, the risk prompt information can be displayed on a terminal equipment interface of the background system; or the risk prompt information can be displayed on the interface of the terminal equipment of the related staff, etc.
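As a small illustration of this step, the sketch below assembles a textual risk prompt from the detected risk; all field names are illustrative, since the embodiment only requires that the location, the risky operation, and/or a risk index be conveyed.

```python
def build_risk_prompt(worker_id, location, operation, risk_index):
    """Assemble a human-readable risk prompt for a terminal interface."""
    return (f"[RISK] operator {worker_id} at {location}: "
            f"risky operation '{operation}' (risk index {risk_index:.2f})")

# Example: build_risk_prompt("W-03", "pole section B", "no safety belt", 0.87)
```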
According to the technical scheme, field video data of a plurality of video sources at the job site are acquired, the field video data comprising panoramic video data of the job site collected by at least one first photographing device and local video data of the job site collected by at least one second photographing device. A target video is then generated based on the field video data of the plurality of video sources, field operation information of the operator is determined based on the target video, and safety detection is performed on the field operation information based on preset safety detection conditions. Further, when an operation risk is detected in the field operation information, risk prompt information is generated and displayed. This solves the problem in the related art that the operating behavior of operators on the job site cannot be detected in real time, so that safety detection and risk perception of the job site cannot be carried out in time, and achieves video acquisition of the job site from a plurality of video sources together with safety detection and risk perception based on their field video data. Moreover, by collecting the local video data, the work details of the job site can be inspected on video, which improves the efficiency and accuracy of safety detection and risk perception.
Example 2
Fig. 2 is a flowchart of a video detection method for a job site according to a second embodiment of the present invention. On the basis of the foregoing embodiment, generating a target video based on field video data of a plurality of video sources includes: respectively determining the first video frames to be fused in the field video data of each video source, and respectively determining the keypoints and descriptors in each first video frame; determining target point pairs in the first video frames according to the keypoints and descriptors, and determining a target transformation matrix according to the keypoints in the target point pairs; and aligning the field video data of the video sources based on the target transformation matrix and fusing the aligned field video data to obtain the target video. Reference may be made to the technical solution of this embodiment for the specific implementation. Technical terms identical or similar to those of the above embodiments are not repeated here.
As shown in fig. 2, the method includes:
s210, acquiring field video data of a plurality of video sources of a working field.
S220, respectively determining first video frames to be fused in the field video data of each video source, and respectively determining key points and descriptors in the first video frames.
A first video frame is a video frame of the field video data that is to take part in video data fusion; it may be any video frame of the field video data. It should be noted that the first video frames to be fused may be all video frames contained in the field video data, or only those video frames that characterize its key features. Keypoints characterize key features in the corresponding video frame and may be feature points of any object in it; optionally, they may include head or limb feature points of an operator, or feature points of static objects (e.g., a building, a tree, or a hill) contained in the frame. A descriptor is a vector that encodes the local image appearance around a keypoint, so that keypoints from different video frames can be compared and matched.
In this embodiment, the live video data of each video source may be processed according to a video frame determining algorithm, and a first video frame to be fused in the live video data of each video source may be obtained. Further, each first video frame may be processed according to a feature detection algorithm, and the key point and the descriptor in each video frame may be obtained.
S230, determining target point pairs in the first video frames according to the key points and the descriptors, and determining a target transformation matrix according to the key points in the target point pairs.
A target point pair comprises at least two keypoints corresponding to the same position on the job site. The target transformation matrix specifies how a video frame or image is mapped from one coordinate system to another; in other words, it can be used to align one video frame onto another. It may be a two-dimensional or three-dimensional matrix whose elements define how each pixel of a video frame is transformed.
As an optional implementation of this embodiment, after the keypoints and descriptors are obtained, they may be processed with a matching algorithm, so that groups of at least two keypoints corresponding to the same position on the job site are identified based on the matching algorithm and the descriptors, yielding the target point pairs across the first video frames. The target point pairs may then be processed with a least-squares algorithm to determine the target transformation matrix from their keypoints.
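The following is a minimal sketch of this keypoint/descriptor pipeline. ORB features, brute-force Hamming matching, and RANSAC homography estimation are concrete stand-ins chosen for illustration; the embodiment names keypoints, descriptors, matching, and a least-squares fit without fixing these particular algorithms.

```python
import cv2
import numpy as np

def estimate_target_transform(frame_a, frame_b):
    """Detect keypoints/descriptors in two first video frames, match them
    into target point pairs, and fit the target transformation matrix."""
    orb = cv2.ORB_create(nfeatures=2000)
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    kp_a, des_a = orb.detectAndCompute(gray_a, None)
    kp_b, des_b = orb.detectAndCompute(gray_b, None)

    # Cross-checked Hamming matching pairs descriptors of the same scene point.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    if len(matches) < 4:
        return None  # a homography needs at least four target point pairs

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC-robust homography mapping frame_a coordinates onto frame_b.
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```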
S240, aligning field video data of a plurality of video sources based on a target transformation matrix, fusing the aligned field video data to obtain a target video, determining field operation information of operators based on the target video, and performing safety detection on the field operation information based on preset safety detection conditions.
In this embodiment, after the target transformation matrix is obtained, for the live video data of multiple video sources, the live video data of the video source may be subjected to preset video frame transformation processing based on the target transformation matrix, so as to achieve alignment with the live video data of another video source. Further, after all of the live video data of the plurality of video sources are aligned, the aligned live video data can be obtained. The preset video frame transformation may include various transformation modes, and optionally affine transformation or homography transformation and the like.
Furthermore, the aligned field video data can be fused according to a video fusion algorithm. Further, the fused live video data may be used as a target video. Then, the field operation information of the operator can be determined based on the target video, and the field operation information can be safely detected based on preset safety detection conditions.
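Continuing the sketch above, alignment and fusion of one pair of frames could look as follows; per-pixel maximum is a deliberately simple stand-in for the unspecified video fusion algorithm.

```python
import cv2
import numpy as np

def align_and_fuse(frame_a, frame_b, H):
    """Warp frame_a into frame_b's coordinate system using the target
    transformation matrix H, then fuse the two aligned frames."""
    h, w = frame_b.shape[:2]
    warped_a = cv2.warpPerspective(frame_a, H, (w, h))
    # Wherever the warped frame has content it contributes; elsewhere the
    # black border lets frame_b show through.
    return np.maximum(warped_a, frame_b)
```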
S250, under the condition that the operation risk of the field operation information is detected, risk prompt information is generated, and the risk prompt information is displayed.
According to the technical scheme, the first video frames to be fused in the field video data of each video source are respectively determined, and key points and descriptors in the first video frames are respectively determined. And then determining target point pairs in the first video frames according to the key points and the descriptors, and determining a target transformation matrix according to the key points in the target point pairs. And then, aligning the field video data of the plurality of video sources based on the target transformation matrix, and fusing the aligned field video data to obtain a target video, thereby realizing the effects of aligning the field video data of the plurality of video sources based on key points and descriptors in video frames and fusing the aligned field video data, and further improving the matching degree among the plurality of field video data in the target video.
Example 3
Fig. 3 is a flowchart of a video detection method for a job site according to a third embodiment of the present invention. On the basis of the foregoing embodiments, generating a target video based on field video data of a plurality of video sources includes: determining first position information and a first display size corresponding to the safety helmet, and dividing the panoramic video data into a plurality of first field-of-view regions according to the first position information; respectively determining second position information and a second display size corresponding to each first field-of-view region; and fusing the panoramic video data and the local video data according to the first position information, the first display size, the second position information, and the second display size to obtain the target video. Reference may be made to the technical solution of this embodiment for the specific implementation. Technical terms identical or similar to those of the above embodiments are not repeated here.
As shown in fig. 3, the method includes:
S310, acquiring field video data of a plurality of video sources of a working field; the field video data comprise panoramic video data of the operation field acquired by at least one first shooting device and local video data of the operation field acquired by at least one second shooting device.
S320, determining first position information and a first display size corresponding to the safety helmet, and dividing the panoramic video data into a plurality of first visual field areas according to the first position information.
Wherein the first location information may be used to indicate a specific location of the helmet in the work site. The first location information may be determined based on coordinates of the helmet in a coordinate system corresponding to the job site. The first position information may be represented based on planar coordinates or spherical coordinates. The first display size may be used to indicate a display size of any object in the corresponding local video data. The first display size may include one or more of a display length, a display width, a display radius, and a display area. The first display size may be determined based on photographing parameters set by the zoom photographing apparatus. The first field of view region may be understood as a partial field of view region in the panoramic field of view region corresponding to the panoramic video data.
In this embodiment, the position information of the helmet in the operation site may be determined, and the first position information may be obtained. And determining a shooting visual field according to shooting parameters of the zooming shooting device arranged on the safety helmet so as to obtain a first display size corresponding to the safety helmet. Further, the panoramic video data may be divided into a plurality of first field of view areas according to the first location information.
As an optional implementation of this embodiment, after the first position information of the safety helmet is obtained, the center point of the panoramic field of view corresponding to the panoramic video data may be determined according to the first position information. The panoramic video data may then be divided into regions according to the center point and a preset region size to obtain a plurality of first field-of-view regions; in this case the region center point of the first field-of-view region located at the center of the plurality of regions corresponds to the first position information. The preset region size is the dividing size according to which the regions are partitioned, and may be any region size.
As another optional implementation manner of this embodiment, after the first position information of the safety helmet is obtained, a center point of a panoramic view field corresponding to the panoramic video data may be determined according to the first position information. Further, the panoramic video data may be divided into a plurality of first field of view regions according to the field of view size corresponding to the center point and the local imaging device.
It should be noted that, the area division of the panoramic video data may be performed by a grid division or a sector division, which is not particularly limited in this embodiment.
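A minimal sketch of the grid-style division follows; the grid granularity and the point-in-region lookup are illustrative assumptions.

```python
def divide_into_regions(pano_w, pano_h, rows=3, cols=4):
    """Split the panoramic frame into rows x cols first field-of-view
    regions, returned as (x, y, w, h) rectangles in panoramic coordinates."""
    cell_w, cell_h = pano_w // cols, pano_h // rows
    return [(c * cell_w, r * cell_h, cell_w, cell_h)
            for r in range(rows) for c in range(cols)]

def region_containing(point, regions):
    """Return the first field-of-view region that contains the given point
    (e.g., the helmet's first position information in panoramic coordinates)."""
    px, py = point
    for x, y, w, h in regions:
        if x <= px < x + w and y <= py < y + h:
            return (x, y, w, h)
    return None
```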
S330, second position information and second display sizes corresponding to each first visual field area are respectively determined.
The second location information may be used to characterize a specific location of the first field of view region in a panoramic field of view region corresponding to the panoramic video data. The second location information may be determined based on coordinates of one or more preset keypoints (e.g., region center point, region vertex, or any point in the region, etc.) in the first field of view region in the panoramic field of view region. The second display size may be used to characterize the display size of any object in the panoramic video data. The second display size may be determined according to photographing parameters set by the first photographing device that collects panoramic video data.
In this embodiment, after the plurality of first field-of-view regions are obtained, the position information and size information of the first keypoint of each first field-of-view region within the panoramic field of view may be determined, and these may be taken as the second position information and the second display size corresponding to that region.
S340, according to the first position information, the first display size, the second position information and the second display size, the panoramic video data and the local video data are fused to obtain a target video, the field operation information of the operator is determined based on the target video, and the field operation information is subjected to safety detection based on preset safety detection conditions.
In this embodiment, the first display size corresponds to the shooting parameters of the zoom photographing device arranged on the safety helmet, and the second display size corresponds to the shooting parameters of the first photographing device that captures the panoramic video. In other words, for the same target object, its display size in the local video data is larger than its display size in the panoramic video data.
As an optional implementation manner of this embodiment, in a case where the first field of view is divided based on a preset area size, a video frame size corresponding to the local video data may be determined, and a ratio between the preset area size and the video frame size may be determined, and the ratio may be used as the area scaling ratio. Then, the local video data can be subjected to region scaling processing based on the region scaling ratio, and scaled local video data is obtained. Then, a display scaling ratio may be determined based on the first display size and the second display size, and display scaling processing may be performed on the scaled local video data according to the display scaling ratio, so that a display size corresponding to the local video data obtained by the processing corresponds to the second display size. Furthermore, the local video data can be updated to the first field of view area of the corresponding position in the panoramic video data based on the first position information and the second position information, so that fusion of the local video data and the panoramic video data is completed, and the target video is obtained.
As another optional implementation manner of this embodiment, in a case where the first field of view is divided based on the field of view size, the display scaling may be determined based on the first display size and the second display size, and the local video data may be subjected to display scaling processing according to the display scaling, so that the display size corresponding to the local video data obtained by processing corresponds to the second display size. Furthermore, the local video data can be updated to the first field of view area of the corresponding position in the panoramic video data based on the first position information and the second position information, so that fusion of the local video data and the panoramic video data is completed, and the target video is obtained.
Further, on-site operation information of the operator is determined based on the target video, and safety detection is performed on the on-site operation information based on preset safety detection conditions.
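The fusion step described above can be sketched as follows: the local (helmet) frame is rescaled to the first field-of-view region at the corresponding position and written into the panoramic frame. The direct overwrite and the use of the region rectangle from the division sketch above are illustrative simplifications of the display-scaling computation.

```python
import cv2

def paste_local_into_panorama(pano, local, region):
    """Scale the local frame to its first field-of-view region and fuse it
    into the panoramic frame at that position."""
    x, y, w, h = region  # rectangle from divide_into_regions above
    scaled = cv2.resize(local, (w, h), interpolation=cv2.INTER_AREA)
    fused = pano.copy()
    fused[y:y + h, x:x + w] = scaled
    return fused
```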
S350, under the condition that the operation risk of the field operation information is detected, risk prompt information is generated, and the risk prompt information is displayed.
According to the technical scheme, the first position information and the first display size corresponding to the safety helmet are determined, and the panoramic video data are divided into a plurality of first field-of-view regions according to the first position information; the second position information and the second display size corresponding to each first field-of-view region are respectively determined; and the panoramic video data and the local video data are fused according to the first position information, the first display size, the second position information, and the second display size to obtain the target video. Video fusion of panoramic and local video data is thereby achieved according to the position information and display size of the safety helmet and the display size of the first photographing device, which strengthens the match between the local video data and the panoramic video data in the target video.
Example 4
Fig. 4 is a schematic structural diagram of a video detection device for an operation site according to a fourth embodiment of the present invention. As shown in fig. 4, the apparatus includes: a video data acquisition module 410, an operational safety detection module 420, and an operational risk prompting module 430.
The video data acquisition module 410 is configured to acquire field video data of a plurality of video sources of a job site; the field video data comprise panoramic video data of an operation field acquired by at least one first shooting device and local video data of the operation field acquired by at least one second shooting device; the second shooting device comprises a zooming shooting device arranged on a safety helmet worn by an operator; an operation safety detection module 420, configured to generate a target video based on the live video data of a plurality of video sources, determine live operation information of the operator based on the target video, and perform safety detection on the live operation information based on preset safety detection conditions; the operation risk prompting module 430 is configured to generate risk prompting information and display the risk prompting information when detecting that the operation risk exists in the field operation information.
According to the technical scheme, field video data of a plurality of video sources at the job site are acquired, the field video data comprising panoramic video data of the job site collected by at least one first photographing device and local video data of the job site collected by at least one second photographing device. A target video is then generated based on the field video data of the plurality of video sources, field operation information of the operator is determined based on the target video, and safety detection is performed on the field operation information based on preset safety detection conditions. Further, when an operation risk is detected in the field operation information, risk prompt information is generated and displayed. This solves the problem in the related art that the operating behavior of operators on the job site cannot be detected in real time, so that safety detection and risk perception of the job site cannot be carried out in time, and achieves video acquisition of the job site from a plurality of video sources together with safety detection and risk perception based on their field video data. Moreover, by collecting the local video data, the work details of the job site can be inspected on video, which improves the efficiency and accuracy of safety detection and risk perception.
Optionally, the operation safety detection module 420 includes a video frame determination unit, a transformation matrix determination unit, and a video fusion unit.
The video frame determining unit is used for determining first video frames to be fused in the field video data of each video source respectively and determining key points and descriptors in the first video frames respectively;
A transformation matrix determining unit, configured to determine target point pairs in the plurality of first video frames according to the plurality of keypoints and the descriptors, and determine a target transformation matrix according to the keypoints in the target point pairs; wherein the target point pair comprises at least two key points corresponding to the same position in the operation site;
and the video fusion unit is used for aligning the field video data of the video sources based on the target transformation matrix and fusing the aligned field video data to obtain a target video.
Optionally, the operation safety detection module 420 includes: an operation information determination unit.
And the operation information determining unit is used for determining a plurality of continuous second video frames containing operators in the target video, and respectively inputting each second video frame into an operation detection model so as to obtain the field operation information of the operators.
Optionally, the operation safety detection module 420 includes: the safety detection unit is operated.
The operation safety detection unit is used for inputting the field operation information into an operation identification model to obtain an operation detection result corresponding to the field operation information, wherein the operation detection result is either an operation risk or a standard operation.
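As a rough sketch of this two-stage pipeline, assuming hypothetical PyTorch models operation_model (per-frame extractor of field operation information) and recognition_model (risk-vs-standard classifier); the embodiment does not specify the architectures:

```python
import torch

def detect_and_check(frames, operation_model, recognition_model, device="cpu"):
    """Two-stage check: per-frame operation information, then risk classification.

    frames            -- consecutive H x W x 3 uint8 arrays containing the operator
    operation_model   -- hypothetical per-frame behavior/feature extractor
    recognition_model -- hypothetical binary classifier (risk vs. standard)
    """
    feats = []
    with torch.no_grad():
        for frame in frames:
            x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            feats.append(operation_model(x.to(device)))  # field operation information
        pooled = torch.cat(feats, dim=0).mean(dim=0, keepdim=True)  # temporal pooling
        logits = recognition_model(pooled)
    return "operation risk" if logits.argmax(dim=1).item() == 1 else "standard operation"
```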
Optionally, the operation safety detection module 420 includes: the device comprises a visual field area dividing unit, a display size determining unit and a video data fusion unit.
The visual field area dividing unit is used for determining first position information and a first display size corresponding to the safety helmet, and dividing the panoramic video data into a plurality of first visual field areas according to the first position information;
the display size determining unit is used for determining second position information and a second display size corresponding to each first visual field area;
the video data fusion unit is used for fusing the panoramic video data and the local video data according to the first position information, the first display size, the second position information and the second display size to obtain the target video.
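A minimal sketch of this picture-in-picture style fusion, assuming the second position information is a top-left pixel coordinate and the second display size a (width, height) pair (one plausible reading of the embodiment, not a prescribed implementation):

```python
import cv2

def overlay_local(pano, local, pos, size):
    """Paste a resized local frame into the panoramic frame (picture-in-picture).

    pos  -- (x, y) top-left corner: the second position information
    size -- (w, h) target size: the second display size
    """
    x, y = pos
    w, h = size
    out = pano.copy()
    out[y:y + h, x:x + w] = cv2.resize(local, (w, h))  # assumes region fits in frame
    return out
```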
Optionally, the apparatus further includes: and the video frame dividing module and the video amplifying display module.
The video frame dividing module is configured to divide video frames in the target video into a plurality of second visual field areas after the target video is generated based on the field video data of the plurality of video sources;
the video amplification display module is configured to, upon receiving a view triggering operation for a second visual field area that contains an operator, determine the safety helmet worn by that operator and display the field video data collected by the safety helmet in an enlarged view, as sketched below.
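A minimal sketch of mapping a view-triggering click to a second visual field area, assuming for illustration that the frame is divided into a uniform grid (the embodiment does not fix the division scheme):

```python
def region_of_click(frame_shape, click_xy, rows=2, cols=3):
    """Map a view-triggering click to the index of a second visual field area."""
    h, w = frame_shape[:2]
    x, y = click_xy
    col = min(int(x * cols / w), cols - 1)
    row = min(int(y * rows / h), rows - 1)
    return row * cols + col  # caller checks whether this area contains an operator
```

The caller would then check whether the indexed area contains an operator (e.g., via a person detector) before enlarging the corresponding helmet feed.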
Optionally, the apparatus further includes: and a video updating module.
The video updating module is used for, before the target video is generated based on the field video data of the plurality of video sources, determining overlapping shooting areas of the plurality of field video data according to the position information and shooting field of view of each first shooting device and each second shooting device, and updating the field video data based on the overlapping shooting areas; one possible geometric realization is sketched below.
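One simple way to realize this, assuming each camera's ground coverage is approximated from its position and shooting field of view as an axis-aligned rectangle (an assumption made here for illustration; the embodiment leaves the geometry open):

```python
def overlap_region(rect_a, rect_b):
    """Intersect two ground-coverage rectangles (x0, y0, x1, y1), each derived
    from a camera's position and shooting field of view; None if disjoint."""
    x0, y0 = max(rect_a[0], rect_b[0]), max(rect_a[1], rect_b[1])
    x1, y1 = min(rect_a[2], rect_b[2]), min(rect_a[3], rect_b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None
```

Pixels falling inside the returned overlap could then be kept from a single source when updating the field video data, avoiding duplicated content in the fused target video.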
Optionally, the apparatus further includes: and a video preprocessing module.
The video preprocessing module is used for preprocessing the field video data of at least one video source before the target video is generated based on the field video data of the plurality of video sources, wherein the preprocessing includes de-distortion processing and/or image correction processing.
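A minimal sketch of the de-distortion preprocessing using OpenCV, assuming camera intrinsics and distortion coefficients obtained beforehand (e.g., from cv2.calibrateCamera):

```python
import cv2

def undistort_frame(frame, camera_matrix, dist_coeffs):
    """Remove lens distortion from a single frame using calibrated intrinsics."""
    h, w = frame.shape[:2]
    new_k, _ = cv2.getOptimalNewCameraMatrix(camera_matrix, dist_coeffs, (w, h), 1)
    return cv2.undistort(frame, camera_matrix, dist_coeffs, None, new_k)
```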
The video detection device for an operation site provided by the embodiment of the invention can execute the video detection method for an operation site provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example five
Fig. 5 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown here, their connections and relationships, and their functions are exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 5, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14, to which an input/output (I/O) interface 15 is also connected.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the video detection method for an operation site.
In some embodiments, the video detection method for an operation site may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the video detection method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the video detection method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; their relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system, which overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present invention are achieved; the present invention imposes no limitation in this respect.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A video detection method for an operation site, comprising:
Acquiring field video data of a plurality of video sources of an operation site; the field video data comprise panoramic video data of the operation site acquired by at least one first shooting device and local video data of the operation site acquired by at least one second shooting device; the second shooting device comprises a zooming shooting device arranged on a safety helmet worn by an operator;
Generating a target video based on the field video data of a plurality of video sources, determining field operation information of the operator based on the target video, and performing safety detection on the field operation information based on preset safety detection conditions;
And when an operation risk is detected in the field operation information, generating risk prompt information and displaying the risk prompt information.
2. The method of claim 1, wherein the generating a target video based on the live video data of a plurality of video sources comprises:
respectively determining first video frames to be fused in the field video data of each video source, and respectively determining key points and descriptors in the first video frames;
Determining target point pairs in a plurality of first video frames according to the key points and the descriptors, and determining a target transformation matrix according to the key points in the target point pairs; wherein the target point pair comprises at least two key points corresponding to the same position in the operation site;
And aligning the field video data of the video sources based on the target transformation matrix, and fusing the aligned field video data to obtain a target video.
3. The method of claim 1, wherein the determining on-site operation information of the worker based on the target video comprises:
And determining a plurality of continuous second video frames containing operators in the target video, and respectively inputting each second video frame into an operation detection model to obtain the field operation information of the operators.
4. The method of claim 1, wherein the performing the safety inspection of the field operation information based on the preset safety inspection condition comprises:
And inputting the field operation information into an operation identification model to obtain an operation detection result corresponding to the field operation information, wherein the operation detection result comprises operation risk and standard operation.
5. The method of claim 1, wherein the generating a target video based on the live video data of a plurality of video sources comprises:
determining first position information and a first display size corresponding to the safety helmet, and dividing the panoramic video data into a plurality of first visual field areas according to the first position information;
respectively determining second position information and second display size corresponding to each first visual field area;
And fusing the panoramic video data and the local video data according to the first position information, the first display size, the second position information and the second display size to obtain a target video.
6. The method of claim 1, further comprising, after the generating a target video based on the live video data of the plurality of video sources:
Dividing a video frame in the target video into a plurality of second field of view areas;
And when a view triggering operation for a second visual field area is received and the triggered second visual field area contains an operator, determining the safety helmet worn by the operator, and displaying the field video data collected by the safety helmet in an enlarged manner.
7. The method of claim 1, further comprising, prior to the generating a target video based on the live video data of the plurality of video sources:
And determining overlapping shooting areas of a plurality of field video data according to the position information and shooting fields of each first shooting device and each second shooting device, and updating the field video data based on the overlapping shooting areas.
8. The method of claim 1, further comprising, prior to the generating a target video based on the live video data of the plurality of video sources:
Preprocessing the live video data of at least one of the video sources, wherein the preprocessing includes distortion removal processing and/or image correction processing.
9. A video detection device for an operation site, comprising:
The video data acquisition module is used for acquiring field video data of a plurality of video sources of the operation field; the field video data comprise panoramic video data of an operation field acquired by at least one first shooting device and local video data of the operation field acquired by at least one second shooting device; the second shooting device comprises a zooming shooting device arranged on a safety helmet worn by an operator;
the operation safety detection module is used for generating a target video based on the field video data of a plurality of video sources, determining field operation information of the operator based on the target video, and carrying out safety detection on the field operation information based on preset safety detection conditions;
the operation risk prompting module is used for generating risk prompting information and displaying the risk prompting information under the condition that the operation risk of the field operation information is detected.
10. An electronic device, the electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the video detection method for an operation site of any one of claims 1-8.
CN202410617616.7A 2024-05-17 2024-05-17 Video detection method and device for operation site, electronic equipment and storage medium Pending CN118537773A (en)

Priority Applications (1)

CN202410617616.7A (priority date 2024-05-17, filed 2024-05-17): Video detection method and device for operation site, electronic equipment and storage medium

Publications (1)

CN118537773A, published 2024-08-23

Family ID: 92382231

Country Status (1)

CN: CN118537773A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination