CN111539358A - Working state determination method and device, computer equipment and storage medium


Info

Publication number
CN111539358A
CN111539358A (application CN202010348360.6A)
Authority
CN
China
Prior art keywords
video
processed
behavior
working state
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010348360.6A
Other languages
Chinese (zh)
Inventor
周康明
陈�光
Current Assignee
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010348360.6A
Publication of CN111539358A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a working state determination method and apparatus, a computer device, and a storage medium. The method comprises: acquiring a video to be processed, the video to be processed being a video segment extracted from an original video and containing an object to be detected; inputting the video to be processed into a pre-trained behavior recognition model, extracting spatial features and motion features of the video through the model, and predicting the behavior action of the object to be detected from those features; and determining the working state of the object to be detected according to the behavior action. Adopting the method improves the efficiency of working-state recognition.

Description

Working state determination method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a working state, a computer device, and a storage medium.
Background
With rapid economic development and rising living standards, vehicles have become increasingly common in people's daily lives. To ensure vehicle safety, safety inspectors must perform regular safety inspections.
In the traditional technology, a single frame is extracted from the surveillance video of a vehicle safety inspection, the position of the safety inspector is obtained by recognizing that frame, and the inspector's behavior is then identified from it. Because each behavior must be judged from one or a few still images, this frame-by-frame approach cannot exploit the dynamic course of an action and is inefficient.
Disclosure of Invention
In view of the above technical problems, there is a need for a working state determination method, apparatus, computer device, and storage medium capable of improving the efficiency of behavior recognition.
A working state determination method comprises the following steps:
acquiring a video to be processed, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected;
inputting a video to be processed into a behavior recognition model obtained by pre-training, respectively extracting spatial characteristics and motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain behavior actions of an object to be detected;
and determining the working state of the object to be detected according to the behavior action.
In one embodiment, acquiring a video to be processed includes:
receiving an original video, and extracting image frames from the original video according to a preset frequency;
when a vehicle is identified in an image frame, segmenting the portion of the original video that follows that image frame to obtain a plurality of video segments to be processed.
In one embodiment, when a vehicle is identified from image frames, segmenting an original video after the image frames of the vehicle are identified to obtain a plurality of segments of videos to be processed, including:
when a vehicle and an object to be detected are identified from an original video, carrying out segmentation processing on the original video to obtain a plurality of sections of videos to be processed;
and when the duration of a video segment to be processed is less than a preset duration, merging that segment into the preceding video segment to be processed.
In one embodiment, determining the working state of the object to be detected according to the behavior action comprises:
acquiring a corresponding relation between each pre-stored behavior action and a working state;
acquiring a working state corresponding to the behavior action according to the corresponding relation;
and when the working state is unqualified, generating and outputting warning information.
In one embodiment, after obtaining the working state corresponding to the behavior action according to the corresponding relationship, the method further includes:
when the working state is correspondingly qualified, acquiring the duration time in the qualified working state;
and when the duration time reaches the preset time, generating and outputting qualified prompt information.
In one embodiment, the method for acquiring the behavior recognition model includes:
acquiring video segments corresponding to various vehicle safety inspection scenes, wherein the video segments comprise various types of behavior actions of an object to be detected;
and training the network model according to each video segment to obtain a behavior recognition model, wherein the network model is obtained by performing convolution operation according to the spatial feature extraction channel and the motion feature extraction channel.
In one embodiment, training the network model according to each video segment to obtain a behavior recognition model includes:
acquiring a video classification data set, wherein the video classification data set comprises various types of behavior actions corresponding to various scenes;
pre-training a network model according to the video classification data set to obtain an initial behavior recognition model;
and training the initial behavior recognition model according to each video segment to obtain a behavior recognition model.
An operating condition determining apparatus, the apparatus comprising:
the acquisition module is used for acquiring a video to be processed, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected;
the prediction module is used for inputting the video to be processed into a behavior recognition model obtained by pre-training, respectively extracting the spatial characteristics and the motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain the behavior action of the object to be detected;
and the determining module is used for determining the working state of the object to be detected according to the behavior action.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The working state determination method and apparatus, computer device, and storage medium acquire a video to be processed, which is a video segment extracted from an original video and containing an object to be detected; the video is input into a pre-trained behavior recognition model, which extracts its spatial features and motion features and predicts from them the behavior action of the object to be detected. Performing behavior recognition directly on video data removes the need to decompose the video into single-frame images and recognize a large number of frames individually, which improves the utilization of computer resources; the video also directly captures the dynamic course of a behavior, which improves recognition accuracy. Furthermore, the behavior recognition model stored in the computer in advance extracts both the spatial and the motion features of the object to be detected, recognizing its behavior across multiple dimensions and further improving recognition accuracy.
Drawings
FIG. 1 is a diagram of an application environment of a method for determining an operating state in one embodiment;
fig. 2 is a schematic flow chart of an operation state determination method according to an embodiment;
FIG. 3 is a schematic diagram of a camera used for capturing video scenes in a trench of a chassis station according to one embodiment;
fig. 4 is a schematic diagram of a fast/slow channel connection manner based on a SlowFast network according to an embodiment;
FIG. 5 is a schematic flow chart of an inspector behavior recognition system in a video-based chassis inspection process according to one embodiment;
FIG. 6 is a block diagram showing the construction of an operation state determining apparatus according to an embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The working state determination method provided by the application can be applied to the application environment shown in fig. 1. The image capturing device 102 communicates with the server 104 through a network, and the server 104 may also be connected to the terminal 103 through the network. The server 104 acquires a video to be processed from the image acquisition device 102, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected; inputting a video to be processed into a behavior recognition model obtained by pre-training, respectively extracting spatial characteristics and motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain behavior actions of an object to be detected; and determining the working state of the object to be detected according to the behavior action. Further, the server 104 may also push the working state of the object to be detected to the terminal 103.
The image capturing device 102 may be a device with an image capturing function, such as a camera, the terminal 103 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a schematic flow chart of a working state determination method is provided. The method is described below using its application to the server in fig. 1 as an example, and comprises the following steps:
step 210, a video to be processed is obtained, where the video to be processed is a video segment extracted from an original video and containing an object to be detected.
Specifically, the original video may be video data obtained by an image capture device monitoring a working scene, and the videos to be processed are segments extracted from the original video according to a preset rule. For example, the original video may be divided sequentially, in time order, into segments of a preset duration; the durations of the segments may be equal or may differ. Each segment contains preset data information; for example, each segment should contain the vehicle to be inspected, or each segment should contain the object to be detected.
For example, the working scene corresponding to the original video may be a vehicle safety inspection service scene, in which a camera is arranged to monitor the trench of a chassis inspection station during vehicle chassis inspection and obtain video data. As shown in fig. 3, a schematic view of capturing video in the trench of a chassis station is provided: in fig. 3, the vehicle to be inspected is parked above the trench, and the object to be detected, such as an inspector, inspects the vehicle chassis from within the trench.
Specifically, a target working area corresponding to the vehicle safety inspection service may be determined first, and for example, the target working area may be an area corresponding to the object to be detected for performing safety inspection on the vehicle; and then, the server acquires a monitoring video stream corresponding to the target working area by using the image acquisition equipment, wherein the monitoring video stream is an original video, the original video comprises the staff in the target working area, and further the original video also comprises the vehicle to be detected, which needs to be subjected to safety inspection.
And 220, inputting the video to be processed into a behavior recognition model obtained by pre-training, respectively extracting the spatial characteristics and the motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain the behavior of the object to be detected.
The behavior recognition model recognizes information about the object to be detected from the video to be processed. For example, it can identify the object's behavior actions, such as vehicle inspection work performed with a tool, or other actions such as smoking. The recognition model may be a SlowFast network, a video recognition network that combines a fast pathway and a slow pathway: the two pathways perform convolution operations in parallel, the slow pathway extracting spatial features of the video to be processed and the fast pathway extracting motion features along its time sequence, and the features from both pathways are combined to classify the behavior actions in the video. Specifically, as shown in fig. 4, a schematic diagram of the fast/slow pathway connection in a SlowFast network is provided: after the server obtains a video to be processed, the corresponding video frames are fed into the modules of the slow and fast pathways respectively, the two pathways process different dimensions of the frame data to obtain identification information, and the behavior in the video is predicted from that information.
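To illustrate the two-pathway idea, the frame-sampling step can be sketched as follows. This is a minimal sketch, not the patent's implementation; the function name and the speed ratio `alpha = 8` (a common choice for SlowFast-style networks) are assumptions for illustration.

```python
def sample_pathways(frames, alpha=8):
    """Split a clip into SlowFast-style pathway inputs.

    The fast pathway sees every frame (fine temporal resolution,
    used for motion features); the slow pathway sees only every
    alpha-th frame (coarse temporal stride, used for spatial and
    semantic features).
    """
    fast = list(frames)     # full temporal resolution
    slow = frames[::alpha]  # temporally strided subset
    return slow, fast

# A 32-frame clip with alpha = 8 yields 4 slow frames and 32 fast frames.
clip = list(range(32))
slow, fast = sample_pathways(clip)
```

The two resulting streams would then be convolved separately and their features fused before classification, as the fast/slow channel description above indicates.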
It should be noted that the behavior recognition model may also recognize, from the video to be processed, vehicle information corresponding to the vehicle to be detected. This may include recognizing whether the vehicle is present in the video and, if so, its motion state; for example, the vehicle may be entering the parking space where vehicle detection is performed, leaving that parking space, or standing in it.
And step 230, determining the working state of the object to be detected according to the behavior action.
The working state may be qualified or unqualified. The server judges the working state of the object to be detected from its recognized behavior action: if the behavior action is, for example, smoking, the working state is determined to be unqualified; if the behavior action is vehicle inspection work performed with an inspection tool, the working state is determined to be qualified.
In this embodiment, the received video to be processed is analyzed directly to obtain the behavior action of the object to be detected, from which its working state is derived. In contrast, prior behavior recognition schemes mainly recognize actions from a single picture or a combination of several pictures extracted from the video; one or a few stills of a whole action cannot make good use of its dynamic course, so misjudgments occur easily. The present scheme analyzes the behavior of the object to be detected, such as an inspector, over a whole video segment, fully accounting for the changes in the time sequence during the action, so the recognition result is more accurate.
Moreover, the working state determination method provided in this embodiment automatically identifies the inspector's behavior in the video segment, so no dedicated person is needed to review the footage, saving labor cost; it also helps standardize the inspector's operating behavior and avoids potential safety hazards caused by human negligence.
In one embodiment, acquiring a video to be processed includes: receiving an original video, and extracting image frames from the original video according to a preset frequency; when the vehicle is identified from the image frames, the original video after the image frames of the vehicle are identified is segmented to obtain a plurality of segments of videos to be processed.
Specifically, the server obtains video data from the chassis station trench during a chassis inspection through an image capture device such as a camera, yielding the original video, and then extracts single-frame images from it at a preset frequency, for example one frame per second. The procedure is: starting from the first frame of the video, take one frame every T1 seconds and feed the frames in turn to a vehicle detection model to check whether a vehicle to be detected is above the trench. If one is detected, record the time T_start of that frame in the video, stop vehicle detection, and, taking T_start as the starting point, cut one segment every T seconds to obtain multiple videos to be processed. T1 defaults to 1 and can be adjusted to the specific situation; T may be 10, and in other embodiments it can be lengthened or shortened as the application requires. Further, once the vehicle detection model detects a vehicle above the trench and T_start is recorded, behavior recognition of the object to be detected, such as a vehicle inspector, starts from that moment.
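The frame-sampling scan described above can be sketched as follows. This is a hypothetical helper, not code from the patent: `detect` stands in for the vehicle detection model, and `t1 = 1.0` matches the default sampling interval mentioned in the text.

```python
def find_t_start(duration, detect, t1=1.0):
    """Sample one frame every t1 seconds from t = 0 and return the
    timestamp of the first sampled frame in which `detect` reports a
    vehicle above the trench, or None if no vehicle is ever seen."""
    t = 0.0
    while t <= duration:
        if detect(t):
            return t  # this is T_start in the text's notation
        t += t1
    return None

# Stub detector standing in for the real model: a vehicle appears at t = 3 s.
t_start = find_t_start(duration=10.0, detect=lambda t: t >= 3.0)
```

In the real system `detect` would run the RefineDet-based detector on the decoded frame; here it is a stub so the scanning logic can be shown on its own.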
Further, the vehicle detection model may adopt an improved RefineDet (Single-Shot Refinement Neural Network for Object Detection), a deep-learning object detection algorithm, in which the standard convolution kernels of the RefineDet backbone are replaced with lighter depthwise separable convolutions. A depthwise separable convolution requires far fewer computations than a standard convolution, so the network runs more efficiently. In addition, the dimensionality of the convolution kernels connected to the P3, P4, P5, and P6 feature layers is reduced from 256 to 128. These feature layers relate to the number of target classes the network must detect; since this detection model mainly detects vehicles, the number of classes is small, so the kernel dimensionality can be reduced appropriately, which lowers GPU memory usage at run time and improves efficiency.
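The saving from the depthwise separable replacement can be estimated with a simple operation count. The formulas below are the standard cost model for the two convolution types; the layer sizes in the example are assumptions for illustration, not values from the patent.

```python
def conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulate count of a standard k x k convolution
    over an h x w output feature map."""
    return k * k * c_in * c_out * h * w

def dw_separable_macs(k, c_in, c_out, h, w):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# Example: replacing a 3x3 standard conv (256 -> 256 channels)
# on a 40x40 feature map.
std = conv_macs(3, 256, 256, 40, 40)
sep = dw_separable_macs(3, 256, 256, 40, 40)
ratio = std / sep  # roughly 8.7x fewer multiply-accumulates
```

This is why swapping in depthwise separable convolutions, as the text describes, noticeably reduces compute and speeds up the detector.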
Further, the vehicle detection model is trained as follows. Images are first collected by camera during chassis inspections; the collected images contain vehicles of many brands and types so that the image samples are as diverse as possible. The vehicles in the images are then annotated, giving an annotation result for each image; the images together with their annotations form the training data of the vehicle detection model. The optimized RefineDet network is then trained on this data with a base learning rate base_lr of 0.001, the network parameters are optimized by stochastic gradient descent, and training ends when the network's loss value stops decreasing and the network converges, yielding the vehicle detection model.
In this embodiment, vehicles are identified from image frames extracted from the original video rather than from the original video itself, which reduces the computer's data processing load and improves vehicle identification efficiency. The video is segmented only after a vehicle is successfully identified, and behavior recognition of the object to be detected is then performed on the resulting segments rather than on the whole original video, improving computer utilization and behavior recognition efficiency.
In one embodiment, when a vehicle is identified from image frames, segmenting an original video after the image frames of the vehicle are identified to obtain a plurality of segments of videos to be processed, including: when a vehicle and an object to be detected are identified from an original video, carrying out segmentation processing on the original video to obtain a plurality of sections of videos to be processed; and when the time length corresponding to the video to be processed is less than the preset time length, combining the video to be processed with the video to be processed of the previous section.
In this embodiment, video segmentation is performed only when both the vehicle and the object to be detected are successfully identified in the original video. This guarantees that every segment to be processed contains the vehicle and the object to be detected, i.e. is video data of a working state, so the computer only processes segments carrying useful information, improving its utilization. Furthermore, when the last segment is shorter than T seconds it is merged into the previous segment, so that every segment to be processed contains enough information. This avoids the computer receiving and processing invalid video data, which would waste resources; processing invalid information could also distort the identification result and reduce accuracy.
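The segmentation-and-merge rule described in this embodiment can be sketched as follows. This is a hypothetical helper working on timestamps rather than decoded video, with T = 10 s as in the text.

```python
def segment_video(t_start, t_end, t=10.0):
    """Cut [t_start, t_end] into segments of length t seconds.
    If the final segment would be shorter than t, merge it into the
    preceding segment, per the rule in the text."""
    segments = []
    start = t_start
    while start + t <= t_end:
        segments.append((start, start + t))
        start += t
    leftover = t_end - start
    if leftover > 0:
        if segments:
            # merge the short tail into the previous segment
            prev_start, _ = segments.pop()
            segments.append((prev_start, t_end))
        else:
            # the whole span is shorter than t: keep it as one segment
            segments.append((start, t_end))
    return segments

# A span from T_start = 3 s to 38 s yields three segments; the last
# one is extended to absorb the 5-second tail.
segs = segment_video(3.0, 38.0)
```

With this rule no segment shorter than T ever reaches the behavior recognition model, matching the rationale given above.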
In one embodiment, determining the working state of the object to be detected according to the behavior action comprises: obtaining the pre-stored correspondence between behavior actions and working states; obtaining the working state corresponding to the recognized behavior action from that correspondence; and when the working state is unqualified, generating and outputting warning information. When the working state is qualified, the duration of the qualified working state is obtained, and when it reaches a preset time, qualified prompt information is generated and output.
The server stores in advance the correspondence between behavior actions and working states, where the behavior actions are those the action recognition model can identify, for example smoking actions and work actions. The correspondence maps each type of behavior action to a working state: for example, the smoking action corresponds to an unqualified working state, while the work action corresponds to a qualified working state. In particular, a work action may be the object to be detected, such as an inspector, inspecting vehicle chassis components with a hand hammer.
When the action recognition model recognizes an unqualified behavior action of the object to be detected, warning information can be generated and output; for example, the system prompts the inspector that an irregular action occurred during the chassis inspection and issues the corresponding prompt, such as immediately asking an inspector who is smoking to stop the violation. When the behavior action of the object to be detected is qualified and remains so for the preset duration, the object is judged to have stayed in a qualified working state; for example, if the model recognizes the inspector checking vehicle chassis components with a hand hammer, the system judges that the operation meets the specification and the whole process ends.
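The lookup-and-judge step can be sketched as follows. The table entries, labels, message strings, and the 10-second threshold are hypothetical stand-ins for the pre-stored correspondence and prompts the text describes.

```python
# Hypothetical correspondence table between recognized behavior
# actions and working states, following the examples in the text.
STATE_TABLE = {
    "smoking": "unqualified",
    "hammer_inspection": "qualified",
}

def judge(behavior, duration_s=0.0, required_s=10.0):
    """Map a recognized behavior action to a working state plus an
    optional message: a warning for unqualified states, a pass
    prompt once a qualified state has lasted long enough."""
    state = STATE_TABLE.get(behavior, "unknown")
    if state == "unqualified":
        return state, "warning: irregular operation, please stop"
    if state == "qualified" and duration_s >= required_s:
        return state, "operation meets the specification"
    return state, None  # qualified but not yet long enough, or unknown

state, msg = judge("smoking")
```

In the described system the warning would be pushed to the terminal; here it is returned as a string so the mapping logic stands alone.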
In this embodiment, recognizing the behavior action of the object to be detected yields whether its working state is qualified, and when the action is unqualified a warning can be issued promptly against the irregular operation, ensuring work safety.
In one embodiment, the method for acquiring the behavior recognition model includes: acquiring video segments corresponding to various vehicle safety inspection scenes, wherein the video segments comprise various types of behavior actions of an object to be detected; and training the network model according to each video segment to obtain a behavior recognition model, wherein the network model is obtained by performing convolution operation according to the spatial feature extraction channel and the motion feature extraction channel.
The multiple vehicle safety inspection scenes may correspond to different geographic locations; for example, video segments may be acquired for a Shanghai vehicle safety inspection scene, for a Beijing vehicle safety inspection scene, and so on. The scenes may also correspond to the same geographic location but with image acquisition devices installed at different positions, so that the acquired video segments contain data from more viewing angles. The multiple types of behavior actions may include the behavior actions of safety inspectors during vehicle safety inspection work, covering both qualified and unqualified behavior actions.
Specifically, the training process of the behavior recognition model includes the following steps: a camera collects video of the chassis-component inspection flow in the working state, and video segments corresponding to two actions, an inspector smoking and an inspector inspecting the vehicle chassis component with a hand hammer, are captured, with the duration of each segment controlled at about 10 seconds. The captured video segments of the two behaviors are then used as training data for the behavior recognition model; the SlowFast network is trained with this data, and when network training converges, the behavior recognition model corresponding to the inspector is obtained.
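The SlowFast network mentioned above processes each clip through two pathways: a slow pathway that samples frames sparsely to capture spatial detail, and a fast pathway that samples α times more densely to capture motion. A minimal sketch of this frame-index sampling follows; the parameter names follow the SlowFast design, but the concrete values (τ = 16, α = 8, 25 fps) are assumptions for illustration:

```python
def slowfast_frame_indices(num_frames, tau=16, alpha=8):
    """Return (slow, fast) frame indices for one clip.

    The slow pathway samples one frame every `tau` frames; the fast
    pathway samples `alpha` times more densely (every tau // alpha frames).
    """
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, max(1, tau // alpha)))
    return slow, fast

# A 10-second training clip at an assumed 25 fps gives 250 frames.
slow, fast = slowfast_frame_indices(250)
```

Because the fast stride divides the slow stride, every frame seen by the slow pathway is also seen by the fast pathway, which is what lets the two feature streams be fused at matching time steps.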
In this embodiment, video segments containing richer information can be obtained by collecting video segment data in multiple vehicle safety scenes. For example, image acquisition devices in safety scenes at different geographic locations are installed at different positions, so the collected video segments contain different information; in addition, workers in different areas have different behavior habits during vehicle safety inspection work. Multiple types of video segments can thus be obtained, and training on them gives the behavior recognition model better recognition capability, improving its recognition accuracy and adaptability.
In one embodiment, training the network model according to each video segment to obtain a behavior recognition model includes: acquiring a video classification data set, wherein the video classification data set comprises various types of behavior actions corresponding to various scenes; pre-training a network model according to the video classification data set to obtain an initial behavior recognition model; and training the initial behavior recognition model according to each video segment to obtain a behavior recognition model.
The video classification data set may include various actions of human-object interaction, such as sports involving body motion, human-human interaction, and playing musical instruments. For example, the video classification data set may be the UCF-101 data set, a video classification data set containing video data in 101 categories. Specifically, pre-training may be performed on the UCF-101 data set, and the model may then be fine-tuned using the data corresponding to vehicle safety inspection.
In this embodiment, the network is first pre-trained on an established data set so that it acquires a certain learning ability, and is then trained with the training set of the present application, which improves training precision and accelerates the convergence of network training.
With rapid economic development and rising living standards, vehicles have become increasingly common in people's daily lives. While vehicles bring great convenience to daily travel, their safety problems receive more and more attention. Relevant national laws require motor vehicles to be inspected regularly, and regular vehicle inspection helps owners discover vehicle problems in time, effectively avoiding potential safety hazards. However, as the number of vehicles in use increases year by year, the vehicle safety inspection workload of the relevant departments and organizations increases dramatically. At present, many items in vehicle safety inspection require manual inspection; long inspection hours fatigue inspectors, and human negligence brings potential safety hazards.
Therefore, the present application adopts deep-learning techniques to provide a video-based inspector behavior recognition system for the chassis inspection process, which helps departments and organizations concerned with annual vehicle inspection to standardize inspectors' behavior at work and avoid potential safety hazards caused by human negligence. Specifically, this embodiment can be applied to inspector behavior recognition during the chassis inspection process of annual vehicle inspection: a deep-learning-based vehicle detection model detects whether a vehicle to be inspected is present above the current chassis inspection trench, and when one is present, the behavior of the inspector in the trench is recognized. When the behavior of the inspector inspecting a vehicle chassis component with a hand hammer is recognized, the current inspection flow of the vehicle chassis component is determined to be compliant.
Specifically, as shown in fig. 5, a schematic flow chart of the video-based inspector behavior recognition system for the chassis inspection process is provided. The specific flow is as follows:
Step 510: a camera collects video in the chassis inspection trench.
Step 520: an image frame is taken from the video every second.
Step 530: the vehicle detection model judges whether a vehicle to be detected is present above the inspection trench; if so, go to step 540, otherwise return to step 530 and continue detecting the vehicle from the video.
Step 540: starting from the current time, a video segment is taken every T seconds.
Step 550: the inspector's behavior in the video is recognized.
Step 560: judge whether the inspector's action corresponds to smoking. If so, go to step 570 and prompt that the inspector is smoking illegally; if not, go to step 550 and continue recognizing the inspector's behavior in the video. Further, when the action is judged not to correspond to smoking, a prompt that the inspector is not smoking may also be issued.
It should be noted that, in other embodiments, the execution order of step 560 (judging whether the inspector is smoking) and step 580 (judging whether the inspector is inspecting the chassis component with a hand hammer) is not limited: step 560 may be executed first and step 580 executed when step 560 determines that the inspector is not smoking, or steps 560 and 580 may be executed simultaneously; this is not limited herein.
Step 580: judge whether the inspector's behavior corresponds to inspecting the chassis component with a hand hammer. If so, go to step 590 and prompt that the inspector's behavior meets the specification; if not, go to step 550 and continue recognizing the inspector's behavior in the video.
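The flow of steps 510 through 590 can be sketched as a polling loop. The detection and recognition callables below are hypothetical stubs standing in for the deep-learning models, injected as parameters so the control flow can be shown on its own:

```python
def run_inspection_monitor(get_frame, detect_vehicle, recognize_behavior,
                           notify, max_iters=100):
    """Sketch of the fig. 5 flow: wait for a vehicle above the trench,
    then classify inspector behavior until a compliant action is seen.

    `get_frame`, `detect_vehicle`, `recognize_behavior`, and `notify`
    are injected stubs standing in for the real models and prompts.
    """
    # Steps 520-530: sample frames until a vehicle is detected above the trench.
    for _ in range(max_iters):
        if detect_vehicle(get_frame()):
            break
    else:
        return "no_vehicle"
    # Steps 540-590: classify behavior in successive video segments.
    for _ in range(max_iters):
        action = recognize_behavior()
        if action == "smoking":                  # step 560 -> 570
            notify("Illegal smoking detected")
        elif action == "hammer_inspection":      # step 580 -> 590
            notify("Operation meets the specification")
            return "compliant"
    return "undetermined"
```

The loop terminates once a compliant hammer-inspection action is recognized, matching the "whole process ends" condition described earlier.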
In this embodiment, potential safety hazards caused by human negligence are avoided by standardizing inspectors' operations during the annual vehicle inspection process. The technical scheme adopts a video-based inspector behavior recognition model that automatically recognizes the relevant behaviors of the inspector and gives corresponding prompts.
It should be understood that although the steps in the flowcharts of figs. 2 and 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 2 and 5 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an operation state determination device including:
the acquiring module 610 is configured to acquire a video to be processed, where the video to be processed is a video segment that is extracted from an original video and includes an object to be detected.
The prediction module 620 is configured to input the video to be processed into a behavior recognition model obtained through pre-training, extract spatial features and motion features of the video to be processed through the behavior recognition model, and predict behavior actions of the object to be detected through the spatial features and the motion features.
The determining module 630 is configured to determine the working state of the object to be detected according to the behavior action.
In one embodiment, the obtaining module 610 includes:
and the image frame extraction unit is used for receiving the original video and extracting the image frames from the original video according to a preset frequency.
And the video segment acquisition unit is used for carrying out segmentation processing on the original video after the image frame of the vehicle is identified to obtain a plurality of segments of videos to be processed when the vehicle is identified from the image frame.
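The image frame extraction unit above samples frames from the original video at a preset frequency (one frame per second in the flow of fig. 5). A sketch of the index arithmetic, assuming the video frame rate is known; the function name and default values are illustrative:

```python
def sampled_frame_indices(total_frames, fps, sample_hz=1.0):
    """Indices of the frames to extract when sampling a video of
    `total_frames` frames at `sample_hz` samples per second.

    `fps` is the frame rate of the original video; the step between
    two extracted frames is fps / sample_hz, rounded to an integer.
    """
    step = max(1, round(fps / sample_hz))  # frames between two samples
    return list(range(0, total_frames, step))

# e.g. a 25 fps video sampled once per second keeps every 25th frame
```

In practice the returned indices would drive a video reader that decodes only those frames before passing them to the vehicle detection model.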
In one embodiment, the video segment capture unit comprises:
and the video segment acquisition subunit is used for carrying out segmentation processing on the original video to obtain a plurality of segments of videos to be processed when the vehicle and the object to be detected are identified from the original video.
And the merging subunit is used for merging the video to be processed and the video to be processed of the previous section when the time length corresponding to the video to be processed is less than the preset time length.
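The merging rule above (a to-be-processed video shorter than the preset duration is merged into the previous segment) can be sketched as follows. Segments are represented by their durations in seconds, and the 5-second threshold is an assumed value for the preset time length:

```python
def merge_short_segments(durations, min_seconds=5):
    """Merge each to-be-processed segment shorter than `min_seconds`
    into the previous segment, as described above.

    `durations` is a list of segment lengths in seconds, in order;
    `min_seconds` is an assumed value for the preset time length.
    """
    merged = []
    for d in durations:
        if merged and d < min_seconds:
            merged[-1] += d  # fold the short segment into the previous one
        else:
            merged.append(d)
    return merged
```

A leading short segment has no previous segment to merge into, so it is kept as-is; the source does not specify this edge case, and this choice is an assumption.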
In one embodiment, the determining module 630 includes:
and the corresponding relation acquisition unit is used for acquiring the corresponding relation between each behavior action and the working state which is stored in advance.
And the working state acquisition unit is used for acquiring the working state corresponding to the behavior action according to the corresponding relation.
And the warning unit is used for generating and outputting warning information when the working state corresponds to disqualification.
In one embodiment, the operating state determining apparatus further includes:
and the time acquisition module is used for acquiring the duration time in the qualified working state when the working state is correspondingly qualified.
And the prompt information output module is used for generating and outputting qualified prompt information when the duration time reaches the preset time.
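The duration check above can be sketched as a small state tracker: the qualified streak resets whenever an unqualified action appears, and a prompt is emitted once the streak reaches the preset duration. The 30-second threshold and message text are assumptions:

```python
class QualifiedStateTracker:
    """Track how long the object to be detected has stayed in a
    qualified working state and emit a prompt once a preset duration
    is reached (the 30-second default is an assumed value).
    """

    def __init__(self, preset_seconds=30):
        self.preset_seconds = preset_seconds
        self.qualified_since = None  # timestamp when the qualified streak began

    def update(self, state, now):
        """Feed one (state, timestamp) observation; return the prompt
        string once the qualified duration is reached, else None."""
        if state != "qualified":
            self.qualified_since = None  # reset on any unqualified action
            return None
        if self.qualified_since is None:
            self.qualified_since = now   # qualified streak starts here
        if now - self.qualified_since >= self.preset_seconds:
            return "Operation qualified: inspection flow compliant"
        return None
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the logic deterministic and testable.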
In one embodiment, the operating state determining apparatus further includes:
the training video segment acquisition module is used for acquiring video segments corresponding to various vehicle safety inspection scenes, and the video segments comprise various types of behavior actions of the object to be detected.
And the model acquisition module is used for training the network model according to each video segment to obtain a behavior recognition model, wherein the network model is obtained by performing convolution operation on the spatial feature extraction channel and the motion feature extraction channel.
In one embodiment, the model obtaining module includes:
and the data set acquisition unit is used for acquiring a video classification data set, and the video classification data set comprises various types of behavior actions corresponding to various scenes.
And the initial model acquisition unit is used for pre-training the network model according to the video classification data set to obtain an initial behavior recognition model.
And the identification model acquisition unit is used for training the initial behavior identification model according to each video segment to obtain a behavior identification model.
For the specific limitations of the working state determination device, reference may be made to the above limitations of the working state determination method, which are not repeated here. Each module in the above working state determination device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing the working state determination data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an operational state determination method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of part of the structure related to the scheme of the present application and does not limit the computer devices to which the scheme applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring a video to be processed, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected; inputting a video to be processed into a behavior recognition model obtained by pre-training, respectively extracting spatial characteristics and motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain behavior actions of an object to be detected; and determining the working state of the object to be detected according to the behavior action.
In one embodiment, when executing the computer program, the processor further implements the step of acquiring the video to be processed: receiving an original video, and extracting image frames from the original video at a preset frequency; and when a vehicle is identified from the image frames, segmenting the original video after the image frame in which the vehicle is identified to obtain multiple segments of video to be processed.
In one embodiment, when executing the computer program, the processor further implements the step of segmenting the original video after the image frame in which the vehicle is identified to obtain multiple segments of video to be processed when a vehicle is identified from the image frames: when the vehicle and the object to be detected are identified from the original video, segmenting the original video to obtain multiple segments of video to be processed; and when the duration of a video to be processed is less than a preset duration, merging that video with the previous segment of video to be processed.
In one embodiment, when executing the computer program, the processor further implements the step of determining the working state of the object to be detected according to the behavior action: acquiring a pre-stored correspondence between each behavior action and a working state; acquiring the working state corresponding to the behavior action according to the correspondence; and generating and outputting warning information when the working state corresponds to unqualified.
In one embodiment, after the step of acquiring the working state corresponding to the behavior action according to the correspondence, the processor, when executing the computer program, further implements: when the working state corresponds to qualified, acquiring the duration for which the qualified working state has been maintained; and when the duration reaches a preset duration, generating and outputting a qualified prompt.
In one embodiment, when executing the computer program, the processor further implements the steps of the method for acquiring the behavior recognition model: acquiring video segments corresponding to multiple vehicle safety inspection scenes, wherein the video segments contain multiple types of behavior actions of the object to be detected; and training the network model according to each video segment to obtain the behavior recognition model, wherein the network model performs convolution operations through a spatial feature extraction channel and a motion feature extraction channel.
In one embodiment, when executing the computer program, the processor further implements the step of training the network model according to each video segment to obtain the behavior recognition model: acquiring a video classification data set, the video classification data set containing multiple types of behavior actions corresponding to multiple scenes; pre-training the network model according to the video classification data set to obtain an initial behavior recognition model; and training the initial behavior recognition model according to each video segment to obtain the behavior recognition model.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a video to be processed, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected; inputting a video to be processed into a behavior recognition model obtained by pre-training, respectively extracting spatial characteristics and motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain behavior actions of an object to be detected; and determining the working state of the object to be detected according to the behavior action.
In one embodiment, when executed by the processor, the computer program further implements the step of acquiring the video to be processed: receiving an original video, and extracting image frames from the original video at a preset frequency; and when a vehicle is identified from the image frames, segmenting the original video after the image frame in which the vehicle is identified to obtain multiple segments of video to be processed.
In one embodiment, when executed by the processor, the computer program further implements the step of segmenting the original video after the image frame in which the vehicle is identified to obtain multiple segments of video to be processed when a vehicle is identified from the image frames: when the vehicle and the object to be detected are identified from the original video, segmenting the original video to obtain multiple segments of video to be processed; and when the duration of a video to be processed is less than a preset duration, merging that video with the previous segment of video to be processed.
In one embodiment, when executed by the processor, the computer program further implements the step of determining the working state of the object to be detected according to the behavior action: acquiring a pre-stored correspondence between each behavior action and a working state; acquiring the working state corresponding to the behavior action according to the correspondence; and generating and outputting warning information when the working state corresponds to unqualified.
In one embodiment, after the step of acquiring the working state corresponding to the behavior action according to the correspondence, the computer program, when executed by the processor, further implements: when the working state corresponds to qualified, acquiring the duration for which the qualified working state has been maintained; and when the duration reaches a preset duration, generating and outputting a qualified prompt.
In one embodiment, when executed by the processor, the computer program further implements the steps of the method for acquiring the behavior recognition model: acquiring video segments corresponding to multiple vehicle safety inspection scenes, wherein the video segments contain multiple types of behavior actions of the object to be detected; and training the network model according to each video segment to obtain the behavior recognition model, wherein the network model performs convolution operations through a spatial feature extraction channel and a motion feature extraction channel.
In one embodiment, when executed by the processor, the computer program further implements the step of training the network model according to each video segment to obtain the behavior recognition model: acquiring a video classification data set, the video classification data set containing multiple types of behavior actions corresponding to multiple scenes; pre-training the network model according to the video classification data set to obtain an initial behavior recognition model; and training the initial behavior recognition model according to each video segment to obtain the behavior recognition model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The above embodiments only express several implementations of the present application; their description is specific and detailed, but should not therefore be construed as limiting the scope of the patent. It should be noted that several variations and modifications can be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for determining an operating condition, the method comprising:
acquiring a video to be processed, wherein the video to be processed is a video segment which is extracted from an original video and contains an object to be detected;
inputting the video to be processed into a behavior recognition model obtained by pre-training, respectively extracting the spatial characteristics and the motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain the behavior of the object to be detected;
and determining the working state of the object to be detected according to the behavior action.
2. The method of claim 1, wherein the obtaining the video to be processed comprises:
receiving an original video, and extracting image frames from the original video according to a preset frequency;
and when a vehicle is identified from the image frames, segmenting the original video after the image frame of the vehicle is identified to obtain a plurality of segments of videos to be processed.
3. The method according to claim 2, wherein when a vehicle is identified from the image frames, segmenting the original video after the image frame in which the vehicle is identified to obtain a plurality of pieces of video to be processed comprises:
when the vehicle and the object to be detected are identified from the original video, carrying out segmentation processing on the original video to obtain a plurality of sections of videos to be processed;
and when the time length corresponding to the video to be processed is less than the preset time length, combining the video to be processed with the video to be processed of the previous section.
4. The method of claim 1, wherein determining the working state of the object to be detected according to the behavior action comprises:
acquiring a corresponding relation between each pre-stored behavior action and a working state;
acquiring a working state corresponding to the behavior action according to the corresponding relation;
and generating and outputting warning information when the working state is unqualified correspondingly.
5. The method according to claim 4, wherein after obtaining the working state corresponding to the behavior action according to the corresponding relationship, the method further comprises:
when the working state is correspondingly qualified, acquiring the duration time in the qualified working state;
and when the duration time reaches the preset time, generating and outputting qualified prompt information.
6. The method according to any one of claims 1 to 5, wherein the method for acquiring the behavior recognition model comprises:
acquiring video segments corresponding to various vehicle safety inspection scenes, wherein the video segments comprise various types of behavior actions of an object to be detected;
and training a network model according to each video segment to obtain a behavior recognition model, wherein the network model is obtained by performing convolution operation according to a spatial feature extraction channel and a motion feature extraction channel.
7. The method of claim 6, wherein training a network model from each of the video segments to obtain a behavior recognition model comprises:
acquiring a video classification data set, wherein the video classification data set comprises various types of behavior actions corresponding to various scenes;
pre-training the network model according to the video classification data set to obtain an initial behavior recognition model;
and training the initial behavior recognition model according to each video segment to obtain a behavior recognition model.
8. An operating condition determining apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a video to be processed, and the video to be processed is a video segment which is extracted from an original video and contains an object to be detected;
the prediction module is used for inputting the video to be processed into a behavior recognition model obtained by pre-training, respectively extracting the spatial characteristics and the motion characteristics of the video to be processed through the behavior recognition model, and predicting through the spatial characteristics and the motion characteristics to obtain the behavior action of the object to be detected;
and the determining module is used for determining the working state of the object to be detected according to the behavior action.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010348360.6A 2020-04-28 2020-04-28 Working state determination method and device, computer equipment and storage medium Withdrawn CN111539358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348360.6A CN111539358A (en) 2020-04-28 2020-04-28 Working state determination method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010348360.6A CN111539358A (en) 2020-04-28 2020-04-28 Working state determination method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111539358A true CN111539358A (en) 2020-08-14

Family

ID=71980236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348360.6A Withdrawn CN111539358A (en) 2020-04-28 2020-04-28 Working state determination method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111539358A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487949A (en) * 2020-11-27 2021-03-12 华中师范大学 Learner behavior identification method based on multi-modal data fusion
CN113205073A (en) * 2021-05-28 2021-08-03 青岛海尔工业智能研究院有限公司 Work compliance determination method, work compliance determination device, electronic apparatus, and storage medium
CN114283492A (en) * 2021-10-28 2022-04-05 平安银行股份有限公司 Employee behavior-based work saturation analysis method, device, equipment and medium
CN114283492B (en) * 2021-10-28 2024-04-26 平安银行股份有限公司 Staff behavior-based work saturation analysis method, device, equipment and medium
CN115474000A (en) * 2022-08-16 2022-12-13 支付宝(杭州)信息技术有限公司 Data processing method and device
CN116703227A (en) * 2023-06-14 2023-09-05 快住智能科技(苏州)有限公司 Guest room management method and system based on intelligent service
CN116703227B (en) * 2023-06-14 2024-05-03 快住智能科技(苏州)有限公司 Guest room management method and system based on intelligent service

Similar Documents

Publication Publication Date Title
CN111539358A (en) Working state determination method and device, computer equipment and storage medium
CN109145680B (en) Method, device and equipment for acquiring obstacle information and computer storage medium
CN111382623B (en) Live broadcast auditing method, device, server and storage medium
US11133002B2 (en) Systems and methods of real-time vehicle-based analytics and uses thereof
CN108269333A (en) Face identification method, application server and computer readable storage medium
CN112001230B (en) Sleep behavior monitoring method and device, computer equipment and readable storage medium
CN113052029A (en) Abnormal behavior supervision method and device based on action recognition and storage medium
CN112597867A (en) Face recognition method and system for mask, computer equipment and storage medium
CN109740573B (en) Video analysis method, device, equipment and server
CN113139403A (en) Violation behavior identification method and device, computer equipment and storage medium
CN112507860A (en) Video annotation method, device, equipment and storage medium
CN111539317A (en) Vehicle illegal driving detection method and device, computer equipment and storage medium
JP2017163374A (en) Traffic situation analyzer, traffic situation analyzing method, and traffic situation analysis program
CN112380922A (en) Method and device for determining compound video frame, computer equipment and storage medium
CN110808995B (en) Safety protection method and device
CN113869364A (en) Image processing method, image processing apparatus, electronic device, and medium
CN116189063B (en) Key frame optimization method and device for intelligent video monitoring
CN114463779A (en) Smoking identification method, device, equipment and storage medium
CN114898140A (en) Behavior detection method and device based on PAA algorithm and readable medium
CN113496162B (en) Parking specification identification method, device, computer equipment and storage medium
CN110751065B (en) Training data acquisition method and device
CN113051975B (en) People flow statistics method and related products
Ogawa et al. Identifying Parking Lot Occupancy with YOLOv5
CN114494939A (en) Anti-theft method based on image recognition and related product
KR20170083522A (en) Method for representing face using dna phenotyping, recording medium and device for performing the method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200814)