CN112364695A - Behavior prediction method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN112364695A (application number CN202011089288.6A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- video pictures
- picture
- time sequence
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The invention belongs to the technical field of computers, and particularly relates to a behavior prediction method and device, computer equipment and a storage medium. The behavior prediction method includes: inputting several temporally consecutive frames of video pictures into a preset recurrent neural network model, where the model consists of several time-sequence neural network units; extracting, through each time-sequence neural network unit, the picture features of the correspondingly input video picture; and inputting all the picture features into one of the time-sequence neural network units, which predicts the next action according to the picture features of all the video pictures. The behavior prediction method provided by the embodiment of the invention exploits the memory property of the recurrent neural network model to infer and recognize the picture features of several frames of video pictures, processes them in real time, and can therefore predict behaviors quickly, actively and accurately.
Description
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a behavior prediction method, a behavior prediction device, computer equipment and a storage medium.
Background
In existing industrial and domestic scenes, there is often a need to predict the future state of the environment. For example, an existing home early-warning system can be embedded in a smart camera so that, as a child in the monitored home moves about, a warning is issued before the child performs a potentially dangerous action. However, most existing technologies rely on dedicated personnel watching the monitoring system and predicting behavior by human judgment; they cannot realize intelligent monitoring and behavior prediction, so it is difficult to predict the behavior in the monitored environment timely and accurately.
Therefore, simple monitoring in the prior art can hardly meet modern requirements, and it is difficult to make effective and timely behavior predictions.
Disclosure of Invention
The embodiment of the invention aims to provide a behavior prediction method, so as to solve the problem that behavior prediction is difficult to realize through simple monitoring in the prior art.
The embodiment of the invention is realized in such a way that a behavior prediction method comprises the following steps:
inputting several temporally consecutive frames of video pictures into a preset recurrent neural network model; the preset recurrent neural network model consists of several time-sequence neural network units, and the frames of the video pictures are input to the time-sequence neural network units in a one-to-one correspondence;
respectively extracting picture features of the correspondingly input video pictures through each time sequence neural network unit;
and inputting all the picture characteristics into one of the time sequence neural network units, and predicting a next action by the time sequence neural network units according to the picture characteristics corresponding to all the video pictures.
Preferably, before the step of inputting several frames of video pictures with continuous time into the preset recurrent neural network model, the method further includes:
inputting a preset sample video picture set into a preset recurrent neural network model for training until the preset recurrent neural network model behavior prediction accuracy reaches a preset standard value.
Preferably, when the recurrent neural network model is trained, the method mainly includes:
extracting a plurality of sample video pictures with continuous time in the sample video picture set;
respectively extracting sample picture features of the sample video picture in a one-to-one correspondence mode through each time sequence neural network unit, wherein the sample picture features include but are not limited to preset human body features in the picture, and the preset human body features include but are not limited to position information of human body key points;
inputting all the sample picture characteristics into one of the time sequence neural network units, and predicting a next action by the neural network unit according to the sample picture characteristics corresponding to all the sample video pictures;
and repeating the steps until the converged recurrent neural network model is obtained.
Preferably, in the preset recurrent neural network model, the input and output of each time-series neural network unit are arranged in series, and the video pictures and/or the sample video pictures are input to each time-series neural network unit in series in a one-to-one correspondence manner according to a time sequence.
Preferably, in each of the time-series neural network units connected in series, the preceding time-series neural network unit inputs the picture features extracted by the preceding time-series neural network unit into the next time-series neural network unit for further picture feature extraction.
Preferably, in the preset recurrent neural network model, the number of the sequential neural network units is five.
Preferably, the inputting all the picture features into one of the time-series neural network units, after the time-series neural network unit predicts the next action according to the picture features corresponding to all the video pictures, further includes:
and comparing the predicted next action with a preset action library action, and judging whether the predicted next action belongs to dangerous actions.
It is another object of an embodiment of the present invention to provide a behavior prediction apparatus, which is configured to perform the following steps:
inputting several temporally consecutive frames of video pictures into a preset recurrent neural network model; the preset recurrent neural network model consists of several time-sequence neural network units, and the video pictures are input to the time-sequence neural network units in a one-to-one correspondence;
respectively extracting picture features of the correspondingly input video pictures through each time sequence neural network unit;
and inputting all the picture characteristics into one of the time sequence neural network units, and predicting a next action by the time sequence neural network units according to the picture characteristics corresponding to all the video pictures.
It is a further object of embodiments of the present invention to provide a computer device, comprising a memory and a processor, the memory having stored therein a computer program, which, when executed by the processor, causes the processor to perform the steps of the behavior prediction method.
It is a further object of embodiments of the present invention to provide a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the behavior prediction method.
According to the behavior prediction method provided by the embodiment of the invention, the continuous frames of video pictures are input into the recurrent neural network model for feature extraction and analysis, and the memory characteristics of the recurrent neural network model are further utilized to carry out speculation and identification on the picture features of the frames of video pictures, so that behavior prediction is realized.
Drawings
FIG. 1 is a flow chart of a behavior prediction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a behavior prediction method according to an embodiment of the present invention;
FIG. 3 is a flowchart of training a recurrent neural network model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a recurrent neural network model according to an embodiment of the present invention;
FIG. 5 is a block diagram showing an internal configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
In the embodiment of the present invention, the behavior prediction method may be applied to a terminal or a computer device. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like; the computer device may be an independent physical server or terminal, a server cluster formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud storage and CDN.
Example 1
As shown in fig. 1, in an embodiment, a behavior prediction method is provided, and this embodiment is mainly illustrated by applying the method to the terminal. The behavior prediction method specifically comprises the following steps:
step S101, inputting a plurality of continuous frames of video pictures into a preset recurrent neural network model; the preset cyclic neural network model consists of a plurality of time sequence neural network units, and a plurality of frames of video pictures and the time sequence neural network units are input in a one-to-one correspondence manner;
step S102, respectively extracting picture characteristics of the corresponding input video pictures through each time sequence neural network unit;
step S103, inputting all picture characteristics into one time sequence neural network unit, and predicting the next action by the time sequence neural network unit according to the picture characteristics corresponding to all video pictures.
In the embodiment of the present invention, the preset recurrent neural network (RNN) is a neural network that takes sequence data as input and recurses along the evolution direction of the sequence, with all nodes (recurrent units) connected in a chain. The recurrent neural network has the characteristics of memorability, parameter sharing, and Turing completeness. Common recurrent neural network models include the bidirectional recurrent neural network (Bi-RNN) and the long short-term memory network (LSTM).
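As a rough illustration of the memorability described above (a minimal sketch, not the patent's actual model), a scalar recurrent cell can be written in plain Python: the hidden state carries information from earlier frames into the processing of later ones, so the state reached after the last frame depends on the whole history. The weights below are arbitrary illustrative constants, not trained values.

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.9, b=0.0):
    """One step of a scalar Elman-style recurrent cell.

    The new hidden state depends on both the current input x and the
    previous hidden state h; this dependence on h is how earlier frames
    influence the prediction made after the last frame.
    """
    return math.tanh(w_x * x + w_h * h + b)

def run_sequence(frames):
    """Feed a sequence of per-frame feature values through the cell."""
    h = 0.0
    for x in frames:
        h = rnn_step(x, h)
    return h  # final state summarizes the whole sequence

# The same final frame yields different states for different histories,
# which is the "memory" the method relies on for prediction.
state_a = run_sequence([0.1, 0.2, 0.9])
state_b = run_sequence([0.8, 0.7, 0.9])
```

Practical models (Bi-RNN, LSTM) replace this scalar cell with gated vector-valued units, but the chained hidden-state structure is the same.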
In the embodiment of the present invention, several temporally consecutive frames of pictures are extracted directly from a video. Frame extraction from video is a mature technology supported by many existing software tools, such as common video-editing software or Photoshop; to improve efficiency, a person skilled in the art can also use such software to extract every frame of the video automatically. This is not elaborated further, and a person skilled in the art can select or design a scheme according to the actual situation.
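The patent does not fix how the consecutive frames are chosen; one plausible scheme (the helper name, stride and indices here are illustrative assumptions, not from the patent) is a sliding window of frame indices, with a window of five matching the preferred embodiment's five time-sequence units:

```python
def consecutive_windows(total_frames, window=5, stride=1):
    """Return index windows of `window` consecutive frames.

    Each window can then be handed to the recurrent model as one
    prediction input; decoding the actual frames is left to whatever
    video tool is used.
    """
    return [list(range(s, s + window))
            for s in range(0, total_frames - window + 1, stride)]

windows = consecutive_windows(total_frames=8, window=5)
# windows -> [[0,1,2,3,4], [1,2,3,4,5], [2,3,4,5,6], [3,4,5,6,7]]
```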
In the embodiment of the invention, the time-sequence neural network unit is a basic component of the recurrent neural network, mainly a convolutional neural network unit, and each unit can be modified to a certain extent when necessary. For example, in the embodiment of the invention, apart from the time-sequence neural network unit used in step S103 to receive all the picture features and perform the final behavior prediction, the other units may be modified so that they can perform picture feature extraction as well as input and output. This is only a preferred example, not a strict limitation on the structural framework of each time-sequence neural network unit in the recurrent neural network model.
In the embodiment of the present invention, as shown in fig. 2, before step S101 of inputting several temporally consecutive frames of video pictures into the preset recurrent neural network model, the method further includes:
step S201, inputting a preset sample video picture set into a preset recurrent neural network model for training until the behavior prediction accuracy of the preset recurrent neural network model reaches a preset standard value.
In the embodiment of the invention, a large number of video pictures can be selected for training, and training can also be performed multiple times. For example, after the recurrent neural network model obtained by training on one sample video picture set is used to predict another sample video picture set for testing, the correctly predicted pictures can be used as a new sample set for further training, thereby improving the prediction precision of the recurrent neural network model. Of course, the prediction training of the recurrent neural network model is only illustrated here; the training process and method are not strictly limited, and a person skilled in the art may select other specific training methods according to actual needs.
In the embodiment of the present invention, when the step S201 trains the recurrent neural network model, the method mainly includes:
step S301, a plurality of sample video pictures with continuous time in a sample video picture set are extracted;
step S302, respectively extracting sample picture characteristics of a sample video picture in a one-to-one correspondence manner through each time sequence neural network unit, wherein the sample picture characteristics include but are not limited to preset human body characteristics in the picture, and the preset human body characteristics include but are not limited to position information of human body key points;
step S303, inputting all sample picture characteristics into one of the time sequence neural network units, and predicting the next action by the neural network unit according to the sample picture characteristics corresponding to all sample video pictures;
step S304, repeat the above steps until a converged recurrent neural network model is obtained.
In the embodiment of the invention, extracting several temporally consecutive sample video pictures is in effect equivalent to capturing a short video clip for analysis, and the next action is predicted from the analysis of that short clip.
In the embodiment of the invention, the sample picture features used when training the recurrent neural network model are of the same nature as the picture features used in actual behavior prediction; the different names only distinguish the two stages. The human body features contained in the sample picture features are processed in the same way in the video pictures on which prediction is actually performed.
In the embodiment of the invention, the preset human body features can be the position information of certain human body key points, such as the limbs, the head and the joints. By identifying these key points and combining their position information, the state or posture of the human body in the sample video picture can be obtained, so that the next action can be inferred from consecutive multi-frame pictures by the principles of the neural network.
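The patent names only "limbs, heads, joints" and does not fix a key-point schema, so the point names and the encoding below are hypothetical: one plausible way to turn key-point positions into a per-frame feature is to flatten the (x, y) coordinates into a vector, normalized by the frame size so the feature does not depend on camera resolution.

```python
# Hypothetical key-point names; the patent leaves the schema open.
KEYPOINTS = ["head", "left_hand", "right_hand", "left_knee", "right_knee"]

def pose_feature(keypoints, width, height):
    """Flatten named (x, y) key points into a normalized feature vector.

    This is a sketch of one plausible encoding, not the patent's actual
    feature extraction, which is performed inside the time-sequence
    neural network units.
    """
    vec = []
    for name in KEYPOINTS:
        x, y = keypoints[name]
        vec.extend([x / width, y / height])
    return vec

frame_kp = {"head": (320, 80), "left_hand": (250, 200),
            "right_hand": (390, 200), "left_knee": (300, 380),
            "right_knee": (340, 380)}
feat = pose_feature(frame_kp, width=640, height=480)
```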
In the embodiment of the invention, in the preset recurrent neural network model, the inputs and outputs of the time-sequence neural network units are arranged in series, and the video pictures and/or the sample video pictures are input into the units one-to-one, in series, according to their time order. Specifically, as described above, the recurrent neural network model itself is formed by neural network units connected in a chain structure. In the embodiment of the invention, the video pictures or sample video pictures are input, in time order and one-to-one, into the chained time-sequence neural network units for processing, so that the memory of the recurrent neural network retains the time order of the pictures and behavior prediction is performed in combination with that time order.
In the embodiment of the invention, among the time-sequence neural network units connected in series, each earlier unit inputs the picture features it has extracted into the next unit for further picture feature extraction. Equivalently, each time-sequence neural network unit in the chain receives the picture features sent to it and performs feature extraction on them once more, so that a higher-quality feature map is obtained and the accuracy of behavior prediction is improved.
In the embodiment of the present invention, the number of time-sequence neural network units in the preset recurrent neural network model is five, so each action prediction takes five frames of video pictures as simultaneous input for analysis. When the last time-sequence neural network unit predicts the next action according to the picture features of all the video pictures, a concatenate operation can be performed on the five picture feature maps to obtain a final total feature map, the final action is then estimated from this feature map, and a loss function is applied in the process. The specific inference procedure is not further described here; it rests mainly on the standard functioning of the recurrent neural network.
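The concatenate step above can be sketched as follows. This is a deliberately simplified illustration, not the patent's implementation: each "feature map" is reduced to a flat list of numbers, the action labels are invented for the example, and the linear scoring stands in for the unit's learned prediction head.

```python
def concatenate_features(feature_maps):
    """Concatenate per-frame feature vectors into one total feature.

    In the preferred embodiment five feature maps are concatenated to
    form the final total feature map.
    """
    total = []
    for fmap in feature_maps:
        total.extend(fmap)
    return total

ACTIONS = ["stand", "walk", "fall"]  # illustrative action labels

def predict_action(feature_maps, weights):
    """Score each candidate action against the total feature (sketch)."""
    total = concatenate_features(feature_maps)
    scores = [sum(w * f for w, f in zip(row, total)) for row in weights]
    return ACTIONS[scores.index(max(scores))]

# Five frames, each with a 2-value feature; weights favor "walk".
maps = [[1.0, 0.0]] * 5
weights = [[0.0] * 10, [1.0] * 10, [0.1] * 10]
action = predict_action(maps, weights)
```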
In this embodiment of the present invention, after all the picture features are input into one of the time-sequence neural network units and that unit predicts the next action according to the picture features of all the video pictures, the method further includes:
and comparing the predicted next action with a preset action library action, and judging whether the predicted next action belongs to dangerous actions.
In the embodiment of the invention, the preset action library is mainly created from big-data statistics, and the dangerous behaviors it contains can be customized as needed; this is not elaborated in detail.
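Since the library's contents are left to be customized, the comparison step reduces to a membership check against the configured set of dangerous actions. The action names below are purely illustrative placeholders.

```python
# Illustrative danger list; the patent leaves the library's contents
# to be customized from big-data statistics.
DANGEROUS_ACTIONS = {"fall", "climb_railing", "touch_socket"}

def is_dangerous(predicted_action, action_library=DANGEROUS_ACTIONS):
    """Compare the predicted next action with the preset action library
    and report whether it counts as a dangerous action."""
    return predicted_action in action_library
```

A real system would trigger the early warning when `is_dangerous` returns `True` for the predicted next action.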
According to the behavior prediction method provided by the embodiment of the invention, the continuous frames of video pictures are input into the recurrent neural network model for feature extraction and analysis, and the memory characteristics of the recurrent neural network model are further utilized to carry out speculation and identification on the picture features of the frames of video pictures, so that behavior prediction is realized.
Example 2
In an embodiment, as shown in fig. 4, which is a structural block diagram of a recurrent neural network model provided by an embodiment of the present invention, a behavior prediction apparatus is provided. The apparatus includes a preset recurrent neural network model, may be integrated in a computer device or a terminal, and is configured to perform the following steps:
inputting a plurality of continuous-time frames of video pictures into a preset recurrent neural network model; the preset recurrent neural network model consists of a plurality of time sequence neural network units, and video pictures and the time sequence neural network units are input in a one-to-one correspondence manner;
respectively extracting picture characteristics of the corresponding input video pictures through each time sequence neural network unit;
and inputting all picture characteristics into one time sequence neural network unit, and predicting the next action by the time sequence neural network unit according to the picture characteristics corresponding to all video pictures.
In the embodiment of the present invention, the preset recurrent neural network (RNN) is a neural network that takes sequence data as input and recurses along the evolution direction of the sequence, with all nodes (recurrent units) connected in a chain. The recurrent neural network has the characteristics of memorability, parameter sharing, and Turing completeness. Common recurrent neural network models include the bidirectional recurrent neural network (Bi-RNN) and the long short-term memory network (LSTM).
In the embodiment of the present invention, several temporally consecutive frames of pictures are extracted directly from a video. Frame extraction from video is a mature technology supported by many existing software tools, such as common video-editing software or Photoshop; to improve efficiency, a person skilled in the art can also use such software to extract every frame of the video automatically. This is not elaborated further, and a person skilled in the art can select or design a scheme according to the actual situation.
In the embodiment of the invention, the time-sequence neural network unit is a basic component of the recurrent neural network, mainly a convolutional neural network unit, and each unit can be modified to a certain extent when necessary. For example, in the embodiment of the invention, apart from the time-sequence neural network unit used to receive all the picture features and perform the final behavior prediction, the other units may be modified so that they can perform picture feature extraction as well as input and output. This is only a preferred example, not a strict limitation on the structural framework of each time-sequence neural network unit in the recurrent neural network model.
In this embodiment of the present invention, before inputting several temporally consecutive frames of video pictures into the preset recurrent neural network model, the behavior prediction apparatus is further configured to perform:
and inputting a preset sample video picture set into a preset recurrent neural network model for training until the behavior prediction accuracy of the preset recurrent neural network model reaches a preset standard value.
In the embodiment of the invention, a large number of video pictures can be selected for training, and training can also be performed multiple times. For example, after the recurrent neural network model obtained by training on one sample video picture set is used to predict another sample video picture set for testing, the correctly predicted pictures can be used as a new sample set for further training, thereby improving the prediction precision of the recurrent neural network model. Of course, the prediction training of the recurrent neural network model is only illustrated here; the training process and method are not strictly limited, and a person skilled in the art may select other specific training methods according to actual needs.
In the embodiment of the present invention, when the behavior prediction apparatus trains the recurrent neural network model, the method mainly includes:
extracting a plurality of sample video pictures with continuous time in a sample video picture set;
respectively extracting sample picture characteristics of a sample video picture in a one-to-one correspondence mode through each time sequence neural network unit, wherein the sample picture characteristics comprise but are not limited to preset human body characteristics in the picture, and the preset human body characteristics comprise but are not limited to position information of human body key points;
inputting all sample picture characteristics into one of the time sequence neural network units, and predicting the next action by the neural network unit according to the sample picture characteristics corresponding to all sample video pictures;
repeating the steps until a converged recurrent neural network model is obtained.
In the embodiment of the invention, extracting several temporally consecutive sample video pictures is in effect equivalent to capturing a short video clip for analysis, and the next action is predicted from the analysis of that short clip.
In the embodiment of the invention, the sample picture features used when training the recurrent neural network model are of the same nature as the picture features used in actual behavior prediction; the different names only distinguish the two stages. The human body features contained in the sample picture features are processed in the same way in the video pictures on which prediction is actually performed.
In the embodiment of the invention, the preset human body features can be the position information of certain human body key points, such as the limbs, the head and the joints. By identifying these key points and combining their position information, the state or posture of the human body in the sample video picture can be obtained, so that the next action can be inferred from consecutive multi-frame pictures by the principles of the neural network.
In the embodiment of the invention, in the recurrent neural network model preset in the behavior prediction apparatus, the inputs and outputs of the time-sequence neural network units are arranged in series, and the video pictures and/or the sample video pictures are input into the units one-to-one, in series, according to their time order. Specifically, as described above, the recurrent neural network model itself is formed by neural network units connected in a chain structure. In the embodiment of the invention, the video pictures or sample video pictures are input, in time order and one-to-one, into the chained time-sequence neural network units for processing, so that the memory of the recurrent neural network retains the time order of the pictures and behavior prediction is performed in combination with that time order.
In the embodiment of the invention, among the serially connected time-series neural network units, each unit passes the picture features it has extracted to the next unit, which extracts picture features from them again. In other words, each time-series neural network unit in the chain receives the picture features sent to it and performs feature extraction on them once more, yielding a higher-quality feature map and thereby improving the accuracy of behavior prediction.
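The serial arrangement above — one frame per unit, with each unit re-extracting the previous unit's features — can be sketched as a toy chain in pure Python. The fixed scalar weights, the averaging of frame and carried state, and the ReLU-style clipping are all hypothetical simplifications of a learned recurrent unit, not the patent's actual model.

```python
# A toy chain of "time-series neural network units": each unit receives the
# feature vector produced by the previous unit, re-extracts (transforms) it,
# and passes the result on. Weights are fixed scalars purely for illustration;
# a real model would learn them during training.
def make_unit(weight, bias):
    def unit(features):
        # "re-extraction": a simple affine transform with clipping, standing
        # in for a learned feature extractor
        return [max(0.0, weight * f + bias) for f in features]
    return unit

# five units connected in series, one per input frame (hypothetical weights)
units = [make_unit(w, 0.1) for w in (0.9, 1.1, 0.8, 1.2, 1.0)]

# five temporally consecutive frames, already reduced to toy feature vectors
frames = [[0.2, 0.5], [0.3, 0.4], [0.25, 0.45], [0.35, 0.5], [0.3, 0.55]]

state = frames[0]
feature_maps = []
for unit, frame in zip(units, frames):
    # each unit sees its own frame combined with the previous unit's output,
    # which is how the chain "memorizes" the time order of the pictures
    merged = [0.5 * (s + x) for s, x in zip(state, frame)]
    state = unit(merged)
    feature_maps.append(state)
print(feature_maps[-1])
```

The carried `state` plays the role of the recurrent memory: by the last unit it has been shaped by every earlier frame in order.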
In the embodiment of the present invention, the number of time-series neural network units in the preset recurrent neural network model is five, so five frames of video pictures are input together for analysis each time an action prediction is performed. When the last time-series neural network unit predicts the next action from the picture features corresponding to all the video pictures, a concatenate operation may be performed on the five picture feature maps to obtain a final total feature map, from which the final action is estimated; a loss function is applied in this process during training. The specific inference procedure is not elaborated further here, as it rests mainly on the standard functioning of the recurrent neural network.
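The concatenate-then-predict step can be illustrated with plain Python lists. The five feature maps, the per-action weight vectors, and the action names are all hypothetical values chosen for the sketch; a real model would learn the weights through the loss function mentioned above.

```python
# Concatenating the five per-frame feature maps into one total feature
# vector, then scoring candidate next actions with a toy linear layer.
feature_maps = [[0.2, 0.5], [0.3, 0.4], [0.25, 0.45], [0.35, 0.5], [0.3, 0.55]]

# the "concatenate operation": five maps of length 2 -> one vector of length 10
total = [f for fmap in feature_maps for f in fmap]

# hypothetical per-action weight vectors (learned in a real model)
action_weights = {
    "walk": [0.1] * 10,
    "fall": [0.3, -0.2, 0.3, -0.2, 0.3, -0.2, 0.3, -0.2, 0.3, -0.2],
}
scores = {a: sum(w * x for w, x in zip(ws, total))
          for a, ws in action_weights.items()}
predicted = max(scores, key=scores.get)  # action with the highest score
print(predicted, scores[predicted])
```

With these toy numbers the dot products favor "walk"; during training, the loss between such scores and the true next action is what drives the weights toward useful values.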
In this embodiment of the present invention, after all picture features are input into one of the time-series neural network units and that unit predicts the next action according to the picture features corresponding to all the video pictures, the method further includes:
comparing the predicted next action with the actions in a preset action library, and judging whether the predicted next action is a dangerous action.
In the embodiment of the invention, the preset action library is mainly compiled statistically from big data; the dangerous behaviors it contains can be customized as needed and are not detailed further here.
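The library lookup itself is a simple membership check, sketched below. The entries in the library are hypothetical placeholders; the patent leaves the concrete contents to be customized.

```python
# A minimal "preset action library" lookup: the predicted next action is
# compared against a customizable set of dangerous behaviors. The entries
# are hypothetical placeholders, not taken from the patent.
DANGEROUS_ACTIONS = {"fall", "climb_over_railing", "fight"}

def is_dangerous(predicted_action):
    """Judge whether the predicted next action belongs to dangerous actions."""
    return predicted_action in DANGEROUS_ACTIONS

print(is_dangerous("fall"))   # a library hit -> dangerous
print(is_dangerous("walk"))   # not in the library -> not dangerous
```

In a deployed monitoring system this check would typically trigger an alarm or log entry when it returns true.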
The behavior prediction device provided by the embodiment of the invention inputs several consecutive frames of video pictures into the recurrent neural network model for feature extraction and analysis, and then uses the memory characteristics of the recurrent neural network model to infer from the picture features of those frames, thereby realizing behavior prediction.
Example 3
In one embodiment, a computer device is proposed, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
inputting a number of temporally consecutive frames of video pictures into a preset recurrent neural network model; the preset recurrent neural network model consists of a plurality of time sequence neural network units, and the video pictures are input to the time sequence neural network units in one-to-one correspondence;
respectively extracting picture characteristics of the corresponding input video pictures through each time sequence neural network unit;
and inputting all picture characteristics into one time sequence neural network unit, and predicting the next action by the time sequence neural network unit according to the picture characteristics corresponding to all video pictures.
FIG. 5 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may be an independent physical server or terminal, a server cluster formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud databases, cloud storage, and CDN services; it may also be a smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, or the like. As shown in FIG. 5, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected via a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the behavior prediction method. The internal memory may likewise store a computer program that, when executed by the processor, causes the processor to perform the behavior prediction method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball or touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the architecture shown in FIG. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Example 4
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, causes the processor to perform the steps of:
inputting a number of temporally consecutive frames of video pictures into a preset recurrent neural network model, wherein the preset recurrent neural network model consists of a plurality of time sequence neural network units and the video pictures are input to the time sequence neural network units in one-to-one correspondence;
respectively extracting picture features of the correspondingly input video pictures through each time sequence neural network unit;
and inputting all the picture features into one of the time sequence neural network units, which predicts the next action according to the picture features corresponding to all the video pictures.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not restricted to the order shown and may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, any combination of them that involves no contradiction should be considered within the scope of this specification.
The above examples show only some embodiments of the present invention and are described in relative detail, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A behavior prediction method, the method comprising:
inputting a plurality of continuous-time frames of video pictures into a preset recurrent neural network model; the preset cyclic neural network model consists of a plurality of time sequence neural network units, and a plurality of frames of the video pictures and the time sequence neural network units are input in a one-to-one correspondence manner;
respectively extracting picture features of the correspondingly input video pictures through each time sequence neural network unit;
and inputting all the picture characteristics into one of the time sequence neural network units, and predicting a next action by the time sequence neural network units according to the picture characteristics corresponding to all the video pictures.
2. The method according to claim 1, wherein the step of inputting the several frames of video pictures in time series into the preset recurrent neural network model is preceded by the steps of:
inputting a preset sample video picture set into the preset recurrent neural network model for training until the behavior prediction accuracy of the preset recurrent neural network model reaches a preset standard value.
3. The method of claim 2, wherein the training of the recurrent neural network model mainly comprises:
extracting a plurality of sample video pictures with continuous time in the sample video picture set;
respectively extracting sample picture features of the sample video picture in a one-to-one correspondence mode through each time sequence neural network unit, wherein the sample picture features include but are not limited to preset human body features in the picture, and the preset human body features include but are not limited to position information of human body key points;
inputting all the sample picture features into one of the time sequence neural network units, and predicting a next action by that time sequence neural network unit according to the sample picture features corresponding to all the sample video pictures;
and repeating the steps until the converged recurrent neural network model is obtained.
4. The method according to claim 3, wherein in the preset recurrent neural network model, the input and output of each of the sequential neural network units are arranged in series, and the video pictures and/or the sample video pictures are input into each of the sequential neural network units in series in a one-to-one correspondence in time order.
5. The method according to claim 4, wherein in each of the time-series neural network units connected in series, the previous time-series neural network unit inputs the picture feature extracted by the previous time-series neural network unit into the next time-series neural network unit for further picture feature extraction.
6. The method according to claim 1 or 2, wherein the number of the sequential neural network elements in the recurrent neural network model is five.
7. The method of claim 4, wherein said inputting all said picture features into one of said sequential neural network units, which predicts a next action according to said picture features corresponding to all said video pictures, further comprises:
and comparing the predicted next action with a preset action library action, and judging whether the predicted next action belongs to dangerous actions.
8. A behavior prediction device, characterized in that the device is adapted to perform the following steps:
inputting a plurality of continuous-time frames of video pictures into a preset recurrent neural network model; the preset cyclic neural network model consists of a plurality of time sequence neural network units, and the video pictures and the time sequence neural network units are input in a one-to-one correspondence manner;
respectively extracting picture features of the correspondingly input video pictures through each time sequence neural network unit;
and inputting all the picture characteristics into one of the time sequence neural network units, and predicting a next action by the time sequence neural network units according to the picture characteristics corresponding to all the video pictures.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the behavior prediction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out the steps of the behavior prediction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011089288.6A CN112364695A (en) | 2020-10-13 | 2020-10-13 | Behavior prediction method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112364695A true CN112364695A (en) | 2021-02-12 |
Family
ID=74508160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011089288.6A Pending CN112364695A (en) | 2020-10-13 | 2020-10-13 | Behavior prediction method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364695A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114005022A (en) * | 2021-12-30 | 2022-02-01 | 四川大学华西医院 | Dynamic prediction method and system for surgical instrument |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108205655A (en) * | 2017-11-07 | 2018-06-26 | 北京市商汤科技开发有限公司 | A kind of key point Forecasting Methodology, device, electronic equipment and storage medium |
CN108549841A (en) * | 2018-03-21 | 2018-09-18 | 南京邮电大学 | A kind of recognition methods of the Falls Among Old People behavior based on deep learning |
CN108573197A (en) * | 2017-03-13 | 2018-09-25 | 北京大学 | Video actions detection method and device |
CN109920208A (en) * | 2019-01-31 | 2019-06-21 | 深圳绿米联创科技有限公司 | Tumble prediction technique, device, electronic equipment and system |
CN110363131A (en) * | 2019-07-08 | 2019-10-22 | 上海交通大学 | Anomaly detection method, system and medium based on human skeleton |
CN110751021A (en) * | 2019-09-03 | 2020-02-04 | 北京迈格威科技有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN110796090A (en) * | 2019-10-30 | 2020-02-14 | 武汉理工大学 | Human-computer cooperation human behavior intention judging method based on cyclic neural network |
WO2020107847A1 (en) * | 2018-11-28 | 2020-06-04 | 平安科技(深圳)有限公司 | Bone point-based fall detection method and fall detection device therefor |
CN111507137A (en) * | 2019-01-31 | 2020-08-07 | 北京奇虎科技有限公司 | Action understanding method and device, computer equipment and storage medium |
WO2020168660A1 (en) * | 2019-02-19 | 2020-08-27 | 平安科技(深圳)有限公司 | Method and apparatus for adjusting traveling direction of vehicle, computer device and storage medium |
CN111709575A (en) * | 2020-06-16 | 2020-09-25 | 北京工业大学 | Academic achievement prediction method based on C-LSTM |
Non-Patent Citations (3)
Title |
---|
YING ZHONG et al.: "HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning", Computer Networks, vol. 169, pages 1-16 *
YU Wanli et al.: "Human behavior recognition model based on multi-model fusion", Computer Engineering and Design, vol. 40, no. 10, pages 3030-3036 *
XU Tianyu et al.: "Analysis of LSTM-based video behavior recognition technology and applications", Information & Computer, no. 4, pages 135-136 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114005022A (en) * | 2021-12-30 | 2022-02-01 | 四川大学华西医院 | Dynamic prediction method and system for surgical instrument |
CN114005022B (en) * | 2021-12-30 | 2022-03-25 | 四川大学华西医院 | Dynamic prediction method and system for surgical instrument |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304324B (en) | Test case generation method, device, equipment and storage medium | |
CN110941555B (en) | Test case recommendation method and device, computer equipment and storage medium | |
CN111883262B (en) | Epidemic situation trend prediction method and device, electronic equipment and storage medium | |
CN112639833A (en) | Adaptable neural network | |
CN111240905B (en) | Screen projection diagnosis method, device, system, storage medium and equipment | |
CN111598213A (en) | Network training method, data identification method, device, equipment and medium | |
US20140279815A1 (en) | System and Method for Generating Greedy Reason Codes for Computer Models | |
CN112488163A (en) | Abnormal account identification method and device, computer equipment and storage medium | |
CN113918738B (en) | Multimedia resource recommendation method and device, electronic equipment and storage medium | |
CN112364695A (en) | Behavior prediction method and device, computer equipment and storage medium | |
CN110826695B (en) | Data processing method, device and computer readable storage medium | |
CN116661936A (en) | Page data processing method and device, computer equipment and storage medium | |
CN110766231A (en) | Crime prediction method and system based on multi-head neural network | |
CN114491093B (en) | Multimedia resource recommendation and object representation network generation method and device | |
CN115222773A (en) | Single-point motion learning method and device | |
KR20190109194A (en) | Apparatus and method for learning neural network capable of modeling uncerrainty | |
CN114282834A (en) | Evaluation result traceability analysis method, device, equipment and storage medium | |
CN113821330A (en) | Task scheduling method and device, computer equipment and storage medium | |
CN111190574B (en) | Method, device, equipment and storage medium for selecting options of multi-stage linkage assembly | |
Teyarachakul et al. | Steady-state skill levels of workers in learning and forgetting environments: A dynamical system analysis | |
CN112015378A (en) | Skeleton screen page generation method and device, computer equipment and storage medium | |
CN111078984A (en) | Network model publishing method and device, computer equipment and storage medium | |
CN110377501B (en) | Model test method, device, computer equipment and storage medium | |
CN110874612B (en) | Time interval prediction method and device, computer equipment and storage medium | |
CN115314404B (en) | Service optimization method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||