WO2021179591A1 - Behavior recognition method, apparatus, and computer storage medium - Google Patents

Behavior recognition method, apparatus, and computer storage medium

Info

Publication number
WO2021179591A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
behavior
image
recognized
neural network
Prior art date
Application number
PCT/CN2020/119735
Other languages
English (en)
French (fr)
Inventor
蒋霆
叶年进
王光甫
刘帅成
Original Assignee
成都旷视金智科技有限公司
北京迈格威科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都旷视金智科技有限公司 and 北京迈格威科技有限公司
Publication of WO2021179591A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the embodiments of the present disclosure relate to the field of image processing, and more specifically, to a method, device, and computer storage medium for behavior recognition.
  • Behavior recognition can refer to a technology that analyzes and recognizes human behavior. For example, it is possible to analyze and recognize human behavior in images (for example, video frame data). This technology can extract relevant frame data from video sequences and extract rich visual features, so as to express and interpret human behavior.
  • behavior recognition is widely used in many aspects such as video surveillance and autonomous driving.
  • there are many traffic accidents caused by distracted behaviors of drivers.
  • behavior recognition technology can be used to detect drivers' distracted behaviors, such as smoking, making phone calls, and drinking water, so that behaviors that affect normal driving can be stopped in time.
  • the present disclosure provides a method, a device and a computer storage medium for behavior recognition, which have high recognition accuracy and can meet the recognition requirements in specific application fields.
  • a method of behavior recognition, including: obtaining an image to be recognized, and inputting the image to be recognized into a pre-trained neural network; obtaining the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing a first behavior, and the third output represents the probability that the person in the image to be recognized is performing a second behavior.
  • the behavior of the person in the image to be recognized is determined according to the first output, the second output, and the third output.
  • the determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output includes:
  • the behavior of the person in the image to be recognized is determined according to at least one of: the comparison result between the first output and a first threshold, the comparison result between the second output and a second threshold, and the comparison result between the third output and a third threshold.
  • the determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output includes:
  • if the first output is greater than or equal to a first threshold; or if the first output is less than the first threshold, the second output is greater than the third output, and the second output is less than a second threshold; or if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is less than a third threshold, it is determined that the person in the image to be recognized is behaving normally;
  • if the first output is less than the first threshold, the second output is greater than the third output, and the second output is greater than or equal to the second threshold, it is determined that the person in the image to be recognized is performing the first behavior;
  • if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is greater than or equal to the third threshold, it is determined that the person in the image to be recognized is performing the second behavior.
  • the neural network includes a first branch model, a second branch model, and a third branch model, where the first branch model generates the first output, the second branch model generates the second output, and the third branch model generates the third output.
  • the second branch model is a smoking behavior recognition model
  • the third branch model is a phone call behavior recognition model
  • the output of the first convolutional layer of the first branch model and the output of the second convolutional layer of the second branch model are fused as the input of the third convolutional layer of the second branch model.
  • the output of the first convolution layer of the first branch model and the output of the fourth convolution layer of the third branch model are fused as the input of the fifth convolution layer of the third branch model.
  • the neural network is obtained through training based on a training data set.
  • the training data set is constructed in the following manner:
  • N pieces of original data are obtained, each of which includes a portrait area; for each piece of original data, the portrait area is segmented out and multiple noises are added to the regions outside the portrait area to generate multiple training data; the set of the multiple training data generated for the N pieces of original data is the training data set, which includes M training data, where M is greater than N and both are positive integers.
  • when training the neural network, for each training data: data enhancement processing is performed to obtain enhanced data, which is used as the input of the first branch model of the neural network; the lower half of the face is cropped from the enhanced data, and the cropped data is used as the input of the second branch model; edge detection is performed on the enhanced data, and after the edge-detected data and the enhanced data are fused, the result is used as the input of the third branch model of the neural network.
  • the data enhancement processing includes at least one of the following: mirroring, brightness change, and random cropping.
  • the termination of the training process is controlled by setting the amount of data in a single iteration, the total number of iterations, and the learning rate decay strategy.
  • a behavior recognition device configured to implement the steps of the method described in the first aspect or any one of the implementation manners, and the device includes:
  • the acquisition module is used to acquire the image to be recognized
  • An input module used to input the image to be recognized into a pre-trained neural network
  • the acquisition module is further configured to acquire the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing the first behavior, and the third output represents the probability that the person in the image to be recognized is performing the second behavior;
  • the determining module is configured to determine the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
  • an apparatus for behavior recognition, including a memory, a processor, and a computer program stored on the memory and running on the processor, where the processor implements the steps of the method described in the first aspect or any one of the implementation manners when executing the computer program.
  • a computer storage medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the method described in the foregoing first aspect or any one of the implementation manners.
  • the embodiments of the present disclosure can use a pre-trained neural network to determine the behavior of the person in the image to be recognized.
  • the neural network includes multiple branch models, which can extract rich visual features and focus on the specific behavior of the person in the image to be recognized, making the accuracy of behavior recognition higher.
  • the behavior recognition method of the embodiments of the present disclosure can meet real-time requirements, can perform real-time calculations, and further meet the recognition requirements of various application fields.
  • Fig. 1 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of the training process of the neural network of the embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the network structure of the neural network of the embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of the structure of a convolution block according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of determining a character's behavior based on output according to an embodiment of the present disclosure
  • FIG. 6 is a schematic flowchart of a method for behavior recognition according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic block diagram of a behavior recognition device according to an embodiment of the present disclosure.
  • Fig. 8 is another schematic block diagram of a behavior recognition apparatus according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic block diagram of the electronic device of the embodiments of the disclosure.
  • the electronic device 10 shown in FIG. 1 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, an image sensor 110, and one or more non-image sensors 114, and these components are interconnected through a bus system 112 and/or other forms of connection. It should be noted that the components and structure of the electronic device 10 shown in FIG. 1 are only exemplary and not restrictive, and the electronic device may also have other components and structures as required.
  • the processor 102 may include a central processing unit (CPU) 1021 and a graphics processing unit (Graphics Processing Unit, GPU) 1022, or other forms of processing units with data processing capability and/or instruction execution capability, such as a Field-Programmable Gate Array (FPGA) or an Advanced RISC (Reduced Instruction Set Computer) Machine (ARM), and the processor 102 can control other components in the electronic device 10 to perform desired functions.
  • the storage device 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as a volatile memory 1041 and/or a nonvolatile memory 1042.
  • the volatile memory 1041 may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory 1042 may include, for example, a read-only memory (Read-Only Memory, ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may execute the program instructions to implement various desired functions.
  • Various application programs and various data such as various data used and/or generated by the application program, can also be stored in the computer-readable storage medium.
  • the input device 106 may be a device used by the user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, and a touch screen.
  • the output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
  • the image sensor 110 can take images (for example, photos, videos, etc.) desired by the user, and store the taken images in the storage device 104 for use by other components.
  • the components and structure of the electronic device 10 shown in FIG. 1 are only exemplary; although the electronic device 10 shown in FIG. 1 includes multiple different devices, some of them may not be necessary and the number of some devices may be larger, as required, which is not limited in the present disclosure.
  • the embodiments of the present disclosure provide a neural network, which can be used to recognize the behavior of a person.
  • the neural network includes at least three branch models, which are a first branch model, a second branch model, and a third branch model.
  • the first branch model may be the main branch model, for example, the first branch model may be used to perform main recognition on the image input to the neural network.
  • the second branch model may be mainly used to identify the first behavior, such as the first abnormal behavior.
  • the third branch model can be mainly used to identify the second behavior, such as the second abnormal behavior.
  • the first behavior can be a smoking behavior
  • the second behavior can be a phone call behavior
  • the three branch models may be referred to as the main branch model, the smoking branch model, and the telephone branch model.
  • the neural network may also include a fourth branch model, which may be used to identify a third behavior, such as a third abnormal behavior.
  • the neural network in the embodiment of the present disclosure may also include any number of branch models for identifying a corresponding number of behaviors.
  • the third behavior may be a drinking behavior.
  • the neural network in the embodiments of the present disclosure may be obtained through training based on a training data set.
  • the training data set can be constructed in the following manner: obtain N pieces of original data, each of which includes a portrait area; for each piece of original data, segment out the portrait area, and add multiple noises to the regions outside the portrait area to generate multiple training data; the set of the training data generated for the N pieces of original data is the training data set.
  • the training data set includes M training data, where M is greater than N and both are positive integers.
  • obtaining N pieces of original data may include: obtaining an original data set, containing sufficient data, by extracting frames from video stream data. If the neural network is to be used in the driving field to recognize the driver's behavior, the original data set can contain three classes of data: normal driving, smoking, and making phone calls. Subsequently, the original data set can be screened manually or by scripts to delete erroneous samples, such as those that do not contain a human face or are severely blurred, finally obtaining the N pieces of original data. Exemplarily, this process can also be understood as a process of data acquisition and cleaning.
  • each of the N raw data includes a human portrait area.
  • a verification data set (for example, it can be represented as V) can be constructed similarly, which can include N1 verification data. Or, optionally, as another example, after filtering the original data set, N pieces of the original data can be used for training, and the other N pieces of data can be used as the verification data set. It can be understood that the verification data set is used to verify the trained neural network and to judge the quality of the neural network model.
  • M training data can be obtained based on N original data, including: firstly, a segmentation model is used to segment the portrait region, and noise is added to the non-portrait region, so as to obtain a training data set, and the amount of data is M.
  • this process can be performed before sending the data to the neural network to be trained, so it can also be referred to as the process of data offline enhancement.
  • different noises can be added to a non-portrait region of the original data (ie, other regions other than the portrait region), for example, by adding p different noises, p training data can be obtained based on one original data.
  • when the M training data are obtained, noise can be added to the non-portrait area, which removes background interference to a certain extent, so that the training process converges faster and the trained neural network achieves higher accuracy. It can also be understood that, by adding noise to the non-portrait area, the resulting neural network can recognize the behavior of a person against a complex background, eliminating the interference of the complex background.
  • the training data set (as above, including M training data) can be sent to the neural network to be trained. Then, based on the training data set, the input data to the different branch models of the neural network can be generated.
  • data enhancement processing may be performed on the training data 2001 to obtain enhanced data, as shown in FIG. 2 as "data input" 2002.
  • the data enhancement processing may include at least one of mirroring, brightness change, and random cropping, or the data enhancement processing may also include other types of processing, which will not be listed here. It can be understood that by performing data enhancement processing, the effects of viewing angle diversity, different lighting, etc. can be eliminated.
  • the enhanced data (ie, "data input”) can be used as the input of the first branch model, that is, the data input 2002 shown in FIG. 2 can be intuitively understood as the input of the first branch model 2005.
  • the first branch model 2005 may be referred to as a main branch model.
  • the second branch model 2006 may be a smoking branch model, which mainly focuses on the behavior of smoking, so it may focus only on the behavior around the person's mouth.
  • the lower half of the face can be cropped on the enhanced data (area cropping 203 in FIG. 2), so that the image around the mouth that the smoking branch focuses on can be obtained by cropping.
  • the third branch model 2007 may be a call branch model, which mainly focuses on the call behavior, so it may only focus on objects such as mobile phones. Since the mobile phone is a rigid body with prominent edges, effective edge information can be detected by the edge detection method 204 to obtain edge-detected data, which is then merged with the enhanced data and used as the input of the third branch model 2007.
  • the edge detection can be performed by the sobel operator, or other methods can also be used for the edge detection, which is not limited in the present disclosure.
  • the fusion may include a concatenate operation (Concat for short), that is, combining the edge-detected data with the features of the enhanced data, so as to ensure that the third branch model focuses on the call behavior area.
  • the three-branch model of the neural network in the embodiments of the present disclosure may have the network structure shown in FIG. 3. It should be noted that the network structure shown in FIG. 3 is only schematic and should not be taken as a limitation on the structure of the neural network.
  • IP1 represents the data input to the first branch model, specifically the enhanced data in FIG. 2.
  • IP2 represents the data input to the second branch model, specifically the data after the region is cropped in FIG. 2.
  • the size of IP2 may be half of IP1.
  • IP3 represents the data after edge detection. After feature fusion with IP1, it is input to the third branch model.
  • the convolutional layer in FIG. 3 includes a conventional convolution layer (Convolution, abbreviated as CONV) and a convolution group (Convolution Group, abbreviated as CONVG).
  • a Convolution Group is a group structure, with a residual structure, containing multiple convolution blocks (Conv Block), in which the pooling (Pool) used for dimensionality reduction is max pooling (Max Pooling).
  • the structure of the convolution block may be as shown in FIG. 4, including a channel split (Channel Split) 401, multiple convolution operators, feature fusion (Concat) 406, and a channel shuffle (Channel Shuffle) 407.
  • the multiple convolution operators include a 1×1 convolution 402, a 3×3 DW (depthwise) convolution 403, a 1×1 convolution 404, and SE (Squeeze-and-Excitation) 405, and also include batch normalization (batch-normalization, BN) and batch normalization followed by a rectified linear unit (BN ReLU).
  • the output of the first convolution layer CONV1 of the first branch model and the output of the second convolution layer CONV2 of the second branch model are fused as the input of the third convolution layer CONV3 of the second branch model.
  • the output of the first convolution layer CONV1 of the first branch model and the output of the fourth convolution layer CONV4 of the third branch model are merged as the input of the fifth convolution layer CONV5 of the third branch model.
  • the first convolutional layer of the first branch model is represented as CONV1
  • the second convolutional layer and the third convolutional layer of the second branch model are represented as CONV2 and CONV3, respectively
  • the fourth convolution of the third branch model The layer and the fifth convolutional layer are denoted as CONV4 and CONV5, respectively.
  • the output of CONV1 and CONV2 are fused (cat) and then input to CONV3
  • the output of CONV1 and CONV4 are fused (cat) and then input to CONV5.
  • the first convolutional layer may refer to one of a plurality of convolutional layers included in the first branch model, and exemplarily may be a convolutional group of 4 blocks.
  • the convolutional layer located after the first convolutional layer in the first branch model (CONV6 in FIG. 3) can be a convolution group of 6 blocks, and the one after that (CONV7 in FIG. 3) a convolution group of 4 blocks.
  • the second convolutional layer (CONV2) may be one of the multiple convolutional layers included in the second branch model, and the third convolutional layer (CONV3) may be a convolutional layer located after the second convolutional layer among the multiple convolutional layers of the second branch model; exemplarily, the third convolutional layer (CONV3) may be a convolution group of 6 blocks.
  • similarly, the fourth convolutional layer (CONV4) may be one of the multiple convolutional layers included in the third branch model, and the fifth convolutional layer (CONV5) may be a convolutional layer located after the fourth convolutional layer among the multiple convolutional layers of the third branch model; exemplarily, the fifth convolutional layer (CONV5) may be a convolution group of 6 blocks.
  • the data of the first branch model is sequentially processed by the convolutional layer CONV0, the convolutional group layer CONV1, the convolutional group layer CONV6, the convolutional group layer CONV7, the convolutional layer CONV8, and the fully connected layer FC1.
  • the data of the second branch model is sequentially processed by the convolutional layer CONV2, the convolutional group layer CONV3, and the fully connected layer FC_S.
  • the data of the third branch model is sequentially processed by the convolutional layer CONV4, the convolutional group layer CONV5, and the fully connected layer FC_P.
  • after the fully connected layer (FC) of each branch model in FIG. 3, a softmax (classification) layer may also be included.
  • the softmax of the first branch model outputs the probability of normal behavior and abnormal behavior.
  • the softmax of the second branch model (such as smoking branch model) outputs the probability of the first behavior (such as smoking behavior) and non-first behavior (such as non-smoking behavior).
  • the softmax of the third branch model (such as the telephone branch model) outputs the probability of the second behavior (such as the telephone behavior) and the non-second behavior (such as the non-phone behavior).
  • the termination of the training process can be controlled by setting the amount of data in a single iteration, the total number of iterations, and the learning rate attenuation strategy.
  • the initial learning rate can be set to ⁇
  • the size of the batch (the number of data sent to the model in each iteration) is B
  • the total number of iterations (epoch) is E
  • the learning rate decay strategy is to decay the learning rate by a factor of ten every K epochs.
  • after the total number of epochs is reached, the model training is terminated.
  • according to the model verification accuracy on the verification set V, the model with the best verification accuracy can be saved every I iterations, and finally the best model is obtained, tested, and deployed.
  • the three branch models of the neural network will get three softmax outputs.
  • the probability of the normal behavior output by the first branch model is Pn
  • the probability of the first behavior output by the second branch model is Ps
  • the probability of the second behavior output by the third branch model is Pc.
  • the probabilities of these three outputs can be combined to determine the behavior of the character.
  • the threshold values th1, th2, and th3 corresponding to the three branch models can be set, and the behavior of the character can be determined as the normal behavior or the first behavior or the second behavior according to the process shown in FIG. 5.
  • in step S501, it is determined whether Pn is greater than or equal to th1; if so, normal behavior is output; otherwise, step S502 is performed;
  • in step S502, it is determined whether Ps is greater than or equal to Pc; if so, step S503 is performed; otherwise, step S504 is performed;
  • in step S503, it is determined whether Ps is greater than or equal to th2; if so, the first behavior is output; otherwise, normal behavior is output;
  • in step S504, it is determined whether Pc is greater than or equal to th3; if so, the second behavior is output; otherwise, normal behavior is output.
  • the first behavior and the second behavior may be the first abnormal behavior and the second abnormal behavior, for example, smoking behavior and phone call behavior, respectively.
  • a neural network for behavior recognition can be obtained, and the neural network includes a plurality of branch models, such as the three branch models described above.
  • the neural network of such a multi-branch model can focus on the behavior area of a specific behavior, so that the recognition accuracy of the behavior is higher.
  • the trained neural network can be deployed to be applied to a specific field, and can meet the identification requirements in a specific application field.
  • the embodiments of the present disclosure utilize the complex modeling capabilities of the deep learning model, so that the neural network obtained by training has a higher accuracy in recognizing behavior.
  • the normal behavior is normal driving behavior
  • the first behavior is smoking behavior
  • the second behavior is phone call behavior.
  • Table 1 in the description compares the behavior recognition accuracy of the neural network including the three-branch model according to the embodiments of the present disclosure with that of a traditional single model in the prior art; it can be seen that the neural network in the embodiments of the present disclosure has higher behavior recognition accuracy.
  • FIG. 6 is a schematic flowchart of a behavior recognition method according to an embodiment of the present disclosure.
  • the method shown in FIG. 6 may be executed by the device 10 shown in FIG. 1, or more specifically, by the processor 102. The method shown in FIG. 6 may include:
  • S110: Obtain an image to be recognized, and input the image to be recognized into a pre-trained neural network;
  • S120: Obtain the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing a first behavior, and the third output represents the probability that the person in the image to be recognized is performing a second behavior;
  • S130 Determine the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
  • the neural network mentioned in Fig. 6 may be the neural network described above in conjunction with Figs. 2 to 5, and the training process of the neural network can be referred to the above-mentioned related description.
  • the method shown in FIG. 6 does not limit the application scenarios; for example, it can be applied to fields such as video surveillance and autonomous driving. Assuming that the method shown in FIG. 6 is applied to the driving field, correspondingly, the normal behavior of a person can represent normal driving behavior, the first behavior can be the first abnormal behavior, such as smoking behavior, and the second behavior can be the second abnormal behavior, such as phone call behavior.
  • the image to be recognized can be an image captured in real time or one frame of a video stream captured in real time; the image to be recognized can also be a pre-stored image or one frame of a pre-stored video stream.
  • acquiring the image to be recognized in S110 may include extracting a frame of image in the video stream.
  • the neural network may include a first branch model, a second branch model, and a third branch model, and the first branch model generates a first output, the second branch model generates a second output, and the third branch model generates a third output .
  • the first output can represent the probability that the person (i.e., the driver) in the image to be recognized is driving normally, the second output represents the probability that the person in the image to be recognized is performing the first abnormal behavior (such as smoking), and the third output represents the probability that the person in the image to be recognized is performing the second abnormal behavior (such as making a phone call).
  • the first branch model can be called the main branch model
  • the second branch model is called the smoking behavior recognition model
  • the third branch model is called the phone behavior recognition model.
  • S130 may include: determining the behavior of the person in the image to be recognized according to at least one of the comparison result of the first output with the first threshold, the comparison result of the second output with the second threshold, and the comparison result of the third output with the third threshold.
  • the first threshold, the second threshold, and the third threshold are preset, for example, according to application scenarios, precision requirements, and the like; it can be understood that the first threshold, the second threshold, and the third threshold are all values between 0 and 1, and any two of the three values may or may not be equal, which is not limited in the present disclosure.
  • S130 may include: if the first output is greater than or equal to the first threshold; or if the first output is less than the first threshold, the second output is greater than the third output, and the second output is less than the second threshold; or if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is less than the third threshold, it is determined that the behavior of the person in the image to be recognized is normal. If the first output is less than the first threshold, the second output is greater than the third output, and the second output is greater than or equal to the second threshold, it is determined that the person in the image to be recognized is performing the first behavior. If the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is greater than or equal to the third threshold, it is determined that the person in the image to be recognized is performing the second behavior.
  • the first output, the second output, and the third output are represented as Pn, Ps, and Pc in sequence.
  • the first threshold, the second threshold, and the third threshold are denoted sequentially as th1, th2, and th3.
  • with reference to FIG. 5, if (1) Pn ≥ th1, or (2) Pn < th1 and Ps > Pc and Ps < th2, or (3) Pn < th1 and Ps < Pc and Pc < th3, the behavior of the person can be determined to be normal behavior, e.g., normal driving behavior in the driving field.
  • with reference to FIG. 5, if Pn < th1 and Ps > Pc and Ps ≥ th2, the behavior of the person is the first behavior, e.g., the first abnormal behavior (such as smoking) in the driving field.
  • with reference to FIG. 5, if Pn < th1 and Ps ≤ Pc and Pc ≥ th3, the behavior of the person is the second behavior, e.g., the second abnormal behavior (such as making a phone call) in the driving field.
  • the embodiments of the present disclosure can effectively recognize the behaviors of persons under diverse viewing angles, different lighting, complex backgrounds, and diverse behavior states, can effectively recognize behavior states such as making phone calls and smoking, can effectively rule out possible attack behaviors, and overcome the insensitivity problem of the prior art in complex scenarios.
  • the neural network in the embodiment of the present disclosure includes multiple branch models, which can use the multi-branch fusion model to extract rich visual features, can focus on the behavior of the person in the image to be recognized, and obtain effective behavior expression and interpretation.
  • the embodiments of the present disclosure can meet real-time requirements and can perform real-time computation on embedded terminals (mobile phone terminals, in-vehicle terminals), thereby meeting practical application needs.
  • Fig. 7 is a schematic block diagram of an apparatus for behavior recognition in an embodiment of the present disclosure.
  • the device 20 shown in FIG. 7 includes: an acquisition module 210, an input module 220, and a determination module 230.
  • the obtaining module 210 may be used to obtain the image to be recognized.
  • the input module 220 may be used to input the to-be-recognized image acquired by the acquisition module 210 into a pre-trained neural network.
  • the acquisition module 210 can also be used to acquire the output of the neural network.
  • the output includes a first output, a second output, and a third output.
  • the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing the first behavior, and the third output represents the probability that the person in the image to be recognized is performing the second behavior;
  • the determining module 230 may be used to determine the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
  • the determining module 230 may be specifically configured to, according to at least one of the comparison result between the first output and the first threshold, the comparison result between the second output and the second threshold, and the comparison result between the third output and the third threshold, Determine the behavior of the person in the image to be recognized.
  • the determining module 230 may be specifically configured to: if the first output is greater than or equal to the first threshold, or if the first output is less than the first threshold and the second output is greater than the third output and the second output is less than the second threshold, Alternatively, if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is less than the third threshold, it is determined that the behavior of the person in the image to be recognized is normal.
  • if the first output is less than the first threshold, the second output is greater than the third output, and the second output is greater than or equal to the second threshold, it is determined that the person in the image to be recognized is performing the first behavior; if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is greater than or equal to the third threshold, it is determined that the person in the image to be recognized is performing the second behavior.
  • the neural network includes a first branch model, a second branch model, and a third branch model, and the first branch model generates a first output, the second branch model generates a second output, and the third branch model generates a third output.
  • the neural network can be obtained by pre-training; please refer to the related descriptions of FIGS. 2 to 5 above.
  • the device 20 shown in FIG. 7 can implement the behavior recognition method shown in FIG. 6 described above; to avoid repetition, details are not repeated here.
  • the embodiments of the present disclosure also provide another behavior recognition device, including a memory, a processor, and a computer program stored on the memory and running on the processor.
  • the processor implements the steps of the behavior recognition method shown in FIG. 6 when executing the program.
  • the device 30 may include a memory 310 and a processor 320.
  • the memory 310 stores computer program codes for implementing corresponding steps in the method for behavior recognition according to an embodiment of the present disclosure.
  • the processor 320 is configured to run the computer program code stored in the memory 310 to execute the corresponding steps of the behavior recognition method according to the embodiment of the present disclosure.
  • the following steps are performed when the computer program code is run by the processor 320: obtaining an image to be recognized, and inputting the image to be recognized into a pre-trained neural network; obtaining the output of the neural network, the output including a first output, a second output, and a third output.
  • the first output represents the probability that the person in the image to be recognized is behaving normally, and the second output represents that the person in the image to be recognized is performing The probability of the first behavior, the third output represents the probability that the person in the image to be recognized is performing the second behavior; the first output, the second output, and the third output are used to determine the to-be-recognized The behavior of the characters in the image.
  • an embodiment of the present disclosure also provides an electronic device.
  • the electronic device may be the electronic device 10 shown in FIG. 1, or the electronic device may include the behavior recognition apparatus shown in FIG. 7 or FIG. 8.
  • the electronic device can implement the behavior recognition method shown in FIG. 6 described above.
  • the electronic device may be a mobile terminal, and the mobile terminal may include an image acquisition device and the behavior recognition device shown in FIG. 7 or FIG. 8.
  • the mobile terminal may be a smart phone, or may be an in-vehicle device or the like.
  • the mobile terminal can be installed inside the vehicle with its image acquisition device facing the driver, such as behind or to the side of the steering wheel, so that the mobile terminal can collect the driver's video stream data or image data through its image acquisition device and use the method shown in FIG. 6 to determine the driver's behavior in real time.
  • optionally, if it is determined through recognition that the driver is performing an abnormal behavior that may affect safe driving, such as smoking or making a phone call, the mobile terminal can send out warning information in real time to remind the driver to correct the behavior in time, thereby ensuring driving safety.
  • the embodiments of the present disclosure also provide a computer storage medium on which a computer program is stored.
  • the computer program is executed by the processor, the steps of the behavior recognition method shown in FIG. 6 can be implemented.
  • the computer storage medium is a computer-readable storage medium.
  • when run by a computer or processor, the computer program instructions cause the computer or processor to execute the following steps: obtain an image to be recognized, and input the image to be recognized into a pre-trained neural network; obtain the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing the first behavior, and the third output represents the probability that the person in the image to be recognized is performing the second behavior; and determine the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
  • the computer storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory ( CD-ROM), USB memory, or any combination of the above storage media.
  • the computer-readable storage medium may be any combination of one or more computer-readable storage media.
  • embodiments of the present disclosure also provide a computer program product, which contains instructions, which when executed by a computer, cause the computer to execute the steps of the behavior recognition method shown in FIG. 6.
  • the embodiments of the present disclosure provide a method, device, and computer storage medium for behavior recognition, which can use a pre-trained neural network to determine the behavior of a person in an image to be recognized.
  • the neural network includes multiple branch models, which can extract rich visual features and focus on the specific behavior of the person in the image to be recognized, so that the accuracy of behavior recognition is higher.
  • the behavior recognition method of the embodiments of the present disclosure can meet real-time requirements, can perform real-time calculations, and further meet the recognition requirements of various application fields.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another device, or some features can be ignored or not implemented.
  • the various component embodiments of the present disclosure may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor may be used in practice to implement some or all of the functions of some modules in the article analysis device according to the embodiments of the present disclosure.
  • the present disclosure can also be implemented as a device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • a program for realizing the present disclosure may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose a behavior recognition method, apparatus, and computer storage medium. The method includes: obtaining an image to be recognized and inputting it into a pre-trained neural network; obtaining the output of the neural network, including a first output representing the probability that a person in the image to be recognized is behaving normally, a second output representing the probability that the person is performing a first behavior, and a third output representing the probability that the person is performing a second behavior; and determining the behavior of the person according to the outputs. It can be seen that the embodiments of the present disclosure can use a pre-trained neural network to determine the behavior of a person in an image to be recognized; specifically, the neural network can extract rich visual features and focus on the specific behavior of the person in the image to be recognized, so that the accuracy of behavior recognition is higher. In addition, the behavior recognition method of the embodiments of the present disclosure can meet real-time requirements and perform real-time computation, thereby meeting the recognition requirements of various application fields. (FIG. 6)

Description

Behavior recognition method, apparatus, and computer storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010173400.8, filed on March 12, 2020 and entitled "行为识别的方法、装置及计算机存储介质" ("Behavior recognition method, apparatus, and computer storage medium"), the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present disclosure relate to the field of image processing, and more specifically, to a behavior recognition method, apparatus, and computer storage medium.
Background
Behavior recognition may refer to a technology for analyzing and recognizing human behavior. For example, human behavior in images (e.g., video frame data) can be analyzed and recognized. This technology can extract relevant frame data from a video sequence and extract rich visual features, so as to express and interpret human behavior.
In the field of artificial intelligence (AI), behavior recognition is very widely used in areas such as video surveillance and autonomous driving. For example, in the driving field, a great many traffic accidents are caused by drivers' distracted behaviors. To reduce the accident rate, behavior recognition technology can be used to detect drivers' distracted behaviors, so that behaviors that affect the normal driving state, such as smoking, making phone calls, and drinking water, can be stopped in time.
However, due to the influence of many factors, the accuracy and robustness of current behavior recognition cannot fully meet the requirements in application fields.
Summary
The present disclosure provides a behavior recognition method, apparatus, and computer storage medium, which have high recognition accuracy and can meet the recognition requirements of specific application fields.
According to a first aspect of the present disclosure, a behavior recognition method is provided, including:
obtaining an image to be recognized, and inputting the image to be recognized into a pre-trained neural network;
obtaining an output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that a person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing a first behavior, and the third output represents the probability that the person in the image to be recognized is performing a second behavior;
determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
In one implementation, determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output includes:
determining the behavior of the person in the image to be recognized according to at least one of a comparison result between the first output and a first threshold, a comparison result between the second output and a second threshold, and a comparison result between the third output and a third threshold.
In one implementation, determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output includes:
if the first output is greater than or equal to the first threshold; or if the first output is less than the first threshold, the second output is greater than the third output, and the second output is less than the second threshold; or if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is less than the third threshold, determining that the person in the image to be recognized is behaving normally;
if the first output is less than the first threshold, the second output is greater than the third output, and the second output is greater than or equal to the second threshold, determining that the person in the image to be recognized is performing the first behavior;
if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is greater than or equal to the third threshold, determining that the person in the image to be recognized is performing the second behavior.
In one implementation, the neural network includes a first branch model, a second branch model, and a third branch model, where the first branch model produces the first output, the second branch model produces the second output, and the third branch model produces the third output.
In one implementation, the second branch model is a smoking behavior recognition model, and the third branch model is a phone call behavior recognition model.
In one implementation, the output of a first convolutional layer of the first branch model is fused with the output of a second convolutional layer of the second branch model as the input of a third convolutional layer of the second branch model; the output of the first convolutional layer of the first branch model is fused with the output of a fourth convolutional layer of the third branch model as the input of a fifth convolutional layer of the third branch model.
In one implementation, the neural network is obtained through training based on a training data set.
In one implementation, the training data set is constructed in the following manner:
obtaining N pieces of original data, each of which includes a portrait area;
for each piece of original data: segmenting out the portrait area, and adding multiple noises to the regions outside the portrait area to generate multiple training data;
the set of the multiple training data generated for the N pieces of original data being the training data set, the training data set including M training data, where M is greater than N and both are positive integers.
In one implementation, when training the neural network, for each training data in the training data set:
data enhancement processing is performed on the training data to obtain enhanced data;
the lower half of the face is cropped from the enhanced data to obtain cropped data;
edge detection is performed on the enhanced data to obtain edge-detected data;
the enhanced data is used as the input of the first branch model of the neural network;
the cropped data is used as the input of the second branch model of the neural network;
after the edge-detected data is fused with the enhanced data, the result is used as the input of the third branch model of the neural network.
In one implementation, the data enhancement processing includes at least one of the following: mirroring, brightness change, and random cropping.
In one implementation, when training the neural network, the termination of the training process is controlled by setting the amount of data in a single iteration, the total number of iterations, and the learning rate decay strategy.
According to a second aspect of the present disclosure, a behavior recognition apparatus is provided, the apparatus being configured to implement the steps of the method of the first aspect or any one of its implementations, and including:
an obtaining module, configured to obtain an image to be recognized;
an input module, configured to input the image to be recognized into a pre-trained neural network;
the obtaining module being further configured to obtain the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that a person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing a first behavior, and the third output represents the probability that the person in the image to be recognized is performing a second behavior;
a determining module, configured to determine the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
According to a third aspect of the present disclosure, a behavior recognition apparatus is provided, including a memory, a processor, and a computer program stored on the memory and running on the processor, where the processor implements the steps of the method of the first aspect or any one of its implementations when executing the computer program.
According to a fourth aspect of the present disclosure, a computer storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method of the first aspect or any one of its implementations.
It can thus be seen that the embodiments of the present disclosure can use a pre-trained neural network to determine the behavior of a person in an image to be recognized. Specifically, the neural network includes multiple branch models, can extract rich visual features, and can focus on the specific behavior of the person in the image to be recognized, so that the accuracy of behavior recognition is higher. In addition, the behavior recognition method of the embodiments of the present disclosure can meet real-time requirements and perform real-time computation, thereby meeting the recognition requirements of various application fields.
Brief description of the drawings
FIG. 1 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training process of a neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a network structure of a neural network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the structure of a convolution block according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of determining a person's behavior according to outputs according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a behavior recognition method according to an embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of a behavior recognition apparatus according to an embodiment of the present disclosure;
FIG. 8 is another schematic block diagram of a behavior recognition apparatus according to an embodiment of the present disclosure.
Detailed description
To make the objectives, technical solutions, and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part rather than all of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described herein. Based on the embodiments of the present disclosure described herein, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present disclosure.
In recent years, behavior recognition technology has been applied more and more widely and basic research has developed very rapidly, but behavior recognition remains a very challenging task. Due to many factors such as the diversity of lighting conditions, the diversity of viewing angles, the complexity of backgrounds, and the diversity of behavior states, the accuracy and robustness of behavior recognition do not fully meet the requirements in application fields.
The embodiments of the present disclosure can be applied to an electronic device. FIG. 1 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 10 shown in FIG. 1 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, an image sensor 110, and one or more non-image sensors 114, and these components are interconnected through a bus system 112 and/or other forms of connection. It should be noted that the components and structure of the electronic device 10 shown in FIG. 1 are only exemplary rather than restrictive, and the electronic device may also have other components and structures as required.
The processor 102 may include a central processing unit (Central Processing Unit, CPU) 1021 and a graphics processing unit (Graphics Processing Unit, GPU) 1022, or other forms of processing units with data processing capability and/or instruction execution capability, such as a field-programmable gate array (Field-Programmable Gate Array, FPGA) or an advanced RISC machine (Advanced RISC (Reduced Instruction Set Computer) Machine, ARM), and the processor 102 can control other components in the electronic device 10 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as a volatile memory 1041 and/or a non-volatile memory 1042. The volatile memory 1041 may include, for example, a random access memory (Random Access Memory, RAM) and/or a cache. The non-volatile memory 1042 may include, for example, a read-only memory (Read-Only Memory, ROM), a hard disk, a flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement various desired functions. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image sensor 110 may capture images desired by the user (e.g., photos, videos, etc.) and store the captured images in the storage device 104 for use by other components.
It should be noted that the components and structure of the electronic device 10 shown in FIG. 1 are only exemplary; although the electronic device 10 shown in FIG. 1 includes multiple different devices, some of them may not be necessary and the number of some devices may be larger, as required, which is not limited in the present disclosure.
An embodiment of the present disclosure provides a neural network that can be used to recognize the behavior of a person. The neural network includes at least three branch models: a first branch model, a second branch model, and a third branch model. The first branch model may be a main branch model; for example, the first branch model may be used to perform the main recognition on the image input to the neural network. The second branch model may be mainly used to recognize a first behavior, such as a first abnormal behavior. The third branch model may be mainly used to recognize a second behavior, such as a second abnormal behavior.
As an example, the first behavior may be a smoking behavior, and the second behavior may be a phone call behavior. Accordingly, in the embodiments of the present disclosure, the three branch models may be respectively referred to as the main branch model, the smoking branch model, and the phone call branch model.
Optionally, the neural network may further include a fourth branch model, which may be used to recognize a third behavior, such as a third abnormal behavior. That is, in addition to the first branch model, the neural network in the embodiments of the present disclosure may include any number of branch models for recognizing a corresponding number of behaviors. Optionally, the third behavior may be a drinking behavior.
For ease of description, the subsequent embodiments of the present disclosure take a neural network with a three-branch model as an example for detailed description.
The neural network in the embodiments of the present disclosure may be obtained through training based on a training data set.
Exemplarily, the training data set may be constructed in the following manner: obtaining N pieces of original data, each of which includes a portrait area; for each piece of original data, segmenting out the portrait area and adding multiple noises to the regions outside the portrait area to generate multiple training data; the set of the multiple training data generated for the N pieces of original data is the training data set, which includes M training data, where M is greater than N and both are positive integers.
Specifically, obtaining the N pieces of original data may include: obtaining an original data set, containing sufficient data, by extracting frames from video stream data. If the neural network is to be used in the driving field to recognize the driver's behavior, the original data set may contain three classes of data: normal driving, smoking, and making phone calls. The original data set can then be screened manually or by scripts to delete erroneous samples, such as those that do not contain a human face or are severely blurred, finally obtaining the N pieces of original data. Exemplarily, this process can also be understood as a process of data acquisition and cleaning.
It can be understood that, since samples not containing a human face have been removed, each of the N pieces of original data contains a portrait area.
Optionally, as an example, a verification data set (e.g., denoted as V) can be constructed similarly, which may include N1 pieces of verification data. Alternatively, as another example, after screening the original data set, N pieces of the original data may be used for training and another N pieces of data may be used as the verification data set. It can be understood that the verification data set is used to verify the trained neural network and to judge the quality of the neural network model.
Specifically, obtaining the M training data based on the N pieces of original data may include: first using a segmentation model to segment out the portrait area, and adding noise to the non-portrait area, thereby obtaining the training data set with a data amount of M. Exemplarily, this process can be performed before the data is fed into the neural network to be trained, so it can also be referred to as a process of offline data enhancement.
For example, different noises can be added to the non-portrait area of one piece of original data (i.e., the regions other than the portrait area); for example, by adding p different noises, p training data can be obtained based on one piece of original data. In one implementation, M is an integer multiple of N, e.g., M = p × N.
In the embodiments of the present disclosure, when obtaining the M training data, noise can be added to the non-portrait area, which can remove background interference to a certain extent, so that the training process converges faster and the accuracy of the trained neural network is higher. It can also be understood that, by adding noise to the non-portrait area, the resulting neural network can recognize the behavior of a person against a complex background, eliminating the interference of the complex background.
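To make this step concrete, the following is a minimal sketch of the offline enhancement in Python with NumPy. It assumes a boolean person mask produced by some segmentation model; the function name, the choice of Gaussian noise, and the sigma values are illustrative assumptions, not details fixed by the disclosure.

```python
import numpy as np

def make_training_samples(image, person_mask, p=4, sigmas=(5, 10, 20, 40)):
    """Generate p training samples from one original image by adding
    different noises to the non-portrait region only (M = p * N overall).

    image:       HxWx3 uint8 original frame
    person_mask: HxW bool array, True inside the segmented portrait area
    """
    samples = []
    background = ~person_mask[..., None]          # broadcast over channels
    for sigma in sigmas[:p]:
        noise = np.random.normal(0, sigma, image.shape)
        noisy = image.astype(np.float32) + noise * background
        samples.append(np.clip(noisy, 0, 255).astype(np.uint8))
    return samples
```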
When training the neural network, the training data set (as above, including M training data) can be fed into the neural network to be trained. Then, based on the training data set, the input data for the different branch models of the neural network can be generated.
Specifically, data enhancement processing can be performed on the training data to obtain enhanced data; the lower half of the face is cropped from the enhanced data to obtain cropped data; edge detection is performed on the enhanced data to obtain edge-detected data; the enhanced data is used as the input of the first branch model; the cropped data is used as the input of the second branch model; and after the edge-detected data is fused with the enhanced data, the result is used as the input of the third branch model.
Exemplarily, referring to FIG. 2, data enhancement processing can be performed on the training data 2001 to obtain enhanced data, shown in FIG. 2 as "data input" 2002. The data enhancement processing may include at least one of mirroring, brightness change, and random cropping, or may also include other types of processing, which are not listed one by one here. It can be understood that performing data enhancement processing can eliminate the effects of viewing angle diversity, different lighting, and the like.
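As an illustration, here is a minimal sketch of such enhancement processing, assuming HxWx3 uint8 image arrays; the probabilities and parameter ranges are illustrative assumptions.

```python
import random
import numpy as np

def augment(image):
    """Randomly apply mirroring, brightness change, and random cropping."""
    out = image
    if random.random() < 0.5:                      # mirroring
        out = out[:, ::-1]
    if random.random() < 0.5:                      # brightness change
        gain = random.uniform(0.7, 1.3)
        out = np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    if random.random() < 0.5:                      # random cropping (90% area)
        h, w = out.shape[:2]
        ch, cw = int(0.9 * h), int(0.9 * w)
        y = random.randint(0, h - ch)
        x = random.randint(0, w - cw)
        out = out[y:y + ch, x:x + cw]
    return out
```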
The enhanced data (i.e., the "data input") can be used as the input of the first branch model; that is, the data input 2002 shown in FIG. 2 can be intuitively understood as the input of the first branch model 2005. Optionally, the first branch model 2005 may be referred to as the main branch model.
The second branch model 2006 may be the smoking branch model, which mainly focuses on smoking behavior and therefore may focus only on the behavior around the person's mouth. Accordingly, the lower half of the face can be cropped from the enhanced data (region cropping 203 in FIG. 2), so that the image around the mouth that the smoking branch focuses on is obtained by cropping.
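A small sketch of this region cropping, assuming a face bounding box is available from some upstream face detector (the box format is an assumption):

```python
def crop_lower_face(image, face_box):
    """Crop the lower half of the detected face, i.e. the mouth region
    that the smoking branch focuses on.

    face_box: (x1, y1, x2, y2) pixel coordinates of the face.
    """
    x1, y1, x2, y2 = face_box
    mid = (y1 + y2) // 2
    return image[mid:y2, x1:x2]
```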
The third branch model 2007 may be the phone call branch model, which mainly focuses on phone call behavior and therefore may focus only on objects such as mobile phones. Since a mobile phone is a rigid body with prominent edges, effective edge information can be detected by the edge detection method 204 to obtain edge-detected data, which is then fused with the enhanced data and used as the input of the third branch model 2007.
The edge detection can be performed with the Sobel operator, or other methods can also be used, which is not limited in the present disclosure. The fusion may include a concatenate operation (Concat for short), i.e., combining the edge-detected data with the features of the enhanced data, so as to ensure that the third branch model focuses on the phone call behavior region.
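For illustration, a sketch of Sobel edge detection followed by a channel-wise Concat with the enhanced image, using OpenCV; treating the fusion as a plain channel concatenation of the edge map with the image is one reading of the description above and is an assumption here.

```python
import cv2
import numpy as np

def edge_fused_input(enhanced):
    """Detect edges with the Sobel operator and concatenate the edge map
    with the enhanced image along the channel axis (Concat fusion)."""
    gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))   # uint8 edge map
    return np.concatenate([enhanced, edges[..., None]], axis=-1)  # HxWx4
```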
The processed data is then used for model training 2008 to train the three-branch model; after training is completed, model testing and deployment 2009 are performed.
It can be seen that, in the embodiments of the present disclosure, considering the diversity of behavior states, multiple different branch models are designed in a targeted manner, so that the behavior of a person can be analyzed and recognized in a more targeted way.
The three-branch model of the neural network in the embodiments of the present disclosure may have the network structure shown in FIG. 3. It should be noted that the network structure shown in FIG. 3 is only schematic and should not be taken as a limitation on the structure of the neural network.
In FIG. 3, IP1 denotes the data input to the first branch model, specifically the enhanced data in FIG. 2. IP2 denotes the data input to the second branch model, specifically the region-cropped data in FIG. 2; optionally, the size of IP2 may be half that of IP1. IP3 denotes the edge-detected data, which is input to the third branch model after feature fusion with IP1.
The convolutional layers in FIG. 3 include conventional convolutional layers (Convolution, abbreviated CONV) and convolution groups (Convolution Group, abbreviated CONVG). A Convolution Group is a group structure, with a residual structure, containing multiple convolution blocks (Conv Block), in which the pooling (Pool) used for dimensionality reduction is max pooling (Max Pooling). The structure of a convolution block may be as shown in FIG. 4, including a channel split (Channel Split) 401, multiple convolution operators, feature fusion (Concat) 406, and a channel shuffle (Channel Shuffle) 407. The multiple convolution operators include a 1×1 convolution 402, a 3×3 DW (depthwise) convolution 403, a 1×1 convolution 404, and SE (Squeeze-and-Excitation) 405, together with batch normalization (batch-normalization, BN) and batch normalization followed by a rectified linear unit (BN ReLU). For details of the convolution block, reference may be made to the related descriptions of convolution blocks in existing neural network structures, which are not repeated here to avoid redundancy.
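The block in FIG. 4 reads like a ShuffleNetV2-style unit with an added SE stage. Below is a minimal PyTorch sketch under that reading; the channel counts, the SE reduction ratio, and the exact placement of BN/ReLU are assumptions.

```python
import torch
import torch.nn as nn

class SE(nn.Module):
    """Squeeze-and-Excitation: channel-wise reweighting."""
    def __init__(self, c, r=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class ConvBlock(nn.Module):
    """Channel split -> 1x1 conv -> 3x3 depthwise conv -> 1x1 conv -> SE,
    then Concat with the untouched half and channel shuffle (FIG. 4)."""
    def __init__(self, channels):
        super().__init__()
        c = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),  # 3x3 DW
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(),
            SE(c))

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                      # Channel Split 401
        out = torch.cat([a, self.branch(b)], dim=1)   # Concat 406
        n, c, h, w = out.shape                        # Channel Shuffle 407
        return out.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```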
In addition, for the second branch model and the third branch model, after the processing by convolutions CONV2 and CONV4 and before the processing by convolutions CONV3 and CONV5, resize and feature fusion (cat) operations are also performed, where the resize operation is not shown in FIG. 3.
Exemplarily, the output of the first convolutional layer CONV1 of the first branch model is fused with the output of the second convolutional layer CONV2 of the second branch model as the input of the third convolutional layer CONV3 of the second branch model. The output of the first convolutional layer CONV1 of the first branch model is fused with the output of the fourth convolutional layer CONV4 of the third branch model as the input of the fifth convolutional layer CONV5 of the third branch model. With reference to FIG. 3, the first convolutional layer of the first branch model is denoted CONV1, the second and third convolutional layers of the second branch model are denoted CONV2 and CONV3, and the fourth and fifth convolutional layers of the third branch model are denoted CONV4 and CONV5. In FIG. 3, the output of CONV1 is fused (cat) with the output of CONV2 and then input to CONV3, and the output of CONV1 is fused (cat) with the output of CONV4 and then input to CONV5.
It should be understood that the first convolutional layer (CONV1) may refer to one of the multiple convolutional layers included in the first branch model, and may exemplarily be a convolution group of 4 blocks. The convolutional layer located after the first convolutional layer in the first branch model (CONV6 in FIG. 3) may be a convolution group of 6 blocks, followed by a convolution group of 4 blocks (CONV7 in FIG. 3).
The second convolutional layer (CONV2) may be one of the multiple convolutional layers included in the second branch model, and the third convolutional layer (CONV3) may be a convolutional layer located after the second convolutional layer among the multiple convolutional layers of the second branch model; exemplarily, the third convolutional layer (CONV3) may be a convolution group of 6 blocks.
Similarly, the fourth convolutional layer (CONV4) may be one of the multiple convolutional layers included in the third branch model, and the fifth convolutional layer (CONV5) may be a convolutional layer located after the fourth convolutional layer among the multiple convolutional layers of the third branch model; exemplarily, the fifth convolutional layer (CONV5) may be a convolution group of 6 blocks.
In one embodiment, the data of the first branch model is processed in sequence by the convolutional layer CONV0, the convolution group layer CONV1, the convolution group layer CONV6, the convolution group layer CONV7, the convolutional layer CONV8, and the fully connected layer FC1.
In one embodiment, the data of the second branch model is processed in sequence by the convolutional layer CONV2, the convolution group layer CONV3, and the fully connected layer FC_S.
In one embodiment, the data of the third branch model is processed in sequence by the convolutional layer CONV4, the convolution group layer CONV5, and the fully connected layer FC_P.
After the fully connected (Fully Connect, FC) layer of each branch model in FIG. 3, a softmax (classification) layer may also be included. Referring to FIG. 3, the softmax of the first branch model outputs the probabilities of normal behavior and abnormal behavior. The softmax of the second branch model (e.g., the smoking branch model) outputs the probabilities of the first behavior (e.g., smoking) and non-first behavior (e.g., non-smoking). The softmax of the third branch model (e.g., the phone call branch model) outputs the probabilities of the second behavior (e.g., making a phone call) and non-second behavior (e.g., not making a phone call).
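Pulling the pieces above together, the following is a much-simplified PyTorch sketch of the three-branch topology with the cross-branch cat fusion and the three softmax heads. All layer widths and strides are assumptions, and each CONV/CONVG stage of FIG. 3 is stood in for by a single plain convolution stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stage(cin, cout):
    # Stand-in for a CONV/CONVG stage of FIG. 3 (real widths/depths differ).
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class ThreeBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Main branch: CONV0 -> CONV1 -> CONV6 -> CONV7 -> CONV8 -> FC1
        self.conv0, self.conv1 = stage(3, 32), stage(32, 64)
        self.conv6, self.conv7 = stage(64, 96), stage(96, 128)
        self.conv8 = stage(128, 128)
        self.fc1 = nn.Linear(128, 2)                 # normal vs abnormal
        # Smoking branch: CONV2 -> (cat with CONV1) -> CONV3 -> FC_S
        self.conv2, self.conv3 = stage(3, 64), stage(128, 96)
        self.fc_s = nn.Linear(96, 2)                 # smoking vs not
        # Phone branch (4-channel edge-fused input): CONV4 -> (cat) -> CONV5 -> FC_P
        self.conv4, self.conv5 = stage(4, 64), stage(128, 96)
        self.fc_p = nn.Linear(96, 2)                 # phone call vs not

    def forward(self, ip1, ip2, ip3):
        f1 = self.conv1(self.conv0(ip1))             # shared CONV1 features
        main = self.conv8(self.conv7(self.conv6(f1)))
        pn = F.softmax(self.fc1(main.mean((2, 3))), dim=1)

        f2 = self.conv2(ip2)                         # IP2: lower-face crop
        f1s = F.interpolate(f1, size=f2.shape[2:])   # resize before cat
        ps = F.softmax(self.fc_s(
            self.conv3(torch.cat([f1s, f2], 1)).mean((2, 3))), dim=1)

        f3 = self.conv4(ip3)                         # IP3: edge-fused input
        f1p = F.interpolate(f1, size=f3.shape[2:])
        pc = F.softmax(self.fc_p(
            self.conv5(torch.cat([f1p, f3], 1)).mean((2, 3))), dim=1)
        return pn, ps, pc
```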
Exemplarily, when training the neural network, the termination of the training process can be controlled by setting the amount of data in a single iteration, the total number of iterations, and the learning rate decay strategy. Specifically, the initial learning rate can be set to η, the batch size (the number of data fed into the model in each iteration) to B, and the total number of epochs to E, with a learning rate decay strategy of decaying by a factor of ten every K epochs. After the total number of epochs is reached, model training is terminated. During training, according to the model verification accuracy on the verification set V, the model with the best verification accuracy can be saved every I iterations, finally obtaining the best model for actual testing and deployment.
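A minimal sketch of this training schedule in PyTorch; the optimizer choice and the multi-task loss are assumptions, and `model`, `train_loader`, and `evaluate` are assumed helpers.

```python
import torch

def train(model, train_loader, evaluate, eta=0.01, E=90, K=30, I=1000):
    """Train for E epochs, decaying the learning rate tenfold every K epochs,
    and save the checkpoint with the best accuracy on the verification set V."""
    opt = torch.optim.SGD(model.parameters(), lr=eta, momentum=0.9)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=K, gamma=0.1)
    nll = torch.nn.NLLLoss()   # the branches output softmax probabilities
    best_acc, step = 0.0, 0
    for epoch in range(E):
        for ip1, ip2, ip3, y_n, y_s, y_p in train_loader:  # batches of size B
            pn, ps, pc = model(ip1, ip2, ip3)
            loss = nll(pn.log(), y_n) + nll(ps.log(), y_s) + nll(pc.log(), y_p)
            opt.zero_grad(); loss.backward(); opt.step()
            step += 1
            if step % I == 0:              # validate every I iterations
                acc = evaluate(model)      # accuracy on verification set V
                if acc > best_acc:
                    best_acc = acc
                    torch.save(model.state_dict(), "best_model.pt")
        sched.step()                       # tenfold decay every K epochs
    return best_acc
```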
Referring to FIG. 3, the three branch models of the neural network produce three softmax outputs. Suppose the probability of normal behavior output by the first branch model is Pn, the probability of the first behavior output by the second branch model is Ps, and the probability of the second behavior output by the third branch model is Pc. Then, after training of the neural network is completed, at the testing stage, the probabilities of these three outputs can be combined to determine the behavior of the person. Exemplarily, thresholds th1, th2, and th3 corresponding to the three branch models can be set, and the behavior of the person can be determined to be the normal behavior, the first behavior, or the second behavior according to the process shown in FIG. 5.
As shown in FIG. 5, in step S501, it is determined whether Pn is greater than or equal to th1; if so, normal behavior is output; otherwise, step S502 is performed;
in step S502, it is determined whether Ps is greater than or equal to Pc; if so, step S503 is performed; otherwise, step S504 is performed;
in step S503, it is determined whether Ps is greater than or equal to th2; if so, the first behavior is output; otherwise, normal behavior is output;
in step S504, it is determined whether Pc is greater than or equal to th3; if so, the second behavior is output; otherwise, normal behavior is output.
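Written out as code, the decision flow of FIG. 5 is just a few comparisons (the label strings here are illustrative):

```python
def decide(pn, ps, pc, th1, th2, th3):
    """Combine the three softmax outputs into a behavior label (FIG. 5)."""
    if pn >= th1:                       # S501: confident normal behavior
        return "normal"
    if ps >= pc:                        # S502: pick the larger abnormal score
        return "first_behavior" if ps >= th2 else "normal"    # S503: e.g. smoking
    return "second_behavior" if pc >= th3 else "normal"       # S504: e.g. phone call
```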
Optionally, the first behavior and the second behavior may be the first abnormal behavior and the second abnormal behavior, for example, smoking behavior and phone call behavior, respectively.
In this way, through the embodiments of the present disclosure, a neural network for behavior recognition can be obtained, the neural network including multiple branch models, such as the three branch models described above. A neural network with such a multi-branch model can focus on the behavior region of a specific behavior, so that the recognition accuracy of the behavior is higher. Further, the trained neural network can be deployed and applied to a specific field, and can meet the recognition requirements of that specific application field.
Moreover, the embodiments of the present disclosure make use of the complex modeling capability of deep learning models, so that the trained neural network has higher accuracy in recognizing behavior. Taking the driving field as an example, assuming the normal behavior is normal driving, the first behavior is smoking, and the second behavior is making a phone call, Table 1 below compares the behavior recognition accuracy of the neural network including the three-branch model according to the embodiments of the present disclosure with that of a traditional single model in the prior art. It can be seen that the behavior recognition accuracy of the neural network in the embodiments of the present disclosure is higher.
Table 1 (the accuracy figures are rendered as an image, Figure PCTCN2020119735-appb-000001, in the original publication and are not reproduced here)
FIG. 6 is a schematic flowchart of a behavior recognition method according to an embodiment of the present disclosure. The method shown in FIG. 6 may be executed by the device 10 shown in FIG. 1, or more specifically, by the processor 102, and may include:
S110: obtaining an image to be recognized, and inputting the image to be recognized into a pre-trained neural network;
S120: obtaining the output of the neural network, the output including a first output, a second output, and a third output, where the first output represents the probability that a person in the image to be recognized is behaving normally, the second output represents the probability that the person in the image to be recognized is performing a first behavior, and the third output represents the probability that the person in the image to be recognized is performing a second behavior;
S130: determining the behavior of the person in the image to be recognized according to the first output, the second output, and the third output.
It can be understood that this process can be used to recognize the behavior of the person in the image to be recognized. The neural network mentioned in FIG. 6 may be the neural network described above with reference to FIGS. 2 to 5, and for its training process, reference may be made to the related descriptions above.
The method shown in FIG. 6 does not limit the application scenario; for example, it can be applied to fields such as video surveillance and autonomous driving. Assuming the method shown in FIG. 6 is applied to the driving field, correspondingly, the normal behavior of the person can represent normal driving behavior, the first behavior can be a first abnormal behavior such as smoking, and the second behavior can be a second abnormal behavior such as making a phone call.
As an example, the image to be recognized can be an image captured in real time, or one frame of a video stream captured in real time; the image to be recognized can also be a pre-stored image, or one frame of a pre-stored video stream.
As an example, obtaining the image to be recognized in S110 may include extracting one frame of image from a video stream.
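For example, with OpenCV a frame can be pulled from a camera or a stored video (a sketch; the capture source is an assumption):

```python
import cv2

def grab_frame(source=0):
    """Extract one frame from a video stream (camera index or file path)."""
    cap = cv2.VideoCapture(source)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from the video stream")
    return frame   # HxWx3 BGR image to be recognized
```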
Exemplarily, the neural network may include a first branch model, a second branch model, and a third branch model, where the first branch model produces the first output, the second branch model produces the second output, and the third branch model produces the third output. Assuming the method shown in FIG. 6 is applied to the driving field, then as an example, the first output can represent the probability that the person (i.e., the driver) in the image to be recognized is driving normally, the second output represents the probability that the person in the image to be recognized is performing a first abnormal behavior (such as smoking), and the third output represents the probability that the person in the image to be recognized is performing a second abnormal behavior (such as making a phone call). Accordingly, the first branch model can be called the main branch model, the second branch model the smoking behavior recognition model, and the third branch model the phone call behavior recognition model.
In the embodiments of the present disclosure, when performing behavior recognition, the image to be recognized only needs to be input into the neural network to obtain the first output, the second output, and the third output, and the behavior of the person can then be determined through S130. This operation is simple and convenient, involves no tedious steps, and provides a good user experience.
Exemplarily, S130 may include: determining the behavior of the person in the image to be recognized according to at least one of the comparison result of the first output with a first threshold, the comparison result of the second output with a second threshold, and the comparison result of the third output with a third threshold. The first threshold, the second threshold, and the third threshold are preset, for example, according to the application scenario, precision requirements, and the like; it can be understood that the first threshold, the second threshold, and the third threshold are all values between 0 and 1, and any two of these three values may or may not be equal, which is not limited in the present disclosure.
Exemplarily, S130 may include: if the first output is greater than or equal to the first threshold; or if the first output is less than the first threshold, the second output is greater than the third output, and the second output is less than the second threshold; or if the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is less than the third threshold, determining that the person in the image to be recognized is behaving normally. If the first output is less than the first threshold, the second output is greater than the third output, and the second output is greater than or equal to the second threshold, determining that the person in the image to be recognized is performing the first behavior. If the first output is less than the first threshold, the second output is less than or equal to the third output, and the third output is greater than or equal to the third threshold, determining that the person in the image to be recognized is performing the second behavior.
To describe S130 more intuitively, suppose the first output, the second output, and the third output are denoted Pn, Ps, and Pc, respectively, and the first threshold, the second threshold, and the third threshold are denoted th1, th2, and th3, respectively.
With reference to FIG. 5, if (1) Pn ≥ th1, or (2) Pn < th1 and Ps > Pc and Ps < th2, or (3) Pn < th1 and Ps < Pc and Pc < th3, the behavior of the person can be determined to be normal behavior, e.g., normal driving behavior in the driving field.
With reference to FIG. 5, if Pn < th1 and Ps > Pc and Ps ≥ th2, the behavior of the person can be determined to be the first behavior, e.g., the first abnormal behavior (such as smoking) in the driving field.
With reference to FIG. 5, if Pn < th1 and Ps ≤ Pc and Pc ≥ th3, the behavior of the person can be determined to be the second behavior, e.g., the second abnormal behavior (such as making a phone call) in the driving field.
It can be seen that the embodiments of the present disclosure can effectively recognize the behaviors of persons under diverse viewing angles, different lighting, complex backgrounds, and diverse behavior states, can effectively recognize behavior states such as making phone calls and smoking, can effectively rule out possible attack behaviors, and overcome the insensitivity problem of the prior art in complex scenarios.
The neural network in the embodiments of the present disclosure includes multiple branch models, can use the multi-branch fusion model to extract rich visual features, can focus on the behavior of the person in the image to be recognized, and can obtain effective behavior expression and interpretation.
In addition, the embodiments of the present disclosure can meet real-time requirements and can perform real-time computation on embedded terminals (mobile phone terminals, in-vehicle terminals), thereby meeting practical application needs.
图7是本公开实施例的行为识别的装置的一个示意性框图。图7所示的装置20包括:获取模块210、输入模块220和确定模块230。
获取模块210可以用于获取待识别图像。
输入模块220可以用于将获取模块210获取的待识别图像输入至预先训练好的神经网络。
获取模块210还可以用于获取神经网络的输出,输出包括第一输出、第二输出和第三输出,其中,第一输出表示待识别图像中人物行为正常的概率,第二输出表示待识别图像中人物正在进行第一行为的概率,第三输出表示待识别图像中人物正在进行第二行为的概率;
确定模块230可以用于根据第一输出、第二输出和第三输出确定待识别图像中的人物的 行为。
Exemplarily, the determination module 230 may specifically be configured to determine the behavior of the person in the image to be recognized according to at least one of the comparison result of the first output with a first threshold, the comparison result of the second output with a second threshold, and the comparison result of the third output with a third threshold.
Exemplarily, the determination module 230 may specifically be configured to: if the first output is greater than or equal to the first threshold, or if the first output is less than the first threshold and the second output is greater than the third output and the second output is less than the second threshold, or if the first output is less than the first threshold and the second output is less than or equal to the third output and the third output is less than the third threshold, determine that the person in the image to be recognized is behaving normally; if the first output is less than the first threshold and the second output is greater than the third output and the second output is greater than or equal to the second threshold, determine that the person in the image to be recognized is performing the first behavior; and if the first output is less than the first threshold and the second output is less than or equal to the third output and the third output is greater than or equal to the third threshold, determine that the person in the image to be recognized is performing the second behavior.
Exemplarily, the neural network includes a first branch model, a second branch model and a third branch model, the first branch model producing the first output, the second branch model producing the second output, and the third branch model producing the third output. The neural network may be obtained by training in advance; see the related description of FIG. 2 to FIG. 5 above.
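Purely as an illustration of this module decomposition, a sketch follows; grab_frame is a hypothetical helper (e.g. the OpenCV sketch above), decide() is the earlier sketch, and the network is assumed to yield (Pn, Ps, Pc):

    class BehaviorRecognitionApparatus:
        """Illustrative composition of apparatus 20: acquisition module 210,
        input module 220 and determination module 230."""
        def __init__(self, net, thresholds=(0.5, 0.5, 0.5)):
            self.net = net                 # pre-trained three-branch network
            self.th = thresholds

        def acquire(self, source):         # acquisition module 210
            return grab_frame(source)      # hypothetical frame-capture helper

        def infer(self, image):            # input module 220
            return self.net(image)         # yields (Pn, Ps, Pc)

        def determine(self, pn, ps, pc):   # determination module 230
            return decide(pn, ps, pc, *self.th)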
The apparatus 20 shown in FIG. 7 can implement the behavior recognition method shown in FIG. 6; to avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
In addition, an embodiment of the present disclosure further provides another behavior recognition apparatus, including a memory, a processor and a computer program stored on the memory and running on the processor, where the processor, when executing the program, implements the steps of the behavior recognition method shown in FIG. 6.
As shown in FIG. 8, the apparatus 30 may include a memory 310 and a processor 320.
The memory 310 stores computer program code for implementing the corresponding steps of the behavior recognition method according to the embodiments of the present disclosure.
The processor 320 is configured to run the computer program code stored in the memory 310 to execute the corresponding steps of the behavior recognition method according to the embodiments of the present disclosure.
Exemplarily, when the computer program code is run by the processor 320, the following steps are executed: acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained neural network; acquiring outputs of the neural network, the outputs including a first output, a second output and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person is performing a first behavior, and the third output represents the probability that the person is performing a second behavior; and determining the behavior of the person in the image to be recognized according to the first output, the second output and the third output.
In addition, an embodiment of the present disclosure further provides an electronic device, which may be the electronic device 10 shown in FIG. 1, or which may include the behavior recognition apparatus shown in FIG. 7 or FIG. 8. The electronic device can implement the behavior recognition method shown in FIG. 6.
The electronic device may be a mobile terminal, which may include an image capture apparatus and the behavior recognition apparatus shown in FIG. 7 or FIG. 8. For example, the mobile terminal may be a smartphone or an in-vehicle device.
For example, the mobile terminal may be installed inside a vehicle with its image capture apparatus facing the driver, e.g., behind or beside the steering wheel, so that the mobile terminal can capture video stream data or image data of the driver through its image capture apparatus and determine the driver's behavior in real time using the method shown in FIG. 6. Optionally, if recognition determines that the driver is performing an abnormal behavior that may affect safe driving, such as smoking or phoning, the mobile terminal can issue a warning in real time to remind the driver to correct the behavior promptly, thereby ensuring driving safety.
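A hedged sketch of such a real-time in-vehicle loop; preprocess, net and warn_driver are hypothetical stand-ins for the input pre-processing, the trained network and the warning mechanism, and recognize() is the earlier sketch:

    import cv2

    ABNORMAL = {"first", "second"}        # e.g. smoking, phoning

    cap = cv2.VideoCapture(0)             # camera facing the driver
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        behavior = recognize(preprocess(frame), net)  # method of FIG. 6
        if behavior in ABNORMAL:
            warn_driver(behavior)         # hypothetical real-time alert hook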
In addition, an embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the behavior recognition method shown in FIG. 6 can be implemented. For example, the computer storage medium is a computer-readable storage medium.
In one embodiment, the computer program instructions, when run by a computer or processor, cause the computer or processor to execute the following steps: acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained neural network; acquiring outputs of the neural network, the outputs including a first output, a second output and a third output, where the first output represents the probability that the person in the image to be recognized is behaving normally, the second output represents the probability that the person is performing a first behavior, and the third output represents the probability that the person is performing a second behavior; and determining the behavior of the person in the image to be recognized according to the first output, the second output and the third output.
The computer storage medium may include, for example, a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the foregoing storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
In addition, an embodiment of the present disclosure further provides a computer program product containing instructions that, when executed by a computer, cause the computer to execute the steps of the behavior recognition method shown in FIG. 6.
It can thus be seen that the embodiments of the present disclosure provide a behavior recognition method, apparatus and computer storage medium capable of determining the behavior of a person in an image to be recognized using a pre-trained neural network. Specifically, the neural network includes multiple branch models, can extract rich visual features, and can focus on specific behaviors of the person in the image to be recognized, thereby achieving higher behavior recognition accuracy. Moreover, the behavior recognition method of the embodiments of the present disclosure can meet real-time requirements and perform real-time computation, thereby satisfying the recognition requirements of various application fields.
Although example embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above example embodiments are merely illustrative and are not intended to limit the scope of the present disclosure thereto. Those of ordinary skill in the art can make various changes and modifications therein without departing from the scope and spirit of the present disclosure. All such changes and modifications are intended to be included within the scope of the present disclosure as claimed in the appended claims.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of units is merely a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed.
Numerous specific details are set forth in the specification provided herein. However, it is understood that the embodiments of the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the present disclosure and aid in the understanding of one or more of the various disclosed aspects, various features of the present disclosure are sometimes grouped together into a single embodiment, figure, or description thereof in the description of exemplary embodiments of the present disclosure. However, the method of the present disclosure should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive point lies in that the corresponding technical problem can be solved with fewer than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present disclosure.
Those skilled in the art will understand that all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where such features are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (Digital Signal Processing, DSP) may be used in practice to implement some or all of the functions of some modules of the article analysis device according to the embodiments of the present disclosure. The present disclosure may also be implemented as an apparatus program (e.g., a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing the present disclosure may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present disclosure, and those skilled in the art can devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present disclosure may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above is merely the specific implementation of the present disclosure or a description of the specific implementation, and the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present disclosure, and all such changes or substitutions shall fall within the protection scope of the present disclosure. The protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

  1. A behavior recognition method, wherein the method comprises:
    acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained neural network;
    acquiring outputs of the neural network, the outputs comprising a first output, a second output and a third output, wherein the first output represents a probability that a person in the image to be recognized is behaving normally, the second output represents a probability that the person in the image to be recognized is performing a first behavior, and the third output represents a probability that the person in the image to be recognized is performing a second behavior;
    determining a behavior of the person in the image to be recognized according to the first output, the second output and the third output.
  2. The method according to claim 1, wherein the determining the behavior of the person in the image to be recognized according to the first output, the second output and the third output comprises:
    determining the behavior of the person in the image to be recognized according to at least one of a comparison result of the first output with a first threshold, a comparison result of the second output with a second threshold, and a comparison result of the third output with a third threshold.
  3. The method according to claim 1 or 2, wherein the determining the behavior of the person in the image to be recognized according to the first output, the second output and the third output comprises:
    if the first output is greater than or equal to a first threshold, or if the first output is less than the first threshold and the second output is greater than the third output and the second output is less than a second threshold, or if the first output is less than the first threshold and the second output is less than or equal to the third output and the third output is less than a third threshold, determining that the person in the image to be recognized is behaving normally;
    if the first output is less than the first threshold and the second output is greater than the third output and the second output is greater than or equal to the second threshold, determining that the person in the image to be recognized is performing the first behavior;
    if the first output is less than the first threshold and the second output is less than or equal to the third output and the third output is greater than or equal to the third threshold, determining that the person in the image to be recognized is performing the second behavior.
  4. The method according to any one of claims 1 to 3, wherein the neural network comprises a first branch model, a second branch model and a third branch model, and the first branch model produces the first output, the second branch model produces the second output, and the third branch model produces the third output.
  5. The method according to claim 4, wherein the second branch model is a smoking-behavior recognition model, and the third branch model is a phoning-behavior recognition model.
  6. The method according to claim 5, wherein
    an output of a first convolutional layer of the first branch model is fused with an output of a second convolutional layer of the second branch model as an input of a third convolutional layer of the second branch model, and
    the output of the first convolutional layer of the first branch model is fused with an output of a fourth convolutional layer of the third branch model as an input of a fifth convolutional layer of the third branch model.
  7. The method according to claim 5 or 6, wherein the neural network is obtained by training based on a training data set.
  8. The method according to claim 7, wherein the training data set is constructed in the following manner:
    acquiring N pieces of original data, each piece of original data including a person region;
    for each piece of original data: segmenting out the person region therein, and adding multiple kinds of noise to regions other than the person region to generate multiple pieces of training data;
    the set of the multiple pieces of training data for the N pieces of original data being the training data set, the training data set including M pieces of training data, where M is greater than N and both are positive integers.
  9. The method according to claim 7 or 8, wherein, when training the neural network, for the training data in the training data set:
    data augmentation is performed on the training data to obtain augmented data;
    the lower half of the face is cropped from the augmented data to obtain cropped data;
    edge detection is performed on the augmented data to obtain edge-detected data;
    the augmented data is used as an input of the first branch model of the neural network;
    the cropped data is used as an input of the second branch model of the neural network;
    the edge-detected data is fused with the augmented data and the result is used as an input of the third branch model of the neural network.
  10. The method according to claim 9, wherein the data augmentation includes at least one of: mirroring, brightness variation, and random cropping.
  11. The method according to any one of claims 7 to 10, wherein, when training the neural network, termination of the training process is controlled by setting an amount of data per iteration, a total number of iterations, and a learning-rate decay strategy.
  12. A behavior recognition apparatus, wherein the apparatus comprises:
    an acquisition module configured to acquire an image to be recognized;
    an input module configured to input the image to be recognized into a pre-trained neural network;
    the acquisition module being further configured to acquire outputs of the neural network, the outputs comprising a first output, a second output and a third output, wherein the first output represents a probability that a person in the image to be recognized is behaving normally, the second output represents a probability that the person in the image to be recognized is performing a first behavior, and the third output represents a probability that the person in the image to be recognized is performing a second behavior;
    a determination module configured to determine a behavior of the person in the image to be recognized according to the first output, the second output and the third output.
  13. A behavior recognition apparatus, comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 11.
  14. A computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 11.