CN113312942A - Data processing method and equipment and converged network architecture - Google Patents

Data processing method and equipment and converged network architecture

Info

Publication number
CN113312942A
CN113312942A (application CN202010122774.7A)
Authority
CN
China
Prior art keywords
face
feature
features
state
human face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010122774.7A
Other languages
Chinese (zh)
Other versions
CN113312942B (en)
Inventor
杨攸奕
刘力哲
古鉴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010122774.7A priority Critical patent/CN113312942B/en
Publication of CN113312942A publication Critical patent/CN113312942A/en
Application granted granted Critical
Publication of CN113312942B publication Critical patent/CN113312942B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a data processing method and device and a converged network architecture. Through the converged network, the application can be flexibly adapted to different application scenarios, and its operating efficiency is further improved.

Description

Data processing method and equipment and converged network architecture
Technical Field
The present application relates to, but is not limited to, artificial intelligence technology, and in particular to a data processing method and apparatus and a converged network architecture.
Background
It is increasingly common to use various sensors for state detection, for example ADAS systems that perform fatigue detection on drivers, or systems that assess students' mental states for learning-concentration evaluation. However, the sensor types, setups and specific detection methods of these two current application scenarios are highly scene-dependent and cannot be copied to other scenarios; that is, the detection methods provided in the related art are limited to a particular application scenario and cannot be applied across different application scenarios.
Disclosure of Invention
The application provides a data processing method and device and a converged network architecture, which can be flexibly applied to different application scenarios.
An embodiment of the present application provides a data processing method, which comprises the following steps:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the human face overall features and the human face local features, wherein the features except the main features are auxiliary features;
and calculating the physiological state corresponding to the face included in the image data by fusing a neural network based on the main feature and the auxiliary feature.
In an exemplary instance, the acquiring of the whole face features and the at least one local face feature corresponding to the image data includes:
acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data;
the selecting of the main features from the whole human face features and the local human face features comprises:
and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is taken as the main feature, and the rest are taken as the auxiliary features.
In an exemplary instance, the calculating the physiological state corresponding to the face included in the image data includes:
performing fusion processing through the fusion neural network based on the face state sequence and at least one face feature state sequence to obtain more than two optimized main feature sequences;
taking more than two optimized main feature sequences as auxiliary weights, and taking the face deep learning feature sequences as main features to perform fusion processing to obtain optimized face deep learning feature sequences;
and classifying the optimized human face deep learning feature sequence to obtain the detection result of the physiological state.
In an illustrative example, the face deep learning feature sequence includes a face Convolutional Neural Network (CNN) feature sequence.
In one illustrative example, the multi-frame image includes N consecutive frame images.
In one illustrative example, frame skipping is allowed in the consecutive N-frame images.
In one illustrative example, the at least one facial feature state sequence comprises: a first facial feature state sequence, a second facial feature state sequence, …, an Mth facial feature state sequence, where M is an integer greater than or equal to 1;
the acquiring more than two optimized main feature sequences comprises:
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the first facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized first facial feature state sequences after fusion processing;
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the second facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized second facial feature state sequences after fusion processing;
by analogy, inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the Mth facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized Mth facial feature state sequences after fusion processing;
and inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the human face state sequence as the main feature, into the fusion network; outputting (M+1) optimized human face state sequences after fusion processing.
In an exemplary embodiment, the optimized face deep learning feature sequence is processed by a neural network classifier.
In one illustrative example, the method further comprises:
detecting a single frame image, and extracting a face image and at least one face local image;
and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
In an exemplary instance, the face state sequence includes the face states respectively corresponding to the plurality of frames of images;
the face feature state sequence comprises the face feature states respectively corresponding to the multiple frames of images;
the face deep learning feature sequence comprises the face deep learning features respectively corresponding to the multiple frames of images.
In one illustrative example, the method further comprises:
detecting fatigue expression according to the calculated physiological state corresponding to the human face included in the image data; or,
detecting drunk driving according to the calculated physiological state corresponding to the face included in the image data.
The present application further provides a computer-readable storage medium storing computer-executable instructions for performing any of the data processing methods described above.
The application also provides a data processing device, comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the data processing method described above.
The present application further provides a converged network architecture, comprising: a convolutional neural network (CNN) processing module and a fusion processing module; wherein:
the CNN processing module is used for performing CNN operation on the input multiple auxiliary weights;
and the fusion processing module is used for carrying out element multiplication on the CNN operated result and the input main characteristic to output more than two optimized main characteristic sequences.
In one illustrative example, the plurality of auxiliary weights employ a concatenated input.
In an exemplary embodiment, the dimension of the output result after the plurality of auxiliary weights perform CNN operation is the same as the dimension of the main feature of the input.
In one illustrative example, the CNN structure is a backend converged network LAF-Net.
The present application further provides a data processing apparatus, comprising: a first acquisition unit, a second acquisition unit, a first processing unit and a second processing unit; wherein:
the first acquisition unit is used for acquiring image data to be recognized, wherein the image data comprises a human face;
the second acquisition unit is used for acquiring the whole human face features and at least one local human face feature corresponding to the image data;
the first processing unit is used for selecting main features from the whole human face features and the local human face features, wherein the features except the main features are auxiliary features;
and the second processing unit is used for calculating the physiological state corresponding to the face included in the image data through a fusion neural network based on the main feature and the auxiliary feature.
In an exemplary embodiment, the second obtaining unit is specifically configured to: acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data;
the first processing unit is specifically configured to: and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is taken as the main feature, and the rest are taken as the auxiliary features.
In one illustrative example, the second processing unit includes a first fusion module, a second fusion module, and a classification module, wherein,
the first fusion module is used for performing fusion processing through the fusion neural network based on the human face state sequence and at least one facial feature state sequence to obtain more than two optimized main feature sequences;
the second fusion module is used for taking the more than two optimized main feature sequences as auxiliary weights and the human face deep learning feature sequence as the main feature to perform fusion processing, so as to obtain an optimized human face deep learning feature sequence;
and the classification module is used for processing the optimized human face deep learning feature sequence by using a neural network classifier to obtain the physiological state detection result.
In one illustrative example, further comprising:
the detection unit is used for detecting the single-frame image and extracting a face image and at least one face local image; and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
In an illustrative example, the first fusion module or the second fusion module comprises the converged network described in any of the above examples.
The application also provides a data processing method, which comprises the following steps:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the human face overall features and the human face local features, wherein the features except the main features are auxiliary features;
and calculating the emotional state corresponding to the face included in the image data by fusing a neural network based on the main feature and the auxiliary feature.
The present application further provides a data processing method, including:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features;
acquiring more than two optimized main feature information by fusing a neural network based on the main feature and the auxiliary feature;
and performing makeup recommendation according to the more than two optimized main characteristic information.
The embodiments of the application achieve flexible application to different application scenarios through the design of the fusion network, without being limited to a cockpit, a classroom, an office or the like; moreover, the operating efficiency of the application is further improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification. They illustrate embodiments of the subject matter and, together with the description, serve to explain its principles without limiting it.
FIG. 1 is a flow chart of a data processing method of the present application;
FIG. 2 is a schematic diagram illustrating the components of the converged network architecture of the present application;
FIG. 3 is a schematic process diagram illustrating an embodiment of drowsiness detection according to the present application;
fig. 4 is a schematic diagram of a configuration of a data processing apparatus according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In one exemplary configuration of the present application, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described here.
Fig. 1 is a flowchart of a data processing method according to the present application, as shown in fig. 1, including:
step 100: acquiring image data to be recognized, wherein the image data comprises a human face.
Step 101: and acquiring the whole face features and at least one face local feature corresponding to the image data.
In one illustrative example, the step may include:
and acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data.
In an exemplary instance, the face deep learning feature sequence may include, but is not limited to, a face Convolutional Neural Network (CNN) feature sequence. A CNN is a kind of Feedforward Neural Network that includes convolution computation and has a deep structure, and is one of the representative algorithms of deep learning.
In one illustrative example, the facial feature state of each frame of image includes, but is not limited to:
a first facial feature state, such as an eye state, including a left eye state, a right eye state, such as open eye, closed eye, drowsiness, etc.;
and/or, a second facial feature state, such as a mouth state, e.g., normal, yawning, laughing, etc.;
and/or, an Mth facial feature state, such as an eyebrow state, etc., where M is an integer greater than or equal to 0.
In one illustrative example, the face states include, but are not limited to: normal, head lowered, etc.
In an exemplary example, the face state sequence is composed of face states corresponding to multiple frames of images respectively, the face feature state sequence is composed of face feature states corresponding to multiple frames of images respectively, and the face deep learning feature sequence is composed of face deep learning features corresponding to multiple frames of images respectively, such as face CNN features.
In one illustrative example, the multi-frame image may be a continuous N-frame image.
In one illustrative example, frame skipping is allowed in the consecutive N frame images. Here, frame skipping may be implemented by changing the number of frames transmitted per second (FPS).
Wherein the value of N can be changed by changing the duration of the input image sequence.
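As a hedged illustration (not part of the application), the following Python sketch shows how an N-frame sequence might be assembled with frame skipping implemented by changing the effective frames per second; the function name, parameters and the use of OpenCV are illustrative assumptions.

    # Illustrative sketch: sampling N frames with frame skipping controlled by an
    # effective FPS, where N also grows or shrinks with the sequence duration.
    import cv2

    def sample_frames(video_path, target_fps=15.0, duration_s=2.0):
        cap = cv2.VideoCapture(video_path)
        native_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
        step = max(int(round(native_fps / target_fps)), 1)  # frame skipping factor
        n_frames = int(target_fps * duration_s)             # N follows the duration
        frames, idx = [], 0
        while len(frames) < n_frames:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                frames.append(frame)
            idx += 1
        cap.release()
        return frames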
Step 102: and selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features.
In one illustrative example, the step may include:
and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is used as a main feature, and the rest are used as auxiliary features.
Step 103: and calculating the physiological state corresponding to the face included in the image data by fusing the neural network based on the obtained main features and the auxiliary features.
In one illustrative example, the step may include:
performing fusion processing through a fusion neural network based on the face state sequence and at least one face characteristic state sequence to obtain more than two optimized main characteristic sequences;
taking more than two optimized main feature sequences as auxiliary weights, and taking the face deep learning feature sequence as a main feature to carry out fusion processing to obtain an optimized face deep learning feature sequence;
and classifying the optimized human face deep learning feature sequence to obtain a detection result of the physiological state.
In an exemplary example, the obtaining more than two optimized main feature sequences may include:
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the first facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized first facial feature state sequences after fusion processing;
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the second facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized second facial feature state sequences after fusion processing;
and so on, until:
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the Mth facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized Mth facial feature state sequences after fusion processing;
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the human face state sequence as the main feature, into the fusion network; outputting (M+1) optimized human face state sequences after fusion processing.
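As a hedged sketch of this rotation of the main feature (assuming each state sequence is a tensor of shape (batch, channels, N) and fusion_net is a placeholder callable implementing the converged network described below), the loop might look as follows:

    # Illustrative sketch: every state sequence takes a turn as the main feature,
    # while all state sequences together serve as the concatenated auxiliary weights.
    import torch

    def optimize_state_sequences(state_seqs, fusion_net):
        # state_seqs: [feature_1_seq, ..., feature_M_seq, face_state_seq]
        aux = torch.cat(state_seqs, dim=1)           # concatenated auxiliary weights
        optimized = []
        for main in state_seqs:                      # M facial-feature sequences + face state
            optimized.append(fusion_net(aux, main))  # optimized main feature sequence
        return optimized                             # (M + 1) optimized sequences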
In an illustrative example, the fusion network in the embodiments of the present application may include a plurality of auxiliary-weight inputs and one main-feature input, where all inputs have the same sequence length N, that is, each input is derived from the same number of frames. As shown in Fig. 2, the converged network may include: a CNN processing module and a fusion processing module; wherein:
the CNN processing module is used for performing CNN operation on the input multiple auxiliary weights;
and the fusion processing module is used for carrying out element multiplication on the CNN operated result and the input main characteristic to output more than two optimized main characteristic sequences.
In an exemplary embodiment, the plurality of auxiliary weights may be input in concatenated form, that is, the plurality of auxiliary weights are concatenated and then input as a single auxiliary weight; the three parts separated by bold black lines in the leftmost box of Fig. 2 indicate that different auxiliary weights are concatenated together and input to the converged network.
In an exemplary embodiment, the dimension of the output result after the CNN operation is performed on the plurality of auxiliary weights is the same as the dimension of the input main feature.
In an exemplary embodiment, the CNN that processes the auxiliary weights can be any type of CNN, and can also be replaced by a sequence network such as a Recurrent Neural Network (RNN), for example a Long Short-Term Memory (LSTM) network. An RNN is a type of temporal network and performs better on time-sequenced input.
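A minimal PyTorch sketch of such a fusion module is given below; the 1-D convolution layers, their sizes and the sigmoid gating are illustrative assumptions, the only constraint taken from the description being that the CNN output has the same dimensions as the main feature so that element-wise multiplication is possible. Replacing aux_cnn with an LSTM would realize the RNN variant mentioned above.

    # Illustrative sketch of the fusion module: the concatenated auxiliary weights
    # pass through a small CNN whose output matches the main-feature shape, and the
    # result is combined with the main feature by element-wise multiplication.
    import torch
    import torch.nn as nn

    class FusionModule(nn.Module):
        def __init__(self, aux_channels, main_channels):
            super().__init__()
            self.aux_cnn = nn.Sequential(
                nn.Conv1d(aux_channels, main_channels, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(main_channels, main_channels, kernel_size=3, padding=1),
                nn.Sigmoid(),  # assumed gating-style output; not specified in the text
            )

        def forward(self, aux, main):
            # aux:  (batch, aux_channels, N)  concatenated auxiliary weight sequences
            # main: (batch, main_channels, N) main feature sequence of length N
            weights = self.aux_cnn(aux)
            return main * weights          # element-wise multiplication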
In an exemplary example, based on the embodiments of the present application, those skilled in the art will readily understand that the overall framework of the converged network shown in Fig. 2 can also perform face tracking in multi-person situations and perform the fusion processing on each individual separately.
In an illustrative example, the CNN structure may be, but is not limited to, a back-end fusion network (LAF-Net). LAF-Net is a CNN based on Late Fusion, a structural design of CNNs for video in which each frame of the video is processed separately in a backbone CNN and the contents of the frames are fused only when they reach the main network. A typical complete CNN structure consists of a backbone plus a main network; the backbone is itself a CNN architecture and is responsible for extracting features for use by the main network. It should be noted that the CNN in the present application may also be an existing network, which is not intended to limit the scope of the present application.
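As a hedged illustration of the late-fusion idea only (the actual LAF-Net structure is not given here), a backbone can process each frame independently and the per-frame features can then be fused before the main network:

    # Illustrative late-fusion sketch: a shared backbone CNN processes each frame
    # separately; the per-frame features are fused afterwards for the main network.
    # This is a generic pattern, not the LAF-Net of the application.
    import torch
    import torch.nn as nn

    class LateFusionNet(nn.Module):
        def __init__(self, backbone, feat_dim, num_outputs):
            super().__init__()
            self.backbone = backbone                      # per-frame feature extractor
            self.head = nn.Linear(feat_dim, num_outputs)  # simplified main network

        def forward(self, frames):
            # frames: (batch, N, C, H, W)
            b, n = frames.shape[:2]
            per_frame = self.backbone(frames.flatten(0, 1))  # (b * n, feat_dim)
            per_frame = per_frame.view(b, n, -1)
            fused = per_frame.mean(dim=1)                    # late fusion across frames
            return self.head(fused)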
The application optimizes the existing detection results through the fusion network and the relationships among the serialized detection results. In this way, an operating efficiency of 15 FPS can be achieved even on low-end equipment.
In an exemplary embodiment, the optimized human face deep learning feature sequence is classified to obtain a physiological state detection result.
In an exemplary embodiment, the optimized human face deep learning feature sequence can be processed by a neural network classifier to obtain a physiological state detection result.
In the present application, the form of the neural network classifier is not limited; for example, average pooling followed by two fully connected layers (Average Pooling + 2FC) may be adopted.
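A possible sketch of such a classifier head, with assumed feature and class dimensions, is shown below.

    # Illustrative sketch of the classifier head: temporal average pooling over the
    # optimized face deep-learning feature sequence followed by two fully connected
    # layers. Dimensions are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class StateClassifier(nn.Module):
        def __init__(self, feat_dim=256, num_states=2):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool1d(1)   # average pooling over the N frames
            self.fc = nn.Sequential(
                nn.Linear(feat_dim, 64),
                nn.ReLU(),
                nn.Linear(64, num_states),        # e.g. drowsy / not drowsy
            )

        def forward(self, seq):
            # seq: (batch, feat_dim, N) optimized face deep-learning feature sequence
            pooled = self.pool(seq).squeeze(-1)   # (batch, feat_dim)
            return self.fc(pooled)                # class logits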
In one illustrative example, the method of the present application further comprises:
detecting a single frame image, and extracting a face image and at least one face local image;
and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
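A minimal sketch of this per-frame stage is given below; the detector and classifier callables bundled in the detectors object are hypothetical placeholders, not components defined by the application.

    # Illustrative sketch: per-frame detection and temporary buffering of the face
    # state, the facial-feature states and the face deep-learning (CNN) feature.
    from collections import deque

    class FrameBuffer:
        def __init__(self, n_frames):
            self.face_states = deque(maxlen=n_frames)
            self.eye_states = deque(maxlen=n_frames)
            self.mouth_states = deque(maxlen=n_frames)
            self.deep_features = deque(maxlen=n_frames)

        def push(self, frame, detectors):
            face_img, eye_img, mouth_img = detectors.crop_face_and_parts(frame)
            self.face_states.append(detectors.face_state(face_img))     # e.g. normal / head lowered
            self.eye_states.append(detectors.eye_state(eye_img))        # e.g. open / closed
            self.mouth_states.append(detectors.mouth_state(mouth_img))  # e.g. normal / yawning
            self.deep_features.append(detectors.cnn_feature(face_img))  # face CNN feature

        def ready(self):
            return len(self.face_states) == self.face_states.maxlen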
The present application also provides a computer-readable storage medium storing computer-executable instructions for performing any of the data processing methods described above.
The present application further provides a data processing apparatus comprising a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of any of the data processing methods described above.
ADAS systems in the related art rely on various non-visual sensors and can only be used in a vehicle, whereas the technical solution provided by the application can also be used outside the vehicle;
smart-campus solutions in the related art emphasize motion detection and cannot detect fine facial expressions such as eye closure and dense blinking, whereas the technical solution provided by the application uses the mutual complementarity between consecutive frames to improve the detection accuracy of fine expressions (such as eye closure, continuous blinking, squinting and the like) as well as of large-amplitude expressions (such as yawning, dozing, resting the chin on a hand and the like);
learning-concentration detection devices in the related art require the user to wear contact equipment, whereas the technical solution provided by the application can be conveniently implemented with only an infrared camera or an RGB camera;
intelligent driving solutions in the related art only perform single-frame detection with simple filtering, whereas the technical solution provided by the application performs further fusion optimization on top of single-frame detection by using time-series information and the fusion network, thereby further improving the recognition effect;
stand-alone early-warning devices (ear-hung type) in the related art can only detect fatigue actions involving the whole head, whereas the technical solution provided by the application detects fatigue performance related to the eyes, the mouth and the like;
stand-alone early-warning devices (infrared camera) in the related art are simpler and can only detect eye fatigue actions, whereas the technical solution provided by the application detects a wider range of fatigue performance.
In an exemplary embodiment, drunk-driving detection may also be implemented according to the obtained physiological state, that is, it is determined whether the person corresponding to the face in the image data is driving while intoxicated.
In summary, the application achieves flexible application to different application scenarios through the design of the fusion network, without being limited to a cockpit, a classroom, an office or the like; moreover, the operating efficiency of the application is further improved.
Fig. 3 is a schematic process diagram of an embodiment of drowsiness state detection according to the present application. In this embodiment, the facial feature states include two facial feature states, namely the eyes (left eye and right eye) and the mouth. As shown in Fig. 3, the detection process includes:
First, N frames of images are detected to obtain an eye state sequence consisting of N eye states, a mouth state sequence consisting of N mouth states and a face state sequence consisting of N face states; CNN processing is performed on the N frames of images to obtain a face deep learning feature sequence consisting of N face deep learning features.
Then, the eye state sequence, the mouth state sequence and the face state sequence are used as auxiliary weights (thin solid arrows in Fig. 3) and the eye state sequence is used as the main feature (thick solid arrow in Fig. 3), and they are input into the fusion network; (2+1) = 3 optimized eye state sequences are output after fusion processing. The eye state sequence, the mouth state sequence and the face state sequence are likewise used as auxiliary weights (thin solid arrows in Fig. 3) with the mouth state sequence as the main feature (thick solid arrow in Fig. 3) and input into the fusion network; (2+1) = 3 optimized mouth state sequences are output after fusion processing. The eye state sequence, the mouth state sequence and the face state sequence are again used as auxiliary weights (thin solid arrows in Fig. 3) with the face state sequence as the main feature (thick solid arrow in Fig. 3) and input into the fusion network; (2+1) = 3 optimized face state sequences are output after fusion processing.
Next, the optimized eye state sequence, the optimized mouth state sequence and the optimized face state sequence are used as auxiliary weights (thin solid arrows in Fig. 3) and the face deep learning feature sequence is used as the main feature (thick solid arrow in Fig. 3), and they are input into the fusion network; (2+1) = 3 optimized face deep learning feature sequences are output after fusion processing.
Finally, the 3 optimized face deep learning feature sequences are input into a neural network classifier and processed to obtain the physiological state detection result, namely the drowsiness state. The drowsiness state in this embodiment is a binary yes/no state, i.e., drowsy or not drowsy.
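The stages above can be wired together roughly as in the following sketch, reusing placeholder components in the spirit of the earlier sketches (two fusion networks and an average-pooling + 2FC classifier); all names and shapes are illustrative assumptions.

    # Illustrative end-to-end sketch of the drowsiness-detection embodiment.
    # eye_seq, mouth_seq, face_seq, deep_seq are (batch, channels, N) tensors;
    # fuse_states and fuse_deep are converged networks as sketched above, and
    # classifier is an average-pooling + 2FC head. All are placeholders.
    import torch

    def detect_drowsiness(eye_seq, mouth_seq, face_seq, deep_seq,
                          fuse_states, fuse_deep, classifier):
        # Stage 1: optimize each state sequence, with all state sequences as auxiliary weights.
        state_aux = torch.cat([eye_seq, mouth_seq, face_seq], dim=1)
        opt_eye = fuse_states(state_aux, eye_seq)
        opt_mouth = fuse_states(state_aux, mouth_seq)
        opt_face = fuse_states(state_aux, face_seq)

        # Stage 2: the optimized state sequences become auxiliary weights for the
        # face deep-learning feature sequence, which is now the main feature.
        deep_aux = torch.cat([opt_eye, opt_mouth, opt_face], dim=1)
        opt_deep = fuse_deep(deep_aux, deep_seq)

        # Stage 3: classify the optimized deep-learning feature sequence.
        logits = classifier(opt_deep)             # drowsy vs. not drowsy
        return logits.argmax(dim=-1)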
In an illustrative example, the present application further provides a data processing method, comprising:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features;
and calculating, based on the main features and the auxiliary features and through the fusion neural network, the emotional state corresponding to the face included in the image data, such as calm, depressed, excited, angry and the like.
In an exemplary example, whether the person corresponding to the face is prone to extreme emotions can be evaluated by aggregating the obtained emotional states corresponding to the face, so that, for example, when the person purchases car insurance, the insured amount can be adjusted appropriately according to the person's emotional state, and so on.
In an illustrative example, the present application further provides a data processing method, comprising:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features;
acquiring more than two optimized main feature information by fusing a neural network based on the main feature and the auxiliary feature;
and performing makeup recommendation according to the more than two optimized main characteristic information.
In this embodiment, the feature information obtained by fusing different parts of the face is optimized, which further improves the reliability of the makeup recommendation.
In an exemplary embodiment, the data processing method provided by the application can also be applied to a live-streaming scenario, in which the physiological state, the emotional state and the like corresponding to the face included in the image data are detected.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to the present application. As shown in Fig. 4, the apparatus at least includes: a first acquisition unit, a second acquisition unit, a first processing unit and a second processing unit; wherein:
the first acquisition unit is used for acquiring image data to be recognized, wherein the image data comprises a human face;
the second acquisition unit is used for acquiring the whole human face features and at least one local human face feature corresponding to the image data;
the first processing unit is used for selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features;
and the second processing unit is used for calculating the physiological state corresponding to the face included in the image data through fusing the neural network based on the main characteristic and the auxiliary characteristic.
In an exemplary embodiment, the second obtaining unit is specifically configured to:
acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data;
the first processing unit is specifically configured to: and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is taken as the main feature, and the rest are taken as the auxiliary features.
In an illustrative example, the second processing unit includes a first fusion module, a second fusion module, and a classification module, wherein,
the first fusion module is used for carrying out fusion processing through a fusion neural network based on the human face state sequence and at least one facial feature state sequence to obtain more than two optimized main feature sequences;
the second fusion module is used for taking more than two optimized main feature sequences as auxiliary weights and taking the face deep learning feature sequence as a main feature to carry out fusion processing to obtain the optimized face deep learning feature sequence;
and the classification module is used for processing the optimized human face deep learning feature sequence by using a neural network classifier to obtain a detection result of the physiological state.
In an illustrative example, the data processing apparatus of the present application further includes:
the detection unit is used for detecting the single-frame image and extracting a face image and at least one face local image; and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
In one illustrative example, the first or second fusion module comprises:
the CNN processing module is used for performing CNN operation on the input multiple auxiliary weights;
and the fusion processing module is used for carrying out element multiplication on the CNN operated result and the input main characteristic to output more than two optimized main characteristic sequences.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (24)

1. A method of data processing, comprising:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the human face overall features and the human face local features, wherein the features except the main features are auxiliary features;
and calculating the physiological state corresponding to the face included in the image data by fusing a neural network based on the main feature and the auxiliary feature.
2. The data processing method according to claim 1, wherein the acquiring of the whole face features and the at least one local face feature corresponding to the image data comprises:
acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data;
the selecting of the main features from the whole human face features and the local human face features comprises:
and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is taken as the main feature, and the rest are taken as the auxiliary features.
3. The data processing method according to claim 2, wherein the calculating the physiological state corresponding to the face included in the image data comprises:
performing fusion processing through the fusion neural network based on the face state sequence and at least one face feature state sequence to obtain more than two optimized main feature sequences;
taking more than two optimized main feature sequences as auxiliary weights, and taking the face deep learning feature sequences as main features to perform fusion processing to obtain optimized face deep learning feature sequences;
and classifying the optimized human face deep learning feature sequence to obtain the detection result of the physiological state.
4. The data processing method of claim 2, wherein the face deep learning feature sequence comprises a face Convolutional Neural Network (CNN) feature sequence.
5. The data processing method according to claim 2, wherein the multi-frame image includes N consecutive frame images.
6. The data processing method of claim 5, wherein frame skipping is allowed in the consecutive N-frame images.
7. The data processing method of claim 3, wherein the at least one facial feature state sequence comprises: a first facial feature state sequence, a second facial feature state sequence, …, an Mth facial feature state sequence, M being an integer greater than or equal to 1;
the acquiring more than two optimized main feature sequences comprises:
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the first facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized first facial feature state sequences after fusion processing;
inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the second facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized second facial feature state sequences after fusion processing;
by analogy, inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the Mth facial feature state sequence as the main feature, into the fusion network; outputting (M+1) optimized Mth facial feature state sequences after fusion processing;
and inputting the first facial feature state sequence, the second facial feature state sequence, …, the Mth facial feature state sequence and the human face state sequence as auxiliary weights, and the human face state sequence as the main feature, into the fusion network; outputting (M+1) optimized human face state sequences after fusion processing.
8. The data processing method of claim 3, wherein the optimized face deep learning feature sequence is processed using a neural network classifier.
9. A data processing method according to any of claims 1 to 8, the method further comprising:
detecting a single frame image, and extracting a face image and at least one face local image;
and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
10. The data processing method of claim 9,
the face state sequence comprises the face states respectively corresponding to the multiple frames of images;
the face feature state sequence comprises the face feature states respectively corresponding to the multiple frames of images;
the face deep learning feature sequence comprises the face deep learning features respectively corresponding to the multiple frames of images.
11. The data processing method of claim 1, the method further comprising:
detecting fatigue expression according to the calculated physiological state corresponding to the human face included in the image data; or,
and detecting drunk driving according to the calculated physiological state corresponding to the face included in the image data.
12. A computer-readable storage medium storing computer-executable instructions for performing the data processing method of any one of claims 1 to 11.
13. A data processing apparatus comprising a memory and a processor, wherein the memory has stored therein instructions executable by the processor for performing the steps of the data processing method of any one of claims 1 to 11.
14. A converged network architecture, comprising: a convolutional neural network (CNN) processing module and a fusion processing module; wherein:
the CNN processing module is used for performing CNN operation on the input multiple auxiliary weights;
and the fusion processing module is used for carrying out element multiplication on the CNN operated result and the input main characteristic to output more than two optimized main characteristic sequences.
15. The converged network architecture of claim 14, wherein the plurality of auxiliary weights employ a concatenated input.
16. The converged network architecture of claim 14, wherein the plurality of secondary weights perform CNN operations with output results having the same dimensions as the input primary features.
17. The converged network architecture of claim 14, wherein the CNN structure is a backend converged network LAF-Net.
18. A data processing apparatus, comprising: a first acquisition unit, a second acquisition unit, a first processing unit and a second processing unit; wherein:
the first acquisition unit is used for acquiring image data to be recognized, wherein the image data comprises a human face;
the second acquisition unit is used for acquiring the whole human face features and at least one local human face feature corresponding to the image data;
the first processing unit is used for selecting main features from the whole human face features and the local human face features, wherein the features except the main features are auxiliary features;
and the second processing unit is used for calculating the physiological state corresponding to the face included in the image data through a fusion neural network based on the main feature and the auxiliary feature.
19. The data processing device according to claim 18, wherein the second obtaining unit is specifically configured to: acquiring a face state sequence, at least one face characteristic state sequence and a face deep learning characteristic sequence corresponding to a plurality of frames of images in the image data;
the first processing unit is specifically configured to: and in the human face state sequence and at least one facial feature state sequence, one of the human face state sequence and the at least one facial feature state sequence is taken as the main feature, and the rest are taken as the auxiliary features.
20. The data processing device of claim 19, wherein the second processing unit comprises a first fusion module, a second fusion module, and a classification module, wherein,
the first fusion module is used for performing fusion processing through the fusion neural network based on the human face state sequence and at least one facial feature state sequence to obtain more than two optimized main feature sequences;
the second fusion module is used for taking the more than two optimized main feature sequences as auxiliary weights and the human face deep learning feature sequence as the main feature to perform fusion processing, so as to obtain an optimized human face deep learning feature sequence;
and the classification module is used for processing the optimized human face deep learning feature sequence by using a neural network classifier to obtain the result of the physiological state detection.
21. The data processing device of claim 18, further comprising:
the detection unit is used for detecting the single-frame image and extracting a face image and at least one face local image; and detecting the face image and at least one face local image, and temporarily storing the obtained face state information, at least one face characteristic state information and face deep learning characteristic information.
22. The physiological state detection device of claim 18, the first or second fusion unit comprising the fusion network of any one of claims 13-16.
23. A method of data processing, comprising:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the human face overall features and the human face local features, wherein the features except the main features are auxiliary features;
and calculating the emotional state corresponding to the face included in the image data by fusing a neural network based on the main feature and the auxiliary feature.
24. A method of data processing, comprising:
acquiring image data to be recognized, wherein the image data comprises a human face;
acquiring a human face overall characteristic and at least one human face local characteristic corresponding to the image data;
selecting main features from the whole features and the local features of the human face, wherein the features except the main features are auxiliary features;
acquiring more than two optimized main feature information by fusing a neural network based on the main feature and the auxiliary feature;
and performing makeup recommendation according to the more than two optimized main characteristic information.
CN202010122774.7A 2020-02-27 2020-02-27 Data processing method and device and converged network architecture system Active CN113312942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010122774.7A CN113312942B (en) 2020-02-27 2020-02-27 Data processing method and device and converged network architecture system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010122774.7A CN113312942B (en) 2020-02-27 2020-02-27 Data processing method and device and converged network architecture system

Publications (2)

Publication Number Publication Date
CN113312942A true CN113312942A (en) 2021-08-27
CN113312942B CN113312942B (en) 2024-05-17

Family

ID=77370136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010122774.7A Active CN113312942B (en) 2020-02-27 2020-02-27 Data processing method and device and converged network architecture system

Country Status (1)

Country Link
CN (1) CN113312942B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084186A1 (en) * 2015-11-18 2017-05-26 华南理工大学 System and method for automatic monitoring and intelligent analysis of flexible circuit board manufacturing process
CN108509880A (en) * 2018-03-21 2018-09-07 南京邮电大学 A kind of video personage behavior method for recognizing semantics
KR20190130808A (en) * 2018-05-15 2019-11-25 연세대학교 산학협력단 Emotion Classification Device and Method using Convergence of Features of EEG and Face
CN109740536A (en) * 2018-06-12 2019-05-10 北京理工大学 A kind of relatives' recognition methods based on Fusion Features neural network
CN110111848A (en) * 2019-05-08 2019-08-09 南京鼓楼医院 A kind of human cyclin expressing gene recognition methods based on RNN-CNN neural network fusion algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孔繁盛; 蒋周良; 胡斌; 张欢: "基于PCA融合神经网络的移动设备威胁研究" [Research on mobile device threats based on a PCA fusion neural network], 电信工程技术与标准化 [Telecom Engineering Technics and Standardization], no. 11 *

Also Published As

Publication number Publication date
CN113312942B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
Sun et al. A visual attention based ROI detection method for facial expression recognition
CN111401177B (en) End-to-end behavior recognition method and system based on adaptive space-time attention mechanism
Dewan et al. A deep learning approach to detecting engagement of online learners
Mitra et al. A machine learning based approach for deepfake detection in social media through key video frame extraction
Rangesh et al. Driver gaze estimation in the real world: Overcoming the eyeglass challenge
CN109063626B (en) Dynamic face recognition method and device
Zhao et al. Scale-aware crowd counting via depth-embedded convolutional neural networks
Ghosh et al. Spatiotemporal filtering for event-based action recognition
CN110858316A (en) Classifying time series image data
Khan et al. Classification of human's activities from gesture recognition in live videos using deep learning
Abu-Ein et al. Analysis of the current state of deepfake techniques-creation and detection methods
Ma et al. Convolutional three-stream network fusion for driver fatigue detection from infrared videos
CN113312942B (en) Data processing method and device and converged network architecture system
Mustafa et al. Gender classification and age prediction using CNN and ResNet in real-time
Revi et al. Gan-generated fake face image detection using opponent color local binary pattern and deep learning technique
Kumar et al. Facial emotion recognition and detection using cnn
Rao et al. Non-local attentive temporal network for video-based person re-identification
CN110969109B (en) Blink detection model under non-limited condition and construction method and application thereof
KoÇak et al. Deepfake generation, detection and datasets: a rapid-review
Supriya et al. Affective music player for multiple emotion recognition using facial expressions with SVM
Mallavarapu et al. Image Based Sentiment Analysis using Bayesian Networks and Deep Learning
Shit et al. Real-time emotion recognition using end-to-end attention-based fusion network
Kao et al. Activity recognition using first-person-view cameras based on sparse optical flows
CN113988260B (en) Data processing method, device, equipment and system
Adnan et al. Deepfake video detection based on convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant