CN107463919A - A method for facial expression recognition based on a deep 3D convolutional neural network - Google Patents

A method for facial expression recognition based on a deep 3D convolutional neural network Download PDF

Info

Publication number
CN107463919A
CN107463919A (Application CN201710713962.5A)
Authority
CN
China
Prior art keywords
facial
network
markers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710713962.5A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201710713962.5A priority Critical patent/CN107463919A/en
Publication of CN107463919A publication Critical patent/CN107463919A/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The present invention proposes a method for facial expression recognition based on a deep 3D convolutional neural network. Its main contents include: a 3D Inception-ResNet (3DIR) network, facial landmarks, and a long short-term memory (LSTM) network unit. The process is as follows: the convolutional neural network extracts the spatial relations within face images and the temporal relations between different frames of a video; facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted landmarks are supplied as an input to the network, improving the ability to recognize subtle changes of facial expression in a sequence and thus yielding more accurate recognition. The invention proposes a method that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relations between frames of a video sequence, and extracts facial landmarks to emphasize the most expressive facial components, improving the recognition of subtle expression changes; it thereby contributes new designs to the field of lie detection and innovative solutions to the judicial field.

Description

A method for facial expression recognition based on a deep 3D convolutional neural network
Technical field
The present invention relates to the field of expression recognition, and more particularly to a method for facial expression recognition based on a deep 3D convolutional neural network.
Background technology
Facial expression recognition refers to isolating a specific emotional state from a given still image or dynamic video sequence so as to determine the mental state of the identified subject, enabling a computer to understand and recognize human facial expressions. It fundamentally changes the relation between people and computers, is a prerequisite for computers to understand human emotion and achieve better human-computer interaction, and is an effective way for people to explore and understand intelligence. Expression recognition therefore has great potential application value in fields such as psychology, intelligent robotics, intelligent surveillance, virtual reality, and digital photography. Specifically, in psychology, a computer analyzes a person's expression information to infer the person's psychological state, ultimately achieving intelligent human-machine interaction; studying changes in human psychological mood by means of facial expression recognition is an important breakthrough of modern science and technology. In intelligent robotics, using a computer to acquire facial expression images, preprocess them, and analyze expressions promotes human-machine communication and reaches a higher technological level. In digital photography, a picture can be captured automatically when a smiling expression is detected. Although there is currently much research on expression recognition, it has not yet been widely adopted in the market because of the complexity and cost of the methods; moreover, because facial expressions change quickly, some expressions are difficult to capture and identify, so improving the expression recognition rate still presents certain challenges.
The present invention proposes a method for facial expression recognition based on a deep 3D convolutional neural network. The network structure consists of a 3D Inception-ResNet (3DIR) layer and a long short-term memory (LSTM) network, and extracts the spatial relations within face images and the temporal relations between different frames of a video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted landmarks are supplied as an input to the network, improving the ability to recognize subtle changes of facial expression in a sequence and thus yielding more accurate recognition. The invention proposes a method that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relations between frames of a video sequence, and extracts facial landmarks to emphasize the most expressive facial components, improving the recognition of subtle expression changes; it thereby contributes new designs to the field of intelligent robotics and innovative solutions to the field of psychology.
The content of the invention
For expression recognition, a method is proposed that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relations between frames of a video sequence, and extracts facial landmarks, improving the recognition of subtle changes of facial expression; it thereby contributes new designs to the field of intelligent robotics and innovative solutions to the field of psychology.
To solve the above problems, the present invention provides a method for facial expression recognition based on a deep 3D convolutional neural network, whose main contents include:
(1) a 3D Inception-ResNet (3DIR) network;
(2) facial markers;
(3) a long short-term memory (LSTM) network unit.
The deep 3D convolutional neural network consists of a 3D Inception-ResNet (3DIR) layer and a long short-term memory (LSTM) network. The LSTM follows the 3DIR layer, and the network extracts the spatial relations within face images and the temporal relations between different frames of the video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted landmarks are supplied as an input to the network, improving the ability to recognize subtle changes of facial expression in a sequence and thus yielding more accurate recognition.
The long short-term memory network (LSTM) provides a memory function and is responsible for persistently recording contextual information. It comprises an input gate (i), a forget gate (f), and an output gate (o); on time step t the three gates are respectively responsible for rewriting, maintaining, and retrieving the memory cell c. Let σ(x) = (1 + exp(-x))^(-1) be the sigmoid function and φ the hyperbolic tangent function; x, h, c, W, and b denote the input, output, cell state, parameter matrices, and parameter vectors, respectively.
Given the inputs x_t, h_(t-1), and c_(t-1) on time step t, the LSTM update is given by equation (1):
f_t = σ(W_f · [h_(t-1), x_t] + b_f)
i_t = σ(W_i · [h_(t-1), x_t] + b_i)
o_t = σ(W_o · [h_(t-1), x_t] + b_o)
g_t = φ(W_C · [h_(t-1), x_t] + b_C)
C_t = f_t * C_(t-1) + i_t * g_t
h_t = o_t * φ(C_t)    (1)
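Under the definitions above, the update in equation (1) can be sketched in a few lines of NumPy; the toy dimensions, random gate weights, and dictionary layout below are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update as in equation (1): forget, input, and output
    gates plus the candidate cell, each computed from [h_(t-1), x_t]."""
    z = np.concatenate([h_prev, x_t])       # [h_(t-1), x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    g_t = np.tanh(W["C"] @ z + b["C"])      # candidate cell state
    c_t = f_t * c_prev + i_t * g_t          # C_t = f_t * C_(t-1) + i_t * g_t
    h_t = o_t * np.tanh(c_t)                # h_t = o_t * phi(C_t)
    return h_t, c_t

# toy sizes: input dim 4, hidden dim 3 (illustrative only)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 7)) * 0.1 for k in "fioC"}
b = {k: np.zeros(3) for k in "fioC"}
h, c = np.zeros(3), np.zeros(3)
h, c = lstm_step(rng.standard_normal(4), h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Since h_t is a product of a sigmoid output and a tanh output, its entries always lie in (-1, 1).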
Further, the 3D Inception-ResNet network achieves a higher recognition rate. Its structure is as follows: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame is 299 × 299, and 3 denotes the color channels. This is followed by a stem layer, after which the 3DIR module comprises A, B, and C layers: the 3DIR-A layer reduces the grid size from 38 × 38 to 18 × 18, the 3DIR-B layer reduces it from 18 × 18 to 8 × 8, and the 3DIR-C layer performs average pooling; the result is finally output through a fully connected layer.
Further, facial landmarks are used in the network architecture to distinguish the major facial components from other parts that contribute less to facial expression. In facial expression recognition, extracting facial landmarks improves the recognition rate. The temporal order of the frames is retained in the network, and the CNN and LSTM are trained simultaneously in a single network. On top of the original residual network, the facial landmarks are incorporated by replacing the shortcut path of the residual unit: the facial-landmark map and the input tensor of the residual unit are multiplied element-wise. To extract the facial landmarks, the face bounding box is obtained with a cross-platform computer-vision library's face detector, and a face-alignment algorithm that regresses local binary features is used to extract 66 facial landmark points.
In the face-alignment algorithm, after the facial landmarks of all databases have been detected and saved, a facial-landmark filter is generated for each sequence in the training stage. Given the facial landmarks of every frame in a sequence, all images in the sequence are resized to the corresponding size of the filters in the network. According to the detected landmark locations, a weight is assigned to every pixel of every frame in the sequence: the closer a pixel is to a facial landmark, the larger the weight it is given. By using the Manhattan distance with a linear weight function, the expression recognition rate on the databases reaches a higher level. The Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function that assigns a weight to a feature is a simple linear function of the Manhattan distance, defined as follows:
w(L, P) = 1 - 0.1 · d_M(L, P)    (2)
where d_M(L, P) is the Manhattan distance between the facial landmark L and the pixel P. Each landmark location has a peak weight, and the pixels around it have lower weights inversely proportional to their distance from the corresponding landmark. To avoid overlap between two adjacent landmarks, a 7 × 7 window is defined around each landmark, and each landmark applies the weighting function only over these 49 pixels. The landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:
y_l = (x_l ∘ w) + F(x_l),  x_(l+1) = f(y_l)    (3)
where x_l and x_(l+1) are the input and output of the l-th layer, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
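A minimal sketch of the weight map of equation (2): each landmark contributes 1 - 0.1 · d_M inside its 7 × 7 window. The text only says that overlap between adjacent landmarks is avoided; resolving any residual overlap by taking the maximum is an assumption made here for illustration.

```python
import numpy as np

def landmark_weight_map(landmarks, shape, win=7):
    """Weight map from equation (2): w = 1 - 0.1 * Manhattan distance,
    applied only inside a win x win window around each landmark."""
    w = np.zeros(shape)
    r = win // 2
    for (lr, lc) in landmarks:
        for dr in range(-r, r + 1):
            for dc in range(-r, r + 1):
                pr, pc = lr + dr, lc + dc
                if 0 <= pr < shape[0] and 0 <= pc < shape[1]:
                    d = abs(dr) + abs(dc)            # Manhattan distance d_M
                    # max() resolves overlap between nearby landmarks (assumption)
                    w[pr, pc] = max(w[pr, pc], 1 - 0.1 * d)
    return w

w = landmark_weight_map([(10, 10)], (20, 20))
print(w[10, 10], w[10, 13])  # 1.0 at the landmark, 0.7 three pixels away
```

The peak weight of 1.0 sits exactly on the landmark and decays linearly to 0.4 at the corners of the 7 × 7 window; pixels outside the window keep weight 0.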
Regarding the facial landmark points: after the face is detected, the landmark points are extracted by the face-alignment algorithm, and the face image is then resized to 299 × 299 pixels. Because larger images and sequences allow deeper networks, which can extract more abstract features from the sequence, larger images are chosen as input. All networks use identical settings and are trained separately on each database; the accuracy of the networks is evaluated with a subject-independent task and a cross-database task.
In the subject-independent task, each database is split into a training set and a validation set in a strictly subject-independent manner. On all databases, results are reported using 5-fold cross-validation, with the recognition rate averaged over the 5 folds. For each database and each fold, the proposed networks are trained with the above settings; a variant is also evaluated in which the landmark multiplication unit is deleted and replaced with a simple shortcut between the input and output of the residual unit. 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
In the cross-database task, to test each database, that database is used entirely for testing the network while the remaining databases are used for training. The test results show that the proposed method improves the success rate of expression recognition.
Further, in the long short-term memory network unit, the feature maps obtained from the 3DIR unit carry the notion of time across the feature-map sequence. The resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM unit; the vectorized feature maps are fed to the LSTM unit in order, preserving the temporal order of the input sequence. In the training stage, asynchronous stochastic gradient descent is used with momentum 0.9, weight decay 0.0001, and learning rate 0.01; categorical cross-entropy is used as the loss function, with accuracy as the evaluation metric.
Brief description of the drawings
Fig. 1 is a system flow diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention.
Fig. 2 is a network framework diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention.
Fig. 3 is a facial landmark diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention.
Embodiment
It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system flow diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention. It mainly includes the 3D Inception-ResNet (3DIR) network, facial landmarks, and the long short-term memory network unit.
The deep 3D convolutional neural network consists of a 3D Inception-ResNet (3DIR) layer and a long short-term memory (LSTM) network. The LSTM follows the 3DIR layer, and the network extracts the spatial relations within face images and the temporal relations between different frames of the video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted landmarks are supplied as an input to the network, improving the ability to recognize subtle changes of facial expression in a sequence and thus yielding more accurate recognition.
The long short-term memory network (LSTM) provides a memory function and is responsible for persistently recording contextual information. It comprises an input gate (i), a forget gate (f), and an output gate (o); on time step t the three gates are respectively responsible for rewriting, maintaining, and retrieving the memory cell c. Let σ(x) = (1 + exp(-x))^(-1) be the sigmoid function and φ the hyperbolic tangent function; x, h, c, W, and b denote the input, output, cell state, parameter matrices, and parameter vectors, respectively.
Given the inputs x_t, h_(t-1), and c_(t-1) on time step t, the LSTM update is given by equation (1).
Fig. 2 is a network framework diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention. A video sequence is input; the 3DIR layers, combined with the facial landmarks, enhance the expression features, and the subsequent LSTM network takes the enhanced feature maps produced by the 3DIR layers as input, extracts temporal information from them, and outputs the result through a fully connected layer with a softmax activation function.
Further, the 3D Inception-ResNet network achieves a higher recognition rate. Its structure is as follows: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame is 299 × 299, and 3 denotes the color channels. This is followed by a stem layer, after which the 3DIR module comprises A, B, and C layers: the 3DIR-A layer reduces the grid size from 38 × 38 to 18 × 18, the 3DIR-B layer reduces it from 18 × 18 to 8 × 8, and the 3DIR-C layer performs average pooling; the result is finally output through a fully connected layer.
Fig. 3 is a facial landmark diagram of the method for facial expression recognition based on a deep 3D convolutional neural network according to the present invention. Further, facial landmarks are used in the network architecture to distinguish the major facial components from other parts that contribute less to facial expression. In facial expression recognition, extracting facial landmarks improves the recognition rate. The temporal order of the frames is retained in the network, and the CNN and LSTM are trained simultaneously in a single network. On top of the original residual network, the facial landmarks are incorporated by replacing the shortcut path of the residual unit: the facial-landmark map and the input tensor of the residual unit are multiplied element-wise. To extract the facial landmarks, the face bounding box is obtained with a cross-platform computer-vision library's face detector, and a face-alignment algorithm that regresses local binary features is used to extract 66 facial landmark points.
In the face-alignment algorithm, after the facial landmarks of all databases have been detected and saved, a facial-landmark filter is generated for each sequence in the training stage. Given the facial landmarks of every frame in a sequence, all images in the sequence are resized to the corresponding size of the filters in the network. According to the detected landmark locations, a weight is assigned to every pixel of every frame in the sequence: the closer a pixel is to a facial landmark, the larger the weight it is given. By using the Manhattan distance with a linear weight function, the expression recognition rate on the databases reaches a higher level. The Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function that assigns a weight to a feature is a simple linear function of the Manhattan distance, defined as follows:
w(L, P) = 1 - 0.1 · d_M(L, P)    (2)
where d_M(L, P) is the Manhattan distance between the facial landmark L and the pixel P. Each landmark location has a peak weight, and the pixels around it have lower weights inversely proportional to their distance from the corresponding landmark. To avoid overlap between two adjacent landmarks, a 7 × 7 window is defined around each landmark, and each landmark applies the weighting function only over these 49 pixels. The landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:
y_l = (x_l ∘ w) + F(x_l),  x_(l+1) = f(y_l)    (3)
where x_l and x_(l+1) are the input and output of the l-th layer, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
Regarding the facial landmark points: after the face is detected, the landmark points are extracted by the face-alignment algorithm, and the face image is then resized to 299 × 299 pixels. Because larger images and sequences allow deeper networks, which can extract more abstract features from the sequence, larger images are chosen as input. All networks use identical settings and are trained separately on each database; the accuracy of the networks is evaluated with a subject-independent task and a cross-database task.
In the subject-independent task, each database is split into a training set and a validation set in a strictly subject-independent manner. On all databases, results are reported using 5-fold cross-validation, with the recognition rate averaged over the 5 folds. For each database and each fold, the proposed networks are trained with the above settings; a variant is also evaluated in which the landmark multiplication unit is deleted and replaced with a simple shortcut between the input and output of the residual unit. 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
In the cross-database task, to test each database, that database is used entirely for testing the network while the remaining databases are used for training. The test results show that the proposed method improves the success rate of expression recognition.
Further, in the long short-term memory network unit, the feature maps obtained from the 3DIR unit carry the notion of time across the feature-map sequence. The resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM unit; the vectorized feature maps are fed to the LSTM unit in order, preserving the temporal order of the input sequence. In the training stage, asynchronous stochastic gradient descent is used with momentum 0.9, weight decay 0.0001, and learning rate 0.01; categorical cross-entropy is used as the loss function, with accuracy as the evaluation metric.
For those skilled in the art, the present invention is not restricted to the details of the above embodiments; without departing from its spirit and scope, the invention can be realized in other specific forms. In addition, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be construed to include the preferred embodiments and all changes and variations falling within the scope of the invention.

Claims (10)

1. A method for facial expression recognition based on a deep 3D convolutional neural network, characterized by mainly comprising: a 3D Inception-ResNet (3DIR) network (1); facial landmarks (2); and a long short-term memory network unit (3).
2. The deep 3D convolutional neural network according to claim 1, characterized in that it consists of a 3D Inception-ResNet (3DIR) layer and a long short-term memory (LSTM) network; the LSTM follows the 3DIR layer, and the network extracts the spatial relations within face images and the temporal relations between different frames of the video; facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted landmarks serve as an input to the network, improving the ability to recognize subtle changes of facial expression in a sequence and thus yielding more accurate recognition.
3. The long short-term memory network (LSTM) according to claim 2, characterized in that the LSTM provides a memory function and is responsible for persistently recording contextual information; it comprises an input gate (i), a forget gate (f), and an output gate (o), which on time step t are respectively responsible for rewriting, maintaining, and retrieving the memory cell c; let σ(x) = (1 + exp(-x))^(-1) be the sigmoid function and φ the hyperbolic tangent function; x, h, c, W, and b denote the input, output, cell state, parameter matrices, and parameter vectors, respectively,
    f_t = σ(W_f · [h_(t-1), x_t] + b_f)
    i_t = σ(W_i · [h_(t-1), x_t] + b_i)
    o_t = σ(W_o · [h_(t-1), x_t] + b_o)
    g_t = φ(W_C · [h_(t-1), x_t] + b_C)
    C_t = f_t * C_(t-1) + i_t * g_t
    h_t = o_t * φ(C_t)    (1)
    Given the inputs x_t, h_(t-1), and c_(t-1) on time step t, the LSTM update is given by equation (1).
4. The 3D Inception-ResNet network (1) according to claim 1, characterized in that it achieves a higher recognition rate; its structure is: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame is 299 × 299, and 3 denotes the color channels; this is followed by a stem layer, after which the 3DIR module comprises A, B, and C layers: the 3DIR-A layer reduces the grid size from 38 × 38 to 18 × 18, the 3DIR-B layer reduces it from 18 × 18 to 8 × 8, and the 3DIR-C layer performs average pooling; the result is finally output through a fully connected layer.
5. The facial landmarks (2) according to claim 2, characterized in that facial landmarks are used in the network architecture to distinguish the major facial components from other parts that contribute less to facial expression; in facial expression recognition, extracting facial landmarks improves the recognition rate; the temporal order of the frames is retained in the network, and the CNN and LSTM are trained simultaneously in a single network; on top of the original residual network, the facial landmarks are incorporated by replacing the shortcut path of the residual unit: the facial-landmark map and the input tensor of the residual unit are multiplied element-wise; to extract the facial landmarks, the face bounding box is obtained with a cross-platform computer-vision library's face detector, and a face-alignment algorithm regressing local binary features is used to extract 66 facial landmark points.
6. The face-alignment algorithm according to claim 5, characterized in that after the facial landmarks of all databases have been detected and saved, a facial-landmark filter is generated for each sequence in the training stage; given the facial landmarks of every frame in a sequence, all images in the sequence are resized to the corresponding size of the filters in the network; according to the detected landmark locations, a weight is assigned to every pixel of every frame in the sequence, and the closer a pixel is to a facial landmark, the larger the weight it is given; by using the Manhattan distance with a linear weight function, the expression recognition rate on the databases reaches a higher level; the Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function assigning a weight to a feature is a simple linear function of the Manhattan distance, defined as follows:
    w(L, P) = 1 - 0.1 · d_M(L, P)    (2)
    where d_M(L, P) is the Manhattan distance between the facial landmark L and the pixel P; each landmark location has a peak weight, and the pixels around it have lower weights inversely proportional to their distance from the corresponding landmark; to avoid overlap between two adjacent landmarks, a 7 × 7 window is defined around each landmark, and each landmark applies the weighting function only over these 49 pixels; the landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:
    y_l = (x_l ∘ w) + F(x_l),  x_(l+1) = f(y_l)    (3)
    where x_l and x_(l+1) are the input and output of the l-th layer, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
7. The facial landmark points according to claim 5, characterized in that after the face is detected, the landmark points are extracted by the face-alignment algorithm and the face image is then resized to 299 × 299 pixels; because larger images and sequences allow deeper networks, which can extract more abstract features from the sequence, larger images are chosen as input; all networks use identical settings and are trained separately on each database, and the accuracy of the networks is evaluated with a subject-independent task and a cross-database task.
8. The subject-independent task according to claim 7, characterized in that each database is split into a training set and a validation set in a strictly subject-independent manner; on all databases, results are reported using 5-fold cross-validation, with the recognition rate averaged over the 5 folds; for each database and each fold, the proposed networks are trained with the above settings, and a variant is also evaluated in which the landmark multiplication unit is deleted and replaced with a simple shortcut between the input and output of the residual unit; 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
  9. The cross-database task according to claim 7, characterized in that, in the cross-database task, in order to test each database, that entire database is used to test the network while the remaining databases are used to train it. The test results show that the proposed method improves the success rate of expression recognition.
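The leave-one-database-out protocol in this claim can be sketched as a simple split generator; the database names below are placeholders, not a list taken from the patent:

```python
def cross_database_splits(databases):
    """Yield (training databases, test database) pairs: each database
    is held out in turn for testing while the rest train the network."""
    for test_db in databases:
        train_dbs = [d for d in databases if d != test_db]
        yield train_dbs, test_db

dbs = ["DB_A", "DB_B", "DB_C", "DB_D"]  # placeholder database names
splits = list(cross_database_splits(dbs))
```

Each of the four splits trains on three databases and tests on the one held out, so every database is evaluated exactly once.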
  10. The long short-term memory (LSTM) network unit (3) according to claim 1, characterized in that the feature maps obtained from the 3DIR units contain the notion of time within the feature-map sequence. The resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM units; the vectorized feature maps are fed to the LSTM units, which preserve the temporal order of the input sequence while passing the feature maps on. In the training stage, asynchronous stochastic gradient descent is used with a momentum of 0.9, a weight decay of 0.0001 and a learning rate of 0.01; categorical cross-entropy serves as the loss function and accuracy as the evaluation metric.
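The optimizer and loss named in this claim (SGD with momentum 0.9, weight decay 0.0001, learning rate 0.01, categorical cross-entropy) can be written out as a minimal NumPy sketch; the toy logits, labels and gradient are made up for illustration:

```python
import numpy as np

def categorical_cross_entropy(logits, labels):
    """Softmax followed by categorical cross-entropy, batch-averaged."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9,
                      weight_decay=0.0001):
    """One SGD update with momentum and L2 weight decay, using the
    hyperparameters stated in the claim."""
    grad = grad + weight_decay * w
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy example: 3-class logits for a batch of 2 clips.
logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
loss = categorical_cross_entropy(logits, labels)

w = np.zeros(3)
v = np.zeros(3)
w, v = sgd_momentum_step(w, np.array([1.0, -1.0, 0.0]), v)
```

Starting from zero weights and zero velocity, the first step simply moves each weight by minus the learning rate times its gradient.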
CN201710713962.5A 2017-08-18 2017-08-18 A kind of method that human facial expression recognition is carried out based on depth 3D convolutional neural networks Withdrawn CN107463919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710713962.5A CN107463919A (en) 2017-08-18 2017-08-18 A kind of method that human facial expression recognition is carried out based on depth 3D convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710713962.5A CN107463919A (en) 2017-08-18 2017-08-18 A kind of method that human facial expression recognition is carried out based on depth 3D convolutional neural networks

Publications (1)

Publication Number Publication Date
CN107463919A true CN107463919A (en) 2017-12-12

Family

ID=60550015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710713962.5A Withdrawn CN107463919A (en) 2017-08-18 2017-08-18 A kind of method that human facial expression recognition is carried out based on depth 3D convolutional neural networks

Country Status (1)

Country Link
CN (1) CN107463919A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062538A (en) * 2017-12-29 2018-05-22 成都智宝大数据科技有限公司 Face identification method and device
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN108280400A (en) * 2017-12-27 2018-07-13 广东工业大学 A kind of expression recognition method based on depth residual error network
CN108319900A (en) * 2018-01-16 2018-07-24 南京信息工程大学 A kind of basic facial expression sorting technique
CN108376234A (en) * 2018-01-11 2018-08-07 中国科学院自动化研究所 emotion recognition system and method for video image
CN108596865A (en) * 2018-03-13 2018-09-28 中山大学 A kind of characteristic pattern for convolutional neural networks enhances system and method
CN108682006A (en) * 2018-04-25 2018-10-19 南京农业大学 Contactless canned compost maturity judgment method
CN108960122A (en) * 2018-06-28 2018-12-07 南京信息工程大学 A kind of expression classification method based on space-time convolution feature
CN109165573A (en) * 2018-08-03 2019-01-08 百度在线网络技术(北京)有限公司 Method and apparatus for extracting video feature vector
CN109657716A (en) * 2018-12-12 2019-04-19 天津卡达克数据有限公司 A kind of vehicle appearance damnification recognition method based on deep learning
CN109815835A (en) * 2018-12-29 2019-05-28 联动优势科技有限公司 A kind of interactive mode biopsy method
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model
CN110287773A (en) * 2019-05-14 2019-09-27 杭州电子科技大学 Transport hub safety check image-recognizing method based on autonomous learning
CN110363129A (en) * 2019-07-05 2019-10-22 昆山杜克大学 Autism early screening system based on smile normal form and audio-video behavioural analysis
CN110414544A (en) * 2018-04-28 2019-11-05 杭州海康威视数字技术股份有限公司 A kind of dbjective state classification method, apparatus and system
WO2021042372A1 (en) * 2019-09-06 2021-03-11 中国医药大学附设医院 Atrial fibrillation prediction model and prediction system thereof
US11423634B2 (en) 2018-08-03 2022-08-23 Huawei Cloud Computing Technologies Co., Ltd. Object detection model training method, apparatus, and device
CN117218422A (en) * 2023-09-12 2023-12-12 北京国科恒通科技股份有限公司 Power grid image recognition method and system based on machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BEHZAD HASANI et al.: "Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks", published online at https://arxiv.org/abs/1705.07871v1 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229338A (en) * 2017-12-14 2018-06-29 华南理工大学 A kind of video behavior recognition methods based on depth convolution feature
CN108280400A (en) * 2017-12-27 2018-07-13 广东工业大学 A kind of expression recognition method based on depth residual error network
CN108062538A (en) * 2017-12-29 2018-05-22 成都智宝大数据科技有限公司 Face identification method and device
CN108376234A (en) * 2018-01-11 2018-08-07 中国科学院自动化研究所 emotion recognition system and method for video image
CN108319900A (en) * 2018-01-16 2018-07-24 南京信息工程大学 A kind of basic facial expression sorting technique
CN108596865B (en) * 2018-03-13 2021-10-26 中山大学 Feature map enhancement system and method for convolutional neural network
CN108596865A (en) * 2018-03-13 2018-09-28 中山大学 A kind of characteristic pattern for convolutional neural networks enhances system and method
CN108682006A (en) * 2018-04-25 2018-10-19 南京农业大学 Contactless canned compost maturity judgment method
CN108682006B (en) * 2018-04-25 2021-07-20 南京农业大学 Non-contact type canned compost maturity judging method
CN110414544A (en) * 2018-04-28 2019-11-05 杭州海康威视数字技术股份有限公司 A kind of dbjective state classification method, apparatus and system
CN108960122A (en) * 2018-06-28 2018-12-07 南京信息工程大学 A kind of expression classification method based on space-time convolution feature
CN109165573A (en) * 2018-08-03 2019-01-08 百度在线网络技术(北京)有限公司 Method and apparatus for extracting video feature vector
US11605211B2 (en) 2018-08-03 2023-03-14 Huawei Cloud Computing Technologies Co., Ltd. Object detection model training method and apparatus, and device
US11423634B2 (en) 2018-08-03 2022-08-23 Huawei Cloud Computing Technologies Co., Ltd. Object detection model training method, apparatus, and device
CN109657716A (en) * 2018-12-12 2019-04-19 天津卡达克数据有限公司 A kind of vehicle appearance damnification recognition method based on deep learning
CN109657716B (en) * 2018-12-12 2020-12-29 中汽数据(天津)有限公司 Vehicle appearance damage identification method based on deep learning
CN109815835A (en) * 2018-12-29 2019-05-28 联动优势科技有限公司 A kind of interactive mode biopsy method
CN110046551A (en) * 2019-03-18 2019-07-23 中国科学院深圳先进技术研究院 A kind of generation method and equipment of human face recognition model
CN110287773A (en) * 2019-05-14 2019-09-27 杭州电子科技大学 Transport hub safety check image-recognizing method based on autonomous learning
CN110363129B (en) * 2019-07-05 2022-05-27 昆山杜克大学 Early autism screening system based on smiling paradigm and audio-video behavior analysis
CN110363129A (en) * 2019-07-05 2019-10-22 昆山杜克大学 Autism early screening system based on smile normal form and audio-video behavioural analysis
JP2022523835A (en) * 2019-09-06 2022-04-26 中國醫藥大學附設醫院 Atrial fibrillation prediction model and its prediction system
WO2021042372A1 (en) * 2019-09-06 2021-03-11 中国医药大学附设医院 Atrial fibrillation prediction model and prediction system thereof
CN117218422A (en) * 2023-09-12 2023-12-12 北京国科恒通科技股份有限公司 Power grid image recognition method and system based on machine learning
CN117218422B (en) * 2023-09-12 2024-04-16 北京国科恒通科技股份有限公司 Power grid image recognition method and system based on machine learning

Similar Documents

Publication Publication Date Title
CN107463919A (en) A kind of method that human facial expression recognition is carried out based on depth 3D convolutional neural networks
Kang et al. Deep unsupervised embedding for remotely sensed images based on spatially augmented momentum contrast
Yu et al. Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework
CN104217214B RGB-D human activity recognition method based on configurable convolutional neural networks
Oh et al. Approaching the computational color constancy as a classification problem through deep learning
CN107506740B (en) Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model
CN109344736B (en) Static image crowd counting method based on joint learning
Tao et al. Smoke detection based on deep convolutional neural networks
CN105678284B (en) A kind of fixed bit human body behavior analysis method
CN108921822A (en) Image object method of counting based on convolutional neural networks
CN105469041B (en) Face point detection system based on multitask regularization and layer-by-layer supervision neural network
CN110147743A (en) Real-time online pedestrian analysis and number system and method under a kind of complex scene
CN106682697A (en) End-to-end object detection method based on convolutional neural network
CN108229338A (en) A kind of video behavior recognition methods based on depth convolution feature
CN107341452A (en) Human bodys&#39; response method based on quaternary number space-time convolutional neural networks
CN106503687A (en) The monitor video system for identifying figures of fusion face multi-angle feature and its method
CN109697435A (en) Stream of people&#39;s quantity monitoring method, device, storage medium and equipment
Baveye et al. Deep learning for image memorability prediction: The emotional bias
CN107016357A (en) A kind of video pedestrian detection method based on time-domain convolutional neural networks
CN107403154A (en) A kind of gait recognition method based on dynamic visual sensor
CN109376747A (en) A kind of video flame detecting method based on double-current convolutional neural networks
CN108090403A (en) A kind of face dynamic identifying method and system based on 3D convolutional neural networks
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
CN106156765A (en) safety detection method based on computer vision
CN104298974A (en) Human body behavior recognition method based on depth video sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20171212
