CN114241565A - Facial expression and target object state analysis method, device and equipment - Google Patents

Facial expression and target object state analysis method, device and equipment

Info

Publication number
CN114241565A
CN114241565A (Application CN202111561312.6A)
Authority
CN
China
Prior art keywords
facial expression
target object
image
sample image
reference sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111561312.6A
Other languages
Chinese (zh)
Inventor
刘奇文
潘秀平
孙坤鹏
张昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing E Hualu Information Technology Co Ltd
Original Assignee
Beijing E Hualu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing E Hualu Information Technology Co Ltd
Priority to CN202111561312.6A
Publication of CN114241565A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a facial expression and target object state analysis method, device and equipment. The facial expression analysis method comprises the following steps: acquiring image data of a target object; and performing facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on an initial state image of the target object and a sample image containing state change. Through comprehensive analysis of the facial expression image data acquired in real time together with the preset facial expression recognition model, the real emotion of the target object can be obtained objectively and accurately from the analysis of the target object's facial expression changes.

Description

Facial expression and target object state analysis method, device and equipment
Technical Field
The invention relates to the technical field of human face recognition, in particular to a method, a device and equipment for analyzing facial expressions and target object states.
Background
At present, the most widely used and mature lie detection technique in criminal investigation is the multi-channel psychological test: an examiner asks the test subject a series of standardized questions, a wearable polygraph records the subject's physiological responses to each question, and whether the subject is lying is judged from those responses. In the prior art, however, several separate devices are generally used to collect the subject's biological signals, for example a sphygmomanometer to measure blood pressure. Because each signal is detected by a different instrument and the related instruments must be worn repeatedly, the subject cannot be monitored continuously over a long period; only detection data for individual stages of the whole inquiry can be obtained, and the lie detection effect is not ideal. Moreover, conclusions are mostly based on subjective judgment of the subject's behavioral responses and self-rationalizing statements during the interview, so the subject's real emotion cannot be analyzed accurately and objectively.
Disclosure of Invention
Therefore, the invention provides a method, a device and equipment for analyzing facial expressions and target object states, in order to overcome the defects of the prior art that the required information cannot be acquired continuously and the real emotion cannot be analyzed accurately and objectively.
According to a first aspect, an embodiment of the present invention provides a facial expression analysis method, including: acquiring image data of a target object; and performing facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on an initial state image of the target object and a sample image containing state change.
Optionally, the process of training and generating the preset facial expression recognition model includes: acquiring an initial state image of the target object as an original reference sample image; adjusting the color space attributes of the original reference sample image to generate a reference sample image; acquiring a change image of the target object in an emotion change stage; correlating the state change of the target object with the amplitude and the frequency of the image information in the change image to generate a contrast sample image; and training a neural network model based on the reference sample image and the contrast sample image to generate the preset facial expression recognition model.
Optionally, adjusting the color space attributes of the original reference sample image to generate a reference sample image includes: adjusting the brightness and/or contrast of a region of interest portion of the original reference sample image; and replacing the region of interest portion in the original reference sample image with the adjusted region of interest portion to generate the reference sample image.
Optionally, adjusting the color space attributes of the original reference sample image to generate a reference sample image further includes: inputting the adjusted reference sample image into a depth generation model, and outputting a reference sample image whose similarity to the original reference sample image is greater than a preset threshold.
optionally, the training process of the depth generation model includes: generating a false sample image according to random noise and the adjusted reference sample image by a generator of the depth generation model; judging the similarity between the false sample image and the original reference sample image through a discriminator of the depth generation model; and adjusting parameters of the depth generation model based on the judgment result until the similarity between the generated false sample image and the original reference sample image is greater than a preset threshold value.
According to a second aspect, an embodiment of the present invention provides a target object state analysis method, including: acquiring image information of a target object, acquiring audio information of the target object, and determining a corresponding relation between the image information and the audio information based on time information; performing facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result; and performing state analysis according to the corresponding relation between the image information and the audio information and the facial expression analysis result to obtain a state analysis result.
Optionally, the preset facial expression recognition model is generated by training using the facial expression analysis method as described in the first aspect or any embodiment.
Optionally, the performing state analysis according to the correspondence between the image information and the audio information and the facial expression analysis result to obtain a state analysis result includes: determining a corresponding relation between the audio information and the facial expression analysis result based on the corresponding relation between the image information and the audio information; and analyzing the facial expression analysis result by using three-dimensional control based on the corresponding relation between the audio information and the facial expression analysis result, and calculating and analyzing to obtain the state analysis result.
According to a third aspect, a facial expression analysis apparatus includes: an acquisition module, configured to acquire image data of a target object; and a training module, configured to perform facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on an initial state image of the target object and a sample image containing state change.
According to a fourth aspect, a target object state analysis apparatus includes: a collection module, configured to acquire image information of a target object, acquire audio information of the target object, and determine a corresponding relation between the image information and the audio information based on time information; an analysis module, configured to perform facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result; and a communication module, configured to perform state analysis according to the corresponding relation between the image information and the audio information and the facial expression analysis result to obtain a state analysis result.
According to a fifth aspect, a computer device comprises: a communication unit, a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, and the processor being configured to execute the computer instructions to perform the steps of the method according to the first aspect, the second aspect or any optional embodiment thereof.
According to a sixth aspect, a computer-readable storage medium stores computer instructions for causing a computer to perform the steps of the method of the first aspect, the second aspect or any optional embodiment thereof.
The technical scheme of the invention has the following advantages:
the embodiment of the invention provides a facial expression analysis method, a device and equipment, wherein the method comprises the following steps: the method comprises the steps of obtaining facial expression image data of a target object in real time, training image data of an initial state of the target object and sample data during state change to obtain a facial expression recognition model, and carrying out facial expression analysis based on the facial expression recognition model and the obtained facial expression image data. According to the embodiment of the invention, the real emotion of the target object can be objectively and accurately obtained from the facial expression change analysis of the target object by comprehensively analyzing the facial expression image data acquired in real time and the preset facial expression recognition model.
The embodiment of the invention also provides a method, a device and equipment for analyzing the state of a target object. The method comprises: acquiring image data of the target object for each time period in real time while acquiring audio information of the target object for the same periods; determining the corresponding relation between the image information and the audio information based on time information; performing facial analysis based on the image information and the facial expression recognition model generated by the above facial expression analysis method to obtain a facial analysis result; and obtaining a state analysis result of the target object from the corresponding relation between the image information and the audio information together with the facial analysis result. By comprehensively analyzing the face with the facial expression recognition model according to the corresponding relation between the image data and the audio information of each time period, the relationship between the target object's facial expressions and the audio information can be obtained more scientifically, and the state changes of the target object at each stage can be obtained more objectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating a specific example of a facial expression analysis method according to an embodiment of the present invention;
Fig. 2 is a flowchart of yet another specific example of a facial expression analysis method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating a calm state of a target object in a facial expression analysis method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating a tense state of a target object in a facial expression analysis method according to an embodiment of the present invention;
Fig. 5 is a flowchart illustrating a specific example of a target object state analysis method according to an embodiment of the present invention;
Fig. 6 is a flowchart illustrating another specific example of a target object state analysis method according to an embodiment of the present invention;
Fig. 7 is a diagram illustrating analysis results of a target object state analysis method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a facial expression analysis apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a target object state analysis apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The embodiment of the invention takes criminal case investigation, one of the fields in which face recognition technology is most widely applied, as an example. Specifically, facial expression image information of a target object is acquired, and facial expression analysis is performed according to a preset facial expression recognition model and the image data.
Fig. 1 shows a flowchart of a facial expression analysis method according to an embodiment of the present invention, which specifically includes the following steps:
S100: acquiring image data of a target object;
S200: and performing facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on an initial state image of the target object and a sample image containing state change.
In the embodiment of the invention, video recording equipment such as a law enforcement recorder or a camera is used to acquire non-contact facial expression image information in real time. The acquired image data of the target object in the initial state, together with sample images of the target object at the various stages of state change during the whole acquisition process, are used to train a facial expression recognition model. By analyzing the image data of the target object with this facial expression recognition model, the facial expression changes of the target object are obtained accurately, so that the emotion changes reflected by those facial expression changes can be judged accurately and objectively.
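The acquisition step can be pictured with a minimal sketch such as the following (Python with OpenCV); the capture source index, frame counts and variable names are illustrative assumptions and are not specified by the disclosure:

    import cv2

    def capture_frames(source=0, max_frames=300):
        """Collect facial image frames from a video device such as a camera or a
        body-worn recorder exposed as a capture source; returns a list of BGR frames."""
        cap = cv2.VideoCapture(source)
        frames = []
        while cap.isOpened() and len(frames) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
        cap.release()
        return frames

    # Frames collected while the target object is calm can serve as the
    # initial-state images; frames collected later form the change images.
    initial_state_frames = capture_frames(max_frames=100)
    change_frames = capture_frames(max_frames=300)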
Fig. 2 shows an alternative embodiment of the present invention, and the process of training and generating the preset facial expression recognition model specifically includes the following steps:
S201: and acquiring an initial state image of the target object as an original reference sample image.
Specifically, an initial state image of the target object is acquired, that is, image data of the target object in a calm state, and is used as the original reference sample image. In practice, when the initial state image of the target object is acquired, the target object may instead be in a state such as sadness (Sad), happiness (Happy), fear (Fear), disgust (Disgust), surprise (Surprise) or anger (Angry), which the invention does not limit.
S202: and adjusting the color space attributes of the original reference sample image to generate a reference sample image.
Specifically, color space attributes of the original reference sample image, such as brightness and contrast, are adjusted to appropriate degrees, and the reference sample image is generated based on the adjusted image data.
S203: and acquiring a change image of the target object in the emotion change stage.
Specifically, change images of the target object during emotion changes over the whole acquisition process are obtained, wherein an emotion change is any emotional stage different from the initial state, and the change images during emotion changes are acquired in real time.
S204: and correlating the state change of the target object based on the amplitude and the frequency of the image information in the change image to generate a contrast sample image.
Specifically, the frequency and amplitude of the motion around the head of the target object in the change image are extracted, and the parameter values of these fine vibration variables are acquired. Based on the parameter values, various mental states of the target object, such as vitality, tension or an aggressive state, are associated and analyzed, the mental states are associated with the state of the target object, and a contrast sample image is generated based on this correspondence. In practice, the maximum frequency and the average amplitude of the motion around the head are displayed in the form of a ring of vibrating light (a vibration halo). As shown in fig. 3 and fig. 4, the vibration frequency determines the color of the vibration halo: red represents an agitated, aggressive state; yellow an annoyed, tense state; green a normal state; and purple a resting, calm state. The average amplitude determines the size of the vibration halo, so the emotion changes of the target object are displayed visually through changes in the color and size of the halo, which the invention does not limit.
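One way to make the amplitude-and-frequency association concrete is sketched below (Python with NumPy); the frame-difference reading of the vibration parameters, the FFT-based frequency estimate and the color thresholds are assumptions for illustration, since the description does not fix the exact formulas:

    import numpy as np

    def vibration_parameters(frames, fps=25.0):
        """Estimate per-pixel vibration amplitude and dominant frequency from
        inter-frame differences of grayscale head-region frames, shape (T, H, W)."""
        diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))   # (T-1, H, W)
        amplitude = diffs.mean(axis=0)                                # average amplitude per pixel
        spectrum = np.abs(np.fft.rfft(diffs, axis=0))                 # temporal spectrum per pixel
        freqs = np.fft.rfftfreq(diffs.shape[0], d=1.0 / fps)
        dominant_freq = freqs[np.argmax(spectrum, axis=0)]            # (H, W)
        return amplitude, dominant_freq

    def halo_color(max_freq_hz):
        """Map the maximum vibration frequency to the coarse halo colors named
        above; the numeric thresholds are placeholders only."""
        if max_freq_hz > 6.0:
            return "red (agitated / aggressive)"
        if max_freq_hz > 4.0:
            return "yellow (annoyed / tense)"
        if max_freq_hz > 2.0:
            return "green (normal)"
        return "purple (resting / calm)"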
S205: training a neural network model based on the reference sample image and the contrast sample image to generate the preset facial expression recognition model.
Specifically, three-dimensional image processing is performed on the image information of the reference sample image to obtain a dynamic image in three-dimensional space. Head and neck movement and facial muscle fluctuations of the target object caused by emotion changes are analyzed through three-dimensional control. Over the whole image information collection process a number of video frames are collected; frame differences are computed across these frames while the frames are accumulated, so that each parameter used to determine the psychological and physiological state is visualized digitally and displayed through the amplitude and frequency pixels of the change image, and the preset facial expression recognition model is generated.
In the embodiment of the invention, an initial state image of the target object in a calm state is acquired through a network or a camera and used as the original reference sample image. The color space attributes of the original reference sample image are adjusted, and the adjusted image is used to generate the reference sample image. Micro-motion data of the facial muscles of the target object are acquired through the network or the camera in real time during the whole image acquisition process, that is, change image data of the facial expressions at each stage at which the target object's emotion changes are collected. The vibration parameters of all pixel points in the change image data, namely the frequency-change and amplitude-change data, are extracted and associated, frame by frame, with the state changes of the target object, so that contrast sample images are formed correspondingly. A neural network model is then trained based on the reference sample image and the contrast sample images to generate the preset facial expression recognition model. Generating the reference sample image from the image data of the initial state, converting the change image data into contrast sample images, and training the facial expression recognition model on both allows the state changes of the target object at all stages of the whole interrogation process to be analyzed comprehensively, so that the real changes of the target object are obtained more objectively and accurately, which further assists case investigation and improves working efficiency.
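The description does not fix a network architecture, so the following is only a generic training sketch (Python with PyTorch) assuming 64x64 single-channel face crops, label 0 for the reference (initial-state) samples and labels 1..K-1 for the contrast samples:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    def train_expression_model(reference_images, contrast_images, contrast_labels,
                               num_classes, epochs=10):
        """reference_images / contrast_images: float tensors of shape (N, 1, 64, 64);
        contrast_labels: long tensor with values 1..num_classes-1."""
        x = torch.cat([reference_images, contrast_images])
        y = torch.cat([torch.zeros(len(reference_images), dtype=torch.long),
                       contrast_labels])
        loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

        model = nn.Sequential(                       # deliberately small CNN
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 16 * 16, num_classes))
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()

        for _ in range(epochs):
            for xb, yb in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(xb), yb)
                loss.backward()
                optimizer.step()
        return model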
In the embodiment of the invention, the whole image acquisition process is divided into five stages. In the first stage, image data of the initial state of the target object are acquired. In the second stage, facial expression data of the target object are acquired while the target object is relatively calm. In the third stage, facial expression images of the target object are collected in real time, and the psychological state and emotion changes of the target object are analyzed in real time based on the vibration images generated from those facial expression images. In the fourth stage, strong emotion changes of the target object are provoked and the facial expressions of the target object are acquired in real time; the frequency and amplitude in the image information of the target object's facial images are extracted, the parameter values based on the frequency and amplitude are associated with the emotion changes of the target object, and the authenticity of the audio information collected from the target object is judged through those emotion changes. In the fifth stage, facial information of the target object is collected and compared with the facial expression images collected in the first stage, and the psychological changes of the target object over the whole image acquisition process are obtained comprehensively from the comparison result. In practical application, the embodiment of the invention combines this five-stage inquiry method to assist criminal case investigation. Specifically, in the first stage, a mental State Tracking stage (ST), the video recording device collects image data of the subject in a seated state, and the basic psychological state of the subject at that moment is analyzed. In the second stage, an Adaptation stage (AP), image data of the subject are collected continuously while the subject is calm, audio data of the subject speaking freely are collected at the same time, and the psychological state of the subject is analyzed based on the image data and the audio information. In the third stage, a recall stage (IR), audio information and facial expression images of the subject are collected continuously, the association between the audio information and the facial expression images is established, and the generated vibration images are analyzed so as to judge the emotion change state of the subject. In the fourth stage, a deepened questioning stage (Concrete Classification), emotion changes of the subject are actively stimulated through questioning, the facial expression image data of the subject in this stage are analyzed in real time, the psychological state of the subject is judged through the changes of the vibration image, and the questioning approach is adjusted in a targeted way according to those changes. In the fifth stage, an adjustment stage (Arrange Tracking), facial expression image data of the subject in this stage are acquired by the video recording device and compared with the facial expression image data of the subject collected in the first stage; the psychological changes of the subject over the whole inquiry process are analyzed based on the comparison result, so that the psychological changes during the five-stage inquiry method are obtained comprehensively and serve as an auxiliary reference in criminal case investigation.
In an optional embodiment of the present invention, step S202 above, adjusting the color space attributes of the original reference sample image to generate the reference sample image, specifically includes the following steps:
(1) adjusting the brightness and/or contrast of the region of interest portion of the original reference sample image.
Specifically, in this embodiment, the parameter values of frequency and amplitude in the original reference sample image are extracted, and three-dimensional spatial processing is performed on the image based on these parameter values to obtain a dynamic image of it in three-dimensional space. After the parameter values are comprehensively processed, images of different colors are displayed in the dynamic image: the maximum frequency and the average amplitude obtained from the head movement of the target object are rendered as a halo, the vibration frequency represents the color of the vibration halo, and the average amplitude represents the size of the halo. The brightness and/or contrast of the halo are adjusted so that the region of interest in the image is displayed more clearly. In practical application, the halo in the original reference sample image is uniform, and the region of interest is a region where the facial features of the target object are more prominent, which the invention does not limit.
(2) And replacing the region of interest part in the original reference sample image by using the adjusted region of interest part to generate the reference sample image.
Specifically, in this embodiment, an expression data enhancement method based on regions of interest is used: regions of the target object where the amplitude and frequency characteristics are more significant for expression recognition, such as the facial features and the chin, are set as regions of interest, and the reference sample image is generated by replacing the corresponding parts of the initial-state image data of the target object with the facial image whose color space attributes have been adjusted and whose regions of interest are more prominent.
In the embodiment of the present invention, the color space attributes of the original reference sample image, such as brightness and contrast, are adjusted; based on the region-of-interest expression data enhancement method, regions of interest are divided out of the facial image data of the target object, and the regions of interest of the original reference sample image are replaced with the adjusted regions of interest to generate the reference sample image. Adjusting the color space attributes of the original reference sample image reduces the influence of illumination on expression recognition, and further improves the accuracy and scientific rigor of expression recognition in the assisted interrogation process.
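A minimal sketch of the region-of-interest adjustment and replacement (Python with OpenCV) is given below; the ROI coordinates, the alpha/beta values and the file name are illustrative assumptions:

    import cv2

    def adjust_roi(image, roi, alpha=1.3, beta=15):
        """Adjust contrast (alpha) and brightness (beta) of a region of interest
        and paste the adjusted patch back over the original image."""
        x, y, w, h = roi
        patch = image[y:y + h, x:x + w]
        adjusted_patch = cv2.convertScaleAbs(patch, alpha=alpha, beta=beta)
        out = image.copy()
        out[y:y + h, x:x + w] = adjusted_patch
        return out

    # Example: emphasise an eye/brow region of a 256x256 face crop.
    face = cv2.imread("face_initial_state.png")      # hypothetical input file
    reference_sample = adjust_roi(face, roi=(60, 40, 140, 60))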
In an optional embodiment of the present invention, step S202, adjusting the color space attributes of the original reference sample image to generate the reference sample image, may further include the following steps:
(1) and inputting the adjusted reference sample image into a depth generation model, and outputting a reference sample image with the similarity to the original reference sample image being greater than a preset threshold value.
Specifically, the adjusted reference sample image is input into a depth generation model comprising a generator and a discriminator. The generator receives random noise data together with the adjusted reference sample image and generates a false sample image from the random noise, i.e. a noise image generated from one random noise over the reference sample image data. The discriminator receives the false sample image and the original reference sample image and compares their similarity, and the parameters of the depth generation model are adjusted based on this similarity until the similarity is greater than a preset threshold, whereupon a reference sample image whose similarity with the original reference sample image is greater than the preset threshold is output.
In this embodiment, a false sample image is generated from the random noise data received by the depth generation model together with the adjusted reference sample image, the generated sample image is compared with the original reference sample image by the discriminator, and finally a false sample image whose similarity is greater than the preset threshold is output as the reference sample image. False sample images are generated continuously by the depth generation model and continuously discriminated against the original reference sample image, and an image with the same style as the initial state image is finally generated, which can greatly improve the robustness of the facial expression analysis method.
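Since the description only names a generator and a discriminator, the sketch below (Python with PyTorch) shows a plain adversarial training step for 64x64 grayscale face crops; conditioning the generator on the adjusted reference sample image, as described above, would additionally require feeding that image (for example concatenated with the noise) into the generator, which is omitted here:

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, noise_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(noise_dim, 256), nn.ReLU(),
                nn.Linear(256, 64 * 64), nn.Tanh())
        def forward(self, z):
            return self.net(z).view(-1, 1, 64, 64)

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(), nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2),
                nn.Linear(256, 1), nn.Sigmoid())
        def forward(self, x):
            return self.net(x)

    def train_step(gen, disc, real_batch, g_opt, d_opt, noise_dim=100):
        """One adversarial update: the discriminator learns to separate real
        (adjusted reference) images from generated ones; the generator learns
        to make its samples indistinguishable from the real images."""
        bce = nn.BCELoss()
        n = real_batch.size(0)
        fake = gen(torch.randn(n, noise_dim))

        d_opt.zero_grad()
        d_loss = (bce(disc(real_batch), torch.ones(n, 1)) +
                  bce(disc(fake.detach()), torch.zeros(n, 1)))
        d_loss.backward()
        d_opt.step()

        g_opt.zero_grad()
        g_loss = bce(disc(fake), torch.ones(n, 1))
        g_loss.backward()
        g_opt.step()
        return d_loss.item(), g_loss.item()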
The above embodiments are merely preferred embodiments of the present invention, and the present invention can be applied to various application scenarios, for example, in the technical fields of computer vision, social emotion analysis, medical diagnosis, etc., and can also be applied to analyzing the psychological state of a target object in various fields.
Fig. 5 shows a target object state analysis method according to an embodiment of the present invention, where the target object state analysis method specifically includes the following steps:
S300: acquiring image information of a target object, acquiring audio information of the target object, and determining a corresponding relation between the image information and the audio information based on time information;
S400: performing facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result;
S500: and performing state analysis according to the corresponding relation between the image information and the audio information and the facial expression analysis result to obtain a state analysis result.
Specifically, in this embodiment, video recording devices such as a law enforcement recorder or a camera are used to obtain facial expression image information of the target object, and audio devices such as a microphone are used to collect audio information of the target object. During the whole acquisition process, the audio information collected in each time period is matched with the image information of the target object in the corresponding period to establish a corresponding relation. Facial expression analysis is performed on the image information with the preset facial expression recognition model, and a new type of visualized image information is formed by relying on medical imaging and biometric recognition technologies. States and emotions of the target object such as aggressiveness, stress and false responses are analyzed and recognized from the different curves reflected by this new type of image information and the corresponding audio information, and a state analysis result is obtained.
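The time-based correspondence itself can be represented very simply, for example as below (Python); the data classes and field names are assumptions introduced only for illustration:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Frame:
        timestamp: float          # seconds since the start of the recording
        expression_result: str    # output of the expression recognition model

    @dataclass
    class AudioSegment:
        start: float
        end: float
        text: str                 # e.g. a transcribed answer

    def align(frames: List[Frame], segments: List[AudioSegment]):
        """Attach to each audio segment the expression results whose timestamps
        fall inside it, i.e. the image/audio correspondence based on time information."""
        return [
            (seg, [f.expression_result for f in frames
                   if seg.start <= f.timestamp < seg.end])
            for seg in segments
        ]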
Specifically, the facial expression recognition model used in this embodiment is generated by training using the facial expression analysis method described in the above embodiment, and for details, reference may be made to the related description of any method embodiment described above.
In this embodiment, the change condition of the image information of the target object at each time stage in the whole image acquisition process is comprehensively analyzed and identified through the corresponding relationship between the image information of the target object and the audio information, so that the emotion change and the state change of the target object at different stages in the whole image acquisition process are more scientifically and objectively obtained.
Fig. 6 shows an alternative embodiment of the present invention, wherein the method for performing state analysis according to the correspondence between the image information and the audio information and the facial expression analysis result in step S500 to obtain a state analysis result specifically includes the following steps:
S501: determining a corresponding relation between the audio information and the facial expression analysis result based on the corresponding relation between the image information and the audio information;
S502: and analyzing the facial expression analysis result by using three-dimensional control based on the corresponding relation between the audio information and the facial expression analysis result, and calculating and analyzing to obtain the state analysis result.
Specifically, in this embodiment, the correspondence between the audio information and the facial expression analysis result is determined from the correspondence between the image information and the audio information. The association between the facial micro-vibrations of the target subject and the vestibular-emotional reflex (VER) of the target subject is analyzed through three-dimensional (3D) control; head and neck movement and fluctuations caused by emotion changes are analyzed through the three-dimensional control; frame differences are computed over a plurality of image frames of the change image while the frames are accumulated; and the corresponding results are calculated and analyzed with a dedicated mathematical formula. Each parameter used to determine the psychological and physiological state is thereby visualized digitally, and classification and discrimination by technical grade are performed. As shown in fig. 7, this new type of image displays information unique to the target subject, so that various psychological states can be analyzed and states of the target subject such as anger, stress and aggression can be discriminated more clearly. In practical application, this new type of image is an emotion-energy change map; like a color map, a thermodynamic map or an X-ray image, it carries unique information. The emotion-energy change map can sense the emotional fluctuations of the target object during the whole image acquisition process, and, combined with the correspondence between the image information and the audio information, the credibility of the target object's audio information during acquisition can be inferred.
In this embodiment, the corresponding relation between the audio information and the facial expression analysis result is analyzed through three-dimensional (3D) control, and a state analysis result of the target object is obtained. During assisted interrogation an objective and accurate analysis conclusion can therefore be produced, and the emotion-energy change map can produce data such as excitement and sensitivity parameters of the target object, vibration amplitude parameters, overall mood value parameters, lie probability parameters and mood variation parameters. The emotion changes of the target object are judged scientifically from the changes of these parameters and from the vibration images generated by analyzing the amplitude and frequency changes of the target object's facial images; the psychological changes of the target object are judged based on the emotional changes, and a scientific basis is established for the association between the audio information and the image information. In practical application, the vibration image reflects the psycho-physiological changes of the target object in real time, with the video recording device obtaining the noise picture of the target object, and the data produced by the emotion-energy change map may be one of the above parameters, any combination of them, or other related parameters, which the invention does not limit.
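As a rough illustration of this calculating-and-analyzing step, the sketch below (Python with NumPy) aggregates per-frame vibration parameters over each audio segment into simple indicators; the specific psycho-physiological parameters named above (mood value, lie probability and so on) are not given numerically in the description, so plain statistics stand in for them here:

    import numpy as np

    def segment_state_summary(segments, frame_times, frame_amplitude, frame_frequency):
        """For each audio segment (start, end) in seconds, summarise the vibration
        parameters of the frames whose timestamps fall inside the segment."""
        frame_times = np.asarray(frame_times)
        frame_amplitude = np.asarray(frame_amplitude)
        frame_frequency = np.asarray(frame_frequency)
        summaries = []
        for start, end in segments:
            mask = (frame_times >= start) & (frame_times < end)
            if not mask.any():
                summaries.append(None)
                continue
            summaries.append({
                "mean_amplitude": float(frame_amplitude[mask].mean()),
                "mean_frequency": float(frame_frequency[mask].mean()),
                "amplitude_variability": float(frame_amplitude[mask].std()),
            })
        return summaries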
It should be understood that, although the steps in the flowcharts of fig. 1, 2, 5 and 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps need not be performed in the exact order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 1, 2, 5 and 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in fig. 8, the present embodiment provides a facial expression analysis apparatus including: an acquisition module 1 and a training module 2, wherein:
an obtaining module 1, configured to obtain image data of a target object, for details, see the related description of step S100 in any of the above method embodiments;
the training module 2 is configured to perform facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, where the preset facial expression recognition model is generated by performing training based on an initial state image of the target object and a sample image containing a state change, and details of the preset facial expression recognition model may be referred to in the related description of step S200 in any method embodiment.
In the embodiment of the invention, video recording equipment such as a law enforcement recorder or a camera is used to collect facial expression image information in real time in a non-intrusive, non-contact manner. The collected image data of the target object in the initial state, together with sample images of the target object at the various stages of state change during the whole acquisition process, are trained to obtain a facial expression recognition model, and the facial expression changes of the target object are obtained accurately by analyzing the image data of the target object with this model, so that the emotion changes caused by the facial expression changes can be judged accurately and objectively.
As shown in fig. 9, an embodiment of the present invention provides a target object state analysis apparatus, including: collection module 3, analysis module 4, communication module 5, wherein:
the acquisition module 3 is configured to acquire image information of a target object, acquire audio information of the target object, and determine a corresponding relationship between the image information and the audio information based on time information, where details may refer to related description of step S300 in any of the above method embodiments;
an analysis module 4, configured to perform facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result, where details may refer to relevant description of step S400 in any of the above method embodiments;
the communication module 5 is configured to perform state analysis according to the correspondence between the image information and the audio information and the facial expression analysis result to obtain a state analysis result, and the detailed content may refer to the related description of step S500 in any of the above method embodiments.
In this embodiment, the change condition of the image information of the target object at each time stage in the whole process of the auxiliary interrogation is comprehensively analyzed and identified through the corresponding relation between the image information of the target object and the audio information, so that the emotion change and the state change of the target object at different stages in the interrogation process are more scientifically and objectively obtained, and the interrogation efficiency is improved.
For specific limitations and beneficial effects of the facial expression analysis apparatus and the target object state analysis apparatus, reference may be made to the above limitations of the facial expression analysis method and the target object state analysis method, which are not described herein again. The modules of the facial expression analysis apparatus and the target object state analysis apparatus may be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the electronic device, or can be stored in a memory in the electronic device in a software form, so that the processor can call and execute operations corresponding to the modules.
Fig. 10 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, where the computer device may include at least one processor 41, at least one communication interface 42, at least one communication bus 43, and at least one memory 44, where the communication interface 42 may include a Display (Display) and a Keyboard (Keyboard), and the alternative communication interface 42 may also include a standard wired interface and a standard wireless interface. The Memory 44 may be a high-speed RAM Memory (volatile Random Access Memory) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The memory 44 may alternatively be at least one memory device located remotely from the aforementioned processor 41. Wherein the processor 41 may be combined with the apparatus described in fig. 8 and fig. 9, the memory 44 stores an application program, and the processor 41 calls the program code stored in the memory 44 for executing the steps of the facial expression analysis method and the target object state analysis method of any of the above-mentioned method embodiments.
The communication bus 43 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 43 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 44 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 44 may also comprise a combination of the above kinds of memories.
The processor 41 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 44 is also used to store program instructions. Processor 41 may call program instructions to implement a method as described in any of the embodiments of the invention.
Embodiments of the present invention further provide a non-transitory computer storage medium storing computer-executable instructions, and the computer-executable instructions can execute the method described in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memories.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (11)

1. A facial expression analysis method, comprising:
acquiring image data of a target object;
and performing facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on an initial state image of the target object and a sample image containing state change.
2. The method of claim 1, wherein training the process of generating the preset facial expression recognition model comprises:
acquiring an initial state image of the target object as a reference sample image;
adjusting the color space attribute of the reference sample image to generate a reference sample image;
acquiring a change image of the target object in an emotion change stage;
correlating the state change of the target object based on the amplitude and the frequency of the image information in the change image to generate a contrast sample image;
training a neural network model based on the reference sample image and the comparison sample image to generate the preset facial expression recognition model.
3. The method of analyzing facial expressions according to claim 2, wherein the adjusting color space attributes of the reference sample image to generate a reference sample image comprises:
adjusting a brightness and/or contrast of a region of interest portion of the reference sample image;
and replacing the region of interest part in the original reference sample image by using the adjusted region of interest part to generate the reference sample image.
4. A facial expression analysis method according to claim 2 or 3, wherein the adjusting of the color space attribute of the reference sample image to generate a reference sample image further comprises:
inputting the adjusted reference sample image into a depth generation model, and outputting a reference sample image with the similarity to the adjusted reference sample image smaller than a preset threshold;
the training process of the depth generation model comprises the following steps:
generating a false sample image according to random noise and the adjusted reference sample image by a generator of the depth generation model;
judging the similarity between the false sample image and the original reference sample image through a discriminator of the depth generation model;
and adjusting parameters of the depth generation model based on the judgment result until the similarity between the generated false sample image and the original reference sample image is greater than a preset threshold value.
5. A target object state analysis method is characterized by comprising the following steps:
acquiring image information of a target object, acquiring audio information of the target object, and determining a corresponding relation between the image information and the audio information based on time information;
performing facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result;
and performing state analysis according to the corresponding relation between the image information and the audio information and the facial expression analysis result to obtain a state analysis result.
6. The target object state analysis method according to claim 5, wherein the preset facial expression recognition model is generated by training using the facial expression analysis method according to any one of claims 2 to 4.
7. The method for analyzing the state of a target object according to claim 5, wherein performing the state analysis according to the correspondence between the image information and the audio information and the facial expression analysis result to obtain the state analysis result comprises:
determining a corresponding relation between the audio information and the facial expression analysis result based on the corresponding relation between the image information and the audio information;
and analyzing the facial expression analysis result by using three-dimensional control based on the corresponding relation between the audio information and the facial expression analysis result, and calculating and analyzing to obtain the state analysis result.
8. A facial expression analysis apparatus, comprising:
the acquisition module is used for acquiring image data of a target object;
and the training module is used for carrying out facial expression analysis based on a preset facial expression recognition model and the image data to obtain an analysis result, wherein the preset facial expression recognition model is generated by training based on the initial state image of the target object and the sample image containing the state change.
9. A target object state analysis device, comprising:
the acquisition module is used for acquiring image information of a target object, acquiring audio information of the target object, and determining the corresponding relation between the image information and the audio information based on time information;
the analysis module is used for carrying out facial expression analysis based on a preset facial expression recognition model and the image information to obtain a facial expression analysis result;
and the communication module is used for carrying out state analysis according to the corresponding relation between the image information and the audio information and the facial expression analysis result to obtain a state analysis result.
10. A computer device, comprising:
a communication unit, a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor performing the steps of the facial expression analysis method according to any one of claims 1 to 4 or performing the steps of the target object state analysis method according to any one of claims 5 to 7 by executing the computer instructions.
11. A computer-readable storage medium, characterized in that it stores computer instructions for causing the computer to perform the steps of the method of any one of claims 1-4 or the steps of the target object state analysis method of any one of claims 5-7.
CN202111561312.6A 2021-12-15 2021-12-15 Facial expression and target object state analysis method, device and equipment Pending CN114241565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111561312.6A CN114241565A (en) 2021-12-15 2021-12-15 Facial expression and target object state analysis method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111561312.6A CN114241565A (en) 2021-12-15 2021-12-15 Facial expression and target object state analysis method, device and equipment

Publications (1)

Publication Number Publication Date
CN114241565A true CN114241565A (en) 2022-03-25

Family

ID=80759212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111561312.6A Pending CN114241565A (en) 2021-12-15 2021-12-15 Facial expression and target object state analysis method, device and equipment

Country Status (1)

Country Link
CN (1) CN114241565A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315745A (en) * 2023-09-19 2023-12-29 中影年年(北京)文化传媒有限公司 Facial expression capturing method and system based on machine learning

Similar Documents

Publication Publication Date Title
CN111310841B (en) Medical image classification method, medical image classification device, medical image classification apparatus, medical image classification computer device, and medical image classification storage medium
JP6521845B2 (en) Device and method for measuring periodic fluctuation linked to heart beat
WO2021139258A1 (en) Image recognition based cell recognition and counting method and apparatus, and computer device
JP6863563B2 (en) Stress evaluation system
CN108305680B (en) Intelligent Parkinson's disease auxiliary diagnosis method and device based on multivariate biological characteristics
CN111887867A (en) Method and system for analyzing character formation based on expression recognition and psychological test
CN106821324A (en) A kind of lingual diagnosis auxiliary medical system based on lingual surface and sublingual comprehensive analysis
US20220133215A1 (en) Method for evaluating skin lesions using artificial intelligence
CN112674771A (en) Depression crowd identification method and device based on image fixation difference
CN111028218A (en) Method and device for training fundus image quality judgment model and computer equipment
CN114241565A (en) Facial expression and target object state analysis method, device and equipment
CN112472088B (en) Emotional state evaluation method and device, intelligent terminal and storage medium
JPWO2020054604A1 (en) Information processing equipment, control methods, and programs
WO2020071086A1 (en) Information processing device, control method, and program
US5719784A (en) Order-based analyses of cell and tissue structure
CN111640097A (en) Skin mirror image identification method and equipment
Pellegrino et al. The Effect of Using Augmented Image in the Identification of Human Nail Abnormality using Yolo3
CN115439920A (en) Consciousness state detection system and equipment based on emotional audio-visual stimulation and facial expression
Naghibolhosseini et al. Deep learning for high-speed laryngeal imaging analysis
CN109480842B (en) System and apparatus for diagnosing functional dyspepsia
JP2005345310A (en) Blood health support system
Kumar et al. Estimate Height Weight and Body Mass Index From Face Image Using Machine Learning
KR102333120B1 (en) Self Scalp Diagnostic System and Method
WO2023060719A1 (en) Method, apparatus, and system for calculating emotional indicators based on pupil waves
JP2021163278A (en) Quality assessment device, teacher data, quality assessment processing program, and quality assessment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination