CN113053385A - Abnormal emotion detection method and device - Google Patents

Info

Publication number
CN113053385A
Authority
CN
China
Prior art keywords
abnormal
recognition result
emotion
result
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110338416.4A
Other languages
Chinese (zh)
Inventor
高伟
张磊
郑广斌
郭锐鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110338416.4A
Publication of CN113053385A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Psychiatry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Signal Processing (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an abnormal emotion detection method and device, which can be used in the financial field or other fields. The method comprises: acquiring station video data, regional video data and personnel voice data, and parsing the station video data and the regional video data to obtain a station picture group and a regional picture group respectively; recognizing the station picture group and the regional picture group to obtain a plurality of person emotion recognition results, and recognizing the personnel voice data to obtain a plurality of voice recognition results; and generating an abnormal emotion detection result according to the person emotion recognition results, the voice recognition results and preset behavior rule parameters. By acquiring audio and video data of staff at work in real time and recognizing and detecting their emotions on the work site in real time, the invention can raise the automation level of on-site detection, ensure that staff emotions remain stable, improve work quality, and support the rapid development of enterprise business.

Description

Abnormal emotion detection method and device
Technical Field
The invention relates to the technical field of emotion detection, in particular to an abnormal emotion detection method and device.
Background
At present, owing to the characteristics of certain industries, the emotional state of staff on the work site directly affects enterprise performance and even enterprise development. For example, in industries where customer service personnel and acquirers deal with special problems, staff must strictly comply with on-site work behavior rules, yet they often face emotional interference and similar problems. If abnormal emotions of staff cannot be detected in time, the physical and mental health of the staff, and even the development of the enterprise, can be seriously affected. However, these industries currently cannot detect abnormal staff emotions comprehensively, accurately and in a timely manner.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention mainly aims to provide an abnormal emotion detection method and device, so that abnormal emotions of staff can be comprehensively, accurately and timely detected.
In order to achieve the above object, an embodiment of the present invention provides an abnormal emotion detection method, including:
acquiring station video data, regional video data and personnel voice data, analyzing the station video data and the regional video data, and respectively obtaining a station picture group and a regional picture group;
identifying the station picture group and the region picture group to obtain a plurality of personnel emotion identification results, and identifying the personnel voice data to obtain a plurality of voice identification results;
and generating an abnormal emotion detection result according to the emotion recognition result of the person, the voice recognition result and a preset behavior rule parameter.
Optionally, in an embodiment of the present invention, the identifying the station picture group and the region picture group to obtain a plurality of personnel emotion identification results includes:
identifying the gait of the personnel in the regional picture group to obtain a personnel gait identification result; wherein the person gait recognition result belongs to the person emotion recognition result.
Optionally, in an embodiment of the present invention, the generating an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result, and a preset behavior rule parameter includes:
if the gait emotion result in the person gait recognition result is abnormal, judging whether the gait abnormal time in the person gait recognition result is greater than a gait time threshold value in the behavior rule parameter, if so, generating a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
Optionally, in an embodiment of the present invention, the identifying the station picture group and the region picture group to obtain a plurality of personnel emotion identification results further includes:
carrying out facial expression recognition on personnel in the station picture group to obtain a personnel expression recognition result; wherein the person expression recognition result belongs to the person emotion recognition result.
Optionally, in an embodiment of the present invention, the generating an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result, and a preset behavior rule parameter further includes:
if the facial expression result in the person expression recognition result is abnormal, judging whether the expression abnormal time in the person expression recognition result is greater than the expression time threshold value in the behavior rule parameter, if so, generating an expression abnormal recognition result according to the expression abnormal time; wherein the expression abnormal recognition result belongs to the abnormal emotion detection result.
Optionally, in an embodiment of the present invention, the recognizing the person voice data to obtain a plurality of voice recognition results includes:
performing voice recognition on the personnel voice data to obtain a text emotion recognition result, a speech speed recognition result and a volume recognition result; and the text emotion recognition result, the speech speed recognition result and the volume recognition result belong to the voice recognition result.
Optionally, in an embodiment of the present invention, the generating an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result, and a preset behavior rule parameter further includes:
if the text emotion result in the text emotion recognition result is abnormal, judging whether the text abnormal time in the text emotion recognition result is greater than the text time threshold in the behavior rule parameter, and if so, generating a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result;
if, according to the person speech rate threshold in the behavior rule parameter, the maximum person speech rate in the speech rate recognition result is greater than the person speech rate threshold, judging whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generating a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result;
if, according to the person volume threshold in the behavior rule parameter, the maximum person volume in the volume recognition result is greater than the person volume threshold, judging whether the person volume time in the volume recognition result is greater than the volume time threshold in the behavior rule parameter, and if so, generating a person volume abnormal result according to the person volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
The embodiment of the invention also provides an abnormal emotion detection device, which comprises:
the data acquisition module is used for acquiring station video data, regional video data and personnel voice data, analyzing the station video data and the regional video data and respectively obtaining a station picture group and a regional picture group;
the data identification module is used for identifying the station picture group and the region picture group to obtain a plurality of personnel emotion identification results, and identifying the personnel voice data to obtain a plurality of voice identification results;
and the detection result module is used for generating an abnormal emotion detection result according to the emotion recognition result of the person, the voice recognition result and a preset behavior rule parameter.
Optionally, in an embodiment of the present invention, the data identification module is further configured to identify the gait of the person in the region picture group to obtain a result of identifying the gait of the person; wherein the person gait recognition result belongs to the person emotion recognition result.
Optionally, in an embodiment of the present invention, the detection result module is further configured to, if a gait emotion result in the person gait recognition result is abnormal, determine whether gait abnormal time in the person gait recognition result is greater than a gait time threshold in the behavior rule parameter, and if so, generate a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
Optionally, in an embodiment of the present invention, the data identification module is further configured to perform facial expression identification on a person in the workstation image group to obtain a person expression identification result; wherein the person expression recognition result belongs to the person emotion recognition result.
Optionally, in an embodiment of the present invention, the detection result module is further configured to, if a facial expression result in the human expression recognition result is abnormal, determine whether an expression abnormal time in the human expression recognition result is greater than an expression time threshold in the behavior rule parameter, and if so, generate an expression abnormal recognition result according to the expression abnormal time; and the expression abnormal recognition result belongs to the abnormal emotion detection result.
Optionally, in an embodiment of the present invention, the data recognition module is further configured to perform voice recognition on the person voice data to obtain a text emotion recognition result, a speech rate recognition result, and a volume recognition result; and the text emotion recognition result, the speech speed recognition result and the volume recognition result belong to the voice recognition result.
Optionally, in an embodiment of the present invention, the detection result module includes:
a text emotion unit, configured to, if the text emotion result in the text emotion recognition result is abnormal, judge whether the text abnormal time in the text emotion recognition result is greater than the text time threshold in the behavior rule parameter, and if so, generate a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result;
a speech rate detection unit, configured to, if, according to the person speech rate threshold in the behavior rule parameter, the maximum person speech rate in the speech rate recognition result is greater than the person speech rate threshold, judge whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generate a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result;
a volume detection unit, configured to, if, according to the person volume threshold in the behavior rule parameter, the maximum person volume in the volume recognition result is greater than the person volume threshold, judge whether the person volume time in the volume recognition result is greater than the volume time threshold in the behavior rule parameter, and if so, generate a person volume abnormal result according to the person volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
By acquiring audio and video data of staff at work in real time and recognizing and detecting their emotions on the work site in real time, the invention can raise the automation level of on-site detection, ensure that staff emotions remain stable, improve work quality, and support the rapid development of enterprise business.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for detecting abnormal emotions according to an embodiment of the present invention;
FIG. 2 is a flow chart of abnormal emotion detection in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an abnormal emotion detecting apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a detection result module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method and a device for detecting abnormal emotion, which can be used in the financial field or other fields.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart illustrating an abnormal emotion detection method according to an embodiment of the present invention, where an execution subject of the abnormal emotion detection method provided by the embodiment of the present invention includes, but is not limited to, a computer. The method shown in the figure comprises the following steps:
Step S1: station video data, regional video data and personnel voice data are acquired, and the station video data and the regional video data are parsed to obtain a station picture group and a regional picture group respectively.
The station video data, the regional video data and the personnel voice data can be acquired through a camera at the employee's workstation, a regional camera covering the office area and a microphone at the employee's workstation, respectively. The station video data and the regional video data are parsed into continuous picture groups using conventional video stream parsing techniques, yielding the station picture group and the regional picture group.
Step S2: the station picture group and the region picture group are recognized to obtain a plurality of person emotion recognition results, and the personnel voice data is recognized to obtain a plurality of voice recognition results.
The station picture group and the region picture group are recognized using existing artificial intelligence recognition techniques, such as a ResNet18 model or an LSTM + random classifier of the EWalk data set, to obtain a plurality of person emotion recognition results. Specifically, the person emotion recognition results include a person gait recognition result and a person expression recognition result.
Furthermore, existing voice recognition technology is used to recognize the personnel voice data: the voice data is converted into text data, and the speech rate and volume of the voice data are detected. The resulting text emotion recognition result, speech rate recognition result and volume recognition result all belong to the voice recognition results.
Step S3: an abnormal emotion detection result is generated according to the person emotion recognition results, the voice recognition results and the preset behavior rule parameters.
The preset behavior rule parameters include a gait time threshold, an expression time threshold, a text time threshold, a person speech rate threshold, an overspeed time threshold, a person volume threshold, a volume time threshold and the like. The abnormal emotion detection result is obtained from the preset behavior rule parameters, the person emotion recognition results and the voice recognition results. Specifically, the abnormal emotion detection result includes a gait abnormal recognition result, an expression abnormal recognition result, a text abnormal recognition result, a person speech rate abnormal result and a person volume abnormal result. The abnormal emotion detection result indicates that the person is experiencing an emotional fluctuation such as distress, anger, sadness or dissatisfaction.
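The behavior rule parameters listed above could be grouped into a single configuration object, as in the minimal Python sketch below. The class name, field names and units are hypothetical. Only the 30-second time thresholds and the 5 words/second speech rate threshold appear as examples elsewhere in this description; the remaining defaults are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRuleParams:
    """Preset behavior rule parameters (hypothetical names and units)."""

    gait_time_threshold_s: float = 30.0        # example value from the description
    expression_time_threshold_s: float = 30.0  # example value from the description
    text_time_threshold_s: float = 30.0        # example value from the description
    speech_rate_threshold_wps: float = 5.0     # words/second, example from the description
    overspeed_time_threshold_s: float = 10.0   # assumed placeholder
    volume_threshold_db: float = 75.0          # assumed placeholder
    volume_time_threshold_s: float = 10.0      # assumed placeholder


# Defaults can be overridden per deployment without mutating shared state.
default_rules = BehaviorRuleParams()
strict_rules = BehaviorRuleParams(gait_time_threshold_s=15.0)
```

Making the object immutable (`frozen=True`) is a design choice for this sketch: every detection check then reads the same preset values.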
As an embodiment of the present invention, identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results includes: identifying the gait of the personnel in the regional picture group to obtain a person gait recognition result; wherein the person gait recognition result belongs to the person emotion recognition result.
The gait of the persons in the continuous region picture group is recognized; specifically, detection and recognition can be performed using an LSTM + random classifier of the EWalk data set. The person gait recognition result includes the gait emotion result presented by the person's gait, such as sadness, anger, calmness or cheerfulness. Specifically, when the gait emotion result is sadness or anger, the gait emotion result is determined to be abnormal. In addition, the person gait recognition result includes the specific time and duration corresponding to each abnormal emotion, namely the gait abnormal time.
In this embodiment, generating an abnormal emotion detection result according to the emotion recognition result of the person, the speech recognition result, and the preset behavior rule parameter includes: if the gait emotion result in the person gait recognition result is abnormal, judging whether the gait abnormal time in the person gait recognition result is greater than a gait time threshold value in the behavior rule parameter, if so, generating a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
If the gait emotion result in the person gait recognition result is sadness, distress or anger, the gait emotion result is considered abnormal. In that case, it is judged whether the gait abnormal time is greater than the gait time threshold (for example, 30 seconds) in the behavior rule parameter. If so, the person has an abnormal emotion, and a gait abnormal recognition result is generated according to the gait abnormal time. The gait abnormal recognition result specifically includes the person's identity, the gait emotion result (anger, sadness and the like), and the specific time and duration corresponding to the gait emotion result (the gait abnormal time).
Furthermore, the gait abnormal recognition result belongs to the abnormal emotion detection result. When an abnormal emotion detection result is detected, early warning information is generated and sent to the personnel responsible for handling abnormal conditions, so that the person's abnormal emotion is handled in time and the person is calmed.
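The gait rule described above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the record type, the emotion labels ("sad", "angry") and the dictionary layout of the result are assumptions, and the 30-second default is the example threshold from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label set; the description names sadness and anger as abnormal.
ABNORMAL_GAIT_EMOTIONS = {"sad", "angry"}


@dataclass
class GaitRecognition:
    """One person gait recognition result (hypothetical structure)."""

    person_id: str
    emotion: str       # gait emotion result, e.g. "calm" or "angry"
    start_s: float     # specific time at which the emotion was first observed
    duration_s: float  # gait abnormal time


def check_gait(rec: GaitRecognition, gait_time_threshold_s: float = 30.0) -> Optional[dict]:
    """Return a gait abnormal recognition result, or None if the rule is not met."""
    if rec.emotion not in ABNORMAL_GAIT_EMOTIONS:
        return None
    if rec.duration_s <= gait_time_threshold_s:
        return None  # abnormal emotion, but not sustained long enough
    return {
        "person_id": rec.person_id,
        "gait_emotion": rec.emotion,
        "start_s": rec.start_s,
        "duration_s": rec.duration_s,
    }
```

A non-`None` return value is what would trigger the early warning message sent to the handling personnel.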
As an embodiment of the present invention, identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results further includes: carrying out facial expression recognition on personnel in the station picture group to obtain a person expression recognition result; wherein the person expression recognition result belongs to the person emotion recognition result.
Facial expression recognition is performed on the persons in the continuous station picture group using a ResNet18 model to obtain the person expression recognition result. Specifically, the person expression recognition result includes a facial expression result and the corresponding expression abnormal time, and the facial expression result may be happiness, anger, distress and the like. When the facial expression result is sadness, distress or anger, the facial expression result is judged to be abnormal. The expression abnormal time is the specific time at which the facial expression result is abnormal, together with its duration.
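One way to derive the expression abnormal time from a continuous picture group is to label each picture and then merge consecutive pictures with the same abnormal label into a span. The Python sketch below assumes per-picture labels are already available (for example from a classifier), that pictures are sampled at a fixed rate, and that "sad" and "angry" are the abnormal labels; all names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

ABNORMAL_EXPRESSIONS = {"sad", "angry"}  # assumed label set


def abnormal_spans(frame_labels: Sequence[str], fps: float) -> List[Tuple[str, float, float]]:
    """Merge consecutive identical abnormal labels into (label, start_s, duration_s)."""
    spans = []
    index = 0  # index of the first picture in the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label in ABNORMAL_EXPRESSIONS:
            spans.append((label, index / fps, length / fps))
        index += length
    return spans


# Nine pictures sampled at 2 pictures per second.
labels = ["happy", "happy", "angry", "angry", "angry", "angry", "calm", "sad", "sad"]
print(abnormal_spans(labels, fps=2.0))
# [('angry', 1.0, 2.0), ('sad', 3.5, 1.0)]
```

Each span's duration can then be compared with the expression time threshold in the behavior rule parameters.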
In this embodiment, generating an abnormal emotion detection result according to the emotion recognition result of the person, the speech recognition result, and the preset behavior rule parameter further includes: if the facial expression result in the human expression recognition result is abnormal, judging whether the expression abnormal time in the human expression recognition result is greater than the expression time threshold value in the behavior rule parameter, if so, generating an expression abnormal recognition result according to the expression abnormal time; and the expression abnormal recognition result belongs to the abnormal emotion detection result.
If the facial expression result in the person expression recognition result is sadness, distress or anger, the facial expression result is judged to be abnormal. In that case, it is judged whether the corresponding expression abnormal time is greater than the expression time threshold (for example, 30 seconds) in the behavior rule parameter. If so, the person has an abnormal emotion, and an expression abnormal recognition result is generated according to the expression abnormal time. The expression abnormal recognition result includes the person's identity, the facial expression result (anger, sadness and the like), and the specific time and duration of the facial expression result.
Furthermore, the expression abnormal recognition result belongs to the abnormal emotion detection result. When an abnormal emotion detection result is detected, early warning information is generated and sent to the personnel responsible for handling abnormal conditions, so that the person's abnormal emotion is handled in time and the person is calmed.
As an embodiment of the present invention, recognizing the human voice data to obtain a plurality of voice recognition results includes: performing voice recognition on the personnel voice data to obtain a text emotion recognition result, a speech speed recognition result and a volume recognition result; and the text emotion recognition result, the speech speed recognition result and the volume recognition result belong to the voice recognition result.
The collected personnel voice data is recognized using voice recognition technology. The resulting voice recognition results include a text emotion recognition result, a speech rate recognition result and a volume recognition result. Specifically, the text emotion recognition result is the emotion of the person (e.g., anger or sadness) expressed by the content of the person's speech in the voice data, together with the corresponding specific time and duration (the text abnormal time); emotion recognition can be performed on the personnel voice data using an existing artificial intelligence recognition tool (e.g., a Baidu AI development platform). The speech rate recognition result includes the maximum speech rate of the person and the duration of that maximum. Specifically, the duration for which the speech rate stays within a preset error range of the maximum may be taken as the person speech rate time. Likewise, the volume recognition result includes the maximum volume when the person speaks and the duration of that maximum. Specifically, the duration for which the volume stays within a preset error range (e.g., 2 dB) of the maximum may be taken as the person volume time.
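The "duration of the maximum value within a preset error range" can be read in more than one way. The Python sketch below takes one possible reading: starting from the sample where the maximum occurs, it extends left and right while neighbouring samples stay within the tolerance of the maximum. The function name, the uniform sampling interval and the sample values are assumptions for illustration.

```python
from typing import Sequence, Tuple


def time_near_max(values: Sequence[float], step_s: float, tolerance: float) -> Tuple[float, float]:
    """Return (maximum, seconds the signal stays within `tolerance` of it).

    `values` are uniformly sampled measurements (e.g. volume in dB or speech
    rate in words/second) taken every `step_s` seconds.
    """
    if not values:
        raise ValueError("values must not be empty")
    peak = max(values)
    peak_index = list(values).index(peak)  # first occurrence of the maximum

    lo = peak_index
    while lo > 0 and peak - values[lo - 1] <= tolerance:
        lo -= 1
    hi = peak_index
    while hi < len(values) - 1 and peak - values[hi + 1] <= tolerance:
        hi += 1
    return peak, (hi - lo + 1) * step_s


# Volume sampled every 0.5 s with a 2 dB error range.
volume_db = [60.0, 71.0, 72.0, 71.5, 70.5, 65.0, 72.0]
print(time_near_max(volume_db, step_s=0.5, tolerance=2.0))
# (72.0, 2.0)
```

The same helper could be applied to a speech rate series to obtain the person speech rate time.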
In this embodiment, as shown in fig. 2, generating an abnormal emotion detection result according to the emotion recognition result of the person, the speech recognition result, and the preset behavior rule parameter further includes:
step S21, if the text emotion result in the text emotion recognition result is abnormal, judging whether the text abnormal time in the text emotion recognition result is greater than the text time threshold value in the behavior rule parameter, if so, generating a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result.
If the text emotion result is distress, anger or sadness, the text emotion result is judged to be abnormal. In that case, it is judged whether the text abnormal time in the text emotion recognition result is greater than the text time threshold (for example, 30 seconds) in the behavior rule parameter. If so, the person has an abnormal emotion, and a text abnormal recognition result is generated according to the text abnormal time. The text abnormal recognition result includes the person's identity, the text emotion result (distress, anger or sadness), and the corresponding specific time and duration.
Step S22, if it is determined, according to the person speech rate threshold in the behavior rule parameter, that the maximum value of the person's speech rate in the speech rate recognition result is greater than the person speech rate threshold, judging whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generating a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result.
The behavior rule parameter includes a person speech rate threshold, which may be, for example, 5 words/second. It is judged whether the maximum value of the person's speech rate is greater than the person speech rate threshold; if so, the person may be speaking too fast. It is then judged whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter; if so, the person is confirmed to be speaking too fast. A person speech rate abnormal result is generated according to the person speech rate time, i.e., the specific time during which the speech rate is excessive. Specifically, the person speech rate abnormal result includes the person's identity and the specific time during which the speech rate is excessive.
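A minimal sketch of the two-stage check in step S22 is given below. The function name `check_speech_rate`, the result layout and the 10-second overspeed time threshold are assumptions made for illustration; the description only gives 5 words/second as an example speech rate threshold.

```python
from typing import Optional


def check_speech_rate(person_id: str,
                      max_rate_wps: float,
                      rate_time_s: float,
                      rate_threshold_wps: float = 5.0,
                      overspeed_time_threshold_s: float = 10.0) -> Optional[dict]:
    """Return a person speech rate abnormal result, or None if the rule does not fire."""
    # Stage 1: the maximum speech rate must exceed the person speech rate threshold.
    if max_rate_wps <= rate_threshold_wps:
        return None
    # Stage 2: that maximum must last longer than the overspeed time threshold
    # (10 s is an assumed value, not taken from the description).
    if rate_time_s <= overspeed_time_threshold_s:
        return None
    return {
        "type": "speech_rate_abnormal",
        "person_id": person_id,
        "max_rate_wps": max_rate_wps,
        "duration_s": rate_time_s,
    }


fast_and_long = check_speech_rate("E1001", max_rate_wps=6.5, rate_time_s=14.0)
fast_but_short = check_speech_rate("E1001", max_rate_wps=6.5, rate_time_s=3.0)
```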
Step S23, if it is determined, according to the person volume threshold in the behavior rule parameter, that the maximum value of the person's volume in the volume recognition result is greater than the person volume threshold, judging whether the person volume time in the volume recognition result is greater than the volume time threshold in the behavior rule parameter, and if so, generating a person volume abnormal result according to the person volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
The behavior rule parameter includes a person volume threshold, which may be, for example, 20 dB. It is judged whether the maximum value of the person's volume in the volume recognition result is greater than the person volume threshold; if so, the person's volume may be too high. Further, it is judged whether the person volume time in the volume recognition result, i.e., the duration of the excessive volume, is greater than the volume time threshold in the behavior rule parameter; if so, the person's volume is confirmed to be too high. A person volume abnormal result is generated according to the person volume time; specifically, the person volume abnormal result comprises the person's identity and the specific time during which the volume is too high.
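The distinction drawn above between a volume that is possibly too high and one that is confirmed too high can be made explicit with a three-state outcome, as in this sketch. The names `RuleOutcome` and `evaluate_volume` and the 5-second volume time threshold are hypothetical; the 20 dB threshold is the example value from the description.

```python
from enum import Enum


class RuleOutcome(Enum):
    NORMAL = "normal"
    POSSIBLE = "possible"    # maximum exceeded, but not for long enough
    CONFIRMED = "confirmed"  # maximum exceeded and sustained


def evaluate_volume(max_volume_db: float,
                    volume_time_s: float,
                    volume_threshold_db: float = 20.0,
                    volume_time_threshold_s: float = 5.0) -> RuleOutcome:
    """Classify a volume recognition result (the 5 s time threshold is assumed)."""
    if max_volume_db <= volume_threshold_db:
        return RuleOutcome.NORMAL
    if volume_time_s <= volume_time_threshold_s:
        return RuleOutcome.POSSIBLE
    return RuleOutcome.CONFIRMED


outcome = evaluate_volume(max_volume_db=26.0, volume_time_s=8.0)
```

Only the `CONFIRMED` outcome would lead to a person volume abnormal result; `POSSIBLE` corresponds to the intermediate judgment in the text.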
Further, the text abnormal recognition result, the person speech rate abnormal result and the person volume abnormal result all belong to the abnormal emotion detection result. When an abnormal emotion detection result is detected, early warning information is generated and sent to the corresponding abnormal condition handling personnel, so that the person's abnormal emotion is handled in time and the person is calmed.
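The early warning step might look like the following sketch. The message format, the dictionary keys and the `send` callback are assumptions; the description does not specify how the warning is delivered to the handling personnel.

```python
from typing import Callable, Iterable, List, Optional


def build_warnings(results: Iterable[Optional[dict]]) -> List[str]:
    """Turn abnormal results into warning messages, skipping rules that did not fire."""
    warnings = []
    for result in results:
        if result is None:
            continue
        warnings.append(
            f"[{result['type']}] person {result['person_id']} at {result['time']}"
        )
    return warnings


def dispatch(warnings: List[str], send: Callable[[str], None]) -> int:
    """Send each warning through the supplied channel and return the count sent."""
    for message in warnings:
        send(message)
    return len(warnings)


# A list stands in for the real delivery channel (e.g., a message queue).
outbox: List[str] = []
sent = dispatch(build_warnings([
    {"type": "text_abnormal", "person_id": "E1001", "time": "10:02:15"},
    None,
    {"type": "volume_abnormal", "person_id": "E1001", "time": "10:03:40"},
]), outbox.append)
```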
Fig. 3 is a schematic structural diagram of an abnormal emotion detecting apparatus according to an embodiment of the present invention, where the apparatus includes:
the data acquisition module 10 is configured to acquire station video data, regional video data, and personnel voice data, and to analyze the station video data and the regional video data to obtain a station picture group and a region picture group, respectively.
The station video data, the regional video data and the personnel voice data can be acquired through a station camera at the employee's workstation, a regional camera in the office area and a microphone at the employee's workstation, respectively. The station video data and the regional video data are analyzed; specifically, conventional technical means can be used to parse the video streams into continuous picture groups, yielding the station picture group and the region picture group.
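As one possible reading of "parsing a video stream into a continuous picture group", the sketch below keeps every n-th decoded frame together with its timestamp. It assumes the frames have already been decoded by some video library; the function name `sample_frames` and the idea of downsampling to a target frame rate are illustrative assumptions, not part of the description.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Iterable[Frame],
                  source_fps: float,
                  target_fps: float) -> Iterator[Tuple[float, Frame]]:
    """Yield (timestamp_s, frame) pairs at roughly `target_fps` from a decoded stream."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(source_fps / target_fps))  # keep every `step`-th frame
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index / source_fps, frame


# Integers stand in for decoded pictures from a 10 fps camera.
picture_group = list(sample_frames(range(10), source_fps=10.0, target_fps=5.0))
```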
The data identification module 20 is configured to identify the station picture group and the region picture group to obtain a plurality of person emotion recognition results, and to identify the personnel voice data to obtain a plurality of voice recognition results.
The station picture group and the region picture group are identified using existing artificial intelligence recognition technology, such as a ResNet18 model or an LSTM + random classifier for the EWalk data set, to obtain a plurality of person emotion recognition results. Specifically, the person emotion recognition result includes a person gait recognition result and a person expression recognition result.
Furthermore, existing voice recognition technology is used to recognize the personnel voice data; specifically, the personnel voice data is converted into text data, and the speech rate and volume of the personnel voice data are detected. The resulting text emotion recognition result, speech rate recognition result and volume recognition result all belong to the voice recognition result.
The detection result module 30 is configured to generate an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result and the preset behavior rule parameter.
The preset behavior rule parameter comprises a gait time threshold, an expression time threshold, a text time threshold, a person speech rate threshold, an overspeed time threshold, a person volume threshold, a volume time threshold and the like. The abnormal emotion detection result is obtained from the preset behavior rule parameter, the person emotion recognition result and the voice recognition result. Specifically, the abnormal emotion detection result includes a gait abnormal recognition result, an expression abnormal recognition result, a text abnormal recognition result, a person speech rate abnormal result, and a person volume abnormal result. The abnormal emotion detection result indicates that the person is experiencing emotional fluctuation such as being upset, angry, sad or dissatisfied.
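The thresholds listed above could be gathered into a single configuration object, as in this sketch. The class name, field names and units are hypothetical. Only the 30-second text time threshold, the 5 words/second speech rate threshold and the 20 dB volume threshold come from examples in the description; every other default is an assumed placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BehaviorRuleParameters:
    gait_time_threshold_s: float = 10.0        # assumed placeholder
    expression_time_threshold_s: float = 10.0  # assumed placeholder
    text_time_threshold_s: float = 30.0        # example value from the description
    speech_rate_threshold_wps: float = 5.0     # example value from the description
    overspeed_time_threshold_s: float = 10.0   # assumed placeholder
    volume_threshold_db: float = 20.0          # example value from the description
    volume_time_threshold_s: float = 5.0       # assumed placeholder

    def __post_init__(self) -> None:
        # Reject non-positive thresholds early so rule checks never misfire silently.
        for name, value in asdict(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


params = BehaviorRuleParameters(text_time_threshold_s=45.0)
```

Freezing the dataclass keeps the thresholds immutable while detection is running; a site that needs different limits creates a new instance.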
Further, when an abnormal emotion detection result is detected, early warning information is generated and sent to the corresponding abnormal condition handling personnel, so that the person's abnormal emotion is handled in time and the person is calmed.
As an embodiment of the present invention, the data identification module is further configured to identify the gait of the person in the region picture group to obtain a person gait identification result; wherein the person gait recognition result belongs to the person emotion recognition result.
In this embodiment, the detection result module is further configured to, if the gait emotion result in the person gait recognition result is abnormal, determine whether gait abnormal time in the person gait recognition result is greater than a gait time threshold in the behavior rule parameter, and if so, generate a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
As an embodiment of the present invention, the data identification module is further configured to perform facial expression recognition on the person in the station picture group to obtain a person expression recognition result; wherein the person expression recognition result belongs to the person emotion recognition result.
In this embodiment, the detection result module is further configured to, if a facial expression result in the person expression recognition result is abnormal, determine whether the expression abnormal time in the person expression recognition result is greater than the expression time threshold in the behavior rule parameter, and if so, generate an expression abnormal recognition result according to the expression abnormal time; wherein the expression abnormal recognition result belongs to the abnormal emotion detection result.
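The expression rule (and, analogously, the gait rule) needs an "abnormal time" derived from per-picture recognition output. The sketch below assumes that each analyzed picture yields a timestamp and a label, and takes the longest contiguous abnormal span as the expression abnormal time. The label set, the function names and the 2-second threshold are hypothetical.

```python
from typing import Iterable, Optional, Tuple

ABNORMAL_LABELS = frozenset({"angry", "sad", "upset"})  # hypothetical label set


def longest_abnormal_span(
        labels: Iterable[Tuple[float, str]]) -> Tuple[Optional[float], float]:
    """Return (start_s, duration_s) of the longest contiguous abnormal span.

    `labels` holds (timestamp_s, label) pairs sorted by time, one per picture.
    """
    best_start: Optional[float] = None
    best_duration = 0.0
    run_start: Optional[float] = None
    for timestamp_s, label in labels:
        if label not in ABNORMAL_LABELS:
            run_start = None  # a normal picture ends the current span
            continue
        if run_start is None:
            run_start = timestamp_s
        duration = timestamp_s - run_start
        if best_start is None or duration > best_duration:
            best_start, best_duration = run_start, duration
    return best_start, best_duration


def check_expression(person_id: str,
                     labels: Iterable[Tuple[float, str]],
                     expression_time_threshold_s: float = 2.0) -> Optional[dict]:
    """Return an expression abnormal recognition result, or None (threshold is assumed)."""
    start_s, duration_s = longest_abnormal_span(labels)
    if start_s is None or duration_s <= expression_time_threshold_s:
        return None
    return {"type": "expression_abnormal", "person_id": person_id,
            "start_time_s": start_s, "duration_s": duration_s}


per_frame = [(0.0, "neutral"), (1.0, "angry"), (2.0, "angry"), (3.0, "neutral"),
             (4.0, "sad"), (5.0, "sad"), (6.0, "sad"), (7.0, "sad"), (8.0, "neutral")]
result = check_expression("E1001", per_frame)
```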
As an embodiment of the present invention, the data recognition module is further configured to perform voice recognition on the person voice data to obtain a text emotion recognition result, a speech rate recognition result, and a volume recognition result; and the text emotion recognition result, the speech speed recognition result and the volume recognition result belong to the voice recognition result.
In this embodiment, as shown in fig. 4, the detection result module 30 includes:
a text emotion unit 31, configured to, if a text emotion result in the text emotion recognition result is abnormal, determine whether text abnormal time in the text emotion recognition result is greater than a text time threshold in the behavior rule parameter, and if yes, generate a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result;
a speech rate detection unit 32, configured to, if it is determined according to the person speech rate threshold in the behavior rule parameter that the maximum value of the person's speech rate in the speech rate recognition result is greater than the person speech rate threshold, determine whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generate a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result;
a volume detection unit 33, configured to, if it is determined according to the person volume threshold in the behavior rule parameter that the maximum value of the person's volume in the volume recognition result is greater than the person volume threshold, determine whether the person volume time in the volume recognition result is greater than the volume time threshold in the behavior rule parameter, and if so, generate a person volume abnormal result according to the person volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
Based on the same inventive concept as the abnormal emotion detection method, the invention also provides the abnormal emotion detection device. Because the device solves the problem on a principle similar to that of the abnormal emotion detection method, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
By acquiring audio and video data of staff at work in real time and recognizing and detecting their emotions on the work site in real time, the invention can improve the automation level of on-site detection, help keep staff emotionally stable, improve work quality, and support the rapid development of enterprise business.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
As shown in fig. 5, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in fig. 5; furthermore, the electronic device 600 may also comprise components not shown in fig. 5, for which reference may be made to the prior art.
As shown in fig. 5, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the information relating to the failure, as well as a program for processing that information. The central processor 100 may execute the program stored in the memory 140 to realize information storage, processing, and the like.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be provided with additional data; an example of such a memory is sometimes called an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, which is used to store application programs and function programs, or a flow by which the central processor 100 executes the operation of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a Bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132, thereby implementing general telecommunications functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100, so that sound can be recorded locally through the microphone 132 and locally stored sound can be played through the speaker 131.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments have been used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention; meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (16)

1. An abnormal emotion detection method, characterized in that the method comprises:
acquiring station video data, regional video data and personnel voice data, analyzing the station video data and the regional video data, and respectively obtaining a station picture group and a regional picture group;
identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results, and identifying the personnel voice data to obtain a plurality of voice recognition results;
and generating an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result and a preset behavior rule parameter.
2. The method of claim 1, wherein the identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results comprises:
identifying the gait of the personnel in the regional picture group to obtain a personnel gait identification result; wherein the person gait recognition result belongs to the person emotion recognition result.
3. The method of claim 2, wherein the generating of the abnormal emotion detection result according to the person emotion recognition result, the voice recognition result and the preset behavior rule parameter comprises:
if the gait emotion result in the person gait recognition result is abnormal, judging whether the gait abnormal time in the person gait recognition result is greater than a gait time threshold value in the behavior rule parameter, if so, generating a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
4. The method of claim 1, wherein the identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results further comprises:
carrying out facial expression recognition on personnel in the station picture group to obtain a personnel expression recognition result; wherein the person expression recognition result belongs to the person emotion recognition result.
5. The method of claim 4, wherein generating abnormal emotion detection results according to the emotion recognition results of the person, the voice recognition results and preset behavior rule parameters further comprises:
if the facial expression result in the person expression recognition result is abnormal, judging whether the expression abnormal time in the person expression recognition result is greater than the expression time threshold in the behavior rule parameter, and if so, generating an expression abnormal recognition result according to the expression abnormal time; wherein the expression abnormal recognition result belongs to the abnormal emotion detection result.
6. The method of claim 1, wherein the identifying the personnel voice data to obtain a plurality of voice recognition results comprises:
performing voice recognition on the personnel voice data to obtain a text emotion recognition result, a speech speed recognition result and a volume recognition result; and the text emotion recognition result, the speech speed recognition result and the volume recognition result belong to the voice recognition result.
7. The method of claim 6, wherein generating abnormal emotion detection results according to the emotion recognition results of the person, the voice recognition results and preset behavior rule parameters further comprises:
if the text emotion result in the text emotion recognition result is abnormal, judging whether the text abnormal time in the text emotion recognition result is greater than a text time threshold in the behavior rule parameter, and if so, generating a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result;
if it is determined, according to the person speech rate threshold in the behavior rule parameter, that the maximum value of the person's speech rate in the speech rate recognition result is greater than the person speech rate threshold, judging whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generating a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result;
if the maximum value of the personnel volume in the volume recognition result is larger than the personnel volume threshold according to the personnel volume threshold in the behavior rule parameter, judging whether the personnel volume time in the volume recognition result is larger than the volume time threshold in the behavior rule parameter, and if so, generating a personnel volume abnormal result according to the personnel volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
8. An abnormal emotion detecting apparatus, characterized in that the apparatus comprises:
the data acquisition module is used for acquiring station video data, regional video data and personnel voice data, analyzing the station video data and the regional video data and respectively obtaining a station picture group and a regional picture group;
the data identification module is used for identifying the station picture group and the region picture group to obtain a plurality of person emotion recognition results, and identifying the personnel voice data to obtain a plurality of voice recognition results;
and the detection result module is used for generating an abnormal emotion detection result according to the person emotion recognition result, the voice recognition result and a preset behavior rule parameter.
9. The apparatus according to claim 8, wherein the data recognition module is further configured to recognize the gait of the person in the region picture group to obtain a person gait recognition result; wherein the person gait recognition result belongs to the person emotion recognition result.
10. The apparatus according to claim 9, wherein the detection result module is further configured to, if a gait emotion result in the person gait recognition result is abnormal, determine whether the gait abnormal time in the person gait recognition result is greater than a gait time threshold in the behavior rule parameter, and if so, generate a gait abnormal recognition result according to the gait abnormal time; wherein the gait abnormal recognition result belongs to the abnormal emotion detection result.
11. The apparatus of claim 8, wherein the data identification module is further configured to perform facial expression recognition on the person in the station picture group to obtain a person expression recognition result; wherein the person expression recognition result belongs to the person emotion recognition result.
12. The apparatus of claim 11, wherein the detection result module is further configured to, if a facial expression result in the person expression recognition result is abnormal, determine whether the expression abnormal time in the person expression recognition result is greater than an expression time threshold in the behavior rule parameter, and if so, generate an expression abnormal recognition result according to the expression abnormal time; wherein the expression abnormal recognition result belongs to the abnormal emotion detection result.
13. The apparatus of claim 8, wherein the data identification module is further configured to perform voice recognition on the personnel voice data to obtain a text emotion recognition result, a speech rate recognition result, and a volume recognition result; wherein the text emotion recognition result, the speech rate recognition result and the volume recognition result belong to the voice recognition result.
14. The apparatus of claim 13, wherein the detection result module comprises:
the text emotion unit is used for, if the text emotion result in the text emotion recognition result is abnormal, judging whether the text abnormal time in the text emotion recognition result is greater than a text time threshold in the behavior rule parameter, and if so, generating a text abnormal recognition result according to the text abnormal time; wherein the text abnormal recognition result belongs to the abnormal emotion detection result;
the speech rate detection unit is used for, if it is determined according to the person speech rate threshold in the behavior rule parameter that the maximum value of the person's speech rate in the speech rate recognition result is greater than the person speech rate threshold, judging whether the person speech rate time in the speech rate recognition result is greater than the overspeed time threshold in the behavior rule parameter, and if so, generating a person speech rate abnormal result according to the person speech rate time; wherein the person speech rate abnormal result belongs to the abnormal emotion detection result;
the volume detection unit is used for, if it is determined according to the person volume threshold in the behavior rule parameter that the maximum value of the person's volume in the volume recognition result is greater than the person volume threshold, judging whether the person volume time in the volume recognition result is greater than the volume time threshold in the behavior rule parameter, and if so, generating a person volume abnormal result according to the person volume time; wherein the person volume abnormal result belongs to the abnormal emotion detection result.
15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110338416.4A CN113053385A (en) 2021-03-30 2021-03-30 Abnormal emotion detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110338416.4A CN113053385A (en) 2021-03-30 2021-03-30 Abnormal emotion detection method and device

Publications (1)

Publication Number Publication Date
CN113053385A true CN113053385A (en) 2021-06-29

Family

ID=76516266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110338416.4A Pending CN113053385A (en) 2021-03-30 2021-03-30 Abnormal emotion detection method and device

Country Status (1)

Country Link
CN (1) CN113053385A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548788A (en) * 2015-09-23 2017-03-29 中国移动通信集团山东有限公司 A kind of intelligent emotion determines method and system
US20180132776A1 (en) * 2016-11-15 2018-05-17 Gregory Charles Flickinger Systems and methods for estimating and predicting emotional states and affects and providing real time feedback
CN110705419A (en) * 2019-09-24 2020-01-17 新华三大数据技术有限公司 Emotion recognition method, early warning method, model training method and related device
CN111274932A (en) * 2020-01-19 2020-06-12 平安科技(深圳)有限公司 State identification method and device based on human gait in video and storage medium
CN111904441A (en) * 2020-08-20 2020-11-10 金陵科技学院 Emotion analysis system and method based on multi-modal characteristics

Similar Documents

Publication Publication Date Title
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
US11282516B2 (en) Human-machine interaction processing method and apparatus thereof
JP6826205B2 (en) Hybrid speech recognition combined performance automatic evaluation system
US10069971B1 (en) Automated conversation feedback
CN112836691A (en) Intelligent interviewing method and device
CN110956956A (en) Voice recognition method and device based on policy rules
CN108920640B (en) Context obtaining method and device based on voice interaction
CN111524526B (en) Voiceprint recognition method and voiceprint recognition device
CN104538043A (en) Real-time emotion reminder for call
CN103916513A (en) Method and device for recording communication message at communication terminal
CN112883932A (en) Method, device and system for detecting abnormal behaviors of staff
CN108039181B (en) Method and device for analyzing emotion information of sound signal
CN111901627B (en) Video processing method and device, storage medium and electronic equipment
CN106356077B (en) Laughter detection method and device
US10971149B2 (en) Voice interaction system for interaction with a user by voice, voice interaction method, and program
CN113095202A (en) Data segmentation method and device in double-record data quality inspection
CN106531195B (en) Dialogue conflict detection method and device
CN109065036A (en) Speech recognition method, apparatus, electronic device and computer-readable storage medium
CN114373472A (en) Audio noise reduction method, device and system and storage medium
CN113095204B (en) Double-recording data quality inspection method, device and system
CN111339282A (en) Intelligent online response method and intelligent customer service system
CN111048115A (en) Voice recognition method and device
CN108962228B (en) Model training method and device
CN112861816A (en) Abnormal behavior detection method and device
CN113053385A (en) Abnormal emotion detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210629